Our question is, “How can we increase speech input from the environment in order to aid language skills development in people with Down syndrome?” The solution we chose to move forward with is to build an AI that tracks the child’s language development and speech intake via microphones on a wearable device. The product would inform caretakers of the most effective ways to speak to their child in order to build language skills. We think this is a strong solution because it would increase speech input and facilitate conversation practice between child and caretaker, even at home. It is also a completely non-invasive solution that relies only on a small wearable device.

After reviewing our previous feedback and discussing potential issues, we identified the following five issues. They concern the feasibility of the AI and the wearable device and the efficacy of the product.

1. The speech recognition software within the AI may misinterpret some speech.

The speech recognition software market is booming. It is reasonable to expect that, as the market grows, the technology will continue to advance and speech recognition will become more and more accurate. Even today, speech recognition software is quite accurate: in 2017, Google claimed that its speech recognition software was 95% accurate. We conclude that although the AI may misinterpret some speech, current speech recognition technology is accurate enough to produce meaningful speech data.

Sources:

https://hbr.org/2019/05/voice-recognition-still-has-significant-race-and-gender-biases

https://eurowire.co/uncategorized/451040/speech-recognition-software-market-is-booming-worldwide-to-show-significant-growth-over-the-forecast-period-2020-2025/

2. The tone of speech may be impossible for the AI to take into account.

First, voice recognition is already built into everyday technology such as our phones, which translate voice input into text. Voice recognition has also been used to detect a person's health and emotional state by recognizing tone of voice. Researchers from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) and Institute of Medical Engineering and Science (IMES) have been working on an artificially intelligent, wearable system that can predict a conversation’s tone based on a person's speech patterns. The system analyzes audio, text transcriptions, and physiological signals to determine the overall tone of a conversation with 83% accuracy. The AI uses deep-learning techniques to provide a "sentiment score" for specific five-second intervals within a conversation. For example, the AI associates long pauses and monotonous vocal tones with sadder stories, and, in terms of body language, it associates sadder stories with increased fidgeting and cardiovascular activity. With further research and data, a system like this could detect the tone of the wearer's conversations throughout the day; a sketch of the interval-based scoring idea follows the sources below.

Sources:

https://digitalcareers.csiro.au/en/Emerging-tech/Voice-tone-recognition

https://www.sciencedaily.com/releases/2017/02/170201110635.htm
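To make the interval-based tone scoring more concrete, here is a minimal Python sketch of how audio from the wearable could be split into five-second windows and scored. The MIT system itself is not publicly available, so the score_sentiment function below is a hypothetical placeholder standing in for a trained deep-learning model; only the windowing structure is the point.

    # Illustrative sketch: score the tone of a recorded conversation in
    # five-second windows, similar in spirit to the MIT CSAIL system.
    # The sentiment model itself is a hypothetical placeholder.

    import numpy as np
    import soundfile as sf  # assumed available for reading WAV files

    WINDOW_SECONDS = 5

    def score_sentiment(window: np.ndarray) -> float:
        """Placeholder: return a sentiment score in [-1, 1] for one window.

        A real system would feed audio features (and possibly a text
        transcription and physiological signals) into a trained
        deep-learning model here.
        """
        # Crude stand-in: quieter, flatter audio maps to a lower score.
        energy = float(np.sqrt(np.mean(window ** 2)))
        return float(np.clip(2.0 * energy - 1.0, -1.0, 1.0))

    def sentiment_timeline(path: str) -> list[tuple[float, float]]:
        """Return (start_time_in_seconds, score) pairs for each window."""
        audio, sample_rate = sf.read(path)
        if audio.ndim > 1:                        # mix stereo down to mono
            audio = audio.mean(axis=1)
        step = WINDOW_SECONDS * sample_rate
        timeline = []
        for start in range(0, len(audio), step):
            window = audio[start:start + step]
            timeline.append((start / sample_rate, score_sentiment(window)))
        return timeline

    if __name__ == "__main__":
        for start, score in sentiment_timeline("conversation.wav"):
            print(f"{start:6.1f}s  sentiment={score:+.2f}")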

3. The device may be uncomfortable to wear and may be difficult to transport.

We chose to include this potential issue because it was brought up by a classmate in their feedback. Smartphones are extremely powerful pieces of technology that include microphones, wireless data transmitters, and computer processors, which are essentially the three necessary components of the device. Considering that the computational functions of the device (the AI itself) could run in the cloud on a separate server, the technological demands of the device could easily be met by a smartphone, probably even one from five years ago. It is therefore certainly possible to build a very small device capable of listening to speech and sending the data to the AI in the cloud; a sketch of this device-to-cloud split follows the source below. Google Glass is a great example of a wearable device that is capable of speech recognition and wireless communication, and it is quite small and easily transported. We conclude that the technological demands of the device could be packed into a small, comfortable form factor. We think the device could be worn around the ear (similar to a hearing aid) so that it is comfortable, inconspicuous, and easy to transport.

Source: https://www.google.com/glass/tech-specs/
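Below is a minimal Python sketch of the split described above: the wearable only records short clips and uploads them, and all AI processing happens on a server. The upload URL, clip length, and payload format are hypothetical placeholders; the sounddevice and soundfile libraries are assumed to be available on the device.

    # Illustrative sketch of the device-to-cloud split: the wearable only
    # records short audio clips and uploads them; all AI processing runs
    # on a server. The endpoint URL and clip length are placeholders.

    import io

    import requests
    import sounddevice as sd    # microphone access (assumed available)
    import soundfile as sf      # WAV encoding

    UPLOAD_URL = "https://example.com/api/speech-samples"   # hypothetical
    SAMPLE_RATE = 16_000
    CLIP_SECONDS = 30

    def record_clip() -> bytes:
        """Record a short clip from the device microphone as WAV bytes."""
        audio = sd.rec(int(CLIP_SECONDS * SAMPLE_RATE),
                       samplerate=SAMPLE_RATE, channels=1)
        sd.wait()                                # block until recording ends
        buffer = io.BytesIO()
        sf.write(buffer, audio, SAMPLE_RATE, format="WAV")
        return buffer.getvalue()

    def upload_clip(wav_bytes: bytes) -> None:
        """Send the clip to the cloud service that hosts the AI."""
        response = requests.post(
            UPLOAD_URL,
            files={"audio": ("clip.wav", wav_bytes, "audio/wav")},
            timeout=30,
        )
        response.raise_for_status()

    if __name__ == "__main__":
        upload_clip(record_clip())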

4. The AI may not have sufficient data to produce effective suggestions.

There is an existing AI called Empower Me, a coaching application that runs on Google Glass hardware and Affectiva emotion-recognition software. It is a wearable device targeted at children and adults with autism to help teach social and cognitive skills. The software can analyze vocal signals like tone, loudness, tempo, and voice quality to distinguish emotions and determine gender. With such high-level AI already in existence, there is a real possibility for our AI to process the information it receives and produce feedback for caretakers, provided it is trained on a sufficiently large data set linking the speech features it analyzes to effective caregiver strategies. A sketch of how such vocal features can be extracted with open-source tools follows the source below.

Source: https://emerj.com/ai-sector-overviews/using-wearable-data-for-artificial-intelligence-applications-current-use-cases/
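Affectiva's feature set and models are proprietary, but comparable low-level vocal signals (loudness, tempo, pitch) are straightforward to compute with the open-source librosa library, as this illustrative Python sketch shows. The file name and the choice of features are examples, not a specification of our product.

    # Illustrative sketch: compute a few of the vocal signals mentioned
    # above (loudness, tempo, pitch) from a recorded clip using librosa.
    # Affectiva's actual features and models are proprietary; this only
    # shows that comparable low-level features are easy to extract.

    import librosa
    import numpy as np

    def vocal_features(path: str) -> dict:
        y, sr = librosa.load(path, sr=None, mono=True)

        # Loudness: root-mean-square energy averaged over the clip.
        loudness = float(np.mean(librosa.feature.rms(y=y)))

        # Tempo: estimated rate of rhythmic events, in beats per minute.
        tempo, _ = librosa.beat.beat_track(y=y, sr=sr)

        # Pitch: median fundamental frequency from the YIN estimator.
        f0 = librosa.yin(y, fmin=librosa.note_to_hz("C2"),
                         fmax=librosa.note_to_hz("C7"), sr=sr)
        median_pitch_hz = float(np.median(f0))

        return {
            "loudness_rms": loudness,
            "tempo_bpm": float(np.atleast_1d(tempo)[0]),
            "median_pitch_hz": median_pitch_hz,
        }

    if __name__ == "__main__":
        print(vocal_features("clip.wav"))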

5. The AI may not be able to process different languages if families speak more than one language.

Firstly, speech recognition software is capable of recognizing many languages; Facebook’s speech recognition model, for example, can understand 51 languages, so speaking languages other than English is not an issue. Secondly, the device could be built with multiple languages to choose from, a capability that companies like Google have already implemented. Google’s speech recognition software (and thus our device as well) can automatically detect which language is being spoken based on an initial selection of a few languages. For example, if Spanish and English are spoken in the household, the caregiver could select Spanish and English as potential language inputs and the AI would be able to determine when each language is being spoken. This feature would also help if the main user is fluent in English but the family prefers to speak a different language at home. The device would be convenient and accessible for everyone involved, no matter their language preferences or habits. A sketch of this language configuration follows the sources below.

Sources:

https://cloud.google.com/speech-to-text/docs/multiple-languages

https://venturebeat.com/2020/07/08/facebooks-speech-recognition-model-supports-51-different-languages/
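As an illustration of the multi-language configuration described in the first source above, here is a short Python sketch using Google Cloud Speech-to-Text's alternative language codes. The specific language codes, encoding, and sample rate are example values, not final product choices.

    # Sketch of automatic language selection with Google Cloud
    # Speech-to-Text, following the multi-language feature documented at
    # the first link above. The language codes, encoding, and sample rate
    # are example values. alternative_language_codes is exposed in the
    # v1p1beta1 API surface of the google-cloud-speech client.

    from google.cloud import speech_v1p1beta1 as speech

    def transcribe_multilingual(wav_bytes: bytes) -> None:
        client = speech.SpeechClient()

        config = speech.RecognitionConfig(
            encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
            sample_rate_hertz=16000,
            language_code="en-US",                  # caregiver's primary choice
            alternative_language_codes=["es-ES"],   # other household languages
        )
        audio = speech.RecognitionAudio(content=wav_bytes)

        response = client.recognize(config=config, audio=audio)
        for result in response.results:
            # Each result reports which of the selected languages was heard.
            print(result.language_code, result.alternatives[0].transcript)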

After considering and researching these potential issues, we believe that this moonshot is feasible and would provide effective suggestions to caretakers of people with Down syndrome about how to increase environmental speech input.