The two solutions that we identified from our previous post were:

  1. Direct brain stimulation → translation: identifying parts of the brain that register semantic and/or emotional/cultural/contextual information of certain words, and directly stimulating those parts of the brain in conjunction with learning the word
  2. Incorporating physiological/emotional/contextual states as you input speech or text for translation, to help the AI build a more comprehensive translation or rank translation outcomes

We really appreciate all the feedback given during the Rapid Evaluation session, as it helped us to re-evaluate and further refine our solutions.

Firstly, the solution that we decided to scrap: Direct Brain Stimulation

The most significant feedback we received was “Yes, but…” feedback, mainly that this solution was far too invasive and too time- and cost-consuming to be practical, especially in the hospital/legal settings that we were aiming to improve.

In summary, the comments were:

  • Direct brain stimulation is highly invasive and comes with risks. Moreover, it can only be performed on certain patients, and would thus be impractical to implement on a large scale.
  • The costs associated with the solution would be too high to make it practical, especially for hospitals and low-income areas
  • What would happen after mapping out specific parts of the insula? Since every individual is different, how would we know that stimulating this brain area would cause a similar understanding in another person?

After considering these comments, we also agreed that this solution was possibly too ambitious and science-fiction-y. We acknowledge that each individual's brain develops differently - for example, a person’s STG develops differently depending on the medium of their language - so pinpointing the specific brain area for an individual word or concept would be harder still. In addition, the risks and costs related to direct brain stimulation would be impractical to incur on a large scale; it would probably also be time-consuming and unreliable, and given the context of the situations we are trying to improve, largely impractical. A doctor cannot possibly wait for an “electric shock to the brain” to understand a patient’s language, especially if the patient is in dire need.

However, we also appreciate the “Yes, and…” comments given for this solution, which were:

  • Solution could augment one’s learning of languages, since people internalize information they can attach to emotion
  • Could help translate abstract concepts for which there may be a word in one language but not another
  • Solution could help us finally learn how higher-level language tasks, such as translation or code-switching, take place in the brain

We agree that this solution is indeed an interesting area to explore, but it would probably need to wait for a time when technologies for direct brain stimulation are less invasive (e.g. just using a cap). Alternatively, we were considering using VR capabilities to model the kinds of contexts and scenarios native speakers imagine when using a word, and using that to teach non-native speakers. But of course, this would have to be implemented in the teaching of the language rather than in the direct and immediate translation contexts we have identified.

Now, for the solution that we want to focus on: incorporating emotional/contextual states alongside input (spoken/typed) to help the AI build a more comprehensive translation or rank translation outcomes.

Feedback from other groups encouraged us by suggesting that we could consider:

  • Using skin conductance or facial recognition technology

We did indeed consider using these features, especially since there are existing AIs which use the cameras and recording devices built into computers/smartphones to track facial expression and voice inflection to detect moods - these are usually used by advertising companies to personalise ads for their customers (see https://www.affectiva.com/how/how-it-works/). As such, the existing databases and trained AIs would be useful in testing the viability of our solution.
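As a rough illustration of what we have in mind (not any vendor's actual API), here is a minimal Python sketch that fuses placeholder facial-expression and voice-inflection scores, of the kind such tools output, into a single distress estimate; the field names and weights are our own assumptions.

```python
# Hypothetical sketch: combine facial-expression and voice-inflection scores
# into one "distress" estimate. The inputs are assumed to come from an
# off-the-shelf emotion-detection tool; names and weights are placeholders.

from dataclasses import dataclass

@dataclass
class EmotionReading:
    face_fear: float      # 0.0-1.0, from facial-expression analysis
    face_sadness: float   # 0.0-1.0, from facial-expression analysis
    voice_arousal: float  # 0.0-1.0, from pitch/intensity of the voice

def distress_score(reading: EmotionReading,
                   face_weight: float = 0.6,
                   voice_weight: float = 0.4) -> float:
    """Blend facial and vocal cues into a single distress estimate."""
    face = max(reading.face_fear, reading.face_sadness)
    return face_weight * face + voice_weight * reading.voice_arousal

# Example: a panicked speaker - high fear on the face, agitated voice
print(distress_score(EmotionReading(face_fear=0.8, face_sadness=0.3, voice_arousal=0.9)))
```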

This would also answer the “Yes, but…” questions we received: how do we detect the “emotional state” of a person, and what kind of non-invasive AI technology would we use?

Other questions that we received were: how would the translation software know which context is needed, and how does emotion tie into how language is learned?

For this, our intention is for the AI to already be trained on emotion and context by integrating multi-modal information into its training set, such that when deployed in a hospital setting, it will be able to judge the best translation outcome in the moment. Using the example from our previous post, if someone came into the hospital complaining of “intoxicado”, the software should be able to pick up on the panic in the voice, as well as the adjacent words used with it, such as “eating food” to produce “poisoning” as the output, as opposed to “party with friends” to produce “drug intoxication”. Essentially, the AI should be trained to identify contextual information in the surrounding speech and facial expressions to better refine its translation output.
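To make this re-ranking idea concrete, below is a minimal sketch of how candidate translations of “intoxicado” might be scored against surrounding words and a distress estimate. The candidate senses, cue words, and weights are invented for illustration and stand in for what a trained model would actually learn.

```python
# Hypothetical sketch: re-rank candidate translations of an ambiguous word
# ("intoxicado") using surrounding words and a distress score.
# The candidates, cue words, and weights are illustrative placeholders,
# not output from a real trained model.

CANDIDATES = {
    "poisoning": {"cues": {"eating", "food", "stomach", "sick"}, "distress_bias": 0.3},
    "drug intoxication": {"cues": {"party", "friends", "drinking"}, "distress_bias": 0.0},
}

def rank_translations(context_words: list[str], distress: float) -> list[tuple[str, float]]:
    """Score each candidate by context-word overlap plus a distress-weighted bias."""
    words = {w.lower() for w in context_words}
    scored = []
    for sense, info in CANDIDATES.items():
        overlap = len(words & info["cues"])
        score = overlap + distress * info["distress_bias"]
        scored.append((sense, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# A panicked patient mentioning food points towards "poisoning"
print(rank_translations(["he", "was", "eating", "food"], distress=0.9))
```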

Other groups also pointed out issues about invasion of privacy, especially with facial recognition software. One way to address this would be to have a dedicated device to aid in translation: instead of having it turned on all the time, it would only be used in situations that need translation, thus avoiding the situation where people’s emotions and expressions are picked up willy-nilly.
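A minimal sketch of that idea: the device’s sensors would run only inside an explicitly started translation session. The start_capture and stop_capture functions below are placeholders for whatever the real device would call.

```python
# Hypothetical sketch: only capture emotion/context data during an explicit
# translation session, rather than monitoring people continuously.
# start_capture/stop_capture are placeholders, not a real device API.

from contextlib import contextmanager

def start_capture():
    print("camera and microphone on (translation session started)")

def stop_capture():
    print("camera and microphone off (session ended, no background monitoring)")

@contextmanager
def translation_session():
    """Sensors run only between entering and leaving this block."""
    start_capture()
    try:
        yield
    finally:
        stop_capture()

# Usage: sensors are active only while a translation is actually requested
with translation_session():
    pass  # record speech, detect emotion, and translate here
```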

After the rapid evaluation phase and our internal evaluation, we do feel that a device that manages to integrate contextual and emotional information into a translation algorithm will be the most practical solution for the fast-paced situations we are aiming to help. Moreover, AI frameworks that process emotional information already seem to exist, suggesting that such a solution is actually possible. The question now would be what challenges there might be in the implementation such that it has not been done yet, and how we might overcome them.