In the two weeks since our last blog post, we’ve been thinking more about our ideas and what they would entail. After getting our peers’ feedback, we wanted to update you guys on where we stand right now! But first, a recap of our current working ideas:

  1. Prosody-preserving language translator
    We thought this idea was extremely promising. Prosody is missing from most AI/neural network translators such as Google Translate, which focus on literal translation but fail to capture tone of voice. We read that researchers at Amazon Alexa have been working on prosody to make their devices better at responding to voice commands, and we think the same work could be carried over to translation, with huge implications. Voiced translations would carry a new layer of meaning and be easier to understand than the bland robot voice of current translators.
  2. Audience-specific translation
    We also decided to shift our focus to the most important aspect of translation: the audience. Different audiences use a different range of everyday words and phrases. Age, sex, background, and geographic location can all be factors in shaping our language. That is why we came up with audience-specific translation: a translation device that learns the audience’s words and mannerisms (much like the predictive keyboard feature on the iPhone) to deliver more accurate translations. Since this was a promising idea, we decided to move forward with it.

Our Peer Feedback

For each of our ideas, we received some really good feedback from you! Here’s a summary of the highlights, and our responses:

Prosody-preserving language translator

  • “Yes, and it would definitely augment the quality of the translation!” - this is one of our main goals with this project! Prosody is such a big part of conveying meaning in speech, and we believe it is really important to carry it through to machine translations.
  • “Yes, and this can be an easy to use app.” - ease of use is definitely a goal for us. When making apps, ensuring the user interface is intuitive is half the battle in making sure people actually use your app!
  • “Yes, and this could improve cultural relations between groups that are communicating.” - translations are such a big part of bridging the gap between peoples around the world, and we’re excited that our solution could be a part of that!
  • “Yes, but how can you apply this to something like auto-translate in closed-captioning?” - this is a really good question! While this idea is focused mostly on audio translations, some of our ideas did involve captions as well. We could potentially look into conveying prosody via text with the techniques many authors use, such as commas for pauses and italics for words you really want to emphasize.
  • “Yes but I think we need to dive a bit deeper into the actual work involved in going from improved voice assistants to solving your problem.” - since we’re still in the “thinking big” phase, we haven’t thought much about the actual implementation of our solution yet. That’s definitely a next step though!
  • “Would this be enough to ease or eliminate possible miscommunications between interpersonal communications? Are there more important variables we should add or expand on?” - this is a great point! While we think our idea can ease some miscommunication, that isn’t its main goal - after all, miscommunications can happen among speakers of the same language! Our goal really is to improve the baseline quality of machine translation so that it carries more than just the words that are said, considering how much more information we convey with our voices.
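
To make the closed-captioning idea a little more concrete, here’s a rough Python sketch of how prosody might be carried into text. Everything in it is hypothetical: the `Word` class, its `stressed` and `pause_after` flags, and `render_caption` are names we made up for illustration, and we’re assuming some earlier step has already detected stress and pauses in the audio. It isn’t an implementation of our translator, just a picture of the italics-and-commas technique.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    stressed: bool = False      # hypothetical flag: the speaker emphasized this word
    pause_after: bool = False   # hypothetical flag: the speaker paused after this word


def render_caption(words):
    """Render words as caption text: *italics* for stress, a comma for a pause."""
    parts = []
    for word in words:
        token = f"*{word.text}*" if word.stressed else word.text
        if word.pause_after:
            token += ","
        parts.append(token)
    return " ".join(parts)


# Made-up example utterance with one stressed word and one pause.
utterance = [
    Word("I"),
    Word("never", stressed=True),
    Word("said", pause_after=True),
    Word("that"),
]
caption = render_caption(utterance)
print(caption)
```

The asterisks stand in for italics here; a real captioning format would need its own way of marking emphasis.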

Audience-specific translation

  • “Yes, and the cultural gap can be bridged.” - as with our last idea, this is definitely one of the goals for this one as well! People generally understand best when things are phrased in their everyday language, and our goal was to make that kind of translation accessible to everyone.
  • “Yes, and socioeconomic status could be included.” - this is an interesting concept, and definitely in line with our idea! However, we think this could run into the same stereotyping issue we address later in this post.
  • “Yes, and app users can select the audience they want to learn as.” - allowing users to choose a mode for themselves would definitely be an integral part of our idea.
  • “Yes, but how can a device learn mannerisms of different audiences? What do you mean by mannerisms specifically?” - our thought process coming into this was that people talk differently, depending on where they are from and their background. Regionalisms, technical jargon, and dialectal differences can serve to impede communication between people, and the goal here is to address that.
  • “Yes but we're worried that this will actually lead to losing info in translation as opposed to aiding it” - this is a valid concern, not just for this idea but for any translation, especially ones intended for different audiences. Localisms, cultural references, and colloquialisms are often impossible to translate fully. A possible solution is to look into audience-oriented equivalents for these. Many movie and book translations do this in order to keep the same tone and message for the new audience, even if the actual content has changed.
  • “Yes, but in what context is this really necessary?” - many contexts! Fields like marketing and advertising yield many instances of ads simply not translating appropriately to different cultures, and there are people in public health who dedicate their careers to finding the best way to disseminate health information to specific populations. This definitely has potential for far-reaching effects!

Overall, the feedback we received on both ideas helped us think a bit more about each of them. The positive feedback validated that our ideas were worth considering, and the negative feedback guided where we should think next. We didn’t feel like our prosody-preserving language translator needed much revision - the negative feedback on it was mostly future-oriented and asked about next steps, which is where we are headed anyway! However, one of our group members raised a valid concern about our second idea that did not come up in our peer feedback: audience-specific translations could, in implementation, very easily lead to cultural stereotyping. We knew we had to think carefully about how to approach this barrier, and whether this idea was feasible.

Revised Solution

After reading and discussing our peers’ feedback, our group decided that it would be best to move forward with our original solution of creating a prosody-preserving language translator, albeit with minor revisions, and abandon the idea of designing an audience-specific translator.

The idea of an audience-specific translator appealed to us at first because we imagined it would personalize translations by screening for descriptive features such as age, sex, gender, and geographical location. However, one of our group members was concerned that the translator could backfire and instead encourage group categorization and stereotyping once it was in public use. We concluded that the possible benefits of the audience-specific translator did not outweigh the risk of social segregation, and agreed to leave this possible “solution” in the past.

Although we chose the prosody-preserving language translator as the best idea to pursue for now, we want to clear up any confusion or concerns our peers may have about the details of our solution. Our goal in designing this translator is to preserve the prosodic features of speech (intonation, stress, and rhythm) in an effort to better convey the speaker’s message. Furthermore, the preservation of prosody has the potential to influence oral language fluency, reading comprehension, and even reading fluency. In conclusion, we hope to enhance the quality, meaning, and understanding of computer-generated translations by incorporating prosodic features of speech into existing language translators.