CR: Automatically adjust audio signal for listeners of a translation

URL

https://dev.fairteaching.net/b/den-gwm-s1c-hlp

Current state

  • As a participant of a conference with translations I can choose which translation I would like to hear
  • Given a conference with two languages (e.g. German and Russian), either of those languages may be spoken in the main room
  • As a participant who only speaks Russian I need to choose the Russian translation
  • Now when somebody speaks German in the main room, I hear the Russian translation
  • But when somebody speaks Russian in the main room, I hear nothing, because the translator does not speak in the translation channel

Desired state

There are a couple of possible solutions to this scenario.

The most important requirement above all is: the user should not have to react to the changing languages. This is so important because the client would like to be able to stream a conference (using a virtual participant), and while streaming it is not possible to interact with the UI once it has been set up.

  • Solution 1: Automatically adjust volume when there is nothing going on in the translation channel
    • As a participant you always listen to both the main room and the translation
    • When the translation has any output, play the translation at a high volume, but keep the main room in the background at a very low volume
    • When the translation currently has no output, turn down the volume of the translation and turn up the volume of the main room
    • So this is essentially about analyzing the audio signals and adjusting the volumes accordingly (see the sketch after this list)
  • Solution 2: Play the translation when any translator is unmuted; play the main audio when all translators are muted
    • The desired behavior is similar to the one above, but this time you do not analyze the audio level of the translation.
    • Instead you need a way to count the number of unmuted speakers in an audio channel and adjust the volumes based on that.
    • This requires the ability to mute/unmute oneself in a translation channel; see the proposal in #12 (closed)
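
A minimal sketch of how Solution 1 could look in a browser-based client, assuming the main room and the translation channel are already available as separate MediaStreams; the thresholds, timings and volume levels are illustrative placeholders, not values agreed in this CR:

```typescript
// Cross-fades the main room and the translation channel based on whether the
// translation channel currently carries an audible signal (Web Audio API).
// All constants are assumptions for illustration only.
const FULL_VOLUME = 1.0;
const DUCKED_VOLUME = 0.15;     // background volume for the channel that is "off"
const SILENCE_THRESHOLD = 0.01; // RMS below which the translation counts as silent
const HOLD_MS = 1500;           // keep the translation up during short pauses

function setupAutoDucking(mainRoomStream: MediaStream, translationStream: MediaStream): void {
  const ctx = new AudioContext();

  const mainGain = ctx.createGain();
  const translationGain = ctx.createGain();
  ctx.createMediaStreamSource(mainRoomStream).connect(mainGain).connect(ctx.destination);
  const translationSource = ctx.createMediaStreamSource(translationStream);
  translationSource.connect(translationGain).connect(ctx.destination);

  // The analyser taps the translation channel to detect whether anyone speaks there.
  const analyser = ctx.createAnalyser();
  analyser.fftSize = 1024;
  translationSource.connect(analyser);

  const samples = new Float32Array(analyser.fftSize);
  let lastActive = 0;

  const update = () => {
    analyser.getFloatTimeDomainData(samples);
    const rms = Math.sqrt(samples.reduce((sum, s) => sum + s * s, 0) / samples.length);

    if (rms > SILENCE_THRESHOLD) lastActive = performance.now();
    const translationActive = performance.now() - lastActive < HOLD_MS;

    // Smooth ramps avoid audible clicks when the volumes flip.
    const now = ctx.currentTime;
    mainGain.gain.setTargetAtTime(translationActive ? DUCKED_VOLUME : FULL_VOLUME, now, 0.2);
    translationGain.gain.setTargetAtTime(translationActive ? FULL_VOLUME : DUCKED_VOLUME, now, 0.2);

    requestAnimationFrame(update);
  };
  update();
}
```

The hold time is there so the main room does not pop back up during every short pause of the interpreter; how long it should be, and what counts as "silent", would have to be tuned with real interpreters.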

The second solution would technically be safer, because it relies on an explicit decision by the interpreters rather than on audio signal processing. A sketch of how the translators' mute state could drive the volumes follows below.
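
A minimal sketch of Solution 2, assuming the client can observe the participants of a translation channel together with their mute state; the TranslationChannelUser shape and the applyVolumes helper are hypothetical names, not an existing API:

```typescript
// Switches between main room and translation based on the interpreters' mute state
// rather than on audio analysis. The data shape below is an assumption.
interface TranslationChannelUser {
  userId: string;
  isInterpreter: boolean;
  muted: boolean;
}

const FULL_VOLUME = 1.0;
const DUCKED_VOLUME = 0.15;

// Intended to be called whenever the mute state in the translation channel changes.
function applyVolumes(
  users: TranslationChannelUser[],
  mainGain: GainNode,
  translationGain: GainNode,
  ctx: AudioContext,
): void {
  // As soon as at least one interpreter is unmuted, the translation takes over.
  const interpreterUnmuted = users.some((u) => u.isInterpreter && !u.muted);

  const now = ctx.currentTime;
  mainGain.gain.setTargetAtTime(interpreterUnmuted ? DUCKED_VOLUME : FULL_VOLUME, now, 0.2);
  translationGain.gain.setTargetAtTime(interpreterUnmuted ? FULL_VOLUME : DUCKED_VOLUME, now, 0.2);
}
```

Compared to Solution 1, the only input is the mute state, so no audio analysis (and no threshold tuning) is needed; the prerequisite from #12 is that interpreters can actually mute/unmute themselves in the translation channel.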

Resources
