Relevant, useful content to keep you up to date with the changes in the market and in the speech-to-text sector.

Computers learned to play chess and became unbeatable opponents; we let them read our texts and they started to write. They have also learned to paint and retouch photographs. Did anyone doubt that artificial intelligence would do the same with speech and music?

Google’s research division has presented AudioLM, a framework for generating high-quality audio that stays consistent over long stretches of time. Starting from a recording just a few seconds long, it can continue that recording in a natural, coherent way. What is remarkable is that it achieves this without being trained on transcriptions or annotations, yet the speech it generates is syntactically and semantically correct. Moreover, it preserves the speaker’s identity and prosody so closely that listeners are unable to tell which part of the audio is original and which was generated by artificial intelligence.

The examples produced by this model are striking. Not only does it replicate articulation, pitch, timbre and intensity, it also adds the sound of the speaker’s breathing and forms meaningful sentences. If the starting recording is not studio audio but contains background noise, AudioLM reproduces that noise so the continuation sounds seamless. More samples can be heard on the AudioLM website.

OpenAI has released a new automatic speech recognition (ASR) system called Whisper as open-source software on GitHub. Whisper can transcribe speech in multiple languages and translate it into English. OpenAI, the company behind GPT-3, says that Whisper’s training makes it more robust in noisy environments and better at handling heavy accents and technical language.

Automatic speech recognition (ASR) turns spoken language into text. It is the technology behind speech-to-text software, which automatically converts your voice into written language.

This technology has many applications, including dictation and visual voicemail software.

One step in translating an audio or video file is adapting its transcription into a foreign language. This step is needed for subsequent subtitling, for producing a voice-over in another language, simply for understanding the content, or for improving its visibility on the web.