Relevant and useful content to keep you up to date with changes in the market and in the speech-to-text sector.

As we step into 2024, LenseUp is thrilled to unveil a suite of advanced services that redefine the landscape of multilingual audio and video production. With a team of native professionals and translators, LenseUp is your go-to partner for audio and video translations, now enhanced with cutting-edge technology.

OpenAI Unveils Whisper 3: The Next-Gen Open Source ASR Model

OpenAI’s recent Developer Day saw the unveiling of Whisper large-v3, a state-of-the-art upgrade to their open-source automatic speech recognition (ASR) model. This development marks a significant leap in speech recognition technology, with OpenAI planning to extend its reach through an accessible API for users in the near future.

AI has drastically altered the way people go about their daily lives. Voice recognition has simplified activities such as taking notes and typing documents, and its speed and efficiency are what make it so popular. With the progress made in AI, many voice recognition applications have been created. Google Assistant, Alexa, and Siri are a few examples of virtual assistants that use voice recognition software to communicate with users. Additionally, text-to-speech, speech-to-text, and text-to-text have been widely adopted in various applications.

Computers can already play chess, and they have become unbeatable opponents; we let them read our texts and they started to write. They have also learned to paint and retouch photographs. Did anyone doubt that artificial intelligence would be able to do the same with speech and music?

Google’s research division has presented AudioLM, a framework for generating high-quality audio that remains consistent over the long term. Starting from a recording just a few seconds long, it can extend the audio in a natural and coherent way. Remarkably, it achieves this without being trained on transcriptions or annotations, even though the generated speech is syntactically and semantically correct. Moreover, it maintains the identity and prosody of the speaker to such an extent that the listener cannot tell which part of the audio is original and which was generated by artificial intelligence.

The examples produced by this artificial intelligence are striking. Not only can it replicate articulation, pitch, timbre and intensity, but it can also reproduce the sound of the speaker’s breathing and form meaningful sentences. If the source is not a studio recording but one with background noise, AudioLM replicates the noise to preserve continuity. More samples can be heard on the AudioLM website.

OpenAI has introduced a new automatic speech recognition (ASR) system called Whisper, released as open-source software on GitHub. Whisper can transcribe speech in multiple languages and translate it into English, and OpenAI, the developer of GPT-3, claims that Whisper’s training makes it better at distinguishing voices in noisy environments and at understanding heavy accents and technical language.

Automatic speech recognition, often called ASR, turns spoken language into text. It is speech-to-text software that automatically converts your voice into written language.

This technology has many applications, including dictation and visual voice messaging software.

One of the steps in translating an audio/video file is adapting the transcription into a foreign language. This step is necessary for the subsequent subtitling phase, for setting up a voice-over in a different language, or simply for understanding the audio/video content or improving its visibility on the web.