Speech recognition and speech-to-text: news and updates

LenseUp’s new services: voice cloning, lip sync and multilingual dubbing

News, Speech recognition and speech-to-text

As we step into 2024, LenseUp is thrilled to unveil a suite of advanced services that redefine the landscape of multilingual audio and video production. With a team of native professionals and translators, LenseUp is your go-to partner for audio and video translations, now enhanced with cutting-edge technology.

17 January 2024/by LenseUp

OpenAI’s Whisper 3 – A Game Changer in Speech Recognition

Speech recognition and speech-to-text, Text-to-Speech (TTS) / AI voices

OpenAI Unveils Whisper 3: The Next-Gen Open Source ASR Model

At its recent Developer Day, OpenAI unveiled Whisper large-v3, a state-of-the-art upgrade to its open-source automatic speech recognition (ASR) model. This development marks a significant leap in speech recognition technology, and OpenAI plans to extend the model's reach through an API in the near future.

10 November 2023/by LenseUp

Fresh approaches to improve the quality of synthetic speech and text-to-speech

Language models, Speech recognition and speech-to-text

AI has drastically altered the way people go about their daily lives. Voice recognition has simplified activities such as taking notes and typing documents, and its speed and efficiency are what make it so popular. As AI has progressed, many voice recognition applications have been created. Google Assistant, Alexa, and Siri are a few examples of virtual assistants that use voice recognition software to communicate with users. Additionally, text-to-speech, speech-to-text, and text-to-text have been widely adopted in various applications.

24 February 2023/by LenseUp

Google AudioLM is already capable of making speeches with your voice

Language models, Speech recognition and speech-to-text

Computers learned to play chess and became unbeatable opponents; we let them read our texts and they started to write. They also learned to paint and retouch photographs. Did anyone doubt that artificial intelligence would be able to do the same with speech and music?

Google’s research division has presented AudioLM, a framework for generating high-quality audio that remains consistent over the long term. Starting from a recording only a few seconds long, it can extend the audio in a natural and coherent way. What is remarkable is that it achieves this without being trained on transcriptions or annotations, even though the generated speech is syntactically and semantically correct. Moreover, it maintains the identity and prosody of the speaker to such an extent that the listener is unable to discern which part of the audio is original and which has been generated by an artificial intelligence.

The examples produced by this artificial intelligence are striking. Not only is it able to replicate articulation, pitch, timbre and intensity, but it can also reproduce the sound of the speaker’s breathing and form meaningful sentences. If the source is not a studio recording but one with background noise, AudioLM replicates the noise to preserve continuity. More samples can be heard on the AudioLM website.

7 October 2022/by LenseUp

OpenAI releases “Whisper” transcription and translation AI as open source ASR

Speech recognition and speech-to-text

OpenAI has introduced a new automatic speech recognition (ASR) system called Whisper and released it as open-source software on GitHub. Whisper can transcribe speech in multiple languages and translate it into English, and the company behind GPT-3 claims that Whisper’s training makes it better at distinguishing voices in noisy environments and at understanding heavy accents and technical language.

Automatic speech recognition, often called ASR, turns spoken language into text. In other words, it is speech-to-text software that automatically converts your voice into written language.

This technology has many applications, including dictation and visual voice messaging software.

30 September 2022/by LenseUp

Audio or video transcriptions: how to translate them?

Speech recognition and speech-to-text

One of the steps in translating an audio or video file is adapting the transcription into a foreign language. This step is necessary for the subsequent subtitling phase, for setting up a voice-over in a different language, for simply understanding the audio or video content, or for improving its visibility on the web.
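To illustrate how a translated transcription can feed the subtitling phase, here is a minimal Python sketch that writes timed segments out in the SubRip (SRT) subtitle layout. The `Segment` class and the `srt_timestamp` and `to_srt` helpers are hypothetical names introduced for this example, and it assumes the translated text already comes with start and end times in seconds; it is not a description of LenseUp's actual workflow.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    """One translated line with its timing (hypothetical structure)."""
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str     # translated text shown on screen


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS,mmm, the layout SRT expects."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Build SRT text: a counter, a timing line, then the subtitle text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}"
        blocks.append(f"{index}\n{timing}\n{seg.text.strip()}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    translated = [
        Segment(0.0, 2.5, "Bonjour et bienvenue."),
        Segment(2.5, 5.0, "Commençons la présentation."),
    ]
    print(to_srt(translated))
```

Keeping the timing attached to each translated segment means the same data could also be used to align a voice-over, although that step is not shown here.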

8 February 2021/by LenseUp

LenseUp

Contact

Video and Audio solutions

Archive for category: Speech recognition and speech-to-text
