Google has announced a new project to build an AI model that can support the world’s 1,000 most spoken languages. As a first step, the company has presented an AI model trained on over 400 languages, which it describes as the “largest language coverage seen in a speech model today.” The project underscores Google’s commitment to language and AI.

Google has announced the development of a “giant” AI language model intended to handle more than 1,000 of the world’s languages. The company has been working on the project for some time and has already made progress: with the help of machine learning, it has been able to translate between languages with “zero human intervention.” With the new model, Google hopes to go further. The goal is to make it easier for people to communicate with each other, regardless of the language they speak.

Computers learned to play chess and became unbeatable opponents; we let them read our texts and they started to write. They also learned to paint and to retouch photographs. Did anyone doubt that artificial intelligence would be able to do the same with speech and music?

Google’s research division has presented AudioLM, a framework for generating high-quality audio that stays consistent over long stretches. It starts from a recording just a few seconds long and continues it in a natural and coherent way. What is remarkable is that it achieves this without being trained on transcriptions or annotations, even though the generated speech is syntactically and semantically correct. Moreover, it preserves the speaker’s identity and prosody to such an extent that the listener cannot tell which part of the audio is original and which was generated by the model.

The examples are striking. Not only does AudioLM replicate articulation, pitch, timbre and intensity, it also reproduces the sound of the speaker’s breathing and forms meaningful sentences. If the starting recording is not studio audio but one with background noise, AudioLM reproduces that noise so the continuation sounds seamless. More samples can be heard on the AudioLM website.