In an era where global communication is paramount, the ability to break down language barriers is more crucial than ever. Meta, the tech conglomerate, has taken a monumental step in this direction with the introduction of its latest AI model, SeamlessM4T. This groundbreaking model is poised to redefine the landscape of multilingual communication, offering real-time translations and transcriptions in nearly 100 languages. It will be particularly helpful for video and audio translation.

Read more

In the realm of audio production, the integration of artificial intelligence has always been a topic of intrigue. Imagine a world where musicians and content creators can craft intricate soundscapes and melodies using simple text prompts. This is no longer a distant dream, thanks to Meta’s groundbreaking release: AudioCraft. Read more

Large Language Models (LLMs) have been in the spotlight for several months. They are among the most powerful advances in the field of artificial intelligence, and they are transforming the way humans interact with machines. As sector after sector adopts them, they have become the prime example of how AI will be ubiquitous in our lives. LLMs excel at text production for tasks involving complex interactions and knowledge search; the best-known example is ChatGPT, the chatbot developed by OpenAI on the Transformer architecture of GPT-3.5 and GPT-4. Beyond text generation, models such as CLIP (Contrastive Language-Image Pretraining) have been developed to link text and images, making it possible to match descriptions to the content of an image. Read more
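As a toy illustration of the contrastive idea behind CLIP, the sketch below matches an image to the closest caption by cosine similarity. The embedding vectors here are hand-made stand-ins: in the real model, an image encoder and a text encoder are jointly trained on image-text pairs to map both modalities into a shared vector space.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy, hand-made embeddings standing in for CLIP's learned encoders.
image_embedding = [0.9, 0.1, 0.2]          # e.g. a photo of a dog
caption_embeddings = {
    "a photo of a dog": [0.8, 0.2, 0.1],
    "a photo of a cat": [0.1, 0.9, 0.3],
    "a city skyline":   [0.2, 0.1, 0.95],
}

# Zero-shot "classification": pick the caption whose embedding is
# closest to the image embedding.
best = max(
    caption_embeddings,
    key=lambda c: cosine_similarity(image_embedding, caption_embeddings[c]),
)
print(best)  # -> a photo of a dog
```

The same nearest-caption trick is what lets CLIP classify images it was never explicitly trained to label, simply by comparing against a list of candidate descriptions.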

The advent of artificial intelligence (AI) has brought about a revolution in various sectors, and the museum industry is no exception. The introduction of AI chatbots, particularly OpenAI’s ChatGPT, has opened up a plethora of opportunities for museums to enhance the visitor experience and streamline operations. This article explores how ChatGPT can be utilized in museums, drawing insights from various sources and experts in the field. Read more

ChatGPT is a chatbot developed by OpenAI. It is based on InstructGPT: it has been trained to respond to instructions, or “prompts”, written by users.

ChatGPT shows an impressive ability to provide detailed, consistent and relevant answers. It appears to be particularly good at natural language processing (NLP) tasks such as summarising, answering questions, text generation and machine translation.

However, as a very new system, ChatGPT still needs to be scientifically evaluated to compare its natural language processing performance with previous work. Read more
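The prompt-driven interaction described above can be illustrated with the message format used by chat-style APIs. This is only a sketch of a request payload in the format popularized by OpenAI's chat API: it builds the structure but sends nothing, and the model name is purely illustrative.

```python
import json

def build_chat_request(instruction, model="gpt-3.5-turbo"):
    # A chat request pairs a system message (overall behaviour) with
    # a user message carrying the instruction, or "prompt".
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": instruction},
        ],
    }

request = build_chat_request("Summarise the following paragraph: ...")
print(json.dumps(request, indent=2))
```

The instruction-tuned model then produces an assistant message in reply; follow-up turns are handled by appending each exchange to the same `messages` list.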

Google has announced a new project to build an AI model that can support the world’s 1,000 most spoken languages. The company has presented an AI model trained on over 400 languages, which it describes as the “largest language coverage seen in a speech model today.” This new project emphasizes Google’s commitment to language and AI.

Google has announced the development of a “giant” AI language model that can handle more than 1,000 global languages. The company has been working on the project for a while now, and it’s already made some progress. With the help of machine learning, Google has been able to translate between languages with “zero human intervention.” Now, with the new AI language model, the company is hoping to take things to the next level. The goal is to make it easier for people to communicate with each other, regardless of the language they speak. Read more

Computers already play chess and have become unbeatable opponents; we let them read our texts, and they started to write. They have also learned to paint and retouch photographs. Did anyone doubt that artificial intelligence would be able to do the same with speech and music?

Google’s research division has presented AudioLM, a framework for generating high-quality audio that remains consistent over the long term. To do this, it starts with a recording of just a few seconds in length and prolongs it in a natural and coherent way. What is remarkable is that it achieves this without being trained on transcriptions or annotations, even though the generated speech is syntactically and semantically correct. Moreover, it maintains the identity and prosody of the speaker to such an extent that the listener cannot discern which part of the audio is original and which has been generated by an artificial intelligence.

The examples of this artificial intelligence are striking. Not only can it replicate articulation, pitch, timbre and intensity; it can also reproduce the sound of the speaker’s breathing and form meaningful sentences. If it starts not from studio audio but from a recording with background noise, AudioLM replicates the noise to give it continuity. More samples can be heard on the AudioLM website. Read more
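At a high level, systems like AudioLM quantize audio into a sequence of discrete tokens and then extend that sequence autoregressively with a language model. The sketch below imitates that continuation loop with a deliberately trivial "model", a bigram frequency table, standing in for AudioLM's Transformer; the integer tokens are toy stand-ins for real audio codes.

```python
import random
from collections import Counter, defaultdict

def train_bigram(tokens):
    # Count which token tends to follow which -- a crude stand-in
    # for a learned language model over audio tokens.
    follow = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        follow[cur][nxt] += 1
    return follow

def continue_sequence(prompt, follow, n_new, seed=0):
    # Autoregressively extend the prompt one token at a time,
    # sampling a plausible successor of the most recent token.
    rng = random.Random(seed)
    out = list(prompt)
    for _ in range(n_new):
        candidates = follow.get(out[-1])
        if not candidates:
            break  # no known successor: stop generating
        tokens, weights = zip(*candidates.items())
        out.append(rng.choices(tokens, weights=weights)[0])
    return out

# Toy "audio token" stream; real systems quantize waveforms into such codes.
corpus = [1, 2, 3, 1, 2, 3, 1, 2, 4, 1, 2, 3]
model = train_bigram(corpus)
print(continue_sequence([1, 2], model, n_new=4))
```

The key property mirrored here is that each new token is conditioned on what came before, which is how a few-second prompt can be prolonged coherently; the real system adds learned acoustic tokens that preserve speaker identity and recording conditions.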