{"id":14367,"date":"2023-11-10T07:19:19","date_gmt":"2023-11-10T06:19:19","guid":{"rendered":"https:\/\/www.lenseup.com\/?p=14367"},"modified":"2023-11-10T07:40:42","modified_gmt":"2023-11-10T06:40:42","slug":"openais-whisper-3-a-game-changer-in-speech-recognition","status":"publish","type":"post","link":"https:\/\/www.lenseup.com\/en\/openais-whisper-3-a-game-changer-in-speech-recognition\/","title":{"rendered":"OpenAI&#8217;s Whisper 3 &#8211; A Game Changer in Speech Recognition"},"content":{"rendered":"<p><strong>OpenAI Unveils Whisper 3: The Next-Gen Open Source ASR Model<\/strong><\/p>\n<p>OpenAI&#8217;s recent Developer Day saw the unveiling of <a href=\"https:\/\/github.com\/openai\/whisper\">Whisper large-v3<\/a>, a state-of-the-art upgrade to its open-source automatic speech recognition (ASR) model. OpenAI reports that large-v3 reduces errors by 10% to 20% compared with large-v2 across a wide variety of languages, and the company plans to make the model available through its API in the near future.<!--more--><\/p>\n<p><strong>Enhanced Performance in English and Multilingual Capabilities<\/strong><\/p>\n<p>For English-only applications, the Whisper family also includes English-only checkpoints such as <code>tiny.en<\/code> and <code>base.en<\/code>, which OpenAI notes tend to perform better on English than multilingual models of the same size. Accuracy in other languages still varies widely depending on the language, a challenge OpenAI continues to address.<\/p>\n<p>Whisper has been multilingual since its initial release in September 2022. In December 2022, OpenAI released large-v2, trained for 2.5 times more epochs with added regularization. The new large-v3 uses 128 Mel frequency bins instead of 80 and adds a language token for Cantonese; the full list of supported languages is in the repository&#8217;s <code>tokenizer.py<\/code>.<\/p>\n<p><strong>A Tool for Diverse Applications<\/strong><\/p>\n<p>Available on GitHub under the permissive MIT license, Whisper is widely used to transcribe diverse content, thanks to its robustness to accents, background noise, and technical language. Combined with its ease of use, this has made it one of the most popular open-source transcription tools available. 
A standout feature is its ability to output timestamps alongside the transcribed text, which is particularly useful for creating subtitles on platforms such as YouTube.<\/p>\n<p>Under the hood, input audio is split into 30-second chunks, each converted into a log-Mel spectrogram and passed to an encoder. A decoder then predicts the corresponding text, interleaved with special tokens that tell the single model which task to perform: language identification, phrase-level timestamps, multilingual transcription, or translation of speech into English.<\/p>\n<p><strong>Integration with ChatGPT and Focus on Research<\/strong><\/p>\n<p>Whisper also powers speech-to-text in OpenAI&#8217;s own products, including the voice feature in ChatGPT, but the model itself is released publicly, with researchers as its primary intended audience. OpenAI presents the open release as a foundation for building applications and for further research on robust speech processing.<\/p>\n<p>The original Whisper models were trained on 680,000 hours of weakly supervised multilingual and multitask data collected from the web, about a third of it non-English. For large-v3, OpenAI reports training on 1 million hours of weakly labeled audio plus 4 million hours of audio pseudo-labeled with large-v2. This scale and diversity of training data is a key reason for the model&#8217;s robustness.<\/p>\n<p><strong>Complementary Technologies: The Audio API<\/strong><\/p>\n<p>OpenAI has also introduced text-to-speech (TTS) in its Audio API, complementing Whisper large-v3. It offers six preset voices and two model variants, <code>tts-1<\/code> (optimized for real-time use) and <code>tts-1-hd<\/code> (optimized for quality), so that applications can respond with natural-sounding speech. The service is already available, priced per 1,000 characters of input text.<\/p>\n<p>However, the Audio API currently offers no direct way to control the emotional tone of the generated speech. 
OpenAI acknowledges that text characteristics such as capitalization and grammar may influence the output, but says its internal tests of these factors have yielded mixed results.<\/p>\n<p><strong>Looking Ahead: The Impact of Whisper and Audio API<\/strong><\/p>\n<p>Together, Whisper large-v3 and the Audio API cover both directions of speech processing: turning speech into text and text into speech. By making these capabilities more accessible and easier to use, OpenAI is raising the bar for speech recognition and synthesis and paving the way for more intuitive digital experiences.<\/p>\n<p>In conclusion, OpenAI&#8217;s latest developments in ASR and text-to-speech have broad potential, from improving accessibility to changing how we learn and interact with AI systems, and point toward speech technology that is more inclusive, efficient, and user-centric.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>OpenAI Unveils Whisper 3: The Next-Gen Open Source ASR Model OpenAI&#8217;s recent Developer Day saw the unveiling of Whisper large-v3, a state-of-the-art upgrade to their open-source automatic speech recognition (ASR) model. 
This development marks a significant leap in speech recognition technology, with OpenAI planning to extend its reach through an accessible API for users in [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":14368,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,81],"tags":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v23.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>OpenAI&#039;s Whisper 3 - A Game Changer in Speech Recognition<\/title>\n<meta name=\"description\" content=\"OpenAI&#039;s recent Developer Day saw the unveiling of Whisper large-v3, a state-of-the-art upgrade to their open-source automatic speech recognition (ASR) model. Learn more.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.lenseup.com\/en\/openais-whisper-3-a-game-changer-in-speech-recognition\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"OpenAI&#039;s Whisper 3 - A Game Changer in Speech Recognition\" \/>\n<meta property=\"og:description\" content=\"OpenAI&#039;s recent Developer Day saw the unveiling of Whisper large-v3, a state-of-the-art upgrade to their open-source automatic speech recognition (ASR) model. 
Learn more.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.lenseup.com\/en\/openais-whisper-3-a-game-changer-in-speech-recognition\/\" \/>\n<meta property=\"og:site_name\" content=\"LenseUp, video and audio solutions\" \/>\n<meta property=\"article:published_time\" content=\"2023-11-10T06:19:19+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-11-10T06:40:42+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.lenseup.com\/wp-content\/uploads\/2023\/11\/dall-3-e.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"900\" \/>\n\t<meta property=\"og:image:height\" content=\"514\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"LenseUp\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"LenseUp\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.lenseup.com\/en\/openais-whisper-3-a-game-changer-in-speech-recognition\/\",\"url\":\"https:\/\/www.lenseup.com\/en\/openais-whisper-3-a-game-changer-in-speech-recognition\/\",\"name\":\"OpenAI's Whisper 3 - A Game Changer in Speech 
Recognition\",\"isPartOf\":{\"@id\":\"https:\/\/www.lenseup.com\/en\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.lenseup.com\/en\/openais-whisper-3-a-game-changer-in-speech-recognition\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.lenseup.com\/en\/openais-whisper-3-a-game-changer-in-speech-recognition\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.lenseup.com\/wp-content\/uploads\/2023\/11\/dall-3-e.jpg\",\"datePublished\":\"2023-11-10T06:19:19+00:00\",\"dateModified\":\"2023-11-10T06:40:42+00:00\",\"author\":{\"@id\":\"https:\/\/www.lenseup.com\/en\/#\/schema\/person\/dadfed1f52570f3378a4679e8e398337\"},\"description\":\"OpenAI's recent Developer Day saw the unveiling of Whisper large-v3, a state-of-the-art upgrade to their open-source automatic speech recognition (ASR) model. Learn more.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.lenseup.com\/en\/openais-whisper-3-a-game-changer-in-speech-recognition\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.lenseup.com\/en\/openais-whisper-3-a-game-changer-in-speech-recognition\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.lenseup.com\/en\/openais-whisper-3-a-game-changer-in-speech-recognition\/#primaryimage\",\"url\":\"https:\/\/www.lenseup.com\/wp-content\/uploads\/2023\/11\/dall-3-e.jpg\",\"contentUrl\":\"https:\/\/www.lenseup.com\/wp-content\/uploads\/2023\/11\/dall-3-e.jpg\",\"width\":900,\"height\":514},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.lenseup.com\/en\/openais-whisper-3-a-game-changer-in-speech-recognition\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Accueil\",\"item\":\"https:\/\/www.lenseup.com\/en\/home-oct-2021\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"OpenAI&#8217;s Whisper 3 &#8211; A Game Changer in Speech 
Recognition\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.lenseup.com\/en\/#website\",\"url\":\"https:\/\/www.lenseup.com\/en\/\",\"name\":\"LenseUp, multilingual audio and video solutions\",\"description\":\"Audioguides, audio books, audio and video translations, multilingual chatbots... discover LenseUp.\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.lenseup.com\/en\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.lenseup.com\/en\/#\/schema\/person\/dadfed1f52570f3378a4679e8e398337\",\"name\":\"LenseUp\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.lenseup.com\/en\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/630b0f43e55077cd2abe39e3e9e2a52c?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/630b0f43e55077cd2abe39e3e9e2a52c?s=96&d=mm&r=g\",\"caption\":\"LenseUp\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"OpenAI's Whisper 3 - A Game Changer in Speech Recognition","description":"OpenAI's recent Developer Day saw the unveiling of Whisper large-v3, a state-of-the-art upgrade to their open-source automatic speech recognition (ASR) model. Learn more.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.lenseup.com\/en\/openais-whisper-3-a-game-changer-in-speech-recognition\/","og_locale":"en_US","og_type":"article","og_title":"OpenAI's Whisper 3 - A Game Changer in Speech Recognition","og_description":"OpenAI's recent Developer Day saw the unveiling of Whisper large-v3, a state-of-the-art upgrade to their open-source automatic speech recognition (ASR) model. 
Learn more.","og_url":"https:\/\/www.lenseup.com\/en\/openais-whisper-3-a-game-changer-in-speech-recognition\/","og_site_name":"LenseUp, video and audio solutions","article_published_time":"2023-11-10T06:19:19+00:00","article_modified_time":"2023-11-10T06:40:42+00:00","og_image":[{"width":900,"height":514,"url":"https:\/\/www.lenseup.com\/wp-content\/uploads\/2023\/11\/dall-3-e.jpg","type":"image\/jpeg"}],"author":"LenseUp","twitter_card":"summary_large_image","twitter_misc":{"Written by":"LenseUp","Est. reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.lenseup.com\/en\/openais-whisper-3-a-game-changer-in-speech-recognition\/","url":"https:\/\/www.lenseup.com\/en\/openais-whisper-3-a-game-changer-in-speech-recognition\/","name":"OpenAI's Whisper 3 - A Game Changer in Speech Recognition","isPartOf":{"@id":"https:\/\/www.lenseup.com\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.lenseup.com\/en\/openais-whisper-3-a-game-changer-in-speech-recognition\/#primaryimage"},"image":{"@id":"https:\/\/www.lenseup.com\/en\/openais-whisper-3-a-game-changer-in-speech-recognition\/#primaryimage"},"thumbnailUrl":"https:\/\/www.lenseup.com\/wp-content\/uploads\/2023\/11\/dall-3-e.jpg","datePublished":"2023-11-10T06:19:19+00:00","dateModified":"2023-11-10T06:40:42+00:00","author":{"@id":"https:\/\/www.lenseup.com\/en\/#\/schema\/person\/dadfed1f52570f3378a4679e8e398337"},"description":"OpenAI's recent Developer Day saw the unveiling of Whisper large-v3, a state-of-the-art upgrade to their open-source automatic speech recognition (ASR) model. 
Learn more.","breadcrumb":{"@id":"https:\/\/www.lenseup.com\/en\/openais-whisper-3-a-game-changer-in-speech-recognition\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.lenseup.com\/en\/openais-whisper-3-a-game-changer-in-speech-recognition\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.lenseup.com\/en\/openais-whisper-3-a-game-changer-in-speech-recognition\/#primaryimage","url":"https:\/\/www.lenseup.com\/wp-content\/uploads\/2023\/11\/dall-3-e.jpg","contentUrl":"https:\/\/www.lenseup.com\/wp-content\/uploads\/2023\/11\/dall-3-e.jpg","width":900,"height":514},{"@type":"BreadcrumbList","@id":"https:\/\/www.lenseup.com\/en\/openais-whisper-3-a-game-changer-in-speech-recognition\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Accueil","item":"https:\/\/www.lenseup.com\/en\/home-oct-2021\/"},{"@type":"ListItem","position":2,"name":"OpenAI&#8217;s Whisper 3 &#8211; A Game Changer in Speech Recognition"}]},{"@type":"WebSite","@id":"https:\/\/www.lenseup.com\/en\/#website","url":"https:\/\/www.lenseup.com\/en\/","name":"LenseUp, multilingual audio and video solutions","description":"Audioguides, audio books, audio and video translations, multilingual chatbots... 
discover LenseUp.","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.lenseup.com\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.lenseup.com\/en\/#\/schema\/person\/dadfed1f52570f3378a4679e8e398337","name":"LenseUp","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.lenseup.com\/en\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/630b0f43e55077cd2abe39e3e9e2a52c?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/630b0f43e55077cd2abe39e3e9e2a52c?s=96&d=mm&r=g","caption":"LenseUp"}}]}},"_links":{"self":[{"href":"https:\/\/www.lenseup.com\/en\/wp-json\/wp\/v2\/posts\/14367"}],"collection":[{"href":"https:\/\/www.lenseup.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.lenseup.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.lenseup.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.lenseup.com\/en\/wp-json\/wp\/v2\/comments?post=14367"}],"version-history":[{"count":4,"href":"https:\/\/www.lenseup.com\/en\/wp-json\/wp\/v2\/posts\/14367\/revisions"}],"predecessor-version":[{"id":14375,"href":"https:\/\/www.lenseup.com\/en\/wp-json\/wp\/v2\/posts\/14367\/revisions\/14375"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.lenseup.com\/en\/wp-json\/wp\/v2\/media\/14368"}],"wp:attachment":[{"href":"https:\/\/www.lenseup.com\/en\/wp-json\/wp\/v2\/media?parent=14367"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.lenseup.com\/en\/wp-json\/wp\/v2\/categories?post=14367"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.lenseup.com\/en\/wp-json\/wp\/v2\/tags?post=14367"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}