
OpenAI introduces Whisper API for speech-to-text transcription

To coincide with the release of the ChatGPT API, OpenAI today released the Whisper API, a hosted version of the open source Whisper speech-to-text model that the company released in September.

Priced at $0.006 per minute, Whisper is an automatic speech recognition system that OpenAI says enables “robust” transcription in multiple languages, as well as translation from those languages into English. It accepts files in a variety of formats, including M4A, MP3, MP4, MPEG, MPGA, WAV, and WEBM.
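As a rough sketch, a transcription request through OpenAI's Python bindings might look like the following. The model name `whisper-1` and the `audio.transcriptions.create` call reflect OpenAI's published Python library; the file path is a placeholder, and the cost helper simply applies the article's $0.006-per-minute price:

```python
# Minimal sketch of a hosted Whisper API call, assuming the official
# `openai` Python package is installed and OPENAI_API_KEY is set in the
# environment. The audio file path is a placeholder, not a real file.

def transcribe(path: str) -> str:
    """Send an audio file (M4A, MP3, MP4, MPEG, MPGA, WAV, or WEBM)
    to the hosted whisper-1 model and return the transcript text."""
    from openai import OpenAI  # imported here so the sketch loads without the package

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as audio:
        result = client.audio.transcriptions.create(model="whisper-1", file=audio)
    return result.text


def estimate_cost(duration_seconds: float, rate_per_minute: float = 0.006) -> float:
    """Estimate the transcription cost at the quoted $0.006-per-minute rate."""
    return (duration_seconds / 60.0) * rate_per_minute
```

At this rate, a one-hour recording would cost about $0.36 to transcribe.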

Countless organizations have developed highly capable voice recognition systems, which sit at the core of the software and services of tech giants like Google, Amazon, and Meta. But what makes Whisper different is that it was trained on 680,000 hours of multilingual and “multitasking” data collected from the web, according to OpenAI co-founder and president Greg Brockman, which led to better recognition of unique accents, background noise, and technical jargon.

“We released a model, but that wasn't really enough for the entire developer ecosystem to build around it,” Brockman said. “The Whisper API is the same great model you can get from open source, but we've optimized it to the extreme. It's much, much faster and extremely convenient.”

To Brockman's point, there are many barriers when it comes to companies adopting voice transcription technology. In a 2020 Statista poll, companies cited accuracy, accent- or dialect-related recognition issues, and cost as the top reasons they hadn't adopted technology like voice-to-text.

However, Whisper has its limitations, particularly in the area of “next word” prediction. Because the system was trained on a large amount of noisy data, OpenAI warns that Whisper could include words in its transcriptions that were not actually spoken, possibly because it is trying both to predict the next word in the audio and to transcribe the recording itself. Additionally, Whisper does not perform equally well across languages: it suffers from a higher error rate with speakers of languages that are not well represented in the training data.
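One practical way developers mitigate these hallucinations is to filter transcript segments by the per-segment confidence fields (`avg_logprob`, `no_speech_prob`) that the open-source Whisper model reports alongside its output. The field names come from that open-source release; the threshold values below are purely illustrative, not recommendations from OpenAI:

```python
# Sketch: flagging transcript segments that Whisper may have hallucinated,
# using the per-segment confidence fields from the open-source Whisper
# model's output. Thresholds are illustrative assumptions, not tuned values.

def drop_suspect_segments(segments, min_avg_logprob=-1.0, max_no_speech_prob=0.6):
    """Keep only segments whose confidence fields suggest real speech."""
    kept = []
    for seg in segments:
        if seg["avg_logprob"] < min_avg_logprob:
            continue  # model was very unsure of these words: possible hallucination
        if seg["no_speech_prob"] > max_no_speech_prob:
            continue  # likely silence or background noise, not speech
        kept.append(seg)
    return kept


# Hypothetical segment data in the shape the open-source model emits.
segments = [
    {"text": "Hello there.", "avg_logprob": -0.2, "no_speech_prob": 0.05},
    {"text": "(low-confidence words)", "avg_logprob": -1.8, "no_speech_prob": 0.1},
    {"text": "(probable non-speech)", "avg_logprob": -0.4, "no_speech_prob": 0.9},
]
clean = drop_suspect_segments(segments)  # keeps only the first segment
```

This trades a little recall for precision: segments the model itself was unsure about are dropped rather than passed along as if they were spoken.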

Unfortunately, that last bit is nothing new in the world of speech recognition. Biases have long plagued even the best systems: a 2020 Stanford study found that systems from Amazon, Apple, Google, IBM, and Microsoft made significantly fewer errors (about 19% fewer) with white users than with Black users.

Despite this, OpenAI believes Whisper's transcription capabilities can be used to improve existing applications, services, products, and tools. The AI-powered language learning app Speak is already using the Whisper API to power a new in-app virtual speaking partner.

If OpenAI can enter the speech-to-text market in a major way, it could be quite profitable for the Microsoft-backed company. According to one report, the segment could be worth $5.4 billion by 2026, up from $2.2 billion in 2021.

“Our picture is that we really want to be this universal intelligence,” Brockman said. “We really want to, very flexibly, be able to take in whatever kind of data you have, whatever kind of task you want to do, and be a force multiplier on that attention.”
