
Deepgram's Aura gives voice to AI agents

Deepgram has made a name for itself as one of the go-to startups for speech recognition. Today, the well-funded company announced the launch of Aura, its new real-time text-to-speech API. Aura combines highly realistic voice models with a low-latency API that lets developers build real-time conversational AI agents. Powered by large language models (LLMs), these agents can then stand in for customer service agents in call centers and other customer-facing settings.
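To make that concrete, here is a minimal sketch of how a developer might wire an LLM-generated reply into a real-time TTS call over HTTP. The endpoint URL, model name, and environment variable below are illustrative assumptions for this sketch, not verified excerpts from Deepgram's documentation.

```python
# Sketch: send LLM-generated text to a text-to-speech endpoint and save the audio.
# URL, model name, and env var are assumptions made for illustration only.
import os
import requests

DEEPGRAM_API_KEY = os.environ["DEEPGRAM_API_KEY"]  # hypothetical credential variable
TTS_URL = "https://api.deepgram.com/v1/speak?model=aura-asteria-en"  # assumed endpoint

def speak(text: str, out_path: str = "reply.mp3") -> str:
    """Convert a text reply into spoken audio and write it to disk."""
    response = requests.post(
        TTS_URL,
        headers={
            "Authorization": f"Token {DEEPGRAM_API_KEY}",
            "Content-Type": "application/json",
        },
        json={"text": text},
        timeout=10,
    )
    response.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(response.content)  # audio bytes returned by the API
    return out_path

if __name__ == "__main__":
    # In a real agent, `text` would be the LLM's answer to a caller's question.
    speak("Thanks for calling. How can I help you today?")
```

In a call-center setup, this call would sit at the end of a speech-to-text → LLM → text-to-speech loop, with the audio streamed back to the caller rather than written to a file.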

As Deepgram co-founder and CEO Scott Stephenson noted, it has long been possible to access excellent speech models, but they were expensive and required a lot of compute and processing time. Low-latency models, meanwhile, tend to sound robotic. Deepgram's Aura combines human-like voice models that render speech extremely quickly (typically in less than half a second) and, as Stephenson repeatedly pointed out, does so at a low price.

"Now everyone is saying, 'Hey, we need real-time voice AI robots that can perceive what's being said, that can understand it, generate a response, and communicate it by voice,'" he said. In his view, a combination of accuracy (which he described as at stake for a service like this), low latency, and acceptable costs is needed to make a product like this worthwhile for businesses, especially when combined with the relatively high of accessing LLMs.

Deepgram maintains that Aura's pricing currently beats virtually all of its competitors at $0.015 per 1,000 characters. That's not far off what Google charges for its WaveNet voices ($0.016 per 1,000 characters) or Amazon Polly's Neural voices (also $0.016 per 1,000 characters), but it is, granted, cheaper. Amazon's highest tier is significantly more expensive.
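The per-character gap is small, but it adds up at volume. A quick back-of-the-envelope comparison using the figures quoted above:

```python
# Back-of-the-envelope cost comparison using the per-character prices cited above.
# Prices are USD per 1,000 characters; the volume figure is just an example.
PRICES_PER_1K = {
    "Deepgram Aura": 0.015,
    "Google WaveNet": 0.016,
    "Amazon Polly Neural": 0.016,
}

characters = 1_000_000  # e.g. one million characters of synthesized agent replies
for name, price in PRICES_PER_1K.items():
    cost = characters / 1_000 * price
    print(f"{name}: ${cost:.2f} per {characters:,} characters")
# -> Deepgram Aura: $15.00, Google WaveNet: $16.00, Amazon Polly Neural: $16.00
```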

"You have to hit a really good price across all segments, but you also have to have amazing latencies and speeds, and amazing accuracy as well. So it's a really difficult thing to do," Stephenson said of Deepgram's overall approach to developing its product. "But this is what we focused on from the beginning, and this is why we built for four years before launching anything, because we were building the underlying infrastructure to make it happen."

Aura offers around a dozen voice models, all of which were trained on a dataset Deepgram created together with voice actors. Like all of the company's other models, the Aura model was trained in-house.

Having tested the model, the occasional odd pronunciation aside, what really stands out is the speed, on top of Deepgram's existing high-quality speech-to-text model. To highlight how quickly it generates responses, Deepgram points to how long the model takes to start speaking (typically less than 0.3 seconds) and how long the LLM takes to finish generating its response (typically just under a second).
