What will it take to make AI sound human?

Conversation fillers such as "hmm" and "uh-huh" may seem like insignificant parts of human conversation, but they're critical to improving communication between humans and artificial intelligence.

So argues Alan Black, a professor in the Language Technologies Institute at the Carnegie Mellon School of Computer Science, who specializes in speech synthesis and ways to make artificially intelligent speech sound more real.

Both Siri and Cortana incorporate aspects of Black's work, he says. But for the most part, such technologies still boil down to a pretty simple pattern: The human speaks, then the machine processes that speech and answers.

"It's not really how humans interact," Black said in an interview on Friday. "It's a stilted kind of interaction."

Key to making such conversations more natural are pauses, fillers, laughs and the ability of speakers to anticipate and complete each other's sentences -- all of which help build rapport and trust.

"Laughing is part of communication," he said. "Machines don't do that -- if they did, it would be unbelievably creepy -- but ultimately they should."

Black and his students are working on those areas.

"You need mm-hmm, back channels, hesitations and fillers, and so far our speech synthesizers can't do that," Black said. "If a system does say 'uh-huh,' it sounds like a robot."

Technologies using synthetic voices typically use speech recorded by humans "in a little room reading sentences," he explained. That, in turn, is "why they sound bored."

Working with students, Black is experimenting with using voices recorded in dialog, so that even if you just capture and use one side, it's clear the speakers are engaged. The idea is to model and incorporate the variance in human responses rather than using the same response all the time -- otherwise, humans can tell it's fake, Black said.

Ultimately, good AI will also know your views on certain topics, such as which candidate you support or oppose in a political race, so it won't say something offensive.

"On a higher level, it's a matter of being personalized," Black said. "That can be creepy, but it can also be appropriate, and it's important for trust. It's all about building this thing that's close to what humans expect and makes it easier to have this conversation."

Looking ahead, another big issue is how to get people to learn to do new things with their devices. There's basic interaction happening now with technologies like Siri and Cortana, but the next challenge is to get users to turn to AI first for answers, Black said.

Some users are embarrassed to talk to their phones but are more comfortable talking to Amazon Echo, because all they have to do is speak out loud in their homes. "People are treating it differently," he said. "It's there in the room with you."

IDG News Service
