
The voice imitation game. Research shows the risks of voice authentication

August 5, 2017

Most of us have seen the Terminator movies at least once. To be honest, the voice imitation scenes are creepy every time. But we are getting closer to that technology than some might think. University of Alabama at Birmingham (UAB) researchers have found that both the automated and the human verification used in voice-based authentication systems can be fooled. Voice impersonation attacks open the door to a whole new class of threats: ones that defeat even human verification. This is a problem that could affect even government organizations, since voice biometrics have become increasingly popular for building access control.

UAB's discovery was presented at the 20th European Symposium on Research in Computer Security (ESORICS), held in Vienna on September 21-25, 2015, before an audience of researchers, system developers, and other participants concerned with the protection of information. The aim of the symposium is to bring together researchers in the security field to exchange ideas and, ultimately, advance the state of data protection.

Voice biometrics have emerged as a simple and efficient way to secure information at a time when identity theft is on the rise. In principle, they are a convenient and safe authentication method that, combined with a password, acts as an added layer of security. Voice biometrics have become quite popular among banks and government organizations because they provide a simple and, until now, seemingly safe way of controlling access. The whole concept rests on the assumption that each person has a unique voice, shaped by the physiology of the vocal cords and of the body as a whole, so there should be an endless number of unique combinations. The UAB research shows that this system, too, is vulnerable: once your voice is hacked, the damage an attacker can do is almost limitless, since he essentially holds a copy of your voice.
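To make the concept concrete, here is a minimal sketch of what a speaker-verification check could look like under the hood: reduce a recording to a spectral fingerprint and compare it with the one captured at enrollment. This is a toy illustration, not the algorithm of any real biometric product; the file names and the threshold are assumptions, and production systems use far more robust models.

# Toy speaker-verification sketch (Python, using librosa for audio features).
# Illustrative only; real systems use much more sophisticated voice models.
import numpy as np
import librosa

def voiceprint(path, n_mfcc=20):
    # Reduce a recording to a fixed-size vector: the mean of its MFCC frames.
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

def verify(enrolled_wav, attempt_wav, threshold=0.95):
    # Accept the attempt if its voiceprint is similar enough (cosine
    # similarity) to the enrolled one. The threshold is a placeholder.
    a, b = voiceprint(enrolled_wav), voiceprint(attempt_wav)
    similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return similarity >= threshold

# "enrolled.wav" and "attempt.wav" are placeholder file names.
print("accepted" if verify("enrolled.wav", "attempt.wav") else "rejected")

The attack described below works precisely because a morphed recording can push such a similarity score past the acceptance threshold just as the genuine speaker would.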

How a voice impersonation attack works

1. The attacker collects a voice sample from the target. The sample can be gathered either online or offline; the goal is simply to obtain a recording of the target's voice.

2. It's voice-morphing time. Using voice-morphing software, the attacker builds a model of the victim's speech patterns. Voice morphing, or voice transformation, is a software process that alters a voice. It has had many uses over the years, from witness protection to simply adding fun effects to a voice. The key point is that a voice can not only be altered but also replicated to a degree, and voice-morphing tools have become increasingly accessible.

3. The resulting model is used to say anything in the victim's voice. An attacker can carry on an entire conversation in the hacked voice, not to mention use it to speak passwords and other authentication phrases.

The researchers at UAB developed a voice impersonation attack using an off-the-shelf voice-morphing tool.
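The three steps above amount to a short pipeline. The sketch below shows its shape in Python; MorphTool and everything in it are hypothetical stand-ins for an off-the-shelf voice-conversion tool (the article does not name the specific tool or its interface), and the file names are placeholders.

# Sketch of the three-step attack pipeline described above. `MorphTool` is
# a hypothetical stand-in for an off-the-shelf voice-conversion tool: the
# method bodies are placeholders, but the flow matches the three steps.
class MorphTool:
    def __init__(self):
        self.model = None

    def train(self, source_dir, target_samples):
        # Step 2: a real tool would fit a mapping from the attacker's
        # speech (recordings under source_dir) onto the target's speech
        # patterns learned from target_samples.
        self.model = (source_dir, target_samples)  # placeholder model

    def synthesize(self, text):
        # Step 3: a real tool would render `text` as audio in the
        # target's voice; empty bytes stand in for that audio here.
        return b""

# Step 1: collect recordings of the target's voice, e.g. scraped from
# online videos or captured in public (placeholder file names).
target_samples = ["talk_clip1.wav", "talk_clip2.wav", "voicemail.wav"]

tool = MorphTool()
tool.train("attacker_recordings/", target_samples)

# Make the model say anything in the target's voice, such as a spoken
# passphrase for a voice-authentication system.
forged_audio = tool.synthesize("My voice is my password.")

Note that nothing in this pipeline requires specialist hardware; the morphing engine itself is off the shelf, which is what makes the attack practical.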

“Because people rely on the use of their voices all the time, it becomes a comfortable practice,” said Nitesh Saxena, Ph.D., the director of the Security and Privacy In Emerging computing and networking Systems (SPIES) lab and associate professor of computer and information sciences at UAB. “What they may not realize is that level of comfort lends itself to making the voice a vulnerable commodity. People often leave traces of their voices in many different scenarios. They may talk out loud while socializing in restaurants, giving public presentations or making phone calls, or leave voice samples online.”

If you think about it, there is no real way to protect your voice. It is how we communicate across a variety of channels – in person, over the phone, or online – and every one of them can provide the means for a recording that an attacker can use.

Oprah Winfrey and Morgan Freeman helped with the demonstration

The research has shown that with just a handful of samples, an attacker can build a model of the victim's voice close enough to fool most voice-verification algorithms.

“As a result, just a few minutes’ worth of audio in a victim’s voice would lead to the cloning of the victim’s voice itself,” Saxena said. “The consequences of such a clone can be grave. Because voice is a characteristic unique to each person, it forms the basis of the authentication of the person, giving the attacker the keys to that person’s privacy.”

The paper's case study explored the consequences of stealing voices in two important applications and contexts that involve voice authentication. The first is a voice-biometrics, or speaker-verification, system that uses the potentially unique features of an individual's voice to authenticate that individual. The second looked at the impact of stolen voices used to imitate humans in conversation. The voices imitated by the tool belonged to two famous celebrities, Oprah Winfrey and Morgan Freeman.

The conclusion of the case study is that an attacker can make the morphing system say anything in the victim’s voice. A voice impersonation attack can harm security, reputations, and even the safety of the victim’s friends and family.

“Our research showed that voice conversion poses a serious threat, and our attacks can be successful for a majority of cases,” Saxena said. “Worryingly, the attacks against human-based speaker verification may become more effective in the future because voice conversion/synthesis quality will continue to improve, while it can be safely said that human ability will likely not.”

The only reliable way to secure voice authentication would be a technology that can detect the live presence of the speaker. Until such technology is available, we should perhaps trade some convenience for safety and be more careful about the voice recordings and videos we post online.