Security vulnerabilities of voice recognition technologies

December 7, 2015 by Daniel Dimov

An increasing number of devices nowadays use voice recognition technologies for performing commands, allowing access to stored information, and transcribing audio into written text. For example, voice recognition technologies are implemented in automated phone systems, car entertainment systems, and the operating systems "Windows" and "iOS".

However, since voice recognition technologies are in their infancy, they may have a large number of security vulnerabilities that hackers can use to gain unauthorized access to sensitive information. One of the major weaknesses of devices relying on voice authentication is that voice information can be accessed more easily than other biometric information, such as fingerprints. For example, while one needs to be in physical proximity to objects touched by a person in order to obtain that person's fingerprints, this may not be necessary for obtaining voice information. Nowadays, many people make their voice freely available through multiple online channels, e.g., YouTube, Snapchat, and Facebook.

This article discusses the state of the art of voice recognition technologies (Section 2) and examines a number of security vulnerabilities in devices using such technologies (Section 3). Next, it provides recommendations on how to prevent attacks on voice recognition devices (Section 4). Finally, a conclusion is drawn (Section 5).

State of the art of voice recognition technologies

Voice recognition technologies are hardware devices and accompanying software that are capable of decoding the human voice in order to perform various functions (e.g., transcribing voice to text, executing software applications, and verifying the identity of an individual).

The first automatic voice recognition device was introduced in 1952. Although the machine was not computerized, it was capable of recognizing a human voice pronouncing single digits. The first prototype of a modern voice recognition system, Sphinx-II, was created in 1992 by Xuedong Huang, one of the founders of Microsoft's speech recognition group. The system was capable of performing real-time voice recognition and was suitable for use in modern software applications.

Along with the advance of other computer technologies, voice recognition systems became widespread in various fields, including healthcare, customer support, avionics, military, automotive safety, and telecommunications. For example, in the field of healthcare, voice recognition technologies allow the instant processing of voice into medical records. In the field of avionics, voice recognition technologies are used for training pilots and air traffic controllers. In the field of automotive safety, voice recognition technologies are used to enhance the safety of passengers because they allow the driver to make phone calls without using their hands. In the field of customer support, many companies have implemented automated menu systems in their service hotlines. A customer who calls such a hotline hears a recorded question asked by a computer. After the voice recognition software processes the customer's answer, the system redirects the customer to a suitable department.
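
The routing step of such an automated menu can be sketched in a few lines of Python. This is only an illustration, not the logic of any particular product: it assumes the speech recognizer has already turned the caller's answer into text, and the department names and keywords are hypothetical.

```python
# Hypothetical keyword table: department -> words that suggest it.
DEPARTMENT_KEYWORDS = {
    "billing": {"bill", "invoice", "payment", "charge"},
    "technical support": {"broken", "error", "internet", "password"},
    "sales": {"buy", "order", "upgrade", "price"},
}


def route_call(transcript: str, default: str = "operator") -> str:
    """Pick the department whose keywords best match the transcribed answer."""
    # Treat commas and periods as separators, then split into lowercase words.
    cleaned = transcript.lower().replace(",", " ").replace(".", " ")
    words = set(cleaned.split())

    best_department, best_hits = default, 0
    for department, keywords in DEPARTMENT_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best_department, best_hits = department, hits
    # Fall back to a human operator when no keyword was recognized.
    return best_department
```

Calling `route_call("My internet is broken")` selects the technical support department, while an answer with no known keyword falls through to the default operator.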

Vulnerabilities of voice recognition technologies

In this section, we examine the voice impersonation attack, a type of attack that can be used to hack most devices relying on voice authentication (Section 3.1). Afterwards, we discuss security vulnerabilities of two particular voice recognition applications, namely, "Siri" (Section 3.2) and "Google Now" (Section 3.3).

Voice impersonation attacks

A voice impersonation attack refers to the unauthorized activation and use of a device or software based on voice recognition technology by bypassing security mechanisms with cloned, recorded, or synthesized speech commands. A group of researchers from the University of Alabama at Birmingham has recently demonstrated that any automated user authentication system that uses voice recognition technologies is vulnerable to voice impersonation attacks. The researchers managed to penetrate automated and human verification systems by taking a sample of a user's voice and using it to gain unauthorized access to a device.

According to the researchers, a sample of a user's voice can be collected in various ways, including (1) making a spam call, (2) recording a person's voice while in physical proximity to the speaker, (3) mining for audiovisual clips online, and (4) compromising cloud servers that store audio information. Nitesh Saxena, Ph.D., the leader of this research, warns:

"Just a few minutes' worth of audio in a victim's voice would lead to the cloning of the victim's voice itself. The consequences of such a clone can be grave. Because voice is a characteristic unique to each person, it forms the basis of the authentication of the person, giving the attacker the keys to that person's privacy."

Voice impersonation attacks are an important issue today, when more and more devices at home are connected to a network and rely on voice authentication technology. A voice impersonation attack that takes control over one device can allow the attacker to gain unauthorized access to multiple devices.
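
Why a recording or clone can be enough is easiest to see with a deliberately simplified verifier. The sketch below is not the method used by the researchers or by any product. It assumes that each voice sample has already been reduced to a fixed-length feature vector (the numbers are invented) and that the system accepts any sample whose cosine similarity to the enrolled voiceprint reaches a threshold. A check of this kind has no way to tell a live speaker from a faithful recording.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify_speaker(enrolled, sample, threshold=0.95):
    """Accept the sample if it is close enough to the enrolled voiceprint."""
    return cosine_similarity(enrolled, sample) >= threshold


# Invented feature vectors, for illustration only.
enrolled_voiceprint = [0.61, 0.12, 0.85, 0.33]
live_sample = [0.60, 0.14, 0.83, 0.35]          # the real user speaking
replayed_recording = [0.62, 0.11, 0.86, 0.31]   # attacker replays a clip
other_speaker = [0.10, 0.90, 0.20, 0.70]        # a different person

for name, sample in [
    ("live user", live_sample),
    ("replayed recording", replayed_recording),
    ("other speaker", other_speaker),
]:
    print(name, verify_speaker(enrolled_voiceprint, sample))
```

With these made-up vectors, the live user and the replayed recording are both accepted, and only the different speaker is rejected.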

"Siri" security vulnerabilities

Many new smartphone users will find the voice recognition application "Siri" installed by default on their device. Apple created "Siri", a hands-free application designed for the iOS operating system, to provide its customers with a virtual assistant that can be controlled by voice. "Siri" is available on the iPhone 4s and later, 3rd generation and later iPads, and 5th generation and later iPods. The software can recognize a variety of voice commands, ranging from making restaurant reservations to providing information about the time of sunset in Paris.

Although "Siri" was created by a reputable company, it has faced security issues. In 2011, a hacker group based in China managed to jailbreak the iPhone 4 and run a full version of "Siri", which did not officially operate on this model of iPhone. The jailbroken version of "Siri" had security vulnerabilities that allowed the hackers to steal sensitive information from users who installed it on their devices.

In mid-2015, a group of researchers at the French information security organization ANSSI discovered that "Siri" can be remotely controlled by any person. Their article, which describes the security vulnerabilities of "Siri", claims that the researchers "exploited the principle of front-door coupling on smartphones headphone cables with specific electromagnetic waveforms". This method allowed the researchers to control the software by sending radio waves to a microphone attached to a mobile device that had "Siri" installed. Such radio signals activated "the home button" on the device, thus allowing "Siri" to accept commands. As a result, "Siri" reacted to voice commands sent by an unknown person and performed unauthorized activities, including calling premium-rate telephone numbers, sending messages, installing apps, or performing financial operations. It is worth noting that, according to the research, this hack works only on smartphones that have earphones with a microphone plugged in. A video demonstrating the attack performed by the ANSSI researchers is available online.

"Google Now" security vulnerabilities

"Google Now", an application for Android that is analogous to "Siri", is also designed to assist users without a keyboard or swipes on the screen. Voice commands activate the so-called "Google Now" cards, which provide practical information to the users of "Google Now". For example, the software can report on the intensity of commuter traffic, sports results, and popular places of interest nearby. "Google Now" voice commands may also instruct the mobile phone to perform certain tasks, e.g., opening files, taking photos, and turning on Wi-Fi.

A team of security researchers at AVG, a company providing anti-virus software, conducted an experiment that allowed them to hack "Google Now". During the experiment, the researchers created a game for the Android operating system. The game required users to recite voice commands that can be used to launch "Google Now". The game then used the recited voice commands to access "Google Now" and send messages to third parties. Hackers may use this method to hijack voice-controlled devices for financial gain. A video demonstrating the use of the malicious game is available online.

Prevention measures

In the coming years, as voice recognition technologies are likely to become more popular than they are now, it is of utmost importance to take reasonable security measures to protect users' privacy. Security researchers offer several measures that could help protect users against unauthorized access to their voice recognition devices. According to the researchers, the best defense would be a voice authentication technology that is resistant to voice impersonation attacks. Such a technology would be a significant contribution to the field of speech recognition. Moreover, the researchers who tested the vulnerabilities of "Siri" recommend better shielding on headphone cords. Furthermore, a stronger electromagnetic sensor could contribute to preventing a voice impersonation attack. Users of voice recognition software, such as "Siri" or "Google Now", can enhance their own security by unplugging headphones from their devices when they are not in use and by creating their own custom words for launching the software.
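
The custom launch words mentioned above can be sketched as follows. This is a minimal illustration with hypothetical function names. It assumes the assistant receives a text transcript of what was said and acts only when the transcript begins with a phrase the user chose, not a default that every attacker already knows. It does not, by itself, stop an attacker who has recorded the user saying that phrase.

```python
import re


def normalize(text: str) -> str:
    """Lowercase the text and drop punctuation and repeated spaces."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def extract_command(transcript: str, activation_phrase: str):
    """Return the command that follows the activation phrase, or None.

    None means the transcript did not start with the user's phrase,
    so the assistant should ignore it.
    """
    spoken = normalize(transcript)
    phrase = normalize(activation_phrase)
    if spoken == phrase:
        return ""  # phrase only, no command yet
    if spoken.startswith(phrase + " "):
        return spoken[len(phrase) + 1:]
    return None


# "purple lantern" stands in for a phrase chosen by the user.
print(extract_command("Purple lantern, send a message to Ann", "purple lantern"))
print(extract_command("Hey assistant, send a message to Ann", "purple lantern"))
```

The first call yields the command text, while the second returns `None` because the transcript does not begin with the user's own phrase.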


Conclusion

Although modern voice recognition technologies are evolving rapidly, the companies developing such technologies still face information security challenges. This article has shown that the virtual assistants and voice authentication systems currently in use are not sufficiently protected against hacker attacks, including voice impersonation attacks.

Hence, it can be concluded that the success of voice recognition technologies will depend on the extent to which such technologies protect the information of their users. For example, it is hard to believe that a person will agree to use a telephone banking system relying on voice recognition technology if a mere recording of that person's voice will allow hackers to send his or her money to a remote offshore jurisdiction.

Nevertheless, it should be noted that, despite their authentication problems, voice recognition technologies can be a well-suited supplement to other biometric methods, such as fingerprints, face recognition, and iris recognition. Authentication systems relying on the identification of several biometric characteristics are also known as multimodal biometric systems. Recent studies indicate that multimodal biometric systems are more secure than biometric systems relying on one biometric method.
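
One common way to combine several biometric methods is score-level fusion: each method produces a match score, and the scores are merged before the accept or reject decision is made. The sketch below is a minimal illustration of that idea. The weights, threshold, and scores are invented for the example and are not taken from any study or product.

```python
# Hypothetical weight for each biometric method; the weights sum to 1.
WEIGHTS = {"voice": 0.3, "fingerprint": 0.4, "face": 0.3}


def fused_score(scores):
    """Weighted sum of the per-method match scores (each between 0 and 1)."""
    return sum(WEIGHTS[method] * scores[method] for method in WEIGHTS)


def authenticate(scores, threshold=0.8):
    """Accept only if the combined evidence reaches the threshold."""
    return fused_score(scores) >= threshold


# Invented match scores, for illustration only.
genuine_user = {"voice": 0.92, "fingerprint": 0.95, "face": 0.90}
cloned_voice_only = {"voice": 0.97, "fingerprint": 0.10, "face": 0.15}

print(authenticate(genuine_user))       # all three methods agree
print(authenticate(cloned_voice_only))  # only the voice score is high
```

In this toy setup, an attacker who presents a convincing voice clone but cannot match the fingerprint or face is rejected, because the voice score alone cannot lift the fused score above the threshold.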

Rasa Juzenaite works as a project manager in an IT legal consultancy firm in Belgium. She has a Master's degree in cultural studies with a focus on digital humanities, social media, and digitization. She is interested in the cultural aspects of the current digital environment.

Daniel Dimov

Dr. Daniel Dimov is the founder of Dimov Internet Law Consulting, a legal consultancy based in Belgium. Daniel is a fellow of the Internet Corporation for Assigned Names and Numbers (ICANN) and the Internet Society (ISOC). He did traineeships with the European Commission (Brussels), European Digital Rights (Brussels), and the Institute for EU and International Law “T.M.C. Asser Institute” (The Hague). Daniel received a Ph.D. in law from the Center for Law in the Information Society at Leiden University, the Netherlands. He has a Master's Degree in European Law (The Netherlands), a Master's Degree in Bulgarian Law (Bulgaria), and a certificate in Public International Law from The Hague Academy of International Law.