Google speech recognition is now much faster and more accurate, thanks to a series of updates implemented by the company’s Speech Team.
The upgrade was announced on September 24 on the Google Research Blog, in a post detailing the technology behind the improved functionality. The new system allows the app to capture what human voices say with higher fidelity, which is especially noticeable in environments with high levels of background noise.
The improved acoustic performance was made possible by adopting Recurrent Neural Networks (RNNs), which use artificial intelligence to capture whole words instead of just disparate fragments of sound.
RNNs are an extension of Deep Neural Networks (DNNs), which Google has used since 2012, when they replaced the old standard, the Gaussian Mixture Model (GMM).
DNNs were a major upgrade over the 30-year-old technology they replaced, because they increased the accuracy of speech recognition and made such software more reliable.
Thanks to “feedback loops,” the new and improved RNNs can identify relationships between individual phonemes and group them together. This is done by employing “Long Short-Term Memory” (LSTM) cells, which store information better than prior systems.
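The blog post does not include code, but the gating mechanism behind an LSTM cell can be sketched in a few lines of NumPy. This is a toy illustration only: the dimensions, weights, and variable names are invented, and real speech models are far larger and trained on audio features.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step: gates decide what to forget, what to store,
    and what to emit. W, U, b pack the four gates' parameters."""
    z = W @ x + U @ h_prev + b           # all four gates at once
    n = h_prev.shape[0]
    i = sigmoid(z[0:n])                  # input gate
    f = sigmoid(z[n:2 * n])              # forget gate
    g = np.tanh(z[2 * n:3 * n])          # candidate cell update
    o = sigmoid(z[3 * n:4 * n])          # output gate
    c = f * c_prev + i * g               # cell state: the "long-term memory"
    h = o * np.tanh(c)                   # hidden state, fed back next step
    return h, c

# Run a toy sequence of five frames through the cell. Feeding h and c
# back in at each step is the "feedback loop" that lets the network
# relate phonemes across time.
rng = np.random.default_rng(0)
hidden, inp = 4, 3
W = rng.normal(size=(4 * hidden, inp)) * 0.1
U = rng.normal(size=(4 * hidden, hidden)) * 0.1
b = np.zeros(4 * hidden)

h = np.zeros(hidden)
c = np.zeros(hidden)
for t in range(5):
    x = rng.normal(size=inp)             # stand-in for one audio frame
    h, c = lstm_step(x, h, c, W, U, b)
```

The key design point is the cell state `c`: because it is updated additively rather than overwritten, information can persist across many frames, which is what lets the model group phonemes that are spread out in time.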
Google also uses “Connectionist Temporal Classification,” a significant refinement over the older voice search system, which made predictions after every single 10-millisecond snippet of sound. Instead, the new technique uses its improved memory to listen further ahead and recognize audio as a sequence.
This is achieved by taking temporal dependencies into account: each phoneme is recognized based on the sounds that precede and follow it. As a result, the app uses fewer computational resources (less computation time and memory), because it analyzes larger chunks of sound.
Experts have also trained the recognizer on audio with added artificial background noise and reverberation, so that it performs better in noisy environments.
Overall, this system allows the app to respond more accurately to voice commands and to recognize patterns of speech more reliably.
The model is also much more robust and much faster in its responses. Initially, engineers found a delay of approximately 300 milliseconds, as the app listened ahead in order to predict the rest of the speech, but that has since been corrected, and users now get feedback in real time.
So far, the tweak has been rolled out on Android phones for Google’s search app and voice dictation, and on iOS for voice searches and commands. The acoustic models are also expected to be extended to computers running the Chrome browser.
Google’s rivals have also made significant progress in the realm of voice search. Apple’s Siri can now be used on Apple TV in lieu of a remote control, Microsoft’s Cortana digital assistant can answer questions and display search results, and Amazon’s Echo follows user commands and enables home automation.
Image Source: Flickr