The views of this article are the perspective of the author and may not be reflective of Confessions of the Professions.
Automatic Speech Recognition Devices
We are probably quite a few years away from everyone having their own personal robot. However, automatic speech recognition (ASR) and interactive voice response (IVR) already make it much easier to ask our mobile devices for information.
Think about how much Siri and other software platforms are used by our society today. West Interactive has put together a very cool graphic and comprehensive guide to how your devices learn to talk to you using automatic speech recognition. The guide covers the basics of how ASR works, shows examples of ASR in action, and explains how voice recognition learns from you.
This infographic shows in detail how voice-recognition and device-learning technology works.
How Your Devices Learn To Talk To You
The Basics: How automatic speech recognition (ASR) works
- You speak into your device.
- The device creates a waveform from the sound.
- Background noise is reduced and the volume is normalized.
- This filtered waveform is broken into phonemes.
- Each phoneme is like a link in a chain. Based on the first phoneme, statistical analysis is used to find the most likely phonemes to follow.
- Words are connected in the same way.
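The "link in a chain" idea can be sketched as a simple table of which phoneme tends to follow which. This is a minimal illustration, not how any particular ASR product works: the training words, the phoneme spellings, and the function names below are all made up for the example.

```python
from collections import Counter, defaultdict

# Hypothetical training data: words written as phoneme sequences.
TRAINING = [
    ["k", "ae", "t"],       # cat
    ["k", "ae", "p"],       # cap
    ["k", "ae", "t", "s"],  # cats
    ["b", "ae", "t"],       # bat
]

def build_chain(sequences):
    """Count how often each phoneme is followed by each other phoneme."""
    followers = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            followers[current][nxt] += 1
    return followers

def most_likely_next(followers, phoneme, top=2):
    """Return the most likely following phonemes with their probabilities."""
    counts = followers.get(phoneme, Counter())
    total = sum(counts.values())
    return [(p, n / total) for p, n in counts.most_common(top)]

chain = build_chain(TRAINING)
print(most_likely_next(chain, "ae"))  # [('t', 0.75), ('p', 0.25)]
```

The same counting approach applies one level up: replace phonemes with words and the table tells you which word is most likely to follow another.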
Examples of ASR: Putting it all to Work
A directed dialog conversation
– Which of the following would you like to do? You can say ‘check my balance’, ‘make a payment’, ‘change my account’, or ‘more options’.
– Check my balance
– Your balance is…
A natural language conversation
– How can I help you?
– What’s the weather forecast?
– 72 and sunny
ASR can be used in different ways. Two examples you’ve probably experienced are directed dialog and natural language. Directed dialog ASR systems ask you to choose from a set ‘menu’ of choices. Natural language systems can have more open-ended ‘conversations’ with you.
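A directed dialog system can be pictured as a lookup against a fixed menu. The sketch below assumes the speech has already been turned into text; the action names on the right-hand side of the menu are hypothetical.

```python
# Menu phrases taken from the example dialog; action names are made up.
MENU = {
    "check my balance": "balance",
    "make a payment": "payment",
    "change my account": "account",
    "more options": "more",
}

def directed_dialog(transcript):
    """Return the menu action for a recognized phrase, or None to re-prompt."""
    normalized = " ".join(transcript.lower().split())
    return MENU.get(normalized)

print(directed_dialog("Check my balance"))    # balance
print(directed_dialog("What's the weather"))  # None
```

Anything outside the menu returns None, which is the point at which a real system would repeat the list of choices.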
So How Does Natural Language Work?
– Does he mean ‘weather’ or ‘whether’?
If you say three words to a program with a 60,000 word vocabulary (which is common), there are a possible 216 trillion combinations for it to process.
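The 216 trillion figure is just the vocabulary size multiplied by itself once per word spoken, which is quick to check:

```python
vocabulary_size = 60_000
words_spoken = 3

# Each of the three positions could be any word in the vocabulary.
combinations = vocabulary_size ** words_spoken
print(f"{combinations:,}")  # 216,000,000,000,000
```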
Context clues from other words can help narrow down which word is being used. The word ‘forecast’ helps the program determine that the word ‘weather’ is being used, instead of the word ‘whether’.
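One simple way to picture context clues is to give each sound-alike word a set of words it usually appears with, then pick the candidate that shares the most words with the rest of the sentence. The clue lists below are invented for illustration; real systems learn these associations from large amounts of data.

```python
# Hypothetical clue words for each homophone.
CONTEXT_CLUES = {
    "weather": {"forecast", "sunny", "rain", "temperature"},
    "whether": {"or", "not", "decide", "know"},
}

def pick_homophone(candidates, other_words):
    """Choose the candidate whose clue words overlap most with the other words heard."""
    heard = {w.lower() for w in other_words}
    return max(candidates, key=lambda c: len(CONTEXT_CLUES.get(c, set()) & heard))

print(pick_homophone(["weather", "whether"], ["what's", "the", "forecast"]))  # weather
```

If no clue words are heard at all, this sketch simply falls back to the first candidate in the list.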
The larger a program’s vocabulary, the more training it needs to work at the same speed as a program with a smaller vocabulary.
Rather than searching the entire vocabulary for each word and processing them separately, natural language systems react to certain ‘tagged’ words and phrases like “weather forecast”, “check my balance” or “pay my bill”.
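Reacting to tagged phrases can be sketched as scanning the recognized text for a short list of known phrases instead of analyzing every word. The phrases come from the text above; the intent names and the plain substring match are assumptions made for this example.

```python
# Tagged phrases from the article; intent names are hypothetical.
TAGGED_PHRASES = {
    "weather forecast": "get_forecast",
    "check my balance": "get_balance",
    "pay my bill": "pay_bill",
}

def find_intent(utterance):
    """Return the intent of the first tagged phrase found in the utterance."""
    text = " ".join(utterance.lower().split())
    for phrase, intent in TAGGED_PHRASES.items():
        # Simple substring match; a real system would match on word boundaries.
        if phrase in text:
            return intent
    return None

print(find_intent("What's the weather forecast for today?"))  # get_forecast
print(find_intent("I'd like to pay my bill please"))          # pay_bill
```

The rest of the sentence is ignored, which is why the system can respond without working through all the possible word combinations.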
The Tuning Test: How Voice Recognition ‘Learns’ From You
– I always learn so much from our conversations.
ASR systems can either be ‘tuned’ by humans, or they can learn on the fly through ‘active learning’.
Programmers and linguists can manually review logs to identify opportunities to teach the system words and phrases that are becoming more commonly used. Think of it as adding words and phrases to the dictionary. This process is called ‘tuning’.
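Part of that log review can be pictured as counting the words the system did not know and surfacing the ones that keep coming up. This is only a sketch: the log format (a plain list of transcribed utterances), the function name, and the minimum count are all assumptions.

```python
from collections import Counter

def tuning_candidates(log_utterances, vocabulary, min_count=2):
    """List unknown words seen at least min_count times, most frequent first."""
    counts = Counter(
        word
        for utterance in log_utterances
        for word in utterance.lower().split()
        if word not in vocabulary
    )
    return [(w, n) for w, n in counts.most_common() if n >= min_count]

# Made-up log and vocabulary for illustration.
logs = ["pay my bill", "venmo my friend", "venmo my rent", "check balance"]
vocab = {"pay", "my", "bill", "check", "balance", "friend", "rent"}
print(tuning_candidates(logs, vocab))  # [('venmo', 2)]
```

A person would still decide which of the candidates deserve a place in the dictionary.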
Active learning works much like your phone’s auto-correct: data is stored from past interactions as the program gets to know the words, and combinations of words, you use most often. Likewise, if an auto-correct suggestion is repeatedly rejected, the system will begin to treat the ‘incorrect’ word as a word in its own right.
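The "repeatedly rejected" rule can be sketched as a counter with a cutoff. The class name and the threshold of three rejections are assumptions chosen for the example; the article does not say how many rejections a real system requires.

```python
from collections import Counter

class AdaptiveVocabulary:
    """Toy vocabulary that accepts a word after enough rejected corrections."""

    def __init__(self, known_words, threshold=3):
        self.known = set(known_words)
        self.rejections = Counter()
        self.threshold = threshold  # assumed value, not from the article

    def reject_correction(self, word):
        """Record that the user refused a correction for this word."""
        if word in self.known:
            return
        self.rejections[word] += 1
        if self.rejections[word] >= self.threshold:
            self.known.add(word)

    def needs_correction(self, word):
        return word not in self.known

vocab_model = AdaptiveVocabulary({"hello", "weather"})
for _ in range(3):
    vocab_model.reject_correction("yeet")
print(vocab_model.needs_correction("yeet"))  # False
```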
Presented by west®