Voice AI is becoming ubiquitous and powerful. Forecasts suggest voice commerce will be an $80 billion business by 2023. But speech recognition carries significant race and gender biases: as with facial recognition, web search, and even soap dispensers, it is another form of AI that performs worse for women and nonwhite people.
Here’s a thought experiment. Consider three Americans who all speak English as a first language. Say my friend Josh and I both use Google speech recognition: he might get 92% accuracy while I get 79%, even though we’re both white. If we read the same paragraph aloud, he would need to fix about 8% of the transcription and I would need to fix about 21%. My mixed-race female friend Jada is likely to get accuracy roughly 10 percentage points lower than mine.
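For readers who want to see how that arithmetic works, here is a minimal Python sketch of word-level accuracy, computed as one minus the word error rate (WER). The sentences and the `word_error_rate` function are illustrative, not drawn from any real benchmark:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (Levenshtein) divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# Illustrative transcript pair: two of nine words are wrong.
ref = "the quick brown fox jumps over the lazy dog"
hyp = "the quick brown fox jump over a lazy dog"
wer = word_error_rate(ref, hyp)
print(f"WER = {wer:.2f}, accuracy = {1 - wer:.2f}")  # WER = 0.22, accuracy = 0.78
```

At 79% accuracy, roughly 21 of every 100 transcribed words need fixing, which is the correction burden described above.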
Dialects also affect accuracy: Indian English, for example, yields 78% accuracy, while Scottish English yields only 53%. Amazon and Google teams are working to improve those numbers, but the problem has not yet been solved.
These disparities stem from how we structure our data analysis, databases, and machine learning. The likely underlying cause is that training databases contain plenty of white male voice data and far less data from female and minority speakers.
AI is therefore set up to fail. Machine learning is a technique for finding patterns in data. When you use speech recognition, the system is effectively answering the question “Which words best map onto this audio, given the patterns in the training data?” If that training data consists mostly of white male voices, the system will perform worse on input it has seen infrequently, such as female and other more diverse voices.
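To see this failure mode in miniature, here is a toy sketch. Everything in it is invented for illustration: synthetic numbers stand in for voice features, the `make_group` helper is hypothetical, and scikit-learn’s `LogisticRegression` stands in for a real acoustic model. A classifier trained on data dominated by one group scores well on that group and poorly on the group it rarely saw:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_group(n, shift, rng):
    """Synthetic 'voice features': each group's data sits in a shifted region,
    so the two groups follow different feature distributions."""
    X = rng.normal(loc=shift, scale=1.0, size=(n, 2))
    y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=n) > 2 * shift).astype(int)
    return X, y

# The majority group supplies 95% of the training data; the other group is rare.
Xa, ya = make_group(950, shift=0.0, rng=rng)
Xb, yb = make_group(50, shift=3.0, rng=rng)
model = LogisticRegression().fit(np.vstack([Xa, Xb]), np.concatenate([ya, yb]))

# Evaluate on fresh samples from each group.
for name, shift in [("majority group", 0.0), ("underrepresented group", 3.0)]:
    Xt, yt = make_group(1000, shift=shift, rng=rng)
    print(f"{name}: accuracy = {model.score(Xt, yt):.2f}")
```

The model fits the majority group’s pattern; the underrepresented group’s different distribution is effectively treated as noise, so its accuracy lands far lower. That is the mechanism behind the gaps described above.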
This is absolutely a matter of social injustice. And companies should be aware that the accuracy of speech recognition also affects customer purchasing decisions.
What can companies do? Be more transparent about your speech recognition accuracy statistics, and encourage competition in this area. Remember that women and minorities have huge purchasing power; why wouldn’t you want to solve this problem?

