Noise-canceling headphones are widespread nowadays, but scientists have found a way to take these devices to the next level — by creating headphones that can focus on one external sound source and block out all other noises.
The technology, called "Target Speech Hearing," uses artificial intelligence (AI) to let the wearer face a speaker nearby and — after a delay of a couple of seconds — lock onto their voice. This lets the user hear only that specific audio source, retaining the signal even if the speaker moves around or turns away.
The technology comprises a small computer that can be embedded into a pair of commercial, off-the-shelf headphones, using signals from the headphones' built-in microphones to pick out and identify a speaker's voice. The scientists outlined the details in a paper published May 11 in the Proceedings of the CHI Conference on Human Factors in Computing Systems.
The scientists hope the technology could serve as an aid for people with impaired hearing, and they are now working to embed the system into commercial earbuds and hearing aids.
"We tend to think of AI now as web-based chatbots that answer questions," said study lead author, Shyam Gollakota, professor of Computer Science & Engineering at the University of Washington. "In this project, we develop AI to modify the auditory perception of anyone wearing headphones, given their preferences. With our devices you can now hear a single speaker clearly even if you are in a noisy environment with lots of other people talking," Gollakota said in a statement.
Target Speech Hearing (TSH) follows on from research into "semantic hearing" that the same scientists conducted last year. In that project, they created an AI-powered smartphone app, paired with headphones, that let the wearer pick sounds to hear from a list of preset "classes" while canceling out all other noises. For example, a wearer could choose to hear sirens, babies, speech or birds, and the headphones would single out only those sounds and block out all others.
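Conceptually, that earlier system behaves like a label-conditioned filter: the listener picks a class, and a neural network keeps only the audio that matches it. The sketch below is purely illustrative Python; the model call and its API are hypothetical stand-ins, not the researchers' actual code.

```python
# Illustrative sketch of class-conditioned filtering; `model` and its
# `target_class` argument are hypothetical placeholders, not the real API.
PRESET_CLASSES = ["sirens", "babies", "speech", "birds"]

def semantic_hearing(audio_frames, wanted, model):
    """Pass through only the sound matching the user-selected class."""
    assert wanted in PRESET_CLASSES
    for frame in audio_frames:
        # The model returns the portion of the frame matching `wanted`;
        # everything else is treated as noise and suppressed.
        yield model(frame, target_class=wanted)
```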
To use TSH, the wearer looks directly at the speaker whose voice they wish to hear, then taps a small button on the headphones to activate the system.
When the speaker's voice arrives at the microphones, the machine learning software "enrolls" the audio source. The system allows for a small margin of error, in case the listener isn't facing the speaker exactly head-on, before it identifies the target voice and registers its vocal patterns. This lets it lock onto the speaker regardless of how loud they are or which direction they're facing.
As the speaker continues talking, the system gets better at focusing on their voice, because the algorithm picks up the unique patterns of the target sound over time.
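Pulling those steps together, the pipeline described above (a short enrollment pass followed by continuous, voiceprint-conditioned extraction) might look roughly like the following. This is a hedged sketch in Python: the helper models `embed_model` and `sep_model`, the enrollment length and the update rate are all assumptions for illustration, not details from the paper.

```python
import numpy as np

ENROLL_SECONDS = 3  # the article describes a delay of a couple of seconds

def enroll(binaural_frames, embed_model):
    """Enrollment: while the wearer faces the speaker, the target voice
    reaches both ears nearly simultaneously. Average an embedding over a
    few seconds of those aligned frames to get a single 'voiceprint'."""
    embeddings = [embed_model(frame) for frame in binaural_frames]
    return np.mean(embeddings, axis=0)

def listen(stream, voiceprint, sep_model, alpha=0.05):
    """Steady state: condition a separation network on the enrolled
    voiceprint, and keep refining it as the speaker talks, which is the
    'improves over time' behavior described above."""
    for frame in stream:
        clean, frame_emb = sep_model(frame, voiceprint)
        # Exponential moving average: each new frame nudges the voiceprint,
        # so tracking holds even as the speaker moves or turns away.
        voiceprint = (1 - alpha) * voiceprint + alpha * frame_emb
        yield clean  # play back only the target speaker's audio
```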
For now, TSH can enroll only one audio source, meaning a single speaker, at a time, and it's less successful if another sound of similar volume is coming from the same direction as the target.
In an ideal world, the scientists would present the system with a "clean" audio sample to identify and enroll, free of any environmental noise that could interfere with the process, they said in the paper. But demanding a clean sample would be at odds with building a practical device, because clear recordings are hard to come by in real-world conditions.