New research from ADAPT offers fresh insight into how state-of-the-art artificial intelligence systems process incomplete or disrupted speech, shedding light on the internal workings of modern automatic speech recognition (ASR) models. The paper, published in Computer Speech & Language, is titled “Under the hood: Phonemic Restoration in transformer-based automatic speech recognition”. It was authored by ADAPT researchers Iona Gessinger and Erfan A. Shams, with supervision from Professor Julie Carson-Berndsen, and was supported by Taighde Éireann – Research Ireland through the ADAPT Centre.
In the paper, the researchers examine how two leading transformer-based ASR models, wav2vec 2.0 and Whisper, respond when parts of spoken language are obscured or missing. While ASR systems are typically evaluated using output-level measures such as word error rate, far less is known about how these models internally process degraded speech signals.
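For readers unfamiliar with output-level evaluation, word error rate simply counts the word-level edits needed to turn a system's transcript into the reference. The snippet below is an illustrative sketch using the jiwer package; the paper does not state which tooling the authors used, and the example sentences are invented.

```python
# Illustrative word-error-rate computation (jiwer is an assumption, not the authors' tool).
from jiwer import wer

reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over a lazy dog"

# WER = (substitutions + deletions + insertions) / words in the reference
print(f"WER: {wer(reference, hypothesis):.2f}")
```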
Inspired by the Phonemic Restoration Effect, a well-documented phenomenon in human speech perception where listeners “hear” missing sounds based on context, the researchers investigated whether similar restorative processes occur inside AI systems. Using domain-informed probing techniques, the team analysed speech embeddings across multiple transformer layers, examining how well the models encoded key articulatory features such as place of articulation, manner of articulation and voicing.
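To give a sense of what layer-wise probing of this kind can look like in practice, the sketch below extracts hidden states from every transformer layer of a wav2vec 2.0 model and fits a simple linear probe for one articulatory feature. It is a minimal illustration only, assuming frame-level articulatory labels from a forced alignment step that is not shown; the model checkpoint, libraries (Hugging Face transformers, scikit-learn) and function names are the author's assumptions, not the setup described in the paper.

```python
# Minimal layer-wise probing sketch (illustrative; not the paper's released code).
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

MODEL_NAME = "facebook/wav2vec2-base-960h"  # hypothetical choice of checkpoint
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_NAME)
model = Wav2Vec2Model.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def layer_embeddings(waveform, sampling_rate=16000):
    """Return per-frame embeddings from every transformer layer for one utterance."""
    inputs = feature_extractor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Tuple of (num_layers + 1) tensors, each of shape (1, frames, hidden_dim)
    return [h.squeeze(0) for h in outputs.hidden_states]

def probe_layer(frame_vectors, frame_labels):
    """Fit a linear probe predicting an articulatory feature (e.g. voicing)
    from frame embeddings and report mean cross-validated accuracy."""
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, frame_vectors, frame_labels, cv=5).mean()
```

Comparing the probe's accuracy layer by layer, and between intact and disrupted input, is one way to ask where in the network information about features such as voicing is retained or restored.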
The study revealed clear differences in how the two ASR models handle disrupted speech. The findings demonstrate that although modern ASR systems show impressive robustness, their internal processing differs in important ways from human listeners. The full paper is available online.