Posted: 18/06/18
Professor Naomi Harte, Associate Professor in Digital Media Systems, Electronic & Electrical Engineering at Trinity College Dublin and a member of the Trinity-based ADAPT Centre, became the only Irish recipient of the prestigious 2017 Google Faculty Research Award.
Google presents this highly competitive research grant to world-class faculty at the forefront of computer science, engineering, and related fields. According to the dedicated website, on average only 15 percent of applicants are selected for funding.
Professor Harte, a specialist in Human Speech Communication, received the award in the Speech category for her proposal to revise the current methodologies for evaluating the quality of synthetic voices generated by Text-To-Speech (TTS) systems.
TTS systems are utilized in a variety of applications, including personal assistant apps and navigation tools, to generate a speech signal or a vocal conversion of a given text. According to Professor Harte, two main methodologies to create the speech signal have evolved in the field: the unit selection paradigm, and the parametric speech synthesis paradigm.
Harte remarks that regardless of the paradigm adopted, appropriate evaluation methodologies have failed to evolve in parallel to the progress made in TTS speech output, leaving speech synthesis researchers uncertain of how to evaluate the quality of the synthetic voice.
Speaking about the award, Professor Harte said, “10 years ago, this evaluation was more focused on how well you could understand the voice – the intelligibility issue. Now all these synthetic voices are highly intelligible, but what separates them is how human-like, how expressive, how friendly, or how trustworthy people can perceive them to be.”
Currently, there is no commonly agreed strategy for assessing the quality of a TTS system's speech output, and the field relies unsustainably on human listener tests.
“We need new ways to automatically assess how ‘good’ or ‘convincing’ a synthetic voice is,” Harte said.
Her project aims to create new methods to evaluate synthetic speech in a manner that better reflects how people may perceive that voice without the need for large-scale human listener tests. Her team will explore ways to model a voice that allows researchers to investigate the different impressions it might give to listeners, such as its likeability, trustworthiness, expressiveness, or just how human it sounds.
“This helps move current evaluation approaches beyond measures of intelligibility, and will support the faster development of voices that match applications,” Harte said.