Naomi Harte, Professor of Speech Technology in the School of Engineering at Trinity College Dublin, and Co-PI and founding member of the ADAPT SFI Centre in Ireland, delivered a talk this week to the Samsung AI Centre in Cambridge titled ‘Multimodal Speech – embracing a wider view of speech’.
The talk considered the multimodal nature of speech and of speech technology. Human speech communication is extremely rich: people draw on many elements, from words to gestures and eye gaze, and seamlessly interpret these cues in conversation. But how can this richness be exploited in technology? In her talk, Professor Harte examined how visual and linguistic information can be integrated into deep learning frameworks for audio-visual speech recognition and turn-taking prediction. She also explored how conversational interaction online can be challenging because it disrupts the cues we usually rely on, and considered whether multimodal approaches can help in such situations.
The Samsung AI Centre in Cambridge is one of seven Samsung AI centres around the world. It performs world-class blue-sky AI research with an open and collaborative approach to science, publishing its results in top scientific venues and releasing open-source code and datasets to facilitate engagement with the wider academic community. The centre’s current scientific interests are diverse and include video understanding, AutoML, action recognition, neuro-symbolic models, meta-learning, domain adaptation, on-device AI, unsupervised and self-supervised learning, efficient reasoning, speech recognition and audio modelling, and federated learning.