Opinion piece by Professor Benjamin Cowan and Justin Edwards
We have seen huge growth in the use of voice assistants like Amazon Alexa and Google Home in the past decade. But these devices have a major flaw (beyond their inability to recognise Irish accents). To engage with these devices, we have to use a wake word: a command that lets the agent know that we are looking to start conversing. After this, the types of interactions we have with them are limited to a few turns of dialogue and a request being fulfilled. But what if these agents could start conversations with us? This type of proactive agent opens up a wealth of opportunities, from an agent collaborating with you and your team in a meeting, to one informing you about the status of your automated drive and then seamlessly transitioning to asking you about the in-car entertainment. What would this be like? And what do we need to do to stop this becoming a voice version of the infamous Clippy?
Our research focuses on the need to get the initiation of these proactive agent interactions right, so that they minimise distraction and user annoyance. Our recent work takes inspiration from two major concepts in social science. The first is from cognitive science and focuses on identifying the best time to interrupt a task. Research tells us that there are opportune moments to interrupt, termed breakpoints, that make the interruption less distracting. These often occur naturally when you finish a part of a task, such as when you have just finished reading this sentence. Interrupting people at these breakpoints is thought to make it easier for them to return to what they were doing, and thus is more likely to suit the person being interrupted. For agents to be more proactive, they need to know where these breakpoints are in our everyday routines and how to identify them.
Secondly, we take inspiration from human dialogue interaction, investigating how people interrupt other busy people. Imagine you are in a car and you are looking to get the attention of the driver. How would you do it? What would you say? How would you say it? If what you had to say was urgent, what would you do differently? Questions like these can inform how we should design agents to do the same thing. In social science, the patterns of behaviour we use to engage others in conversation are called access rituals. These are the regular "hellos" or "do you have a moment?", the hand waves and facial expressions we use to initiate conversations throughout the day. Our research aims to better understand these behaviours so as to determine what access rituals might be appropriate for voice-based agents to use to proactively engage with us while we are doing another task.
To make a proactive agent something that a user would actually want to engage with, it is crucial that we ensure these agents know not only when to initiate, but also how to initiate and what to say to get our attention without annoying us. Only then will we be able to shift towards agents being proactive partners, rather than the sleepy helper on the desk that we need to wake up.