Conversational UI: Design Principles
Conversational user interface (CUI) design is one of the hottest topics in the design world these days. Unlike a typical graphical user interface (GUI), a CUI may or may not involve any visual prompts. Take Alexa, for example: all you have is the Amazon Echo (or another compatible device), which responds to you by voice and offers very few visual cues, such as the light on the device changing color. So how should UX designers rethink our roles in a world that lacks visual cues?
Let's start by reviewing the 10 usability heuristics from Nielsen Norman Group, which we often use to evaluate a graphical user interface (e.g. a website or mobile app):
- Visibility of system status
- Match between system and the real world
- User control and freedom
- Consistency and standards
- Error prevention
- Recognition rather than recall
- Flexibility and efficiency of use
- Aesthetic and minimalist design
- Help users recognize, diagnose, and recover from errors
- Help and documentation
The criteria above still stand for a conversational user interface. We do want the interface to resemble how we typically understand the world, and it should be consistent so that it's easy to learn. We also want the system to be smart enough to prevent common errors, and when an error does happen, we need the interface to help us out. Seems essential, doesn't it? However, some criteria can be hard to meet fully because of the limits of verbal communication. For example, compared to a dashboard page on a website, where users can see multiple charts at the same time, conversation is linear: spoken information can only be delivered one piece at a time. In short, you can't say five sentences at once.
Adopting a CUI brings some fundamental advantages, but it also comes with limitations. Therefore, I'd like to share a list of design principles that I've found helpful when designing a CUI.
Say it like you mean it
Let's start with the fun part: how do you make sure your CUI is an enjoyable experience? Here are some high-level principles:
- Personality: It's important to bring a delightful experience to users' ears. When we think about experts at talking without being seen, radio hosts generally do a pretty decent job. A good host knows at what point she needs to stimulate positive emotion and at what point she should just quietly listen to the caller. It's more of an art than a science!
- Keep it brief: A chatty CUI might suit a relationship-focused application, but if it tries too hard, users will quickly get frustrated.
- Show me the options (money): Especially when the CUI is treated as an agentive technology with task-focused functionality, we want not only to avoid a chatty interface but also to be transparent about the options and capabilities, so that it's clear to the user what she should expect from the interface. Pro tip: don't offer more than 3 options at a time, because human short-term memory is limited and spoken options are easy to forget.
- Examples are better than instructions: When applicable, replace verbose instructions with actual examples; people tend to get those faster.
- Practice makes perfect: In general, there are two dominant ways to design the utterances for a CUI. One is the Google way, in which the team makes assumptions, builds a prototype, and iterates on it. The other is the Amazon way, which, to my understanding, is inspired by the Turing Test: the developer sits in a separate room from the user and fakes the computer's responses in order to converse with the user (a setup often called Wizard of Oz testing). That way they can quickly see whether an utterance works for users or not. Whichever way you choose, it's essential to keep iterating in order to reach an acceptable quality of conversation.
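To make the "keep it brief", "show me the options", and "examples over instructions" principles concrete, here is a minimal sketch of a prompt builder. The names `build_prompt` and `MAX_OPTIONS` and the "more" keyword are hypothetical, chosen only for illustration; the cap of three comes from the tip above.

```python
MAX_OPTIONS = 3  # cap taken from the "no more than 3 options" tip; adjust to taste


def build_prompt(question, options, example=None):
    """Return a short spoken prompt that offers at most MAX_OPTIONS choices.

    All names here are illustrative assumptions, not part of any real CUI toolkit.
    """
    shown = options[:MAX_OPTIONS]
    if not shown:
        return question

    # Join the options the way a person would say them aloud.
    if len(shown) == 1:
        listed = shown[0]
    elif len(shown) == 2:
        listed = f"{shown[0]} or {shown[1]}"
    else:
        listed = ", ".join(shown[:-1]) + f", or {shown[-1]}"

    prompt = f"{question} You can choose {listed}."
    if len(options) > MAX_OPTIONS:
        # Be transparent that more exists without reading the whole list.
        prompt += " Say 'more' to hear other options."
    if example:
        # An example utterance instead of verbose instructions.
        prompt += f" For example, say '{example}'."
    return prompt


if __name__ == "__main__":
    print(build_prompt(
        "What would you like to drink?",
        ["soda", "tea", "coffee", "water"],
        example="two sodas",
    ))
```

Whether "more" is the right way to reveal the remaining options is exactly the kind of thing the iteration loop described above should test with real users.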
Devil’s in the detail
Now, the biggest challenges in CUI design usually lie in error states and edge cases. To make things worse, they happen quite often. Some examples are listed below:
- Disfluency: People self-correct, and they change their minds quite often. For instance, you might hear users say: “Give me two– no three sodas”, or “Tell me– oh never mind”. This is hard for a system to process without careful design. As designers, we can address it by constructing mixed-initiative dialogs (e.g. collecting two pieces of information with a single prompt), combined with directed forms. This makes the dialogue more concise, with fewer steps, so it feels less cumbersome to the user and closer to natural human conversation. Pro tip: Avoid addressing disfluency through the structure of the grammar, because that can greatly increase the grammar's complexity and slow down the recognition process.
- Error amplification: Not only do people correct themselves, they also pause in the middle of a sentence, which the system might interpret as the end of the sentence. Imagine a scenario like the one below:
User: “I’d like to…” (user pauses)
System: “I don’t under…”
User (interrupting the system): “I’d like to buy…um…” (user pauses again)
System: “I don’t und…”
User (interrupting the system again): “I’d like to buy forty…” (user pauses again)
System: “What do you…”
User (interrupting the system again): “I’d like to buy forty cups of…”
For a scenario like the one above, designers can build in some SAFE points, e.g. a high-level menu, so that the user and the computer can resynchronize turn-taking and get a fresh start.
- Exit strategy: Just as on a phone call, where you have probably had the person on the other end hang up for no apparent reason, it's hard to predict why and when users will leave the conversation. Therefore, an exit strategy is important. Whether it's a back button that lets users return to the previous stage or quick access to the options menu, these are extremely important when designing a CUI.
- Context is king: Without a good understanding of the context users have in mind, it's sometimes hard to fully comprehend what they are referring to. For example, the phrase “Forty Time” could be interpreted as “For Tea Time” (a British user?) or “For Tee Time” (golfers would probably hear it this way). It's important to structure the conversation so that the system can understand the context as early as possible. Don't be shy about asking users questions; it eliminates a lot of undesired situations.
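Several of the ideas above (mixed-initiative prompts backed by directed questions, safe points such as a main menu, and an exit path) can be sketched as a tiny dialogue loop. This is only an illustration under stated assumptions: the item catalogue, the number words, the command phrases, the menu wording, and the function names `parse`, `global_command`, and `step` are all hypothetical, and real speech input would arrive from a recognizer rather than as clean text.

```python
import re

# Hypothetical vocabulary, for illustration only.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "forty": 40}
ITEMS = ("soda", "tea", "coffee")
GLOBAL_COMMANDS = {  # phrases that work at any point in the conversation
    "main menu": "menu", "menu": "menu",
    "go back": "back", "back": "back",
    "never mind": "exit", "cancel": "exit",
}
REQUIRED = ("item", "quantity")
PROMPTS = {  # directed prompts, used only for slots still missing
    "item": "What would you like to order?",
    "quantity": "How many would you like?",
}


def parse(utterance):
    """Pull out every slot the utterance contains (mixed initiative).

    A later number overrides an earlier one, so "two, no three" yields 3.
    """
    slots = {}
    for word in re.findall(r"[a-z]+|\d+", utterance.lower()):
        if word.isdigit():
            slots["quantity"] = int(word)
        elif word in NUMBER_WORDS:
            slots["quantity"] = NUMBER_WORDS[word]
        singular = word[:-1] if word.endswith("s") else word
        if singular in ITEMS:
            slots["item"] = singular
    return slots


def global_command(utterance):
    """Return 'menu', 'back', 'exit', or None if no global command was said."""
    text = utterance.lower()
    for phrase, action in GLOBAL_COMMANDS.items():
        if re.search(rf"\b{re.escape(phrase)}\b", text):
            return action
    return None


def step(state, utterance):
    """Advance the dialogue by one user turn; return (new_state, system_reply)."""
    command = global_command(utterance)
    if command == "exit":
        return {}, "Okay, cancelled."
    if command == "menu":
        # Safe point: drop everything and resynchronize from a known place.
        return {}, "Main menu. You can order, check status, or get help."
    if command == "back":
        # Forget the most recently filled slot and ask for it again.
        state = dict(list(state.items())[:-1])
    else:
        state = {**state, **parse(utterance)}

    missing = [slot for slot in REQUIRED if slot not in state]
    if missing:
        return state, PROMPTS[missing[0]]
    plural = "" if state["quantity"] == 1 else "s"
    return state, f"Got it: {state['quantity']} {state['item']}{plural}. Shall I place the order?"


if __name__ == "__main__":
    state = {}
    for said in ["I'd like some tea", "two, no three", "go back", "forty"]:
        state, reply = step(state, said)
        print(f"User: {said}\nSystem: {reply}")
```

Because the global commands are checked before anything else, the user always has a way out or a way back, no matter how far a misunderstanding has spiralled. Asking only for the slot that is still missing keeps the exchange short when the user supplies everything in one sentence.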
What are your design principles for CUI design? I’m happy to learn from you.
P.S. Some of the principles are drawn from the book [Spoken Dialogue Systems] by Kristina Jokinen and Michael McTear. Great read.
Photo credit: Jason Rosewell