Six principles for designing voice interfaces

Voice interfaces are becoming more commonplace, but our expectations of them still outpace what’s available. Michael Levy, an interaction designer at Fjord, offers six tips for creating engaging voice UIs.

Voice interfaces are becoming more commonplace. But they have a long way to go before their impact matches that of the mouse or the touch screen.

Right now, voice interfaces aren’t much more than a way to transfer information, replicating the basic commands and functions of other kinds of interfaces. We’re still in the novelty stage. We’re still getting to know the medium. Telling Alexa to turn on NPR without getting off the couch is a convenience, but the voice interaction itself is just a means to an end.

For voice interfaces to move past novelty, they’ll have to be engaging. This isn’t a trivial task: voice is a genuinely new way of interacting. We don’t yet know how to make a voice interface feel as engaging as playing Monument Valley, or diving deeper into the world of the puzzle game Myst, because the standards and frameworks that allowed those interfaces to thrive haven’t been fully established for voice yet. However, they are starting to emerge.

This presents a great opportunity for designers, and working in this space requires a special kind of creativity – one that combines empathy with a solid understanding of how systems work.

At Fjord, we’ve developed six principles for creating engaging interfaces. Think of them as a guide to designing intelligent and intuitive systems – systems that won’t mistakenly order expensive dolls’ houses or have you yelling at your device in frustration.

1. Create a conversational bubble

The model for voice interfaces is the same model we use with each other: conversation. We’ve developed some rather high expectations around conversations and when our voice interfaces don’t live up to them, we’re immediately put off, even if we can’t explain exactly why.

One of the unacknowledged features of a conversation between two people is the small pool of shared memory that is built up between them. If you and your co-worker start talking about Carol in HR and a minute later you say “she” instead of using her name, the other person knows you’re still talking about Carol.

Right now, this is beyond most voice interfaces. Google is starting to build it into its voice assistant as it rolls out its Home products, and this kind of shared-knowledge building is essential if an interface is to feel engaging. If a voice UI can build up and tap into that pool, then interactions will start to feel more like a conversation and less like a one-way information retrieval system.
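To make the idea of a shared pool concrete, here is a minimal sketch in Python. It is an illustration only, not how Google or any other assistant actually works: the `ConversationBubble` class, its method names and the three-word pronoun table are all hypothetical, and real systems rely on far richer language models than a lookup like this.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, deliberately tiny pronoun table (an assumption for this sketch).
PRONOUNS = {"she": "person", "he": "person", "it": "thing"}


@dataclass
class ConversationBubble:
    """A small pool of shared memory that lasts for one conversation."""

    last_mentioned: dict = field(default_factory=dict)

    def mention(self, name, kind):
        # Remember the most recently mentioned entity of each kind.
        self.last_mentioned[kind] = name

    def resolve(self, utterance):
        # Swap each known pronoun for the entity it most likely refers to.
        # Words that are not pronouns, or pronouns with nothing remembered
        # for them, are left untouched.
        def swap(match):
            word = match.group(0)
            kind = PRONOUNS.get(word.lower())
            return self.last_mentioned.get(kind, word)

        return re.sub(r"[A-Za-z']+", swap, utterance)


bubble = ConversationBubble()
bubble.mention("Carol", "person")
print(bubble.resolve("When is she back in the office?"))
```

With Carol in the pool, the follow-up question is read as being about Carol. With an empty pool, the same question passes through unchanged, which is roughly where most voice interfaces are today.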

2. Keep it simple

When a person reaches for something to accomplish a task, they’re looking for the path of least resistance or the easiest way to do something. But what seems easy in theory can be much more complex in practice.

Take the eateries in New York’s LaGuardia airport for example. In an effort to make ordering food easier, with fewer waiting staff, the airport installed tablets on each table with a built-in food-ordering app. While this makes some things easier for some people, it also makes some common tasks much more complex. Ordering a cup of coffee used to be accomplished by saying “One coffee, please.” Now it involves a technological middle-man.

In the right circumstances, a little bit of voice can convey a lot of information. But when the wrong kind of interface is shoved into a situation, it becomes the opposite of engaging and begins to feel like a chore. Voice can be great for certain moments, but context will dictate whether it becomes a help or a hassle.

3. Guide users from A to B

Repeatable patterns of behaviour build familiarity into interactions. If that doesn’t sound particularly engaging, it’s because it’s not supposed to – it works best when you don’t notice it at all.

If we had to decipher the standard human greeting every time we saw a different person, we might not start too many conversations. The same goes for voice interfaces. Wake words are now broadly familiar to most users, but that’s about where the familiarity ends – every other piece of the interaction is up for grabs.

When Alexander Graham Bell was developing his newly invented telephone he wanted “Ahoy-hoy” to be the standard salutation for a call. It was, in essence, the wake word of its day. Even though this phrase never caught on, he knew that there had to be new standards set for such new modes of communication. People needed a sequential pattern they could follow, a way to reduce uncertainty around something new.

At the end of the day, it’s not the pattern that makes an interface engaging, it’s the ease of use – not having to wonder how something works every time you use it.

4. Know when to be seen, not heard

People have to be aware of the situation in which they’re about to use a voice interface. Contexts shift quickly and Siri often doesn’t understand the concept of “inside voices”. Amazon Alexa has also struggled with this, accidentally ordering dolls’ houses and adding words uttered in casual conversation to customers’ shopping lists.

An understanding of context – especially if interfaces are working in tandem with other modes of interaction – is key to creating an engaging voice UI. If you’ve ever sent someone in the same room a text message about someone else in that room while maintaining a spoken conversation, then you have some idea of how complex this can get. Right now, that’s something best navigated by human intuition – a voice interface would handle this less delicately.

Understanding what a person needs in a given situation offers a broader value than accomplishing any one task. The public nature of voice makes this a pressing concern – being nervous about Alexa embarrassing you isn’t very engaging, and might put you off using a system altogether.

5. Build empathy through personality

Our own voices are deeply tied to our personalities, so it’s little wonder this is something we look for in voice UIs.

When it comes to voice UI, personality is anything intentionally idiosyncratic that is used instead of a standard response (think of the long pauses in William Shatner’s line readings for Captain Kirk that became a core part of his persona). This personality is integral to building a system that people can empathise with.

An increased level of empathy is beneficial to both parties and it even leads users to be more forgiving of an interface’s mistakes. But if a personality is going to seem engaging, it can’t just seem like another entity; it has to seem like another entity that’s interested in you.

Imitation is still the highest form of flattery, even for machines, and a voice interface that learns your mannerisms and patterns of speech can start to tailor itself to sound like you. As the change happens over time, just like it would with a person, the user doesn’t see as many of the seams in the programming. Even if it cycles through “OK,” “Sure,” “Got it,” “You got it,” and “Can do,” instead of just saying “OK” every time, eventually, you’ll see that all it has is a limited set. But if it starts to mimic you, that set feels limited for a reason. The interface can still have a personality – but one that has some of you in it to establish an even deeper connection.
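As a rough sketch of that shift from a fixed set to mimicry, the Python below rotates through the five acknowledgements quoted above until it has heard the user favour a phrase of their own. The `Acknowledger` class and the `mimic_after` threshold are hypothetical, chosen only for illustration; nothing here reflects how any shipping assistant picks its replies.

```python
from collections import Counter
from itertools import cycle

# The stock acknowledgements quoted in the article.
DEFAULT_ACKS = ["OK", "Sure", "Got it", "You got it", "Can do"]


class Acknowledger:
    """Rotates through stock replies, then starts echoing the user's own."""

    def __init__(self, mimic_after=3):
        self._defaults = cycle(DEFAULT_ACKS)
        self._heard = Counter()
        # Hypothetical threshold: how often a phrase must be heard
        # before the interface adopts it.
        self.mimic_after = mimic_after

    def hear(self, user_ack):
        # Count each acknowledgement the user says themselves.
        self._heard[user_ack] += 1

    def respond(self):
        # Prefer the user's most frequent phrase once it is familiar enough;
        # otherwise fall back to the next stock reply in the rotation.
        if self._heard:
            phrase, count = self._heard.most_common(1)[0]
            if count >= self.mimic_after:
                return phrase
        return next(self._defaults)


voice = Acknowledger()
print(voice.respond())
for _ in range(3):
    voice.hear("Sounds good")
print(voice.respond())
```

The first reply comes from the stock rotation. After the user has said “Sounds good” three times, the interface answers in kind. A gentler design might blend the two sets rather than switch outright, so the change happens gradually, as the text describes.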

6. Open access to all

Voice interfaces don’t build themselves. They’re made by teams of people with their own histories, perspectives, biases, and blind spots. When those teams are as diverse as possible, these different perspectives complement each other – resulting in tools that work better for everyone.

For something to be engaging, it can’t just be for you. When a voice interface doesn’t work for a specific group of people (with certain accents, or certain speech impediments, for example), it usually doesn’t represent a technological limitation – it represents people prioritising certain groups over others. Exclusivity might be a draw for novelty, but a truly engaging voice interface is one that doesn’t feel like it’s keeping others out.

The only way to achieve this is through inclusion and research, not just one or the other. It’s crucial for all designers to research outside their own perspective, and to test their interfaces regularly to learn their limitations, but nothing beats the perspective of a lived experience.

If voice interfaces are to grow past the world of novelty, they need to feel engaging, intelligent and inclusive. This won’t happen on its own – it takes work and it takes the human-centric lens of design thinking. Nothing is going to stop the technical progress of shrinking hardware and higher audible sensitivity, but the emotional progress hinges on whether we can put people in the centre of this work. As interfaces grow and change and flex and evolve, the only thing that stays the same is what it’s all focused on: us.

Michael Levy is an interaction designer at design and innovation consultancy Fjord (see fjordnet.com). Fjord recently published an online guide to designing for voice.
