What are some best practices for using voice authentication?

I am researching best practices for incorporating voice ID (voice biometrics/authentication) technology into a call center. Do you have any insights or best practices on the topic from a user experience standpoint?

Specifically, I am curious about the best way to design enrollment in voice ID. As I understand it, there are active and passive approaches. In the active approach, the caller knowingly accepts enrollment and has their voice recorded for a given amount of time (perhaps saying specific words, or just talking). In the passive approach, the system could instead use recorded portions of the caller’s conversations from before enrollment.

- Jeremy Chu


I’m glad you asked this question, because a lot of the time, we’re only thinking about the technology, and not what’s best for the user.

Voice authentication (also known as voice verification, and closely related to voice identification) is a very useful way to know who’s talking to your system, and it’s often easier on users, since they don’t have to remember a password.

When done correctly, voice authentication can reach very high accuracy. That being said, it can be difficult to implement, and it is not 100% accurate (no form of authentication is). Measures like “liveness tests” can help. In a liveness test, the user is asked to say a dynamic phrase rather than only the phrases recorded during enrollment, which makes it harder for someone to simply play back a recording of the user’s voice.

It’s also important to think about the use case. For something like a banking phone system, where someone might be transferring large sums of money, highly accurate authentication is necessary. For something simpler, like figuring out which family member is asking a smart display for their favorite recipes, voice identification (which person is speaking?) can be good enough.

But back to the actual question! If you do choose to use voice authentication in your system, such as an IVR (automated phone system), what’s the best way to enroll?

You mentioned both active enrollment (where the system explicitly asks the user to enroll their voice) as well as the passive method, which relies on previous interactions to create the model.

In my opinion, active enrollment is almost always the way to go. People are (rightly) protective of their voice, and many will be upset if they find out previous recordings were used to construct a model, even if it’s less effort and turns out to be helpful. Often, our voice feels very private in a way a regular passcode does not.

Therefore, I recommend explicitly asking the user if they’d like to enroll their voice, with a (very quick) explanation of how it benefits them. (And not just a marketing upsell. Will it really make things better for them in some way?) Let them know how long it will take, and then lead them through the process in a straightforward way.

Although the user will need to record some sample phrases, you can still keep it conversational and light. As with any conversation design task, write sample dialogs and practice them out loud with other people. The awkward points will quickly reveal themselves.

Personally, I think voice authentication can be really useful for things like replacing a password when I call my bank, and voice identification is great if I’m asking my voice assistant “Where’s my phone?” so it knows it’s me asking, not my husband.

Tread carefully, do a lot of testing, and make sure users have control of their data. Be transparent about how their voice will be used, and allow them to delete their authentication model if they want to. As always, keep the user in mind, and build your tech to support them.

Cathy Pearl