Securing voice-based transactions using OAuth2.0 and multimodality

Dec 1, 2022 | Blog

Voice call automation for customer support using conversational AI is slowly becoming the new normal across multiple industries. While AI and Machine Learning technologies are the primary building blocks of voice call automation solutions, caller authentication is one of the essential associated technology components. Caller authentication and security are indispensable, especially in the insurance and financial services industries.

Let’s consider a use case where a caller wants to add their spouse to their registered account. Imagine if a virtual assistant could complete that registration seamlessly during the call. While this is a unique experience for the caller, it is essential to secure the transaction and protect the customer’s account, both during the call and in any related processes. By blending conversational AI, token-based authentication, and multimodality, the caller’s transaction can be secured while still delivering a great customer experience.

In this blog, we will see how to use the following technologies to set up a call flow that helps the registered caller described above securely add their spouse to their account:

  • Amazon Cognito for the OAuth 2.0 implementation
  • Twilio SMS service for the multimodal experience
  • Amazon Lex for building the conversational experience
  • Amazon Connect as the contact center application

Solution overview

The solution’s primary component is how the user identification information (such as the Social Security Number and/or a secret PIN) is used to generate a secure temporary token. This token is used to perform a specific action, such as adding a user to an account. The ancillary components of the solution include capturing the user information conversationally and communicating with the user via a text-based modality in real time. Let’s look at each solution component in detail, in the order they appear in a real-time conversation.

Capturing user information conversationally

The first component of the solution is to capture user information conversationally. Amazon Connect and Amazon Lex together build this conversational experience. Amazon Connect provides a phone number for the demo and integrates with the Amazon Lex bot. The bot built using Amazon Lex can be kept simple: it only needs to capture the caller’s details. Let’s say we need the user ID and secret PIN to validate the user. The bot should prompt for these values and pass them on to the next layer for processing.
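As an illustration of how the captured values might be handed to the next layer, here is a minimal sketch that reads slot values out of the event a Lex bot passes to its Lambda function. It assumes the Lex V2 event layout; the slot names `userId` and `secretPin`, the intent name, and the helper names are hypothetical and not taken from the solution’s template.

```python
from typing import Any, Dict, Optional


def get_slot_value(event: Dict[str, Any], slot_name: str) -> Optional[str]:
    """Return the interpreted value of a slot from a Lex V2 event, or None."""
    slots = event.get("sessionState", {}).get("intent", {}).get("slots") or {}
    slot = slots.get(slot_name) or {}  # slots that are not yet filled arrive as null
    value = slot.get("value") or {}
    return value.get("interpretedValue")


def extract_credentials(event: Dict[str, Any]) -> Dict[str, Optional[str]]:
    # "userId" and "secretPin" are hypothetical slot names used for this sketch.
    return {
        "user_id": get_slot_value(event, "userId"),
        "secret_pin": get_slot_value(event, "secretPin"),
    }


# A trimmed-down example event: one slot filled, one still empty.
sample_event = {
    "sessionState": {
        "intent": {
            "name": "AddUserIntent",  # hypothetical intent name
            "slots": {
                "userId": {
                    "value": {"originalValue": "john 42", "interpretedValue": "john42"}
                },
                "secretPin": None,
            },
        }
    }
}

print(extract_credentials(sample_event))
```

A slot that comes back as `None` signals that the bot still needs to prompt for that value before anything is sent on for processing.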

Generating a secure temporary token

The user information collected by the bot is sent to AWS Lambda for processing. This is where the primary logic to secure the transaction is built. Lambda sends the user information to an Identity Provider (IDP), requesting a secure token. Amazon Cognito acts as the IDP in this example. Other IDPs that can be used here are Microsoft Active Directory, Google IDP, etc. The IDP responds with a temporary token that can be used to access the application resources. The application is where registered users can perform transactions after logging in on the web or a mobile app. The IDP can also authorize the token to perform only specific actions.

In our example, the token can only add new users and nothing more. Because the token is temporary, it cannot be misused at a later point. Together, the scope restriction and the short lifetime of the token make this approach highly secure. The code to generate a token from Amazon Cognito is given below.

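# Cognito token code (username, password, client_id and client_secret are supplied by the caller)
import base64
import hashlib
import hmac
import boto3

client = boto3.client('cognito-idp')

def get_secret_hash(username, client_secret, client_id):
    key = bytes(client_secret, 'utf-8')
    message = bytes(f'{username}{client_id}', 'utf-8')
    return base64.b64encode(hmac.new(key, message, digestmod=hashlib.sha256).digest()).decode()

response = client.initiate_auth(
    ClientId=client_id,
    AuthFlow='USER_PASSWORD_AUTH',  # assumed flow; the original call arguments were lost
    AuthParameters={
        'USERNAME': username,
        'PASSWORD': password,
        'SECRET_HASH': get_secret_hash(username, client_secret, client_id),
    },
)
token = response['AuthenticationResult']['IdToken']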
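The two properties described above, a restricted scope and a short lifetime, can be checked on the application side before the add-user action is allowed. The following is a minimal sketch, not the solution’s actual implementation. It assumes the token’s signature has already been verified elsewhere and that the permitted action is carried in a space-delimited `scope` claim. The scope name `accounts/add-user` and the function name are hypothetical.

```python
import time
from typing import Any, Dict, Optional


def is_token_usable(claims: Dict[str, Any], required_scope: str,
                    now: Optional[float] = None) -> bool:
    """Return True if already-verified claims are unexpired and carry the scope."""
    now = time.time() if now is None else now
    exp = claims.get("exp")  # expiry as seconds since the epoch
    if not isinstance(exp, (int, float)) or now >= exp:
        return False  # a missing or past expiry means the token is rejected
    granted = str(claims.get("scope", "")).split()
    return required_scope in granted


# Hypothetical claims for a token that may only add users.
claims = {"exp": 1000, "scope": "accounts/add-user"}
print(is_token_usable(claims, "accounts/add-user", now=999))
print(is_token_usable(claims, "accounts/add-user", now=1000))
print(is_token_usable(claims, "accounts/delete-user", now=999))
```

Passing `now` explicitly keeps the check easy to test. In production the default of the current time would be used.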

Once the token is obtained, the next step is to send the caller a text message containing a link that carries the token.
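One way the link could be assembled is sketched below. The base URL `https://example.com/add-user`, the `token` query parameter name, and the function name are assumptions made for illustration; the real page address depends on how the front end is deployed.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_add_user_link(base_url: str, token: str) -> str:
    """Attach the token to the add-user page URL as a query parameter."""
    parts = urlsplit(base_url)
    if parts.scheme != "https":
        # The token is a credential, so refuse to put it on a plain-HTTP link.
        raise ValueError("the link must use HTTPS")
    query = urlencode({"token": token})  # percent-encodes unsafe characters
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


link = build_add_user_link("https://example.com/add-user", "abc.def.ghi")
print(link)  # https://example.com/add-user?token=abc.def.ghi
```

Query strings can end up in server logs, which is one more reason the short token lifetime matters in this design.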

Performing the transaction in real time on an alternate modality

The token is used to construct the correct URL to reach the add user page. The link is sent to the caller’s phone using the Twilio SMS service. The code to send an SMS to a registered user on Twilio is given below.

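# Twilio SMS code
from twilio.rest import Client

account_sid = '**'
auth_token = '***'
client = Client(account_sid, auth_token)
# Assumed completion: caller_number and link are placeholders set earlier in the flow
message = client.messages.create(
    messaging_service_sid='***',
    to=caller_number,
    body=f'Use this secure link to add a user to your account: {link}',
)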

The caller receives the text message. While still on the call, the caller can click the link, which opens a screen for adding a user to the account. The caller completes the addition by providing the new user’s details. Job done!

Solution Architecture

Let’s review the overall architecture for the solution (see the following diagram).

  • We use an Amazon Lex bot to capture the login details.
  • Login details are sent from Amazon Lex to Lambda for processing.
  • We use Lambda to simulate access to backend systems and perform the authentication function.
  • We use the Cognito user pool to authenticate the user.
  • After the caller is authenticated, they receive a text message on their mobile phone with a link containing the authentication token generated by Cognito.
  • Assuming the user is authenticated, the link opens the caller’s account directly without further authentication.

An AWS CloudFormation template is included that creates a stack containing all of these AWS resources, as well as the required AWS Identity and Access Management (IAM) roles. With these resources in place, you can try out the solution for voice authentication on the Amazon Connect channel.
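The flow in the architecture list (authenticate the caller, then text them a link carrying the token) can be sketched as one small function. This is an illustrative outline only: the authentication and SMS steps are passed in as callables so the sketch stays independent of Cognito and Twilio, and `AuthError`, `handle_login`, and the base URL are hypothetical names.

```python
from typing import Callable, Dict
from urllib.parse import urlencode


class AuthError(Exception):
    """Raised by the authenticate callable when credentials are rejected."""


def handle_login(
    credentials: Dict[str, str],
    phone_number: str,
    authenticate: Callable[[str, str], str],
    send_sms: Callable[[str, str], None],
    base_url: str = "https://example.com/add-user",  # hypothetical page
) -> bool:
    """Authenticate the caller, then text them a link that carries the token."""
    try:
        token = authenticate(credentials["user_id"], credentials["secret_pin"])
    except AuthError:
        return False  # authentication failed, so no link is sent
    send_sms(phone_number, f"{base_url}?{urlencode({'token': token})}")
    return True


# Stand-ins for the real Cognito and Twilio calls, used only for this demo.
sent = []


def fake_authenticate(user_id: str, pin: str) -> str:
    if pin != "1234":
        raise AuthError("bad credentials")
    return "tok"


def fake_send_sms(to: str, body: str) -> None:
    sent.append((to, body))


ok = handle_login({"user_id": "john42", "secret_pin": "1234"}, "+15550100",
                  fake_authenticate, fake_send_sms)
print(ok, sent)
```

In a deployed version, the two callables would wrap the Cognito `initiate_auth` call and the Twilio message send, respectively.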


Confirm the following prerequisites before deploying the solution:

  • An AWS account
  • Access to the following AWS services
    • Amazon Lex to create bots
    • Lambda for the business logic functions
    • Cognito User pool
    • IAM with access to create policies and roles
    • AWS CloudFormation to run the stack
    • API Gateway
    • S3
  • An existing Amazon Connect instance
  • Twilio setup for SMS configuration
    • Account SID
    • Auth token
    • Messaging service SID

Deploy the Solution

To deploy this solution, complete the following steps:

1. Choose Launch Stack to launch an AWS CloudFormation stack in the Region of your choice.

2. For Stack name, enter a name for your stack. This post uses the name: voice-auth-stack.

3. Provide the Parameters for Twilio.

4. Next, provide the parameters for the frontend, backend, authentication, and Lex bot.

5. Lastly, review the IAM resource creation and choose ‘Create Stack’. After a few minutes, your stack should be complete. The core resources are listed below:

  • Amazon Lex bot
  • Lambda functions
  • API Gateway
  • Cognito user pool
  • IAM roles
  • CloudFront
  • S3

6. Navigate to the Amazon Connect dashboard and click on the ‘Phone numbers’ tab. Next, you will associate a phone number with the card services contact flow. Once the phone number is associated, the solution is ready to be tested.
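If you prefer to script the stack creation in steps 1 to 5 instead of using the console, a sketch along the following lines may help. It is not part of the published solution: the parameter keys shown in the usage note are hypothetical, and the template URL must be the one behind the Launch Stack button. The boto3 `create_stack` call is real, but whether `CAPABILITY_IAM` or `CAPABILITY_NAMED_IAM` is required depends on how the template names its roles.

```python
from typing import Dict, List


def to_cfn_parameters(values: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert a plain dict into the Parameters list that CloudFormation expects."""
    return [
        {"ParameterKey": key, "ParameterValue": value}
        for key, value in sorted(values.items())
    ]


def launch_stack(template_url: str, values: Dict[str, str],
                 stack_name: str = "voice-auth-stack") -> str:
    """Create the stack and return its ID (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above has no AWS dependency

    cloudformation = boto3.client("cloudformation")
    response = cloudformation.create_stack(
        StackName=stack_name,
        TemplateURL=template_url,
        Parameters=to_cfn_parameters(values),
        # The template creates IAM roles, so a capability must be acknowledged.
        # CAPABILITY_NAMED_IAM may be needed instead if the roles have custom names.
        Capabilities=["CAPABILITY_IAM"],
    )
    return response["StackId"]
```

For example, `to_cfn_parameters({"TwilioAccountSid": "**", "TwilioAuthToken": "***"})` produces the list form of those two (hypothetically named) Twilio parameters. The Amazon Connect phone number association in step 6 still has to be done separately.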

Test the solution

You can call the Amazon Connect phone number and interact with the bot. Once you engage with the bot and provide your login credentials, you will receive a text message on your mobile phone with a secure link to perform your transaction.

Contact center flows

You can deploy the pre-built solution as part of Amazon Connect contact flows. When customers call your contact center, the contact flow to which they are sent is the one assigned to the telephone number that they called. The contact flow uses a customer input block to invoke an Amazon Lex bot. The following diagram shows the voice login contact flow in Amazon Connect:


To avoid incurring any charges in the future, delete all the resources created.

  1. Amazon Lex bot
  2. Lambda functions
  3. API Gateway
  4. Cognito user pool
  5. IAM roles
  6. CloudFront
  7. S3
  8. Amazon Connect Contact flow


In this blog, we reviewed a voice login solution that provides secure access and a seamless experience to the caller. The CloudFormation template provides a ready-to-deploy basic setup on AWS Cloud. You can easily extend the solution with additional conversation flows and custom logic that are specific to your organization’s needs.

About the Authors

Krishna Teja Kommineni (KT): KT is an experienced Software Engineer with a demonstrated history of working for He has built enterprise-grade conversational AI solutions that are actively serving customers. In his free time, his favorite thing is to go on long drives.

SMd Muzammil: SMd Muzammil is an Associate Software Engineer on the team. He works with a passionate team of AI engineers building the next-generation conversational AI interfaces. He spends most of his free time exploring new places around the world.
