Deploying an AI Voice Sales Agent using ElevenLabs in a few hours
Includes cost breakdowns, CRM integration tips, and a live demo at the end
Welcome all 5807 subscribers to another edition of The Dispatch. In this edition, we're diving into AI voice sales agents. I'll walk you through my proof-of-concept (POC) implementation using ElevenLabs, compare it with OpenAI's conversational AI, and share a live demo at the end.
Before we begin: if you're a reader of the newsletter, hit the like button above, reply to this email, or leave a comment so I know this is reaching you and you're still interested in this content. It would really help me understand my audience better. Let’s get started!
When ElevenLabs launched their AI voice agent in December, I experimented with POCs for both a travel planning helper and a sales agent. However, the initial pricing was prohibitive, leading me to put these experiments on hold.
ElevenLabs has since cut its price by 50%, bringing it down to $0.08 per minute, which is a much more affordable range. I've revisited the project, and I'm excited to share my implementation experience and demo with you.
ElevenLabs and similar conversational voice platforms let you deploy an AI voice assistant that interacts with your customers or end users. Setup involves a handful of steps: define a system prompt for the interaction, attach your contextual documents, pick a voice from the available options, connect tools for retrieving and storing data from the call, and configure whether the agent handles inbound or outbound calls. Let’s expand on these elements.
System Prompt Design
Drawing from my understanding of sales conversations and business context, I developed a comprehensive system prompt that covers:
Conversational tone and flow
Pricing and discount negotiation strategies
Objection handling frameworks
Data collection requirements
System limitations and safety checks
If you want to see the full prompt, get in touch with me on X.
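As a rough illustration of how a prompt covering the areas listed above could be organized, here is a minimal sketch that assembles titled sections into one string. The section texts, the `PROMPT_SECTIONS` dictionary, and the `build_system_prompt` helper are all hypothetical placeholders, not the actual prompt used in this POC.

```python
# Hypothetical section texts; the real prompt is not shown in this post.
PROMPT_SECTIONS = {
    "Conversational tone and flow": "Be warm and concise. Ask one question at a time.",
    "Pricing and discount negotiation": "Quote the list price first. Never go below the floor price.",
    "Objection handling": "Acknowledge the concern, then answer it briefly.",
    "Data collection": "Collect name, email, and mobile number before ending the call.",
    "Limitations and safety checks": "If asked something out of scope, offer a human follow-up.",
}


def build_system_prompt(sections: dict[str, str]) -> str:
    """Join titled sections into one prompt string, skipping empty ones."""
    parts = []
    for title, body in sections.items():
        body = body.strip()
        if body:  # drop sections that have no content yet
            parts.append(f"## {title}\n{body}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_system_prompt(PROMPT_SECTIONS))
```

Keeping each concern in its own section makes it easier to tweak one area, such as objection handling, without touching the rest.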
RAG and documents
For this straightforward use case, I opted to include all necessary information directly in the system prompt rather than implementing a full RAG (Retrieval-Augmented Generation) system. This approach helped minimize latency while maintaining functionality. However, for more complex use cases requiring extensive domain knowledge, implementing a proper document-based knowledge base would be recommended.
Voice localization features
ElevenLabs offers an extensive collection of localized voices spanning multiple regions (including UK, Australian, and Indian English) alongside standard US English. This localization capability is crucial for creating authentic customer interactions and represents a key differentiator for ElevenLabs. For companies implementing AI voice agents, voice localization should be considered a critical feature for market-specific deployments.
Tool Usage
ElevenLabs allows you to integrate function calling.
In my experience, tool calls are not foolproof, so I am still figuring out the right setup. I am using Gemini 2.0 Flash, which ElevenLabs recommends, and the conversation quality was generally good.
I also created a tool that calls a webhook just before the call ends. The webhook takes the call transcript, parses it, and sorts the information into fields such as name, email, and mobile number.
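The parsing step can be sketched roughly as follows. This is a simplified stand-in that uses regular expressions; the function name `extract_lead_fields`, the patterns, and the sample transcript are assumptions for illustration, and a real webhook might use an LLM for extraction instead. Spoken transcripts often render emails as "asha dot rao at example dot com", which these patterns would not catch.

```python
import re
from typing import Optional

# Illustrative patterns only; real transcripts are messier than this.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{8,}\d")
NAME_RE = re.compile(r"\b(?i:my name is) ([A-Z][a-z]+(?: [A-Z][a-z]+)?)")


def _normalize_phone(raw: str) -> str:
    """Keep a leading '+' if present and strip everything except digits."""
    digits = re.sub(r"\D", "", raw)
    return ("+" if raw.strip().startswith("+") else "") + digits


def extract_lead_fields(transcript: str) -> dict[str, Optional[str]]:
    """Pull name, email, and mobile out of a transcript; None when not found."""
    name = NAME_RE.search(transcript)
    email = EMAIL_RE.search(transcript)
    phone = PHONE_RE.search(transcript)
    return {
        "name": name.group(1) if name else None,
        "email": email.group(0).lower() if email else None,
        "mobile": _normalize_phone(phone.group(0)) if phone else None,
    }


if __name__ == "__main__":
    sample = (
        "Hi, my name is Asha Rao. You can email me at "
        "asha.rao@example.com or call +91 98765 43210."
    )
    print(extract_lead_fields(sample))
```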
Output
Putting it all together, I deployed a widget on the website that lets leads hold a natural conversation with the AI in a local accent. It covers a structured conversation flow, objection handling, pricing negotiation, data collection, lead scoring, and data parsing, and it saves the results to the CRM through a webhook.
The lead score is based on what the lead said about urgency, whether they were comfortable with the pricing, and how enthusiastic they were about going ahead. This helps prioritize human follow-ups.
I could add another step that sends an automated payment link by email or WhatsApp to complete the loop, but for now I'm keeping that manual so that any errors can be corrected before the link goes out. Still, this is very promising. So far one or two customers have spoken to the AI, and none of them realized, or at least let on, that they were talking to an AI.
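A scoring rule along these lines could look like the sketch below. The three signals come from the description above, but the 0 to 2 scales, the weights, the thresholds, and the names `LeadSignals` and `score_lead` are assumptions chosen for illustration, not the rule used in this POC.

```python
from dataclasses import dataclass


@dataclass
class LeadSignals:
    urgency: int          # assumed scale: 0 (no timeline) to 2 (wants to start now)
    price_accepted: bool  # lead said the pricing works for them
    enthusiasm: int       # assumed scale: 0 (lukewarm) to 2 (keen to go ahead)


def score_lead(signals: LeadSignals) -> str:
    """Map the three signals to a coarse 'hot' / 'warm' / 'cold' bucket."""
    for value in (signals.urgency, signals.enthusiasm):
        if not 0 <= value <= 2:
            raise ValueError("urgency and enthusiasm must be between 0 and 2")
    # Price acceptance is weighted as heavily as a maximum urgency score.
    points = signals.urgency + signals.enthusiasm + (2 if signals.price_accepted else 0)
    if points >= 5:
        return "hot"
    if points >= 3:
        return "warm"
    return "cold"


if __name__ == "__main__":
    print(score_lead(LeadSignals(urgency=2, price_accepted=True, enthusiasm=2)))
```

A coarse bucket like this is usually enough for deciding which leads a human should call back first.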
Here is the widget on the website.
This is the summary of the conversation with the user.
And here is the webhook sent to the team in Slack with the details.
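For readers who want to wire up something similar, here is a minimal sketch of posting lead details to a Slack incoming webhook, which accepts a JSON body with a `text` field. The field names, the message layout, and the helper names `build_slack_payload` and `post_to_slack` are assumptions; the webhook URL is whatever Slack issues for your workspace.

```python
import json
import urllib.request


def build_slack_payload(lead: dict) -> dict:
    """Format lead details as a plain-text Slack message payload."""
    lines = ["New lead from the voice agent"]
    for key in ("name", "email", "mobile", "score"):
        # Fall back to a placeholder when the call did not capture a field.
        lines.append(f"{key.capitalize()}: {lead.get(key) or 'not provided'}")
    return {"text": "\n".join(lines)}


def post_to_slack(webhook_url: str, payload: dict) -> int:
    """POST the payload to a Slack incoming webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


if __name__ == "__main__":
    payload = build_slack_payload(
        {"name": "Asha Rao", "email": "asha.rao@example.com", "score": "hot"}
    )
    print(payload["text"])
```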
Finally, here’s the demo that I teased at the beginning of the article.
Comparison with OpenAI’s conversational agent
I also tried out the same use case with OpenAI’s conversational agent. Here is what I thought:
Advantages:
Reduced latency through the Omni model architecture (text and voice in one model, so no need for a separate transcription step)
Superior handling of conversation interruptions
GPT-4o provides reliable calculation accuracy (though at higher cost)
Limitations:
GPT-4o mini exhibits calculation inaccuracies for discount computations
Limited voice options with no international accent support
Higher operational costs at scale
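One possible way to sidestep model arithmetic errors on discounts is to expose a small deterministic function as a tool and let the agent call it instead of computing the price itself. The sketch below is an assumption about how that could look, not something tested in this POC; the function name `discounted_price`, the 20% default cap, and the example prices are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def discounted_price(
    list_price: str,
    discount_percent: str,
    max_discount_percent: str = "20",  # hypothetical negotiation cap
) -> Decimal:
    """Apply a capped percentage discount and round to two decimal places."""
    price = Decimal(list_price)
    percent = Decimal(discount_percent)
    cap = Decimal(max_discount_percent)
    if price < 0 or percent < 0:
        raise ValueError("price and discount must be non-negative")
    # Never exceed the cap, whatever discount the model asks for.
    percent = min(percent, cap)
    final = price * (Decimal("100") - percent) / Decimal("100")
    return final.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    print(discounted_price("4999", "15"))
    print(discounted_price("4999", "50"))  # request above the cap
```

Using `Decimal` with string inputs avoids floating-point rounding surprises in quoted prices.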
Implications
Currently, ElevenLabs is priced at $0.08 per minute, or about $0.40 (around ₹30-40) for a 5-minute call. OpenAI’s conversational agent costs twice as much, at $0.16 per minute.
But I expect these prices to come down, and AI voice agents to become more sophisticated and lower in latency, transforming a large share of call center operations over time. AI agents can work 24x7, adhere more closely to the script, stay emotionally uninvolved, and deliver better customer outcomes.
This tech can take over existing use cases and also enable new, programmatic ones that were not feasible before.
These calls are an order of magnitude better than current robocalls, which are just intrusive recorded messages.
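The per-call arithmetic can be checked with a short sketch. The per-minute rates are the ones quoted above; the exchange rate of ₹85 per US dollar and the helper name `call_cost` are assumptions, so adjust the rate to the current one.

```python
from decimal import Decimal

ELEVENLABS_USD_PER_MIN = Decimal("0.08")  # rate quoted in this post
OPENAI_USD_PER_MIN = Decimal("0.16")      # rate quoted in this post
USD_TO_INR = Decimal("85")                # assumed exchange rate


def call_cost(minutes: int, usd_per_min: Decimal, usd_to_inr: Decimal = USD_TO_INR):
    """Return the cost of one call as a (USD, INR) pair."""
    usd = usd_per_min * minutes
    return usd, usd * usd_to_inr


if __name__ == "__main__":
    for label, rate in (
        ("ElevenLabs", ELEVENLABS_USD_PER_MIN),
        ("OpenAI", OPENAI_USD_PER_MIN),
    ):
        usd, inr = call_cost(5, rate)
        print(f"{label}: ${usd:.2f} (about ₹{inr:.2f}) for a 5-minute call")
```

At the assumed rate, a 5-minute ElevenLabs call lands inside the ₹30-40 range, and the OpenAI call costs double that.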
Again, here’s the demo of the AI sales call.
That’s it for today!
In case you or someone you know would like to have a paid consulting call with me about product or AI, get in touch with me by replying to this email or emailing me at kavir @ hey dot com. You can check out how I can be useful here. Alright then, talk soon!
Smash that like button, don’t be shy!
Great breakdown Kavir. The cost reduction by ElevenLabs really makes AI voice agents more accessible for smaller businesses or side projects. How did the localized voice accents impact customer perception? Did you notice any difference in trust or engagement levels between US English and localized versions?