Microsoft have formally announced the release of their VoiceBot as part of their low code/no code Power Platform at Ignite 2021. For me, this is very exciting although, conceptually, it is quite simple. Microsoft’s configurable text-based chatbot, in the form of Power Virtual Agents, has been around for a while now and this new VoiceBot takes that foundation and links it to Azure Cognitive Services to convert text to speech and vice versa. Behind the scenes it is a remarkable achievement. We literally have a configurable bot capable of generating and processing natural speech, and which can interact in real time with a human.
So what does this technological achievement mean and why do I claim it is the death of the call center?
Why We All Hate Call Centers
No one gets excited about talking on the phone to a call center. There is rarely anything pleasant about the experience. Let us consider a typical scenario for a customer.
Finding the Phone Number
We look up the call center number for the organization we are trying to reach. Often the number is purposely buried in some dark corner of the company’s web site because they want the customer to choose practically any other channel to answer their enquiry. Why? Because speaking to a human on the phone is very expensive.
Forrester looked at the cost of different channels over ten years ago and found call center support cost $6-12 per contact compared to, say, web self-service, which cost literally 100 times less. No wonder companies make it difficult to call.

Wading Through the IVR Swamp
The above table gives us an indication of one of the reasons we are immediately encouraged to press keys before speaking. If a customer’s question can be answered through IVR/DTMF touchtones and automated recordings, the contact is still at least 20 times cheaper than speaking to a human. Another reason for IVR use is that the company wants to separate the high-value customers from the low-value ones. For example, if you are looking to make a purchase, the company will not want to keep you waiting compared to, say, a support call where the customer is a captive audience. Whatever the reason, assuming our call needs that human touch, we need to get through the IVR obstacle course before we are permitted to speak to a human.
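To make those ratios concrete, here is a rough back-of-the-envelope sketch. It only uses the figures quoted above ($6-12 per human contact, IVR at least 20 times cheaper, web self-service 100 times cheaper); the function and variable names are my own, and the results are illustrative rather than Forrester's actual numbers.

```python
# Rough per-contact cost comparison using only the ratios quoted in the text.
# Names here are illustrative assumptions, not taken from any Forrester data set.

HUMAN_COST_RANGE = (6.00, 12.00)  # dollars per call center contact, as quoted

# How many times cheaper each channel is than a human agent.
# The IVR figure is quoted as "at least 20 times", so its result is an upper bound on cost.
CHEAPER_BY = {
    "ivr": 20,
    "web_self_service": 100,
}


def channel_cost_range(channel):
    """Return the (low, high) cost per contact implied by the quoted ratio."""
    low, high = HUMAN_COST_RANGE
    factor = CHEAPER_BY[channel]
    return (low / factor, high / factor)


if __name__ == "__main__":
    for name in CHEAPER_BY:
        low, high = channel_cost_range(name)
        print(f"{name}: ${low:.2f}-${high:.2f} per contact")
```

On these assumptions an IVR-resolved contact costs at most around 30 to 60 cents and a web self-service contact around 6 to 12 cents, which is the gap that drives companies to steer callers away from humans.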
On Hold, But Your Call is Important to Us
Next comes the wait and that music. The smarter companies offer a call-back service but, certainly in my experience, these are the exception, not the rule. The lazy ones, again, encourage you to abandon all hope and return to the web site for answers. Why the wait? To limit costs, a balance is struck between how long a customer is predicted to tolerate waiting and how many call center staff the company is willing to spend money on; the more staff, the higher the salary costs and the more expensive the service.
The Language Barrier and a Lack of Localization
To reduce costs, many companies outsource their call center overseas. This means, almost by definition, the call center will be populated with people for whom English is a second language. The language aspect may cause difficulty although, in my experience, things have improved on this front a lot over the years; the days of having to spell “Leon” are mostly behind me unless it is a company that has really gone cheap on their customer service.
Another consequence, which is harder to overcome, is a lack of localization. This means clarification questions are asked which are unnecessary for a local call center. For example, pretty much every Australian knows where the “Gold Coast” is but an overseas operator may still ask for the state.
Assuming these aspects are overcome, the result should be that the customer’s issue is resolved, hopefully on the first attempt.
How the VoiceBot Helps
With a VoiceBot, almost all of these pain points are removed. We see from Forrester’s table that, over ten years ago, a virtual agent was ten times cheaper than a human. I am not sure what constituted a virtual agent in 2009 (primitive chatbot perhaps?) but a coded chatbot would not have been cheap so I think it is reasonable to expect a configurable Power Virtual Agent will be the same cost or cheaper and therefore a compelling economic alternative to a human call center agent.
Cheaper means many of the concessions made above, at the expense of the customer experience, are no longer necessary.
First of all, we do not need to hide the phone number and encourage other channels. The phone number can be as prominent as the web site text-based chatbot which, in all likelihood, will be running on the same configured engine as our new VoiceBot. The customer could use the chatbot and get the same results but the decision is the customer’s, as it should be.
The IVR swamp can be drained. Microsoft’s chatbot is IVR-aware but I think this will become less relevant when the customer can simply say what can be typed and be perfectly understood.
The waiting and listening to Muzak will also disappear because scaling an army of VoiceBots is a lot more affordable than running a call center populated by humans.
Issues of language and localization should also diminish as VoiceBots become more sophisticated. While in the early days of voice recognition my (mostly) Australian accent proved troublesome, it is not the case today and localization, in many cases, will be a Google/Bing search away. Alexa, as an example of how things have progressed, is now conversant in Australian slang.
No Longer a Need for Humans?
Of course, as anyone with a bot in their home will tell you, VoiceBots are unlikely to be perfect for quite a while and, where the VoiceBot ends, there will still need to be a human waiting. However, setting up a call-back service will be trivial and, given Power Virtual Agents can be retrained on previous encounters to improve intent recognition, I believe the need for humans will significantly diminish.
If we think of a traditional technical support setup, Level 1 support (agents with limited technical knowledge following scripts) will disappear the quickest. Any script a human can follow, a bot can as well. Level 2 requires in-depth knowledge of the product which is a product manual away. While a human struggles to sort through large volumes of text quickly, this is trivial for a bot. So, as I see it, in the short term, Level 1 and some fraction of Level 2 will be the easiest to replace, significantly reducing the call center headcount and potentially bringing many call centers back onshore, populated with a handful of deep technical experts.
The Ultimate Outcome
The biggest win, though, is that the choice of channel is put back in the hands of the customer, rather than being dictated and compromised by economic considerations. If a customer chooses not to engage with the VoiceBot, they can request an escalation to a human straight away, although I think this will become less frequent as people learn that the bots can solve an increasing range of issues. The customer regains the power to control their experience and the company is not compromised in offering it. Both the company and the customer win.