
Are Your Call Recordings Up-to-Snuff for AI-fueled Speech Analytics?


The Team at CallMiner

April 03, 2019


We are well within the “Age of the Customer”.  The first step toward competing successfully in this era is capturing the voice of the customer. Many organizations are sitting on a gold mine of customer intelligence which they have already captured – their contact center call recordings. Unfortunately, call recording wasn’t designed for the purpose of analytics and insight. The design of many recording platforms pre-dated the rise of AI-fueled speech analytics, which automatically transcribes and analyzes your call recordings. Call recording was initially designed for the purpose of storage and archiving, often to meet specific regulatory needs. As a result, storage efficiency was prioritized over recording quality – recordings are often highly compressed, degrading the audio quality. The agent and customer speaker tracks are commonly compressed into a single mono audio track.

Recording quality affects speech recognition accuracy in much the same way it affects our own human comprehension. From a commonsense perspective, would you rather listen to a transistor radio or a high-fidelity audio system? Speech recognizers share that same preference: the better the quality of the audio, the better the speech recognizer can identify what is being said.

As humans, we experience the same challenges understanding someone when there’s noise or distortion on the line, when background noise interferes, when the signal is compressed, or when the person is mumbling and not articulating their words. I was once tasked to work with a client who was having challenges with their speech analytics program. I received a handful of call recordings to listen to. The audio quality was so poor that the agent and customer conversation was for the most part unintelligible, even to a human. It was no surprise they were struggling with their speech program!

There are several ways to avoid these issues, including effectively designed work environments, high-quality noise-canceling headsets, and coaching agents to speak at a reasonable pace and articulate their words. But one of the most effective changes you can make is ensuring you are capturing high-quality audio in your recordings. The ideal format is uncompressed PCM WAV files at 128 kbps or higher. The following chart compares the accuracy loss for various audio file formats and bit rates relative to that ideal file format.

Image courtesy of CallMiner

This other figure shows a comparison of PCM WAV to MP3. The MP3 on the right loses frequencies above 3 kHz, but speech recognizers (phonetic or full transcription) require these frequencies to accurately differentiate the phonemes that make up the sounds in speech.

Image courtesy of CallMiner
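As a rough way to see how existing recordings compare with the uncompressed PCM WAV target described above, the sketch below reads a file's header with Python's standard `wave` module and computes its bit rate. The function name `describe_wav`, the constant `TARGET_KBPS`, and the choice to compare the total bit rate across all channels against 128 kbps are assumptions made for illustration, not part of any CallMiner tooling.

```python
import wave

# Assumption: the 128 kbps target is compared against the total bit rate
# across all channels. 8 kHz, 16-bit, mono PCM works out to exactly 128 kbps.
TARGET_KBPS = 128


def describe_wav(path):
    """Return basic format facts for an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        bits_per_sample = wav.getsampwidth() * 8
        frames = wav.getnframes()

    # PCM bit rate = samples per second * bits per sample * channels
    bitrate_kbps = sample_rate * bits_per_sample * channels / 1000
    return {
        "channels": channels,
        "sample_rate_hz": sample_rate,
        "bits_per_sample": bits_per_sample,
        "bitrate_kbps": bitrate_kbps,
        "duration_s": frames / sample_rate,
        "meets_target": bitrate_kbps >= TARGET_KBPS,
    }
```

Calling `describe_wav("call.wav")` on a recording (the file name here is hypothetical) returns the channel count alongside the bit rate, so the same check also shows whether the file is mono or stereo. The `wave` module is built for uncompressed PCM and will generally raise `wave.Error` on compressed audio, which is itself a hint that the recording is not in the ideal format.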

Combining the two speaker channels, agent and customer, into a single mono signal does not necessarily have a direct impact on recognition accuracy, except during periods of over-talk when both speakers are talking at the same time. However, analyzing mono call recordings does require more effort to extract insights. When only mono recordings are available, analysts who want to measure agent vs. customer behaviors or activities need to build patterns that account for the way an agent might express a behavior differently than a customer.

A very rudimentary example of this could be a customer-driven escalation (“let me speak to your supervisor”) vs. an agent mention (“let me check with my manager”). Telling these apart takes more time, thought and effort. Speech analytics still drives substantial business value when conducted on mono audio. In fact, 46% of CallMiner’s customers have mono call recordings, while 41% have stereo and 13% have a mix of stereo and mono. This is likely a reflection of the general market in call recording. However, stereo recording is becoming more common with the rise of analytics and the growth of cloud contact centers, where the signal is not necessarily going through multiple switches to get to its final recording destination.
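The escalation example above can be sketched as two phrase patterns applied to the transcript of a mono recording, where there is no channel to say who is speaking. This is only a minimal illustration: the pattern names, the phrase lists, and the `classify_escalation` function are hypothetical, and a real program would tune its patterns against its own transcripts.

```python
import re

# Hypothetical patterns for illustration only.
# Phrasing a customer is likely to use when asking to escalate.
CUSTOMER_ESCALATION = re.compile(
    r"\b(?:let me|i (?:want|need|would like) to|can i) "
    r"(?:speak|talk) (?:to|with) (?:your|a|the) (?:supervisor|manager)\b",
    re.IGNORECASE,
)

# Phrasing an agent is likely to use when consulting a manager.
AGENT_MANAGER_MENTION = re.compile(
    r"\blet me (?:check|confirm|verify) with my (?:supervisor|manager)\b",
    re.IGNORECASE,
)


def classify_escalation(utterance):
    """Guess who is behind a supervisor/manager mention from wording alone."""
    if CUSTOMER_ESCALATION.search(utterance):
        return "customer_escalation"
    if AGENT_MANAGER_MENTION.search(utterance):
        return "agent_manager_mention"
    return None


print(classify_escalation("Let me speak to your supervisor"))
print(classify_escalation("OK, let me check with my manager."))
```

With stereo audio the speaker is known from the channel, so a single, simpler pattern per behavior would do. On mono audio the wording itself has to carry the distinction, which is where the extra time and effort goes.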

A final potential challenge with call recording systems of yesteryear is that some vendors don’t feel you have a right to access your highly valuable voice of customer asset. The very conversations you have with your customers, which you have paid to capture through your investment in call recording, are held hostage for a hefty ransom by some vendors. Fees are charged to allow you to extract that audio for analytics or other purposes. In this age of the customer and the era of “big data”, be sure to check your vendor contracts closely to ensure you own and have access to the data that you are capturing. Don’t be a data hostage!

So what should you do if you are already invested in a call recording system that you cannot replace immediately? Not to worry, you do have options. Check to see if your recording solution has settings that allow for capturing higher quality recordings and/or the ability to retain the agent and customer speaker tracks in separate channels. In some cases this interim format can be made available for your extraction even if a lower quality file is ultimately stored for archiving purposes. Explore using software to separate your mono recordings into separate speaker tracks using voice biometrics. Software-based speaker separation, or “diarization”, can be a great second-best option if you can’t get native stereo audio. This software listens to the signal and identifies who the agent is and who the caller is by discerning voice tones. While it’s not 100% accurate, our customers have found that there are definite downstream gains in analytical productivity that justify the investment and any misses on accuracy.

High-quality stereo conversations can be captured independently of, and in parallel to, your existing recording systems if none of the above options are feasible, or especially if you are stuck with a vendor who is holding your voice of customer data hostage. This is often referred to as Call Capture or Recording for Analytics. Such a solution captures the audio in a manner that supports accurate speech recognition and becomes the foundation for a highly successful speech analytics program with little to no impact on your existing infrastructure.

If you are an existing CallMiner Eureka user, exploring speech analytics, or concerned about access to high-quality recordings of your customer conversations, contact CallMiner today to discuss how Eureka Capture can help you:

  • Gain control of your call recording assets
  • Capture superior quality audio from your customer interactions
  • Identify customer and agent interaction insight from every call to your contact center
