
3 Potential Pitfalls of DIY Speech Analytics


Richard Britt

May 15, 2019


More and more organizations are looking to build in-house data science or AI teams that use emerging technology and techniques to harness the power of their data. With the growth of these internal data science teams, many companies are looking to gain greater control of all aspects of their data programs to be more nimble and effective. If done correctly, this also provides more opportunities for creativity and experimentation with internal and external data. We scientists can bring a new level of insight to organizations. Turns out having scientists around is kind of cool. But please, don’t let the dazzle of our NASA space camp t-shirts, utter domination in quoting dystopian novels, and late-night sci-fi board game parties fool you. Organizations should be cautious when contemplating taking on data science projects that are not core to their business.

It is important to understand who we are. Over the past few years we have seen an increased desire among organizations to deploy in-house data scientists on projects outside of their core competence. There is an allure here. Building new metadata and customizable intelligence (name, area, product) in-house on a company’s highly valuable data can provide new intellectual property and possibly a competitive advantage. With all this upside, what’s stopping you?

Before tackling any DIY speech analytics data project, organizations should endeavor to know the full scope of the project. When it comes to complicated data projects like a speech analytics program, many organizations don’t fully realize the complexity until they are highly invested, and often are left footing a large bill with a suboptimal outcome.

Let’s look at why it’s attractive on the surface to undertake DIY speech analytics and the pitfalls companies we work with have encountered when they have tried.

Natural Language Processing (NLP) is cool

I am fortunate to work with a team of scientists in the cutting-edge field of AI, and very fortunate to do so for a leading company. I can tell you Conversational AI and NLP are some of the coolest and most avant-garde AI research fields out there. I am talking self-driving car AI cool. If we forget resources and cost for a moment, the advancements in NLP and the commoditization of speech recognition transcribers, married with the power of current deep learning technologies, make for a low cost of entry to a cutting-edge research field. As a data scientist, it is cost-efficient to try, easy to get prioritized internally, seemingly logical to a business, and cool AF. That’s a lot of wins.

So, what is the downside?

Recently, a client’s data scientist ran a recorded call with their significant other through a free online transcriber. The recognition accuracy was in the low 90s percent, so the transcript was very clean. The client said their data scientist was sure he could build a better speech analytics system, at a lower cost, than what we have been perfecting over the last 15 years.

We have deeply analyzed what it takes to arrive at parity with our impressive technology. Let’s walk through a few of the foundational hurdles to building something that basically works, which is still not close to where our software is today.

Problem 1: Transcription Speed and Accuracy 

Don’t be fooled by one-to-one transcription rates on high-quality audio. Speech recognition software has come a long way in just the past few years (see Moore’s Law) and there are lots of options, even free ones, that produce acceptable transcription, but not at scale. Many technologies offer great results for one-for-one transcription: one call transcribed at a time per CPU, in real time, so a 5-minute call takes 5 minutes to process, and only upon completion does the next one start. There is a trade-off between speed and accuracy, and speech recognition software must manage that trade-off at scale, which means high processing speeds. CallMiner has algorithms that contextualize with speed and endeavor to pick the next logical word, quickly. Your data scientist will need to deal with that.
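To make the scale problem concrete, here is a rough back-of-the-envelope sketch. The call volume, average call length, and real-time factors are illustrative assumptions, not figures from any client or from CallMiner's software, and the function name `cores_needed` is hypothetical.

```python
import math

def cores_needed(calls_per_day, avg_call_minutes, real_time_factor, hours_available=24.0):
    """Estimate CPU cores needed to transcribe one day's calls in hours_available.

    real_time_factor is processing time divided by audio time:
    1.0 means a 5-minute call takes 5 minutes of CPU time.
    """
    audio_hours = calls_per_day * avg_call_minutes / 60.0
    cpu_hours = audio_hours * real_time_factor
    return math.ceil(cpu_hours / hours_available)

# Assumed contact center: 50,000 calls a day, 5 minutes each.
print(cores_needed(50_000, 5, real_time_factor=1.0))   # one-for-one transcription
print(cores_needed(50_000, 5, real_time_factor=0.25))  # a faster engine, likely at some accuracy cost
```

Under these assumptions, one-for-one transcription needs about 174 cores running around the clock just to keep up, while a transcriber running four times faster needs about 44. The free tool that handled one call nicely says nothing about this part of the bill.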

Problem 2: Finding something relevant in the transcript

Once you find a good solution for transcription, the next step is to start finding the pieces of information in the transcript that can have a bearing on the business. You can build an algorithm to search for specific words, but this practice of “word spotting” does little more than show you singular instances of things. Data anecdotes. Our ambitious data scientist will very quickly learn two daunting truths that are like natural laws of speech analytics.

Algorithms need to be built to not only spot a word or phrase, but also identify any of its aliases (ex: loan/lone/alone) and where it falls in the conversation (ex: before “payment”). Even at an exceptionally high 95% accuracy, a corpus of just a million words contains fifty thousand incorrect words that need to be dealt with.
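A minimal sketch of what alias-aware, position-aware spotting might look like, assuming a hand-built alias table and plain-text transcripts. The `ALIASES` table, the five-word window, and the function names are hypothetical choices for illustration, not a description of how any production system works.

```python
import re

# Hypothetical alias table: words a recognizer might emit in place of the target.
ALIASES = {"loan": {"loan", "lone", "alone"}}

def tokenize(transcript):
    """Lowercase the transcript and split it into word tokens."""
    return re.findall(r"[a-z']+", transcript.lower())

def spot_before(transcript, target, anchor, window=5):
    """Return token positions where target (or an alias) appears
    within `window` words before the anchor word."""
    tokens = tokenize(transcript)
    variants = ALIASES.get(target, {target})
    hits = []
    for i, tok in enumerate(tokens):
        if tok in variants and anchor in tokens[i + 1 : i + 1 + window]:
            hits.append(i)
    return hits

def expected_errors(word_count, accuracy):
    """Number of misrecognized words expected at a given accuracy."""
    return round(word_count * (1 - accuracy))

print(spot_before("I need to make my lone payment today", "loan", "payment"))  # [5]
print(spot_before("She lives alone now", "loan", "payment"))                   # []
print(expected_errors(1_000_000, 0.95))                                        # 50000
```

The position check is what separates a misrecognized "loan" near "payment" from someone who simply lives alone. Even this toy version needs an alias list someone has to build and maintain for every term the business cares about.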

The ability to combine phrases to create scoring should also be contemplated. As a basis for prediction, how relevant is the thing you found? A phrase by itself may not tell you much, but relevancy scores and counts of the same topic may be an indication of a significant change in customer behavior on that type of call.
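As a rough illustration of phrase-based scoring, the sketch below weights a handful of phrases for a single topic and sums them per call. The topic, the phrases, and the weights are all invented for this example; real scoring would need to be tuned against actual call outcomes.

```python
# Hypothetical phrase weights for a "cancellation risk" topic.
PHRASE_WEIGHTS = {
    "cancel my account": 3.0,
    "speak to a supervisor": 2.0,
    "not happy": 1.0,
}

def score_call(transcript, weights=PHRASE_WEIGHTS):
    """Return (score, counts): the weighted score for the call and
    how many times each phrase that occurred was found."""
    text = transcript.lower()
    counts = {phrase: text.count(phrase) for phrase in weights}
    score = sum(weights[phrase] * n for phrase, n in counts.items())
    found = {phrase: n for phrase, n in counts.items() if n > 0}
    return score, found

score, found = score_call("I'm not happy. I want to cancel my account. Really not happy.")
print(score)  # 5.0
print(found)  # {'cancel my account': 1, 'not happy': 2}
```

Tracked across thousands of calls, a shift in the average score for a call type is the kind of signal a single spotted phrase cannot give you.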

Problem 3: Anomalies, quirks and tiny black holes

So much of a conversation is not what you hear or what makes sense, but what you don’t hear or what doesn’t make sense. Without a deep set of experiences (relevant data), finding the anomalies or missing things is nearly impossible. Let me share some examples that a data scientist who is new to this world needs to figure out. These are all real and, sadly, all very common among clients. Did you know that the candy “Tootsie Rolls”, among others, has a hotline, and that the hotline has no required prompts, so it is effectively an endless loop? This is important if an agent who is not really working hard wants to take an unscheduled break. Just dial that number and sit there looking busy. That is something an organization may want to find in a speech analytics system. The same goes for agents listening to a phone ring for 5 minutes, listening to 10 minutes of an answering machine, or dialing an internal extension to listen to hold music for half an hour.

Even more diabolical is silence. Is silence good or bad? That depends. That tiny black hole in an audio recording is speech analytics gold: highly important, and not trivial data science either.
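Finding the silence is the easy part. Here is a minimal sketch, assuming the transcriber emits a start and end time in seconds for every word (many do, but the tuple format here is an assumption). The 10-second threshold and the name `find_silences` are arbitrary choices for illustration.

```python
def find_silences(words, call_end, min_gap=10.0):
    """Find stretches with no recognized speech.

    words: list of (word, start_sec, end_sec), sorted by start time.
    call_end: total call length in seconds.
    Returns a list of (gap_start, gap_end) tuples at least min_gap long.
    """
    gaps = []
    cursor = 0.0  # end time of the last speech seen so far
    for _, start, end in words:
        if start - cursor >= min_gap:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if call_end - cursor >= min_gap:
        gaps.append((cursor, call_end))
    return gaps

words = [("hello", 0.5, 0.9), ("hold", 1.0, 1.3), ("thanks", 45.0, 45.4)]
print(find_silences(words, call_end=60.0))  # [(1.3, 45.0), (45.4, 60.0)]
```

What this cannot tell you is whether a 44-second gap was an agent looking something up, a customer on hold, or hold music the recognizer ignored. Deciding whether a given silence is good or bad is where the experience and the relevant data come in.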

Go/No Go time

As we all know, anything “data and analytics” related will take dedicated time and effort from some very specialized resources, who are typically in high demand. Certainly, a big data project like speech analytics should not be expected to be completed as a side project in any reasonable amount of time. Companies need to be willing to commit specialized FTE hours on an ongoing basis to ensure program success, and should weigh the opportunity cost of such a venture.

Hear more on AI and models in our webinar on how Text and Speech Analytics Are Not Created Equal.
