# Introduction
Every day, customer service centers record thousands of conversations. There are gold mines of information hidden in those audio files. Are customers satisfied? What problems do they mention most often? How do emotions change during a call?
Analyzing these recordings manually is challenging. However, with modern artificial intelligence (AI), we can automatically transcribe calls, detect sentiment and extract recurring themes – all with offline and open-source tools.
In this article, I will walk you through a complete customer sentiment analyzer project. You will learn how to:
- Transcribe audio files to text using Whisper
- Detect sentiment (positive, negative, neutral) and emotions (frustration, satisfaction, urgency)
- Extract topics automatically using BERTopic
- Display results in an interactive dashboard
The best part is that everything runs locally. Your sensitive customer data never leaves your machine.

Figure 1: Dashboard overview showing sentiment gauge, sentiment radar, and topic distribution
# Understanding why local AI matters for customer data
Cloud-based AI services such as OpenAI’s API are powerful, but they come with drawbacks: privacy risks, since customer calls often contain personal information; high cost, since per-API-call pricing adds up quickly at volume; and dependence on an internet connection and rate limits. Running locally also makes it easier to meet data residency requirements.
This local AI speech-to-text tutorial keeps everything on your hardware. Models download once and then run offline.

Figure 2: System architecture overview showing how each component handles one task. This modular design makes the system easy to understand, test, and expand
// Prerequisites
Before you begin, make sure you have the following:
- Python 3.9+ installed on your machine
- FFmpeg installed for audio processing
- Basic familiarity with Python and machine learning concepts
- About 2GB of disk space for the AI models
// Setting up your project
Clone the repository and set up your environment:
```bash
git clone https://github.com/zenUnicorn/Customer-Sentiment-analyzer.git
```
Create a virtual environment:
```bash
python -m venv venv
```
Activate it (Windows):
```bash
venv\Scripts\activate
```
Activate it (Mac/Linux):
```bash
source venv/bin/activate
```
Install dependencies:
```bash
pip install -r requirements.txt
```
The first run downloads the AI models (~1.5GB total). After that, everything works offline.

Figure 3: Terminal showing successful installation
# Transcribing Audio with Whisper
In the customer sentiment analyzer, the first step is to convert the spoken words in a call recording to text. This is done with Whisper, an automatic speech recognition (ASR) system developed by OpenAI. Let’s look at how it works, why it’s a great option, and how we use it in this project.
Whisper is a transformer-based encoder-decoder model trained on 680,000 hours of multilingual audio. When you feed it an audio file, it:
- Resamples the audio to 16kHz mono
- Produces a Mel spectrogram – a visual representation of frequencies over time – which serves as a picture of the sound
- Divides the spectrogram into 30-second windows
- Passes each window through an encoder that creates a hidden representation
- Decodes these representations into text tokens one word (or sub-word) at a time
Think of the Mel spectrogram as how machines “see” sound. The x-axis represents time, the y-axis represents frequency, and the intensity of color represents volume. The result is a highly accurate transcript, even with background noise or accents.
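The preprocessing steps above can be sketched in plain NumPy. This is a simplified illustration, not Whisper's internal code: it converts a signal to mono and cuts it into fixed 30-second windows at 16 kHz, zero-padding the final window the way Whisper pads short audio.

```python
import numpy as np

SAMPLE_RATE = 16_000           # Whisper expects 16 kHz mono input
CHUNK_SECONDS = 30             # fixed window length used by the model
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS

def to_mono(audio: np.ndarray) -> np.ndarray:
    """Average stereo channels down to a single mono channel."""
    return audio.mean(axis=1) if audio.ndim == 2 else audio

def split_into_windows(audio: np.ndarray) -> list:
    """Cut the signal into 30-second windows, zero-padding the last one."""
    windows = []
    for start in range(0, len(audio), CHUNK_SAMPLES):
        chunk = audio[start:start + CHUNK_SAMPLES]
        if len(chunk) < CHUNK_SAMPLES:
            chunk = np.pad(chunk, (0, CHUNK_SAMPLES - len(chunk)))
        windows.append(chunk)
    return windows

# 45 seconds of silent fake audio -> two 30-second windows
signal = np.zeros(45 * SAMPLE_RATE)
windows = split_into_windows(to_mono(signal))
print(len(windows), len(windows[0]))  # 2 480000
```

In the real pipeline, each window is then converted to a Mel spectrogram before it reaches the encoder.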
// Code implementation
Here is the basic transcription logic:
```python
import whisper

class AudioTranscriber:
    def __init__(self, model_size="base"):
        self.model = whisper.load_model(model_size)

    def transcribe_audio(self, audio_path):
        result = self.model.transcribe(
            str(audio_path),
            word_timestamps=True,
            condition_on_previous_text=True
        )
        return {
            "text": result["text"],
            "segments": result["segments"],
            "language": result["language"]
        }
```
The `model_size` parameter controls the trade-off between accuracy and speed.
| Model | Parameters | Speed | Best for |
|---|---|---|---|
| tiny | 39M | Fastest | Quick tests |
| base | 74M | Fast | Development |
| small | 244M | Medium | Production |
| large | 1550M | Slow | Maximum accuracy |
For most use cases, `base` or `small` provides the best balance.

Figure 4: Transcription output showing timestamped segments
# Sentiment Analysis with Transformers
With the text extracted, we perform sentiment analysis using Hugging Face Transformers. We use the CardiffNLP RoBERTa model, trained on social media text, which suits conversational customer calls well.
// Sentiment versus emotion
Sentiment analysis classifies text as positive, neutral or negative. We use a well-established RoBERTa model because it understands context better than simple keyword matching.
The transcript is tokenized and passed through a transformer. The final layer uses a softmax activation, which outputs probabilities that sum to 1. For example, if positive is 0.85, neutral is 0.10, and negative is 0.05, then the overall sentiment is positive.
- Sentiment: overall polarity (positive, negative, or neutral), answering the question: “Is this good or bad?”
- Emotion: specific feelings (anger, happiness, fear), answering the question: “What are they actually feeling?”
We analyze both for a complete picture.
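The softmax step described above can be demonstrated in a few lines of plain Python. The logit values here are made up for illustration; a real model produces them from the tokenized transcript.

```python
import math

def softmax(logits):
    """Convert raw model scores into probabilities that sum to 1."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for (negative, neutral, positive)
logits = [0.5, 1.2, 3.3]
probs = softmax(logits)

labels = ["negative", "neutral", "positive"]
sentiment = labels[probs.index(max(probs))]
print(sentiment, round(sum(probs), 6))  # positive 1.0
```

Because the probabilities always sum to 1, the classes compete with each other: raising one score necessarily lowers the others.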
// Code implementation for sentiment analysis
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch.nn.functional as F

class SentimentAnalyzer:
    def __init__(self):
        model_name = "cardiffnlp/twitter-roberta-base-sentiment-latest"
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_name)

    def analyze(self, text):
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True)
        outputs = self.model(**inputs)
        probabilities = F.softmax(outputs.logits, dim=1)
        labels = ["negative", "neutral", "positive"]
        scores = {label: float(prob) for label, prob in zip(labels, probabilities[0])}
        return {
            "label": max(scores, key=scores.get),
            "scores": scores,
            "compound": scores["positive"] - scores["negative"]
        }
```
The `compound` score ranges from -1 (very negative) to +1 (very positive), making it easy to track sentiment trends over time.
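As a small illustration of tracking that trend, here is a sketch that computes the compound score per call segment and smooths it with a rolling mean. The segment scores are invented for the example; in practice they would come from the analyzer above.

```python
def compound(scores: dict) -> float:
    """Compound sentiment: positive minus negative probability, in [-1, 1]."""
    return scores["positive"] - scores["negative"]

def rolling_mean(values, window=3):
    """Smooth a sequence of compound scores to reveal the overall trend."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

# Hypothetical per-segment scores over the course of one call:
# the caller starts frustrated and ends satisfied
segments = [
    {"positive": 0.10, "neutral": 0.20, "negative": 0.70},
    {"positive": 0.30, "neutral": 0.40, "negative": 0.30},
    {"positive": 0.70, "neutral": 0.20, "negative": 0.10},
]
trend = rolling_mean([compound(s) for s in segments])
print([round(t, 2) for t in trend])  # [-0.6, -0.3, 0.0]
```

A rising trend like this suggests the agent resolved the issue, even if the call began negatively.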
// Why avoid simple dictionary methods?
Traditional lexicon-based approaches like VADER count positive and negative words. However, they often miss context:
- “It’s not good.” – a lexicon sees “good” and scores it positive.
- A transformer interprets the negation (“not”) and scores it negative.
Transformers understand relationships between words, making them far more accurate on real-world text.
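The failure mode described above is easy to reproduce with a toy word-counting classifier (this is a deliberately naive sketch, not how VADER itself works: VADER does include some negation handling, but the underlying weakness of bag-of-words scoring is the same):

```python
POSITIVE = {"good", "great", "helpful"}
NEGATIVE = {"bad", "terrible", "slow"}

def lexicon_sentiment(text: str) -> str:
    """Naive word-counting sentiment: ignores word order and negation."""
    words = text.lower().replace(".", "").split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

# "not" flips the meaning, but the word counter never sees it
print(lexicon_sentiment("It is not good."))  # positive  (wrong!)
```

A transformer, by contrast, attends to “not” and “good” together, so the negated phrase is scored as negative.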
# Extracting topics with BERTopic
Knowing the sentiment is useful, but what are customers actually talking about? BERTopic discovers topics in the text automatically, without you having to pre-define them.
// How does BERTopic work?
- Embedding: convert each transcript into a vector using a sentence transformer
- Dimensionality reduction: UMAP compresses these vectors into a low-dimensional space
- Clustering: HDBSCAN groups similar transcripts together
- Topic representation: for each cluster, extract the most relevant words using c-TF-IDF
The result is a set of topics such as “billing issues,” “technical support,” or “product feedback.” Unlike older methods such as Latent Dirichlet Allocation (LDA), BERTopic captures semantic meaning: “shipping delays” and “late delivery” cluster together because they mean similar things.
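The c-TF-IDF step can be sketched with the standard library. This is a simplified toy version of the idea (BERTopic's real implementation differs in detail): each cluster's transcripts are joined into one "class document", then words are scored by their in-cluster frequency weighted against how common they are overall. The clusters and transcripts here are invented for illustration.

```python
import math
from collections import Counter

def c_tf_idf(clusters: dict) -> dict:
    """Toy class-based TF-IDF: score words per cluster, return top keywords."""
    # One "class document" per cluster: all of its transcripts joined together
    class_words = {c: " ".join(docs).lower().split() for c, docs in clusters.items()}
    total_freq = Counter(w for words in class_words.values() for w in words)
    avg_words = sum(len(w) for w in class_words.values()) / len(class_words)

    keywords = {}
    for c, words in class_words.items():
        tf = Counter(words)
        scores = {w: (n / len(words)) * math.log(1 + avg_words / total_freq[w])
                  for w, n in tf.items()}
        keywords[c] = sorted(scores, key=scores.get, reverse=True)[:3]
    return keywords

clusters = {
    "billing": ["charged twice on my invoice", "refund the extra invoice charge"],
    "shipping": ["package arrived late", "late delivery again this week"],
}
keywords = c_tf_idf(clusters)
print(keywords["billing"][0], keywords["shipping"][0])  # invoice late
```

Words that dominate one cluster but are rare elsewhere ("invoice", "late") rise to the top, which is exactly what makes the topic labels readable.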
// Code implementation
From `topics.py`:
```python
from bertopic import BERTopic

class TopicExtractor:
    def __init__(self):
        self.model = BERTopic(
            embedding_model="all-MiniLM-L6-v2",
            min_topic_size=2,
            verbose=True
        )

    def extract_topics(self, documents):
        topics, probabilities = self.model.fit_transform(documents)
        topic_info = self.model.get_topic_info()
        topic_keywords = {
            topic_id: self.model.get_topic(topic_id)[:5]
            for topic_id in set(topics) if topic_id != -1
        }
        return {
            "assignments": topics,
            "keywords": topic_keywords,
            "distribution": topic_info
        }
```
Note: topic extraction needs a reasonable number of documents (at least 5-10) to find meaningful patterns. Single calls are analyzed with the already-fitted model.

Figure 5: Topic distribution bar chart showing billing, shipping, and technical support categories
# Building an Interactive Dashboard with Streamlit
Raw data is difficult to digest, so we built a Streamlit dashboard (app.py) that lets business users explore the results. Streamlit turns Python scripts into web applications with minimal code. Our dashboard provides:
- Upload interface for audio files
- Real-time processing with progress indicators
- Interactive visualizations using Plotly
- Drill-down ability to trace individual calls
// Code implementation for dashboard structure
```python
import streamlit as st

def main():
    st.title("Customer Sentiment Analyzer")
    uploaded_files = st.file_uploader(
        "Upload Audio Files",
        type=["mp3", "wav"],
        accept_multiple_files=True
    )
    if uploaded_files and st.button("Analyze"):
        with st.spinner("Processing..."):
            results = pipeline.process_batch(uploaded_files)
        # Display results
        col1, col2 = st.columns(2)
        with col1:
            st.plotly_chart(create_sentiment_gauge(results))
        with col2:
            st.plotly_chart(create_emotion_radar(results))
```
Streamlit’s `@st.cache_resource` caching ensures that models load once and persist across all interactions, which is critical for a responsive user experience.

Figure 7: Full dashboard with sidebar options and multiple visualization tabs
// key features
- Upload audio (or use a sample transcript for testing)
- View transcript highlighting emotions
- Emotion timeline (if the call is long enough)
- Topic visualization using Plotly interactive charts
// caching for performance
Streamlit re-runs the script on each interaction. To avoid reloading heavy models, we use `@st.cache_resource`:
```python
@st.cache_resource
def load_models():
    return CallProcessor()

processor = load_models()
```
// Real-time processing
When a user uploads a file, we show a spinner during processing, then immediately display the results:
```python
if uploaded_file:
    with st.spinner("Transcribing and analyzing..."):
        result = processor.process_file(uploaded_file)
    st.success("Done!")
    st.write(result["text"])
    st.metric("Sentiment", result["sentiment"]["label"])
```
# Reviewing practical lessons
// Audio processing: from waveform to text
The magic of Whisper lies in its Mel spectrogram conversion. Human hearing is logarithmic: we are better at distinguishing differences between lower frequencies than between higher ones. The Mel scale mimics this, so the model “hears” more like a human. The spectrogram is essentially a 2D image (time versus frequency), which the transformer encoder processes much as it would image patches. This is why Whisper handles noisy audio so well: it looks at the whole picture.
// Transformer Output: Softmax vs Sigmoid
- Softmax (sentiment): forces the probabilities to sum to 1. This is ideal for mutually exclusive classes, since a sentence is usually not both positive and negative.
- Sigmoid (emotions): treats each class independently. A sentence can be both joyful and surprised at the same time; sigmoid allows this overlap.
It is important to choose the right activation for your problem domain.
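The contrast is easy to see numerically. In this sketch, two emotion classes receive equally strong logits (the values are made up for illustration): softmax forces them to split the probability mass, while sigmoid lets both be high at once.

```python
import math

def softmax(logits):
    """Mutually exclusive classes: probabilities compete and sum to 1."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logits):
    """Independent classes: each probability is judged on its own."""
    return [1 / (1 + math.exp(-x)) for x in logits]

# Hypothetical logits for (joy, surprise, anger)
logits = [2.0, 2.0, -3.0]
print([round(p, 2) for p in softmax(logits)])  # joy and surprise split the mass
print([round(p, 2) for p in sigmoid(logits)])  # joy AND surprise both near 0.9
```

With softmax, joy and surprise each end up near 0.5; with sigmoid, both exceed 0.85, correctly expressing that the speaker is joyful and surprised simultaneously.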
// Communicating Insights with Visualization
A good dashboard does more than show numbers; it tells a story. Plotly charts are interactive: users can hover to see details, zoom into a time range, and click legend entries to toggle data series. This turns raw analysis into actionable insight.
// Running the application
To run the app, follow the setup steps from the beginning of this article. You can first test sentiment and emotion analysis without any audio files; this runs sample text through the Natural Language Processing (NLP) models and prints the results in the terminal.
Analyze a single recording:
```bash
python main.py --audio path/to/call.mp3
```
Batch process a directory:
```bash
python main.py --batch data/audio/
```
For the full interactive experience:
```bash
python main.py --dashboard
```
Open http://localhost:8501 in your browser.

Figure 8: Terminal output showing successful analysis with sentiment score
# Conclusion
We’ve built a complete, offline-capable system that transcribes customer calls, analyzes sentiment and emotion, and extracts recurring themes – all with open-source tools. This is a production-ready foundation for:
- Customer support teams identifying pain points
- Product managers collecting feedback at scale
- Quality assurance teams monitoring agent performance
The best part? Everything runs locally, respecting user privacy and eliminating API costs.
The full code is available on GitHub: an-ai-that-analyzes-customer-sentiment. Clone the repository, follow this local AI speech-to-text tutorial, and start extracting insights from your customer calls today.
Shittu Olumide is a software engineer and technical writer passionate about leveraging cutting-edge technologies to craft compelling narratives, with a keen eye for detail and a knack for simplifying complex concepts. You can find Shittu on Twitter.