
Case study: A Versatile Text-Data Exploration Tool

  • Writer: Pivigo
  • Dec 13, 2024
  • 2 min read


Purpose

S2DS project partner Spot Intelligence, a consulting business, focuses on bringing insights to its clients through Natural Language Processing (NLP). For this project, Spot Intel wanted a lightweight, versatile web application to support its initial exploration of diverse client data. The app needed to provide a broad range of insights into a dataset while remaining easy to use. Accuracy was not the top priority; instead, the tool had to offer a versatile, user-friendly interface for exploring data from various perspectives. Low computational cost mattered more than delivering the fastest or most accurate analysis possible.


Approach

There were three main technical challenges in this project: multi-document summarisation, visualisation of themes, and front-end app design. The dataset provided to the team was a large corpus of CNN news articles, although the tool will ultimately be used on datasets as varied as chatbot transcripts and medical records. NLP is a computationally expensive field, yet the team successfully worked within the computational constraints set out in the brief by creatively combining cloud CPUs (GitHub Codespaces) with cloud GPUs (Google Colab).


The team selected BERTopic for its strong performance in topic modelling, and sentence transformers for semantic search. They then used techniques such as HDBSCAN and Mini-Batch K-means for clustering and for visualisation in the web application. For computational efficiency, the textual analysis was run in advance, before the results were loaded into the web app for interactive visualisation. The app gave Spot Intel a user-friendly interface featuring interactive visualisations, customisable summarisation, and efficient data processing, enabling end users to gain valuable insights from the data.
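The case study does not include the team's code. As a rough illustration of the clustering step, the dependency-free sketch below implements the mini-batch k-means update rule: each iteration samples a small batch, assigns every batch point to its nearest centroid, and moves that centroid towards the point with a per-centroid learning rate of 1/count. The toy 2-D vectors stand in for sentence-transformer embeddings, and the function names and the farthest-point seeding are illustrative assumptions, not Spot Intel's implementation. In practice a library class such as scikit-learn's `MiniBatchKMeans` would more likely be used.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: squared_distance(point, centroids[j]))


def farthest_point_init(points, k):
    """Hypothetical deterministic seeding: start from the first point, then
    repeatedly add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def mini_batch_kmeans(points, k, batch_size=4, n_iter=50, seed=0):
    """Hypothetical helper: mini-batch k-means with per-centroid learning rates."""
    rng = random.Random(seed)
    centroids = farthest_point_init(points, k)
    counts = [0] * k  # number of points each centroid has absorbed so far
    for _ in range(n_iter):
        batch = rng.sample(points, min(batch_size, len(points)))
        # Assign the whole batch first, using the centroids as they stood
        # at the start of this iteration.
        labels = [nearest_centroid(p, centroids) for p in batch]
        for p, j in zip(batch, labels):
            counts[j] += 1
            eta = 1.0 / counts[j]  # learning rate shrinks as the centroid stabilises
            centroids[j] = [(1 - eta) * c + eta * x for c, x in zip(centroids[j], p)]
    return centroids


# Toy 2-D "embeddings" standing in for sentence-transformer vectors (assumed data).
docs = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (0.0, 0.0),
        (5.0, 5.1), (5.2, 5.0), (4.9, 5.0), (5.1, 4.9)]
centroids = mini_batch_kmeans(docs, k=2)
labels = [nearest_centroid(d, centroids) for d in docs]
print(labels)  # → [0, 0, 0, 0, 1, 1, 1, 1]
```

Because each iteration touches only a small batch, the per-iteration cost stays low however large the corpus grows, which is consistent with the brief's preference for low computational cost over maximum accuracy.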


Outcome

The team developed a Dash dashboard (web application) that provides text visualisation and summarisation to give an overview of a new dataset. The app has significantly improved the efficiency of data exploration for Spot Intelligence. By automating topic modelling, clustering, and summarisation, the tool has cut the time needed to get a quick overview of a new dataset from several hours or days of work to just minutes. The interactive visualisations and customisable summarisation features give Spot Intel a deeper understanding of its client data, leading to better decision-making and supporting data discussions with descriptive insights.


“The tool changes how I analyse data. Previously, I would dive into a dataset and do a thorough deep dive to understand bottlenecks and determine hidden risks, but with this tool, a clear oversight is provided straight away. This means I can get to the essence faster and deliver value to clients almost immediately. I can also show them what their data looks like and what this means for their analysis.”

Neri Van Otten, founder of Spot Intelligence
