Improving operational efficiency with LLMs

Leveraging large language models to obtain valuable insights from unstructured text data faster, cheaper, and more reliably than a full-time analyst.

A professional services company wanted to extract key industry trends from quarterly company reports.

In this project, we demonstrated how large language models (LLMs) could extract valuable insights from vast amounts of unstructured data. For some tasks, our pipeline could even outperform the manual work of a financial analyst.

Context & Objectives

The objective of the project was to obtain clear and complete insights into sectoral trends and companies’ strategies.

At the end of the project, our LLM pipeline needed to prove the feasibility of extracting the required information faster than a financial analyst while maintaining data quality. To be convincing, the pipeline had to handle a workload that a financial analyst could not realistically keep up with.

A major part of our objective was to be faster and more reliable than a human.

Around 500 company reports from quarterly shareholder meetings provided large volumes of rich unstructured text data for analysis by large language models.

However, our objective was to extract specific trends related to a single department. This meant that an entire report might contain only a few sentences referencing the desired topic. For this reason, we needed to be cautious about the implications of this sparse data for the results, especially its impact on accuracy.

Additionally, since a major part of our objective was to be faster and more reliable than a human, we needed to compare our results to those of an analyst manually processing the same data. We identified three primary challenges that we needed to overcome to meet the objective of the project:

  • Exhaustivity: An analyst reading a report would not miss any essential information. So, to meet the objective, we had to ensure, with a high degree of certainty, that our model did not miss any key information either.

  • Reliability: An analyst reading a report wouldn't invent or change any information. So, we needed to ensure our model did not hallucinate, and where results were imperfect, we needed to attach reasonable confidence levels to them.

  • Structure: Our solution had to enable pattern and trend detection in a structured format. We achieved this by converting the text into structured data in Excel, enabling further quantitative and qualitative analyses (like dashboards); the target format is sketched below.
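
To make the Structure point concrete, here is a minimal sketch of what the target format could look like, using pandas to write one row per extracted insight to Excel. The column names and example rows are hypothetical, not the client's actual schema.

```python
import pandas as pd

# Hypothetical rows: one structured record per insight pulled from a report.
insights = [
    {"company": "ACME Corp", "quarter": "Q3 2023", "topic": "supply chain",
     "key_sentence": "Logistics costs rose 12% quarter over quarter."},
    {"company": "Globex", "quarter": "Q3 2023", "topic": "hiring",
     "key_sentence": "The division added 40 engineers this quarter."},
]

# One row per insight; fixed columns make dashboards and trend queries easy.
pd.DataFrame(insights).to_excel("extracted_insights.xlsx", index=False)
```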

Approach

The approach could be split into two main steps:

  • Information filtering and summarization from the raw text reports: The output of this first step was a list of key sentences for every report, containing and summarizing all the insights that had to be structured (a sketch of this step follows the list).

  • Key sentence structuring: The key sentences then had to be structured to fit an Excel format. The main challenge here was the diversity of formats in which the information appeared.
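
As a rough illustration of the first step, the sketch below filters a report down to topic-relevant key sentences. It assumes the OpenAI chat completions client; the prompt wording, the NONE sentinel, and the model choice are our illustrative assumptions, not the prompts used in the project.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

FILTER_PROMPT = (
    "You are extracting insights about {topic}. Return only the sentences "
    "from the report below that mention this topic, one per line. "
    "If none do, return exactly 'NONE'.\n\nReport:\n{report}"
)

def extract_key_sentences(report_text: str, topic: str) -> list[str]:
    """Step 1: filter a raw report down to topic-relevant key sentences."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,  # keep extraction as deterministic as possible
        messages=[{
            "role": "user",
            "content": FILTER_PROMPT.format(topic=topic, report=report_text),
        }],
    )
    answer = response.choices[0].message.content.strip()
    return [] if answer == "NONE" else [s for s in answer.splitlines() if s.strip()]
```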

Pooling system with multiple tailored models

To further enhance the accuracy of the content extraction, we used a technique called pooling. Instead of relying on a single model, we aggregated the results from multiple models. This technique improved the accuracy of the content extraction by 50%!
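
As one plausible reading of that pooling step, the sketch below keeps a sentence only when a majority of the pooled models extracted it. Here `extract_key_sentences_with` is a hypothetical per-model variant of the extraction function above, and majority voting is just one possible aggregation rule.

```python
from collections import Counter

def pooled_extraction(report: str, topic: str, models: list[str]) -> set[str]:
    """Keep only the sentences that a majority of the pooled models extracted.

    Agreement across models filters out single-model noise (a sentence only
    one model wrongly flags) without depending on any one model's recall.
    """
    # extract_key_sentences_with: hypothetical helper running the extraction
    # prompt above against a specific model.
    runs = [set(extract_key_sentences_with(m, report, topic)) for m in models]
    counts = Counter(sentence for run in runs for sentence in run)
    majority = len(models) // 2 + 1
    return {sentence for sentence, votes in counts.items() if votes >= majority}
```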

[Figure: Pooling system]

LLM-based voting system

To address the reliability challenge, we introduced a voting system. We ran repeated queries across different models (GPT-3.5 and GPT-4) and assigned a voting power to each model, then selected the output with the most votes as the result. If the winning vote total didn't meet a set threshold, we classified the extracted information as unreliable and flagged it for manual review.
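
A minimal sketch of such a weighted vote follows; the voting powers, threshold, and example values are hypothetical, chosen only to show the mechanics.

```python
from collections import defaultdict

# Hypothetical voting powers and threshold -- not the project's tuned values.
VOTING_POWER = {"gpt-4": 2.0, "gpt-3.5-turbo": 1.0}
CONFIDENCE_THRESHOLD = 3.0  # below this, the field goes to manual review

def vote(answers: list[tuple[str, str]]) -> tuple[str, bool]:
    """answers: (model_name, extracted_value) pairs from repeated queries.

    Returns the highest-scoring value and whether its total voting power
    clears the reliability threshold.
    """
    scores: dict[str, float] = defaultdict(float)
    for model, value in answers:
        scores[value] += VOTING_POWER[model]
    winner, score = max(scores.items(), key=lambda kv: kv[1])
    return winner, score >= CONFIDENCE_THRESHOLD

# Two models agree, one dissents: the agreed value scores 3.0 and passes.
value, reliable = vote([("gpt-4", "revenue up 8%"),
                        ("gpt-3.5-turbo", "revenue up 8%"),
                        ("gpt-3.5-turbo", "revenue up 5%")])
```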

We found it essential to prioritize prompt quality over upgrading the model to achieve the best results.

Throughout the project, it became evident that the quality of the prompt significantly impacted the results. Even with the latest and most expensive GPT models, starting from a well-crafted, iteratively refined base prompt yielded better outcomes. Therefore, we found it essential to prioritize prompt quality over upgrading the model to achieve the best results.

Moreover, the trade-off between investment (time and money) and results was an important consideration. Moving from GPT-3.5 to GPT-4 meant a roughly 30X increase in API costs, so developing a system that could balance cost against quality was crucial.
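
For a back-of-the-envelope sense of that gap, the sketch below compares run costs using OpenAI's published mid-2023 list prices. The prices and token counts are assumptions for illustration; current pricing differs.

```python
# Per-1K-token list prices (USD), OpenAI, mid-2023 -- illustrative only.
PRICE_PER_1K = {
    "gpt-3.5-turbo": {"input": 0.0015, "output": 0.002},
    "gpt-4":         {"input": 0.03,   "output": 0.06},
}

def run_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    price = PRICE_PER_1K[model]
    return (input_tokens / 1000) * price["input"] + (output_tokens / 1000) * price["output"]

# Assumed workload: 500 reports at ~8K input / 1K output tokens each.
for model in PRICE_PER_1K:
    print(f"{model}: ${500 * run_cost(model, 8_000, 1_000):,.2f}")
# gpt-3.5-turbo: ~$7; gpt-4: ~$150 -- roughly a 21X gap at this token mix,
# approaching 30X as output tokens dominate.
```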

[Figure: The process of choosing the right LLM]

The operating cost of the LLM pipeline was at least ten times lower than the cost of a full-time analyst.

We estimated that our prompt development process resulted in lower overall costs than the equivalent manual work of a full-time analyst. Our initial estimates put the operating cost of the LLM pipeline at least ten times lower than the cost of a full-time analyst.

Results

We developed an LLM-based solution for our client that could extract data faster and more cheaply than an analyst while being at least as accurate and reliable.

In this project, we had to overcome the challenges of exhaustivity, reliability, and structure. Doing so proved that our client could successfully implement LLMs to eliminate time-intensive manual work and improve operational efficiency.


Written by Joleen Bothma
