What is RAG?

Retrieval-Augmented Generation (RAG) is an innovative AI framework that enhances the performance of large language models (LLMs) by combining their Generative AI capabilities with external information retrieval systems. Traditional LLMs are trained on vast amounts of text data, which equips them with the ability to generate human-like responses. However, their knowledge is limited to the data available during their training, leading to potentially outdated or inaccurate information.

This restriction can be particularly challenging in a corporate setting, where leveraging internal knowledge bases like Confluence and Jira, or documents and presentations hosted in SharePoint or Google Drive, becomes difficult. RAG enhances LLMs by connecting them with real-time, external knowledge repositories like databases, web pages, and specialized sources. This enables the models to produce responses that are accurate, current, and contextually aligned with the user's needs.

How it works

The process of Retrieval-Augmented Generation involves several key steps, as shown in the following diagram.

Data ingestion and data query

Before the RAG-powered tool can be used, an information ingestion pipeline must be created. In this process, data is taken from wherever it is hosted (SharePoint, Drive, Slack, Confluence, Jira tickets, etc.), split into smaller portions known as chunks, and each chunk is converted into a vector by an embeddings model. These vectors are then stored in a vector database, from which the second flow of the solution will retrieve the data. It is important to repeat the vectorization process periodically, matching the frequency of database updates to how often the base information is modified. At Paradigma, we propose weekly or even daily updates.
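
To make this concrete, the ingestion flow could be sketched with a framework like LangChain, which we mention later in this article. This is a minimal, illustrative sketch rather than a production pipeline: the document loader, chunk sizes, embedding model, and the choice of Chroma as the vector database are all assumptions.

```python
from langchain_community.document_loaders import DirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

# Load documents from wherever they were exported (here: a local folder).
docs = DirectoryLoader("./exported_docs", glob="**/*.txt").load()

# Split each document into overlapping chunks before embedding.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)

# Embed every chunk and persist the vectors in a vector database.
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=OpenAIEmbeddings(model="text-embedding-3-small"),
    persist_directory="./vector_db",
)
```

Re-running a script like this in a scheduled job, weekly or daily, keeps the vector database aligned with the source systems.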

The second workflow begins when the user acts: their prompt is posed to the AI system employing RAG, which uses it to retrieve relevant context before the LLM generates a response. Let’s illustrate this workflow with an example of an employee at a power consultancy firm who needs information about wind power projects:

What projects have we done in the past related to wind power?

The user's query is vectorized with the same embeddings model and used to retrieve relevant documents or data from external sources, typically stored in a vector database, a type of database optimized for quick search and retrieval of semantically similar information.
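
Continuing the earlier sketch, the retrieval step could look like this (again illustrative; the number of chunks to fetch is an assumption):

```python
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# Reopen the vector database built during ingestion.
vectorstore = Chroma(
    persist_directory="./vector_db",
    embedding_function=OpenAIEmbeddings(model="text-embedding-3-small"),
)

query = "What projects have we done in the past related to wind power?"

# Embed the query and fetch the four most similar chunks.
retrieved_docs = vectorstore.similarity_search(query, k=4)
```

In our example, the raw chunks retrieved from the database look like this: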

Document Content
Document 1 Powair Tech. 2nd Aug 2017 Belfast, UKdom. The project/nconsisted on installing new air filtration systems for local industry/n factories. Pilot launch results varied and final adjustments not/nfinalized till Q4.
Document 2 Techsol Ltd.. November 23rd,2018. New York,, USA. The project plan/initialy aimed at redesgining wind power infrastructures but /delays occured due to unexpected client site problems. end result was a mixed outcome.
Document 3 Greentech Solutions/ 15Jan2019; Berlin/Germany; This project involved upgrading wind power technologies but suffered from/n frequent power outages and system incompatibilties; final delivery in summer of 2019.
Document 4 InnoBuild Corp. July 7th 2020. Sydney//Australia. The development/nfocused on adve…d air power materials. Initally planned to finish/ by end 2020, but ixtended to …meed 2022 due to supply chain bottlencks.

Once the relevant documents are retrieved, they are pre-processed and transformed into a format compatible with the LLM.

Document Content
Document 1 Powair Tech. 2nd Aug 2017, Belfast, UK. The project consisted of installing new air filtration systems for local industry factories. Pilot launch results varied, and final adjustments were not finalized until Q4.
Document 2 Techsol Ltd. November 23rd, 2018, New York, USA. The project plan initially aimed at redesigning wind power infrastructures, but delays occurred due to unexpected client site problems. The end result was a mixed outcome.
Document 3 Greentech Solutions. 15th Jan 2019, Berlin, Germany. This project involved upgrading wind power technologies but suffered from frequent power outages and system incompatibilities. Final delivery was in the summer of 2019.
Document 4 InnoBuild Corp. July 7th, 2020, Sydney, Australia. The development focused on advanced air power materials. Initially planned to finish by the end of 2020, but extended to mid-2022 due to supply chain bottlenecks.
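
This clean-up can range from simple normalization to a full rewrite by a smaller model. As a minimal, illustrative example, a function like the following removes the mechanical extraction artifacts shown above (stray "/n" markers, doubled separators, repeated punctuation); fixing misspellings or truncated words would require an additional pass:

```python
import re

def clean_chunk(text: str) -> str:
    """Normalize a raw retrieved chunk before passing it to the LLM."""
    # Replace literal "/n" artifacts and doubled separators left by extraction.
    text = text.replace("/n", " ").replace("//", ", ")
    # Collapse repeated whitespace into single spaces.
    text = re.sub(r"\s+", " ", text)
    # Collapse repeated punctuation such as ",," or ".." into one character.
    text = re.sub(r"([.,;])\1+", r"\1", text)
    return text.strip()
```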

The LLM then integrates this newly fetched information with its own generated content to produce a final output. For instance, using an LLM framework like LangChain, the background process may involve retrieving specific projects related to the query from an internal database and presenting the most relevant information in a coherent, contextual manner.

The user asked the following question about their company:

"What projects have we done in the past related to wind power?"

Since this is private information we don't have access to, I retrieved the following content from our internal database:

1- Powair Tech. 2nd Aug 2017, Belfast, UK. The project consisted of installing new air filtration systems for local industry factories. Pilot launch results varied, and final adjustments were not finalized until Q4.
2- Techsol Ltd. November 23rd, 2018, New York, USA. The project plan initially aimed at redesigning wind power infrastructures, but delays occurred due to unexpected client site problems. The end result was a mixed outcome.
3- Greentech Solutions. 15th Jan 2019, Berlin, Germany. This project involved upgrading wind power technologies but suffered from frequent power outages and system incompatibilities. Final delivery was in the summer of 2019.
4- InnoBuild Corp. July 7th, 2020, Sydney, Australia. The development focused on advanced air power materials. Initially planned to finish by the end of 2020, but extended to mid-2022 due to supply chain bottlenecks.

Now answer the User's question...
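
Using LangChain, assembling this prompt from the retrieved chunks and calling the model could be expressed roughly as follows, continuing the earlier sketches. The template wording, model name, and chain structure are illustrative assumptions rather than the exact implementation:

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template(
    "The user asked the following question about their company:\n"
    '"{question}"\n\n'
    "Since this is private information we don't have access to, "
    "I retrieved the following content from our internal database:\n\n"
    "{context}\n\n"
    "Now answer the user's question."
)

# Join the retrieved chunks into a numbered context block.
context = "\n".join(
    f"{i}- {doc.page_content}" for i, doc in enumerate(retrieved_docs, start=1)
)

# Pipe the prompt into the model and parse the reply to plain text.
chain = prompt | ChatOpenAI(model="gpt-4o") | StrOutputParser()
answer = chain.invoke({"question": query, "context": context})
```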

Combining the retrieved context with its own generated content, the LLM produces the final output:

Based on the information retrieved, here are the projects related to wind power that were done in the past:

1. Techsol Ltd. Project  
   - Date: November 23rd, 2018  
   - Location: New York, USA  
   - Description: The project initially aimed at redesigning wind power infrastructures. However, there were delays due to unexpected client site problems. The project had a mixed outcome in the end.

2. Greentech Solutions Project  
   - Date: 15th January 2019  
   - Location: Berlin, Germany  
   - Description: This project involved upgrading wind power technologies. It faced challenges such as frequent power outages and system incompatibilities. The final delivery of the project was completed in the summer of 2019.

These two past projects are specifically related to wind power, focusing on infrastructure redesign and technology upgrades.

How we use RAG at Paradigma

At Paradigma, we have leveraged the power of RAG to build advanced AI solutions tailored to our specific business needs and those of our clients. Our experience shows that RAG's ability to retrieve relevant and current information provides a cost-effective alternative to the more expensive process of fine-tuning or retraining entire models. The decreasing costs of vector databases and embedding services make RAG an increasingly accessible option for many companies.

RAG’s ability to provide contextually relevant responses is another key advantage. It retrieves information directly related to a user’s query, minimizing the chances of irrelevant or off-topic content. This ensures that AI-generated content is coherent and aligned with the latest and most specific data available, reducing contradictions and inconsistencies. At Paradigma, we have utilized this capability to develop specialized solutions for our clients across different cloud platforms, each tailored to their specific needs.

For internal purposes, we leveraged Google Cloud's Retrieval-Augmented Generation capabilities to develop a Generative AI-powered internal chatbot. This chatbot efficiently retrieves information from internal web pages to address common HR and IT desk queries, such as questions about holidays, leave policies, and other employee services. By providing accurate and immediate responses, this solution reduces the workload on HR and IT desk staff and saves employees time, demonstrating how RAG can streamline internal processes. The following diagram illustrates the key components of a RAG architecture based on Google Cloud:

Data ingestion and data query at Google Cloud

Using AWS, we developed a Generative AI tool specifically designed for contact centers to enhance customer interactions. This tool enables agents to retrieve real-time information about competitors' current offers during customer conversations. For instance, when a customer mentions a promotional offer from another company, the tool allows the agent to instantly verify the offer details and accuracy. This immediate access empowers agents to propose competitive counter-offers tailored to the customer's needs, creating opportunities to retain the customer or even upsell. By providing real-time retrieval of competitor information, the tool reduces response times and improves service quality by ensuring agents are well-informed and confident in their responses.

Consequently, contact centers can achieve significant increases in customer satisfaction and retention rates. Additionally, the tool supports data-driven decision-making, allowing agents to deliver more personalized and relevant offers, ultimately leading to higher conversion rates and a stronger competitive position in the market. While the following diagram illustrates some AWS services likely to be used in a RAG architecture, AWS also offers specialized products, such as Amazon Kendra, which can simplify some of the steps shown.

Data ingestion and data query at AWS

For Azure-based solutions, we created an advanced tool for a large fashion retailer that vectorizes their existing APIs, enabling the automatic creation of new APIs that follow the same format and structure as their previous ones. This automation greatly reduces the time and effort required for API development, allowing the retailer to scale their digital services efficiently and maintain consistency across all their APIs. The tool not only accelerates the development process but also minimizes human errors, ensuring that all new APIs adhere to the company’s standards and best practices.

Additionally, we leveraged Azure's capabilities to enhance internal knowledge management systems by vectorizing content from Confluence pages and resolved Jira tickets. This solution enables employees to independently resolve common IT issues by accessing relevant information directly, without the need to involve IT staff for repetitive queries.

As a result, IT teams can focus on more strategic and complex problems, leading to improved overall productivity and operational efficiency. This approach has led to significant time savings, cost reductions, and increased employee satisfaction by streamlining workflows and reducing dependency on IT support for routine issues. In the following diagram, the key Azure services for a RAG architecture are highlighted:

Data ingestion and data query at Azure

Azure Data Lake Storage will store the scripts for preprocessing data, while Azure OpenAI Service will use its embedding models to vectorize the data. This vectorized data will then be stored in Azure Cosmos DB for PostgreSQL; alternatively, Azure AI Search could be used, although it is more costly. In the user workflow, the same embedding model will be utilized, along with one of the latest OpenAI models as the Large Language Model (LLM).
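
To make the query side of this architecture concrete, here is a minimal, illustrative sketch of embedding a user query with Azure OpenAI and searching vectors stored in Azure Cosmos DB for PostgreSQL through the pgvector extension. The endpoint, deployment name, table schema, and connection string are placeholders, not values from a real deployment:

```python
import psycopg2
from openai import AzureOpenAI

# Azure OpenAI client; endpoint, key, and API version are placeholders.
client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-key>",
    api_version="2024-02-01",
)

def embed(text: str) -> list[float]:
    # For Azure OpenAI, `model` takes the name of your embedding deployment.
    response = client.embeddings.create(
        model="<embedding-deployment>",  # e.g. a text-embedding-ada-002 deployment
        input=text,
    )
    return response.data[0].embedding

query_vector = embed("What projects have we done in the past related to wind power?")

# Cosmos DB for PostgreSQL supports pgvector, so the nearest-neighbour
# search is a plain SQL query ordered by vector distance.
conn = psycopg2.connect("<connection-string>")
with conn.cursor() as cur:
    cur.execute(
        "SELECT content FROM chunks ORDER BY embedding <-> %s::vector LIMIT 4;",
        (str(query_vector),),
    )
    retrieved = [row[0] for row in cur.fetchall()]
```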

There is a growing trend among cloud providers, such as Azure, to offer low-code alternatives for creating Generative AI solutions. Azure AI Studio is a prime example of this—a comprehensive platform designed to simplify the development, deployment, and management of generative AI applications using both no-code and low-code tools. It provides a variety of pre-built and customizable models, API services, and interactive visual workflows, enabling users to build, fine-tune, and deploy AI solutions at scale without requiring deep programming expertise.

Beyond the components illustrated in the diagrams above, there are several non-compulsory but highly recommended elements commonly found in RAG solutions. These include memory, to retain the context of conversations; tracing, to monitor the performance of each LLM call and every step in the process; and prompt versioning, to manage and promote prompts across different environments.
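
Of these, memory is the simplest to illustrate. A minimal, framework-agnostic sketch (the model name is an assumption) keeps prior turns in a message list and resends them with each call; production systems typically summarize or truncate this history to stay within the context window:

```python
from openai import OpenAI

client = OpenAI()
# The running conversation; each call appends the new turns to it.
history = [{"role": "system", "content": "Answer using the retrieved context."}]

def ask(question: str, context: str) -> str:
    # Add the new user turn, including the freshly retrieved context.
    history.append({
        "role": "user",
        "content": f"Context:\n{context}\n\nQuestion: {question}",
    })
    response = client.chat.completions.create(model="gpt-4o", messages=history)
    answer = response.choices[0].message.content
    # Store the assistant's reply so follow-up questions keep the thread.
    history.append({"role": "assistant", "content": answer})
    return answer
```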

Wrapping up

These examples demonstrate how RAG technology is already being applied across various AI platforms and services. Providers like Google Cloud, AWS, Microsoft Azure, and NVIDIA offer robust frameworks for implementing RAG, allowing companies to create customized AI applications that draw on both internal knowledge bases and vast external data repositories. These platforms enable businesses to develop highly specialized AI tools tailored to their unique needs, further showcasing the versatility and adaptability of RAG.

Retrieval-Augmented Generation represents a major step forward in AI by overcoming the limitations of traditional LLMs. By combining the generative capabilities of AI with reliable, real-time data retrieval, RAG ensures that responses are both accurate and relevant. As Paradigma continues to leverage RAG, the potential applications for Generative AI are rapidly expanding, offering more efficient, adaptable, and innovative solutions across various sectors.
