Introducing eXact-RAG: the ultimate local multimodal RAG

20 June 2024

eXact-RAG is a powerful multimodal system designed for Retrieval-Augmented Generation (RAG). It seamlessly integrates text, visual, and audio information, enabling richer content understanding and generation.

In the rapidly evolving landscape of Large Language Models (LLMs), the quest for more efficient and versatile systems continues unabated.

One of the latest advancements in this realm is eXact-RAG, a multimodal RAG system that leverages state-of-the-art technologies to deliver powerful results.

eXact-RAG stands out for its integration of LangChain and Ollama for the backend and model serving, FastAPI for the REST API service, and its adaptability through support for either ChromaDB or Elasticsearch.

Coupled with an intuitive user interface built on Streamlit, eXact-RAG represents a significant leap forward in LLM application capabilities.

A step back: what is RAG?

Retrieval Augmented Generation, or RAG, is an architectural approach that can improve the efficacy of Large Language Model (LLM) applications by leveraging custom data. This is done by retrieving data/documents relevant to a question or task and providing them as context for the LLM. RAG has shown success in support chatbots and Q&A systems that need to maintain up-to-date information or access domain-specific knowledge.

RAG process schema. Source: neo4j.com

As the name suggests, RAG has two phases: retrieval and content generation. In the retrieval phase, algorithms search for and retrieve snippets of information relevant to the user’s prompt or question. In an open-domain, consumer setting, those facts can come from indexed documents on the internet; in a closed-domain, enterprise setting, a narrower set of sources is typically used for added security and reliability.
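To make the two phases tangible, here is a minimal, self-contained sketch of the retrieve-then-generate pattern. The keyword-overlap scoring and the prompt template are illustrative assumptions, not how eXact-RAG actually implements retrieval.

# Minimal sketch of the retrieve-then-generate RAG pattern.
# The keyword-overlap scoring and prompt template are illustrative only.
DOCUMENTS = [
    "eXact-RAG can ingest text, images and audio files.",
    "Elasticsearch suits large datasets as the vector store.",
    "ChromaDB is a lightweight embedding database for simpler setups.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap and return the top-k."""
    q_words = set(query.lower().split())
    return sorted(
        DOCUMENTS,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Provide the retrieved snippets as context for the LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

question = "Which database should I use for large datasets?"
print(build_prompt(question, retrieve(question)))  # this prompt would then be sent to the LLM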

Understanding eXact-RAG

At its core, eXact-RAG combines the principles of RAG, grounding generation in retrieved context, with multimodal capabilities, enabling it to process and generate responses across modalities such as text, images, and audio.

This versatility makes eXact-RAG well-suited for a wide range of applications, from chatbots to content recommendation systems and beyond.

Technologies Powering eXact-RAG

  1. LangChain and Ollama: LangChain and Ollama serve as the backbone of eXact-RAG, providing a robust infrastructure for building and serving the models.

    LangChain offers a comprehensive suite of tools for composing natural language processing pipelines, while Ollama makes it easy to run LLMs locally, enabling eXact-RAG to integrate and process diverse data types entirely on-premise.
  2. FastAPI for REST API Service: FastAPI, known for its high performance and simplicity, serves as the interface of eXact-RAG, facilitating seamless communication between the backend system and external applications (a minimal sketch of such an endpoint follows this list).

    Its asynchronous capabilities ensure rapid response times, crucial for real-time interactions.
  3. ChromaDB or Elasticsearch: eXact-RAG offers flexibility in data storage and retrieval by supporting both ChromaDB and Elasticsearch.

    ChromaDB provides a lightweight solution suitable for simpler tasks, while Elasticsearch caters to more complex operations involving vast amounts of data. This versatility enables users to tailor eXact-RAG to their specific needs, balancing performance and scalability accordingly.
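To make FastAPI’s role concrete, here is a minimal, hypothetical sketch of a query endpoint wrapping the retrieve-then-generate flow. The route name, request model, and helper functions are assumptions for illustration, not eXact-RAG’s actual API (the project’s Swagger UI lists the real endpoints).

# Hypothetical FastAPI endpoint wrapping a retrieve-then-generate flow.
# Route name, models, and helpers are illustrative, not eXact-RAG's real API.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Query(BaseModel):
    text: str

def retrieve_context(question: str) -> list[str]:
    # Placeholder for a vector-store lookup (ChromaDB or Elasticsearch).
    return ["...retrieved snippet..."]

def generate_answer(question: str, context: list[str]) -> str:
    # Placeholder for an LLM call served via Ollama or OpenAI.
    return f"Answer to '{question}' based on {len(context)} snippet(s)."

@app.post("/query")
async def query(payload: Query) -> dict:
    context = retrieve_context(payload.text)
    return {"answer": generate_answer(payload.text, context)}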

A User-Friendly Interface with Streamlit (for Demo Purposes)

The user interface of eXact-RAG is built on Streamlit, a popular framework for creating interactive web applications with Python. Streamlit’s intuitive design and seamless integration with Python libraries allow users to interact with eXact-RAG effortlessly.

Through the interface, users can input queries, explore results, and interact with generated content across various modalities, enhancing the overall user experience.
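As an illustration of how little code such an interface needs, the sketch below puts together a generic Streamlit chat front end. It assumes a backend endpoint at http://localhost:8080/query and is not the project’s actual frontend/ui.py.

# Generic Streamlit chat sketch; the /query endpoint is an assumption,
# not necessarily the route exposed by eXact-RAG.
import requests
import streamlit as st

st.title("eXact-RAG demo")

if "history" not in st.session_state:
    st.session_state.history = []

# Re-render the conversation so far on every rerun.
for role, text in st.session_state.history:
    with st.chat_message(role):
        st.write(text)

if prompt := st.chat_input("Ask something about your documents"):
    st.session_state.history.append(("user", prompt))
    with st.chat_message("user"):
        st.write(prompt)
    answer = requests.post(
        "http://localhost:8080/query", json={"text": prompt}, timeout=60
    ).json().get("answer", "<no answer>")
    st.session_state.history.append(("assistant", answer))
    with st.chat_message("assistant"):
        st.write(answer)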

Related article: From concepts to MVPs: Validate Your Idea in few Lines of Code with Streamlit

Applications of eXact-RAG

The versatility of eXact-RAG opens up a myriad of applications across different domains:

  • Conversational Agents: eXact-RAG can power chatbots and virtual assistants capable of engaging users in natural and meaningful conversations, leveraging both text and multimedia inputs.
  • Content Recommendation: By analyzing user preferences and behavior, eXact-RAG can recommend personalized content, including articles, videos, and images, tailored to individual tastes and interests.
  • Information Retrieval: eXact-RAG excels at retrieving relevant information from large datasets, making it invaluable for tasks such as question answering, document summarization, and knowledge-base retrieval.
eXact-RAG flow schema

Let’s play with eXact-RAG

The first step in using eXact-RAG is to run your preferred LLM with Ollama or obtain an OpenAI token. Both are supported; to configure them correctly, fill in settings.toml with your preferred options. Here is an example:

[embedding]
type = "openai"
api_key = ""
chat.model_name = "gpt-3.5-turbo"
...

[database]
type = "chroma"
persist_directory = "persist"
...

In this example you configure the LLM and the vector database in which the embeddings of your local data will be stored.
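As a rough illustration of how such a settings file might be consumed (this is not eXact-RAG’s actual configuration code), the standard-library tomllib module is enough to load it and inspect the chosen backends:

# Illustrative loader for a settings.toml like the one above (Python 3.11+).
# Key names follow the example shown; this is not the project's own code.
import tomllib

with open("settings.toml", "rb") as f:
    settings = tomllib.load(f)

embedding_type = settings["embedding"]["type"]  # e.g. "openai"
database_type = settings["database"]["type"]    # e.g. "chroma"

print(f"Embeddings via {embedding_type}, vector store: {database_type}")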

eXact-RAG is a multimodal RAG, so it can ingest different kinds of data, such as audio files or images. At installation time, you can choose which “backends” (extras) to install (a usage sketch follows below):

poetry install # -E audio -E image
  • audio extra will install openai-whisper for speech-to-text
  • image extra will install transformers and pillow for image captioning*

* this feature makes it possible to process images even when the user does not have the hardware to run a vision model such as LLaVA locally but still wants to pass images as data.
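For a sense of what these extras provide, the sketch below shows how openai-whisper and a transformers image-captioning pipeline are typically used. Model choices and file names are placeholders, and this is not the project’s actual ingestion code.

# Rough sketch of the audio and image ingestion the extras enable.
# Model names and file paths are placeholders; not eXact-RAG's own code.
import whisper
from transformers import pipeline

# Speech-to-text with openai-whisper (audio extra).
stt_model = whisper.load_model("base")
transcript = stt_model.transcribe("meeting.mp3")["text"]

# Image captioning with transformers + pillow (image extra).
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
caption = captioner("diagram.png")[0]["generated_text"]

print(transcript)
print(caption)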

That’s it for the setup. The following command

poetry run python exact_rag/main.py

starts the server, and the OpenAPI documentation (Swagger UI) with all the available endpoints is served at http://localhost:8080/docs.
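Once the server is running, any HTTP client can exercise the API. In the example below, the /query route and payload shape are assumptions; check the Swagger UI at /docs for the actual endpoints and schemas.

# Example API call; the /query route and payload are assumptions.
# Consult http://localhost:8080/docs for the real endpoints.
import requests

resp = requests.post(
    "http://localhost:8080/query",
    json={"text": "Summarize the documents about Elasticsearch."},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())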

Demo

eXact-RAG was built as a server for multimodal RAG, but we also provide a user interface, purely for demo purposes, to test all the features.
To run the UI, just use the command:

poetry run streamlit run frontend/ui.py

Now the page at http://localhost:8501 will show a chat interface for conversing with eXact-RAG.

Conclusion

eXact-RAG represents a significant advancement in the field of multimodal RAG systems, offering unparalleled versatility and performance through its integration of cutting-edge technologies.

With its robust backend powered by LangChain and Ollama, flexible data storage options, and user-friendly interface built on Streamlit, eXact-RAG is poised to revolutionize various applications of natural language processing and multimodal learning.

As the demand for sophisticated LLM solutions continues to grow, eXact-RAG stands ready to meet the challenges of tomorrow’s digital landscape.
