Embedding models and vector databases

Humans and computers communicate in two fundamentally different ways: we use words, while computers use numbers. For us to effectively pass information to one another, we need to convert our words into numbers while maintaining their syntactic and semantic meaning. This is done using a process known as embedding. Embedding models are the Google Translate of the human-computer interaction space: they take text as input and return its numerical representation, making sure to retain the text's key relationships.
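To make the idea concrete, here's a deliberately tiny sketch of "text in, numbers out." Real embedding models learn their vectors from huge amounts of text; this toy version (the `toy_embed` function and vocabulary are illustrative inventions, not any real model's API) just counts words against a fixed vocabulary:

```python
def toy_embed(text: str, vocab: list[str]) -> list[float]:
    """Map text to a vector: one dimension per vocabulary word.

    A real embedding model learns dense vectors from data; this
    word-count version only illustrates the text -> numbers idea.
    """
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]

vocab = ["cat", "dog", "pet", "car"]
print(toy_embed("my dog is a good dog", vocab))  # [0.0, 2.0, 0.0, 0.0]
```

Each position in the output vector answers "how often does this vocabulary word appear?" — a learned model does something far richer, but the input/output shape of the operation is the same.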


Our language is incredibly complex. As such, it's only natural that its embedded representations, stored as vectors, share this complexity. Depending on the model, the embeddings can have 1,536 dimensions or more! We're used to thinking about our surroundings in 3 dimensions, so it's difficult to imagine what an n-dimensional space might look like. Techniques like Principal Component Analysis (PCA) help us bridge this gap by identifying underlying patterns and reducing the embedding space from n dimensions to 2 or 3.
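A minimal sketch of that reduction step, assuming numpy is available: PCA's core is projecting centered data onto its top principal directions, which we can get from an SVD. The 1,536-dimensional "embeddings" here are random stand-ins, not real model output:

```python
import numpy as np

def pca_reduce(embeddings: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project high-dimensional vectors onto their top principal
    components (the essence of PCA) via SVD."""
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal directions, sorted by variance explained.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Stand-in for five 1,536-dimensional word embeddings:
rng = np.random.default_rng(0)
points_2d = pca_reduce(rng.normal(size=(5, 1536)))
print(points_2d.shape)  # (5, 2)
```

Each row of `points_2d` is a plottable 2-D point, which is exactly what visualizations like the one below are built from.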

In the GIF below, the axes represent reduced dimensions and the dots represent where a word falls in the embedding space. Note that the words “cat”, “dog”, and “pet” are close together. This is because the embedding model was able to infer that “cat” and “dog” are both “pets”, making them similar!

These data structures benefit from a database optimized for the storage and search of vectors in multi-dimensional space. A key feature of these databases is the ability to search the database using natural language instead of SQL! Here’s how it works:

1. A user asks a question about the data in natural language.
2. The question, or query, is embedded using the same model we used for the data in the database. Great! Now we're all speaking the same numerical language.
3. A search algorithm compares the embedded query to the embedded data and returns the results most similar to the query.
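The steps above can be sketched in a few lines, assuming the query has already been embedded with the same model as the stored data. The texts, vectors, and `search` helper here are all made up for illustration; the ranking uses cosine similarity, one of the common metrics:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def search(query_vec, database, top_k=2):
    """Rank stored (text, vector) pairs by similarity to the query."""
    ranked = sorted(database,
                    key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:top_k]]

database = [
    ("cats make great pets", [1.0, 0.0, 1.0]),
    ("dogs are loyal pets",  [0.0, 1.0, 1.0]),
    ("cars need fuel",       [0.0, 0.0, 0.0]),
]
query_vec = [1.0, 0.0, 0.5]  # pretend embedding of "tell me about cats"
print(search(query_vec, database))
# ['cats make great pets', 'dogs are loyal pets']
```

A production vector database does the same comparison with approximate nearest-neighbor indexes so it scales to millions of vectors, but the contract — vector in, similar items out — is identical.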

Embedded text stored in vector databases is a powerful tool for natural language applications. Let’s see how we can couple vector databases with LLMs to create a RAG-enabled LLM.

RAG-Enabled LLMs

Now that all of the data work behind the scenes is complete, the user can ask the LLM questions. After the user enters a question, it gets embedded and sent to the vector database to be compared with all existing vectors. A predefined number of vectors (often referred to as the top k vectors) is retrieved based on a similarity metric. Common metrics include cosine similarity, Euclidean distance, and the dot product. Once all the relevant information has been retrieved, it’s combined into one long chunk of text, which is then fed to the LLM. What does the LLM do with this information?
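The three metrics mentioned above are easy to compute directly; the two-dimensional vectors here are toy values chosen to show how the metrics can disagree. Note that cosine similarity ignores vector length while Euclidean distance does not:

```python
import math

def dot(a: list[float], b: list[float]) -> float:
    """Dot product: large when vectors point the same way and are long."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a: list[float], b: list[float]) -> float:
    """Direction-only similarity, in [-1, 1]."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean(a: list[float], b: list[float]) -> float:
    """Straight-line distance between the vector endpoints."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a, b = [1.0, 2.0], [2.0, 4.0]  # b is a scaled copy of a
print(cosine(a, b))     # ≈ 1.0: identical direction
print(euclidean(a, b))  # ≈ 2.236: yet clearly apart in space
print(dot(a, b))        # 10.0
```

Which metric a vector database uses matters: for normalized embeddings cosine similarity and dot product rank results identically, but for unnormalized ones they can disagree, as the example shows.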

Part of the LLM’s base prompt instructs the LLM to generally use only the information retrieved from the vector database as its source of truth. That is, the retrieved information will “override” the LLM’s parametric knowledge learned during the pre-training process. Override is in quotes because it doesn’t actually change any of the underlying weights of the LLM. For those unfamiliar with machine learning jargon, the weights are the matrices of numbers that make up the learned parameters of the model. The model has simply been told: “Hey, only rely on this chunk of information to answer the user’s question, and if you don’t know the answer based on the retrieved information, tell them you don’t know the answer.”
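Assembling such a prompt is plain string-building. The wording below is a hypothetical example of a base prompt, not the actual prompt used by any particular system:

```python
def build_rag_prompt(retrieved_chunks: list[str], question: str) -> str:
    """Combine retrieved chunks and the user's question into one prompt.

    The instruction text is an illustrative stand-in for a real base prompt.
    """
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the user's question using ONLY the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_rag_prompt(
    ["PBS stands for Public Broadcasting Service."],
    "What does PBS stand for?",
)
print(prompt)
```

The "say you don't know" clause is what gives RAG systems their guardrail against answering from stale or missing parametric knowledge.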


It’s important to understand that vector databases can also be used without an underlying foundation model (FM). They can simply power an advanced search technique, which we’re also experimenting with, but that would take another blog post to cover in depth. In short, though, using a vector database as a search method can combine standard lexical search with semantic search, providing more robust search results.
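One common way to combine the two (a sketch under the assumption of a simple weighted blend; the function names and weighting are illustrative, and real systems often use more sophisticated fusion such as reciprocal rank fusion):

```python
def lexical_score(query: str, doc: str) -> float:
    """Fraction of query words that literally appear in the document."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words) if q_words else 0.0

def hybrid_score(lexical: float, semantic: float, alpha: float = 0.5) -> float:
    """Blend exact-keyword and embedding-similarity scores.

    alpha weights the lexical part; (1 - alpha) weights the semantic part.
    """
    return alpha * lexical + (1 - alpha) * semantic

lex = lexical_score("pbs passport help", "use PBS Passport on your TV")
print(hybrid_score(lex, 0.8))  # semantic score 0.8 assumed from an embedding model
```

Lexical search catches exact terms like product names that embeddings sometimes blur together, while semantic search catches paraphrases the keywords miss; blending covers both failure modes.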

The entire flow of information can be summed up in a few steps.

1. Ground-truth information (FAQs, documentation, blogs, etc.) is embedded and stored in the vector database of choice.
2. The user enters a query, which is embedded into its numerical representation.
3. The top k vectors are retrieved and turned into one big chunk of text.
4. The retrieved information is put into the base prompt of the LLM.
5. The LLM uses that information to inform its final answer to the user, rather than relying on the information it was trained on.
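The steps above can be wired together end to end in miniature. Everything here is a toy stand-in: `embed` is a word-count "model," the stored documents are invented, and the final LLM call is replaced by simply returning the assembled prompt:

```python
VOCAB = ["cat", "dog", "pet", "food"]

def embed(text: str) -> list[float]:
    """Toy embedding: word counts over a tiny fixed vocabulary."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

def top_k(query_vec, store, k=1):
    """Step 3: retrieve the k stored texts with the highest dot product."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    ranked = sorted(store, key=lambda tv: dot(query_vec, tv[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def answer(question: str, store) -> str:
    """Steps 2-5, with the LLM call stubbed out as string assembly."""
    context = " ".join(top_k(embed(question), store))
    # A real system would send this prompt to an LLM; we just return it.
    return f"Answer using only: {context} | Question: {question}"

# Step 1: embed and store the ground-truth documents.
store = [(d, embed(d)) for d in ["a dog is a pet", "cat food is sold here"]]
print(answer("is a dog a pet", store))
```

Swapping the toy pieces for a learned embedding model, a real vector database, and an LLM API call turns this skeleton into the full pipeline described above.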

The PBS Innovation Team has built a few RAG-enabled chatbot prototypes for various PBS websites such as the Hub and the PBS help site. While these bots have not been trained (fine-tuned) on PBS-related information, they have access to various PBS documents, such as the documents found on the Hub, FAQs from the PBS help site, and documents from the PBS Documentation site, allowing the bots to answer user queries with relevant, accurate, and up-to-date information! That is the holy grail of information retrieval. Check out one of the bots we built below!

What is the future of RAG?

While RAG is a popular solution today for supercharging an LLM-based chatbot, does it have a future in the AI space? Some say yes; others argue that RAG is simply a temporary workaround for AI's lack of a fact-checking mechanism and its limited context windows (the amount of text you can submit in any one prompt). We’re already seeing LLMs with incredible context windows: Claude Pro with 200k+ tokens, which is about 350 pages of text; GPT-4 Turbo with a context window of 128k tokens; and most recently Gemini 1.5 Pro with a context window of 1 million tokens! More context doesn’t always translate to enhanced reasoning, though. What about fact-checking and ground truth? Will LLMs have to rely on vector databases for their source of truth forever? Based on the speed of development within the AI space, there will probably come a time in the not-too-distant future when LLMs don’t have to rely on an external database for ground-truth information. For now, though, RAG is here to stay, and it supplies AI with a host of benefits beyond providing a source of truth for the model.