Advancing QA Systems: Integrating Semantic Search with Elasticsearch and OpenAI Embeddings

Semantic search

In this post, we explore how to integrate Elasticsearch and OpenAI embeddings to build a Question-Answer (QA) bot. Our approach focuses on semantic search, which retrieves answers by the meaning of a question rather than by exact keyword overlap.

Explore the Code

Visit the GitHub repository for more insights, and feel free to contribute: Semantic Information Retrieval.

The Shift to Semantic Search

Our QA bot uses text-to-vector embeddings, a method that goes beyond traditional keyword search. Because questions are compared as vectors, a query can match a stored question that shares its meaning even when the wording differs, which leads to more relevant results.

Benefits of Semantic Search

  • Contextual Understanding: Unlike keyword matching, vector embeddings interpret the meaning behind text.
  • Query Flexibility: Capable of handling a variety of query phrasings.
  • Targeted Results: Minimizes irrelevant responses by focusing on the intended context.

Leveraging OpenAI Embeddings

We use OpenAI’s service to transform textual data into meaningful vector representations. This enhances the bot’s ability to parse and respond to queries semantically. Learn about OpenAI’s text embeddings.
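As a rough sketch of this step, the helper below turns a piece of text into a vector. It assumes the `openai` Python package (version 1.x) and the `text-embedding-ada-002` model. Both the model name and the `embed_text` helper are illustrative choices, not necessarily what the repository uses.

```python
from typing import Any, List

# Assumed model name; swap in whichever embedding model you actually use.
DEFAULT_MODEL = "text-embedding-ada-002"


def embed_text(client: Any, text: str, model: str = DEFAULT_MODEL) -> List[float]:
    """Return the embedding vector for `text`.

    `client` is expected to behave like an `openai.OpenAI()` instance (openai>=1.0).
    """
    # Replace newlines with spaces so the input is sent as a single line.
    cleaned = text.replace("\n", " ")
    response = client.embeddings.create(model=model, input=[cleaned])
    # One input string was sent, so the first item holds its embedding.
    return list(response.data[0].embedding)


# Usage (needs the `openai` package and OPENAI_API_KEY in the environment):
#   from openai import OpenAI
#   vector = embed_text(OpenAI(), "What is Python?")
```

Passing the client in as an argument keeps the helper easy to test with a stand-in object and avoids creating a new client for every call.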

Elasticsearch as Data Storage

Elasticsearch 8.11.1 serves as our datastore for both text and vector embeddings (version 8.11.0 or later should work). Elasticsearch 8.11.0 raised the maximum number of dimensions for dense vector fields to 4096. This approach allows for:

  • Diverse Search Capabilities: Enables both vector and traditional searches.
  • Efficient Data Management: Handles large data volumes effectively.
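To make the storage side concrete, here is one possible index mapping with two text fields and a `dense_vector` field. `question_vector` is the field name used by the scoring script in this post. The `dims` default of 1536 assumes OpenAI's `text-embedding-ada-002`, and the helper name is hypothetical.

```python
from typing import Any, Dict


def build_index_mapping(dims: int = 1536) -> Dict[str, Any]:
    """Build an index body that stores a question, its answer, and the question's embedding."""
    # Elasticsearch 8.11+ accepts up to 4096 dimensions for dense_vector fields.
    if not 1 <= dims <= 4096:
        raise ValueError("dims must be between 1 and 4096")
    return {
        "mappings": {
            "properties": {
                "question": {"type": "text"},
                "answer": {"type": "text"},
                # Must match the length of the vectors produced by the embedding model.
                "question_vector": {"type": "dense_vector", "dims": dims},
            }
        }
    }
```

With the official Python client, the `mappings` part could be passed to `es.indices.create(index=..., mappings=...)`; the index name is left open here because the post does not state it.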

Cosine Similarity

Elasticsearch is not only the datastore for our QA system; it also performs the semantic search itself. We use cosine similarity to measure how close a query vector is to each stored question vector. Cosine similarity compares the angle between two vectors rather than their magnitudes, which suits high-dimensional text embeddings where direction carries the meaning. Elasticsearch also offers approximate k-nearest-neighbor (kNN) search, which trades some accuracy for speed on large indexes; here we score documents exactly with the script expression below. The script adds 1.0 because cosine similarity ranges from -1 to 1 and Elasticsearch does not allow negative scores.

cosineSimilarity(params.query_vector, 'question_vector') + 1.0
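For context, an expression like this normally sits inside a `script_score` query. The sketch below builds such a query body and includes a plain-Python cosine similarity to show what the script computes. The helper names are hypothetical, and the `match_all` base query is an assumption about how candidate documents are selected.

```python
import math
from typing import Any, Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def build_script_query(query_vector: List[float]) -> Dict[str, Any]:
    """Wrap the scoring expression in a script_score query body."""
    return {
        "script_score": {
            # Assumption: every document is a candidate and is ranked by the script.
            "query": {"match_all": {}},
            "script": {
                "source": "cosineSimilarity(params.query_vector, 'question_vector') + 1.0",
                "params": {"query_vector": query_vector},
            },
        }
    }
```

Since cosine similarity lies between -1 and 1, the shifted score lies between 0 and 2, with 2 meaning the query and the stored question point in the same direction.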

Setting Up Your Development Environment

Before diving into the application, set up a Python virtual environment as detailed in this guide.

Running Elasticsearch in Docker

Elasticsearch runs in a container for ease of deployment. Here’s how to run the system:

Elasticsearch Setup:

docker compose -f ./ up --build -d

Starting the FastAPI Server:

Run all the cells of the Jupyter notebook.
After all the cells have run, you will see the following output from the last cell:

image 4

You can then verify that the app is up and running by calling the health endpoint at:


This should return a 200 status code with a status of healthy.
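The same check can be scripted. The sketch below is only an illustration: the health URL is passed in as an argument because it is not stated here, and the response body is assumed to be JSON of the form `{"status": "healthy"}`. Both function names are hypothetical.

```python
import json
import urllib.request


def is_healthy(status_code: int, body_text: str) -> bool:
    """Return True for a 200 response whose JSON body reports status "healthy"."""
    if status_code != 200:
        return False
    try:
        body = json.loads(body_text)
    except json.JSONDecodeError:
        return False
    # Assumed body shape: {"status": "healthy"}
    return isinstance(body, dict) and body.get("status") == "healthy"


def check_health(url: str, timeout: float = 5.0) -> bool:
    """Call the given health URL and evaluate the response (needs the API running)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return is_healthy(response.status, response.read().decode("utf-8"))
```

Keeping the evaluation in `is_healthy` separate from the network call makes the rule easy to adjust if your endpoint returns a different body.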

image 5

You can also access Elasticsearch at:

image 6

Example Data and Demonstration

Now that our infrastructure is up and running and the API is ready, we can start adding question-answer pairs. For testing purposes, here is one to begin with:

curl --location 'localhost:5001/train/' \
--header 'Content-Type: application/json' \
--data '{"question": "What is Python?", "answer": "Python is a programming language that emphasizes readability."}'

Here is an example screenshot from Postman:

image 7

Here are some other sample question-answer pairs you can feed into the system:

{"question": "What is FastAPI?", "answer": "FastAPI is a modern, fast web framework for building APIs with Python."}
{"question": "What is Elasticsearch?", "answer": "Elasticsearch is a distributed search and analytics engine."}
{"question": "How to write a loop in Python?", "answer": "You can write a loop using 'for' or 'while' constructs."}
{"question": "What is Docker?", "answer": "Docker is a platform for developing, shipping, and running applications in containers."}
{"question": "How to deploy a FastAPI app?", "answer": "You can deploy FastAPI apps using servers like Uvicorn or Gunicorn."}
{"question": "What is a REST API?", "answer": "REST API is an architectural style for designing networked applications."}
{"question": "What is machine learning?", "answer": "Machine learning is a field of AI that uses statistical techniques to give computers the ability to learn from data."}
{"question": "What is a neural network?", "answer": "A neural network is a series of algorithms that attempts to recognize underlying relationships in a set of data."}
{"question": "What is Kubernetes?", "answer": "Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications."}
{"question": "What is Git?", "answer": "Git is a distributed version-control system for tracking changes in source code."}
{"question": "What is blockchain?", "answer": "Blockchain is a system of recording information in a way that makes it difficult or impossible to change."}
{"question": "How to manage state in React?", "answer": "State in React can be managed using hooks like useState or using state management libraries like Redux."}
{"question": "What is cloud computing?", "answer": "Cloud computing is the delivery of computing services over the internet."}
{"question": "What is Big Data?", "answer": "Big Data refers to complex, large data sets that are challenging to process using traditional data processing tools."}
{"question": "What is an API?", "answer": "API stands for Application Programming Interface, a set of rules that allows programs to talk to each other."}
{"question": "What is SQL?", "answer": "SQL is a standard language for accessing and manipulating databases."}
{"question": "What is NoSQL?", "answer": "NoSQL is a type of database that stores and retrieves data in a way that does not rely on the traditional tabular structure."}
{"question": "What is a microservice architecture?", "answer": "Microservice architecture is an approach to software development where applications are composed of small, independent services."}
{"question": "What is Artificial Intelligence?", "answer": "Artificial Intelligence is the simulation of human intelligence processes by machines, especially computer systems."}
{"question": "What is DevSecOps?", "answer": "DevSecOps is an approach to culture, automation, and platform design that integrates security as a shared responsibility throughout the entire IT lifecycle."}
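Rather than posting each pair by hand, the pairs above can be loaded in a loop. This is a minimal sketch that uses only the standard library and the `/train/` endpoint from the curl example; the function names are hypothetical, and it assumes one JSON object per line, as in the list above.

```python
import json
import urllib.request
from typing import Iterable, List

# Endpoint taken from the curl example; curl defaults to http when no scheme is given.
TRAIN_URL = "http://localhost:5001/train/"


def build_train_requests(
    json_lines: Iterable[str], url: str = TRAIN_URL
) -> List[urllib.request.Request]:
    """Turn lines of JSON question-answer pairs into POST requests."""
    prepared = []
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        pair = json.loads(line)
        if not {"question", "answer"} <= pair.keys():
            raise ValueError("each line needs 'question' and 'answer' keys")
        prepared.append(
            urllib.request.Request(
                url,
                data=json.dumps(pair).encode("utf-8"),
                headers={"Content-Type": "application/json"},
                method="POST",
            )
        )
    return prepared


def send_all(prepared: List[urllib.request.Request]) -> List[int]:
    """Send each request in order and collect status codes (needs the API running)."""
    statuses = []
    for request in prepared:
        with urllib.request.urlopen(request) as response:
            statuses.append(response.status)
    return statuses
```

For example, save the pairs to a file with one object per line, then call `send_all(build_train_requests(open("pairs.jsonl")))`; the file name is just a placeholder.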

We can control the number of results the API returns by changing the size value. The following example returns 3 results. You can find the code in cell 14 of the Jupyter notebook.

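# Perform the search. `es` and `index_name` are placeholder names for the
# Elasticsearch client and index defined earlier in the notebook.
response = es.search(
    index=index_name,
    body={
        "size": 3,
        "query": script_query,
        "_source": {"includes": ["answer"]},
    },
)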

Now that our data source is ready, it is time to ask questions and get semantic results back.
Here is an example question:
What is semantic search?

With only the sample pairs above loaded, no stored question mentions semantic search, so the API returns the three stored answers whose questions are closest in meaning:

{
    "answers": [
        "Machine learning is a field of AI that uses statistical techniques to give computers the ability to learn from data.",
        "Elasticsearch is a distributed search and analytics engine.",
        "Artificial Intelligence is the simulation of human intelligence processes by machines, especially computer systems."
    ]
}

image 8
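A payload of this shape can be assembled from the raw Elasticsearch search response. The sketch below assumes the standard response layout, in which each hit carries its stored fields under `_source`; the `extract_answers` name is hypothetical and the notebook may build the payload differently.

```python
from typing import Any, Dict, List, Mapping


def extract_answers(response: Mapping[str, Any]) -> Dict[str, List[str]]:
    """Collect the `answer` field of each hit, keeping Elasticsearch's ranking order."""
    hits = response["hits"]["hits"]
    return {"answers": [hit["_source"]["answer"] for hit in hits]}
```

Hits arrive sorted by score, so the first answer in the list is the one whose question vector is closest to the query.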

And that's all we have 🍻
