Tutorial: Using PostgreSQL as a Vector Database for LangChain

In my tests, using PostgreSQL as a vector database was roughly twice as fast as dedicated vector databases like Pinecone or Weaviate. I'd been using Pinecone for a year, but recently found pgvector to be faster and more efficient. The open-source nature of pgvector allows for better control and simplicity, making it the clear winner for me. I even built a custom pgvector service for querying multiple collections. If you're working on large language model applications, give pgvector a shot. Check out Supabase for a managed PostgreSQL database with the pgvector extension. It's a game changer! #Efficiency #pgvector #PostgreSQL 🚀

Introduction

Here you can see that this setup is roughly twice as fast, and we can also use a database we're already familiar with to manage our vectors as well as our documents. For most people building applications with large language models, using pgvector is actually better than using a dedicated vector database like Pinecone or Weaviate. I've been working with large language models for about a year now, and I think this image is a good overview of what a typical architecture looks like at a high level.

Transitioning to pgvector

About one year ago, I started using Pinecone as my primary vector database for AI applications. However, as some of those applications transitioned from proof of concept to production, managing the data held in the vector database became more important. This prompted me to reconsider my data pipelines and vector databases, and to research alternative ways of working with these vectors.

Comparison Between pgvector and Pinecone

As I was doing some research, I saw others running into issues with Pinecone and switching to pgvector, which got me interested. pgvector is an extension for PostgreSQL that adds vector similarity search. A comparison published by Supabase found pgvector to be faster than Pinecone, which led me to start experimenting with it to understand the pros and cons.
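To make "similarity search" concrete: pgvector ranks rows by a distance between embedding vectors (its `<=>` operator computes cosine distance). Here's a minimal pure-Python sketch of that computation; the vectors and document names are made up for illustration:

```python
import math

def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity), the metric behind
    pgvector's <=> operator. Smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy 2-dimensional "embeddings" -- real ones have hundreds of dimensions.
query = [1.0, 0.0]
docs = {"doc_a": [1.0, 0.0], "doc_b": [0.0, 1.0]}

# Rank documents by ascending distance to the query, as the database would.
ranked = sorted(docs, key=lambda d: cosine_distance(query, docs[d]))
print(ranked)  # doc_a comes first: it points the same way as the query
```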

Demonstrating the Setup

I'm definitely not an expert on this, but I have found that from a speed and simplicity perspective, pgvector is the clear winner for me. Let's walk through some examples to showcase the difference in setup and performance between Pinecone and pgvector.

Pinecone vs. pgvector Setup

I'm going to load up a Python session and load some data to demonstrate the differences between Pinecone and pgvector.
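Before either store can be queried, the documents have to be split into chunks that get embedded individually. As a rough illustration of what LangChain's text splitters do, here is a naive fixed-size splitter with overlap; the chunk sizes and the placeholder "ebook" text are arbitrary:

```python
def split_text(text, chunk_size=100, overlap=20):
    """Naive fixed-size character splitter with overlap -- a stand-in
    for LangChain's RecursiveCharacterTextSplitter."""
    chunks = []
    step = chunk_size - overlap  # advance less than chunk_size so chunks overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

ebook = "word " * 60  # placeholder for the loaded document (300 characters)
chunks = split_text(ebook, chunk_size=100, overlap=20)
print(len(chunks), "chunks; first chunk is", len(chunks[0]), "characters")
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.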

| Pinecone | pgvector |
| --- | --- |
| Split all the documents and upsert them into the index | Connect to the database and create a store for similarity search |
| Load data by connecting to a Pinecone account using an API key | Query the PGVector object to perform a similarity search |
| Run a similarity search | Retrieve the first chunk of the ebook and calculate the time it takes to run the query |

Performance and Benefits

The results of my experiments showed pgvector to be almost twice as fast as Pinecone, which makes it a more efficient choice for managing data in AI applications. It also provides better control over, and visibility into, the data within the application, making it my preferred option.

Custom pgvector Service

I built a custom pgvector service with a custom similarity search that returns scores and can run over multiple collections, providing a powerful way to manage and query data effectively.
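The multi-collection idea can be sketched in pure Python: score every row in every collection against the query vector and return the top hits with their collection name and score. The collection names, texts, and toy 2-dimensional vectors below are invented for illustration; a real version would push this into SQL against pgvector tables:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def similarity_search_with_scores(collections, query_vector, k=2):
    """Search every collection and return the top-k
    (collection, text, score) triples, highest similarity first."""
    scored = []
    for name, rows in collections.items():
        for text, vector in rows:
            scored.append((name, text, cosine_similarity(query_vector, vector)))
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:k]

collections = {
    "manuals": [("install guide", [1.0, 0.0]), ("faq", [0.6, 0.8])],
    "blog":    [("release notes", [0.0, 1.0])],
}
hits = similarity_search_with_scores(collections, [1.0, 0.0], k=2)
print(hits)
```

Keeping the collection name in each result is what lets one query span several logical datasets while still telling you where each hit came from.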

Setting Up a PostgreSQL Database

To set up a PostgreSQL database, you can use a managed service like Supabase or deploy your own locally or on a server. Experimenting with these options is essential to understand their benefits and feasibility for your specific use cases.
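Whichever option you choose, the application ultimately just needs a connection string. A minimal sketch of building one, assuming the common SQLAlchemy/psycopg2 URL format; the host, database name, and credentials here are placeholders, not real values:

```python
def postgres_connection_string(user, password, host, port, db):
    """Build a SQLAlchemy-style URL for a PostgreSQL database
    (one with the pgvector extension enabled, in our case)."""
    return f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{db}"

# Placeholder credentials for a local database -- substitute your own.
conn = postgres_connection_string("postgres", "secret", "localhost", 5432, "vectors")
print(conn)
```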

Conclusion

Using pgvector as a vector database has proven faster, more efficient, and easier to manage than dedicated vector databases like Pinecone, at least in my experiments. This experience has given me valuable insights that I am sharing with you to help improve your AI projects. If you're working on large language model applications, I highly recommend giving pgvector a shot to see if it aligns with your requirements. If you're interested in more AI lessons, or lessons learned working on generative AI projects, make sure to subscribe for more content. 📈
