The rise of semantic search: a comparison of vector databases

David Wood & Marcel Marais

Jul 12, 2023

In a world of ever more complex data and AI-powered applications, databases have taken on a new challenge: storing and querying the outputs of deep learning models. One of the latest innovations in this space is the vector database.

What exactly is a vector database?

A vector database is a specialised type of database designed to store high-dimensional vectors (also called embeddings), which are mathematical representations of an AI model's “understanding” of the data it’s received.

The video below, and any image you see like this, simply helps us visualise embeddings in a 3D space that the human mind can interpret. In reality these embeddings have hundreds of dimensions (e.g. 512), which means they are impossible to visualise in our 3D world.
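
As a concrete illustration, here is a minimal sketch of producing one such embedding in Python with the sentence-transformers library (the model name is just an example of a model with 512-dimensional output; any embedding model would do):

```python
# Minimal sketch: turn a piece of text into an embedding.
# "clip-ViT-B-32" is just an example of a model with 512-dimensional output.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("clip-ViT-B-32")

embedding = model.encode("blue cotton shirt")
print(embedding.shape)  # (512,) -- far too many dimensions to visualise directly
```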

What do vector databases enable?

Fundamentally, vector databases enable “semantic retrieval”: instead of traditional keyword-based searches, queries can be matched against the actual content of a product and the context of the query.

Imagine searching for a product not just by its name or tag, but by the essence of its features (which may or may not be accurately described by the retailer but can be understood by AI) or even by a related feeling or mood. Instead of looking up a "blue cotton shirt," users might seek a product that feels "summery" or "cosy", and the system would understand and match this request semantically.

Vector databases significantly boost the speed of such complex searches by indexing embeddings for fast, approximate nearest-neighbour lookup.
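
To make semantic retrieval concrete, here is a toy sketch of the idea: embed a few product descriptions and a free-text query, then rank by cosine similarity. A vector database performs the same nearest-neighbour step, but over millions of items and with approximate indexes for speed. The model and product list here are illustrative only.

```python
# Toy semantic retrieval: rank products by how close their embeddings
# are to the embedding of a free-text query.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("clip-ViT-B-32")  # example model only

products = ["blue cotton shirt", "linen beach dress", "wool winter scarf"]
product_vecs = model.encode(products, normalize_embeddings=True)

query_vec = model.encode("something summery", normalize_embeddings=True)

# With unit-length vectors, cosine similarity is just a dot product.
scores = product_vecs @ query_vec
for score, text in sorted(zip(scores, products), reverse=True):
    print(f"{score:.3f}  {text}")
```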

What does this mean in e-commerce?

In e-commerce, this means that customers can find products that match their desires more intuitively and quickly, leading to a smoother and more personalised shopping experience.

In the vast world of e-commerce, we think of vectors as a unique language that gives voice to the distinctiveness of product descriptions and images.

What to Look for in a Vector Database?

These are some of our priorities when deciding on a vector database.

  • Performance ⚡: Can it efficiently query millions of products in a fraction of a second?

  • Maturity ⏳: Given the investor hype around many new vector databases, businesses would ideally want something proven and reliable.

  • Ease of Use 🤷‍♂️: We're trying to avoid an overly complex setup or steep learning curve.

  • Native modelling support 🧩: An abstraction layer between the vector database and the model helps simplify code (see the sketch after this list).
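
On that last point, "native modelling support" roughly means the database (or its client) can own the embedding model for you. Below is a hedged sketch of the kind of abstraction we mean, with entirely hypothetical names and a plain Python list standing in for the real vector store:

```python
# Hypothetical sketch of a thin model/store abstraction: callers deal only in
# raw text; embedding and (toy) storage are handled behind one interface.
from sentence_transformers import SentenceTransformer


class SemanticIndex:
    """Hypothetical wrapper -- not any vendor's API."""

    def __init__(self, model_name: str = "clip-ViT-B-32"):
        self.model = SentenceTransformer(model_name)
        self.ids: list[str] = []
        self.vectors: list = []  # stand-in for a real vector database collection

    def add(self, item_id: str, text: str) -> None:
        self.ids.append(item_id)
        self.vectors.append(self.model.encode(text, normalize_embeddings=True))

    def search(self, query: str, top_k: int = 5) -> list[tuple[str, float]]:
        query_vec = self.model.encode(query, normalize_embeddings=True)
        scores = [(item_id, float(vec @ query_vec))
                  for item_id, vec in zip(self.ids, self.vectors)]
        return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]
```

When this kind of support is built into a vector database's client, application code shrinks to add and search calls, and the choice of model becomes a configuration detail rather than plumbing.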

The contenders

Pinecone

✅ Pros

  • Relatively new but has large backing from investors.

  • Well-documented

  • Offers ultra-low query latency even with billions of items.

  • Provides a managed solution easing scaling and replication tasks.

  • Cloud-native with integrations in platforms like GCP.

  • Solid integration with programming languages like Python (see the sketch below).

❌ Cons

  • Managed services come with a cost. For large volumes of data, Pinecone might be a costly option.

  • Closed source.
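
As an example of that Python integration, here is a hedged sketch of indexing and querying with Pinecone, based on how the Python client looked around mid-2023. The API key, environment, index name and embedding model are placeholders, and the client API has evolved since, so treat this as a sketch rather than a reference:

```python
# Hedged sketch of Pinecone usage with the v2-era Python client (circa 2023).
# Index name, environment and embedding model are placeholders.
import pinecone
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("clip-ViT-B-32")  # 512-dimensional embeddings

pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")
if "products" not in pinecone.list_indexes():
    pinecone.create_index("products", dimension=512, metric="cosine")
index = pinecone.Index("products")

# Upsert an embedding together with some metadata.
index.upsert(vectors=[
    ("sku-123", model.encode("blue cotton shirt").tolist(),
     {"title": "blue cotton shirt"}),
])

# Query with the embedding of a free-text search.
results = index.query(
    vector=model.encode("something summery").tolist(),
    top_k=5,
    include_metadata=True,
)
print(results.matches)
```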

✅ Pros

  • New but financially backed

  • Fully open-source.

  • Minimal setup required to get started.

❌ Cons

  • Less mature documentation and support.

  • Less robust support for different types of AI models.

  • Deployment may be challenging. 

  • Lacks comprehensive support for multi-modal models.

✅ Pros

  • Available in both open-source and managed versions.

  • Mature with advanced monitoring and replication capabilities.

  • Comprehensive documentation and strong support for popular LLMs and multi-modal models like CLIP.

  • Offers unique search features, such as moving towards semantic concepts.

❌ Cons

  • Resource-intensive if a lot of data needs to be stored.


Google Vertex AI Matching Engine

✅ Pros

  • Scalable to billions of embeddings

  • Seamless GCP integration

  • High flexibility if configured correctly

  • First-class support for new Google models (PaLM)

❌ Cons

  • Complex setup

  • Lacks inherent modelling support

  • More suited to enterprise needs, which might make it expensive for startups and small businesses.