Mapping shopping journeys with the help of AI

Moonsift is currently building the world’s first shopping Copilot for the entire internet. Understanding how people shop online, their habits and all the interactions that take place on the ‘path-to-purchase’ is a key part of achieving this. We have therefore been modelling the shopping journeys of some of Moonsift’s ‘power’ users with the aim of better understanding:

The intricate details of how people discover new products online
How much time and effort (searching, filtering, scrolling and clicking) goes into finding a potential purchase

If you haven’t used Moonsift before, it’s a browser plugin and app used by thousands of online shoppers to keep track of all the items they’re considering buying. They can save products directly from any retailer or marketplace in the world, compare them side by side in one curated space and share their decision-making process with others.

Here’s what the month of September looked like for one typical Moonsift user (let’s call her Shopper X), who agreed to let us share her shopping journey here.

An overview of September

When researching what to buy online, our data shows that it’s not uncommon for people to browse tens of thousands of products over many days, weeks, months and sometimes even years before landing on the item they want to buy.

This particular user browsed over 15k products from 80 different retailers in September alone. Of these 15k items she:

Clicked on 439 to consider in more detail
Shortlisted 97 products to her Moonsift collections

Doing the maths, this means that of the 15k products she browsed, less than 0.7% were considered worthy of shortlisting for further consideration. To gain a clearer picture, let's explore where she dedicated most of her browsing time:

Here are some insights from the above graphs:

Frequency: Shopper X undertook some level of shopping research on 19 out of the 30 days in September. On the days she was actively browsing, she viewed on average 800+ items.
Retailer Loyalty vs. Exploration: While she has clear preferences, as seen with "www.marksandspencer.com" and "www.libertylondon.com", the significant portion labelled "Other (74)" suggests that it takes a large number of retailers to meet her discovery needs.
Large Marketplaces vs. Niche Sites: With Amazon at only 10.3%, it appears that while she does visit major retail platforms, she still spends the majority of her time on smaller stores. This is typical of our shoppers.
Weekly habits: The regular intervals between browsing spikes hint at a pattern or routine in her shopping behaviour. Interestingly, Tuesday (particularly the 12th, 19th and 26th of September) was by far her preferred day for shopping.
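
The headline figures in this section can be reproduced with some quick arithmetic. The sketch below uses only the numbers quoted above and treats "over 15k" as a floor of exactly 15,000 items (an assumption), so the percentages are upper bounds and the per-day average is a lower bound.

```python
# Figures quoted in the article. "Over 15k" is treated as exactly 15,000
# here (an assumption), so the rates below are upper bounds.
browsed = 15_000
clicked = 439
shortlisted = 97
active_days = 19

click_rate = clicked / browsed * 100          # % of browsed items clicked
shortlist_rate = shortlisted / browsed * 100  # % of browsed items shortlisted
saved_per_click = shortlisted / clicked * 100 # % of clicked items shortlisted
items_per_active_day = browsed / active_days  # average on days with activity

print(f"Clicked:              {click_rate:.2f}% of browsed items")
print(f"Shortlisted:          {shortlist_rate:.2f}% of browsed items")
print(f"Shortlisted if clicked: {saved_per_click:.1f}%")
print(f"Items per active day: {items_per_active_day:.0f}")
```

At the 15,000 floor this gives about 0.65% shortlisted (under the 0.7% quoted) and roughly 790 items per active day; the quoted 800+ follows once the true total is a couple of hundred items above 15k.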

Which products are being looked at?

Given the wide range of retailers and products we support, understanding and visualising this shopping data can be tough. To make it easier to figure out what this user was looking for, we made a “word cloud” using the product descriptions she checked out in September. The bigger the word in the cloud, the more often it showed up in the descriptions she viewed.

Here are a few things we learned about Shopper X’s preferences just by looking at the simple figure above:

👗 Fashion-Focused: Words like "Midi Dress", "Shirt" and "Jacket" suggest a strong focus on clothing in September.
🧵 Material and Textures: Words like "Silk", "Wool", "Textured" and "Organic" give an idea of X’s preference for specific materials or textures when selecting products.
🛍️ Product Features: The presence of words like "Floral", "Sleeved", "Pleated" and "Oversized" might hint at specific design or style features that X is interested in.
🎨 Colour Preferences: The appearance of "Brown" and "White" suggests a preference for these colours.
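
The word cloud described above boils down to counting how often each word appears across product descriptions and scaling the word size by that count. Here is a minimal sketch of the counting step, assuming a handful of invented descriptions and a tiny hand-picked stop-word list; neither comes from Moonsift's data or tooling.

```python
import re
from collections import Counter

# Hypothetical product descriptions, invented for illustration only.
descriptions = [
    "Floral silk midi dress",
    "Oversized white cotton shirt",
    "Brown wool jacket",
    "Pleated midi dress in organic cotton",
    "White textured shirt with long sleeves",
]

# A tiny stop-word list (assumption); a real pipeline would use a fuller one.
STOP_WORDS = {"in", "with", "and", "the", "a", "of"}


def word_frequencies(texts):
    """Count how often each non-stop word appears across all texts."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


frequencies = word_frequencies(descriptions)

# The most frequent words would be drawn largest in the cloud.
for word, count in frequencies.most_common(5):
    print(f"{word}: {count}")
```

Multi-word phrases such as "Midi Dress" would need the same count applied to adjacent word pairs rather than single words.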

Moonsift is uniquely positioned to learn about these preferences as it supports customers on their entire shopping journey from start to finish.

What it Takes to Find the Perfect Shirt

Based on the word cloud above, we learned that one of Shopper X’s missions in September was to find a shirt. In fact, she looked at over 2,300 shirts but ended up saving only 24. Curious about this, we focused on one of her busiest shopping days and narrowed our search to "shirts" alone.

To delve deeper into what made certain shirts stand out to Shopper X, we used the way generative AI understands products, in a process known as “embedding”. Simply put, embedding involves converting each product into a set of numbers that capture its essence. Imagine trying to describe a shirt by its colour, fabric, style, and brand. Embedding does this, but in a mathematical way, transforming these descriptions into coordinates in a high-dimensional space.

Visualising data in this high-dimensional ‘embedding’ space is not possible for humans, but that's where UMAP (Uniform Manifold Approximation and Projection) comes in. UMAP helps us reduce these high dimensions to a 2D or 3D space, making it possible to visualise and interpret the data. Think of it like translating a complex idea into a simple graph. By using UMAP, we can gain insights into what exactly caught Shopper X's eye, helping us understand her unique shopping preferences.

In this 3D dimensionality-reduced space, the grey dots are products she viewed and the yellow dots are products she saved to Moonsift. Some interesting observations:

Clusters of Interest: There are distinct clusters of shirts within the graph. These clusters suggest groups of products with similar attributes or features. Notably, the saved (yellow) dots are primarily located within certain clusters, indicating a preference or interest in those specific attributes.
Saved vs. Unsaved: The yellow dots are not randomly dispersed throughout the space. This suggests that the decision to save a product is not arbitrary but is influenced by certain features that these products share.

To dive deeper into the characteristics that make certain products stand out, we embedded product attributes such as "short sleeves" into the same multidimensional space as the products themselves, to allow for comparison.

To do this comparison we used the dot product, a mathematical operation that takes two equal-length sequences of numbers. In our case, these are the attribute embedding ("short sleeves") and the product embedding (one of the shirts Shopper X looked at). It returns a single number, which provides a measure of similarity.

Therefore, when we take the dot product of the product embedding with specific attributes, we essentially measure how closely aligned a product is to a particular attribute. The higher the result, the more the product possesses that attribute. We can also say that we are projecting the product into dimensions defined by attributes.
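
As a rough sketch of the idea, the dot product and the "projection into attribute dimensions" can be written in a few lines. The 4-dimensional vectors, product names and the `project` helper below are all made up for illustration; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
def dot(a, b):
    """Dot product of two equal-length sequences of numbers."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    return sum(x * y for x, y in zip(a, b))


# Toy attribute embeddings (hypothetical values, not model output).
attributes = {
    "dark blue": [0.9, 0.1, 0.0, 0.2],
    "light blue": [0.1, 0.9, 0.0, 0.2],
}

# Toy product embeddings (hypothetical values, not model output).
products = {
    "navy oxford shirt": [0.8, 0.2, 0.1, 0.3],
    "sky blue linen shirt": [0.2, 0.7, 0.1, 0.3],
}


def project(product_embedding, attribute_embeddings):
    """Score a product against each attribute: one coordinate per attribute."""
    return {
        name: dot(product_embedding, embedding)
        for name, embedding in attribute_embeddings.items()
    }


for name, embedding in products.items():
    scores = project(embedding, attributes)
    print(name, {attr: round(score, 2) for attr, score in scores.items()})
```

Each product ends up with one score per attribute, which can be used directly as plot coordinates. Note that the dot product also grows with vector length, so scores are only comparable across products if the embeddings have similar magnitudes or are normalised first; the article does not say which applies here.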

To illustrate this, here’s a simple example. These are some of Shopper X’s shirts when projected into the dimensions of “dark blue” and “light blue”.

There also seems to be a mix of long and short sleeve shirts in her browsing, so let’s project into “sleeve length” space and see what that looks like:

It’s clear that two distinct clusters have formed, showing us that our multi-modal model has a great understanding of this attribute.

It would be interesting to see if this user was on two distinct journeys when looking for long and short sleeve shirts or if she was looking for them at more or less the same time. To investigate this we can connect each product with a line in a shade of blue. Each product is connected to the previous product in order of time (the shade of blue also gets darker with time). We can then check to see if there is a general transition of lines from one cluster to the next.

It seems like she was looking for both at the same time because the lines move rapidly back and forth between the long and short sleeve clusters.
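
The same "back and forth" pattern can also be summarised numerically by counting how often consecutive products, in order of time, fall into different clusters. This is a sketch under stated assumptions: the two label sequences are invented, and `count_switches` and `switch_rate` are hypothetical helpers, not part of the analysis described above.

```python
def count_switches(labels):
    """Count consecutive pairs whose cluster labels differ."""
    return sum(1 for prev, curr in zip(labels, labels[1:]) if prev != curr)


def switch_rate(labels):
    """Fraction of consecutive steps that cross from one cluster to another."""
    if len(labels) < 2:
        return 0.0
    return count_switches(labels) / (len(labels) - 1)


# Hypothetical cluster labels for viewed shirts, ordered by view time.
interleaved = ["long", "short", "long", "long", "short", "long", "short", "short"]
sequential = ["long", "long", "long", "long", "short", "short", "short", "short"]

print(f"interleaved: {count_switches(interleaved)} switches, "
      f"rate {switch_rate(interleaved):.2f}")
print(f"sequential:  {count_switches(sequential)} switches, "
      f"rate {switch_rate(sequential):.2f}")
```

A high switch rate corresponds to lines that jump repeatedly between clusters, as in the plot described above; two separate journeys would instead show a single crossing.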

What's next?

We are only just scratching the surface of what we can learn about shoppers’ discovery journeys by applying the semantic understanding that is now possible with large multi-modal models. We are already exploring these journeys in more depth and have created tools that allow Moonsift users to explore the embedding space shown here to supercharge their shopping. These tools will play a key role in the foundations of Moonsift’s AI Shopping Copilot.
