DataDrivenInvestor

Recommendation System with Graphs at eCommerce and Foodtech: Flipkart, UberEats, Swiggy, Instacart, Delivery Hero, Amazon, etc.

Graph Recommendation System

Disclaimer: I do not endorse any brand. Names such as Flipkart, Walmart, Instacart, Delivery Hero, and Amazon are used only to make the project relatable and to show which kinds of business can use this approach. It can be applied at any eCommerce, grocery, or foodtech company. In this blog, I will use Flipkart as the primary example.

Problem Statement

Flipkart is one of the largest eCommerce platforms with millions of products listed, 100M+ downloads, and millions of daily active users (DAU).

To improve user experience through personalization, increase platform diversity, and boost sales, a state-of-the-art recommendation system is needed.

One-line goal: Build a recommendation system for an eCommerce platform.

Business Objective

  • Improve user experience with personalization
  • Improve platform diversity and discovery of long-tail products

Detailed Explanation <Videos> and Code — HERE

Problem Statement and Business Objective Understanding

What is Long Tail?

“The long tail is a business strategy that allows companies to realize significant profits by selling low volumes of hard-to-find items to many customers, instead of only selling large volumes of a reduced number of popular items.” — Investopedia Link

Understanding User Journey and Data Input

In an eCommerce setup, a user journey has three main components: product page (PDP) visit/view, add to cart (ATC), and transaction.

User Journey — Flipkart

Data Description

The data shared with us comes from Flipkart's logs and covers the journeys of all users active during one month, recorded as three event types: PDP view, add to cart (ATC), and purchase.

  • event_time: UTC timestamp of the event
  • event_type: view, cart, purchase
  • product_id: unique id assigned to a product on the platform
  • category_id: category id of the product
  • category_code: category mapping for a product
  • brand: product brand name
  • price: the price of the product on which the event (view/cart/purchase) happened
  • user_id: unique id of the user who performs an event on the platform
  • user_session: UUID assigned to each user session (a session ends after 30 minutes of inactivity)

Dataset provided: raw_data.csv.gz, which contains 55M+ event entries for one month (January 2020).
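
To make the schema concrete, here is a minimal sketch of how the view events could be grouped into per-session product sequences. The sample rows, the timestamp string format, and the helper name session_view_sequences are my assumptions for illustration; only the column names come from the data description above.

```python
import csv
import io
from collections import defaultdict


def session_view_sequences(rows):
    """Group 'view' events by user_session, ordered by event_time."""
    sessions = defaultdict(list)
    for row in rows:
        if row["event_type"] == "view":
            sessions[row["user_session"]].append(
                (row["event_time"], row["product_id"])
            )
    # Assumption: timestamps share one fixed format, so they sort as strings.
    return {
        session_id: [product_id for _, product_id in sorted(events)]
        for session_id, events in sessions.items()
    }


# Tiny made-up sample in the documented column layout (not real data).
SAMPLE_CSV = """event_time,event_type,product_id,category_id,category_code,brand,price,user_id,user_session
2020-01-01 00:00:05 UTC,view,B,1,electronics.smartphone,acme,110.0,u1,s1
2020-01-01 00:00:01 UTC,view,A,1,electronics.smartphone,acme,100.0,u1,s1
2020-01-01 00:00:09 UTC,cart,B,1,electronics.smartphone,acme,110.0,u1,s1
2020-01-01 00:01:00 UTC,view,C,1,electronics.smartphone,acme,120.0,u2,s2
"""

sequences = session_view_sequences(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(sequences)  # {'s1': ['A', 'B'], 's2': ['C']}
```

For the real file, the same function could be fed csv.DictReader(gzip.open("raw_data.csv.gz", "rt", newline="")). At 55M+ rows, a chunked or dataframe-based approach would likely be more practical than plain Python dictionaries.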

Events Distribution

Event Distribution on Platform
Data set

Expected Outcome

A list of product recommendations triggered by a user's event, such as a product page view.

Recommendations

Product page recommendations for the parent product <Samsung TV 32-inch>: other similar products.

Why Use a Graph to Build a Recommendation System?

By this point, we have a fair understanding of the problem and of the user journey on an eCommerce platform.

To build intuition for why graphs suit this problem, we need to look more closely at the psyche and intent of a user on an eCommerce website.

A user visits an eCommerce site with an intent. Say the intent is to buy a mobile phone: the user's PDP views will then hop across different mobile phones. This journey can be mapped to a graph.

The figure shows a user's journey of product page views. A, B, C, D, E, and F are different products viewed in sequential order, and a dotted line separates different sessions.

The graph of the above user behavior sequence looks as shown below.

This is an unweighted directed graph in which each node is a product and each edge connects two products viewed in sequence.
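
A minimal sketch of this construction, assuming edges are drawn only between consecutive views inside the same session (never across the dotted session boundary) and that repeated views of the same page are ignored. The sessions and the helper name build_view_graph are made up for illustration.

```python
def build_view_graph(session_sequences):
    """Return adjacency sets: graph[a] contains b if b was viewed right after a."""
    graph = {}
    for sequence in session_sequences:
        # Every viewed product becomes a node, even if it has no outgoing edge.
        for product in sequence:
            graph.setdefault(product, set())
        # Consecutive views within one session become directed edges.
        for src, dst in zip(sequence, sequence[1:]):
            if src != dst:  # assumption: skip self-loops from page refreshes
                graph[src].add(dst)
    return graph


# Hypothetical sessions; each inner list is one session's ordered PDP views.
sessions = [["A", "B", "C"], ["B", "D"], ["D", "E", "F"]]
graph = build_view_graph(sessions)

for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Using sets keeps the graph unweighted: viewing B right after A many times still yields a single A -> B edge. A weighted variant would count transitions instead.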

Graph to Embedding

Graph construction is only half the battle, but it is the core of the problem. If graph construction goes wrong, all the following steps go to waste.

There are multiple algorithms for turning a graph into embeddings, i.e. a vector representation of each graph node; in our case, a node is a product.

Random Walk Algorithms for Graph to Embedding

  • DeepWalk
  • node2vec

Graphical representation of step-wise implementation.
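
To show what a random walk over the product graph looks like, here is a plain-Python sketch of uniform (DeepWalk-style) walks. node2vec adds two parameters, p and q, that bias each step; with p = q = 1 the walks are uniform like these. The toy graph, the function name random_walks, and the walk settings are assumptions for illustration, not the settings used in the project.

```python
import random


def random_walks(graph, num_walks, walk_length, seed=0):
    """Generate uniform random walks starting from every node."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)  # visit start nodes in a fresh order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                # Sorting makes the result reproducible for a given seed.
                neighbors = sorted(graph.get(walk[-1], ()))
                if not neighbors:  # dead end in a directed graph
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Toy directed graph: product -> set of products viewed right after it.
graph = {
    "A": {"B"},
    "B": {"C", "D"},
    "C": set(),
    "D": {"E"},
    "E": {"F"},
    "F": set(),
}
walks = random_walks(graph, num_walks=2, walk_length=4)
print(len(walks), "walks, first one:", walks[0])
```

Each walk reads like a sentence whose words are product ids, which is what lets a word embedding model consume it in the next step. On a graph with millions of products, an optimized library is the practical choice over a loop like this.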

Open Source Modules Used:

  • Random walk generation: I highly recommend trying the open-source PecanPy module.
  • Random walks to embeddings: Gensim Word2Vec, using skip-gram with negative sampling

Visualization

Once we had a vector representation for each product on the platform, we used the UMAP dimensionality reduction algorithm to visualize the vectors in 2D and 3D.

UMAP (Uniform Manifold Approximation and Projection)

Uniform Manifold Approximation and Projection (UMAP), introduced in 2018 by Leland McInnes, John Healy, and James Melville, is a general-purpose manifold learning and dimension reduction algorithm.

UMAP is a nonlinear dimensionality reduction method that is very effective for visualizing clusters or groups of data points and their relative proximities.

A significant difference from t-SNE is scalability: UMAP can be applied directly to sparse matrices, which eliminates the need for a dimensionality reduction step such as PCA or truncated SVD (Singular Value Decomposition) as pre-processing.

2d visualisation
3d visualisation

From the above two visuals, we can observe that many similar products are clustered together within their product category.

Recommendation Output

For a given product id <parent product id>, generate a list of similar product recommendations.

For example, the parent product id <17303153> belongs to apparel.shoes.sandals. Its similar-product recommendations are computed by looking up its nearest neighbors in the embedding space with an approximate nearest neighbor library such as ScaNN or FAISS.
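
To illustrate what the lookup computes, here is an exact brute-force version in plain Python. It is a stand-in, not the ScaNN or FAISS API: those libraries approximate the same top-k search so that it scales to millions of products. Cosine similarity as the metric, the 2-d toy vectors, and the product ids are my assumptions for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k_similar(embeddings, parent_id, k=10):
    """Return the k product ids most similar to parent_id (exact search)."""
    parent = embeddings[parent_id]
    scored = [
        (cosine_similarity(parent, vector), product_id)
        for product_id, vector in embeddings.items()
        if product_id != parent_id  # never recommend the parent itself
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [product_id for _, product_id in scored[:k]]


# Made-up 2-d embeddings; real ones come from the Word2Vec step.
embeddings = {
    "sandal_1": [0.9, 0.1],
    "sandal_2": [0.8, 0.2],
    "sneaker_1": [0.6, 0.5],
    "tv_1": [0.0, 1.0],
}
print(top_k_similar(embeddings, "sandal_1", k=2))  # ['sandal_2', 'sneaker_1']
```

Brute force compares the parent against every product, which is fine for a demo but too slow to run per request on a large catalog. That cost is the reason for using an approximate index.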

Similar Product Recommendation

Conclusion

We convert clickstreams into a co-occurrence graph and then apply random walk techniques to get a vector representation for each product on the platform. Recommendations for each <parent> product are generated by looking up its top 10 nearest neighbors, i.e. the products most similar to the parent.

I hope you learned something new from this post. If you liked it, hit 👏 and share this with others. Stay tuned for the next one!

Connect, Follow or Endorse me on LinkedIn if you found this read useful. To learn more about me visit: Here

Video Course with Code Walk Through — PART1 || PART2
