Similar Products Recommendation and Ranking of Products
Overview
- Recommendation engines are omnipresent nowadays and play a very important role in a customer's buying decision.
- Word embeddings from BERT, Word2Vec, and TF-IDF are extremely popular and are used for a wide variety of Natural Language Processing tasks.
- In this blog we explain our approach to recommending and ranking similar products to users on product pages using BERT.
Introduction
Recommendation is an important part of an e-commerce platform and has always helped customers in their buying decisions. It is a powerful acquisition channel and enhances the customer's experience.
As an example, consider a multivitamin tablet (Neurobion Forte Tablet) on our platform.
In this article, we are going to build our own system for recommending and ranking similar products, but we'll approach it from a unique perspective: we will share two different versions of our similar-product recommendation. Let's dive straight into it.
This article uses a few concepts that you should be aware of. I recommend taking a look at these:
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Corpus-based and Knowledge-based Measures of Text Semantic Similarity
Table of Contents
1. Approach (Data Preparation)
 - Catalog Details (the OTC product knowledge we have)
 - Which embedding would be best?
2. Version 1: TF-IDF based approach
3. Version 2: How BERT helped in ranking our similar-product list
 - Cold Start Problem
 - Why BERT?
Approach:
We aim to recommend OTC (Over the Counter) products to our customers. For each OTC product we have a lot of side information that uniquely defines it; this is our base data.
Catalog Details:
- SKU (Stock Keeping Unit) Name
- SKU Description
- Brand Name
- Manufacturer Name
- Product Variant
- Product Tags
- Product Usage Details
- Product Type
- Product Flavour
- Product Colour
- Potency
- Target Age Group
- Gender Details
All these features together help us define a product. They are in the form of text, so they need to be embedded using a suitable document embedding.
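As a minimal sketch of this data-preparation step (assuming the catalog sits in a pandas DataFrame; the column names below are hypothetical stand-ins for the fields listed above), the textual features can be concatenated into one document per product:

```python
import pandas as pd

# Hypothetical catalog rows with the textual features listed above
catalog = pd.DataFrame([
    {"sku_name": "Neurobion Forte Tablet",
     "sku_description": "Vitamin B complex supplement",
     "brand_name": "Neurobion",
     "manufacturer_name": "Merck",
     "product_type": "Multivitamin",
     "product_tags": "vitamin b12 nerve health",
     "target_age_group": "Adult"},
])

TEXT_COLUMNS = ["sku_name", "sku_description", "brand_name", "manufacturer_name",
                "product_type", "product_tags", "target_age_group"]

def build_product_document(row: pd.Series) -> str:
    """Concatenate all textual catalog fields into one document per product."""
    return " ".join(str(row[c]) for c in TEXT_COLUMNS if pd.notna(row[c]))

catalog["document"] = catalog.apply(build_product_document, axis=1)
```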
Word Embedding: a collective term for models that learn to map words or phrases in a vocabulary to vectors of numerical values.
Document embeddings are different from word embeddings in that they give you one embedding for an entire text, whereas word embeddings give you embeddings for individual words.
Q. Which word embedding would be best to represent our text features?
→ For our use case TF-IDF based product embedding worked the best.
Version 1: TF-IDF (in early 2018)
We have around 1 lakh+ (100,000+) OTC products, and for each of them we have a TF-IDF embedding.
To find the most similar products we used a cosine-similarity-based approach (explained in detail here: Corpus-based and Knowledge-based Measures of Text Semantic Similarity).
We picked the top 30 most similar products and ranked them by similarity score.
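Here is a minimal sketch of this version, assuming scikit-learn's TfidfVectorizer and cosine_similarity over the product documents built earlier (variable names are illustrative, not our production code):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = catalog["document"].tolist()  # one text document per OTC product

# TF-IDF embedding for every product document
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(documents)

# Pairwise cosine similarity between all products
similarity = cosine_similarity(tfidf_matrix)

def top_similar(product_index: int, k: int = 30) -> list:
    """Indices of the k most similar products, ranked by similarity score."""
    ranked = np.argsort(-similarity[product_index])
    return [i for i in ranked if i != product_index][:k]

similar_list = top_similar(0, k=30)
```

Note that at the scale of 1 lakh+ products the full pairwise similarity matrix becomes expensive, so in practice the similarities would typically be computed in chunks or with an approximate nearest-neighbour index rather than all at once.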
This approach worked very well, and the similar-product recommendation list was welcomed by all. Within a day or two we got around 5k+ clicks on the list.
Version 2: Ranking Approach Based on BERT
The above approach had a drawback: the ranking of products was based purely on similarity score. Hence, to improve the user experience, we planned to rank products based on CTR (click-through rate) data.
On a product page P1, the product from the similar-product list with the maximum CTR should be shown at the top.
Cold Start Problem:
We could have sorted the products directly by CTR, but around 100+ new products are added to our platform every day, and we need time to gather accurate CTR data for them.
Attempt 1:
Based on the TF-IDF product feature embeddings, we decided to train a model to predict the CTR of a product (P2) from the similar list shown on a product page (P1).
A TF-IDF based embedding is very large, and even after dimensionality reduction the model wasn't able to learn the CTR behaviour.
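For reference, attempt 1 looked roughly like the sketch below; the use of TruncatedSVD for dimensionality reduction and a gradient-boosted regressor is an assumption for illustration, as is the `pairs` click-log structure:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import GradientBoostingRegressor

# tfidf_matrix: the sparse, very high-dimensional TF-IDF embeddings from Version 1
svd = TruncatedSVD(n_components=300, random_state=42)
tfidf_reduced = svd.fit_transform(tfidf_matrix)

# pairs: hypothetical click-log triples (p1_index, p2_index, observed_ctr)
X = np.array([np.concatenate([tfidf_reduced[p1], tfidf_reduced[p2]])
              for p1, p2, _ in pairs])
y = np.array([ctr for _, _, ctr in pairs])

# Even with the reduced features, this setup could not capture CTR behaviour well
reg = GradientBoostingRegressor()
reg.fit(X, y)
```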
Attempt 2:
Q. Why BERT?
BERT (Bidirectional Encoder Representations from Transformers) has the ability to capture both the syntactic and semantic sense of text, as shown in Visualizing and Measuring the Geometry of BERT.
BERT gives word embeddings; to get a document embedding, different pooling strategies are used, with mean pooling and max pooling being the most common.
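A small sketch of these pooling strategies with the Hugging Face transformers library (the checkpoint name is a placeholder; in our case it would be the domain fine-tuned model described later):

```python
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "bert-base-uncased"  # placeholder for the fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def pooled_document_embedding(text: str, strategy: str = "mean") -> torch.Tensor:
    """Pool BERT token embeddings (mean or max) into one document embedding."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        token_embeddings = model(**inputs).last_hidden_state  # (1, seq_len, hidden)
    mask = inputs["attention_mask"].unsqueeze(-1)             # ignore padding tokens
    if strategy == "mean":
        return (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
    return token_embeddings.masked_fill(mask == 0, -1e9).max(dim=1).values

doc_vec = pooled_document_embedding("Neurobion Forte Tablet vitamin B complex", "mean")
```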
Besides pooling, we can also use an RNN to get a document embedding. The RNN takes the word embeddings of every token in the document as input and provides its last output state as the document embedding. You can choose which type of RNN (GRU or LSTM) you wish to use.
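A minimal PyTorch sketch of this RNN alternative, feeding token embeddings (e.g. BERT's last hidden states) through a GRU and keeping the final hidden state as the document embedding (the dimensions are illustrative):

```python
import torch
import torch.nn as nn

class GRUDocumentEncoder(nn.Module):
    """Encode a sequence of token embeddings into a single document embedding."""
    def __init__(self, embedding_dim: int = 768, hidden_dim: int = 256):
        super().__init__()
        self.gru = nn.GRU(embedding_dim, hidden_dim, batch_first=True)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, embedding_dim)
        _, last_hidden = self.gru(token_embeddings)
        return last_hidden.squeeze(0)  # (batch, hidden_dim) document embedding

encoder = GRUDocumentEncoder()
doc_embedding = encoder(torch.randn(1, 20, 768))  # dummy sequence of 20 token embeddings
```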
We fine-tuned BERT on our medical-domain dataset (as argued in Universal Language Model Fine-tuning for Text Classification, a domain-specific fine-tuned model works better than a pre-trained model) and used it to obtain product feature embeddings that capture both the syntactic and semantic detail of the text. Another big advantage of BERT is that variable-length textual data can be represented as a fixed-length vector.
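One common way to do this kind of domain adaptation is masked-language-model training on the in-domain text; the sketch below (using Hugging Face's Trainer, with the hyperparameters and the catalog corpus as assumptions rather than our exact setup) shows the general shape of that step:

```python
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Hypothetical in-domain corpus: one text document per OTC product
corpus = Dataset.from_dict({"text": catalog["document"].tolist()})
tokenized = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
    batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="bert-medical-mlm",
                         num_train_epochs=3,
                         per_device_train_batch_size=16)

Trainer(model=model, args=args,
        train_dataset=tokenized, data_collator=collator).train()
```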
Q. How did we get our perfect-match embedding?
→ We performed various experiments with document embeddings and got the best results with the fine-tuned BERT embedding of the [CLS] token. On a fine-tuned model, the hidden state of [CLS] is a good sentence representation.
→ We trained an ensemble regressor (XGBoost) on CTR data, with the fine-tuned BERT embedding of the [CLS] token (the first token) as features, and got a model score of 87.09% on test data.
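Putting the ranking model together, here is a sketch under the assumptions above (the fine-tuned checkpoint path, the `ctr_pairs` click-log structure, and the XGBRegressor hyperparameters are illustrative):

```python
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from xgboost import XGBRegressor

CHECKPOINT = "bert-medical-mlm"  # placeholder for the domain fine-tuned model
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
bert = AutoModel.from_pretrained(CHECKPOINT)
bert.eval()

def cls_embedding(text: str) -> np.ndarray:
    """Fixed-length product embedding: hidden state of the [CLS] (first) token."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state
    return hidden[:, 0, :].squeeze(0).numpy()

# ctr_pairs: hypothetical click-log triples (p1_document, p2_document, observed_ctr)
X = np.array([np.concatenate([cls_embedding(p1), cls_embedding(p2)])
              for p1, p2, _ in ctr_pairs])
y = np.array([ctr for _, _, ctr in ctr_pairs])

ranker = XGBRegressor(n_estimators=500, max_depth=6, learning_rate=0.05)
ranker.fit(X, y)

# At serving time: score every candidate P2 for page P1 and sort by predicted CTR
```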
Result:
This model learned to predict the CTR of a product P2 shown on product page P1, using the fine-tuned BERT embeddings of P1 and P2 as features, solving our problem of ranking both old and new products in our similar-product recommendation list.