Contextual Offer Recommendation Engine (CORE) - Personalized Offer Recommendations

Business Context

Imagine walking into a Target store or browsing Target.com, and the offers that you see seem like they were picked just for you. This isn’t just a happy coincidence; it’s the result of a fine blend of machine learning and retail strategy.

Target Circle is Target’s loyalty program where members receive personalized spend-based offers called Target Circle Bonus (TCB). CORE recommends customized TCB offers with specific spend and reward dollar amounts for Target Circle members. Multi-trip-based spend threshold offers involve a series of shopping experiences where each trip’s spending must exceed a certain dollar threshold to earn a reward of a specified value.

Our Goal

Our goal is to increase guest engagement and drive trips to Target by determining the most effective offers for our guests.

Solution Overview

Target developed a new system, the Contextual Offer Recommendation Engine (CORE), to recommend personalized offers to each Target Circle guest. Under the hood, CORE is powered by a contextual multi-arm bandit (CMAB) model that optimizes for guest engagement, including offer adds and redemptions. The model is built on a rich custom feature set covering transactions, promotions, and guest behavior.

How CORE Works

Our contextual multi-arm bandit (CMAB) algorithm includes the state of the environment in the decision-making process, allowing context-specific decisions. CORE employs a combination of matrix factorization techniques and CMAB to generate pertinent offers.

1. Non-Negative Matrix Factorization (NNMF)

We start by pulling historical guest-offer interactions to construct an interaction matrix. This matrix is highly sparse because an individual guest may have interacted with only a few Target Circle offers.

We use NNMF to reduce the sparsity of the interaction data because it is well suited to uncovering latent features that represent underlying guest preferences and offer attributes. Because the factors are constrained to be non-negative, they combine additively, which makes the learned structure in guest-offer interactions easier to interpret.

We apply NNMF to factorize the guest-offer matrix into matrices W (user matrix) and H (offer matrix).

The factorization can be represented as follows:

V ≈ W × H^T

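where V is the sparse guest-offer interaction matrix, W is the user (guest) latent-feature matrix, and H is the offer latent-feature matrix.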

In the next steps, the offer latent features (H) serve as the bandit’s per-arm features in the CMAB approach, helping to fine-tune personalized offer recommendations.

The reverse computation can be carried out to find an approximate interaction matrix I’ using the factorized matrices as:

I’ = W × H^T

I’ is a dense approximation of the original interaction matrix that retains most of its significant information. Each offer becomes one arm of the bandit, and guest features become the context.
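The factorization and the reverse computation can be sketched in a few lines. This is a minimal, dependency-free illustration that assumes the classic multiplicative-update rule for NNMF; the source does not say which solver, rank, or library CORE uses, and the toy matrix and helper names (`matmul`, `init_factors`, `nnmf`) are hypothetical.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def squared_error(v, w, h):
    """Squared Frobenius distance between V and W x H^T."""
    approx = matmul(w, transpose(h))
    return sum((x - y) ** 2 for vr, ar in zip(v, approx) for x, y in zip(vr, ar))

def init_factors(n_guests, n_offers, rank, seed=0):
    """Random strictly positive starting factors (assumed initialization)."""
    rng = random.Random(seed)
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_guests)]
    h = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_offers)]
    return w, h

def nnmf(v, w, h, iters=200, eps=1e-9):
    """Multiplicative updates; W and H stay non-negative throughout."""
    rank = len(w[0])
    for _ in range(iters):
        # W <- W * (V H) / (W H^T H)
        vh = matmul(v, h)
        whth = matmul(w, matmul(transpose(h), h))
        w = [[w[i][r] * vh[i][r] / (whth[i][r] + eps) for r in range(rank)]
             for i in range(len(w))]
        # H <- H * (V^T W) / (H W^T W)
        vtw = matmul(transpose(v), w)
        hwtw = matmul(h, matmul(transpose(w), w))
        h = [[h[j][r] * vtw[j][r] / (hwtw[j][r] + eps) for r in range(rank)]
             for j in range(len(h))]
    return w, h

# Hypothetical toy data: 4 guests x 3 offers, 1 = the guest added the offer.
V = [[1, 0, 0],
     [1, 1, 0],
     [0, 1, 1],
     [0, 0, 1]]
W0, H0 = init_factors(n_guests=4, n_offers=3, rank=2)
W, H = nnmf(V, W0, H0)
I_prime = matmul(W, transpose(H))  # dense approximation I' = W x H^T
```

Each row of `H` is the latent feature vector for one offer, which is the kind of per-arm feature the bandit consumes; `I_prime` fills in scores for guest-offer pairs that were zero in `V`.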

Matrix Factorization Process: From sparse interaction matrix V to dense approximation I' using NNMF

Training the neural networks on the dense interaction data I’ rather than on the sparse original gives the loss function more signal when it measures the divergence between predicted and actual values. This makes training more stable, helps guard against overfitting, and improves the model’s accuracy over successive iterations.

2. CORE’s CMAB Workflow

CMAB excels in environments where the reward distributions of actions are not known a priori and must be estimated from observed outcomes. The algorithm iteratively refines these estimates to maximize the expected reward. Unlike standard multi-arm bandits, CMAB also uses contextual guest information, which ultimately leads to more accurate recommendations.

CMAB is also well suited to scenarios with inherent uncertainty, where the outcomes of actions are unknown. Because guest-offer interaction data is sparse, the algorithm balances exploring uncertain offers against exploiting offers that are known to perform well, and its decisions become more accurate as it gathers more data.
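The idea of estimating unknown reward distributions from observed outcomes can be illustrated with a small simulation. This is only a sketch under stated assumptions: the two guest contexts, the three offers, and the opt-in probabilities in `TRUE_OPT_IN` are all hypothetical, and a running mean per (context, offer) pair stands in for the neural reward model that CORE uses.

```python
import random

# Hypothetical opt-in probabilities, hidden from the agent:
# one list per guest context, one value per offer (arm).
TRUE_OPT_IN = {
    "frequent": [0.2, 0.8, 0.3],
    "occasional": [0.7, 0.1, 0.2],
}
N_ARMS = 3

def run_bandit(rounds=5000, epsilon=0.1, seed=7):
    rng = random.Random(seed)
    contexts = sorted(TRUE_OPT_IN)
    estimates = {c: [0.0] * N_ARMS for c in contexts}  # estimated reward per arm
    counts = {c: [0] * N_ARMS for c in contexts}       # times each arm was shown
    for _ in range(rounds):
        context = rng.choice(contexts)                  # observe the guest context
        if rng.random() < epsilon:                      # explore
            arm = rng.randrange(N_ARMS)
        else:                                           # exploit current estimates
            arm = max(range(N_ARMS), key=lambda a: estimates[context][a])
        # The environment reveals the outcome only for the offer that was shown.
        reward = 1.0 if rng.random() < TRUE_OPT_IN[context][arm] else 0.0
        counts[context][arm] += 1
        # Incremental running-mean update of the reward estimate.
        estimates[context][arm] += (reward - estimates[context][arm]) / counts[context][arm]
    return estimates, counts

estimates, counts = run_bandit()
best = {c: max(range(N_ARMS), key=lambda a: estimates[c][a]) for c in estimates}
```

Because the estimates are kept per context, the agent can learn that a different offer works best for each kind of guest, which a context-free bandit could not do.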

Key Components:

Contextual Multi-Arm Bandit Framework: Environment-Agent interaction with E-Greedy algorithm

3. Neural Epsilon Agent with Two Towers + Common Tower

A neural Epsilon-Greedy agent is a type of reinforcement learning agent that combines the Epsilon-Greedy algorithm with a deep neural network. The neural network is used to approximate the expected reward for each action, based on the current context. One of the benefits of the Epsilon-Greedy algorithm is that it is simple to implement and easy to tune.

CORE’s deep learning agent has 3 main components:

Network Architecture:
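As a rough illustration of the "two towers plus common tower" idea named in this section's heading, the sketch below runs a forward pass through a tiny untrained network in plain Python. Everything specific here is an assumption rather than CORE's actual design: the layer sizes, the ReLU activations, concatenation as the way the towers are joined, and the names `guest_tower`, `offer_tower`, `common_hidden`, and `common_output`.

```python
import random

def dense(inputs, layer, relu=True):
    """Apply one fully connected layer; `layer` is (weights, biases)."""
    weights, biases = layer
    out = [sum(x * w for x, w in zip(inputs, unit)) + b
           for unit, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out

def init_layer(n_in, n_out, rng):
    """Random weights (one weight vector per output unit) and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def build_network(guest_dim, offer_dim, hidden=4, seed=0):
    rng = random.Random(seed)
    return {
        "guest_tower": init_layer(guest_dim, hidden, rng),     # context features
        "offer_tower": init_layer(offer_dim, hidden, rng),     # per-arm features
        "common_hidden": init_layer(2 * hidden, hidden, rng),  # joint representation
        "common_output": init_layer(hidden, 1, rng),           # scalar reward estimate
    }

def predict_reward(net, guest_features, offer_features):
    guest_emb = dense(guest_features, net["guest_tower"])
    offer_emb = dense(offer_features, net["offer_tower"])
    joint = dense(guest_emb + offer_emb, net["common_hidden"])  # concatenate towers
    return dense(joint, net["common_output"], relu=False)[0]

# Hypothetical inputs: one guest context and three candidate offers.
net = build_network(guest_dim=3, offer_dim=2)
guest = [0.5, 1.0, 0.0]
offers = [[0.9, 0.1], [0.2, 0.7], [0.4, 0.4]]
scores = [predict_reward(net, guest, offer) for offer in offers]
greedy_arm = max(range(len(offers)), key=scores.__getitem__)
```

The weights are random, so the scores carry no meaning until the network is trained on observed rewards; the point is only the data flow, with guest context and per-arm offer features encoded separately before a shared tower produces one reward estimate per offer.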

4. Maximizing Efficacy in Recommendations

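The E-Greedy algorithm works by selecting actions in one of two ways:

Exploration: with probability epsilon, the agent selects an action at random.
Exploitation: with probability 1 - epsilon, the agent selects the action with the highest estimated reward.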

Epsilon is a hyper-parameter that determines the balance between exploration and exploitation. A high value for epsilon will result in more exploration, while a low value will result in more exploitation.
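The effect of epsilon can be seen in a short sketch of the selection rule. The reward estimates and the helper names `epsilon_greedy` and `pick_share` are hypothetical and chosen only for illustration; in CORE the estimates would come from the neural network rather than a hard-coded list.

```python
import random

def epsilon_greedy(estimated_rewards, epsilon, rng):
    """Return the index of the chosen action."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimated_rewards))  # explore: uniform random arm
    # exploit: arm with the highest estimated reward
    return max(range(len(estimated_rewards)), key=estimated_rewards.__getitem__)

def pick_share(estimated_rewards, epsilon, trials=10000, seed=1):
    """Fraction of trials in which the highest-scoring offer is chosen."""
    rng = random.Random(seed)
    best = max(range(len(estimated_rewards)), key=estimated_rewards.__getitem__)
    hits = sum(epsilon_greedy(estimated_rewards, epsilon, rng) == best
               for _ in range(trials))
    return hits / trials

# Hypothetical reward estimates for four offers.
scores = [0.12, 0.30, 0.05, 0.21]
low_eps_share = pick_share(scores, epsilon=0.05)   # mostly exploitation
high_eps_share = pick_share(scores, epsilon=0.5)   # much more exploration
```

With a small epsilon the top-scoring offer is chosen almost every time, while a larger epsilon spreads more traffic across the other offers so that their reward estimates keep improving.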

In uncertain environments, it’s crucial to explore different options to gain more information. CMAB incorporates an exploration mechanism, which helps in discovering the effectiveness of various offers despite limited information.

Balancing the explore and exploit tradeoff is similar to balancing a bias vs. variance tradeoff. High levels of exploration might result in less-than-optimal offers in the short term, whereas a low exploration rate can create an echo effect, where the recommendation model becomes restricted to its previous predictions for future learning.

A/B Testing - Results

For testing the model, we used a robust A/B experimentation framework built on a multi-variate stratified sampling method. We split the initial audience list into two groups:

A holdout group is kept separate from both Variant and Control for incremental sales measurement. The primary metric is the opt-in rate: the number of guests who opted in (added the offer) divided by the total number of guests who received the offer, expressed as a percentage. The secondary metric is the completion rate: the number of guests who completed the offer divided by the number of guests who opted in, also expressed as a percentage.
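The two metrics reduce to simple ratios, sketched below. The counts are hypothetical and purely illustrative; they are not results from the experiment.

```python
def opt_in_rate(guests_opted_in, guests_received):
    """Percentage of guests who added the offer among those who received it."""
    if guests_received <= 0:
        raise ValueError("guests_received must be positive")
    return 100.0 * guests_opted_in / guests_received

def completion_rate(guests_completed, guests_opted_in):
    """Percentage of guests who completed the offer among those who opted in."""
    if guests_opted_in <= 0:
        raise ValueError("guests_opted_in must be positive")
    return 100.0 * guests_completed / guests_opted_in

# Hypothetical counts for one experiment group.
variant = {"received": 1000, "opted_in": 250, "completed": 100}
variant_opt_in = opt_in_rate(variant["opted_in"], variant["received"])
variant_completion = completion_rate(variant["completed"], variant["opted_in"])
```

Note that the two rates use different denominators: opt-in rate is measured against everyone who received the offer, while completion rate is measured only against guests who opted in.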

Results:

Technical Implementation

We used PySpark for efficient data handling and TensorFlow for machine learning, with model training and scoring orchestration on Kubeflow. This enabled a streamlined data preparation and model deployment process.

Impact & Recognition

Patent Information

Next Steps

The Offer Personalization team’s goal is to make shopping with Target more engaging and fun by presenting offers that guests are excited about, turning every trip into a uniquely tailored experience. This work has moved personalization from the cohort level to hyper-personalization at the guest level. The current implementation also leaves room for multi-objective models, in which several metrics are optimized at once.


This project represents cutting-edge work in production reinforcement learning for personalization at scale. Read the full blog post for more details!