We receive many questions about the differences between data-driven attribution models, so I thought we would cover the main differences in a blog post. For the sophisticated models, the questions mostly concern the differences between the Shapley value (aka game theory) model and the Markov model.

Of course, many are still using last-click models, but for those who take the next step and want to understand more about marketing optimisation and attribution modelling, the choice is usually between the Markov model and the Shapley value based model. Google Attribution 360 is also based on Google’s customised version of the Shapley value. Moving away from last-click models has usually given our clients a 15%+ increase in marketing ROAS.

The rules-based multi-touch attribution models are usually first touch, last touch, linear, half-life time decay, and U-shaped. They are definitely useful, but an often bigger and less visible problem is double attribution, covered e.g. in https://www.windsor.ai/spend-520-000-advertising-accomplish-absolutely-nothing/

However, the challenge with rule-based models is that they all have built-in biases due to the rules one has to set manually. In a U-shaped model, for example, one has to pick how to distribute the credit, which introduces a bias. It is better to build the models on the real historical customer journeys, so they are backed by real data and represent reality as closely as possible.
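As an illustration, here is a minimal sketch of the kind of hard-coded rule a U-shaped model relies on: a hypothetical 40/20/40 split over an invented four-touch journey. The 40% figures are a human choice, not something learned from data:

```r
# Hypothetical U-shaped (40/20/40) credit split for an invented journey
touches = c("Display", "Email", "Social_Media", "Paid_Search")
n = length(touches)

credit = rep(0.20 / (n - 2), n)  # middle touches share 20% of the credit
credit[1] = 0.40                 # first touch gets 40%
credit[n] = 0.40                 # last touch gets 40%

setNames(credit, touches)
# Display 0.40, Email 0.10, Social_Media 0.10, Paid_Search 0.40
```

Nothing in the data justifies 40% over, say, 30% – that is exactly the bias the algorithmic models below avoid.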

In this post I will cover the two most popular algorithmic, data-driven attribution models. Using these, you can take out the guessing and be data driven. They also let you go very granular, e.g. down to keyword and content level, and run the attribution modelling separately for different segments or products. Forming manual rules at that granularity would be very time-consuming, so it is better to use historical data, algorithms, and machine learning to determine the attribution across channels and campaigns.

There are many approaches when it comes to algorithmic attribution, but for this post, I’m going to focus on the two most popular and most widely used approaches: the **Shapley value method and the Markov chain method.**

A bit of history: the Shapley value approach to marketing attribution stems from cooperative game theory and is named after Nobel Laureate Lloyd Shapley. At a high level, the Shapley value is an elegant solution to the problem of equitably distributing the payoff of a game among players who may have contributed unequally to that payoff – which maps neatly onto distributing credit for an online conversion among marketing channels. The Shapley value is the solution used by Google Analytics’ Data-Driven Attribution model, and variations on the Shapley value approach are used by most attribution and ad bidding vendors in the market today.
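To make the idea concrete, here is a tiny hand-worked sketch with two hypothetical channels (the conversion rates are invented for illustration). Each channel’s Shapley value is its marginal contribution averaged over the possible orders in which the channels could “join the game”:

```r
# Invented conversion rates ("payoffs") for each coalition of two channels
v_none    = 0     # no marketing touch
v_email   = 0.02  # Email alone
v_display = 0.01  # Display alone
v_both    = 0.05  # Email and Display together

# Average each channel's marginal contribution over the two join orders
shap_email   = 0.5 * (v_email - v_none)   + 0.5 * (v_both - v_display)
shap_display = 0.5 * (v_display - v_none) + 0.5 * (v_both - v_email)

c(Email = shap_email, Display = shap_display)  # 0.03 and 0.02

# A key property: the Shapley values sum back to the grand-coalition payoff
shap_email + shap_display  # equals v_both (0.05, up to floating point)
```

With six real channels this same averaging runs over every possible join order, which is exactly what the code later in this post delegates to a library.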

The Markov chain method for marketing attribution, on the other hand, has gained a lot of popularity in the data science community and is based on the concept of a Markov chain (named after the brilliant Russian mathematician Andrey Markov). A Markov process makes predictions based on movement through the states of a stochastic process. In a marketing context, this means a visitor’s propensity to convert changes as they are exposed to various marketing channels over time. The Markov model calculates the value of a touchpoint based on the “removal effect”.
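As a rough sketch of the removal effect (the paths and counts below are invented for illustration): drop every journey that contains a given channel, measure how much of the total conversion volume disappears, and use the normalised drops as attribution weights:

```r
# Invented journeys and their conversion counts
paths = data.frame(
  path        = c("Email > Display", "Display", "Email"),
  conversions = c(40, 10, 50),
  stringsAsFactors = FALSE
)
total = sum(paths$conversions)  # 100 conversions in total

# Removal effect: share of conversions lost when every path
# containing the channel is removed from the data
removal_effect = function(channel){
  kept = paths[!grepl(channel, paths$path, fixed = TRUE), ]
  (total - sum(kept$conversions)) / total
}

re = c(Email   = removal_effect("Email"),    # 0.9
       Display = removal_effect("Display"))  # 0.5

# Normalise the removal effects and spread the conversions accordingly
round(re / sum(re) * total)  # Email 64, Display 36
```

The full Markov model works on transition probabilities between channel states rather than raw path counts, but the intuition is the same: a channel is worth whatever would be lost without it.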

Let’s assume you have already loaded your clickstream data into a sparklyr data frame called “data_feed_tbl”, with columns containing your marketing touchpoints and a conversion indicator (in my case these are called “mid_campaign” and “conversion”, respectively). Then we construct the order sequences:

```r
# Construct conversion sequences for all visitors
data_feed_tbl = data_feed_tbl %>%
  group_by(visitor_id) %>%
  arrange(hit_time_gmt) %>%
  mutate(order_seq = ifelse(conversion > 0, 1, NA)) %>%
  mutate(order_seq = lag(cumsum(ifelse(is.na(order_seq), 0, order_seq)))) %>%
  mutate(order_seq = ifelse((row_number() == 1) & (conversion > 0),
                            -1, ifelse(row_number() == 1, 0, order_seq))) %>%
  ungroup()
```

It is very important to keep the non-converting customer journeys; without them, the algorithmic models will not reflect reality.

## The Shapley Value Attribution Model

The Shapley value method relies on the marginal contribution of each marketing channel to weight its contribution to the overall conversion. Rather than going into all of the gory details here, I recommend reading this very instructive article by Michael Sweeney, which explains how marginal contributions work and how you’d calculate them manually using a simple example. To make this work at scale, we’re going to calculate the Shapley value for each possible combination of marketing channels that might have participated in any given order. To do that, we’ll first summarize each converting or non-converting sequence according to the marketing channels that participated:

```r
# Summarizing order sequences by the channels that participated
seq_summaries = data_feed_tbl %>%
  group_by(visitor_id, order_seq) %>%
  summarize(
    email_touches          = max(ifelse(mid_campaign == "Email", 1, 0)),
    natural_search_touches = max(ifelse(mid_campaign == "Natural_Search", 1, 0)),
    affiliate_touches      = max(ifelse(mid_campaign == "Affiliates", 1, 0)),
    paid_search_touches    = max(ifelse(mid_campaign == "Paid_Search", 1, 0)),
    display_touches        = max(ifelse(mid_campaign == "Display", 1, 0)),
    social_touches         = max(ifelse(mid_campaign == "Social_Media", 1, 0)),
    conversions            = sum(conversion)
  ) %>% ungroup()
```

This bit of code gives us a big binary data frame where each conversion sequence is summarized according to the marketing channels involved (in my case, Email, Natural Search, Affiliates, Paid Search, Display, and Social Media). Because Shapley values aren’t concerned with the number of times a channel participated within a sequence, I use the “max” function rather than “sum”: the max value will be “1” if the channel touched the sequence at all and “0” if it did not. From there, I need one more summarization step to aggregate the total number of sequences and conversions observed for each marketing channel combination:

```r
# Sum up the number of sequences and conversions
# for each combination of marketing channels
conv_rates = seq_summaries %>%
  group_by(email_touches,
           natural_search_touches,
           affiliate_touches,
           paid_search_touches,
           display_touches,
           social_touches) %>%
  summarize(
    conversions = sum(conversions),
    total_sequences = n()
  ) %>% collect()
```

The “GameTheoryAllocation” library has a solid implementation of the Shapley value calculation, and we will use it for the remainder of this section.

The “GameTheoryAllocation” library requires a characteristic function as input – basically, a vector denoting the payoff (in our case, the conversion rate) received by each possible combination of marketing channels. To get the correct characteristic function, I’ll left join my previous results with the output of the “coalitions” function (a function provided by “GameTheoryAllocation”).

```r
library(GameTheoryAllocation)

number_of_channels = 6

# The coalitions function is a handy function from the GameTheoryAllocation
# library that creates a binary matrix to which you can fit your
# characteristic function (more on this in a bit)
touch_combos = as.data.frame(coalitions(number_of_channels)$Binary)
names(touch_combos) = c("Email", "Natural_Search", "Affiliates",
                        "Paid_Search", "Display", "Social_Media")

# Now I'll join my previous summary results with the binary matrix
# the GameTheoryAllocation library built for me.
touch_combo_conv_rate = left_join(touch_combos, conv_rates,
  by = c(
    "Email" = "email_touches",
    "Natural_Search" = "natural_search_touches",
    "Affiliates" = "affiliate_touches",
    "Paid_Search" = "paid_search_touches",
    "Display" = "display_touches",
    "Social_Media" = "social_touches"
  )
)

# Finally, I'll fill in any NAs with 0
touch_combo_conv_rate = touch_combo_conv_rate %>%
  mutate_all(~ ifelse(is.na(.), 0, .)) %>%
  mutate(
    conv_rate = ifelse(total_sequences > 0, conversions / total_sequences, 0)
  )
```

Once run, this code gives me a data frame containing the conversion rate for every possible combination of marketing channels.

Based on this it is now easy to calculate all of the Shapley values I need for attribution weighting. Notice that I use the conversion rate as my “payoff” rather than the total conversions. I do this because usually, you’ll have fewer total conversions when multiple channels participate, but a higher conversion rate. Shapley values assume that when all “players” work together, the total payoff should be higher, not lower – so using the conversion rate makes the most sense as the payoff in this case.

```r
# Building Shapley values for each channel combination
shap_vals = as.data.frame(coalitions(number_of_channels)$Binary)
names(shap_vals) = c("Email", "Natural_Search", "Affiliates",
                     "Paid_Search", "Display", "Social_Media")
coalition_mat = shap_vals

# The grand coalition (all channels) uses the full characteristic function
shap_vals[2^number_of_channels, ] =
  Shapley_value(touch_combo_conv_rate$conv_rate, game = "profit")

for(i in 2:(2^number_of_channels - 1)){
  if(sum(coalition_mat[i, ]) == 1){
    # A single-channel coalition simply gets its own conversion rate
    shap_vals[i, which(shap_vals[i, ] == 1)] = touch_combo_conv_rate[i, "conv_rate"]
  } else if(sum(coalition_mat[i, ]) > 1){
    if(sum(coalition_mat[i, ]) < number_of_channels){
      channels_of_interest = which(coalition_mat[i, ] == 1)
      # Build the sub-game's characteristic function from the coalitions
      # that involve only the channels of interest
      char_func = data.frame(rates = touch_combo_conv_rate[1, "conv_rate"])
      for(j in 2:i){
        if(sum(coalition_mat[j, channels_of_interest]) > 0 &
           sum(coalition_mat[j, -channels_of_interest]) == 0)
          char_func = rbind(char_func, touch_combo_conv_rate[j, "conv_rate"])
      }
      shap_vals[i, channels_of_interest] =
        Shapley_value(char_func$rates, game = "profit")
    }
  }
}
```

After you’ve run this bit of code, if you inspect the resulting shap_vals data frame, you can see each of the Shapley values we’ll use for attribution. In a perfect world, you would never expect to see negative Shapley values, but in reality, I often do. This is because specific marketing channels can actually hurt the conversion rate. Why? Well, some marketing channels have a proclivity to bring in a lot of unqualified traffic. For example, it’s common for display ad click-throughs to be accidental, which means that if I see a visitor to my site from a display ad click-through, I can usually predict that that visitor will not convert. The Shapley values pick up on that fact and sometimes exhibit negative values in our matrix as a result.

Finally, I can multiply the resulting Shapley values by the number of sequences observed for each channel, and voila!

```r
# Apply Shapley values as attribution weighting
order_distribution = shap_vals * touch_combo_conv_rate$total_sequences
shapley_value_orders = t(t(round(colSums(order_distribution))))
shapley_value_orders = data.frame(mid_campaign = row.names(shapley_value_orders),
                                  orders = as.numeric(shapley_value_orders))
```

As you can see from my example dataset, the Shapley value method is relatively straightforward to implement, but it has a downside: Shapley values must be computed for every single combination of marketing channels – 2^(number of channels) of them – which becomes infeasible beyond roughly 15 channels. This is especially problematic at finer granularities, when taking keywords, content, or even keyword-content combinations into account, as we sometimes do.

The next algorithmic method I’ll cover, while more difficult to implement yourself, doesn’t suffer from this drawback.

## The Markov Chain Attribution Model

The Markov Chain approach has gained a lot of attention over the last few years, and there are tons of fantastic resources out there if you want to learn the statistics behind it including this article by Kaelin Harmon and this article by Sergey Bryl. At a high level, Markov chain advocates believe the best way to model attribution is by considering each marketing channel as a state in a Markov chain. So, if a visitor comes to the site via Email, they become part of the “Email state” which has an increased probability of conversion compared to someone who has not come into any marketing channel at all. Increases (or decreases) in conversion probability from this approach are then used as attribution weights to distribute conversions equitably.

Markov chains can be a pain to implement (especially at scale), but luckily for us, the “ChannelAttribution” R package written by Davide Altomare and David Loris makes this a lot easier. Once we have set up the sequences as above, it takes just one additional sparklyr step to prepare the data for modeling: creating a channel “stack.” A channel stack takes each step of the marketing channel journey in a sequence and concatenates it all together – for example, “Email > Display > Email > Natural Search” describes a visitor moving from Email to Display, to Email again, and finally to Natural Search.

Channel stacks are simple enough to construct using the Spark function “concat_ws”. From there, I’ll filter out all of the paths that had no marketing channel touchpoints at all:

```r
# Build one "path" string per sequence and keep only paths
# that contain at least one marketing touchpoint
channel_stacks = data_feed_tbl %>%
  group_by(visitor_id, order_seq) %>%
  summarize(
    path = concat_ws(" > ", collect_list(mid_campaign)),
    conversion = sum(conversion)
  ) %>% ungroup() %>%
  group_by(path) %>%
  summarize(
    conversion = sum(conversion)
  ) %>%
  filter(path != "") %>%
  collect()
```

Next, I’ll feed this data to the “markov_model” function, and it does the difficult part. One quick note about the “order” parameter – this sets the order of Markov chain you want to use. A higher order Markov chain computes probabilities based on not just the current state (a single marketing channel) but across several ordered states (combinations of channels) and assigns weighted probabilities to each combination. For example, the sequence Email > Display might have a different weight than Display > Email. Be careful not to set the Markov order too high, so you don’t overfit – typically I wouldn’t go beyond a 2nd or 3rd order Markov chain but this depends a lot on the amount of data you have.

```r
library(ChannelAttribution)

markov_chain_orders = markov_model(channel_stacks,
                                   "path", "conversion", order = 3)
names(markov_chain_orders) = c("mid_campaign", "orders")
markov_chain_orders$orders = round(markov_chain_orders$orders)
```
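To see what “higher order” means in practice, here is a small illustrative sketch (the path is hypothetical) of how a 2nd-order chain re-states a journey: states become pairs of consecutive channels, so “Email > Display” and “Display > Email” are tracked as different states:

```r
# A hypothetical journey, re-stated for a 2nd-order Markov chain
path = c("Email", "Display", "Email", "Natural_Search")

# Each 2nd-order state is a pair of consecutive touches
second_order_states = paste(head(path, -1), tail(path, -1), sep = " > ")
second_order_states
# "Email > Display" "Display > Email" "Email > Natural_Search"
```

Because the state space grows with every extra order, a higher-order chain needs substantially more data to estimate its transition probabilities reliably – which is why overfitting becomes a concern.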

The ChannelAttribution package is a decent implementation of Markov chains. It can be hard to debug, as it is written in C++ and its error messages are not very informative.

As always with R, available RAM is an important consideration. For huge datasets, you would have to construct the Markov chain manually using Spark SQL or a sparklyr extension.

## Pros and cons of Markov and Shapley models

Now you might be wondering which model would be best for you. As with most machine-learning problems, the answer depends very much on your situation and which assumptions you are comfortable making.

I think the Shapley value method has a few advantages over the Markov chain method:

- It has much broader industry adoption and has been used successfully in attribution and auto-bidding platforms for years.
- It’s backed by Nobel Prize winning research.
- It takes a slightly more straightforward approach to the attribution problem in which sequence doesn’t matter – which makes it easier to implement and its results are usually more stable in practice.
- The results of the algorithm are also less sensitive to the input data.

That being said, I think the Markov chain method outshines the Shapley value method in a few ways:

- It considers channel sequence as a fundamental part of the algorithm which is more closely aligned to a customer’s journey.
- It can scale to a much larger number of channels than the Shapley value can – marginal contributions for the Shapley value must be calculated for 2^n channel combinations (n being the number of marketing channels), which becomes practically impossible beyond 15 or 20 marketing channels.

We prefer the Markov model because we can take it to a very fine granularity: down to the content, product, and e.g. keyword level.

We rely mostly on the Markov model within our platform for attribution modelling and marketing optimisation. However, whichever model you choose will almost certainly be better than a heuristic or rule-based model! Clients usually see at least a 15% increase in marketing ROI after moving away from the simple models.

We are happy to discuss any topics around this. Drop us a line at hello@windsor.ai or comment below!