In this series of articles I’m talking about the 5 steps of digital marketing attribution: Classification, Pathing, Attribution, Valuation and Optimisation. In the last article I covered the details of classification, so today I’m going to discuss Pathing in more detail.

If you’ve read the previous article you’ll know that the goal of classification is to take a sequence of hits on our website and add a column specifying which marketing channel was responsible for each hit. Here’s some sample tracking data along with the channel classification.

VisitorID, URL, Referrer, VisitStart, Event, UUID, CHANNEL
1846,,, TRUE, NULL, 41984, SEO

The first thing we want to do is stitch together all the rows related to a single visitorID so we can get a path for any device which has been used to browse our website. The volume of data involved could be significant depending on your website, but the output you’re looking for is something like this…

81539,,, TRUE, NULL, NULL, SEO
81539,,, TRUE, NULL, NULL, PPC

BTW, we should be keeping some date/time information along with this row information, probably also the device type, and anything else we might want to correlate to marketing activity.
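As a rough illustration of the stitching step, here is a minimal Python sketch that groups classified hits by visitor ID and orders them by time. The tuple layout, the function name `stitch_by_visitor` and the timestamps are all made up for the example; only the visitor IDs and channels come from the sample rows above.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical classified hits: (visitor_id, timestamp, channel).
# The timestamps are invented purely for illustration.
hits = [
    (81539, datetime(2024, 3, 9, 18, 30), "PPC"),
    (1846, datetime(2024, 3, 2, 9, 15), "SEO"),
    (81539, datetime(2024, 3, 1, 12, 0), "SEO"),
]


def stitch_by_visitor(hits):
    """Group hits by visitor ID, oldest hit first within each visitor."""
    by_visitor = defaultdict(list)
    # Sorting once by timestamp keeps every per-visitor list in time order.
    for visitor_id, timestamp, channel in sorted(hits, key=lambda h: h[1]):
        by_visitor[visitor_id].append((timestamp, channel))
    return dict(by_visitor)


visitor_hits = stitch_by_visitor(hits)
```

On a real site you would more likely do this grouping in the database or a batch job, but the shape of the output is the same: one time-ordered list of touchpoints per visitor ID.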

Now we’ve got an individual visitor’s browsing history, we have a few choices on how to process the data. For now we’ll consider the case where we only have a single conversion that we care about. Let’s just take all of the touchpoints with a date/time from 30 days before conversion up until the conversion event, and build these into a path. With this sample data the path is simply SEO followed by PPC.


Brilliant, pathing complete. Well, not really. There are a bunch of things we can do to improve this path, and improve the flexibility of our pathing algorithm.
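As a baseline for those improvements, the simple version described above (all touchpoints from the 30 days before a single conversion) might look something like the sketch below. The function name `conversion_path`, the timestamps and the out-of-window "Email" touchpoint are assumptions added for illustration.

```python
from datetime import datetime, timedelta

LOOKBACK = timedelta(days=30)  # the 30-day window used in this article


def conversion_path(touchpoints, conversion_time, lookback=LOOKBACK):
    """Return the channels seen in the lookback window, oldest first."""
    window_start = conversion_time - lookback
    in_window = [
        (timestamp, channel)
        for timestamp, channel in touchpoints
        if window_start <= timestamp <= conversion_time
    ]
    return [channel for _, channel in sorted(in_window)]


# Hypothetical touchpoints for one visitor: (timestamp, channel).
touchpoints = [
    (datetime(2024, 1, 5, 10, 0), "Email"),  # too old, outside the window
    (datetime(2024, 3, 1, 12, 0), "SEO"),
    (datetime(2024, 3, 9, 18, 30), "PPC"),
]

path = conversion_path(touchpoints, datetime(2024, 3, 10, 9, 0))
print(path)
```

Keeping the lookback as a parameter means the window length is easy to change later without touching the rest of the pathing code.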

A Unified Customer ID

Let’s go on a quick tangent about unified customer IDs. Back in the olden days it was quite common for traditional businesses to start an ecommerce department, and have a completely separate set of IT infrastructure for this new department. Often this would mean that a customer in store was never linked up to the same customer using the website to make a purchase.

People realised pretty quickly that this was a bit bonkers and the best retailers now operate combined tracking across all their customer touchpoints. This is known as the unified customer journey, and at the heart of this style of marketing is a unified unique identifier (UUID) for every customer.

More recently, in the digital world, we’ve noticed that people have an infuriating tendency to use more than one internet-connected device. Traditionally this wasn’t the case, and it was easy to assume that a single visitor was a single customer. These days you’ll often find customers using their smartphone, their tablet and their desktop before making a purchase. So now we have three different visitorIDs, but really they’re all the same person, and we should link them all up to the same UUID.

Cross-device attribution

Moving on to how this affects attribution, you can see that linking each device (visitorID) to the same UUID is going to be crucial to understanding which marketing channels have brought in converting visitors. So how do we work out that a customer’s UUID should be matched to a visitorID?

There are two main ways to do this. First up, you can use heuristics like location, behaviour, interests etc. to try to match two visitorIDs as the same person. This type of advanced guessing is known as probabilistic matching. It’s a bit of a mug’s game to be honest…

Moving on, there are more precise methods for knowing that multiple visitors are the same customer, usually called deterministic matching. These rely on one of the following:
a) A huge 3rd party database that links the devices together
b) A way to identify your customer on all the devices they use

The huge 3rd party databases do exist, and the ones you’d probably want to get your hands on most are the Facebook, Google and Amazon ones. Sadly, unless you’re very good at electronic espionage it’s unlikely you’re getting any of this data. You might be able to use services built on top of them (Atlas, DFA, etc.) but you’re not going to get to the actual data.

So let’s focus on b), because that’s how Facebook, Google, Amazon, etc. got their databases in the first place.

The trick to linking devices together is to give your customers a reason to identify themselves on multiple devices, or to trick them into identifying themselves on multiple devices. It depends how ethical you want to be, but pretty straightforward methods involve giving your customer a better service once they’ve logged in. This really isn’t difficult to do and some of the best digital experiences depend on identification to work well.

As an example, think of Amazon. You can browse on multiple devices without logging in, but if you want to purchase, or modify your orders, or change your settings, you need to log in. That allows Amazon to link your devices together. Indeed, inside the Amazon settings page you can see which devices they have linked together with your account. Another good example of the benefits of identification is Netflix. Let’s say that I’m watching a programme on my mobile phone on the way home from work. I don’t get to the end before I get off the bus, and I want to finish watching the programme on my TV. I load Netflix on my TV and I’m immediately given the option of continuing that programme from the point I’d got to on my mobile device. Superb! If only the BBC iPlayer could do this!

There are, of course, sneakier ways to capture a link to someone’s devices. Imagine a scenario where a customer researches our products on their desktop and mobile devices, and then creates an account and purchases from their desktop device. We now know, for sure, that the desktop device is related to that customer. We send them an email where they can see their invoice and check delivery status, and the links in that email contain a unique identifier attached to the customer. The customer clicks the link to check delivery status from their mobile phone. This now opens the website in their mobile browser and, because the URL contains their unique identifier, our tracking system can capture it. They haven’t even logged in, but we’re now certain that their mobile device is linked to their customer record.
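To make the email-link trick a little more concrete, here is a minimal sketch of how a tracking system might pull the identifier out of the landing URL. The query parameter name `cid`, the example URL and the function name are hypothetical; your own email links will use whatever parameter you choose.

```python
from urllib.parse import urlparse, parse_qs


def customer_id_from_url(url, param="cid"):
    """Return the customer identifier carried in the query string, or None."""
    values = parse_qs(urlparse(url).query).get(param)
    return values[0] if values else None


# Hypothetical landing URL from a delivery-status email link.
landing_url = "https://shop.example.com/orders/status?order=1001&cid=41984"
customer_uuid = customer_id_from_url(landing_url)
print(customer_uuid)
```

In practice you would probably want the identifier in the link to be an opaque token rather than the raw customer ID, so that forwarding the email does not hand over anything guessable.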

Linking the paths

Once we know that two devices belong to a single user we can produce a unified conversion path by merging the marketing touchpoints for the connected visitorIDs. The example I showed earlier was a limited path with two marketing touchpoints. But perhaps a month later I identify another device that belongs to the same customer. I might also discover that the second device was in use before the conversion event, and now I might find that I have 5 marketing touchpoints leading to conversion, 2 on the desktop device and 3 on their mobile.

In terms of data, how do we support stitching paths together? First off we need to maintain a mapping between each customer’s UUID and the visitorID for the browsing session that is related to it. Every time we get an identifiable event, we check this table to see if a relationship is already present, and if not, we add it to the mapping table.
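A minimal sketch of that check-then-add step, using an in-memory set of (UUID, visitorID) pairs to stand in for the mapping table. The function name `record_identification` and the second visitor ID are made up for the example; in a real system this would be a database table with a unique constraint on the pair.

```python
def record_identification(mapping, uuid, visitor_id):
    """Add the (uuid, visitor_id) link if it is new.

    Returns True when a new relationship was stored, False when the
    relationship was already present.
    """
    if (uuid, visitor_id) in mapping:
        return False
    mapping.add((uuid, visitor_id))
    return True


mapping = set()  # stands in for the UUID-to-visitorID mapping table

first_time = record_identification(mapping, 41984, 1846)      # desktop login
seen_before = record_identification(mapping, 41984, 1846)     # same device again
second_device = record_identification(mapping, 41984, 90210)  # hypothetical mobile visitorID
```

The return value is handy because a newly stored link is exactly the signal that existing paths for that customer may need to be rebuilt.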

Now when we rerun our attribution system we capture all of the paths for the visitor IDs associated with a single UUID. This gives us the cross-device stitched conversion path which we’ll use for our attribution modelling.
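Here is one way the merge could look, as a sketch rather than a finished implementation. It assumes the mapping is a set of (UUID, visitorID) pairs and that each visitor already has a list of (timestamp, channel) touchpoints. The IDs, timestamps and the "Social" and "Email" channels are invented to mirror the 2 desktop plus 3 mobile example.

```python
from datetime import datetime


def stitched_path(uuid, mapping, touchpoints_by_visitor):
    """Merge the touchpoints of every visitor ID linked to one UUID."""
    merged = []
    for mapped_uuid, visitor_id in mapping:
        if mapped_uuid == uuid:
            merged.extend(touchpoints_by_visitor.get(visitor_id, []))
    merged.sort(key=lambda touchpoint: touchpoint[0])  # oldest first
    return [channel for _, channel in merged]


# Hypothetical data: one customer, a desktop and a mobile visitor ID.
mapping = {(41984, 81539), (41984, 90210)}
touchpoints_by_visitor = {
    81539: [  # desktop
        (datetime(2024, 3, 1, 12, 0), "SEO"),
        (datetime(2024, 3, 9, 18, 30), "PPC"),
    ],
    90210: [  # mobile
        (datetime(2024, 2, 25, 8, 0), "Social"),
        (datetime(2024, 3, 3, 19, 45), "Email"),
        (datetime(2024, 3, 8, 7, 30), "PPC"),
    ],
}

path = stitched_path(41984, mapping, touchpoints_by_visitor)
print(path)
```

The merged list interleaves the two devices by time, which is what the attribution models in the next step need.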

Other conversion events

It may be that for our customers we actually care about multiple conversion events. Imagine that a firm has a problem with customer retention. Lots of users complete only a single purchase, so our business strategy is now focused on producing return visits and additional purchases. This gives us a new conversion event “second purchase” which we can also analyse to see which marketing channels are effective at encouraging repeat purchases.

To support this it makes sense to build a more flexible pathing system. Instead of extracting the previous 30 days’ path each time we consider a conversion, let’s just store the entire path for a client in a table. Then we can snip it at any point in their journey to consider whichever microconversion we’re interested in. This really classes as an optimisation but adds enormous flexibility to our attribution system.
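As a sketch of the snipping idea, suppose the stored path is a time-ordered list of entries that are either marketing touchpoints or conversion events. The ("touch", …) and ("event", …) layout, the function name `path_to_event` and the sample journey are assumptions for illustration only.

```python
# Hypothetical full journey for one client, already in time order.
full_path = [
    ("touch", "SEO"),
    ("touch", "PPC"),
    ("event", "purchase"),
    ("touch", "Email"),
    ("touch", "SEO"),
    ("event", "purchase"),
]


def path_to_event(full_path, event_name, occurrence=1):
    """Return the channels seen before the nth occurrence of an event.

    Returns None if the client never reached that occurrence.
    """
    seen = 0
    channels = []
    for kind, value in full_path:
        if kind == "touch":
            channels.append(value)
        elif kind == "event" and value == event_name:
            seen += 1
            if seen == occurrence:
                return channels
    return None


first_purchase_path = path_to_event(full_path, "purchase")
second_purchase_path = path_to_event(full_path, "purchase", occurrence=2)
```

This version counts every touchpoint from the start of the journey. Whether a "second purchase" path should instead start after the first purchase, or be cut to a lookback window, is a modelling decision this sketch leaves open.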

Clients who share devices

A quick word on one of the challenges you’ll face with device stitching. It’s not unusual for a couple to share tablets or desktop computers, and it’s quite possible that you’ll find multiple UUIDs for a single visitorID. At this point you can either decide that you’ll ignore these paths for both clients, or add them to the path of the most recently identified user, or the path of the next identified user, or just add them to both clients’ paths. There’s no right answer here. The truth is that this data tends to be a bit messy, so at some point it becomes best effort!
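The four options above can be written down as a small policy switch. This is only a sketch: the timeline layout, the policy names ("ignore", "previous", "next", "both") and the customer labels are made up for the example.

```python
def assign_touchpoints(timeline, policy):
    """Split one shared device's touchpoints between its identified users.

    timeline is a time-ordered list of ("touch", channel) and
    ("ident", uuid) entries. Returns {uuid: [channels]}.
    """
    uuids = list(dict.fromkeys(v for kind, v in timeline if kind == "ident"))
    result = {uuid: [] for uuid in uuids}
    if policy == "ignore":
        return result  # drop the shared device's touchpoints for everyone
    for i, (kind, value) in enumerate(timeline):
        if kind != "touch":
            continue
        if policy == "both":
            owners = uuids
        elif policy == "previous":  # most recently identified user
            owners = [v for k, v in timeline[:i] if k == "ident"][-1:]
        elif policy == "next":  # next user to identify themselves
            owners = [v for k, v in timeline[i + 1:] if k == "ident"][:1]
        else:
            raise ValueError(f"unknown policy: {policy}")
        for owner in owners:
            result[owner].append(value)
    return result


# Hypothetical shared tablet used by customers "A" and "B".
timeline = [
    ("touch", "SEO"),
    ("ident", "A"),
    ("touch", "PPC"),
    ("ident", "B"),
    ("touch", "Email"),
]

by_previous = assign_touchpoints(timeline, "previous")
by_next = assign_touchpoints(timeline, "next")
```

Note that "previous" and "next" each lose a touchpoint at one end of the timeline, where there is nobody identified yet or nobody identified afterwards, which is part of why this stays best effort.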

I hope you enjoyed this quick skim across the joys of creating marketing paths from web tracking events, and if there are other things you want to add or ask, please use the comments thread below!

Next time we’ll talk about how to convert these customer paths into attribution data.
