In this series of articles I’m aiming to talk about the 5 steps of marketing attribution: classification, pathing, attribution, valuation and optimisation. Today, let’s go over classification in more detail.
As you may recall, the goal of classification is to look at each touchpoint a visitor has with our website, and work out whether or not that touchpoint was driven by a marketing channel. In a simple world a single visit would have a single initial marketing touchpoint that leads to a single conversion, but in reality you find that people will often come back to your site multiple times through multiple marketing channels, sometimes within a single session, before finally converting.
Let’s start by looking at the data that we’ll be able to recover about people who visit our website. If you’re in any way serious about digital marketing then you’ve already got about 12 different tracking pixels. Hopefully one of these is your key web tracking system. If you’re lucky you’re using a web tracking system which will allow you to download the raw row-level data that shows people visiting your website. I believe you can do this with a Premium GA account, you can certainly do this with SiteCatalyst, and if you’ve rolled your own tracking SDK that awful decision just paid off! If you’re currently on standard GA, there are a number of free tracking systems that you can pick off the shelf. Top tip: one that I’ve seen increasing in popularity is Piwik (since renamed Matomo).
Whatever system you use, you’ll need access to two main things: website visit history (pages viewed, a unique identifier to stitch a browser’s behaviour together, referral information) and conversion history (events that your business cares about).
Once you’ve got access to the tracking logs you’ll find that you have large amounts of data that look something like this…
VisitorID, URL, Referrer, VisitStart, Event, UUID
73859, woo4.me/, NULL, TRUE, NULL, NULL
81539, woo4.me/download, woo4.me/article1, FALSE, NULL, NULL
1846, woo4.me/article13, google.co.uk/, TRUE, NULL, 41984
81539, NULL, NULL, FALSE, CONVERSION_DOWNLOAD, 498327
73859, woo4.me/article1, woo4.me/, FALSE, NULL, NULL
etc. etc.
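If you want to poke at rows like these in code, here is a minimal sketch in Python. It assumes the export is comma-separated with a header row, that missing values are written as the literal text NULL, and that VisitStart is written as TRUE or FALSE, exactly as in the sample above. The name `load_rows` is made up for illustration, and your own export will almost certainly look different.

```python
import csv
import io

# The sample rows from above, one touchpoint per line.
RAW_LOG = """\
VisitorID, URL, Referrer, VisitStart, Event, UUID
73859, woo4.me/, NULL, TRUE, NULL, NULL
81539, woo4.me/download, woo4.me/article1, FALSE, NULL, NULL
1846, woo4.me/article13, google.co.uk/, TRUE, NULL, 41984
81539, NULL, NULL, FALSE, CONVERSION_DOWNLOAD, 498327
73859, woo4.me/article1, woo4.me/, FALSE, NULL, NULL
"""


def load_rows(text):
    """Parse the log into dicts, turning NULL into None and TRUE/FALSE into bools."""
    # skipinitialspace drops the space that follows each comma in the sample.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    rows = []
    for record in reader:
        row = {key: (None if value == "NULL" else value) for key, value in record.items()}
        row["VisitStart"] = row["VisitStart"] == "TRUE"
        rows.append(row)
    return rows


rows = load_rows(RAW_LOG)
visit_starts = [row for row in rows if row["VisitStart"]]
conversions = [row for row in rows if row["Event"] is not None]
```

Here `visit_starts` holds the two rows that open a visit and `conversions` holds the single CONVERSION_DOWNLOAD row.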
Although most likely it’ll be about 100 times more horrific to look at, with tens of variables that provide no useful information.
There are a few key things, though, that I want to highlight. First off, every individual browser that’s used to view your website will have a unique VisitorID. That means that the same person in Firefox and Chrome, on the same PC, will look like two visitors. That same person on their mobile phone, and their tablet, looks like two more visitors. That same person in incognito mode on two different visits looks like two more visitors. Importantly, that same person, on two different days, on the same browser, on the same PC, still looks like one visitor!
We’ll talk about events and UUIDs in a later article, but be forewarned that we’re coming back to them.
The other thing you’ll notice here is a VisitStart marker. This indicates that a row refers to the first page view on your website from that visitor within a defined tracking period. Typically all page views that occur without a long enough gap of inactivity between them are classed as a single visit. Most tracking systems treat 30 minutes without interaction as the end of a visit, and GA also ends a visit at midnight. The 30-minute rule is pretty damn arbitrary but not the most absurd thing in attribution. The GA midnight cutoff, on the other hand, is a bit dumb…
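If your tracking export doesn’t include a VisitStart flag, you can approximate one from timestamps. The sketch below is only an illustration: the sample rows above carry no timestamps, so the (visitor, timestamp) input, the dates and the function name `mark_visit_starts` are all assumptions. It applies the 30-minute inactivity rule only and ignores any midnight cutoff.

```python
from datetime import datetime, timedelta

INACTIVITY_LIMIT = timedelta(minutes=30)  # assumed cutoff; tracking systems differ


def mark_visit_starts(page_views):
    """Flag the first page view of each visit.

    page_views is a list of (visitor_id, timestamp) tuples in any order.
    Returns (visitor_id, timestamp, visit_start) tuples sorted by visitor, then time.
    """
    last_seen = {}
    marked = []
    for visitor_id, timestamp in sorted(page_views):
        previous = last_seen.get(visitor_id)
        # A visit starts on the first ever view, or after a gap longer than the limit.
        visit_start = previous is None or timestamp - previous > INACTIVITY_LIMIT
        marked.append((visitor_id, timestamp, visit_start))
        last_seen[visitor_id] = timestamp
    return marked


# Hypothetical timestamps, purely for illustration.
views = [
    (73859, datetime(2024, 1, 1, 9, 0)),
    (73859, datetime(2024, 1, 1, 9, 10)),
    (73859, datetime(2024, 1, 1, 10, 0)),
    (81539, datetime(2024, 1, 1, 9, 5)),
]
flags = [visit_start for _, _, visit_start in mark_visit_starts(views)]
```

For visitor 73859 the second view comes 10 minutes after the first, so it stays in the same visit, while the third comes 50 minutes later and opens a new one.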
Now before we get into the guts of classification, let’s briefly take two asides into the world of “DIRECT” and “INTERNAL”.
The mystery of “DIRECT”
A “DIRECT” visit is one where the very first touchpoint on the website has no referrer, and no other indication that it came from a paid advertising channel. This generally makes up 10-50% of all traffic to websites, which is infuriating if you’re trying to understand what’s driving your traffic!
How does this happen? Well, there are a few likely candidates that could be the cause:
- Your offline marketing is so successful people just remember your URL and type it into the browser
- People have already visited your website (possibly on a different device) and remembered your URL
- People bookmarked your website and clicked the bookmark
- People clicked a natural link in an application (Outlook, etc.)
I’m not sure I’d place too much weight on the brand argument because the “DIRECT” issue seems to exist for all websites, whether they’re all-powerful brands like Amazon or teeny-tiny personal websites like my own. But if you’re in analogue/offline marketing it’s a nice theory.
Whatever the cause of all these direct visits, they’re not going anywhere, and it’s important we recognise these first visits as DIRECT. Later on when we look at attribution models we’ll discuss some ways of reallocating visits to earlier channels, but keep them as DIRECT for now.
The idiocy of “INTERNAL”
You may have come across a channel called “INTERNAL”. This is where a visit starts with a referring URL which is internal to your website. All this usually means is that someone left a tab open on your website and then clicked a link some hours later, after the inactivity cutoff had already ended their earlier visit.
This is not a marketing channel, it’s not even telling you anything interesting. Ignore internal, it’s just a continuation of an earlier visit.
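Spotting an internal referrer is just a hostname comparison. Here’s a small sketch, assuming the site lives on woo4.me (the example domain used in this article) and that referrers may or may not include the http:// prefix. The helper names are made up for illustration.

```python
from urllib.parse import urlsplit

OWN_DOMAIN = "woo4.me"  # the example site used in this article


def hostname(url):
    """Return the lowercase hostname of a URL, tolerating a missing scheme."""
    if not url:
        return None
    if not url.startswith(("http://", "https://", "//")):
        url = "//" + url  # lets urlsplit read "woo4.me/article1" as host + path
    return urlsplit(url).hostname


def is_internal(referrer):
    """True when the referrer is our own site or one of its subdomains."""
    host = hostname(referrer)
    return host is not None and (host == OWN_DOMAIN or host.endswith("." + OWN_DOMAIN))
```

Matching on the hostname rather than searching the whole string means a referrer like notwoo4.me is not mistaken for our own site.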
Finally, an aside and a rant later, let’s focus back on classification. We can do a lot with the URL that’s visited and the referrer URL, so let’s start there.
First off let’s assume that you’re adding a couple of parameters to every link from your paid advertising channels. I’d suggest using something like “ch=x” where x is a channel identifier, and “ca=y” where y is a campaign identifier. To add them to the URL set your target URL as www.woo4.me?ch=1&ca=7. Now when we download the website tracking data we can quickly allocate these touches to paid channels. This approach works well for PPC, Display, Affiliates, Email, and anywhere else where you control the link URL.
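Pulling those two parameters back out of a tracked URL can be sketched like this in Python. The `ch` and `ca` names follow the suggestion above. The mapping from channel IDs to channel names and the UNKNOWN_PAID label are invented for the example, so swap in whatever numbering you actually use.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical channel IDs; use whatever numbering your own links carry.
CHANNEL_NAMES = {"1": "PPC", "2": "Display", "3": "Affiliates", "4": "Email"}


def paid_channel(url):
    """Return (channel, campaign) read from the ch/ca parameters, or None if ch is absent."""
    if not url:
        return None
    params = parse_qs(urlsplit(url).query)
    if "ch" not in params:
        return None
    channel_id = params["ch"][0]
    campaign_id = params.get("ca", [None])[0]
    # Unrecognised IDs are still paid traffic, so keep them under a visible label.
    return CHANNEL_NAMES.get(channel_id, "UNKNOWN_PAID"), campaign_id


tagged = paid_channel("www.woo4.me?ch=1&ca=7")
untagged = paid_channel("woo4.me/article1")
```

With the mapping above, `tagged` comes back as the PPC channel with campaign "7", and `untagged` is None because the URL carries no `ch` parameter.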
For natural links to your website you’re going to need to come up with a different approach. In general Google will list your pages without any additional query string in its search results pages, so you’ll be unable to identify natural search (SEO) traffic using the query string method. Instead you’ll need to look at the referrer URL. As we’ve already captured all our paid search engine advertising using query strings, we can now hoover up any remaining traffic that has a referrer of *google* and list this as natural search. You probably also want to look for *bing*, *baidu*, etc.!
Another source of natural links will come from social networks where you may be active. Again all paid advertising through these networks should use the query string approach for classification, but natural links are outside your control so you’ll need to rely on the referrer once again. In this case we’re looking for *facebook*, *t.co*, *instagram* etc. in the referrer strings to classify traffic as natural social.
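Here is one way the referrer checks for natural search and natural social could look. It’s a sketch, not a complete list of engines or networks, and the channel labels are my own. It also assumes that paid traffic has already been picked out via the query string, so anything reaching this function is unpaid. Note that it matches on the referrer’s hostname rather than doing a wildcard search over the whole string.

```python
from urllib.parse import urlsplit

SEARCH_ENGINES = ("google", "bing", "baidu")            # matched as a hostname label
SOCIAL_HOSTS = ("facebook.com", "t.co", "instagram.com")  # matched as a domain


def hostname(url):
    """Return the lowercase hostname of a URL, tolerating a missing scheme."""
    if not url:
        return None
    if not url.startswith(("http://", "https://", "//")):
        url = "//" + url
    return urlsplit(url).hostname


def natural_channel(referrer):
    """Classify an unpaid referrer as natural search, natural social, or None."""
    host = hostname(referrer)
    if host is None:
        return None
    labels = host.split(".")
    # "google" matches google.co.uk, www.google.com and so on.
    if any(name in labels for name in SEARCH_ENGINES):
        return "NATURAL_SEARCH"
    if any(host == site or host.endswith("." + site) for site in SOCIAL_HOSTS):
        return "NATURAL_SOCIAL"
    return None
```

The hostname matching is a deliberate choice: a plain substring search for t.co would also match microsoft.com, which is not the traffic we’re after.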
We may also have some unique areas of our website that we’d like to class as “marketing channels” for the purpose of evaluating their ROI. For example, imagine I sell cars, and I’ve invested £2M in a webapp where users can design their own customised vehicles. I might decide to classify this URL as an additional marketing channel so I can see how many purchases occur after people have used the tool. That’ll at least help us to make a decision later about whether or not to build another version of that webapp for an upcoming model.
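Treating a section of the site as a channel comes down to checking the path of the URL that was viewed. In this sketch the /configurator path and the CAR_CONFIGURATOR label are invented for the car example, so replace them with wherever your own tool lives.

```python
from urllib.parse import urlsplit

# Hypothetical path for the car design webapp, mapped to a made-up channel label.
ON_SITE_CHANNELS = {"/configurator": "CAR_CONFIGURATOR"}


def on_site_channel(url):
    """Return the on-site channel whose path prefix contains this URL, or None."""
    if not url:
        return None
    if not url.startswith(("http://", "https://", "//")):
        url = "//" + url  # lets urlsplit read "woo4.me/page" as host + path
    path = urlsplit(url).path
    for prefix, channel in ON_SITE_CHANNELS.items():
        # Match the section itself and anything beneath it, but not lookalike paths.
        if path == prefix or path.startswith(prefix + "/"):
            return channel
    return None
```

Every page view inside the tool then gets the same channel label, which is what lets you count the purchases that follow it.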
Finally we need to look at any remaining “VisitStart” events which aren’t currently classified to a marketing channel. If the referrer is our own website, then we’ll just ignore that event as it’s not relevant. If there’s a different website referrer that we haven’t already classified as something else, then mark that out as “natural link” or some other catch-all marketing channel. Anything that has no referrer is a DIRECT hit, so let’s capture that.
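Putting the rules together in order, a classifier for a single row might look like the sketch below. It assumes rows are dicts with URL, Referrer and VisitStart keys as in the sample data, lumps all paid traffic into one PAID label for brevity, and leaves out on-site channels like the car webapp. The function names and channel labels are my own invention.

```python
from urllib.parse import parse_qs, urlsplit

OWN_DOMAIN = "woo4.me"  # the example site used in this article
SEARCH_ENGINES = ("google", "bing", "baidu")
SOCIAL_HOSTS = ("facebook.com", "t.co", "instagram.com")


def hostname(url):
    """Return the lowercase hostname of a URL, tolerating a missing scheme."""
    if not url:
        return None
    if not url.startswith(("http://", "https://", "//")):
        url = "//" + url
    return urlsplit(url).hostname


def classify(row):
    """Return a channel label for one tracking row, or None if it isn't a marketing touch."""
    url, referrer = row["URL"], row["Referrer"]
    # 1. Paid links carry our own ch parameter, so they win over everything else.
    if url and "ch" in parse_qs(urlsplit(url).query):
        return "PAID"
    host = hostname(referrer)
    if host is not None:
        # 2. Internal referrers are a continuation of an earlier visit: ignore them.
        if host == OWN_DOMAIN or host.endswith("." + OWN_DOMAIN):
            return None
        # 3. Known search engines and social networks.
        if any(name in host.split(".") for name in SEARCH_ENGINES):
            return "NATURAL_SEARCH"
        if any(host == site or host.endswith("." + site) for site in SOCIAL_HOSTS):
            return "NATURAL_SOCIAL"
        # 4. Any other external referrer at the start of a visit is the catch-all.
        return "NATURAL_LINK" if row["VisitStart"] else None
    # 5. A visit that starts with no referrer at all is DIRECT.
    return "DIRECT" if row["VisitStart"] else None


sample = [
    {"URL": "woo4.me/", "Referrer": None, "VisitStart": True},
    {"URL": "woo4.me/download", "Referrer": "woo4.me/article1", "VisitStart": False},
    {"URL": "woo4.me/article13", "Referrer": "google.co.uk/", "VisitStart": True},
    {"URL": None, "Referrer": None, "VisitStart": False},
    {"URL": "woo4.me/article1", "Referrer": "woo4.me/", "VisitStart": False},
]
channels = [classify(row) for row in sample]
```

On the five sample rows this marks the first as DIRECT, the third as NATURAL_SEARCH, and leaves the rest unclassified, since they are internal page views or a conversion event.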
Hopefully we’ve now defined a simple way to parse through this row level tracking data and build an additional column for the marketing channel that is responsible for that page view. In the next article we’ll start using this information to build up paths for our website visitors.