Day 1 - The Masterplan
This is day one of a series on solving marketing attribution. If you want to learn why I am doing this, please read my initial post.
What do we need to track?
We’re already sending analytics calls that would allow me to tie everything together:
- page() (spec): every time a visitor opens a page. The payload has information about the browser referrer (if there is one).
- track() (spec): custom events that allow you to track stuff you’re interested in: Demo Request, Newsletter Subscribe, Start Chat…
- identify() (spec): assigns the visitor a unique identifier (customer id, email address, CRM id, …)
How do analytics solutions tie together the full history of page views and events from before someone is identified? Every browser gets a unique anonymousId, which is added to every analytics call.
Once an identify() call is triggered (that also includes that same anonymousId), all the previous activity can be associated with the new userId. The first time I learned about this trick I found it really clever.
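To make the trick concrete, here is a minimal sketch of the stitching step. It assumes events are plain dictionaries carrying the anonymousId and userId fields described above; the function name stitch_events is made up for illustration, and this is not how segment or any destination actually implements it.

```python
from collections import defaultdict


def stitch_events(events):
    """Group events per person (hypothetical helper, illustration only).

    Anonymous activity is attached to a userId as soon as any call
    (typically identify()) carries both an anonymousId and a userId.
    """
    # Pass 1: learn which anonymousId belongs to which userId.
    anon_to_user = {}
    for event in events:
        if event.get("userId") and event.get("anonymousId"):
            anon_to_user[event["anonymousId"]] = event["userId"]

    # Pass 2: file every event under the best identifier we know.
    history = defaultdict(list)
    for event in events:
        anon_id = event.get("anonymousId")
        key = event.get("userId") or anon_to_user.get(anon_id) or anon_id
        history[key].append(event)
    return dict(history)


events = [
    {"type": "page", "anonymousId": "a1", "name": "Home"},
    {"type": "track", "anonymousId": "a1", "event": "Demo Request"},
    {"type": "identify", "anonymousId": "a1", "userId": "u1"},
    {"type": "page", "anonymousId": "a2", "name": "Pricing"},
]
history = stitch_events(events)
```

In this example the two anonymous events from browser a1 end up under u1 together with the identify() call, while browser a2 stays anonymous until it is identified.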
Access to segment data
For this experiment I need access to the segment data. More specifically, I need the details that go with every page() and identify() call. Let’s discover the easiest way to get access to the raw segment data.
Option 1: Data Warehouse 🤯
To get access to raw segment data you could set up a warehouse, syncing all your events to your own data warehouse. After that, it’s just a question of tying together the different records and tables to find what you’re looking for.
The problem with that approach:
- segment maintains your database schema, there are no indices set up
- segment real-time integration is $$$ (ours syncs 2 times a day)
- working with this data is hard. You need special SQL powers to link together the different tables. Look at these examples.
Option 2: Query an enabled destination 😐
Second idea: get the data from one of our downstream partners (the destinations linked to the segment account).
The ones that might qualify are Customer.io, Hubspot, Mixpanel and Hull. Let’s take a look:
- Customer.io. Super awesome (tech/product first) company that I am sure has a great tech stack/API. I could not find information in their API docs to get access to raw segment data.
- Hubspot: Their API is ok but to get events in Hubspot we’d have to upgrade (see the paragraph on reporting below).
- Mixpanel: Access to raw segment data is available through the export API only. We can’t query it directly, so we would need somewhere to store the export.
- Hull (we’re currently trialing): Might work. They are storing the data in an Elasticsearch cluster, so searching for events pretty much comes down to writing Elasticsearch queries.
This whole experiment started with me telling my co-founder:
“This whole stuff shouldn’t be that expensive. Actually I don’t think it’s that hard at all. Let me try something.”
So this disqualifies expensive software (Hubspot subscription) or linking up an extra source (Hull) to do just that. Back to square 1.
Option 3: Webhook 👍
Segment.com allows you to set up webhooks that are fired in real time. They can be used to run your own logic on certain events, to enable an integration that is not currently available in the segment.com catalog, or just as a backup to store your raw segment data (in real time).
For this experiment I would set up a quick webhook that stores the data somewhere so I can process it.
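A quick webhook like that could look roughly like the sketch below. It only uses the Python standard library and assumes that each webhook request is a POST with a single JSON event in the body. The file name segment_events.jsonl, the port and the helper names are placeholders I made up for illustration, not anything segment prescribes.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

EVENTS_FILE = Path("segment_events.jsonl")  # hypothetical storage location


def store_event(raw_body: bytes, path: Path = EVENTS_FILE) -> dict:
    """Parse one webhook body and append it to a JSON-lines file."""
    event = json.loads(raw_body)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(event) + "\n")
    return event


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            store_event(self.rfile.read(length))
            self.send_response(200)
        except ValueError:
            # Covers invalid JSON as well as bodies that are not valid UTF-8.
            self.send_response(400)
        self.end_headers()


def make_server(port: int = 8080) -> HTTPServer:
    """Build the server; call .serve_forever() on the result to start it."""
    return HTTPServer(("", port), WebhookHandler)
```

Running make_server().serve_forever() would start listening. A flat file is the simplest thing that could work here; a real setup would probably want a database and some form of request authentication.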
Storing the data would allow me to generate a full trail of all the events for a user (and their linked anonymousIds). That event stream would then be used to run the attribution analysis and answer questions like:
- Where did the visitor first come from?
- What other sources contributed to the conversion?
- What’s the last source right before the conversion happened?
- How many devices did the visitor use?
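As a rough sketch of how a stored event trail could answer the questions above: the code assumes every event already carries a source label (a hypothetical field that the source identification step would have to add), an ISO 8601 timestamp, and an anonymousId. Counting distinct anonymousIds is only an approximation of the number of devices, since each browser gets its own id.

```python
def summarize_trail(events, conversion_event="Demo Request"):
    """Answer the attribution questions for one user's event trail.

    `source` is a hypothetical field; `conversion_event` is just an example.
    """
    ordered = sorted(events, key=lambda e: e["timestamp"])

    # Only consider what happened up to and including the first conversion.
    cutoff = next(
        (i for i, e in enumerate(ordered) if e.get("event") == conversion_event),
        None,
    )
    trail = ordered if cutoff is None else ordered[: cutoff + 1]

    # Collapse consecutive events that share the same source into one touch.
    sources = []
    for event in trail:
        source = event.get("source")
        if source and (not sources or sources[-1] != source):
            sources.append(source)

    return {
        "first_source": sources[0] if sources else None,
        "last_source": sources[-1] if sources else None,
        "assisting_sources": sources[1:-1],
        "devices": len({e["anonymousId"] for e in trail if e.get("anonymousId")}),
    }


trail = [
    {"timestamp": "2020-04-01T10:00:00Z", "anonymousId": "a1", "source": "google"},
    {"timestamp": "2020-04-03T09:00:00Z", "anonymousId": "a2", "source": "twitter"},
    {"timestamp": "2020-04-05T12:00:00Z", "anonymousId": "a1", "source": "newsletter"},
    {"timestamp": "2020-04-05T12:05:00Z", "anonymousId": "a1", "event": "Demo Request"},
    {"timestamp": "2020-04-06T08:00:00Z", "anonymousId": "a3", "source": "direct"},
]
summary = summarize_trail(trail)
```

For this made-up trail the first source is google, the last source before the conversion is newsletter, twitter is the only assisting source, and two browsers were involved before the demo request.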
Identifying Visitor Sources
A big part of the Marketing Attribution challenge is knowing where visitors come from. Almost all integrations we have enabled in our segment stack, and even segment itself, have a way to do some kind of source identification.
Some examples:
The thing is that the identification models are different for every integration so there is no way to get uniform reporting.
To solve this I was thinking of using existing software. Here is what I found:
- Sourcebuster (JS) → sbjs.rocks
- Segment 😄 Inbound (JS) → github.com/segmentio/inbound
- Monkey Referrer (JS) → github.com/melihmucuk/monkeys-referrer
- Snowplow Ref Parser (Python) → github.com/snowplow-referer-parser
- Referrer Parser (Rails) → github.com/jetrockets/referer-parser-rails
- Referrer Parser (Scala) → github.com/HiFX/referer-parser
There is no magic there. It’s about passing the HTTP referrer and the page URL to some matchers and grouping the results. The challenge here is to keep up with social network URLs, new search engines, etc…
Considering I want to write as little code as possible, I will be picking one of those libraries later.
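To illustrate the matcher idea, here is a tiny sketch of my own. It is not the logic of any of the libraries listed above, and the domain lists are deliberately short placeholders; keeping such lists up to date is exactly the part a maintained library takes care of. The channel names are made up as well.

```python
from urllib.parse import parse_qs, urlparse

# Placeholder lists, nowhere near complete.
SEARCH_ENGINES = {"google": "Google", "bing": "Bing", "duckduckgo": "DuckDuckGo"}
SOCIAL_NETWORKS = {
    "twitter.com": "Twitter",
    "t.co": "Twitter",
    "facebook.com": "Facebook",
    "linkedin.com": "LinkedIn",
}


def _host(url):
    """Lowercased hostname without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def classify_visit(page_url, referrer=""):
    """Bucket one page view into a channel and a source."""
    # Explicit campaign tagging on the landing URL wins over the referrer.
    params = parse_qs(urlparse(page_url).query)
    if "utm_source" in params:
        medium = params.get("utm_medium", ["campaign"])[0]
        return {"channel": medium, "source": params["utm_source"][0]}

    if not referrer:
        return {"channel": "direct", "source": "(direct)"}

    ref_host = _host(referrer)
    if ref_host == _host(page_url):
        return {"channel": "internal", "source": ref_host}

    for domain, name in SOCIAL_NETWORKS.items():
        if ref_host == domain or ref_host.endswith("." + domain):
            return {"channel": "social", "source": name}

    for label, name in SEARCH_ENGINES.items():
        if label in ref_host.split("."):
            return {"channel": "search", "source": name}

    return {"channel": "referral", "source": ref_host}
```

For example, classify_visit("https://example.com/", "https://www.google.com/search?q=x") lands in the search bucket with Google as the source, while a landing URL tagged with utm_source=newsletter is attributed to the newsletter no matter what the referrer says.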
Feeding back data 💡
Knowing that we want to pass all the learnings (attribution, first/last touch) back to all the software we use (Mixpanel, Customer.io, …) this one is obvious for me → Feed the data back to segment.
The analytics events can be fired server side. The Visitor Source Identified event should be fired every time we detect a change in the visitor source while the traits per user (through identify() ) are maintained by making use of our stored pages.
The advantage here is that the different models (First touch, Last touch, decay, linear) can be added by adding additional traits for each user.
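As a sketch of what those models could look like in code: the function below turns an ordered list of touchpoint sources into credit per source. The model names and the half_life parameter are my own choices for illustration, and the decay model here is based on position in the list rather than on elapsed time, which is a simplification.

```python
def attribution_weights(touches, model="linear", half_life=2):
    """Return the share of conversion credit per source (shares sum to 1).

    `touches` is the ordered list of sources leading up to a conversion.
    """
    n = len(touches)
    if n == 0:
        return {}

    if model == "first_touch":
        raw = [1.0] + [0.0] * (n - 1)
    elif model == "last_touch":
        raw = [0.0] * (n - 1) + [1.0]
    elif model == "linear":
        raw = [1.0] * n
    elif model == "decay":
        # A touch `half_life` positions before the last one gets half its weight.
        raw = [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]
    else:
        raise ValueError(f"unknown model: {model}")

    total = sum(raw)
    credit = {}
    for source, weight in zip(touches, raw):
        # A source that shows up more than once accumulates its credit.
        credit[source] = credit.get(source, 0.0) + weight / total
    return credit


touches = ["google", "twitter", "newsletter"]
traits = {
    "first_touch": attribution_weights(touches, "first_touch"),
    "last_touch": attribution_weights(touches, "last_touch"),
    "linear": attribution_weights(touches, "linear"),
    "decay": attribution_weights(touches, "decay", half_life=1),
}
```

A dictionary like traits (flattened into whatever trait names you prefer) is the kind of thing that could be sent along with a server-side identify() call, so every downstream tool sees the same numbers.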
Funnel / Pipeline Reporting 📊
Your CRM can probably do a lot of reporting around which sources are generating the most deals, the current open pipeline etc…
We’re using Hubspot ourselves and while we have access to great reports (such as this one) the matching logic they use to bucket visitor sources is unknown. Plus you need to load their 288kb JS bundle (visitor.js) 😱
Sidenote: if you really want to feed all your existing track() events in Hubspot CRM you need to upgrade to their premium marketing professional plans. Last time I checked you’re looking at more than 20k/year in total to see tracking events in hubspot.
Now for reporting I personally am a big fan of Mixpanel. It allows me to build reports on top of my existing tracking plan and can be enabled with 3 clicks from the integration tab in your segment account. Mixpanel allows you to use both the user properties (coming from identify()) as well as the event data and its properties (coming from track()).
To learn more about Mixpanel features and reports check out their knowledge base. But for this experiment I would like to see funnel reports like the one below split up by first touch, last touch and maybe assisted conversions.
The only way this could work is if all important attribution data is available in Mixpanel. Getting this right would solve a lot from the wishlist described in my previous post (scroll to the ultimate solution. WTF is it with Medium and page anchors?)
- Seeing the different sources a visitor comes from → Any report grouped by user attribute First Touch or Last touch
- Understanding which touchpoints contributed to a conversion → Something like Mixpanel flow where there are events for source detection.
- Look at more than just the last touchpoint that contributed to a conversion → Mixpanel reports can apply any model as long as the user traits have the data on the models you want to run
- Link different sessions / multiple devices to a single user identifier → Solved using segment cross device tracking
- Be able to rewrite history. PQLs that ultimately end up buying are worth more than other PQLs → See the section below on where the funnel stops.
Additionally all the segment interactions are stored in our Postgres data warehouse (we’re running this on AWS Aurora) but I want to avoid interacting with it (unknown schema + no indices). For now I am considering this a data backup.
Where does the funnel stop?
If you want to fully understand marketing spend you need to be able to link together real revenue (won/lost deals) and marketing activity (visits, conversions).
That means that tracking demo requests or newsletter subscribes is not enough; you also need to trigger events and update user or group properties when deals change status.
The green stuff is important. I won’t go into detail on how to do that but in essence, it’s as simple as firing off track() or identify() calls as soon as something interesting changes. So every time a deal is created or gets updated in your CRM you have to make sure the following events are triggered.
The frustrating part is that there are not a lot of affordable CRMs doing this well. We had to teach Hubspot and Segment to play together (using something like Hull.io).
Conclusion
All right. No more questions on how/what/where. Tomorrow I will start writing some code to try and get a prototype up.
- build a minimal service that listens for webhooks
- store events in some kind of data store
- run a library to analyse url/referrer
- store the results of that library somewhere
- trigger track() calls when a visitor source is noteworthy
Check in tomorrow, when I will post some code and a working prototype.
Other articles in the series
- 05/07/2021: Day 11 - Sales Attribution
- 03/07/2021: Day 10 - Six months later
- 03/06/2020: Day 9 - Dealing with tracking/ad blockers
- 18/05/2020: Day 8 - Feeding in sales data
- 06/05/2020: Day 7 - Reporting on visitor sources
- 01/05/2020: Day 6 - Feeding source attribution data back to Segment.com
- 27/04/2020: Day 5 - Feed old events
- 24/04/2020: Day 4 - Run in production + API
- 22/04/2020: Day 3 - Cleanup & Identify Visitor Source
- 21/04/2020: Day 2 - Capture segment events
- 20/04/2020: Day 1 - The Masterplan
- 19/04/2020: Solving marketing attribution (using segment)