Do you want to tie your business objectives more closely to your Google Analytics data? Robert Børlum-Bach shares how to use machine learning to distill behavioral dimensions and optimize KPIs.
TL;DR
This article describes an example of activating web analytics data based on one or more specific marketing KPIs. The metaphor of distilling (concentrating, refining) is introduced to describe the process of taking multiple data dimensions and boiling them down into a concentrate.
The example case is the modeling of a propensity (likelihood) score for adding a product to the cart. The score of 0-100 is written back as a custom user-scoped dimension, enabling the creation of segments and audiences based on users' propensity to add a product to the cart. This supports better targeting and bidding through ad servers as well as content optimization and personalization.
Furthermore, the created dimension distillate can be used to train the next model for another marketing KPI - thus both sharpening the media activation and augmenting the data and the understanding of it.
Objective: Distilling data to optimize KPIs
In my personal experience, the connection between the business objectives and what is measured through web and app analytics tools has often been a bit unclear, or an obscured measure of apples and oranges.
The measurement plan or solution design requirement is essential for making the connection between the objective and the data measured. In a classic maturity model, this is the foundation - the first steps that have to be set before collecting, enriching, or activating the data.
Sidenote: Google Analytics is a marketing tool based on a classic funnel approach, and having events or goals specified as KPIs is essential.
The step after the KPI and planning phase is to collect data and to report on it, reactively. The buck often stops here. Why not use the data collected to actively improve the business objective behind the indicator?
Many business objectives are measured and supported by multiple data dimensions that are often peripheral to the objective itself. Using machine learning models, we can distill these dimensions based on their importance to a KPI metric.
The add-to-cart propensity
In this case example, the business objective is to more intelligently use the marketing budget against a lower-funnel activity - the add-to-cart event.
We work with the concept of creating a custom data dimension, which gives us a score of how likely the user is to add a product to the cart (propensity). The data dimension's scope is the user: each clientId is assigned a score of 0-100 indicating how likely they are to add a product to the cart.
Four generic audiences can be created to start with. More granular audiences can then be made by combining the created score with other relevant dimensions.
Users with scores from 0-25 have a very low propensity; this segment can either be excluded or analyzed more thoroughly. A score of 26-50 marks a low-to-mid segment, while the segments of 51-75 and 76-100 are the focus of the marketing activities, enabling bids to be adjusted to the expected value.
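As a minimal sketch (illustrative, not from the article's actual codebase), the bucketing could be expressed as below, assuming the score is already an integer between 0 and 100; the segment names are my own.

```python
def propensity_segment(score: int) -> str:
    """Map a 0-100 add-to-cart propensity score to one of four generic audience buckets."""
    if score <= 25:
        return "very_low"      # exclude from bidding, or analyze more thoroughly
    elif score <= 50:
        return "low_to_mid"
    elif score <= 75:
        return "mid_to_high"   # in focus for marketing activities
    else:
        return "high"          # in focus for marketing activities

# Example: tag scored users before the dimension is sent back to Google Analytics.
scored_users = {"1234567890.1614612345": 82, "0987654321.1598765432": 17}
segments = {cid: propensity_segment(s) for cid, s in scored_users.items()}
```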
The technical bits
The setup is built almost exclusively on the Google Cloud Platform, the exceptions being CRMint, which is not an official Google product, and the third-party data, which comes from a non-Google database.
The ingested data come from Google Analytics 360, joined with user data from an external CRM system. The datasets are updated intraday in BigQuery.
A specific data schema is created in BigQuery with the dimensions and attributes hypothesized to have the biggest impact on the machine learning model. These dimensions include, but are not limited to (a sketch of the extraction query follows the list):
- clientId (!)
- visitNumber
- totals.timeOnSite
- trafficSource.source
- device.browser
- device.deviceCategory
- geoNetwork.region
- hits.eCommerceAction.action_type
- hits.customDimensions.index
- hits.eventInfo.eventAction
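To make the extraction concrete, here is a minimal sketch (not the author's actual query) that pulls a few of the listed fields from the GA360 BigQuery export and labels each session by whether an add-to-cart action occurred (eCommerceAction.action_type = '3' in the export schema). The project ID, dataset, and date range are placeholders, and the field list is trimmed for brevity.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")  # hypothetical project id

# Hypothetical dataset name; the GA360 export uses sharded ga_sessions_YYYYMMDD tables.
query = """
SELECT
  clientId,
  visitNumber,
  totals.timeOnSite AS time_on_site,
  trafficSource.source AS traffic_source,
  device.browser AS browser,
  device.deviceCategory AS device_category,
  geoNetwork.region AS region,
  -- Label: did the session contain an add-to-cart hit? ('3' = add to cart)
  MAX(IF(h.eCommerceAction.action_type = '3', 1, 0)) AS added_to_cart
FROM
  `my-gcp-project.1234567.ga_sessions_*`,
  UNNEST(hits) AS h
WHERE
  _TABLE_SUFFIX BETWEEN '20200101' AND '20200131'
GROUP BY
  clientId, visitNumber, time_on_site, traffic_source,
  browser, device_category, region
"""

training_df = client.query(query).to_dataframe()
print(training_df.head())
```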
The schema is used to train the model (built with the TensorFlow framework), an iterative process of usually 2-3 cycles. The trained scoring model is deployed on AI Platform.
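As an illustration of the modeling step, a minimal TensorFlow sketch of a binary add-to-cart classifier could look like the following. The article does not describe the actual architecture, feature encoding, or training regime, so the layer sizes and the assumption of already-encoded numeric features are mine.

```python
import tensorflow as tf

# Number of input features after encoding the categorical dimensions; illustrative only.
NUM_FEATURES = 32

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(NUM_FEATURES,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # P(add-to-cart)
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC(name="auc")])

# X_train / y_train would come from the BigQuery extract above (assumed already encoded):
# model.fit(X_train, y_train, epochs=10, validation_split=0.2)

# The sigmoid output (0-1) is scaled to the 0-100 score written back as a custom dimension:
# propensity_score = (model.predict(X_user) * 100).round().astype(int)
```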
The facilitator for connecting the services is CRMint: a data pipeline tool that integrates and automates the flow from Google Analytics 360 through GCP to Google's advertising products (DV360, Google Ads, etc.).
The unique selling points of CRMint are its reusable “workers” and a graphical user interface that makes it easier to design and understand the pipeline, its steps, and its job functions. A personal favorite: there's an integrated worker for importing the created dimensions back into the Google Analytics interface, so you don't have to develop a Measurement Protocol or API service yourself. Amazing stuff.
The marketing benefit
For marketers, the deliverables for optimizing KPIs are clear: new audiences are available in the Google Analytics interface and can be pushed to the advertising products used. In this example, the full suite of Google marketing products is used, enabling better bidding in Google Ads, more specific targeting in DV360, and checkout flow tests in Google Optimize.
Let the data circulate
An additional value of a customized setup like this one, rather than “just” using built-in smart bidding algorithms or similar, is that the created dimension can be used to improve the next data schema and model, and so on. This creates a data augmentation loop: the distillates created will often be strong inputs for other classification models. In a cooking analogy, the distillate is the fond, the stock that defines the dish.
In the example, a natural next step is to use the propensity score as an input for predicting the product type or product category users are most interested in.
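As a tiny illustrative sketch (the table and column names are hypothetical), the distillate simply becomes another input column when assembling the feature set for that next model:

```python
import pandas as pd

# Hypothetical feature table for the follow-up (product-category interest) model:
# the add-to-cart propensity distillate from the first model is appended as one more feature.
users = pd.DataFrame({
    "clientId": ["1234567890.1614612345", "0987654321.1598765432"],
    "visitNumber": [5, 2],
    "add_to_cart_propensity": [82, 17],  # 0-100 score produced by the first model
})
# ...join with product/category interaction features and train the next classifier on top.
```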
Summary
Does this model work for KPI optimization? Yes. With a well-documented setup, reusable code, and repositories on GitHub utilizing CRMint, the data engineering and data science development time can be kept to a minimum, breaking even within only a few months through more intelligent marketing spend. The additional value created, such as personalizing content with Google Optimize using the created dimension, is ready to be harvested.
About Robert Børlum-Bach:
Robert Børlum-Bach is the Head of Analytics Architecture at TV2 in Denmark. He works closely with digital analytics, data collection, data quality, data governance, and machine learning in analytics.