M.G.: At a high level, experimentation via A/B testing is a fairly accessible idea. You roll out competing options in parallel and see if any of them perform better (or differently) than your existing way of doing things. At a deeper level, though, A/B testing rests on complex abstractions and counterintuitive thinking that underpin its statistics. Treating A/B testing as a black box can often lead to confusion and unmet expectations. This black-box thinking is a form of solutionism.
Solutionism is when we hyperfocus on a solution, or tool, without:
Unfortunately, this can lead to treating A/B testing as an input-output procedure that is able to:
Unfortunately, this is not the case. Most of the work, and value, in experimentation is in:
Both 2 and 3 require at least some basic understanding of the statistical ideas underpinning A/B testing. For example, we sometimes see companies running A/B tests with 20, 30, or even more conversion objectives. Having many conversion objectives is often an indication that what exactly is being tested hasn't been fully thought through. This lack of focus tends to happen when the team has their analytics hat on, where data collection is of the Just In Case (JIC) variety: all data is collected just in case a question comes up down the road. Taking the just-in-case approach can lead to incorrect interpretation, extra effort in gleaning insights, and ultimately a failed experiment. More effective A/B testing takes a Just In Time (JIT) data collection approach: we collect THIS data to answer THIS specific, well-defined question.
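To make the many-objectives problem concrete, here is a small illustrative calculation (mine, not from the interview): if each of N independent conversion objectives is tested at a 5% significance level, the chance of at least one purely spurious "win" grows quickly, even when no variation has any real effect.

```python
def familywise_error(n_metrics, alpha=0.05):
    """Chance that at least one of n independent metrics shows a
    'significant' result purely by chance (no true effect anywhere)."""
    return 1 - (1 - alpha) ** n_metrics

for n in (1, 5, 20, 30):
    print(f"{n:2d} metrics -> {familywise_error(n):.0%} chance of a false positive")
```

With 20 objectives, the chance of at least one spurious significant result is roughly 64%; with 30, roughly 79%. This assumes independent metrics, which is a simplification, but the direction of the problem holds either way.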
For example, collapsing several separate conversion objectives into a single conversion goal after running the experiment requires care in recalculating the standard errors (a single user with two conversions will contribute more to the variance than two users with one conversion each). It will also most likely make the original minimum detectable effect (MDE) and power used in calculating the sample size (assuming this was even done) inconsistent with the final analysis, invalidating, or at least compromising, the quality of the test. An even bigger issue is the loss of type 1 error control due to cherry-picking which conversion events to use in the analysis based on whichever ones show a significant result.
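The variance point can be sketched numerically. The per-user conversion counts below are invented for illustration: two ten-user samples with the same mean conversions per user, differing only in whether one user converted on both collapsed objectives or two users converted on one each.

```python
import statistics

# Illustrative only: why collapsing two conversion events into one goal
# changes the standard error. Per-user conversion COUNTS are no longer
# binary, so a user who converts twice contributes more to the variance.
scenario_a = [2, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # one user converted on both objectives
scenario_b = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # two users converted on one each

# Same mean conversions per user in both scenarios...
assert statistics.mean(scenario_a) == statistics.mean(scenario_b)

# ...but the double-converter inflates the variance (0.36 vs 0.16 here).
print(round(statistics.pvariance(scenario_a), 2))
print(round(statistics.pvariance(scenario_b), 2))
```

Since the standard error is the square root of this variance divided by the sample size, treating the collapsed counts as if they were simple binary conversions would understate the uncertainty in scenario A.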
This is not to say that it is always a bad idea to experiment without a strong hypothesis. There is nothing wrong with running a discovery/exploratory experiment to get a sense of whether certain types of interventions might be worth investing more time in developing. However, it is good practice to follow up these results with a more formal, confirmatory A/B test.
In general, a solution, if it exists, will be found within the problem. So, rather than focus on the solutions (A/B testing, bandits, machine learning, etc.), we should first focus most of our energies on defining the problem and how much value we think there will be in solving it.
The value of A/B testing and experimentation comes not in the tests themselves, but in how well the company understands their customers' wants and needs, and in the skill of the people running the experiments.
Another issue with the input/output way of thinking is that it leads to the expectation that the experimentation program will directly generate an incremental profit. This is what I call the incremental value view of experimentation.
“We run X experiments and we expect Y incremental revenue.” This way of seeing experimentation calculates value based on both direct gains and loss avoidance. Direct gains are what we tend to think of as optimization - changes made to the customer experience that increase revenue over the original experience. Loss avoidance is a counterfactual that asks, “What decisions would we have made that would have made things worse had we not had our experimentation program in place?” Loss avoidance often makes up the lion’s share of this calculation and can make an experimentation program worth the cost and effort.
Although the incremental view is useful, I think the larger payoff is in a transformative view. The transformative view is about unlocking the full potential of the company to learn and deliver products and experiences that customers want in an ongoing way. A well-functioning experimentation program with a transformative view allows the organization to tap into unrealized productivity. Put another way, by using a process that provides strong guarantees against making catastrophic mistakes, you increase the liquidity of ideas flowing from employees into the product and customer experiences that would not have occurred without this protection. The transformative view is about culture change, which is difficult to measure explicitly but can deliver tremendous value nonetheless.
D.D.: In several industry presentations, you talk about the difference between active vs passive data collection. Would you explain a bit more what you mean by that?
M.G.: Sure. This is related to the point I made earlier. I think that the main difference between experimentation and other approaches in analytics is that experiments explicitly collect the data needed to answer a specific question - active data collection - whereas analytics tends to collect a lot of data before the questions are known, just in case it is needed to answer some currently unknown future question. In active data collection, you need to know the question before you collect the data. You can think of active data as just-in-time data.
Thinking about experiments as exercises in active data collection also helps us think about the marginal cost of data, or alternatively, the marginal value of answering a given question. So one can ask, “Is the answer to this question going to be worth the cost in time and energy of collecting the data required to answer it?”
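One way to put a number on the marginal cost of answering a question is a standard sample-size calculation for a two-proportion test. The baseline rate and MDE below are illustrative assumptions, not figures from the interview.

```python
from statistics import NormalDist

def sample_size_per_arm(baseline, mde, alpha=0.05, power=0.8):
    """Normal-approximation sample size per variation for detecting an
    absolute lift of `mde` over a `baseline` conversion rate, at the
    given two-sided significance level and power."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # e.g. 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)            # e.g. 0.84 for 80% power
    p1, p2 = baseline, baseline + mde
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ((z_alpha + z_beta) ** 2 * variance) / mde ** 2

# Detecting a 1-point absolute lift on a 5% baseline needs roughly
# 8,000+ users per variation; a 2-point lift needs far fewer.
print(round(sample_size_per_arm(0.05, 0.01)))
print(round(sample_size_per_arm(0.05, 0.02)))
```

Running the number like this turns the abstract question "is this worth answering?" into a concrete cost in traffic and calendar time.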
D.D.: What are the tough truths that we don't talk a lot about in experimentation and optimization?
M.G.: It’s not the martech tools that provide the answers. Technology requires human expertise to get anything out of it - the value is in the skill of the operator. If you don’t know what you want to learn, the technology is not going to be helpful.
D.D.: You’ve helped some of the world’s largest brands optimize their A/B testing and experimentation programs. What would you say are some of the most common hurdles when implementing or expanding a testing and experimentation program?
M.G.: In addition to what I mentioned above, I would say:
D.D.: Are there any new ideas or methods in testing and experimentation coming?
M.G.: We expect that there will be a continued expansion of the experimentation/optimization tool kit. Specifically, for data-rich environments, we expect greater familiarity with, and demand for, the adaptive methods that Conductrics provides, such as contextual multi-armed bandits and reinforcement learning for multi-step problems.
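As a rough illustration of what "adaptive" means here, below is a minimal epsilon-greedy bandit - one of the simplest members of the family M.G. mentions. It is a toy sketch, not Conductrics' actual algorithm, and the conversion rates are invented.

```python
import random

def epsilon_greedy(true_rates, steps=10_000, epsilon=0.1, seed=42):
    """Allocate simulated traffic across arms: explore a random arm
    `epsilon` of the time, otherwise exploit the best-looking arm.
    Returns the number of pulls each arm received."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    pulls = [0] * n_arms
    wins = [0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore
        else:
            # Exploit: pick the arm with the best observed rate
            # (unpulled arms get an optimistic 1.0 so each is tried).
            arm = max(range(n_arms),
                      key=lambda a: wins[a] / pulls[a] if pulls[a] else 1.0)
        pulls[arm] += 1
        wins[arm] += rng.random() < true_rates[arm]
    return pulls

# Three arms with hypothetical conversion rates; the bandit typically
# shifts most traffic toward the best-performing arm over time.
print(epsilon_greedy([0.04, 0.05, 0.08]))
```

Unlike a fixed-split A/B test, the allocation adapts during the run - which is the trade-off these methods make between learning and earning.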
Interestingly, we also expect additional tools for more limited data environments. For example, with browsers limiting how long event data is remembered, it will become increasingly challenging to run the classic Randomized Controlled Trials (RCTs) that are the foundation of A/B tests. This will encourage clients to use other methods. Causal inference methods are better suited to account for the inability to assign an individual user directly to a particular treatment or experience. Conductrics is working on tools, due in the next several months, that will help our clients ask causal inference questions in these situations.