Jim Sterne is the founder of the Marketing Analytics Summit (formerly the eMetrics Summit) and co-founder and Board Chair of the Digital Analytics Association. An internationally known speaker and consultant, he is the author of numerous books, including Artificial Intelligence for Marketing, 101 Things You Should Know About Marketing Optimization Analysis, Social Media Metrics, and The Devil's Data Dictionary. He has spent more than 35 years selling and marketing technical products and has devoted all his attention to the Internet as a marketing medium since 1994.
J.S.: Let’s start with why to leverage it now: in order to learn. It is why companies should have built websites in 1995 rather than waiting until 1998: so they could learn. It is why companies should get into social media sooner rather than later: so they can learn.
Machine learning is a very powerful tool, and some vendors are already offering AI-infused capabilities. Now is the time to know where they’re useful.
The secret is that some decisions are not very important, so I don’t have to spend a lot of money to make them. I can flip a coin, make the decision, and move forward; that’s fine. Some decisions I want to give more numeric thought to, so I’m going to create a spreadsheet, and that’s very useful for about 90% of the numeric decisions I make. When I get more sophisticated, I want to build a statistical model, I want to do predictive analytics, I want to get some rigor involved in how I manage the data.
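As a concrete illustration of that progression, here is a minimal sketch of the step up from a spreadsheet to a simple statistical model: a logistic regression fit on a hypothetical campaign export. The file name, feature columns, and label are placeholders, not anything Sterne refers to.

```python
# Minimal sketch: stepping up from a spreadsheet to a statistical model.
# The CSV file, feature columns, and label are hypothetical placeholders.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("campaign_history.csv")           # hypothetical export
X = df[["past_opens", "days_since_last_visit"]]    # hypothetical features
y = df["converted"]                                # hypothetical 0/1 label

# Hold out data so the model is judged on examples it has never seen.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
print(f"Holdout accuracy: {model.score(X_test, y_test):.2f}")
```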
D.D.: There’s no doubt that machine learning is extremely beneficial to marketing analytics. How would you advise people to get started and what are the important steps to ensure good outcomes?
J.S.: Education is an important step in the process. Then we need to have a very clear question that we’re asking. It is important to define the areas where machine learning and AI make sense to me as a marketer or a business. Specific tools and processes work for very specific things. I can’t just say “I will use machine learning to solve all my problems”; it doesn’t work that way. It’s good for solving a specific problem.
Build a model that will help you get more emails opened. But then you’ll need another model to get people to click through. Then you’ll need another model to optimize the landing page, and another model to optimize the shopping cart. And these are all different and specific functions and tactics. No system will resolve all of that for you. Some people think that machine learning will solve all their problems, and they don’t have to worry about predictive analytics, they can just go straight to AI. It does not work like that.
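To make the one-model-per-task point concrete, a sketch with purely synthetic stand-in data might look like the following: each funnel stage gets its own training data and its own fitted model, and no single system covers them all. The stage names and data shapes are illustrative assumptions.

```python
# Sketch: one model per funnel stage, as described above.
# All data here is synthetic; real stages would each have their own
# features and labels drawn from their own part of the funnel.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

def fake_stage_data(n=500, d=4):
    """Purely synthetic stand-in for real per-stage training data."""
    X = rng.normal(size=(n, d))
    y = (X[:, 0] + rng.normal(size=n) > 0).astype(int)
    return X, y

# One model per stage; none of them solves the others' problems.
stages = ["email_open", "click_through", "landing_page", "shopping_cart"]
models = {stage: GradientBoostingClassifier().fit(*fake_stage_data())
          for stage in stages}
```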
D.D.: For digital marketing, and machine learning in marketing analytics specifically, data needs to be factored in at the early stages. Why is data important for implementing and scaling AI?
J.S.: At the top end, if I have enough data (not just quantity, but quality), I can bring machine learning to bear and I can solve problems I couldn’t solve before. And I can solve problems I hadn’t even thought of trying to solve before.
The more data you collect and the cleaner you make your data, the more opportunity there is to discover something interesting in it. Reporting is absolutely necessary because my budget depends on “hey look, I spent this much and we made these sales and we can't necessarily connect it directly, but we can infer that we’ve been getting value for money.” But if we collect more data and we can do analysis rather than just reporting, we can come up with insights.
It’s the three Vs of big data: volume, velocity, and variety.
If I can increase those, machine learning can help me, and analysts can help me see something that I wasn’t able to see before.
D.D.: One of the tough truths is that many companies want to get into areas like AI and machine learning, but their data foundation is a mess. We know that’s true, yet data quality is something that we don’t talk a lot about. Why is data quality important?
J.S.: Figuring out when your data is good enough is a serious challenge, especially when it comes to machine learning. Data being a mess, however, is a permanent problem, and not an easily solvable one. It’s like network security: you always have to be working on it.
To ensure that we know what’s happening inside the model, we need trustworthy, well-labeled data going in, and we have to evaluate the output. Because the output is going to say, for example, that the best way to get more people to open more emails is to send everybody a thousand emails in one hour. And more people will open their emails just to find out who this horrible person is. This will not help the business. It is statistically correct, and the machine did great reinforcement learning to discover this output, but it doesn’t pass the smell test.
So, the human has to be there to make sure that the data going in is trustworthy and that the output makes sense.
How trustworthy the data needs to be can only be evaluated by the output. If it gives you an answer that is reasonable (and reason is something that the machine does not have), you take its recommendation. If it works, the data is good enough. If it doesn’t work, maybe there’s a data problem, maybe you asked the question wrong, or maybe the model is wrong and needs to be reconfigured.
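In code, that two-part check might look like the sketch below: a statistical gate on held-out data plus a hand-written smell test on the recommendation itself. The AUC threshold and the emails-per-day cap are invented business constraints, not anything from the interview.

```python
# Sketch of the two-part check: statistical evaluation of the model,
# then a human-coded "smell test" on the recommendation itself.
from sklearn.metrics import roc_auc_score

def passes_evaluation(model, X_test, y_test, min_auc=0.70):
    """Statistical gate: is the model better than guessing on held-out data?
    The 0.70 threshold is a hypothetical business choice."""
    return roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]) >= min_auc

def passes_smell_test(recommendation):
    """Human gate: reject statistically 'correct' but absurd plans, such as
    the thousand-emails-in-an-hour example above. The cap is hypothetical."""
    return recommendation.get("emails_per_user_per_day", 0) <= 3

# A recommendation is only trusted when it clears both gates.
```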
D.D.: So, it’s more about reaching a hygiene level where we say “Ok, the data foundation we have right now is good enough to get started. Because we cannot afford to wait a couple of years until we hopefully get to the level that we want.”
J.S.: Yes, Matt Gershoff always says that certainty has a cost. If you’re OK with uncertainty, flip a coin; that’s fine. But if you want to be 90% certain, you’re going to pay a lot of money for that. The difference between 80% certainty and 90% has a definite cost, but the difference between 90% and 95% has a MUCH bigger cost. Is it worth it? Is the answer to your question worth the investment?
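That nonlinear price of certainty can be made concrete with the standard sample-size formula for estimating a proportion. A sketch, assuming a worst-case proportion of p = 0.5 and an arbitrary ±1% margin of error (both are illustrative choices, not figures from the interview):

```python
# Required sample size for estimating a proportion at various confidence
# levels: n = z^2 * p * (1 - p) / margin^2.
from scipy.stats import norm

margin, p = 0.01, 0.5   # +/-1% margin of error, worst-case proportion
for confidence in (0.80, 0.90, 0.95, 0.99):
    z = norm.ppf(1 - (1 - confidence) / 2)      # two-sided z-score
    n = (z ** 2) * p * (1 - p) / margin ** 2    # required sample size
    print(f"{confidence:.0%} certainty needs ~{n:,.0f} observations")

# Prints roughly 4,100 / 6,800 / 9,600 / 16,600: each extra point of
# certainty costs more observations than the last.
```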
D.D.: What about output? Why is that important for machine learning in marketing analytics?
J.S.: Evaluating the output is the third pillar of leveraging machine learning successfully, and it is just as important as setting clear goals and ensuring data quality.
Did I make a good model? Did the predictive analytics statistician do a good job with the assumptions, or did the machine learning model arrive at some interesting output? The value of the output depends entirely on the quality of the question and the quality of the data.