Due to the recent advances in machine learning and data science, we're entering a new wave of advertising.

Hyper-personalization, programmatic buying, and real-time bidding are the name of the game in the age of AI in advertising.

Here are the stats:

  • 84% of all digital display ad spending will be programmatic by 2019, according to eMarketer
  • By 2020, US advertisers will transact nearly $69 billion of digital display ad spending programmatically, according to this study

Bottom line, programmatic has taken over.

1. What is Programmatic Advertising?

Programmatic advertising is the process of buying ad space in an automated fashion. Using audience data and insights, the goal of programmatic is to show highly relevant ads to the right audience at the right time.

There are two types of programmatic:

  1. Programmatic Direct
  2. Real Time Bidding (RTB)

What is Programmatic Direct?

With programmatic direct, ad space is still purchased programmatically, but it is bought in advance based on the advertiser's required number of impressions and audience reach.

Since this article is about reinforcement learning, we're going to focus on RTB.

What is Real-Time Bidding (RTB)?

RTB is the automated process of buying ad display space by bidding for your target audience in real-time.

Put simply, advertisers bid for space on publisher sites in the form of a real-time auction.

Let's review how advertising exchanges work.

Of course this is a simplified version of things, but it will help us understand the problem in the context of reinforcement learning.

  • An ad exchange sends information to the advertiser about the page content and users. This is accessed from a supply-side platform (SSP).
  • Advertisers place bids for these impressions, and each impression (generally) goes to the highest bidder.
  • Demand-side platforms (DSPs) automate this bidding process and make it simpler to target relevant users with ads.

Now that we have an overview of how advertising exchanges work, let's look at how real-time bidding works.

Here's an example of how RTB works from Sigmoidal:

When a visitor lands on a web page, the browser sends a request to an ad server, which then places that ad request on an exchange where software can bid on those impressions. Prospective advertisers (or more accurately, their software) analyze the impressions and decide how they want to bid, if at all. The exchange collects all the bids and awards the ad spot to the winner. This all happens in the time it takes a web page to load.
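To make the auction step concrete, here is a minimal sketch in Python. It assumes the simplest possible rule, where the exchange awards the impression to the highest bid. The `Bid` class, the `run_auction` function, and the advertiser names are all hypothetical and only there for illustration; real exchanges add floor prices, timeouts, and different pricing rules.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bid:
    advertiser: str   # hypothetical advertiser name
    amount: float     # bid price for this single impression


def run_auction(bids: List[Bid]) -> Optional[Bid]:
    """Collect all bids and return the highest one, or None if nobody bid."""
    if not bids:
        return None
    return max(bids, key=lambda b: b.amount)


# Three made-up advertisers bid on the same impression.
bids = [Bid("shoes_co", 1.20), Bid("travel_co", 2.50), Bid("books_co", 0.80)]
winner = run_auction(bids)
print(winner.advertiser, winner.amount)  # travel_co 2.5
```

Note that an advertiser can also decide not to bid at all, which is why `run_auction` has to handle an empty list.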

2. What is Reinforcement Learning?

If you want to read a complete guide to reinforcement learning, you can check out our article: What is Reinforcement Learning? A Complete Guide for Beginners.

In order to provide an overview of reinforcement learning, let's first look at supervised and unsupervised learning...

Supervised Learning

In Supervised Learning we're trying to predict a value that already exists in our data. This value is known as the label, the target variable, or the dependent variable.

The remaining columns, the features, are known as independent variables.

The classic example of supervised learning is an app that predicts whether an image shows a cat or a dog, or whether an image contains a hotdog or not...


The 'hotdog' or 'not hotdog' is what we mean by labels.
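Here is a rough sketch of what "learning from labels" looks like in code. It uses a 1-nearest-neighbor rule, which is just one simple supervised method chosen for brevity. The two numeric features and all of the data points are invented for illustration; a real image classifier would learn from pixels instead.

```python
# Each training example pairs features with a known label.
# The features (length_cm, redness_score) are hypothetical stand-ins
# for whatever would be measured from a real image.
training_data = [
    ((15.0, 0.9), "hotdog"),
    ((14.0, 0.8), "hotdog"),
    ((30.0, 0.1), "not hotdog"),
    ((5.0, 0.2), "not hotdog"),
]


def predict(features):
    """Return the label of the closest training example (1-nearest neighbor)."""
    def squared_distance(example):
        known_features, _ = example
        return sum((a - b) ** 2 for a, b in zip(known_features, features))

    _, label = min(training_data, key=squared_distance)
    return label


print(predict((14.5, 0.85)))  # hotdog
print(predict((28.0, 0.15)))  # not hotdog
```

The key point is that every training example already comes with the answer attached, and the model's job is to carry those answers over to new, unseen examples.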

Most data that we collect doesn't have clean labels attached to it, but we still want to make use of it...this is where unsupervised learning comes in.

Unsupervised Learning

In Unsupervised Learning we have uncategorized, unlabelled data and we want to use machine learning algorithms to find patterns or structures in our data.

We then observe and learn from the patterns that the algorithm identifies.

This allows us to visualize groups of data points that we may not have otherwise known of.
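As a small illustration, the sketch below groups unlabelled numbers into two clusters using a stripped-down version of k-means (with k fixed at 2). The `two_means` function and the session-length data are made up for this example, and k-means is only one of many clustering methods.

```python
def two_means(values, iterations=10):
    """Split 1-D values into two groups with a bare-bones k-means (k=2)."""
    centers = [min(values), max(values)]  # simple deterministic starting point
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            # Assign each value to whichever center is nearer.
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups


# Hypothetical, unlabelled data: minutes each visitor spent on a site.
session_minutes = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
short_visits, long_visits = two_means(session_minutes)
print(short_visits)  # [1.0, 1.5, 2.0]
print(long_visits)   # [9.0, 10.0, 11.0]
```

Nobody told the algorithm that "short" and "long" visits exist. It found the two groups from the structure of the data alone.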

But what if we don't have a pre-existing static dataset to learn from, and we need an algorithm to learn and make intelligent decisions in real time?

This is where Reinforcement learning comes in.

Reinforcement Learning

The problem that we're attempting to solve with Reinforcement Learning sits somewhere in between Supervised and Unsupervised Learning.

In reinforcement learning, we have a learner, or decision maker, that we call an agent.

The agent's goal is to learn how to maximize its long-term expected reward by interacting with its environment.

As the agent takes actions, it receives labels that are sparse and time-delayed.

From these labels, which we can call rewards, the agent can learn how to operate in an uncertain environment.
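The agent, environment, and reward loop can be sketched in a few lines of Python. This toy example is an assumption for illustration, not something from the ad world: the environment is a corridor with five positions, the only reward sits at the far end (so it is sparse and delayed), and the agent learns with tabular Q-learning, which is one common reinforcement learning algorithm among many.

```python
import random

N_STATES = 5             # positions 0..4 in a toy corridor; position 4 is the goal
ACTIONS = (-1, +1)       # step left or step right
ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor (arbitrary choices)


def step(state, action):
    """Environment: move, stay inside the corridor, reward 1 only at the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    # q[state][action_index] estimates the long-term reward of that action.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            a = rng.randrange(2)  # explore by acting randomly
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: immediate reward plus discounted future value.
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q


q = train()
# Read off the learned policy for every non-goal position.
policy = ["left" if left > right else "right" for left, right in q[:-1]]
print(policy)
```

Even though the agent only ever sees a reward at the very end of the corridor, the value estimates propagate backwards, so the learned policy ends up stepping right from every position.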

Now that we have discussed an overview of reinforcement learning, let's look at how we can use it for programmatic advertising and real-time bidding.

3. RTB and Reinforcement Learning

So now we know that the key to winning the right auctions and reaching the ideal audience with programmatic advertising comes down to optimal bidding in real-time.

So why are we using Reinforcement Learning with RTB?

As we know, supervised learning uses labeled data for things like image classification, while unsupervised learning uses unlabeled data to identify structures in our data.

In the real (reinforcement) world, however, there isn't always a perfect answer to the problem at hand, like there is with the "is this a cat or a dog?" image classification problem.

In reinforcement learning we have to instead take actions in our environment and observe which ones work and which ones don't, and then optimize this learning process.

For example, in the famous Go match between Lee Sedol and AlphaGo, the AI (which uses Deep Reinforcement Learning, amongst other algorithms) has to take actions in the game and observe which ones work and which ones don't.

Real-time bidding is a similar problem.

With RTB the goal is to achieve the minimum winning bid for a particular impression.

This value is never observed, however, since Facebook and Google obviously don't explicitly tell you whether you won with the lowest possible bid.

As advertisers, we only know if the bid was high enough to win that particular auction.

This makes it a reinforcement learning problem because we have to take an action (placing a bid) and we then receive a time-delayed label (winning or losing the ad impression).

Over time, the goal of the reinforcement learning algorithm is to maximize expected return on ad spend (ROAS) based on our specific bidding parameters.
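To see how this could play out, here is a heavily simplified sketch. It treats bidding as a multi-armed bandit, the simplest (stateless) form of reinforcement learning, and uses an epsilon-greedy strategy. Everything in it is an assumption for illustration: the fixed impression value, the candidate bid levels, the simulated hidden clearing price, the rule that the winner pays their own bid, and the use of net value (value minus price paid) as the reward in place of a full ROAS calculation.

```python
import random

IMPRESSION_VALUE = 3.0                  # hypothetical value of a won impression
BID_LEVELS = [0.5, 1.0, 1.5, 2.0, 2.5]  # candidate bids (the agent's actions)
EPSILON = 0.1                           # share of rounds spent exploring


def auction_outcome(bid, rng):
    """Simulated exchange: we win if our bid meets a hidden clearing price.
    The agent never sees the price itself, only the win/lose signal."""
    hidden_price = rng.uniform(0.5, 1.5)
    return bid >= hidden_price


def learn_bid(rounds=5000, seed=0):
    rng = random.Random(seed)
    estimates = [0.0] * len(BID_LEVELS)  # running mean reward per bid level
    counts = [0] * len(BID_LEVELS)
    for _ in range(rounds):
        if rng.random() < EPSILON:
            i = rng.randrange(len(BID_LEVELS))                          # explore
        else:
            i = max(range(len(BID_LEVELS)), key=estimates.__getitem__)  # exploit
        won = auction_outcome(BID_LEVELS[i], rng)
        # Reward: value gained minus price paid if we win, nothing otherwise.
        reward = IMPRESSION_VALUE - BID_LEVELS[i] if won else 0.0
        counts[i] += 1
        estimates[i] += (reward - estimates[i]) / counts[i]
    best = max(range(len(BID_LEVELS)), key=estimates.__getitem__)
    return BID_LEVELS[best], estimates


best_bid, estimates = learn_bid()
print(best_bid)
```

In this simulation the hidden price never exceeds 1.5, so a bid of 1.5 always wins while paying less than the higher bids, and the agent settles on it without ever being told what the clearing price was. A production bidder would also condition on the user and page context, which turns the bandit into a fuller reinforcement learning problem.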

4. Summary: Real Time Bidding & Reinforcement Learning

As advertisers living in the age of AI implementation, we need to make use of the huge amounts of data collected every day in order to serve more relevant content.

In the past, reinforcement learning was really only accessible to the Googles and Facebooks of the world, but as this technology develops it is becoming much more accessible to companies of all sizes.

If you want to learn more about reinforcement learning, programmatic advertising, and real-time bidding I recommend the following resources.

Resources