AI in Advertising: Real-Time Bidding & Reinforcement Learning

In this guide, we discuss the application of reinforcement learning to real-time bidding for advertising.

3 years ago • 5 min read

By Peter Foy

Due to the recent advances in machine learning and data science, we've entered a new wave of advertising. Specifically, hyper-personalization, programmatic, and real-time bidding are the name of the game in the age of AI in advertising.

Here are the stats:

An estimated 72% of all display ad spending is transacted programmatically (Statista)
By 2020, US advertisers will transact nearly $69 billion in digital display ad spending programmatically, according to this study

The bottom line is, programmatic is slowly but surely taking over the advertising world.

What is Programmatic Advertising?

Programmatic advertising is the process of buying ad space in an automated fashion. Using audience data and insights, the goal of programmatic is to show highly relevant ads to the right audience at the right time.

There are two types of programmatic:

Programmatic Direct
Real-Time Bidding (RTB)

What is Programmatic Direct?

Programmatic direct also buys ad space programmatically, but the space is purchased in advance based on the advertiser's required number of impressions and audience reach.

In this article, we'll focus on real-time bidding and reinforcement learning.

What is Real-Time Bidding (RTB)?

Real-time bidding is the automated process of buying ad display space by bidding for your target audience in real-time.

Simply put, advertisers bid for space on publisher sites in the form of a real-time auction.

Let's quickly review how advertising exchanges work.

This is obviously a simplified version of things, but it will help us understand the problem in the context of reinforcement learning.

An ad exchange sends information to the advertiser about the page content and users. This information is accessed through a supply-side platform (SSP).
Advertisers place bids for these impressions, and the impression (generally) goes to the highest bidder.
Demand-side platforms (DSPs) automate this bidding process and make it simpler to target relevant users with ads.

Now that we have an overview of how advertising exchanges work, let's look at how real-time bidding works.

Here's an example of how RTB works from Sigmoidal:

When a visitor lands on a web page, the browser sends a request to an ad server, which then places that ad request on an exchange where software can bid on those impressions. Prospective advertisers (or more accurately, their software) analyze the impressions and decide how they want to bid, if at all. The exchange collects all the bids and awards the ad spot to the winner.
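The auction flow described above can be sketched in a few lines of Python. This is a minimal illustration, not how any real exchange is implemented: the `run_auction` function, the bidder names, and the impression fields are all hypothetical, and the sketch assumes that the highest bid simply wins.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical types: an impression is a dict of page/user attributes,
# and a bidder is a function that returns a bid amount or None (no bid).
Impression = Dict[str, str]
Bidder = Callable[[Impression], Optional[float]]


def run_auction(
    impression: Impression, bidders: Dict[str, Bidder]
) -> Optional[Tuple[str, float]]:
    """Collect bids for one impression and return (winner, bid), or None."""
    bids: Dict[str, float] = {}
    for name, bidder in bidders.items():
        amount = bidder(impression)
        if amount is not None:  # bidders may decide not to bid at all
            bids[name] = amount
    if not bids:
        return None
    # Assumption: the highest bid wins the impression.
    winner = max(bids, key=lambda name: bids[name])
    return winner, bids[winner]


# Made-up bidders for illustration only.
bidders = {
    "shoe_brand": lambda imp: 2.50 if imp["topic"] == "running" else None,
    "car_brand": lambda imp: 1.75,
}

running_result = run_auction({"topic": "running"}, bidders)
cooking_result = run_auction({"topic": "cooking"}, bidders)
print(running_result, cooking_result)
```

Here the shoe brand only bids on running-related pages, so it wins the first impression and sits out the second.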

The impressive part is that this whole auction process takes place in the time it takes for a web page to load.

What is Reinforcement Learning?

If you want to read a complete guide to reinforcement learning, you can check out our article What is Reinforcement Learning? A Complete Guide for Beginners, or find our full list of reinforcement learning articles here.

In order to provide an overview of reinforcement learning, let's first look at two other sub-branches of machine learning: supervised and unsupervised learning.

Supervised Learning

In supervised learning, we're trying to predict a value that already exists in our data. This value is known as the label, the target variable, or the dependent variable.

The rest of the features are known as independent variables.

The classic example of supervised learning is an app that predicts whether an image is a cat or a dog, or whether an image contains a hotdog, or not hotdog...

Most data that we collect, however, doesn't have clean labels attached to it, but we still want to make use of it. This is where unsupervised learning comes in.

Unsupervised Learning

In unsupervised learning, we have uncategorized, unlabelled data and we want to use machine learning algorithms to find patterns or structures in our data.

We then observe and learn from these patterns that the algorithm identifies.

This allows us to visualize groups of data points that we may not have otherwise known of.

But what if we don't have a pre-existing static dataset to learn from, and we need an algorithm to learn and make intelligent decisions in real-time?

This is where reinforcement learning comes in.

Reinforcement Learning

The problem that we're attempting to solve with reinforcement learning sits somewhere in between supervised and unsupervised learning.

In reinforcement learning, we have a learner, or decision-maker, that we call an agent.

The agent's goal is to learn how to maximize its long-term expected reward by interacting with its environment.

After the agent takes an action, it receives labels that are sparse and time-delayed.

From these labels, which we can call rewards, the agent can learn how to operate in an uncertain environment.
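One common way to make "long-term reward" concrete is to add up the rewards an agent collects over time, weighting later rewards slightly less. The sketch below assumes a discount factor `gamma`, which is a standard reinforcement learning convention rather than something specific to this article, and uses made-up reward values to show how a single delayed reward still counts toward the return measured from the first step.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.9) -> float:
    """Sum of rewards where the reward k steps ahead is weighted by gamma**k."""
    total = 0.0
    # Work backwards so each step adds its reward plus the discounted future.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Sparse, time-delayed feedback: nothing for three steps, then one reward.
sparse_rewards = [0.0, 0.0, 0.0, 1.0]
print(discounted_return(sparse_rewards, gamma=0.9))
```

With these illustrative numbers, the reward arrives three steps late, so it is worth 0.9 ** 3 of its face value at the first step.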

Now that we have discussed an overview of reinforcement learning, let's look at how we can use it for programmatic advertising and real-time bidding.

Real-Time Bidding and Reinforcement Learning

Now we know that the key to winning the right auctions and reaching the ideal audience with programmatic advertising comes down to optimal bidding in real-time.

So why are we using reinforcement learning with RTB?

As we know, supervised learning uses labeled data for things like image classification, while unsupervised learning uses unlabeled data to identify structures in our data.

In the reinforcement learning world, however, there isn't always a perfect answer to the problem at hand, like there is with the "is this a cat or a dog?" image classification problem.

In reinforcement learning, we have to instead take actions in our environment and observe which ones work and which ones don't, and then optimize this learning process.

For example, in the famous Go match between Lee Sedol and AlphaGo, the AI (which uses deep reinforcement learning, amongst other algorithms) has to take actions in the game and observe in real-time which ones work and which ones don't.

Real-time bidding is a similar problem.

With RTB, the goal is to achieve the minimum winning bid for a particular impression.

This value is never observed, however, since platforms such as Facebook and Google don't explicitly tell you whether you won with the lowest possible bid.

As advertisers, we only know if the bid was high enough to win that particular auction.

This makes it a reinforcement learning problem because we have to take an action, which in this case is placing a bid, and we receive a time-delayed label, which is winning (or losing) the ad impression.

Over time, the goal of the reinforcement learning algorithm is to maximize expected return on ad spend (ROAS) based on our specific bidding parameters.
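To make this concrete, here is a toy sketch of a bidding agent that only ever learns whether it won or lost. It uses an epsilon-greedy bandit, which is one of the simplest reinforcement learning setups and a deliberate simplification, not a production bidding strategy. Everything in it is an assumption made for illustration: the `simulate_bidding` function, the fixed bid levels, the simulated competing bid drawn uniformly between 0.5 and 2.0, the first-price payment rule, and the use of profit (impression value minus bid) as a stand-in for ROAS.

```python
import random
from typing import List, Tuple


def simulate_bidding(
    bid_levels: List[float],
    impression_value: float,
    rounds: int = 5000,
    epsilon: float = 0.1,
    seed: int = 0,
) -> Tuple[List[float], List[int]]:
    """Epsilon-greedy agent that learns the average profit of each bid level."""
    rng = random.Random(seed)
    counts = [0] * len(bid_levels)
    avg_reward = [0.0] * len(bid_levels)
    for _ in range(rounds):
        # Explore a random bid level with probability epsilon, else exploit
        # the level with the best estimated profit so far.
        if rng.random() < epsilon:
            arm = rng.randrange(len(bid_levels))
        else:
            arm = max(range(len(bid_levels)), key=lambda i: avg_reward[i])
        bid = bid_levels[arm]
        # Hypothetical market: the highest competing bid is hidden from the
        # agent, which only observes whether it won.
        competing_bid = rng.uniform(0.5, 2.0)
        won = bid > competing_bid
        # Assumed first-price rule: the winner pays its own bid.
        reward = (impression_value - bid) if won else 0.0
        counts[arm] += 1
        # Incremental update of the running average reward for this level.
        avg_reward[arm] += (reward - avg_reward[arm]) / counts[arm]
    return avg_reward, counts


levels = [0.5, 1.0, 1.5, 2.0, 2.5]
estimates, counts = simulate_bidding(levels, impression_value=2.5)
best = levels[max(range(len(levels)), key=lambda i: estimates[i])]
print("estimated profit per bid level:", [round(e, 2) for e in estimates])
print("bid level the agent currently prefers:", best)
```

Under these made-up numbers, bidding too low never wins and bidding the full impression value leaves no profit, so the estimates should favor a level in between. The agent arrives at that preference purely from win/lose feedback, without ever seeing the competing bids.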

Summary: Real-Time Bidding & Reinforcement Learning

As advertisers living in the age of AI implementation, we need to make use of the huge amounts of data collected every day in order to serve more relevant content.

In the past, reinforcement learning was really only accessible to the Googles and Facebooks of the world, but as this technology develops it is becoming much more accessible to companies of all sizes.

If you want to learn more about reinforcement learning, programmatic advertising, and real-time bidding, check out the following resources.

Further Resources

Tags:
Reinforcement Learning
