Matrix Factorization for Ad Recommendation

14.3.2018 | 7 minutes of reading time

This blog post describes how matrix factorization can be applied to the problem of ad targeting. It draws from my experience of developing a machine-learning-based solution for this task for the real-time performance marketing company twiago together with other colleagues from our Data Science team.

The problem: Ad targeting

Let us begin by framing the problem: we have an idle website floating in the endless vastness of the internet.

The page contains two div containers, A and B. Whenever a user visits our page, we want to place ads in these containers. Such an ad could be, for example, a banner ad.

Typically, we would have many possible ads to choose from. The decision about which ad is placed in which ad container is made by an ad server, which answers the request for ad material sent by the web page.

All of this should happen in the blink of an eye, and the page is finally rendered for the user with the two ads in place.

Now the next interesting thing that could happen: the user clicks on one of the ads. If so, the advertiser will pay a certain amount of money to the operators of the ad server and the page owner.

There is a whole industry for this and in general a huge market for online advertising.

As we have seen in our toy example, there are two major players: the publisher (web page owner) and the advertiser (the person advertising goods and services). Typically a publisher does not host a single website, but several sites under their domain. A newspaper or an online sports magazine might be a good example. Our ad server would also serve a whole network of publishers.

There is a mutual benefit here: Advertisers can publish on various sites, potentially reaching different audiences, while publishers will benefit from “crowd wisdom” (or: collaborative filtering) as ads are exposed to more users and will finally flock to the right ad spaces.

The performance indicator: Click-through rate

We can measure the performance of an ad by its click-through rate (CTR). It describes the probability of the ad receiving a click when it is shown. Mathematically it is given by this ratio:

CTR = (Number of clicks) / (Number of impressions)

In this formula you should think of the denominator as the number of “unique views by unique users”. The click-through rate depends on both the ad space and the ad. Clicks and impressions are counted over a fixed time window (one week in our case).
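
The ratio above can be sketched in a few lines of Python. This is only an illustration: the function name and the weekly counts are made up, and returning None when there are no impressions is an assumption about how an undefined CTR could be represented.

```python
def click_through_rate(clicks, impressions):
    """Return clicks / impressions, or None if the ad was never shown."""
    if impressions == 0:
        return None  # CTR is undefined without impressions
    return clicks / impressions

# Hypothetical counts for one (ad space, ad) combination over one week.
weekly_clicks = 12
weekly_impressions = 4000

ctr = click_through_rate(weekly_clicks, weekly_impressions)
print(f"CTR = {ctr:.4f}")  # prints: CTR = 0.0030
```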

Side note: What technically counts as an impression can be defined by certain industry standards (e.g. those provided by www.iab.com) and is part of the implementation of the ad serving technology. We can ignore the technical details and trust the ad server to provide a systematic way of counting “impressions” and providing this information to us.

The problem restated: Filling missing values

We can rephrase the problem stated above in the following terms: we have to fill in the missing values of the click-through rate matrix and rank ads by the click-through rates predicted in this way. Of course, the predictions should align with the already known values.

Assuming that we have M ad spaces in our complete network of publishers and N ads we could deliver, we can write down a matrix of shape (M x N) that records for each possible combination (AdSpace, Ad) the currently observed click-through rate. This matrix is called the click-through rate matrix (for short: CTR matrix).

Of course, this matrix contains missing values, for the following reason: not every ad has been shown on every possible ad space, and most likely never will be. So certain combinations have not occurred yet. As said, we would like to fill in these values under the constraint that our predictions do not deviate too much from the known values.
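
To make the shape of this matrix concrete, here is a minimal sketch that builds an (M x N) CTR matrix from weekly counts and marks combinations that were never shown as missing. The ad space and ad names, the counts, and the use of None for a missing value are all assumptions made for illustration.

```python
# Hypothetical weekly statistics: (ad_space, ad) -> (clicks, impressions).
weekly_stats = {
    ("space_A", "ad_1"): (30, 10000),
    ("space_A", "ad_2"): (5, 5000),
    ("space_B", "ad_2"): (40, 8000),
    ("space_C", "ad_1"): (9, 3000),
    ("space_C", "ad_3"): (12, 6000),
}

ad_spaces = sorted({space for space, _ in weekly_stats})  # M rows
ads = sorted({ad for _, ad in weekly_stats})              # N columns

def build_ctr_matrix(stats, ad_spaces, ads):
    """Return an M x N list of lists; None marks an unobserved combination."""
    matrix = []
    for space in ad_spaces:
        row = []
        for ad in ads:
            clicks, impressions = stats.get((space, ad), (0, 0))
            row.append(clicks / impressions if impressions > 0 else None)
        matrix.append(row)
    return matrix

ctr_matrix = build_ctr_matrix(weekly_stats, ad_spaces, ads)
for space, row in zip(ad_spaces, ctr_matrix):
    print(space, row)
```

In this toy data, four of the nine combinations are missing. These are exactly the entries we would like to predict.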

Side note: Typically such a matrix would be rather sparse. But because we target ad spaces (rather than individual users) on websites with a lot of traffic, the impressions and clicks of individual users aggregate rather quickly. Over the course of a week, which was the time frame for our batch job, this yields a matrix that is filled to a good degree.

The solution: Matrix factorization

We are interested in predicting these missing values since they might reveal how well a yet unknown combination of ad and ad space would perform and whether it would make sense to bring them together.

How can we achieve this? A popular technique for such collaborative filtering tasks is matrix factorization, which is what we also use here.

Matrix factorization in a nutshell

To outline the geometric idea behind this, recall how the matrix product X = WH of a matrix W with a matrix H is given in algebraic terms:

X[i,j] = W[i,1]·H[1,j] + W[i,2]·H[2,j] + ... + W[i,K]·H[K,j]

This formula computes the entry X[i,j] at position (i,j) as the dot product of the i-th row of W with the j-th column of H, where K is the number of columns of W (equivalently: the number of rows of H). This is the algebraic definition usually given. A more appealing geometric picture can be obtained as follows, writing W[:,k] for the k-th column of W:

X[:,j] = H[1,j]·W[:,1] + H[2,j]·W[:,2] + ... + H[K,j]·W[:,K]

This says that the j-th column of the product X is given by weighting the columns of W with the corresponding weights from the j-th column of H and summing everything up.
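
Both readings of the matrix product can be checked against each other with a small sketch. The helper names and the toy matrices are made up for illustration. The second function builds a column of the product as a weighted sum of the columns of W, with the weights read from the matching column of H.

```python
def matmul(W, H):
    """Entry view: X[i][j] is the dot product of row i of W and column j of H."""
    rows, inner, cols = len(W), len(H), len(H[0])
    return [[sum(W[i][k] * H[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def column_of_product(W, H, j):
    """Column view: column j of WH is a weighted sum of the columns of W,
    with the weights taken from column j of H."""
    rows, inner = len(W), len(H)
    column = [0.0] * rows
    for k in range(inner):
        weight = H[k][j]
        for i in range(rows):
            column[i] += weight * W[i][k]
    return column

W = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]        # shape 3 x 2
H = [[2.0, 0.0, 1.0],
     [1.0, 3.0, 1.0]]   # shape 2 x 3

X = matmul(W, H)
for j in range(len(H[0])):
    # Both views agree on every column of the product.
    assert [row[j] for row in X] == column_of_product(W, H, j)
print(X)
```
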
Depending on your taste, this might be a little too much math. But let us give an interpretation of this in more layman's terms: whenever we take a data matrix X of records (one record per column) and write it as a product X = WH, we can use the second factor H as the new representation of the data and reconstruct the original data using the first factor W.

Now for a given matrix X there might be several ways to decompose X as a product WH. A point of interest is to choose the pair (W, H) such that the number of rows of H (equivalently: the number of columns of W) is strictly lower than the number of rows of the original matrix X. This means that we reduce the number of features in our representation, which corresponds to data compression.
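
A quick count shows why a small inner dimension amounts to compression. The sizes below are purely hypothetical and not taken from the project: a full M x N matrix has M·N entries, while the two factors W (M x K) and H (K x N) together have only K·(M + N).

```python
def stored_numbers(m, n, k):
    """Return (entries of the full M x N matrix, entries of W plus entries of H)."""
    return m * n, k * (m + n)

# Hypothetical sizes: 1000 rows, 200 columns, inner dimension 10.
full, factored = stored_numbers(m=1000, n=200, k=10)
print(full, factored)  # prints: 200000 12000
```
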
The idea is that during the compression we will learn what information present in X is important and which is not. But how would we learn this compressed representation?
Once again, machine learning (a.k.a. mathematical/numerical optimization) comes to the rescue. If you liked the math above, this formula is for you:

J(W,H) = sum over all known entries (i,j) of (X[i,j] - (WH)[i,j])^2

We attempt to approximate X by the product WH. For this, we try to minimize the error J(W,H), which sums the squared differences between the known entries of X and the corresponding entries of the approximation WH. (Remember that our click-through rate matrix has missing values, so this fits into the overall picture.) One can use the alternating least squares (ALS) algorithm to compute such an approximation.
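
The following is a minimal pure-Python sketch of alternating least squares on a matrix with missing entries. It is not the project's implementation. The small ridge term reg, the random initialisation, the helper names and the toy data are all assumptions made for illustration. The idea is to keep H fixed and refit every row of W by least squares on the known entries of that row, then keep W fixed and refit every column of H, and repeat.

```python
import random

def solve(A, b):
    """Solve the small system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        known = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - known) / M[r][r]
    return x

def ridge_fit(targets, features, k, reg):
    """Find v minimising sum_t (target_t - v . feature_t)^2 + reg * |v|^2."""
    A = [[reg if a == b else 0.0 for b in range(k)] for a in range(k)]
    rhs = [0.0] * k
    for y, f in zip(targets, features):
        for a in range(k):
            rhs[a] += y * f[a]
            for b in range(k):
                A[a][b] += f[a] * f[b]
    return solve(A, rhs)

def masked_error(X, W, H):
    """J(W, H): squared error summed over the known entries of X only."""
    k = len(H)
    return sum((x - sum(W[i][a] * H[a][j] for a in range(k))) ** 2
               for i, row in enumerate(X) for j, x in enumerate(row) if x is not None)

def als(X, k, reg=1e-3, iterations=20, seed=0):
    """Alternate between refitting the rows of W and the columns of H."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    W = [[rng.random() for _ in range(k)] for _ in range(m)]
    H = [[rng.random() for _ in range(n)] for _ in range(k)]
    for _ in range(iterations):
        for i in range(m):  # H fixed: refit row i of W on its known entries
            obs = [j for j in range(n) if X[i][j] is not None]
            if obs:
                W[i] = ridge_fit([X[i][j] for j in obs],
                                 [[H[a][j] for a in range(k)] for j in obs], k, reg)
        for j in range(n):  # W fixed: refit column j of H on its known entries
            obs = [i for i in range(m) if X[i][j] is not None]
            if obs:
                col = ridge_fit([X[i][j] for i in obs], [W[i] for i in obs], k, reg)
                for a in range(k):
                    H[a][j] = col[a]
    return W, H

# Toy rank-1 data (outer product of [1, 2, 3] and [1, 2, 4]) with three entries hidden.
X = [[1.0, 2.0, None],
     [2.0, None, 8.0],
     [None, 6.0, 12.0]]
W, H = als(X, k=1)
X_hat = [[sum(W[i][a] * H[a][j] for a in range(1)) for j in range(3)] for i in range(3)]
print(round(masked_error(X, W, H), 6))
print([[round(v, 2) for v in row] for row in X_hat])
```

The hidden entries of the toy matrix are 4, 4 and 3, so the corresponding entries of X_hat should come out close to these values. For real CTR data, which consists of very small numbers, the regularisation strength would have to be chosen on a matching scale.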

The deployment

We delivered a software solution that runs a weekly batch job in the cloud. It first fetches the impression and click statistics from the existing ad server and then computes a table of favorable ad and ad space combinations for the next week. For this, it uses the mathematical optimization we outlined above.
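
How such a table could be derived from a completed CTR matrix is sketched below. The function name, the identifiers, the predicted values and the choice of keeping the top two ads per ad space are assumptions for illustration, not details of the delivered solution.

```python
def top_ads_per_space(predicted_ctr, ad_spaces, ads, top_n=2):
    """For each ad space (row), return the top_n ads ranked by predicted CTR."""
    table = {}
    for space, row in zip(ad_spaces, predicted_ctr):
        ranked = sorted(zip(ads, row), key=lambda pair: pair[1], reverse=True)
        table[space] = [ad for ad, _ in ranked[:top_n]]
    return table

# Hypothetical completed CTR matrix: rows are ad spaces, columns are ads.
ad_spaces = ["space_A", "space_B"]
ads = ["ad_1", "ad_2", "ad_3"]
predicted_ctr = [
    [0.0030, 0.0010, 0.0022],
    [0.0041, 0.0050, 0.0008],
]

table = top_ads_per_space(predicted_ctr, ad_spaces, ads)
for space, best_ads in table.items():
    print(space, best_ads)
```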

The final solution is deployable to Amazon Web Services (AWS) as a JAR. In this concrete case we use a single EC2 instance to run the weekly batch job.

A note on the software development part: we decided to use Apache Spark and Scala to code everything. Less so because we had to deal with big data (we are looking at a few GBs) but rather because it allows us to write ETL pipelines and machine learning components using a single ecosystem or API. (Of course, this is also possible with other solutions.)

The outcome

In a live test we observed a performance improvement of 15 – 20 % compared to the existing system based on expert rules. This is quite good and shows the potential of using a machine-learning-based approach in this case.

Summary

In this blog post we explained how matrix factorization can be used to predict missing values from a data matrix and saw how to apply this technique to the problem of ad targeting.

Blog author

Daniel Pape
