Better time series forecasting using expert knowledge

15.2.2019 | 6 minutes of reading time

Methods for time series forecasting have become more and more powerful in recent decades, ranging from simple linear models to complex machine learning algorithms. However, the quality of the forecasts is not the only thing that matters: their acceptance by the staff is just as important. Automatic forecasts in particular can meet with distrust and incomprehension among long-serving dispatchers. Furthermore, senior employees often have a very good overview of customer behavior, the market situation and its development, economic conditions and many other important factors. It therefore makes sense to include this expert knowledge in the predictions of machine learning algorithms.

This blog post shows a way to include expert knowledge in the predictions of arbitrary algorithms (Python source code: Maximum Entropy Example).

Basic forecast using Facebook Prophet

We start with the famous air passengers time series, which shows the monthly totals of international airline passengers from 1949 to 1960 in thousands. From this series, we would like to predict the year 1960:

In order to do this, we use Facebook Prophet with multiplicative seasonality:

The red circles mark the forecasts for May and July 1960, which are visibly off. Fortunately, Facebook Prophet does not only provide us with the point forecasts, but also with associated Markov-Chain-Monte-Carlo samples $y^{\ast}_{i}$ from the posterior predictive distribution of each forecast step. Let’s take a look at the kernel density estimate of the posterior predictive distribution $p_{0}\left(y\right)$ of the forecast for May 1960:

Calculating the integral $\int_{-\infty}^{\infty} y \ p_{0}\left(y\right) dy \approx \frac{1}{n} \sum y^{\ast}_{i}$ yields the point forecast for May, which is $\hat{y}_{\text{May}}= 440 $. In order to improve the forecast, it would be useful if we were able to enrich the posterior predictive distribution by expert views about future events.
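The Monte Carlo approximation above amounts to taking the mean of the posterior predictive samples. The following is a minimal sketch of that step, assuming synthetic Gaussian draws as a stand-in for the samples Prophet would supply (for example via its `predictive_samples` method); the location 440 and the spread 20 are illustrative assumptions, not values taken from the fitted model.

```python
import random
import statistics

# Hypothetical stand-in for the posterior predictive samples y*_i of one
# forecast step; the location 440 and spread 20 are illustrative only.
rng = random.Random(0)
samples = [rng.gauss(440.0, 20.0) for _ in range(5000)]

# Monte Carlo estimate of the integral of y * p0(y): the sample mean.
point_forecast = statistics.fmean(samples)
print(f"point forecast: {point_forecast:.1f}")
```

With real Prophet output, `samples` would simply be replaced by the draws for the forecast step of interest.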

Mathematical background

The starting point is the Kullback-Leibler divergence:
$$\text{KL}\left[p,p_{0}\right] = \int_{-\infty}^{\infty} p\left(y\right)\text{log}\frac{p\left(y\right)}{p_{0}\left(y\right)} dy.$$
Given the prior $p_{0}\left(y\right)$, we seek the distribution $p\left(y\right)$ that minimizes the functional $\text{KL}$ subject to certain constraints. In other words, we are looking for the distribution $p\left(y\right)$ that has some predefined properties and comes as close as possible to our prior knowledge $p_{0}\left(y\right)$. The distribution $p\left(y\right)$ is then called the Maximum Entropy distribution. What could these constraints look like? What could the expert say?
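To get a feeling for the Kullback-Leibler divergence, it can be approximated on a grid. The sketch below assumes two Gaussian densities as hypothetical examples of a prior $p_{0}$ and a candidate $p$ (the parameters are illustrative) and uses a plain Riemann sum instead of the exact integral.

```python
import math

def normal_pdf(y, mu, sigma):
    """Density of a normal distribution, used here only as a toy example."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def kl_divergence(p, p0, grid_step):
    """Riemann-sum approximation of KL[p, p0] on a shared, equally spaced grid."""
    total = 0.0
    for p_i, p0_i in zip(p, p0):
        if p_i > 0.0:  # terms with p(y) = 0 contribute nothing
            total += p_i * math.log(p_i / p0_i) * grid_step
    return total

step = 0.5
grid = [300.0 + step * i for i in range(601)]             # 300 to 600 (thousands)
prior = [normal_pdf(y, 440.0, 20.0) for y in grid]        # hypothetical p0
candidate = [normal_pdf(y, 460.0, 20.0) for y in grid]    # hypothetical p

kl_same = kl_divergence(prior, prior, step)
kl_shifted = kl_divergence(candidate, prior, step)
print(f"KL[p0, p0] = {kl_same:.4f}, KL[p, p0] = {kl_shifted:.4f}")
```

For two normal distributions with equal variance the divergence is $(\mu-\mu_{0})^{2}/(2\sigma^{2})$, which is 0.5 for these illustrative parameters, so the grid result can be checked by hand.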

“The probability of 400,000 or fewer passengers for next July in my view is 5%.”
$\Leftrightarrow$
$\int_{-\infty}^{400} p\left(y\right)dy \overset{!}{=} 0.05$

“We have a strongly growing economy, so I think with 80% probability we will have between 440,000 and 480,000 passengers.”
$\Leftrightarrow$
$\int_{440}^{480} p\left(y\right)dy \overset{!}{=} 0.8$

“I expect 460,000 passengers.”
$\Leftrightarrow$
$\int_{-\infty}^{\infty} y \ p\left(y\right)dy \overset{!}{=} 460$

Therefore, our constraints $k=1,2,\dots,m$ are of the form
$$\int_{-\infty}^{\infty} F_{k}\left(y\right) \ p\left(y\right)dy \overset{!}{=} f_{k}.$$

What does $F_{k}\left(y\right)$ mean? This is best understood by inspecting the second and third example constraints. For the second example, it is
$$F\left(y\right) = \begin{cases}
1, & \text{if } y \in \left[440,480\right]\\
0, & \text{else}
\end{cases}$$
and for the third example, we simply have
$$F\left(y\right) = y.$$
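The three example statements can be written down as pairs of a function $F_{k}$ and a target value $f_{k}$. The sketch below does that and evaluates the expectation of each $F_{k}$ under a prior; the helper names `indicator` and `expectation` and the Gaussian prior samples are assumptions made for illustration, and each statement is treated on its own rather than as a joint set.

```python
import random

def indicator(lower, upper):
    """Return F(y) that is 1 inside [lower, upper] and 0 elsewhere."""
    return lambda y: 1.0 if lower <= y <= upper else 0.0

# The three example statements as (F_k, f_k) pairs; units are thousands.
constraints = [
    (indicator(float("-inf"), 400.0), 0.05),  # P(y <= 400) = 5%
    (indicator(440.0, 480.0), 0.80),          # P(440 <= y <= 480) = 80%
    (lambda y: y, 460.0),                     # E[y] = 460
]

def expectation(F, samples):
    """Monte Carlo estimate of the integral of F(y) * p(y) from samples of p."""
    return sum(F(y) for y in samples) / len(samples)

# Hypothetical prior samples; location and spread are illustrative only.
rng = random.Random(1)
samples = [rng.gauss(440.0, 20.0) for _ in range(5000)]

for F, target in constraints:
    print(f"prior E[F] = {expectation(F, samples):8.3f}   target f = {target}")
```

The gap between the prior expectation and the target in each row is what the Maximum Entropy distribution has to close.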

In order to minimize $\text{KL}$ under constraints, the Lagrange multipliers $\boldsymbol{\lambda} = \left(\lambda_{1}, \lambda_{2},\dots,\lambda_{m}\right)$ have to be introduced. We arrive at the functional:

$$ L\left[p, \boldsymbol{\lambda}\right] = \int_{-\infty}^{\infty} p\left(y\right)\text{log}\frac{p\left(y\right)}{p_{0}\left(y\right)}dy-\lambda_{1}\left(\int_{-\infty}^{\infty} F_{1}\left(y\right) \ p\left(y\right)dy - f_{1}\right)-\dots-\lambda_{m}\left(\int_{-\infty}^{\infty} F_{m}\left(y\right) \ p\left(y\right)dy - f_{m}\right).$$

The first step is to calculate the derivatives of $L$ with respect to $p$ and $\boldsymbol{\lambda}$ and to set them to zero. Beginning with the functional derivative with respect to $p$, we get

$$\frac{\delta L}{\delta p} = \text{log}\frac{p\left(y\right)}{p_{0}\left(y\right)}+1 - \lambda_{1}F_{1}\left(y\right)-\dots-\lambda_{m}F_{m}\left(y\right)\overset{!}{=}0.$$
Solving for $p\left(y\right)$ gives $p\left(y\right) \propto p_{0}\left(y\right) e^{\lambda_{1}F_{1}\left(y\right)+\dots+\lambda_{m}F_{m}\left(y\right)}$, where the constant factor $e^{-1}$ is absorbed by the normalization. After normalizing the result, we arrive at the Boltzmann distribution
$$p_{B}\left(y\right) = \frac{1}{Z}p_{0}\left(y\right) e^{\lambda_{1}F_{1}\left(y\right)+\dots+\lambda_{m}F_{m}\left(y\right)}$$
with the normalizing constant
$$Z\left(\boldsymbol{\lambda}\right)=\int_{-\infty}^{\infty} p_{0}\left(y\right) e^{\lambda_{1}F_{1}\left(y\right)+\dots+\lambda_{m}F_{m}\left(y\right)}dy.$$
The partial derivatives of $L$ with respect to $\boldsymbol{\lambda}$ read
$$\frac{\partial L}{\partial \lambda_{k}} = -\left(\int_{-\infty}^{\infty} F_{k}\left(y\right) \ p\left(y\right)dy-f_{k}\right)\overset{!}{=} 0,\ k=1,\dots,m.$$
As we have already calculated the normalized solution for $p\left(y\right)$, which is $p_{B}\left(y\right)$, we can insert this result into the derivatives:
$$\frac{\partial L}{\partial \lambda_{k}} = -\left(\int_{-\infty}^{\infty} F_{k}\left(y\right) \ \underbrace{\frac{1}{Z}p_{0}\left(y\right) e^{\lambda_{1}F_{1}\left(y\right)+\dots+\lambda_{m}F_{m}\left(y\right)}}_{p_{B}\left(y\right)}dy-f_{k}\right)\overset{!}{=} 0,\ k=1,\dots,m.$$
This means nothing more than $E_{p_{B}}\left[F_{k}\right]\overset{!}{=}f_{k},\ k=1,\dots,m$.

We are finally there: we have to find $\boldsymbol{\lambda}$, so that the expected values of the functions $F_{k}$ match the given constraints.
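When the prior is represented by samples, the Boltzmann distribution becomes a set of weights on those samples. For a single indicator constraint the multiplier can even be found by hand: if the prior probability of the interval is $q$ and the target is $f$, then $q e^{\lambda} / \left(q e^{\lambda} + 1 - q\right) = f$ yields $\lambda = \text{log}\frac{f\left(1-q\right)}{q\left(1-f\right)}$. The following sketch checks this on hypothetical Gaussian prior samples (all numbers are illustrative assumptions).

```python
import math
import random

rng = random.Random(2)
samples = [rng.gauss(440.0, 20.0) for _ in range(5000)]  # hypothetical prior draws

def F(y):
    """Indicator of the interval [440, 480] (units: thousands)."""
    return 1.0 if 440.0 <= y <= 480.0 else 0.0

f_target = 0.80
q = sum(F(y) for y in samples) / len(samples)  # prior probability of the interval

# Closed-form multiplier for a single indicator constraint:
# q*e^lam / (q*e^lam + 1 - q) = f  =>  lam = log(f*(1-q) / (q*(1-f)))
lam = math.log(f_target * (1.0 - q) / (q * (1.0 - f_target)))

# Boltzmann weights: each prior sample is reweighted by exp(lam * F(y)),
# then the weights are normalized so that they sum to one.
raw = [math.exp(lam * F(y)) for y in samples]
total = sum(raw)
weights = [w / total for w in raw]

expected_F = sum(w * F(y) for w, y in zip(weights, samples))
print(f"lambda = {lam:.3f}, E[F] under reweighted samples = {expected_F:.3f}")
```

The reweighted samples reproduce the target probability exactly, while the relative weights of samples within the interval (and within its complement) stay as they were under the prior.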

As the number of constraints rises, the numerical solution to the system of equations becomes increasingly hard to find. Due to the problem of multiple local minima, we refrain from using a gradient-based algorithm and instead use a heuristic one. In our case, it is the particle swarm algorithm (Python package pyswarm).
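For a single constraint, a much simpler solver than particle swarm is enough, because $E\left[F\right]$ under the Boltzmann weights grows monotonically with $\lambda$ (its derivative is the variance of $F$). The sketch below is therefore not the particle swarm approach of this post but a plain bisection, applied to the mean constraint $E\left[y\right] = 460$ on hypothetical Gaussian prior samples; the search bracket $[-1, 1]$ is an assumption that suits these illustrative numbers.

```python
import math
import random

rng = random.Random(3)
samples = [rng.gauss(440.0, 20.0) for _ in range(5000)]  # hypothetical prior draws
target_mean = 460.0

def tilted_mean(lam, ys):
    """E[y] under weights proportional to exp(lam * y)."""
    shift = max(lam * y for y in ys)              # avoids overflow in exp
    w = [math.exp(lam * y - shift) for y in ys]
    return sum(w_i * y for w_i, y in zip(w, ys)) / sum(w)

# E[y] increases monotonically with lam, so bisection finds the root.
lo, hi = -1.0, 1.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if tilted_mean(mid, samples) < target_mean:
        lo = mid
    else:
        hi = mid
lam = 0.5 * (lo + hi)

print(f"lambda = {lam:.4f}, reweighted mean = {tilted_mean(lam, samples):.2f}")
```

With several constraints at once there is one multiplier per constraint and a one-dimensional bracket no longer suffices, which is where a multivariate optimizer such as the particle swarm algorithm comes in.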

Improving the forecasts for May and July

In this section we will make up expert assessments for May and July 1960 and show how the forecasts are affected.

The expert assessment for May:

“This May we had 420,000 passengers and we will definitely not have fewer in May 1960 (probability 1%). Furthermore, given the numbers of the last three years, I am sure that a growth rate compared to this May between 7.5% and 15% is extremely probable (probability 80%). However, an increase of 15% or more compared to this May, in my opinion, is unrealistic (probability 1%).”

This results in the following constraints:

$\int_{-\infty}^{420} p_{B}\left(y\right)dy \overset{!}{=} 0.01$

$\int_{451}^{483} p_{B}\left(y\right)dy \overset{!}{=} 0.8$

$\int_{483}^{\infty} p_{B}\left(y\right)dy \overset{!}{=} 0.01$
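Because these three constraints are indicators of non-overlapping regions, the Boltzmann solution reduces to rescaling the prior mass inside each region to its target probability, with the remaining 18% going to the region between 420 and 451. The sketch below uses this shortcut instead of a numerical optimizer; the Gaussian prior samples and the region names are assumptions made for illustration, so the resulting numbers are not those of the post.

```python
import math
import random
from collections import Counter

rng = random.Random(4)
samples = [rng.gauss(440.0, 20.0) for _ in range(5000)]  # hypothetical prior draws

def region(y):
    """Assign y to one of the four regions implied by the May constraints."""
    if y <= 420.0:
        return "below"
    if y < 451.0:
        return "rest"
    if y <= 483.0:
        return "growth"
    return "above"

targets = {"below": 0.01, "growth": 0.80, "above": 0.01}
targets["rest"] = 1.0 - sum(targets.values())   # remaining 18% of the mass

counts = Counter(region(y) for y in samples)
prior_prob = {name: counts[name] / len(samples) for name in targets}

# Each sample carries an equal share of its region's target probability.
weights = [targets[region(y)] / counts[region(y)] for y in samples]

# Implied multipliers, relative to the unconstrained "rest" region.
base = prior_prob["rest"] / targets["rest"]
lambdas = {k: math.log(targets[k] / prior_prob[k] * base)
           for k in ("below", "growth", "above")}

point_forecast = sum(w * y for w, y in zip(weights, samples))
growth_prob = sum(w for w, y in zip(weights, samples) if region(y) == "growth")
print(f"P(growth region) = {growth_prob:.2f}, point forecast = {point_forecast:.1f}")
```

The new point forecast is the weighted mean of the same samples, so the expert view shifts the forecast without refitting the model.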

The expert assessment for July:

“This July we had 448,000 passengers. Comparing the Julys of the past five years, we can see that we had an average increase of 50,000 passengers per year. Due to the good economic situation, I am sure that we will at least regain this growth (probability 80%).” This yields the constraint: $\int_{498}^{\infty} p_{B}\left(y\right)dy \overset{!}{=} 0.8$.

The following two figures show the distributions of the Facebook Prophet forecasts and the associated Maximum Entropy distributions. As can be seen, the expert’s assessments lead to distributions that differ significantly from the prior distributions. Nevertheless, the Maximum Entropy distributions have the smallest possible distance to the priors, while maintaining the given constraints.

In the last figure, the forecasts that result from the Maximum Entropy distributions are shown together with the Facebook Prophet forecasts. The RMSE of the Facebook Prophet forecasts is 64.90. Using the Maximum Entropy approach leads to an RMSE of 30.94, which corresponds to a reduction of approximately 52%.
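The quoted reduction follows directly from the two RMSE values. A short sketch of the arithmetic, with a generic `rmse` helper added for completeness (the helper is an illustration, not the code used for the post):

```python
import math

def rmse(actual, predicted):
    """Root mean squared error of two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

rmse_prophet = 64.90   # reported in the post
rmse_maxent = 30.94    # reported in the post

reduction = 1.0 - rmse_maxent / rmse_prophet
print(f"relative RMSE reduction: {reduction:.1%}")
```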

This artificial example is intended to show that the inclusion of expert assessments, which in many cases may reflect only gut instincts or common sense, can be useful to improve the forecasts of complex machine learning algorithms. In addition, the inclusion of employee opinions may also increase the general acceptance of forecasts.

References:

Kullback, S., Information Theory and Statistics, John Wiley & Sons, 1959.
Singer, H., Maximum entropy inference for mixed continuous‐discrete variables, International Journal of Intelligent Systems, John Wiley & Sons, 2010.

Blog author

Dominik Ballreich
