# The Machinery behind Machine Learning – Part 1

03/19/15

Machine Learning is one of the hottest topics on the web. But how do machines learn? And do they learn at all? The answer in most cases is: Machines do not learn, they optimize.

This is the first blog post in a small series about the machinery that drives most of the commonly used Machine Learning algorithms. The overall idea is to illustrate how the performance of ML algorithms depends on the underlying optimization algorithms. In the end I want to show that the use of appropriate minimization algorithms can have a significant positive influence on speed and even accuracy – I intend to demonstrate this explicitly for a Collaborative Filtering Recommender System.

Most data scientists are somewhat familiar with basic Machine/Statistical Learning algorithms like Linear and Logistic Regression, and many know or use more sophisticated algorithms like Support Vector Machines, Neural Networks, Collaborative Filtering Recommender Systems, maybe even Restricted Boltzmann Machines, Deep Learning tools and so on. But most of these algorithms – let’s restrict ourselves to Supervised Learning here – share some general properties and rely on the same more or less hidden machinery: the toolbox of Optimization Theory. In this first posting I want to introduce the general ideas of this theory and prepare the ground for applying them to real problems. Self-contained example code in R will be available at github, see Optimization Routines. The series starts with the ingredients of a rather general iterative minimization algorithm, then continues with the well-known Gradient Descent method, proceeds with the Nonlinear Conjugate Gradient and finally discusses Quasi-Newton methods, in particular BFGS. The reason for this selection is that all of these methods use only function values and gradients, so they compete on equal ground – a globalized Newton method would additionally require the second derivative, i.e. the Hessian matrix.

## Unconstrained optimization

This part (un?)fortunately involves a bit of mathematical notation, but I will keep it as simple as possible. Given is a function that maps points from the \(n\)-dimensional Euclidean space to the set of real numbers:

\[f: \mathbb{R}^n \to \mathbb{R}\]

Let’s start with a \(2\)-dimensional example. Consider points \(p=(x,y)\) in the Euclidean plane \(\mathbb{R}^2\) and the function

\[f(p) = f(x,y) = \frac{1}{2}\left(x^2 + y^2\right)\]

that determines half of the squared length of the vector \(p\), e.g. for \(p = (3, -1)\) the corresponding function value is \(\displaystyle f(p)=f((3,-1)) = \frac{1}{2}(9+1) = 5\).
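For a quick sanity check, the example function is a one-liner in code. The companion code of this series will be in R; here is a minimal Python sketch:

```python
import numpy as np

def f(p):
    """Half of the squared Euclidean length of the vector p."""
    p = np.asarray(p, dtype=float)
    return 0.5 * np.dot(p, p)

print(f((3, -1)))  # 5.0
```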

## Looking for local minima

The goal of unconstrained optimization is to find *local minima* of such functions. The problem to be solved is minimizing the function \(f\) over the whole space, without any constraints like e.g. nonnegativity of some variables or integrality conditions, formally written as

\[\min_{p \in \mathbb{R}^n} f(p)\]

In optimization theory we can restrict ourselves to minima because any maximum of \(f\) is a minimum of \(-f\); so in fact this is no restriction but simplifies things a lot. We do not always aim to find the global minimum in \(\mathbb{R}^n\) but instead look for points that take smaller function values than all of their neighbors – such a point is called a *local minimum*.

Local minima often exist even if a global minimum does not, e.g. just look at the plot of the function

Finding the global minimum is sort of an ill-posed problem in its full generality: in a general setting we only have local information – function value and gradient at some points – that does not tell us anything about other points far away, thus we would ultimately have to check the whole space in order to figure out whether a local minimum is global or not. In some important special cases we can derive this from known properties of the function, e.g. if \(f\) is convex then any local minimum is a global minimum, and if there exists an isolated minimum of the convex function \(f\) then it is the unique global minimum, as is the case for our example function \(f\).

In order to apply the standard methods from the optimization toolbox we need some basic technical requirements; the problem and the setting have to be well-defined. Obviously there should be one or more local minima, and the function has to be continuously differentiable at least once. Since this is the case in almost all supervised learning applications, we can safely assume that this is given. The first derivative describes the *slope* of the function at the current point. Informally, all we have to ensure is that a well-defined notion of slope exists at all points – this implies that there are no sharp corners/cusps as e.g. in the absolute value function – and that neither the function nor the first derivative jumps if we slightly change the current point.

## A general minimization algorithm

Imagine now that you are located somewhere on a hill in a dry desert (ok, approximately all deserts are dry – that seems to be somewhat constitutive for a desert – but you know what I mean; at least it should not be a snow desert). You know that there is one valley or canyon, maybe even several, somewhere near, and you want to find a reasonable path that leads you to the bottom of such a valley. At these spots you might find water, as all of us have learned in our regular IT survival trainings, especially in north-facing canyons. In such a situation it would be very helpful not to have to search more or less randomly or driven by gut feeling but to follow some rules that guarantee success. You need a recipe for rescue, and this is what the meta-algorithm below provides. In order to increase your chances of finding water at the most probable place – a local minimum – most iterative algorithms consist of an initializing procedure that chooses the starting point, and three core iteration steps:

*Initialize:*

- Choose a **starting point** – an initial guess that can either be determined by your situation – your current location in the desert – or that can be actively chosen.

*Iterate until convergence:*

- Check a **termination criterion** – are we close enough to the minimum?
- Find a **descent direction** – a direction in which the function value decreases near the current point.
- Determine a **step size** – the length of a step in the given direction that leads to a good decrease.
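The meta-algorithm can be sketched as a small higher-order function in which the three iteration steps are pluggable strategies. This is an illustrative Python sketch, not the series’ R code; the names `minimize`, `direction` and `step_size` are my own:

```python
import numpy as np

def minimize(f, grad, p0, direction, step_size, eps=1e-6, max_iter=1000):
    """Prototype of the iterative minimization meta-algorithm.
    Concrete methods only swap the `direction` and `step_size` strategies."""
    p = np.asarray(p0, dtype=float)              # starting point
    for _ in range(max_iter):
        g = grad(p)
        if np.linalg.norm(g) < eps:              # 1. termination criterion
            break
        d = direction(p, g)                      # 2. descent direction
        t = step_size(f, grad, p, d)             # 3. step size
        p = p + t * d
    return p

# Instantiation sketch: steepest descent with a fixed step size on
# f(p) = 0.5 * ||p||^2, whose gradient is p itself.
p_min = minimize(f=lambda p: 0.5 * np.dot(p, p),
                 grad=lambda p: p,
                 p0=[3.0, -1.0],
                 direction=lambda p, g: -g,
                 step_size=lambda f, grad, p, d: 0.5)
print(np.round(p_min, 6))  # close to the minimum (0, 0)
```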

Concrete realisations of this prototype algorithm typically differ in how they implement the last two points of the iteration, the choice of the descent direction and the step size rule. Sometimes one even determines the step size before fixing the descent direction – e.g. in Trust Region methods – which seems counterintuitive at first glance. But most minimization methods form a simplified model of the true function around the current point, and the name Trust Region already indicates that the step size then tells us something about how far we can trust our local model. In any case, Trust Region methods are beyond our scope for now; let us look closer at the three iteration steps of the above meta-algorithm.

## Checking for optimality

This refers to the first part of the iteration – the termination criterion – and it’s a bit technical. As you might have already guessed from the first plot, a candidate \(p^*\) for a minimum always has a certain property: the first derivative, i.e. the gradient, has to be zero:

\[\nabla f(p^*) = 0\]

For our \(2\)-dimensional example the gradient is a vector with two components, one for each coordinate axis, and the above equation translates into the vector equation

\[\nabla f(x,y) = (x, y)^T = (0, 0)^T,\]

i.e. each component of the gradient has to be \(0\). The unique solution in this simple case is \(x = y = 0\), or \(p^* = (0, 0)\). The formal notation \((\dots)^T\) – T stands for transposed – means that despite the row notation the vector is a column vector, which is the standard convention but not really important here. Some textbooks even omit this detail.
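Continuing the Python sketches from above (the gradient of \(f(p) = \frac{1}{2}\|p\|^2\) is simply \(p\) itself), the criticality condition can be checked directly:

```python
import numpy as np

def grad_f(p):
    """Gradient of the example function f(p) = 0.5 * ||p||^2: simply p itself."""
    return np.asarray(p, dtype=float)

print(np.allclose(grad_f((0, 0)), 0))   # True: (0, 0) is the critical point
print(np.allclose(grad_f((3, -1)), 0))  # False: the gradient does not vanish here
```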

A point \(p^*\) with the property \(\nabla f(p^*) = 0\) is called a *critical point*. In general a critical point need not be a minimum, but *any minimum is a critical point*. To better understand why the above equation is important, let’s get an intuition of what happens to function and gradient near an isolated local minimum \(p^*\). When you approach \(p^*\) from a specific direction – we can choose e.g. the directions along the coordinate axes – the function values have to decrease until you reach the minimum \(p^*\) and then increase again. Obviously this has to be true regardless of the direction you are coming from, because \(p^*\) is an isolated minimum: in a small neighborhood all other points have strictly larger function values. The described behaviour directly implies that the gradient has to become zero exactly at the minimum – the slope is zero – and that the gradient changes sign in each component, due to the decrease in one direction and the increase in the opposite direction. To see that, think of a simple univariate quadratic function like

\[f(x) = \frac{1}{2}x^2.\]

The graph of this function is the standard parabola, the minimum is attained at \(x^* = 0\) and the gradient is given by

\[f'(x) = x,\]

thus taking negative values for all \(x<0\) and positive values for all \(x > 0\), both approaching zero when \(x\) tends to zero.

Now we can come up with a first rule for deciding when to terminate the algorithm. A standard termination criterion simply checks whether the gradient is “sufficiently close” to zero as the gradient has to vanish at a minimum:

**Termination criterion:** Terminate if \(\|\nabla f(p)\| < \varepsilon\) for a given tolerance \(\varepsilon > 0\).

There is no need to specify this any further here, you will see a possible implementation later on as well as some other complementary criteria.

## Finding a descent direction

A search direction \(s\) is a *descent direction* if the function value decreases in that direction, at least “at the beginning”. The search direction defines a ray, a unidirectional path, and if we make a tiny (infinitesimal) step along that path the function value has to decrease. Intuitively the term descent direction is rather self-explanatory, but how can we test whether a given direction is a descent direction or not? In practice we cannot make an infinitesimal step and see what happens. Fortunately there is a nice geometric intuition, and based on it we can derive a surprisingly simple test. Here are the contour lines of our example function:

Points on the same contour line (here: the circle segments in grey) share the same function value. The minimum is attained at the point \((x^*,y^*)=(0,0)\) and the function values increase with the radii of the circles. For one specific point the gradient \(\nabla f(x,y)\), the negative gradient \(-\nabla f(x,y)\) and a descent direction \(d\) are given in the plot. Furthermore, you can see the tangent to the contour line at that point, and one can immediately see that the gradient is perpendicular/orthogonal to this tangent (do not mix it up with the tangent to the function!). For the geometric interpretation the tangent to the contour line plays a crucial role. Obviously it splits the \((x,y)\)-plane into two halves. We now focus on the angle \(\alpha\) between the gradient and a possible search direction. The half-plane the gradient points into corresponds to angles \(\alpha < 90^{\circ}\) and \(\alpha > 270^{\circ}\). The other one, where the negative gradient points to, corresponds to angles between \(90^{\circ}\) and \(270^{\circ}\).

From the plot it is immediately plausible that any search direction \(s\) with an enclosing angle \(\alpha\) between \(90^{\circ}\) and \(270^{\circ}\), thus pointing into the negative gradient’s half-plane as e.g. the blue \(d\), is a descent direction: there is at least a tiny path through the interior of the contour circle where the function values decrease before they increase again later on.

So now we have figured out that the angle \(\alpha\) between the gradient \(\nabla f(x,y)\) and a search direction \(s\) reliably tells us whether \(s\) is a descent direction or not. Visually such a check is trivial in the given setting, but in higher dimensions and with less regular functions, i.e. more interesting shapes of the contour lines it is practically impossible.

## Descent direction – a simple test

Fortunately there is an almost trivial computation that does the trick here. Many of you will remember the geometric interpretation of the scalar or dot product of vectors. With \(\|a \|\) being the Euclidean norm or the length of the vector \(a\) it holds

\[a^T b = \|a\| \, \|b\| \, \cos(\alpha(a,b)),\]

where \(\alpha(a,b)\) is the angle enclosed by \(a\) and \(b\). We now focus on the sign of this expression. Since the Euclidean norm or length of any nontrivial vector is positive (zero if and only if the vector is \((0,\dots,0)\)) the interesting part is the cosine. We switch to radians instead of degrees because mathematically that is the natural domain of the trigonometric functions. An angle of \(90^{\circ}\) corresponds to \(\frac{\pi}{2}\), similarly, \(180^{\circ}\) corresponds to \({\pi}\) and \(270^{\circ}\) corresponds to \(\frac{3\pi}{2}\). The roots or zeros of the cosine function in the interval \([0^{\circ},360^{\circ}]=[0,2\pi]\) in radians are \(\frac{\pi}{2}\) and \(\frac{3\pi}{2}\). In the open interval \((\frac{\pi}{2},\frac{3\pi}{2})\) the cosine function is negative, thus the scalar product is negative. Conversely, for \(\alpha \in [0,\frac{\pi}{2})\cup (\frac{3\pi}{2},2\pi]\) the cosine is positive. We have found our simple test based on the sign of the above scalar product:

**Descent direction:** Find a search direction \(s\) that satisfies \(s^T\nabla f(p)<0\).
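The sign test translates into a one-liner. A Python sketch (the helper name `is_descent_direction` is illustrative):

```python
import numpy as np

def is_descent_direction(s, grad_p):
    """s is a descent direction at p iff s^T grad f(p) < 0."""
    return float(np.dot(s, grad_p)) < 0.0

# For f(p) = 0.5 * ||p||^2 the gradient at p = (3, -1) is (3, -1).
g = np.array([3.0, -1.0])
print(is_descent_direction(-g, g))                    # True: the negative gradient always descends
print(is_descent_direction(np.array([0.0, 1.0]), g))  # True: the dot product is -1 < 0
print(is_descent_direction(g, g))                     # False: the gradient itself points uphill
```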

## Determining the step size

There is a variety of methods available for this task. It is rather critical, e.g. in some settings very strict requirements are necessary to make an algorithm work, but we will not go into detail here. We assume that we have found a reasonable descent direction \(d\) and now we need to decide how long the step in that direction shall be. Since we are only searching along a single direction, this is referred to as *line search*. It is essentially a one-dimensional problem, and a first idea would be to directly find the next minimum in the given descent direction, an *exact line search*. In some cases this is even necessary to make the whole minimization algorithm work, or at least for proving good convergence results, but it requires solving a one-dimensional optimization problem in every iteration.

For algorithms like Gradient Descent we can rely on one of the most basic procedures, the so-called *Armijo rule*, an *inexact line search* method. Historically, it was the very first inexact stepsize rule. The original idea needs a slight technical modification called widening, but for simplicity we omit this here. Later we will see more advanced stepsize procedures anyway. The Armijo rule takes two parameters, \(0< \sigma < 1\) and \(0 < \rho < 0.5\). Now we test for \(\ell = 0,1,2,\dots\) the *Armijo condition*:

\[f(p + \sigma^{\ell} d) \le f(p) + \rho\,\sigma^{\ell}\,\nabla f(p)^T d\]

The first \(\ell\) that passes this test defines our stepsize \(\sigma^{\ell}\) for the current iteration. We formulate our first stepsize rule:

**Stepsize:** Choose \(\sigma^{\ell}\) as stepsize according to the Armijo rule.

We aim at generating some progress in the descent direction \(d\), and we already know that if we make a tiny step in this direction the function value has to decrease. But in practice we do not want to make tiny steps; we want to choose a step as large as possible (whatever that might mean). Therefore we always try a full step first, i.e. we test the stepsize \(\sigma^0 = 1\). Depending on the current point this might be too ambitious and we have to accept a smaller step size; sometimes it is not ambitious enough and we could use larger stepsizes. The latter can be realised by a technique called *widening* that leads to better theoretical and practical properties for Armijo’s rule (efficiency).

But what exactly is the Armijo condition going to tell us? Let’s answer this question graphically.

The Armijo condition, i.e. the above inequality, compares two quantities: the function value on the left side and the cryptic value on the right side. The left side is represented by the black curve, the true function \(f\), and the concrete value \(f(p+\sigma^{\ell}d)\) is the function value when we make a step in direction \(d\) with stepsize \(\sigma^{\ell}\). The right side is represented by the red line. The green line is a straight line with exactly the same slope as the function \(f\) at the current point \(p\). Red and green line are very similar when you look at the defining functions. The one important difference between them is that the slope of the red line is smaller in terms of absolute value: for both lines the slope is negative and the red line is less steep, which is ensured by the parameter \(0<\rho<0.5\). But we know from analysis that the function \(f\) that we are interested in locally decreases like the green line near the current point, and this implies that the function has to lie *below the red line* at least for all stepsizes \(t\) below some unknown threshold value \(t_s>0\). There has to be a whole interval \((0,t_s)\) of admissible stepsizes, and for some finite number \(\ell\) the corresponding stepsize \(\sigma^{\ell}\) is smaller than \(t_s\) and larger than zero by definition, thus contained in this admissible interval.

Therefore, after a finite number of steps the Armijo condition is fulfilled and we have computed an admissible stepsize that guarantees a reasonable decrease.
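The backtracking loop described above can be sketched in a few lines of Python; parameter defaults are my own choices within the stated ranges \(0<\sigma<1\), \(0<\rho<0.5\), and widening is omitted as in the text:

```python
import numpy as np

def armijo_step(f, grad_p, p, d, sigma=0.5, rho=1e-4, max_backtracks=50):
    """Try the step sizes sigma**l for l = 0, 1, 2, ... until the Armijo
    condition f(p + t*d) <= f(p) + rho * t * grad_f(p)^T d holds."""
    fp = f(p)
    slope = rho * np.dot(grad_p, d)   # negative whenever d is a descent direction
    for l in range(max_backtracks):
        t = sigma ** l
        if f(p + t * d) <= fp + t * slope:
            return t
    raise RuntimeError("no admissible step size found")

# Example with f(p) = 0.5 * ||p||^2 and the steepest-descent direction:
f = lambda p: 0.5 * np.dot(p, p)
p = np.array([3.0, -1.0])
d = -p                                # descent direction: d^T grad f(p) = -10 < 0
t = armijo_step(f, p, p, d)           # grad f(p) = p for this example function
print(t)  # 1.0 -- the full step is accepted here
```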

## Summary

We now have everything at hand for building our very first algorithm for unconstrained minimization of multivariate functions. We know when to stop, how to figure out whether the function decreases in a given direction or not and how to choose the length of the next step if the search direction is a descent direction.

There are several approaches how to find a good descent or search direction \(s\). We are going to start the next posting with the most obvious choice. Stay tuned…

Other articles in this series:

Machinery – Linear Regression

Machinery – Part 2