
Data Science for Fraud Detection

5.9.2017 | 10 minutes of reading time

What is fraud and why is it interesting for Data Science?

Fraud can be defined as “the crime of getting money by deceiving people” (Cambridge Dictionary). It is as old as humanity: whenever two parties exchange goods or conduct business, there is the potential for one party to scam the other.
With the ever-increasing use of the internet for shopping, banking, filing insurance claims and so on, these businesses have become targets of fraud on a whole new scale. Fraud has become a major problem in e-commerce, and considerable resources are being invested to detect and prevent it.

Traditional approaches to identifying fraud have been rule-based: hard and fast rules for flagging a transaction as fraudulent have to be established manually and in advance. Such a system isn’t flexible and inevitably results in an arms race between the seller’s fraud detection system and criminals finding ways to circumvent these rules.
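To illustrate why such rules are brittle, here is a minimal Python sketch of a hand-written rule. The field names `type` and `amount`, the transaction types and the 200,000 threshold are hypothetical choices for illustration, not rules from a real fraud detection system.

```python
# Hypothetical hand-written rule: types and threshold are illustrative only.
SUSPICIOUS_TYPES = {"TRANSFER", "CASH_OUT"}
AMOUNT_LIMIT = 200_000  # fixed manually and in advance


def rule_based_flag(transaction: dict) -> bool:
    """Flag a transaction if it is a large transfer or cash-out."""
    return (transaction["type"] in SUSPICIOUS_TYPES
            and transaction["amount"] > AMOUNT_LIMIT)


# One large transfer is caught ...
print(rule_based_flag({"type": "TRANSFER", "amount": 250_000}))
# ... but the same sum split into two smaller transfers slips through.
print([rule_based_flag({"type": "TRANSFER", "amount": 125_000}) for _ in range(2)])
```

Once fraudsters learn the threshold, they simply stay below it, and the rule has to be rewritten by hand; this is the arms race described above.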
The modern alternative is to leverage the vast amounts of data (Big Data) that can be collected from online transactions and to model them in a way that allows us to flag or predict fraud in future transactions. Data Science and Machine Learning techniques, such as Deep Neural Networks (DNNs), are a natural fit for this task.

Here, I am going to show an example of how Data Science techniques can be used to identify fraud in financial transactions, and offer some insights into the inner workings of fraud analysis that are aimed at non-experts.

Synthetic financial datasets for fraud detection

A synthetic financial dataset for fraud detection is openly accessible via Kaggle. It has been generated from a number of real datasets to resemble standard data from financial operations and contains 6,362,620 transactions over 30 days (see Kaggle for details and more information).

By plotting a few major features, we can already get a sense of the data. The two plots below, for example, show that fraudulent transactions tend to involve larger sums of money. When we also include the transaction type in the visualization, we find that fraud only occurs in transfers and cash-out transactions, so we can adapt our input data for machine learning accordingly.

Fraudulent transactions tend to involve larger sums of money. This plot shows the distribution of transferred amounts of money (log(amount + 1)) in fraudulent (Class = 1) and regular (Class = 0) transactions.

Fraud only occurs in transfers and cash-out transactions. This plot shows the distribution of transferred amounts of money (log(amount + 1)) in different transaction types for fraudulent (Class = 1) and regular (Class = 0) transactions.
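The two observations translate into a simple preprocessing step. The Python sketch below keeps only transfer and cash-out transactions and adds the log(amount + 1) transform used in the plots. It assumes the Kaggle column names `type`, `amount` and `isFraud` and the type codes `TRANSFER` and `CASH_OUT`; check them against the downloaded file. The rows are made-up toy data.

```python
import math

# Assumed Kaggle column names ("type", "amount", "isFraud") and type codes.
FRAUD_PRONE_TYPES = {"TRANSFER", "CASH_OUT"}


def preprocess(rows):
    """Keep only transfer and cash-out transactions and add log(amount + 1)."""
    out = []
    for row in rows:
        if row["type"] not in FRAUD_PRONE_TYPES:
            continue  # no fraud occurs in the other types in this dataset
        # log1p(x) = log(x + 1) stays defined for zero amounts, unlike log(x).
        out.append({**row, "log_amount": math.log1p(row["amount"])})
    return out


# Made-up toy rows for illustration.
sample = [
    {"type": "PAYMENT", "amount": 1200.5, "isFraud": 0},
    {"type": "TRANSFER", "amount": 250000.0, "isFraud": 1},
    {"type": "CASH_OUT", "amount": 0.0, "isFraud": 0},
]
for row in preprocess(sample):
    print(row["type"], round(row["log_amount"], 3))
```

The log transform compresses the long tail of very large amounts, which makes the fraud and non-fraud distributions easier to compare visually.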

Dimensionality reduction

In preparation for machine learning analysis, dimensionality reduction techniques are powerful tools for identifying hidden patterns in high-dimensional datasets. We can also use them to reduce the number of features for machine learning while preserving the most important patterns in the data. Clustering algorithms, such as k-means, can be used in a similar way.

The most common dimensionality reduction technique is Principal Component Analysis (PCA). PCA is good at picking up linear relationships between features in the data. The first dimension, also called the first principal component (PC), captures the largest share of the variation in our data, the second PC the second-largest share, and so on. When we plot the first two PCs against each other in a scatterplot, we see patterns in our data: the more dissimilar two samples in our dataset are, the farther apart they will be in a PCA plot.

PCA cannot capture more complex, non-linear patterns, though. For these, we can use t-Distributed Stochastic Neighbor Embedding (t-SNE). In contrast to PCA, t-SNE not only shows sample dissimilarity but also accounts for similarity by placing similar samples close together in the plot. This might not sound like a major difference, but the plots below show that it is much easier to identify clusters of fraudulent transactions with t-SNE than with PCA. Both PCA and t-SNE can be used in combination with machine learning.
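A minimal PCA sketch in Python with NumPy, on made-up toy data (the article's own analysis was done in R and is not shown here): centre the features, compute the singular value decomposition, and project onto the first two principal directions. The explained-variance ratios confirm that the first PC carries the largest share of the variation.

```python
import numpy as np


def pca_2d(X):
    """Project the rows of X onto their first two principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # PCA operates on mean-centred data
    # Rows of Vt are the principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)  # share of total variance per component
    return Xc @ Vt[:2].T, explained


rng = np.random.default_rng(0)
# Toy data: 200 samples, 5 features, most variance along one direction.
latent = rng.normal(size=(200, 1))
X = latent @ np.array([[3.0, 2.0, 0.0, 0.0, 1.0]]) + 0.1 * rng.normal(size=(200, 5))
coords, explained = pca_2d(X)
print(coords.shape)
print(np.round(explained[:2], 3))
```

The signs of the PCs are arbitrary, so a PCA plot may appear mirrored between runs or tools. For the t-SNE panel, a library implementation (for example scikit-learn's `sklearn.manifold.TSNE`) is the practical choice.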

Here, I want to use dimensionality reduction and visualization to perform a sanity check on the labelled training data. Because some fraud cases have probably not been identified as such (and are therefore mislabelled), it is worth taking a closer look at non-fraud samples that cluster with fraud cases.

Dimensionality reduction techniques in fraud analytics. The plots show the first two dimensions of PCA (left) and t-SNE (right) for fraudulent (Class = 1) and regular (Class = 0) transactions.
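The sanity check described above can be sketched in Python, assuming we already have 2-D coordinates from PCA or t-SNE: for every regular sample, look at its k nearest neighbours in the embedding and flag it for review if most of them are fraud cases. The helper `suspicious_regulars`, k = 3 and the two-neighbour cut-off are hypothetical choices, and the points are toy data.

```python
import math


def suspicious_regulars(coords, labels, k=3, min_fraud_neighbours=2):
    """Hypothetical helper: indices of regular samples (label 0) that have at
    least `min_fraud_neighbours` fraud samples (label 1) among their k nearest
    neighbours in a 2-D embedding (e.g. PCA or t-SNE coordinates)."""
    flagged = []
    for i, (p, y) in enumerate(zip(coords, labels)):
        if y != 0:
            continue
        # Distances from sample i to all other samples, nearest first.
        neighbours = sorted(
            (math.dist(p, q), labels[j]) for j, q in enumerate(coords) if j != i
        )
        if sum(label for _, label in neighbours[:k]) >= min_fraud_neighbours:
            flagged.append(i)
    return flagged


# Toy embedding: a fraud cluster near (5, 5) and a regular cluster near (0, 0).
coords = [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.05, 5.05),
          (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
labels = [1, 1, 1, 0, 0, 0, 0, 0]
print(suspicious_regulars(coords, labels))
```

Flagged samples are candidates for manual review, not proof of mislabelling. The loop compares every pair of points, which is fine for a sample but far too slow for millions of transactions; there, a spatial index would be needed.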

Which Machine Learning algorithms are suitable for fraud analysis?

Machine learning is a broad field. It encompasses a large collection of algorithms and techniques used for classification, regression, clustering and anomaly detection. Two main classes of algorithms can be distinguished: supervised and unsupervised learning.

Supervised learning is used to predict either the values of a response variable (regression tasks) or the labels of a set of pre-defined categories (classification tasks). Supervised learning algorithms learn to predict the response for new samples from the data of samples whose response variables/labels are known.

In our fraud detection example, we are technically dealing with a classification task: For each sample (i.e. transaction), the pre-defined label tells us whether it is fraudulent (1) or not (0). However, there are two main problems when using supervised learning algorithms for fraud detection:

Data labelling: In many cases, fraud is difficult to identify. Some cases are glaringly obvious; these are easy to recognize with rule-based techniques and usually don’t require complex models. The interesting cases are the subtle ones, which are hard to recognize because we don’t usually know what to look for. This is where machine learning comes into play! But because fraud is hard to detect, many of these subtle cases have probably not been labelled correctly in training data from past transactions. This means that the pre-defined labels will be wrong for some of the transactions, and supervised machine learning algorithms trained on them won’t be able to learn to find these types of fraud in future transactions.
Unbalanced data: An important characteristic of fraud data is that it is highly unbalanced, meaning that one class is much more frequent than the other. In our example, less than 1% of all transactions are fraudulent (see figure “Synthetic financial dataset for fraud detection”). Most supervised classification algorithms are sensitive to imbalance between the response classes, so special techniques have to be used to account for it.

Synthetic financial dataset for fraud detection. Fraud cases are rare compared to regular transactions; in the simulated example dataset less than 1% of all transactions are fraudulent.
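One common counter-measure is to rebalance the training data. H2O's `balance_classes = TRUE` option, used for the classifier further below, resamples the classes; the Python sketch here shows plain random oversampling of the minority class as a simple stand-in, not H2O's exact procedure. The `isFraud` column name is an assumption, and the rows are toy data.

```python
import random


def oversample_minority(rows, label_key="isFraud", seed=42):
    """Randomly duplicate minority-class rows until both classes have equal counts.
    `label_key` is an assumed column name."""
    rng = random.Random(seed)
    positives = [r for r in rows if r[label_key] == 1]
    negatives = [r for r in rows if r[label_key] == 0]
    minority, majority = sorted((positives, negatives), key=len)
    if not minority:
        raise ValueError("the minority class has no samples to resample")
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    return balanced


# Toy data: 3 fraud cases among 1,000 transactions (0.3%).
rows = [{"isFraud": 1}] * 3 + [{"isFraud": 0}] * 997
balanced = oversample_minority(rows)
print(sum(r["isFraud"] for r in rows) / len(rows))
print(sum(r["isFraud"] for r in balanced) / len(balanced))
```

Only the training data should be rebalanced; the test set has to keep the real class ratio, otherwise the evaluation no longer reflects the situation the model will face.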

Unsupervised learning doesn’t require pre-defined labels or response variables; it is used to identify clusters or outliers/anomalies in data sets.

In our fraud example dataset, we don’t trust the class labels to be 100% correct. But we can assume that fraudulent transactions will be sufficiently different from the vast majority of regular transactions for unsupervised learning algorithms to flag them as anomalies or outliers.

Anomaly detection with deep learning autoencoders

Neural networks can be applied to both supervised and unsupervised learning tasks. Autoencoder neural networks are used for anomaly detection in unsupervised learning: they apply backpropagation to learn an approximation to the identity function, i.e. their output values should equal their input values. They do so by minimizing the reconstruction error, or loss. Because such networks typically contain a narrow bottleneck layer, they cannot simply copy their input and have to learn a compressed representation of the dominant patterns. As the reconstruction error is minimized on the background signal of regular samples, anomalous samples will tend to have a larger reconstruction error.
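To make the reconstruction-error idea concrete, here is a deliberately tiny autoencoder in Python/NumPy. It is not the H2O model used in the article: it has a single tanh bottleneck layer, is trained by plain full-batch gradient descent, and runs on made-up data in which "regular" rows lie close to a line. Architecture, learning rate and data are assumptions for illustration.

```python
import numpy as np


def train_autoencoder(X, n_hidden=2, lr=0.1, epochs=3000, seed=0):
    """Tiny autoencoder: tanh bottleneck layer + linear output layer, trained with
    full-batch gradient descent on the mean squared reconstruction error."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W1 = rng.normal(scale=0.1, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)          # encoder: compress to n_hidden units
        E = (H @ W2 + b2) - X             # reconstruction minus input
        G = 2 * E / E.size                # gradient of mean(E**2) w.r.t. the output
        dZ = (G @ W2.T) * (1 - H**2)      # backpropagate through tanh
        W2 -= lr * (H.T @ G)
        b2 -= lr * G.sum(axis=0)
        W1 -= lr * (X.T @ dZ)
        b1 -= lr * dZ.sum(axis=0)
    return W1, b1, W2, b2


def reconstruction_mse(X, params):
    """Per-row mean squared reconstruction error."""
    W1, b1, W2, b2 = params
    X_hat = np.tanh(X @ W1 + b1) @ W2 + b2
    return np.mean((X_hat - X) ** 2, axis=1)


# Made-up toy data: "regular" rows lie close to a line in 4-D space.
rng = np.random.default_rng(1)
direction = np.array([1.0, 1.0, 0.0, 0.0]) / np.sqrt(2)
regular = rng.normal(size=(500, 1)) * direction + 0.05 * rng.normal(size=(500, 4))
anomalies = np.array([[0.0, 0.0, 3.0, 0.0], [0.0, 0.0, 0.0, -3.0]])

params = train_autoencoder(regular)  # trained on regular samples only
print(reconstruction_mse(regular, params).mean())
print(reconstruction_mse(anomalies, params))
```

The anomalies lie off the pattern the network has learned, so it cannot reconstruct them well and their MSE is much larger than that of the regular rows.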

For modeling, I am using the open-source machine learning software H2O via the “h2o” R package. On the fraud example dataset described above, an unsupervised neural network was trained using deep learning autoencoders (Gaussian distribution, quadratic loss, 209 weights/biases, 42,091,943 training samples, mini-batch size 1, 3 hidden layers with [10, 2, 10] nodes). The training set contains only non-fraud samples, so that the autoencoder model learns the “normal” pattern in the data; the test data contains a mix of non-fraud and fraud samples. We need to keep in mind, though, that autoencoder models are sensitive to outliers in the training data, which can distort the patterns the model learns as typical.

The trained autoencoder model can now identify anomalies or outlier instances based on the reconstruction mean squared error (MSE): transactions with a high MSE are outliers compared to the global pattern of our data. The figure below shows that the majority of test cases that had been labelled as fraudulent indeed have a higher MSE. We can also see that a few regular cases have a slightly higher MSE; these might be cases of novel fraud mechanisms that have been missed in previous analyses.

This plot shows reconstruction MSE (y-axis) for every transaction (instance) in the test data set (x-axis); points are colored according to their pre-defined label (fraud = 1, regular = 0).
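The article does not state how high an MSE has to be for a transaction to count as an outlier. One common and simple choice, assumed here, is a high quantile (99%) of the reconstruction errors on the regular training transactions; test transactions above it are flagged. The MSE values below are made up.

```python
def mse_threshold(train_mse, quantile=0.99):
    """Assumed rule: threshold = the given quantile of reconstruction MSEs
    on regular training transactions."""
    ordered = sorted(train_mse)
    idx = min(int(quantile * len(ordered)), len(ordered) - 1)
    return ordered[idx]


def flag_outliers(test_mse, threshold):
    """Indices of test transactions whose reconstruction MSE exceeds the threshold."""
    return [i for i, err in enumerate(test_mse) if err > threshold]


# Made-up MSE values for illustration.
train_mse = [i / 1000 for i in range(1000)]   # regular training transactions
test_mse = [0.10, 2.50, 0.50, 1.20]           # mixed test transactions
threshold = mse_threshold(train_mse)
print(threshold, flag_outliers(test_mse, threshold))
```

The quantile sets the trade-off: a lower threshold catches more fraud but also raises more false alarms on regular transactions.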

Pre-training supervised models with autoencoders

Autoencoder models can also be used for pre-training supervised learning models. On an independent training sample, another deep neural network was trained, this time for classification of the response variable “Class” (fraud = 1, regular = 0), using the weights of the autoencoder model as the starting point for model fitting (2-class classification, Bernoulli distribution, CrossEntropy loss, 154 weights/biases, 111,836,076 training samples, mini-batch size 1, balance_classes = TRUE).
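A minimal NumPy sketch of the pre-training idea: the encoder layer of an autoencoder is reused as the first layer of a classifier, and an output layer for the binary label is trained with cross-entropy loss (the Bernoulli setting mentioned above). The encoder weights are hypothetical stand-ins for learned ones, the data are toy data, and the encoder is frozen for brevity; in a full setup, all weights would typically be fine-tuned as well.

```python
import numpy as np


def encode(X, W1, b1):
    """Hidden representation produced by a (pre-trained) encoder layer."""
    return np.tanh(X @ W1 + b1)


def train_logistic_head(H, y, lr=0.5, epochs=2000):
    """Logistic output layer trained on mean binary cross-entropy (Bernoulli model)."""
    w = np.zeros(H.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(H @ w + b)))  # predicted fraud probability
        grad = p - y                             # d(cross-entropy)/d(logit)
        w -= lr * H.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b


# Hypothetical stand-ins for pre-trained encoder weights, plus toy labelled data.
W1, b1 = np.eye(2), np.zeros(2)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(float)  # toy labels: "fraud" when feature 0 is positive

H = encode(X, W1, b1)            # encoder weights reused and kept frozen here
w, b = train_logistic_head(H, y)
pred = (H @ w + b > 0).astype(float)
print("training accuracy:", (pred == y).mean())
```

Starting from weights that already encode the structure of regular transactions can help the classifier, especially when labelled fraud cases are scarce.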

Model performance is evaluated on the same test set that was used for showing the MSE of the autoencoder model above. The plot below shows the predicted versus actual class labels. Because we are dealing with severely unbalanced data, we need to evaluate our model based on the rare class of interest, here fraud (class 1). If we looked at overall model accuracy, a model that never identifies instances as fraud would still achieve more than 99% accuracy, yet it would not serve our purpose at all. We are therefore interested in the evaluation parameters “sensitivity” and “precision”: we want to optimize our model so that a high percentage of all fraud cases in the test set is predicted as fraud (sensitivity), and simultaneously a high percentage of all fraud predictions is correct (precision).
An optimal outcome from training a supervised neural network for binary classification is shown in the plot below.

Results from training a supervised neural network for binary classification. The plot shows the percentage of correctly classified transactions by comparing actual class labels (x-axis) with predicted labels (color; fraud = 1, regular = 0).
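The accuracy trap and the two metrics described above can be checked with a few lines of Python. The toy labels are made up; precision is reported as 0.0 when a model predicts no fraud at all, a convention assumed here because the ratio is undefined in that case.

```python
def evaluate(y_true, y_pred):
    """Accuracy, sensitivity (recall) and precision for the fraud class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # Share of all fraud cases that were caught.
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        # Share of fraud alarms that were correct (0.0 by assumed convention if none).
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


# Toy test set: 5 fraud cases among 1,000 transactions (0.5%).
y_true = [1] * 5 + [0] * 995
never_fraud = [0] * 1000
print(evaluate(y_true, never_fraud))
# A model that catches 4 of 5 fraud cases at the cost of 2 false alarms:
some_model = [1, 1, 1, 1, 0] + [1, 1] + [0] * 993
print(evaluate(y_true, some_model))
```

The "never fraud" model scores higher accuracy than many useful models while catching no fraud at all, which is why sensitivity and precision are the more informative measures here.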

Understanding and trusting machine learning models

Decisions made by complex machine learning models are inherently difficult, if not impossible, for us to understand. The complexity of some of the most accurate classifiers, like neural networks, is what makes them perform so well, but it also effectively makes them black boxes. This can be problematic, because executives will be less inclined to trust and act on a decision they don’t understand.

Local Interpretable Model-Agnostic Explanations (LIME) is an attempt to make these complex models at least partly understandable. With LIME, we can explain in more concrete terms why, for example, a transaction that was labelled as regular might have been classified as fraudulent. The method was published in “Why Should I Trust You? Explaining the Predictions of Any Classifier” by Marco Tulio Ribeiro, Sameer Singh and Carlos Guestrin from the University of Washington in Seattle. It makes use of the fact that linear models are easy to explain: to explain a single prediction, LIME generates many perturbed versions (“permutations”) of the instance, lets the complex model predict them, and fits a simple linear model to these predictions, weighting each permutation by how similar it is to the original instance. The coefficients of this local linear model are the feature weights: positive weights support the decision and negative weights contradict it. Together, they approximate how much, and in which direction, each feature contributed to the decision made by the model.
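A simplified Python/NumPy sketch of the LIME idea for one instance: perturb it with Gaussian noise, query the black-box model, weight each perturbation with an exponential proximity kernel and fit a weighted linear model. The Gaussian sampling and the kernel width of 0.75 times the square root of the number of features are assumptions; the actual `lime` packages for R and Python sample and discretize features differently. `black_box` is a hypothetical stand-in for a trained fraud model.

```python
import numpy as np


def lime_explain(predict, x, n_samples=5000, kernel_width=None, seed=0):
    """Simplified LIME-style explanation of predict() at instance x.
    Returns one weight per feature (coefficients of a local weighted linear model)."""
    rng = np.random.default_rng(seed)
    d = x.size
    if kernel_width is None:
        kernel_width = 0.75 * np.sqrt(d)  # assumed default width
    Z = x + rng.normal(size=(n_samples, d))          # perturbed versions of x
    y = predict(Z)                                   # query the black box
    dist = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-dist**2 / kernel_width**2)     # closer samples count more
    A = np.column_stack([np.ones(n_samples), Z - x])  # intercept + centred features
    sw = np.sqrt(weights)                            # weighted least squares via row scaling
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[1:]


def black_box(Z):
    """Hypothetical 'complex model': the fraud score rises with feature 0,
    falls with feature 1 and ignores feature 2."""
    return 1.0 / (1.0 + np.exp(-(3 * Z[:, 0] - 2 * Z[:, 1])))


print(np.round(lime_explain(black_box, np.zeros(3)), 3))
```

The recovered weights have the expected signs: positive for feature 0 (supports a fraud decision), negative for feature 1 (contradicts it) and close to zero for the ignored feature 2.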

Code

A full example with code for training autoencoders and for using LIME can be found on my personal blog:

Blog author

Shirin Elsinghorst

Team Lead & Consultant Data/AI
