DISH-O-TRON – Gather that DATA you must!

24.9.2020 | 11 minutes of reading time

This is the second article in our dish-o-tron series (a non-standard Deep Learning tutorial), in which we tackle one of the biggest problems in community kitchens: coming across someone else’s dirty dishes. We address this problem by building a state-of-the-art AI system – the dish-o-tron.

If this is the first time you have heard of the dish-o-tron and you are interested in the whole story, you might want to start with the first part.

As we envision it, the dish-o-tron uses a computer vision model to detect dirty dishes, so we need training data to build that model. A brief look around reveals that no suitable data is available. This might come as a shock, but it is a typical situation for anyone tackling real-world problems with AI. Don’t get discouraged!

In this blog post we start building the dish-o-tron hands-on by gathering an initial “good enough” dataset for the next steps. Although we have already collected a dataset that we will share with you at some point, we will NOT share the link until you have collected your own data. To get the real dish-o-tron experience, it is absolutely necessary that you gather training data yourself.

image source: pixabay.com by Dieterich01

Approach and reasoning

The high-level idea is to just start tackling the problem with an end-to-end solution. For the first prototype it is reasonable to take some shortcuts. In many cases this is a very promising approach. We do not want to start by collecting data for a few months, then train “the best” AI-model for a few weeks and finally try to set up a dish-o-tron on a kitchen sink.

Instead, we’d like to put a first version of the dish-o-tron on a kitchen sink as fast as possible and iteratively improve the solution. In this way, we can (hopefully) decide with more certainty which parts actually need improvement by taking into account real-world feedback.

So in this article, we gather and prepare an initial dataset for the dish-o-tron in a hands-on way and in three steps:

1. Making videos of clean and dirty kitchen sinks
2. Splitting the videos into images of clean and dirty kitchen sinks
3. Splitting the images into training, validation and test datasets

As a starting point for your data collection, we provide a Google Colab notebook to follow along. Google Colab is a service that lets you run notebooks, and with them your code and model training, in the cloud. And the best part: it is currently free. Think of a Colab notebook as a Linux Docker container running a web server that executes Python code and even lets you use a GPU for model training. The container has a temporary file system where you can download files, install libraries, etc.

You can find the Colab notebook here. (You will need a Google account to be able to run the code. Save a copy of the notebook to your Google Drive to persist your changes.)

Videos of kitchen sinks

As we have already mentioned several times (sorry but not sorry!), gathering data is a key step for many problem solvers to tackle actual real-world problems because these problems typically do not start with a polished Kaggle data set. For this reason, we strongly encourage you to leave the comfort of your desk chair and make videos of the kitchen sink in question.

Yes, at first glance, this is quite a hassle. However, gathering data on-site is a valuable learning experience because we obtain important information about the problem and its domain. In our case, this involvement with the domain, for instance, leads to questions like:

- What should the videos look like?
- What data is useful and required for solving the problem?

Thinking about such questions is important in order to tackle the actual problem at hand and not focus too narrowly, for example, on the AI component of the dish-o-tron.

A few further (possibly) helpful considerations about the videos:

- Keep the future position of the dish-o-tron in mind and film from that perspective
- It might be useful to take several videos of clean and dirty sinks with small changes such as:
  - switching the lights on/off
  - changing the position of the water tap
  - repositioning unrelated objects around the sink
- It might help to move the camera slightly back and forth to add some variance
- Dirty sinks come in many different configurations, so it might be necessary to make several videos with, e.g., different plates/cups/cutlery and to change their positions

This is certainly not a complete list; feel free to point out additional considerations for the videos.

Are you still sitting in your comfy desk chair? Did our passionate plea for the importance of gathering data not convince you?

You can do this! Gather data for your dish-o-tron! It’s worth it! Just grab your smartphone and follow these instructions:

- use landscape orientation (please say NO to vertical videos)
  - If you are not sure why you should say NO to vertical videos, please study this comprehensive explanation on YouTube.
- film from a top-down position (not from the front)
- it’s allowed to have objects located next to the sink
- the difference between clean and not_clean is determined only by whether or not there are dishes IN the sink

Don’t scroll down to read more. Take your smartphone and go to the sink.

PRIVACY WARNING: Make sure you do not record anything personal, such as photos or other people. Since a video also records sound, make sure you don’t capture any conversations either. Otherwise you won’t be able to share and talk about your great work later. Then you won’t become famous for building the best dish-o-tron in the world. As a result, you will not get the job as AI Lead at the self-driving car company and so on. So be careful, you have been warned.

Record 5 short videos (3-5 sec.) of a clean sink:

- slowly move the camera slightly to get some different angles and reflections
- for each video change some conditions e.g.
  - switch light on/off
  - open the tap to make the sink wet
  - move the tap
  - … be creative – what else could happen?

(sample video for not_clean sink)

Record 5-8 short videos of a not_clean sink:

- put dishes/glasses/tools/pans or whatever else into the sink
- for each video change something, e.g.
  - move the position of dishes
  - add more items / remove some
  - change the light
  - …

That’s it. You have collected your very own first data to build the dish-o-tron. For the next steps this data will be enough. Will this data suffice to build a reliable AI product that works under every condition? Absolutely not! However, this data lays the foundation for building a running AI system and iteratively improving it.

Another source of additional data is your friends and colleagues. Just tell them about your journey to turn the community kitchen into a peaceful meeting ground. Ask them to provide additional videos of their dirty dishes for your collection. Believe it or not, this may be a nice door opener and starting point for interesting conversations with people you haven’t talked to for a while (and perhaps won’t for a while)!

Labeling data

Now go back to the Colab notebook and merge your data collection with the data that we provided. You can upload your additional videos into the Colab environment, for example via the UI in the panel on the left-hand side.

Put them into data/video_samples and sort them into the right subfolders (the first 5 into clean and the rest into not_clean). With this sorting step you have “labeled” the data. You have told the dish-o-tron what a clean and a not_clean sink look like. From this knowledge of yours, the dish-o-tron can learn everything!
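To make the "folders are labels" idea concrete, here is a minimal sketch that walks the video folder and reads each label from the name of the subfolder. The folder names come from the text above; the function name `list_labelled_videos` and the accepted file extensions are our own assumptions for illustration, not something the notebook prescribes.

```python
from pathlib import Path

# Folder names as used in the article; the extension list is an assumption.
LABELS = ("clean", "not_clean")
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi"}


def list_labelled_videos(root="data/video_samples"):
    """Return sorted (video_path, label) pairs; the label is the subfolder name."""
    pairs = []
    for label in LABELS:
        folder = Path(root) / label
        if not folder.is_dir():
            continue  # nothing uploaded for this label yet
        for path in sorted(folder.iterdir()):
            if path.is_file() and path.suffix.lower() in VIDEO_EXTENSIONS:
                pairs.append((path, label))
    return pairs


if __name__ == "__main__":
    for path, label in list_labelled_videos():
        print(f"{label:>9}  {path.name}")
```

Running it after the upload is a quick sanity check that every video ended up in the folder you intended.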

That’s all the magic. You have put pictures into folders. Bravo.
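The folders now hold labeled videos; the second of our three steps turns them into labeled images. As a rough sketch of what such a step can look like, the snippet below builds an `ffmpeg` call that samples a fixed number of frames per second from one video. It assumes that `ffmpeg` is installed. The function names, the `data/images` target folder and the rate of 2 frames per second are illustrative assumptions, not necessarily what the notebook does.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(video, out_dir, fps=2):
    """Build an ffmpeg call that writes `fps` JPEG frames per second of video."""
    video, out_dir = Path(video), Path(out_dir)
    # Keep the video name in every frame name so frames can be traced back later.
    pattern = out_dir / f"{video.stem}_%04d.jpg"
    return ["ffmpeg", "-i", str(video), "-vf", f"fps={fps}", str(pattern)]


def extract_frames(video, label, image_root="data/images", fps=2):
    """Extract the frames of one labeled video into image_root/<label>/."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    out_dir = Path(image_root) / label
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_command(video, out_dir, fps), check=True)
    return sorted(out_dir.glob(f"{Path(video).stem}_*.jpg"))
```

With clips of 3-5 seconds, this rate gives roughly 6 to 10 images per video. Each image inherits its label from the folder of the video it came from.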

OK, to be fair, labeling at scale is not that simple. Some datasets have millions of images, and some labels are not as easy to assign as clean or not_clean. For example, labeling could mean marking every pixel in an image that shows a road and separating it from a wall. This might help a self-driving car to stay on track. Labels like this for millions of images can be very expensive – but also very valuable.

A short side note: Typically, AI systems benefit from large datasets. Hence, we considered kick-starting a crowdsourcing campaign in order to gather a community DISH-O-TRON dataset. Potentially, this could improve all dish-o-trons around the world, and crowdsourced datasets would also be useful for various other kinds of problems.

In other IT communities, there are tools and platforms to collaboratively share and grow code. In some open-source software projects, hundreds or even thousands of collaborators contribute to one shared goal (often without getting anything back but good software). Unfortunately, nothing comparable has become established yet for growing and collecting datasets. But wouldn’t it be great to have a GitHub for datasets? To have a Kickstarter for data collection initiatives? To have hundreds of people around the world collecting data to feed the dish-o-tron? But maybe the data would just be too valuable to be shared – being the new oil (or was it electricity?). If you are interested in collaboratively building a large (dish-o-tron) dataset, please drop us a note.

Splitting images into train, validation and test datasets

A fundamental concept in training machine learning models is splitting the dataset into train, validation and test datasets. Because this is a comprehensive topic of its own, we only briefly discuss the intricacies of data splitting here. To get a better understanding, we strongly recommend familiarizing yourself with this topic further. Possible starting points are the articles here and here. (For our German-speaking readers: You could also watch our introduction to machine learning video from our AI bootcamp here.)
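In its simplest form, such a split is just a reproducible shuffle followed by two cuts. The following is a minimal sketch of that idea; the 70/15/15 proportions, the seed and the function name `split_dataset` are assumptions we picked for illustration, not fixed rules.

```python
import random


def split_dataset(items, val_fraction=0.15, test_fraction=0.15, seed=42):
    """Shuffle items reproducibly and cut them into train/validation/test lists."""
    items = sorted(items)  # fixed starting order, independent of the input order
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_fraction)
    n_val = round(len(items) * val_fraction)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test


# Hypothetical image names, only to show the resulting sizes.
images = [f"img_{i:03d}.jpg" for i in range(100)]
train, val, test = split_dataset(images)
print(len(train), len(val), len(test))  # 70 15 15
```

The fixed seed matters more than it may seem: if the split changes between runs, results from different experiments are no longer comparable.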

Another side note: fast.ai is a really great starting point if you want to learn more about machine learning. Big kudos to Jeremy Howard, his team and the fast.ai community. You have been a great inspiration to us as well. We have watched your lessons. We love your practical way of teaching. Implement something and learn by doing! It’s neither necessary nor possible to understand all the details of Deep Learning before you build something. If you wait for that, you will never build anything. That’s how the dish-o-tron was born, by the way.

Choosing a validation dataset and a test dataset to evaluate the model more or less defines the rules of the game. Hence, it is crucial to understand the implications of the chosen splitting approach. For example, in our case the images originate from videos and are therefore not independent of each other. Two chronologically close frames of a video may be nearly identical, so two very similar images can end up in the train and the test dataset, which makes the test results look better than they really are.
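One common way to avoid near-identical frames on both sides of the split is to assign whole videos, rather than single images, to a dataset. Here is a minimal sketch of that idea. It assumes a hypothetical naming scheme `<video>_<frame number>.jpg`; the function names and the 20 % test share are likewise our own choices for illustration.

```python
import random
from collections import defaultdict


def video_of(frame_name):
    """Hypothetical naming scheme: '<video>_<frame number>.jpg' -> '<video>'."""
    return frame_name.rsplit("_", 1)[0]


def split_by_video(frame_names, test_fraction=0.2, seed=0):
    """Assign whole videos to either train or test so no video ends up in both."""
    by_video = defaultdict(list)
    for name in frame_names:
        by_video[video_of(name)].append(name)
    videos = sorted(by_video)
    random.Random(seed).shuffle(videos)
    # Hold out at least one video, even for very small collections.
    n_test = max(1, round(len(videos) * test_fraction))
    test = [f for v in videos[:n_test] for f in by_video[v]]
    train = [f for v in videos[n_test:] for f in by_video[v]]
    return train, test


# Five hypothetical videos with four frames each.
frames = [f"sink{v}_{i:04d}.jpg" for v in range(5) for i in range(4)]
train, test = split_by_video(frames)
```

A validation set can be carved out of the training videos in the same way. Using the kitchen sink instead of the video as the grouping key would give a test set of sinks the model has never seen.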

In the future it could be useful to have a test set for the dish-o-tron containing only images from kitchen sinks that are not present in the training and validation dataset. Many times it is not clear at the beginning what the test set should look like and creating a reliable test set is an art in itself. In many cases, we have to iterate and improve it over time.

A rule of thumb is that the test set should represent the actual real-life situation as well as possible. Therefore, there is a good argument in favour of putting data from the same kitchen sink into the train, validation and test set if the model for the dish-o-tron is only used for one particular sink.

Attention: the Colab notebook only provides temporary storage and all data will be deleted if the notebook is closed. Hence, to persist the data, you have to download it or store it, e.g. in your Google Drive storage.
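One simple way to get everything out in one go is to pack the data folder into a single zip file first. The sketch below uses Python's standard `shutil.make_archive`; the folder name `data`, the archive name and the function name `archive_data` are assumptions for illustration.

```python
import shutil
from pathlib import Path


def archive_data(data_dir="data", archive_base="dish_o_tron_data"):
    """Pack data_dir into <archive_base>.zip and return the path of the archive."""
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        raise FileNotFoundError(f"{data_dir} does not exist")
    # make_archive appends the '.zip' suffix itself.
    return Path(shutil.make_archive(str(archive_base), "zip", root_dir=str(data_dir)))
```

The resulting file can then be downloaded via the file panel on the left-hand side or copied to your Google Drive. Keep the archive outside of the data folder so that it does not try to include itself.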

Conclusion

In this article we tackled a sub-problem that appears tedious at first glance. However, gathering data, working with and preparing it, and understanding the data and its origins are fundamental tasks that help problem solvers understand the big picture of the problem at hand.

We hope that we were able to motivate you to actually get up from your desk and take videos of kitchen sinks in various configurations and work with the data. This is an important step to build the dish-o-tron and get the real problem solver experience.

In the next article we will use that data to train our first model. We will demonstrate this using a service like Google AutoML as well as an easy-to-use framework like fast.ai. If you want to get in contact with us and maybe ask for the whole DIRTY-DISHES-DATASET, please reply to this tweet.

Continue with the third part of our series where we train the vision model.


Blog authors

Marcel Mikl


Oliver Moser

Partner and Key Account Manager

