Take control of named entity recognition with your own Keras model!

13.11.2020 | 7 minutes of reading time

This post shows how to extract information from text documents with the high-level deep learning library Keras: we build, train and evaluate a bidirectional LSTM model by hand for a custom named entity recognition (NER) task on legal texts.

In a previous post, we solved the same NER task on the command line with the NLP library spaCy. The present approach requires some work and knowledge, but yields a much more flexible solution which we can tune, scale and modify to our needs.

The NER dataset and task

We again use the dataset presented by E. Leitner, G. Rehm and J. Moreno-Schneider in Fine-grained Named Entity Recognition in Legal Documents. It consists of decisions from several German federal courts with annotations of named entities referring to legal norms, court decisions, legal literature and others, in the following form:

‘Trotz der zweifelhaften Bewertung von MDMA als ” harte Droge ” ( vgl. BGH , Beschluss vom 3. Februar 1999 – 5 StR 705/98 , juris Rn. 2 RS ; zum Meinungsstand Patzak in Körner / Patzak / Volkmer , BtMG , 8. Aufl. , Vorbem. zu §§ 29 ff. Rn. 213 LIT mwN ; Weber , BtMG , 5. Aufl. , § 1 Rn. 364 LIT mwN ) hat der Strafausspruch Bestand , da die verhängte Rechtsfolge jedenfalls angemessen ist ( § 354 Abs. 1a Satz 1 StPO GS ) . ‘

The task will be to build, train and evaluate a model that, given sample sentences, annotates each token of each sentence with a tag that indicates whether this token is part of a reference to a legal norm, court decision, legal literature and so on.

NER with bi-LSTM for dummies

We implement a standard deep-learning architecture for NER — a bi-directional recurrent neural network — which works as follows:

1. Each sentence is split into a sequence of tokens, and each token is represented by a word vector. These word vectors or embeddings are usually pre-trained on a huge corpus of documents so that they encode semantic information. We thus apply general language proficiency to our special task, a technique known as transfer learning. Common methods for pre-training are word2vec, GloVe or fastText; we use the word vectors provided by spaCy.
2. The model processes the input sequence step by step and maintains an internal memory along the way,
   - reading the corresponding input vector,
   - combining this input with the internal memory,
   - producing an output vector and
   - updating the internal memory
   at each step. This magic is carried out by a long short-term memory (LSTM) cell. As a result, we obtain an output sequence of the same length as the input sequence, and an internal memory state.
3. Going backwards, the model reads the input again and produces a second output sequence.
4. At each position, the outputs of steps 2 and 3 are combined and fed into a classifier which outputs, for the input word at this position, the probability that it should be annotated with the first tag, the second tag, and so on.
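The steps described above can be sketched in plain NumPy. This is only a toy illustration with random, untrained weights, not the Keras implementation used later in this post; the helper names (`lstm_step`, `run_lstm`, `bi_lstm_tagger`) and all dimensions are made up for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step: combine input x with the memory (h, c) and update it."""
    z = W @ x + U @ h + b                  # pre-activations of the four gates
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    c = f * c + i * np.tanh(g)             # update the internal memory
    h = o * np.tanh(c)                     # produce the output vector
    return h, c

def run_lstm(xs, params, units):
    """Process a sequence step by step and collect one output per position."""
    h, c = np.zeros(units), np.zeros(units)
    outputs = []
    for x in xs:
        h, c = lstm_step(x, h, c, *params)
        outputs.append(h)
    return np.stack(outputs)

def init_params(rng, emb_dim, units):
    return (rng.normal(scale=0.1, size=(4 * units, emb_dim)),
            rng.normal(scale=0.1, size=(4 * units, units)),
            np.zeros(4 * units))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def bi_lstm_tagger(xs, fwd, bwd, W_out, units):
    out_fwd = run_lstm(xs, fwd, units)                # forward pass
    out_bwd = run_lstm(xs[::-1], bwd, units)[::-1]    # backward pass, re-aligned
    combined = np.concatenate([out_fwd, out_bwd], axis=-1)
    return softmax(combined @ W_out)                  # tag probabilities per token

rng = np.random.default_rng(0)
seq_len, emb_dim, units, n_tags = 5, 8, 4, 3
xs = rng.normal(size=(seq_len, emb_dim))              # stand-in for word vectors
fwd = init_params(rng, emb_dim, units)
bwd = init_params(rng, emb_dim, units)
W_out = rng.normal(scale=0.1, size=(2 * units, n_tags))
probs = bi_lstm_tagger(xs, fwd, bwd, W_out, units)
print(probs.shape)  # (5, 3): one distribution over tags per token
```

Note that the backward outputs are reversed again before stacking, so that position k of both sequences refers to the same token.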

To improve performance, one can replace the last feed-forward layer by a conditional random field (CRF) model. The resulting architecture is called a bi-LSTM-CRF model.

Setting up the environment

First, set up a virtual environment as described in the preceding blog post, and install the required dependencies:

mkdir keras_ner_project
cd keras_ner_project
python3 -m venv .venv
source .venv/bin/activate
pip install spacy
python -m spacy download de_core_news_md
pip install tensorflow pandas scikit-learn

Alternatively, follow along with Jupyter running inside a TensorFlow Docker container, or with a Google Colab notebook.

Next, download the data as in the preceding blog post (in case you are inside a Jupyter notebook, put an exclamation mark ! in front of each command to have it executed by the shell):

mkdir -p data/01_raw
curl https://github.com/elenanereiss/Legal-Entity-Recognition/raw/master/data/dataset_courts.zip \
     -L -o data/01_raw/raw.zip
unzip data/01_raw/raw.zip -d data/01_raw

Step 1: Preprocessing for NER

The data files contain sample sentences separated by blank lines, with one token and annotation in BIO format per line as follows:

an O
Kapitalgesellschaften O
( O
§ B-GS
17 I-GS
Abs. I-GS
1 I-GS
und I-GS
2 I-GS
EStG I-GS
) O
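In the BIO format, `B-` marks the first token of an entity, `I-` marks the following tokens of the same entity, and `O` marks tokens outside any entity. As a minimal sketch of how such annotations can be read back as entity spans, consider the following helper. The name `bio_to_spans` is hypothetical, and the helper is not part of the pipeline in this post, which only keeps the tag category.

```python
def bio_to_spans(pairs):
    """Group (token, BIO tag) pairs into (category, text) entity spans."""
    spans, current_tokens, current_cat = [], [], None
    for token, tag in pairs:
        prefix, _, category = tag.partition('-')
        if prefix == 'I' and category == current_cat:
            # Continue the entity that is currently open.
            current_tokens.append(token)
            continue
        if current_tokens:
            # Any other tag closes the open entity.
            spans.append((current_cat, ' '.join(current_tokens)))
        if prefix in ('B', 'I'):
            current_tokens, current_cat = [token], category
        else:
            current_tokens, current_cat = [], None
    if current_tokens:
        spans.append((current_cat, ' '.join(current_tokens)))
    return spans

sample = [('an', 'O'), ('Kapitalgesellschaften', 'O'), ('(', 'O'),
          ('§', 'B-GS'), ('17', 'I-GS'), ('Abs.', 'I-GS'), ('1', 'I-GS'),
          ('und', 'I-GS'), ('2', 'I-GS'), ('EStG', 'I-GS'), (')', 'O')]
print(bio_to_spans(sample))
# [('GS', '§ 17 Abs. 1 und 2 EStG')]
```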

We read such a data file line-by-line and store the sentences as lists of token-tag pairs:

def load_data(filename: str):
    # Each non-blank line holds "token tag"; blank lines separate sentences.
    with open(filename, 'r', encoding='utf-8') as file:
        lines = [line.split() for line in file]
    samples, start = [], 0
    # A sentinel blank line at the end makes sure the last sentence is kept.
    for end, parts in enumerate(lines + [[]]):
        if not parts:
            if start < end:
                # Drop the B-/I- prefix and keep only the tag category.
                sample = [(token, tag.split('-')[-1]) for token, tag in lines[start:end]]
                samples.append(sample)
            start = end + 1
    return samples

train_samples = load_data('data/01_raw/bag.conll')
val_samples = load_data('data/01_raw/bgh.conll')
all_samples = train_samples + val_samples

For simplicity, we’ll truncate the sentences to a maximum length and pad shorter input sequences. But first, let us determine the set of all tags in the data and add an extra tag for the padding:

schema = ['_'] + sorted({tag for sentence in all_samples for _, tag in sentence})
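To make the role of the padding tag concrete, here is a small sketch on made-up toy data. The names `toy_samples`, `toy_schema` and `pad_tags` are hypothetical and only serve the illustration. It shows how the tag set becomes an index schema with the padding tag at index 0, and how tag sequences are truncated or padded to a fixed length.

```python
toy_samples = [
    [('§', 'GS'), ('17', 'GS'), ('EStG', 'GS'), ('gilt', 'O'), ('hier', 'O')],
    [('BGH', 'RS'), ('entschied', 'O')],
]

# '_' is the padding tag; it comes first so that it gets index 0.
toy_schema = ['_'] + sorted({tag for sentence in toy_samples for _, tag in sentence})
tag_index = {tag: index for index, tag in enumerate(toy_schema)}

def pad_tags(sentence, max_len):
    """Map tags to indices, truncate to max_len and pad with index 0."""
    indices = [tag_index[tag] for _, tag in sentence[:max_len]]
    return indices + [0] * (max_len - len(indices))

print(toy_schema)                    # ['_', 'GS', 'O', 'RS']
print(pad_tags(toy_samples[0], 4))   # [1, 1, 1, 2]
print(pad_tags(toy_samples[1], 4))   # [3, 2, 0, 0]
```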

Next, we represent each token by a word vector, using a pre-trained German language model of the NLP library spaCy:

import spacy
import numpy as np

nlp = spacy.load('de_core_news_md')
EMB_DIM = nlp.vocab.vectors_length  # dimension of the pre-trained word vectors
MAX_LEN = 50                        # sentences are truncated or padded to this length

def preprocess(samples):
    tag_index = {tag: index for index, tag in enumerate(schema)}
    # Zero-initialised arrays: padded positions keep zero vectors and tag index 0 ('_').
    X = np.zeros((len(samples), MAX_LEN, EMB_DIM), dtype=np.float32)
    y = np.zeros((len(samples), MAX_LEN), dtype=np.uint8)
    vocab = nlp.vocab
    for i, sentence in enumerate(samples):
        for j, (token, tag) in enumerate(sentence[:MAX_LEN]):
            X[i, j] = vocab.get_vector(token)
            y[i, j] = tag_index[tag]
    return X, y

X_train, y_train = preprocess(train_samples)
X_val, y_val = preprocess(val_samples)
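To see the resulting array shapes without loading a language model, the following sketch replaces the spaCy vocabulary with a tiny, made-up embedding table. `toy_vectors`, `embed` and the two constants are hypothetical. Mapping unknown tokens to a zero vector is an assumption of this sketch.

```python
import numpy as np

EMB_DIM_TOY, MAX_LEN_TOY = 3, 4

# Hypothetical embedding table standing in for the spaCy vocabulary.
toy_vectors = {
    '§': np.array([1.0, 0.0, 0.0], dtype=np.float32),
    'EStG': np.array([0.0, 1.0, 0.0], dtype=np.float32),
}

def embed(sentences):
    X = np.zeros((len(sentences), MAX_LEN_TOY, EMB_DIM_TOY), dtype=np.float32)
    for i, tokens in enumerate(sentences):
        for j, token in enumerate(tokens[:MAX_LEN_TOY]):
            # Tokens missing from the table keep the all-zero vector.
            X[i, j] = toy_vectors.get(token, np.zeros(EMB_DIM_TOY, dtype=np.float32))
    return X

X_toy = embed([['§', '17', 'EStG'],
               ['§', '1', 'Abs.', '2', 'EStG', 'gilt']])
print(X_toy.shape)   # (2, 4, 3)
print(X_toy[0, 3])   # padded position: [0. 0. 0.]
```

The second toy sentence has six tokens, so its last two tokens are cut off. In the same way, everything beyond `MAX_LEN` is invisible to the model.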

Now we have the data ready for NER and can assemble our model!

Step 2: Build the bi-LSTM model

With the wide range of layers offered by Keras, we can construct a bi-directional LSTM model as a sequence of two compound layers:

- The bidirectional LSTM layer encapsulates a forward and a backward pass of an LSTM layer, followed by the stacking of the sequences returned by both passes.
- The second layer applies a dense classification layer to every position of the stacked sequences. Here, the softmax activation function scales the output so that we obtain sequences of probability distributions:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, LSTM, TimeDistributed, Dense

def build_model(nr_filters=256):
    input_shape = (MAX_LEN, EMB_DIM)
    # return_sequences=True yields one output vector per token, not just the last one.
    lstm = LSTM(nr_filters, return_sequences=True)
    bi_lstm = Bidirectional(lstm, input_shape=input_shape)
    # One softmax distribution over the tag schema at every position.
    tag_classifier = Dense(len(schema), activation='softmax')
    sequence_labeller = TimeDistributed(tag_classifier)
    return Sequential([bi_lstm, sequence_labeller])

model = build_model()
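As a rough sanity check on the model size, the number of trainable parameters can be computed by hand. The sketch below assumes the standard LSTM parametrisation (four gates, one bias vector each). The values 300 for the word vector dimension and 20 for the number of tags are assumptions for illustration, so check `EMB_DIM` and `len(schema)` in your own session.

```python
def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights and a bias.
    return 4 * (units * (input_dim + units) + units)

def bi_lstm_tagger_params(input_dim, units, n_tags):
    recurrent = 2 * lstm_params(input_dim, units)   # forward + backward pass
    classifier = (2 * units + 1) * n_tags           # dense layer on stacked outputs
    return recurrent + classifier

# Assumed values: 300-dimensional word vectors and 20 tags including padding.
total = bi_lstm_tagger_params(input_dim=300, units=256, n_tags=20)
print(total)  # 1150996
```

Under these assumptions, the result should agree with the total reported by `model.summary()`. Almost all parameters sit in the two LSTM passes, so the number of units is the main lever for model capacity.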

def train(model, epochs=10, batch_size=32):
    # The sparse variant of the loss accepts integer tag indices as targets.
    model.compile(optimizer='Adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    # Hold out 20% of the training samples to monitor the validation loss.
    history = model.fit(X_train, y_train,
                        validation_split=0.2,
                        epochs=epochs,
                        batch_size=batch_size)
    return history.history

history = train(model)
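To illustrate what the loss measures, here is a minimal NumPy sketch of sparse categorical cross-entropy: the mean negative log-probability that the model assigns to the true tag at each position. It is a simplified re-implementation for illustration, not the Keras code, and the toy numbers are made up.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, y_prob):
    """Mean negative log-probability of the true tag index at each position."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    # Pick, for every position, the probability assigned to the true tag.
    p_true = np.take_along_axis(y_prob, y_true[..., None], axis=-1).squeeze(-1)
    return float(-np.log(p_true).mean())

# One sentence with three positions and three tags.
y_true = np.array([[0, 2, 1]])
y_prob = np.array([[[0.8, 0.1, 0.1],
                    [0.2, 0.2, 0.6],
                    [0.3, 0.5, 0.2]]])
loss = sparse_categorical_crossentropy(y_true, y_prob)
print(round(loss, 4))
```

Since the padding tag is a regular class in this setup, padded positions contribute to the loss and to the accuracy as well.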

def predict(model):
    y_probs = model.predict(X_val)
    # Pick the most probable tag index at every position.
    y_pred = np.argmax(y_probs, axis=-1)
    # zip stops at the shorter sequence, so predictions for padded positions are dropped.
    return [
        [(token, tag, schema[index]) for (token, tag), index in zip(sentence, tag_pred)]
        for sentence, tag_pred in zip(val_samples, y_pred)
    ]

predictions = predict(model)
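The decoding step can be tried out on a made-up example without a trained model. All values below are invented for illustration. The sketch shows how the arg-max over the tag probabilities is mapped back to tag names and how predictions for padded positions are discarded.

```python
import numpy as np

toy_schema = ['_', 'GS', 'O']
toy_sentence = [('§', 'GS'), ('17', 'GS')]     # two tokens, padded to length 3

y_probs = np.array([[0.1, 0.7, 0.2],
                    [0.1, 0.3, 0.6],
                    [0.9, 0.05, 0.05]])         # last row: padded position
y_pred = np.argmax(y_probs, axis=-1)

# zip stops after the two real tokens; the third prediction is ignored.
decoded = [(token, tag, toy_schema[index])
           for (token, tag), index in zip(toy_sentence, y_pred)]
print(decoded)
# [('§', 'GS', 'GS'), ('17', 'GS', 'O')]
```

The same mechanism has a side effect: tokens beyond the maximum sequence length have no prediction and therefore do not enter the evaluation.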

Finally, we compute precision, recall and f1-score on the level of tag categories using scikit-learn's classification_report:

import pandas as pd
from sklearn.metrics import classification_report

def evaluate(predictions):
    # Flatten the sentences into one list of true tags and one of predicted tags.
    y_t = [pos[1] for sentence in predictions for pos in sentence]
    y_p = [pos[2] for sentence in predictions for pos in sentence]
    report = classification_report(y_t, y_p, output_dict=True)
    return pd.DataFrame.from_dict(report).transpose().reset_index()

evaluate(predictions)
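For reference, the per-tag scores in such a report can be reproduced by hand from token-level counts. The following sketch uses a hypothetical helper, `precision_recall_f1`, on made-up tag lists. It is meant to clarify the definitions, not to replace scikit-learn.

```python
def precision_recall_f1(true_tags, pred_tags, category):
    pairs = list(zip(true_tags, pred_tags))
    tp = sum(t == category and p == category for t, p in pairs)  # correctly tagged
    fp = sum(t != category and p == category for t, p in pairs)  # tagged by mistake
    fn = sum(t == category and p != category for t, p in pairs)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

true_tags = ['GS', 'GS', 'GS', 'O', 'O', 'RS']
pred_tags = ['GS', 'GS', 'O', 'GS', 'O', 'RS']
precision, recall, f1 = precision_recall_f1(true_tags, pred_tags, 'GS')
print(precision, recall, f1)
```

Here, two of the three tokens predicted as GS are correct, and two of the three true GS tokens are found, so precision, recall and f1-score all equal 2/3.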

Training a model with 1024 filters for 10 epochs, we reach the following scores:

tag	f1-score	precision	recall	support
EUN	56.9	67.0	49.5	398
GRT	65.9	91.0	51.6	643
GS	94.5	96.1	92.9	6774
INN	41.3	88.9	26.9	119
LD	74.0	67.0	82.6	86
LDS	0.0	0.0	0.0	9
LIT	79.5	74.3	85.4	1681
MRK	0.0	0.0	0.0	49
ORG	25.3	32.4	20.8	159
PER	0.0	0.0	0.0	473
RR	92.0	94.4	89.8	560
RS	90.7	97.1	85.0	8380
ST	71.9	93.9	58.2	79
STR	0.0	0.0	0.0	35
UN	32.7	64.9	21.8	110
VO	2.2	4.0	1.5	66
VS	0.0	0.0	0.0	10
VT	18.0	11.7	38.9	144

Let’s see how this compares to the results achieved with spaCy:

It seems that our hand-built NER model does very well! But beware that these experiments do not show a winner: neither of the two approaches has been optimized, and we compared neither the training time nor the compute resources used. The main differentiating factors are that

- spaCy can be used out of the box with no understanding of deep learning, whereas
- the approach presented here is much more flexible and tuneable (see below).

What next?

With the deep learning library Keras, building and training our custom NER model took just a few lines, but setting up the data and the training required much more understanding than the command-line approach with spaCy.

To improve performance, we could try to tune the model and

- increase the number of filters, that is, the size of the LSTM cell,
- stack several bidirectional layers on top of each other,
- replace the time-distributed classification layer with a conditional random field (CRF) model or
- address the imbalance of the tag distribution with a focal loss instead of categorical cross-entropy.
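To give an idea of the last option, here is a minimal NumPy sketch of a focal loss. It scales the cross-entropy at each position by `(1 - p_true) ** gamma`, so that positions the model already classifies confidently contribute less. The optional class-weighting factor of the usual formulation is left out, and the function name and toy numbers are made up. To use it for training, it would have to be rewritten with TensorFlow operations.

```python
import numpy as np

def focal_loss(y_true, y_prob, gamma=2.0):
    """Mean focal loss; gamma=0 recovers plain cross-entropy."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    # Probability assigned to the true tag at every position.
    p_true = np.take_along_axis(y_prob, y_true[..., None], axis=-1).squeeze(-1)
    weights = (1.0 - p_true) ** gamma      # small for confident, correct predictions
    return float((-weights * np.log(p_true)).mean())

y_true = np.array([0, 1])
y_prob = np.array([[0.9, 0.1],             # easy position: true tag gets 0.9
                   [0.9, 0.1]])            # hard position: true tag gets 0.1
cross_entropy = focal_loss(y_true, y_prob, gamma=0.0)
focal = focal_loss(y_true, y_prob, gamma=2.0)
print(cross_entropy, focal)
```

With `gamma=2`, the easy position is weighted by 0.01 and the hard one by 0.81, so the loss is dominated by the position the model gets wrong.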

But to achieve a significant boost, we need to provide our model with more input by

- labeling more task-specific training data or
- applying more task-independent language proficiency to our task.

In the next blog post, we shall fine-tune a pre-trained NLP transformer model to our NER task and get state-of-the-art performance.

Stay tuned!


Blog author

Thomas Timmermann

Data Scientist
