Create your own parser with the Nearley Parser

24.10.2018 | 5 minutes of reading time

For a project I needed to parse data that is delivered by email. Yes, I know: when was the last time you received production data by email instead of through a clean REST API? It is certainly not an ideal interface, but it is the one we had to deal with, because the source could not be changed to provide a clean REST API on short notice and, perhaps also familiar, the project needed to deliver value as soon as possible. Luckily, I came across the Nearley Parser.

Nearley Parser

At first you might think of parsing the email and extracting the data with regular expressions, or even just by keywords, rows, and columns. However, mail clients and mail servers can change the content of a message: messages get forwarded, signatures are added, or plain text is converted into HTML. So I needed something more intelligent and robust to parse these mail messages.
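
To make that fragility concrete, here is a minimal sketch of the regular-expression approach. The pattern strictPattern and the sample lines are made up for this illustration and are not taken from the real mails; they only show how small changes introduced along the way can defeat a rigid pattern.

```javascript
// Hypothetical pattern that assumes the line looks exactly like the original mail.
const strictPattern = /^Battery: (\d+)%, (\d+\.\d+) Volt$/;

// Illustrative variations a mail client might introduce (assumed, not real data).
const samples = [
  "Battery: 51%, 4.01 Volt",        // original line
  "> Battery: 51%, 4.01 Volt",      // quoted when forwarded
  "Battery:  51%,  4.01 Volt",      // extra spaces
  "<p>Battery: 51%, 4.01 Volt</p>", // converted to HTML
];

// Test every sample against the rigid pattern.
const matches = samples.map((line) => strictPattern.test(line));
console.log(matches);
```

Only the first sample matches. Each variation can of course be handled by extending the pattern, but every extension makes the expression harder to read.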

While searching for a document parser that could do the job, I found something more interesting. Why not explain to the parser how to read your document, and have it return meaningful JSON objects containing your data?

The tool for this job is the Nearley Parser. Its name is a nod to Jay Earley, the inventor of the Earley parsing algorithm on which it is based.

To really go into depth on how this parsing algorithm works, I recommend reading the explanation of the algorithm by the author himself.

We can make the algorithm work for us by providing a definition file called the grammar file. But before we do, we can experiment with our grammar in an online tool, aptly named the Nearley Playground. There you create a test, which is basically the content you need to parse, and provide the grammar, which is compiled in real time; the result is shown directly.

Nearley Parsing Primer

The data we need to parse comes from a sensor and contains, among other things, status information about its battery. This data needs to be read and stored in a document store. So wouldn’t it be great if we could feed this piece of data to a document parser that parses it and returns a JSON object containing the data?

The first sentence of data we are interested in is:


Battery: 51%, 4.01 Volt

The sentence starts with a word, then a colon, white space and the data.

Without a custom lexer, the Nearley Parser reads the input character by character. The literal strings and characters in the grammar are what is called “terminals”. When a rule matches, we can use a post-processor function to actually do something with the data, in this case construct a JSON object.

So to parse “Battery:”, the grammar is


sentence -> "Battery:"

But our line contains more characters. To tell the parser that whitespace may follow “Battery:”, we can use the built-in grammar “whitespace.ne”. It provides a rule named “_” (a single underscore) that matches zero or more whitespace characters, so we simply add an underscore to our grammar. So now our grammar becomes:


sentence -> "Battery:" _

Then we encounter our first value: a percentage. To parse this we can use the “percentage” rule from the built-in grammar “number.ne” as follows:


sentence -> "Battery:" _ percentage

This way we can parse the complete line with the following grammar, including the imports of the built-in grammars:


@builtin "whitespace.ne" # `_` means arbitrary amount of whitespace
@builtin "number.ne"     # `int`, `decimal`, and `percentage`

sentence -> "Battery:" _ percentage _ "," _ decimal _ "Volt"

All this can be easily run as an experiment on the Nearley playground website.

Real usage

This is all nice in a playground web application to learn and write your grammar file, but now we need to put this to use in an application.

Nearley consists of two components, the compiler and the parser. The compiler turns your grammar file into a JavaScript module; the parser then uses that module to parse your document.

Both components ship in the nearley npm package and can be installed through npm.

To install the parser in your project:


npm install --save nearley

This adds it as a dependency in package.json.

To use the compiler from the command line, install the package globally:


npm install -g nearley

Store the grammar shown above in a file called grammar.ne and compile it using the following command:


nearleyc grammar.ne -o grammar.js

This compiles the grammar file into a JavaScript module. Now we can use nearley-test, the test tool that is installed together with the compiler:


nearley-test ./grammar.js --input "Battery: 51%, 4.01 Volt"

And it will show the results:


Parse results:
[ [ 'Battery:', null, 0.51, null, ',', null, 4.01, null, 'Volt' ] ]

As you can see, the output is a nested array containing our data, plus a null value for each whitespace match in our string. To clean this up and make it return a JSON object, we can add a post-processing function as follows:


sentence -> "Battery:" _ percentage _ "," _ decimal _ "Volt" 
  {% ([,,level,,,,volts]) => 
    ({battery:{percentage: level, value: volts}}) %}

When we compile and test this, we get the following result:


Parse results:
[ { battery: { percentage: 0.51, value: 4.01 } } ]
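
The post-processor is ordinary JavaScript, so it can be tried outside the grammar. The sketch below applies the same destructuring function to the raw array from the first test run; the names raw and toBattery are chosen only for this illustration.

```javascript
// Raw match for the rule: one entry per symbol, null for each whitespace (_) match.
const raw = ["Battery:", null, 0.51, null, ",", null, 4.01, null, "Volt"];

// Same function as in the grammar: pick index 2 (percentage) and index 6 (decimal).
const toBattery = ([, , level, , , , volts]) => ({
  battery: { percentage: level, value: volts },
});

const result = toBattery(raw);
console.log(result);
```

The empty slots in the destructuring pattern skip the literals and the whitespace entries, which is why the nulls no longer show up in the result.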

To use this in your code, create a parser from your compiled grammar, feed it your data, and read the result as an array from parser.results:


const nearley = require("nearley");
const grammar = require("./grammar.js");

// Create a parser from the compiled grammar module.
const parser = new nearley.Parser(nearley.Grammar.fromCompiled(grammar));

// Feed the input, then read the parses from parser.results.
parser.feed("Battery: 51%, 4.01 Volt");
console.log(parser.results);
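
In a real mail the battery line sits among many other lines, so one possible approach is to try each line with a fresh parser and keep only the lines that parse. The sketch below assumes that feed throws on input the grammar rejects and that results holds the completed parses, which is how I understand the Nearley parser to behave. The helper extractReadings and the makeParser factory are hypothetical names. To keep the snippet runnable without installing anything, it uses a small stand-in parser; in the application the factory would be () => new nearley.Parser(nearley.Grammar.fromCompiled(grammar)).

```javascript
// Hypothetical helper: try every line with a fresh parser and
// keep the first result of each line that parses.
function extractReadings(text, makeParser) {
  const readings = [];
  for (const line of text.split(/\r?\n/)) {
    const parser = makeParser(); // a parser keeps state, so use a new one per line
    try {
      parser.feed(line.trim());
    } catch (err) {
      continue; // the line does not match the grammar
    }
    if (parser.results.length > 0) {
      readings.push(parser.results[0]);
    }
  }
  return readings;
}

// Stand-in for this sketch only (NOT Nearley): it mimics feed() and results
// and accepts exactly one known line.
const makeStubParser = () => ({
  results: [],
  feed(line) {
    if (line !== "Battery: 51%, 4.01 Volt") {
      throw new Error("invalid syntax");
    }
    this.results = [{ battery: { percentage: 0.51, value: 4.01 } }];
  },
});

const mail = "Hello,\nBattery: 51%, 4.01 Volt\nRegards";
const readings = extractReadings(mail, makeStubParser);
console.log(readings);
```

Passing the factory in as a parameter keeps the helper independent of how the parser is built, so the same function works with the compiled grammar once Nearley is installed.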

Conclusion

I have written numerous lines of code that parse data in one form or another, single-line and multi-line. So for the problem at hand, too, my first choice was some kind of regular expression to parse each line in the mail message. That could of course have worked well, but knowing that the content of the mail can vary made me look for something that handles this variation cleanly, without a complex, unreadable regular expression or a complex piece of code. Instead, a Nearley grammar gives you clear, semantically readable code.

So luckily my search brought the Nearley Parser to my attention, and although it has a steep learning curve, you can create something usable quite quickly. Yes, the example above parses just one line, and that could have been done much quicker with a regular expression. However, this line sits somewhere in the message, its spacing varies, and there are numerous other pieces of data in the message that you want to read, so things can become complex quite quickly. To be fair, I would definitely not call myself an expert on the vocabulary of the Nearley Parser, but I thought it was worth spreading the word!

Blog author

Harald Rietman
