Big Data – What to do with it? (Part 1 of 2)

21.7.2013 | 6 minutes of reading time

Analyze it, of course!

Uh, right… but how?

That’s what I’ll talk about in this first part of a two-part blog series on Big Data analytics. So, if you are interested in some ideas and answers, please bear with me 🙂

Data’s the new currency

We all know the hype: companies like Google or Facebook offer quality services “free of charge” and make billions of dollars in the process by leveraging the data their customers provide. This scared the established players and, as a result, gave rise to a whole new breed of data applications built around one problem: Big Data.

But, what exactly is Big Data?

Unfortunately, that’s one of those questions where you ask two people and get three different opinions. One definition that has become fairly widely accepted is the “3 Vs”:

– High volume (amount of data),
– High velocity (speed of data in and out),
– High variety (range of data types, sources and structures).

As a rule of thumb: if it becomes a problem to get your data into your RDBMS of choice, then most likely you’ve encountered Big Data.

In my opinion, the first V (volume) is overrated, at least in Germany: it may be called Big Data, but it’s the third V (variety) that truly distinguishes Big Data storage systems from traditional RDBMS.

Now, the good thing about Big Data, or more precisely about most of the systems that have sprung up around it, is

– that you can store the data as it comes along, regardless of its structure,
– that these systems scale virtually without limit, and
– that you therefore get the chance to discover new insights, trends and opportunities in your business to act upon.
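To make “store the data as it comes along” a little more concrete, here is a minimal Python sketch of the underlying “schema on read” idea. It assumes three hypothetical sources delivering the same kind of fact (a purchase) as CSV, JSON and a log line; the raw records are kept untouched and only interpreted when they are queried. The formats, field names and the `parse` function are invented for illustration and don’t correspond to any particular Big Data product.

```python
import csv
import io
import json

# Hypothetical raw inputs: the same kind of fact ("a customer bought something")
# arriving from three sources in three different formats, stored as-is.
raw_store = [
    ("csv", "alice,2013-07-21,19.98"),
    ("json", '{"customer": "bob", "date": "2013-07-21", "total": 5.0, "coupon": "SUMMER"}'),
    ("log", "2013-07-21 INFO purchase customer=carol total=42.50"),
]


def parse(fmt, payload):
    """Schema on read: interpret a raw record only when it is queried."""
    if fmt == "csv":
        customer, date, total = next(csv.reader(io.StringIO(payload)))
        return {"customer": customer, "date": date, "total": float(total)}
    if fmt == "json":
        return json.loads(payload)  # keeps any extra fields, e.g. "coupon"
    if fmt == "log":
        date, _level, _event, *pairs = payload.split()
        fields = dict(pair.split("=", 1) for pair in pairs)
        return {"customer": fields["customer"], "date": date, "total": float(fields["total"])}
    raise ValueError(f"unknown format: {fmt}")


records = [parse(fmt, payload) for fmt, payload in raw_store]
print(sum(record["total"] for record in records))
```

An RDBMS would make you settle on one table layout before loading anything; here the extra `coupon` field of the JSON record simply comes along, and the cost of dealing with the variety moves to query time.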

Of course, at the end of the day, “act upon” means “earn money”.

Data is useless unless combined with other data to create information

That is a simple fact, and it is also the point where things get bad: not only do we face (possibly huge amounts of) semi-structured data, but querying and interconnecting Big Data is also much harder than querying a traditional RDBMS with SQL. As of today, most Big Data systems use their own mechanism and/or language to query the data (maybe in the future there will be something like SQL for Big Data, or traditional RDBMS will catch up). This means you’ll need at least one expert in the particular system at hand to query the data efficiently.
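For comparison, this is roughly what “interconnecting” data looks like for an analyst working with an RDBMS. The sketch uses Python’s built-in `sqlite3` module as a stand-in database; the `customers`/`orders` tables and their values are hypothetical. Joining two tables and aggregating the result takes a single declarative SQL statement that most data analysts can write themselves.

```python
import sqlite3

# In-memory database with two small, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'alice', 'north'), (2, 'bob', 'south'), (3, 'carol', 'north');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 2, 5.0), (3, 3, 42.5), (4, 1, 10.0);
""")

# One declarative statement joins both tables and aggregates the result.
rows = conn.execute("""
    SELECT c.region, SUM(o.total) AS revenue
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    GROUP BY c.region
    ORDER BY revenue DESC
""").fetchall()

for region, revenue in rows:
    print(region, revenue)
conn.close()
```

The analyst states *what* is wanted; how the join and the grouping are executed is left to the database.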

Analyzing the data to extract the required information to decide on the right action is the real challenge

And this is where it gets ugly: as of today, support for data analysts who want to analyze the data held in Big Data stores efficiently is far from optimal. Usually, it looks like this:

The data analyst and the expert work together, with the former asking the questions and the latter doing his best to answer them by querying the system. This is easier said than done: depending on the complexity of the question, it may take quite a while before an answer can be provided, with multiple map-reduce jobs required in the process. And that’s not all: the data analyst usually needs to present his findings in a way that business people can understand (i.e. reports), which is a lot easier with the right tools.
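To illustrate why a single analyst question can turn into several map-reduce jobs, here is a toy sketch in plain Python. The `map_reduce` helper is a hypothetical in-memory stand-in for a real framework such as Hadoop MapReduce (no distribution, no fault tolerance), and the order documents and the `THRESHOLD` value are invented. Answering “how many big spenders are there per region?” takes two chained jobs: the first aggregates per customer, the second consumes that output and counts per region.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Toy stand-in for one map-reduce job: map, group ("shuffle") by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical order documents; each one carries the customer's region.
orders = [
    {"customer": "alice", "region": "north", "total": 20.0},
    {"customer": "bob", "region": "south", "total": 5.0},
    {"customer": "carol", "region": "north", "total": 42.5},
    {"customer": "alice", "region": "north", "total": 10.0},
    {"customer": "dave", "region": "south", "total": 60.0},
]

# Job 1: total spend per (customer, region).
spend = map_reduce(
    orders,
    mapper=lambda o: [((o["customer"], o["region"]), o["total"])],
    reducer=lambda key, totals: sum(totals),
)

# Job 2: feed job 1's output into a second job that counts big spenders per region.
THRESHOLD = 25.0  # hypothetical cut-off for a "big spender"
big_spenders = map_reduce(
    spend.items(),
    mapper=lambda kv: [(kv[0][1], 1)] if kv[1] > THRESHOLD else [],
    reducer=lambda region, ones: len(ones),
)
print(big_spenders)
```

In SQL, the same question fits into one statement with a subquery. On a real cluster, each chained job typically writes its output to storage for the next job to read, which is one reason why such answers take a while.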

As we all know, time is money, and the time required to analyze Big Data and extract useful information from it can grow so large that Big Data ends up overwhelming an organization.

I’m not saying that you won’t face the same problems with traditional RDBMS. I’m saying that with those systems

– data analysts can mostly work without additional experts, because most of them know SQL, and
– plenty of tools exist to assist them in their work,

which makes it a lot easier, and therefore much faster, for them to extract useful information from the data and present it.

So, now what? Stick with traditional RDBMS whenever possible and only try Big Data if you really, really know what you are doing and what you are getting yourself into?

Well… it’s always best if you know what you are doing, isn’t it?

Enter: JasperReports

Huh? Where did that come from?

First of all: the heading might just as well have been “Enter: Pentaho” or even “Enter: BIRT”. JasperReports is not the only tool suite providing features for Big Data analysis; it’s simply the one I know best.

The point is: there really are tools out there that assist data analysts in working with Big Data, but that fact doesn’t seem to be widely known, even among people trying to sell Big Data.

For all of you who don’t know: JasperReports has been around for quite a while (since 2001) and is, at its heart, an open source Java reporting library. Over the years, a whole ecosystem of tools has grown up around this core, so that today JasperReports offers a complete BI suite.

Now, how does JasperReports aid data analysts in working with Big Data?

In many ways: some are part of the Community Edition (open source), others are only available in the professional editions and have to be paid for.

At the core is still the open source reporting library, which offers connectors to many Big Data stores such as MongoDB, Hadoop (Hive and HBase) and Google BigQuery. The main advantage is that it is much easier to find someone who can create a useful JasperReports report than someone who can extract useful information from those stores directly. The second advantage is iReport, the open source graphical report designer, which helps with creating reports and provides additional features for Big Data, such as auto-discovery of available fields.
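Roughly speaking, “auto-discovery of available fields” has to scan schema-less documents, flatten nested structures into field paths and note which types occur. The sketch below is a hypothetical Python illustration of that idea, not iReport’s actual implementation; `discover_fields`, its `sample_size` parameter and the sample documents are invented for this example.

```python
from collections import defaultdict
from itertools import islice


def discover_fields(documents, sample_size=100):
    """Guess which fields exist by scanning (a sample of) schema-less documents.

    Nested objects are flattened into dotted paths such as "address.city";
    lists are not looked into and are reported as type "list". The result maps
    each path to the set of Python type names seen for it.
    """
    fields = defaultdict(set)

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        else:
            fields[path].add(type(value).__name__)

    for document in islice(documents, sample_size):
        walk(document, "")
    return dict(fields)


# Hypothetical sample documents, e.g. as returned by a document store.
docs = [
    {"name": "alice", "age": 34, "address": {"city": "Berlin"}},
    {"name": "bob", "age": "unknown", "email": "bob@example.com"},
]

for path, types in sorted(discover_fields(docs).items()):
    print(path, sorted(types))
```

Note that `age` comes back with two types and `email` exists in only one document: exactly the kind of variety a report designer has to know about. Sampling keeps the scan cheap, at the risk of missing rare fields.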

There is one caveat: you still need to know the query functionality of the storage system at hand to set up the connection to your data and to keep that connection effective (so you should probably keep the expert). Once that is done, however, the report designer (aka data analyst) can work with the data as he is used to. On this blog, I provide a JasperReports tutorial connecting to a MongoDB instance that demonstrates how this works.

However, creating the report is still a lot of work (if you took a peek at the tutorial, you’d agree), which makes one wonder whether there isn’t an easier way to work with the data.
Well, there is!

The JasperReports ecosystem features a server component which, at the Community Edition level, functions as a central repository for storing reports and provides a GUI and APIs for executing and scheduling them. At the professional edition level, this component also provides (among other things) an ad-hoc reporting GUI that allows data analysts to explore the data on the fly, literally “playing around” with it.

I find this feature extremely powerful, especially considering the nature of Big Data: data analysts don’t have to know up front how the data in storage is structured; they can explore the data as it is made available by the initial connection query (remember the caveat above!). I will explore this in more detail in the second part of this blog series.
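The appeal of ad-hoc reporting is that the grouping and the measure are picked while exploring, not fixed up front. As a very rough, hypothetical sketch of that interaction (not how the server component implements it), the `explore` helper below lets the caller choose any field at run time and keeps documents that lack the field visible instead of silently dropping them. The function and the sample data are invented for illustration.

```python
from collections import defaultdict


def explore(documents, group_by, measure=None):
    """Hypothetical ad-hoc aggregation over schema-less documents.

    Groups by any field chosen at run time. Documents lacking that field land
    in a "(missing)" bucket instead of being dropped. Without a measure,
    documents are counted; with one, numeric values of that field are summed
    (missing or non-numeric values are skipped).
    """
    result = defaultdict(int if measure is None else float)
    for document in documents:
        key = document.get(group_by, "(missing)")
        if measure is None:
            result[key] += 1
        else:
            value = document.get(measure)
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                result[key] += value
    return dict(result)


# Hypothetical documents with an optional "channel" field.
orders = [
    {"region": "north", "channel": "web", "total": 20.0},
    {"region": "south", "channel": "store", "total": 5.0},
    {"region": "north", "total": 42.5},
]

print(explore(orders, "channel"))
print(explore(orders, "region", measure="total"))
```

Switching from “count by channel” to “revenue by region” is just a different pair of arguments, which is the kind of quick back-and-forth an ad-hoc GUI offers without writing a new query each time.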

In closing, I would like to repeat that, in my opinion, the discussion around Big Data focuses too much on the “huge amount/high velocity of data” part when it should focus more on the “high variety of data” part. Variety offers greater possibilities, in my opinion much greater than volume or velocity by themselves, but it also bears greater risks because useful information is harder to extract.

OK, on to part two.

Blog author

Jan Malcomess
