About 2 years ago, I introduced you to the architecture of CenterDevice, and it is now time for an update.
A quick refresher for those who do not want to read that, now outdated, article:
CenterDevice is a startup by codecentric which provides document storage in the cloud. It really shines for documents like invoices, orders, project documentation, presentations etc., where the powerful search engine finds what you are looking for without the need for any manually maintained structures. It provides plenty of means to share documents within or outside your organisation. All documents are encrypted and stored in Germany (if that matters to you).
TL;DR: In November 2014 we released version 2 of our API, relaunched all clients and moved our datacenter (virtually and physically). Some tech changed, some stayed the same. Better continue reading 🙂
CenterDevice offers many different clients to its users. With the relaunch we finalized the migration to a new look and feel and introduced a new, frequently requested concept called “collection”. Before “collections” we assumed that organizing people and documents in a “group” would be sufficient, but it turned out that those are two different things. Now administrators can organize people in “groups”, and everybody can put documents into “collections”.
Along with a unified look across platforms, we added powerful PDF viewing functionality to the Android app. The older version used preview images for each page, but now the app just downloads the PDF to display it. This brings increased performance as well as added functionality.
When we started implementing mobile apps 3 years ago, we decided to go with native apps rather than using a cross-platform framework like PhoneGap or Apache Cordova, or even HTML5 apps. At least at that time it was not clear how well features like certificate pinning, local storage, camera access and preview integration for different file formats would work. We stuck to the plan and still have no intention to rewrite the apps in any non-native way. Getting the app into the Play Store was never a problem. The shared usage of library code between Android, desktop and web apps is a plus. The downside of Android development is still the slow emulator, but it is slightly compensated by the easier distribution of test builds. The PDF viewing technology used is the commercial Qoppa PDF viewer.
The iOS app now supports phones and tablets with a common look and feel. Feature-wise, the most important addition is the sharing functionality.
Apple gave us a lot of headaches with the release of our relaunch app, which was in development for a year. We planned with plenty of headroom for the approval process, but it took much longer. The main problem was that this is now a new “app”: it has a new AppID because it is a universal app rather than an iPad-only app. This caused the big review process to kick in, which applied new, seemingly arbitrary checks to things that had been ok two years ago. In the end we needed an expedited approval to be “only one week late”, and that was after 4 weeks of review. Due to the slow adoption of iOS 8, this app supports iOS 7 and 8 and does not yet use any iOS 8 specifics. Distribution of test builds is still a mess, sorry Apple. For viewing PDFs we are experimenting with the open source vfr/Reader as an alternative to the commercial PSPDFKit.
A new member of our client family is the desktop application. It was frequently requested; it seems that working with files is still a desktop and offline thing. The application keeps local copies of your documents in sync with what is on the server, so you always have access to documents even when you are offline. It will soon support monitoring certain local folders for automatic file upload. You can get it at www.centerdevice.de/download.
The Desktop Client is a JavaFX 8 application which comes with a bundled JRE. After trying various installer solutions we settled on a custom mechanism that allows separate updates of the JRE and the application code. JavaFX 8 is finally a usable platform, and creating the UI with it was very easy. We used a minimalistic, slightly adapted version of Adam Bien's “afterburner.fx” and Google Guice for dependency injection. Some of the data queried from the server is stored in a local Derby database, while the downloaded documents reside as files in a directory. For now we decided to hide that directory and discourage manual modifications, because there are many tricky edge cases involved when the Desktop Client is not aware that you are about to modify a file.
The web app still has the biggest feature set of all apps. Some of the more administrative workflows are only available here. The left hand side navigation now hosts collections, while groups and users are on the right hand side. There are a few view modes for you to choose from, and you can resize the layout to your liking.
Being a complex web application, it is unfortunately also the slowest of our apps, especially in Internet Explorer. If you really need IE, I feel sorry for you. We use Vaadin 7.3 and a customized Valo Sass theme. After multiple years of debugging and hotfixing, we have finally given up on using Vaadin push. If it works for you, you are lucky. It did not work for us, with all the potential network proxies and browsers our end customers use. We now poll every 5 seconds, which is “good enough” for us. If we need to poll faster, we switch the interval dynamically. That is why we are looking forward to Vaadin 7.4, where polling no longer causes layout phases. Still, I think Vaadin is a good choice for the type of application we have here: it allows a very easy integration into a Java stack, and using Node or Angular would require more work on that end. However, abstractions come at a cost, and debugging Vaadin might not be your cup of tea 🙂 For viewing all types of PDF files, we incorporate the open source mozilla/pdf.js viewer.
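To make the idea of switching the poll interval dynamically a bit more concrete, here is a minimal sketch in plain Java. The class name `PollIntervalPolicy`, the 1 second "fast" interval and the notion of a time window for fast polling are assumptions for illustration, not our actual code. In a Vaadin 7 application the value it returns would be handed to `UI.setPollInterval(int)`.

```java
/**
 * Hypothetical helper (not the CenterDevice implementation): decides how often
 * the browser should poll, depending on whether updates are expected soon.
 */
final class PollIntervalPolicy {
    static final int DEFAULT_INTERVAL_MS = 5_000; // the 5 second polling mentioned in the text
    static final int FAST_INTERVAL_MS = 1_000;    // assumed value for "faster"

    // Point in time (epoch millis) until which fast polling stays active.
    private long fastUntilMillis = 0;

    /** Ask for fast polling during the next durationMillis, e.g. after an upload. */
    void expectUpdatesFor(long nowMillis, long durationMillis) {
        fastUntilMillis = Math.max(fastUntilMillis, nowMillis + durationMillis);
    }

    /** Interval to use right now; falls back to the default once the window has passed. */
    int intervalAt(long nowMillis) {
        return nowMillis < fastUntilMillis ? FAST_INTERVAL_MS : DEFAULT_INTERVAL_MS;
    }
}
```

Passing the current time in as a parameter keeps the policy free of Vaadin and clock dependencies, so it can be unit tested without a running UI.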
And there are a few third party clients already using the CenterDevice API. Unfortunately there is none which I can talk about, but if you are interested, we have published our API, so you could get started developing a custom extension:
The API lives at https://api.centerdevice.de/v2 but without valid auth tokens you will not get far 🙂
It is still implemented using Jersey. Versioning is implemented using a master class for each version which knows all valid resources. This pattern allows us to either reuse the same resource classes for different API versions, or to customize them by composition or inheritance. It's pretty flexible, but it is also difficult to judge which pattern to apply for a given difference between versions. Being backwards compatible is a great challenge everybody should go through.
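The "master class per version" pattern can be sketched in plain Java as follows. All class names here are hypothetical and the resources are empty placeholders; in a Jersey application the master class would typically be a JAX-RS `Application` subclass whose `getClasses()` returns the same kind of set. The sketch only shows the three cases mentioned above: a resource reused unchanged, a resource customized by inheritance, and a resource that exists only in the newer version.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical resource classes; in Jersey they would carry @Path annotations.
class UserResource { }                    // identical in both versions, so it is reused

class DocumentResource {
    String describe() { return "v1 document representation"; }
}

class DocumentResourceV2 extends DocumentResource {   // v2 customizes v1 by inheritance
    @Override
    String describe() { return "v2 document representation"; }
}

class CollectionResource { }              // only exists in v2

/** The "master class" of one API version: the single place that knows its valid resources. */
abstract class ApiVersion {
    abstract String pathPrefix();
    abstract Set<Class<?>> resources();

    static Set<Class<?>> setOf(Class<?>... classes) {
        Set<Class<?>> set = new LinkedHashSet<>();
        Collections.addAll(set, classes);
        return Collections.unmodifiableSet(set);
    }
}

class ApiV1 extends ApiVersion {
    @Override String pathPrefix() { return "/v1"; }
    @Override Set<Class<?>> resources() {
        return setOf(UserResource.class, DocumentResource.class);
    }
}

class ApiV2 extends ApiVersion {
    @Override String pathPrefix() { return "/v2"; }
    @Override Set<Class<?>> resources() {
        return setOf(UserResource.class, DocumentResourceV2.class, CollectionResource.class);
    }
}
```

Because each version lists its resources explicitly, removing or replacing a resource in a newer version cannot accidentally change what an older version exposes.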
If you compare this picture to the last published architecture, you see a few changes, but nothing major. We still have the separation between Web Servers (called tomcat-centerdevice in the picture) and REST Servers (named tomcat-rest). The Web Servers host the server side of the Vaadin applications, as well as a few other pages and admin interfaces. The main difference from the REST Servers is that they maintain state and require session stickiness. There are plans to put the sessions into memcached, but so far this has not been a priority.
The REST Servers serve our REST API. All our clients use the above linked public API, with only a few exceptions for private management functionality, which uses a private REST API. As you can see in the picture, there is no direct access to any data store from the frontends, which increases security and allows us to scale better.
There are 3 data sources for the REST Servers:
- Elasticsearch for all kinds of search related queries.
- MongoDB for all metadata and user data.
- Ceph as the storage for all documents and various previews.
Elasticsearch replaced Apache Solr. Elasticsearch is fast and very easy to maintain. It finds its cluster members automatically, and even if it doesn’t, a simple restart solves most of the issues. We had some problems when cluster members died, but it never affected production and was straightforward to resolve. Another nice thing about Elasticsearch is that it allows many index related operations on the fly, like changing the schema. Christian has written a great blog post describing our index handling strategy. We have a few more blog posts about Elasticsearch in case you are curious.
MongoDB is still going strong, but when we moved our cluster (more below) we noticed again that it was not built with administration in mind 🙁 The schema free data storage is great, but, for example, taking and restoring a backup takes days (!) when authentication is enabled. Perhaps we were the only ones on the planet to run with mongo auth. Who knows. Besides that, my colleagues documented a lot of best practices in other blog posts.
Ceph is our replacement for Gluster. It is a distributed object store designed to hold binary artefacts. You could use it as a file system, however that is not recommended. We access it through a Swift compatible API using RadosGW. Whatever is stored in Ceph (mainly your original documents and preview images/PDFs of them) is encrypted using ChaCha20 with 256-bit keys. ChaCha20 is faster than AES if no hardware acceleration is used, and it is an open, publicly analyzed cipher, which is why we preferred it over AES. Ceph performs really well, as you can see in Lukas' benchmarking post. It is really surprising to find out that networked storage is actually much faster than local discs. However, Ceph is quite resource intensive during cleanup, maintenance or failover, so even when it looks like it only needs discs, it actually requires a fair amount of CPU and is best placed on machines dedicated to “being the file system”.
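For readers who have not used ChaCha20 yet, here is a minimal sketch of encrypting a byte array with a 256-bit key. It uses the `ChaCha20` cipher that ships with the JDK since Java 11, which is an assumption made for this example and not necessarily the library or mode we run in production. The class and method names are hypothetical. Note that plain ChaCha20 provides no integrity protection, and that a nonce must never be reused with the same key.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.ChaCha20ParameterSpec;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

/** Hypothetical helper showing ChaCha20 with a 256-bit key via the JDK (Java 11+). */
final class ChaCha20Sketch {
    private static final int NONCE_BYTES = 12; // nonce size required by the JDK cipher
    private static final int INITIAL_COUNTER = 1;

    /** Generates a fresh random 256-bit key. */
    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("ChaCha20");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("ChaCha20 not available in this JRE", e);
        }
    }

    /** Generates a random nonce; it has to be stored next to the ciphertext. */
    static byte[] newNonce() {
        byte[] nonce = new byte[NONCE_BYTES];
        new SecureRandom().nextBytes(nonce);
        return nonce;
    }

    static byte[] encrypt(SecretKey key, byte[] nonce, byte[] plaintext) {
        return run(Cipher.ENCRYPT_MODE, key, nonce, plaintext);
    }

    static byte[] decrypt(SecretKey key, byte[] nonce, byte[] ciphertext) {
        return run(Cipher.DECRYPT_MODE, key, nonce, ciphertext);
    }

    private static byte[] run(int mode, SecretKey key, byte[] nonce, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("ChaCha20");
            cipher.init(mode, key, new ChaCha20ParameterSpec(nonce, INITIAL_COUNTER));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("ChaCha20 operation failed", e);
        }
    }
}
```

Since ChaCha20 is a stream cipher, the ciphertext has exactly the same length as the plaintext, which is convenient when storing large binary objects.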
Whenever a new document is uploaded, the REST Server sends the Document Server a message to start processing the document. The actual tasks executed on a document depend very much on its mimetype. The most important tools involved are:
- Apache Tika for text extraction.
- Tesseract OCR if Tika was unable to find text.
- LibreOffice to create PDFs out of document formats.
- ffmpeg to convert various video formats.
- Imagemagick + Ghostscript to create preview images out of almost anything.
All of those tools work really well, but they are tricky to set up, and it is hard to avoid regressions when some magic command line flags change. Sometimes the queue to the Document Server fills up a bit (it is RabbitMQ underneath), so we implemented a mechanism which prefers processing requests from other users over requests from the same user over and over again, so everybody gets a fair share of processing power.
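One way to get this kind of fairness is to keep one FIFO queue per user and serve the users round robin. The following in-memory sketch illustrates that idea; it is an assumption about how such a mechanism could look, not a description of our RabbitMQ based implementation, and the class name `FairTaskQueue` is made up for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical round-robin queue: a user with many pending tasks cannot starve the others. */
final class FairTaskQueue {
    // user id -> pending tasks of that user; the map order is the order in which users are served
    private final LinkedHashMap<String, Deque<String>> pendingByUser = new LinkedHashMap<>();

    /** Adds a task for the given user; new users are appended to the end of the rotation. */
    void submit(String userId, String task) {
        pendingByUser.computeIfAbsent(userId, id -> new ArrayDeque<>()).addLast(task);
    }

    /** Returns the next task to process, or null if nothing is pending. */
    String next() {
        Iterator<Map.Entry<String, Deque<String>>> users = pendingByUser.entrySet().iterator();
        if (!users.hasNext()) {
            return null;
        }
        Map.Entry<String, Deque<String>> first = users.next();
        String userId = first.getKey();
        Deque<String> tasks = first.getValue();
        users.remove(); // take the user out of the front of the rotation

        String task = tasks.pollFirst();
        if (!tasks.isEmpty()) {
            pendingByUser.put(userId, tasks); // re-append: this user waits until the others had a turn
        }
        return task;
    }
}
```

With three tasks submitted by one user followed by a single task from a second user, the second user's task is processed second instead of last.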
A new piece in the infrastructure is the Import Server. Users can add a Dropbox OAuth token via the web interface (that is why we talk to Dropbox from there), and the Import Server will upload selected documents asynchronously. The Import Server is designed to work with any third party data provider. We have prototypes for Google Drive and Instagram, but they are not production-ready yet.
Another part not visible in the picture is the e-mail servers, which handle incoming mail uploads. You can generate a mail upload alias in the Web UI, to which you can mail attachments. These attachments get extracted and uploaded to the REST Server from the mail servers. The same mail servers are also responsible for sending out notification/subscription e-mails.
We have also moved to a completely virtualized infrastructure. But of course a virtualized infrastructure needs to sit on physical infrastructure. For that we have a mostly active-active HA setup for all networking and management hardware:
- Firewalls: 2x Dell Sonicwall NSA 3600.
- Switches: 4x Dell Networking N2024, 1x Dell Power Connect 5524.
- Management Server: 2x Dell PowerEdge R420.
- Worker Server: 7x Dell PowerEdge R510, 24 CPU, 128 GB RAM, 12x 4 TB HDD, 6x 1 GBit Networking.
On top of that we run OpenStack as virtualization platform:
Right now we run 4 “all in one machines”, each of which comes with 2 Tomcats, Import and Document Server, MongoDB and Elasticsearch. Everything is set up using Ansible, which is comparable to Chef or Puppet, but with fewer abstraction layers, so it stays closer to the shell commands operations people know. We like that simplicity a lot. (Colleagues have written more blog posts about Ansible.) This is our “old” setup, which we plan to separate out into virtual machines in the next step. Besides that there are servers for e-mail, AppDynamics monitoring and an admin gateway. You can find that “4” mentioned a few times in the AppDynamics screenshot above.
Two HAProxy load balancers terminate SSL traffic and balance the internal and external traffic onto the worker nodes. HAProxy is powerful and offers plenty of configuration options, for example easy rate limiting, as described by my colleague Daniel.
We take pride in running an A+ rated SSL setup.
All our apps use certificate pinning. They only work if they receive the certificate our server should serve. This makes man in the middle attacks much harder: even if attackers manage to obtain a certificate that the device would otherwise trust, it would not be identical to the ones baked into the apps, so the connection is refused. As long as the pinned certificates and their private keys are not compromised, apps using certificate pinning can be confident that they are talking to the intended server.
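The core of certificate pinning is a simple comparison: the fingerprint of the certificate presented by the server has to match one of the fingerprints shipped with the app. The sketch below shows just that check in plain Java. The class name `CertificatePins` and the choice of SHA-256 over the DER encoded certificate are assumptions for illustration. In a real client the bytes would come from `X509Certificate.getEncoded()` inside a custom `X509TrustManager`, in addition to the normal chain validation.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical pin check: accepts only certificates whose SHA-256 fingerprint is baked in. */
final class CertificatePins {
    private final Set<String> pinnedSha256Hex;

    CertificatePins(Set<String> pinnedSha256Hex) {
        this.pinnedSha256Hex = new HashSet<>(pinnedSha256Hex);
    }

    /** Lower-case hex SHA-256 fingerprint of a DER encoded certificate. */
    static String sha256Hex(byte[] derEncodedCertificate) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(derEncodedCertificate);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    /** True only if the presented certificate matches one of the pinned fingerprints. */
    boolean isPinned(byte[] derEncodedCertificate) {
        return pinnedSha256Hex.contains(sha256Hex(derEncodedCertificate));
    }
}
```

Shipping more than one pin (for example the current and the next certificate) avoids locking out installed apps when the server certificate is rotated.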
The next step will be to containerize components like the mail server and the document server, so that we can scale them even more easily. While it looks like that “4” is a hardcoded number in many places, it actually is not. For example, starting a new document server would just work due to the way it communicates via RabbitMQ. Similarly, a new Elasticsearch node would just work. Our local development environments already run Docker, so hopefully this is an easy step (TM). New hardware is already available, as seen in the pictures above, and is currently being provisioned.