Welcome to Axon Framework 102, a series that dives deep into the interesting challenges you will encounter when working with Axon Framework. We will cover asynchronous projections and how to let the front end know about new data. We will take a good look at handling CRUD interfaces (which we cannot always avoid) and at other useful topics, such as testing for missing event handlers. This blog post is about dealing with personal data in your Event Store. Or even better: how NOT to.
About the series
I have been working with Axon Framework for two years now. I wrote my thesis about strangling a remaining monolith at the Port of Rotterdam Authority and I am currently doing exactly that with Axon Framework. We have seen interesting challenges, such as a great number of events, long replays and privacy regulations. I believe those challenges are relevant to everyone using Axon Framework, or maybe even everyone using Event Sourcing! This is why I’m sharing these challenges and possible solutions with you, so you can be inspired to build a solution that is just as good, or even better.
This blog series requires some basic experience with Axon Framework. If you have no experience with it yet, it is a very cool framework and you will probably enjoy learning and using it. So get your hands dirty and come back in a bit. This blog series will be waiting for you.
Privacy regulations are a pain
By now we all know about GDPR, right? It’s the EU privacy regulation that gives customers certain rights over their personal data. For instance, they have the right to retrieve all data related to them, or to have some or all of that data deleted.
This presents us with a dilemma. Let’s consider the following event to be in our event store:
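As an illustration, assume an event like the hypothetical OrderPlacedEvent below, which carries the customer's name next to the items bought. The class and field names are assumptions made for this sketch, not taken from a real project.

```java
import java.util.List;

// Hypothetical event for illustration only; all names are assumptions.
final class OrderPlacedEvent {
    private final String orderId;
    private final String customerName;       // personal data, stored in plain text
    private final List<String> itemsBought;  // business data we want to keep

    OrderPlacedEvent(String orderId, String customerName, List<String> itemsBought) {
        this.orderId = orderId;
        this.customerName = customerName;
        this.itemsBought = List.copyOf(itemsBought); // defensive, immutable copy
    }

    String getOrderId() { return orderId; }
    String getCustomerName() { return customerName; }
    List<String> getItemsBought() { return itemsBought; }
}
```

Once an event like this is serialized and appended, the name is part of the stored payload, right next to the data you need for stock tracking.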
Now C. Boyle calls our company. He wants his data removed, but our event store is immutable. We now have three options:
- Ignore the request, knowing that it might incur a fine by the authorities.
- Delete the event (and all possible other data) leaving a gap.
- Alter the event, masking the value.
Take a moment and consider all the options. The last two options are not even possible if you use a “true event store”, which is immutable by nature. In some cases, for example when the events are stored in a relational database, the events can be altered and deleted. But true event stores won’t let you do that. Deleting the event is probably not what you want anyway; there is other valuable data in there, such as the items bought. If you delete that, the entire transaction is removed from history, which causes serious trouble for your stock tracking.
This means that the only viable option, if it is possible at all, is to alter the event and change the customer’s name to a masked value such as ***. However, modifying the event store is not a good thing to do, since you are altering the past. Furthermore, the name is only erased from the event store; all projections still have the value in their database tables.
Luckily for us, we have found a better way to do this. Some people have gone with a crypto-shredding approach and Axon has a commercial data regulation library that takes care of it for you. Personally, I prefer a more extreme approach.
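For readers unfamiliar with crypto-shredding: the idea is to encrypt personal data with a key that belongs to one person, and to "delete" the data by throwing that key away. The sketch below uses only the JDK and is not how Axon's commercial library works internally; the class name CryptoShredder and the in-memory key map are assumptions for illustration.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: one AES key per data subject, kept outside the event store.
final class CryptoShredder {
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;

    private final Map<String, SecretKey> keysBySubject = new ConcurrentHashMap<>();
    private final SecureRandom random = new SecureRandom();

    // Encrypts the value with the subject's key; the result is what an event would store.
    String encrypt(String subjectId, String plaintext) {
        try {
            SecretKey key = keysBySubject.computeIfAbsent(subjectId, id -> newKey());
            byte[] iv = new byte[IV_BYTES];
            random.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ciphertext = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            // Prefix the IV so decrypt() can recover it.
            byte[] out = new byte[IV_BYTES + ciphertext.length];
            System.arraycopy(iv, 0, out, 0, IV_BYTES);
            System.arraycopy(ciphertext, 0, out, IV_BYTES, ciphertext.length);
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the plaintext, or empty if the subject's key has been shredded.
    Optional<String> decrypt(String subjectId, String encoded) {
        SecretKey key = keysBySubject.get(subjectId);
        if (key == null) {
            return Optional.empty();
        }
        try {
            byte[] in = Base64.getDecoder().decode(encoded);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            byte[] plaintext = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            return Optional.of(new String(plaintext, StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // "Deleting" the personal data means deleting the key.
    void forget(String subjectId) {
        keysBySubject.remove(subjectId);
    }

    private static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that the encrypted value still lives in the event store with this technique; only the ability to read it is removed.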
Ignorance is bliss
You can keep your event store in the dark about the personal data in your system, without losing access to it. We can achieve that with a cool Jackson feature: custom serializers. Let’s dive in.
When you publish an event from an aggregate, Axon stores it in the event store. Before the event is stored, it is first processed by a Serializer, which converts it to an appropriate format. The same Serializer is used to convert the event back from that representation when Axon reads it from the store. Events are not the only things processed this way: snapshots, commands (when using Axon Server), and stored metadata go through a Serializer as well.
Axon offers three serializer implementations: XStream, Jackson, and the Java serializer. XStream is the one enabled by default. You could also write one yourself (for example to serialize to YAML). However, simply because I like JSON more than I like XML, I have configured Axon to use Jackson by writing the following Spring Boot config.
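A minimal sketch of such a configuration, assuming Axon Framework 4 with its Spring Boot starter and an ObjectMapper bean provided by Spring Boot. The class name AxonSerializerConfig is an assumption; the demo repository may wire this differently.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import org.axonframework.serialization.Serializer;
import org.axonframework.serialization.json.JacksonSerializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Primary;

// Hypothetical configuration class; the name is an assumption.
@Configuration
public class AxonSerializerConfig {

    // Replaces the default XStream serializer with a Jackson-based one.
    // Reusing Spring's ObjectMapper means modules registered on it
    // (such as custom serializers) also apply to Axon.
    @Bean
    @Primary
    public Serializer serializer(ObjectMapper objectMapper) {
        return JacksonSerializer.builder()
                .objectMapper(objectMapper)
                .build();
    }
}
```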
Now Jackson is in charge! And Jackson has just the feature we need for our cause: custom serializers. You can write your own serializers so that certain Java classes are serialized the way you want them to be. By creating a PersonalData wrapper for a String, we can tell Jackson not to serialize the value of the String, but anything we want instead. You can see the effect of this in the following code.
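A minimal sketch of what such a wrapper could look like, using only the JDK. The getter name and the masked toString are assumptions for illustration; the Jackson serializer and deserializer are registered separately.

```java
import java.util.Objects;

// Sketch of a wrapper type that marks a String as personal data.
final class PersonalData {
    private final String value;

    PersonalData(String value) {
        this.value = value;
    }

    // The real value stays available to application code.
    String getValue() {
        return value;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof PersonalData
                && Objects.equals(value, ((PersonalData) other).value);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(value);
    }

    // Optional design choice: never leak the value through logging.
    @Override
    public String toString() {
        return "PersonalData(***)";
    }
}
```

Because the personal data now has its own type, a serializer can recognize it anywhere it appears, without per-field configuration.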
We can now instruct Jackson to serialize every PersonalData object differently, whether it appears in an event, in metadata, or anywhere else. In our serializer we look up the value in a database table, or write it there if it is new, and store the id in the JSON instead.
This way personal data never even enters the event store while we can still access the value whenever we want. It also allows us to delete or mask the personal data without altering the event store in any way. Let’s get the serializer to work:
As you can see, it is pretty simple. Whenever this serializer encounters a PersonalData object, it writes the value to a database using the ‘PersonalDataStore’, gets its id and writes that id in the JSON instead. We can use the same principle to reverse the process and access the data again. This is what our deserializer does:
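A minimal sketch of both directions, assuming Jackson 2.x on the classpath. The in-memory store stands in for the database table, and its method names (store, retrieve, erase) are assumptions, as are the helper class PersonalDataJson and the minimal PersonalData repeated here so the block stands alone. Unlike the approach described above, this sketch always writes a new row instead of looking up an existing one.

```java
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializerProvider;
import com.fasterxml.jackson.databind.deser.std.StdDeserializer;
import com.fasterxml.jackson.databind.module.SimpleModule;
import com.fasterxml.jackson.databind.ser.std.StdSerializer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

final class PersonalData {
    private final String value;
    PersonalData(String value) { this.value = value; }
    String getValue() { return value; }
}

// Hypothetical stand-in for the database table behind the PersonalDataStore.
final class InMemoryPersonalDataStore {
    private final Map<String, String> rows = new ConcurrentHashMap<>();

    String store(String value) {
        String id = UUID.randomUUID().toString();
        rows.put(id, value);
        return id;
    }
    String retrieve(String id) { return rows.get(id); } // null once erased
    void erase(String id) { rows.remove(id); }
}

// Writes the value to the store and puts only the generated id in the JSON.
final class PersonalDataSerializer extends StdSerializer<PersonalData> {
    private final InMemoryPersonalDataStore store;

    PersonalDataSerializer(InMemoryPersonalDataStore store) {
        super(PersonalData.class);
        this.store = store;
    }

    @Override
    public void serialize(PersonalData data, JsonGenerator gen, SerializerProvider provider)
            throws IOException {
        gen.writeString(store.store(data.getValue()));
    }
}

// Reads the id from the JSON and looks the value up in the store.
final class PersonalDataDeserializer extends StdDeserializer<PersonalData> {
    private final InMemoryPersonalDataStore store;

    PersonalDataDeserializer(InMemoryPersonalDataStore store) {
        super(PersonalData.class);
        this.store = store;
    }

    @Override
    public PersonalData deserialize(JsonParser parser, DeserializationContext context)
            throws IOException {
        return new PersonalData(store.retrieve(parser.getValueAsString()));
    }
}

final class PersonalDataJson {
    // Builds an ObjectMapper that knows how to handle PersonalData.
    static ObjectMapper mapperFor(InMemoryPersonalDataStore store) {
        SimpleModule module = new SimpleModule("personal-data");
        module.addSerializer(PersonalData.class, new PersonalDataSerializer(store));
        module.addDeserializer(PersonalData.class, new PersonalDataDeserializer(store));
        return new ObjectMapper().registerModule(module);
    }

    static String toJson(ObjectMapper mapper, PersonalData data) {
        try {
            return mapper.writeValueAsString(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static PersonalData fromJson(ObjectMapper mapper, String json) {
        try {
            return mapper.readValue(json, PersonalData.class);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With Axon, the same module would be registered on the ObjectMapper that is handed to the JacksonSerializer. In this sketch, a value that has been erased from the store comes back as a PersonalData holding null; returning a masked placeholder instead is an equally reasonable choice.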
In conclusion, this approach enables you to keep personal data out of your event store while still being able to see the data, delete the data or mask the data. The only thing you have to do is wrap it in a PersonalData class. Great, isn’t it?
Let’s take a look at the following aggregate. The aggregate keeps a user’s real name, wrapped in a PersonalData object. This allows access to the value while prohibiting it from being stored in the event store or in snapshots. In all other ways, it works the same as a String value would.
When we now create an account through the REST endpoint, the following event is published by the aggregate.
As you can see, there is no personal data present in the event. Of course, the personal data is still there, but it is stored in a database table, as you can see below.
This means we have succeeded in keeping the personal data out of our event store!
I cannot post all the code here, so I selected the important bits and pieces. You can find the full source code of the demo application here: https://github.com/Morlack/axon-102/tree/main/personal-data/src/main/java/com/insidion/axon102
Take a look and try it out for yourself!
All power comes at a price. Each time events containing personal data are read, the application has to consult a database table. That makes reading and writing events a little slower, but since the lookup is very fast, I think the impact is negligible unless you have huge amounts of personal data in your events. You can use Spring Boot caching with Caffeine to improve performance. We use it in our projects and I highly recommend it.
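To show the idea behind such a cache without pulling in Spring or Caffeine, the sketch below swaps them for a plain ConcurrentHashMap in front of the lookup. The class name CachingPersonalDataLookup and the Function-based delegate are assumptions for illustration; a real cache would also bound its size and expire entries.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical caching layer in front of the personal data table.
final class CachingPersonalDataLookup {
    private final Function<String, String> delegate; // e.g. a database query by id
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    CachingPersonalDataLookup(Function<String, String> delegate) {
        this.delegate = delegate;
    }

    // Hits the delegate only on a cache miss. Missing (null) values are not cached.
    String retrieve(String id) {
        return cache.computeIfAbsent(id, delegate);
    }

    // Must be called when personal data is erased or masked,
    // otherwise the cache keeps serving the old value.
    void evict(String id) {
        cache.remove(id);
    }
}
```

Whatever cache you use, remember to evict the entry when a customer asks for their data to be removed; a cached copy is still a copy.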
The serializer approach presented here only works out of the box for new events or for entirely new projects. Projects that already have personal data in their event store are at a disadvantage: you either have to get that data out first, or decide that only new data is written to the database table. Still, it is never too late to implement it, and you can write an upcaster so that existing events take advantage of it too!
You can use Jackson to your advantage to keep personal data out of your event store. This saves you the hassle of (illegally) editing your event store or deleting events, if that is even possible. The next Axon Framework 102 blog will focus on using metadata in your aggregates and projections in an efficient manner. Stay tuned!