Anchor modeling offers agile database design, immutable data storage, and enables temporal queries using regular relational database. This catchy excerpt certainly spiked my interest two years ago at Data Modeling Zone conference in Hamburg.
I enjoy making discipline crossovers and in this article I would like to discuss the concept of Anchor modeling related to software development. Anchor modeling has become a trendy subject in the field of business intelligence (BI). However, not everybody will be familiar with what’s buzzing in BI land so I will provide a background into Anchor modeling and then discuss its potential merits for developers.
A lot of companies now favour an iterative approach over a big up-front design approach. Although it might not feel like that on every day. In general I would say we are certainly improving our capability to change the things we built in the past and adapt our plan for what we will built in the future. However the big improvements have generally been at the source code level of our software and in recent years at the deployment of our software. While refactorability of our code might have improved a great deal, I feel industry practices regarding incremental database design changes are not up to par in most development teams. This is exactly something that Anchor Modeling promises to address.
A word of warning. To keep this article readable, I might have stereotyped software development in this article.
Where did Anchor Modeling originate?
Anchor modeling originated in the field of data warehouses ten years ago and was formally presented in academia in 2009. The basic premise of Anchor Modeling is that it offers non-destructive model iterations while avoiding null values and eliminating redundancies. Anchor modeling is an extension of normalisation practices where the result is mostly in the 6th normal form. But don’t be alarmed or put off. A major part of this design approach is that Anchor modeling gives you the means to make this design manageable.
In recent years, these practices have found their way from academia to especially the business intelligence discipline. I am particularly interested how I, as a developer, can learn from these advances and where I can apply them in practice.
Anchor modeling basics
Anchor modeling has four basic building blocks: anchors, attributes, knots and ties. All four of these building blocks are implemented using their own table. The basic concept of any anchor model are the anchors. The concepts of Anchor models are explained below.
- An anchor holds the identity of an entity using a generated surrogate key rather than its natural key. An example could be “Tenant” and “House”.
- A tie defines the relation between usually two – but potentially more – entities (thus anchors). An example would be a relation from Tenant to House that defines the ownership
- An attribute – representing some piece of information – holds the attribute value and is related to an anchor. Attributes are strongly typed. An example could be “Rent”.
- A knot defines the possible limited states of a tie or attribute. So effectively it provides context by defining for example a limited set of genders. It provides context on for example a tie like active (yes/no).
Let’s visualise this using the free online modeling tool on www.anchormodeling.com.
House and Tenant are anchors here connected using a tie. AmountOfRooms, Rent and PhoneNumber are attributes connected using knots. Exported to PostgresSQL this simple model will provide us six tables, twelve views to process this information, and 21 stored procedures. This might look complex, but the tables are our main focus, and you could easily follow Anchor Modeling principles without the online tool and sql generation. When you start out, it’s just an easy quick start. You will get highly optimised views to query information out of the box.
Why Anchor modeling in online transaction processing (OLTP) software?
As a developer you first question at this point could be: why blog about a relational database technique in 2017? Well, it may not be the most popular topic to write about in the development scene, but it is still the most popular storage technology. On the topic of agility, many NoSQL solutions offer relaxed schemas as a way of dealing with changes in requirements, which is nice of course. But NoSQL technology has its own downside – which is really product specific (and beyond the scope of this article). I also personally think that the richness of an SQL interface to your data storage is as yet unmatched in NoSQL technology and is still highly appreciated. Moreover, initiatives like the development of – the highly potent – Cockroach DB seem to support that idea and could potentially unleash a new wave of RDBMS adoption. So it seems that keeping some of your eggs in the relational basket and some of them in the latest-practices-basket will go a long way.
As a developer we often only deal with databases as ‘the system of record’ in our systems that support a business function. The storage behind our applications that we store into and retrieve from is often aimed at producing information about business operations. Information is mostly stored in a state and structure that is very closely related to the software that uses it. So having a highly normalised database might be a scary thought for a developer. In university you are probably taught the basic principles of database design and implementation. If you are like me, you probably haven’t encountered the same amount of academic purity in the databases you use in your working life. A common – potentially outdated – understanding is that highly normalised databases have many practical, but also performance related downsides on the operational front. Another problem is that as a programmer, we have to deal with the structure of information in our storage, and this tends to reflect on the representation of our models in our code. One could say that in many cases the design of the database leaks through. Of course this also happens the other way around. A customer object that consists of ten fields could end up in a database in a single table with ten columns. We are used to create an abstraction layer in our software to protect the business side of our application from changes in the database and vice versa. But I would argue protection is not good enough.
This is the case I am specifically interested in. I would like to discuss how Anchor modeling can aid an agile team in properly designing, building, and maintaining a relational storage layer. Secondly my goal is to show that we can do proper design iterations in an agile team. If storage is easier to modify people are less inclined to take a one-time right approach.
Agility and temporality
One of the defining features of anchor modeling is the capacity for non-destructive schema evolution. In other words, the characteristics of the storage can differ over time without invasive redesigns and large migrations. This is certainly desirable if you are designing and maintaining a huge data lake and have to satisfy business wishes on a weekly basis, like in the position of business intelligence analyst. So let’s see how these rules apply to business software development.
Anchor modeling has been designed with temporality in mind. Attributes and ties can both be of a temporal nature. This is optional. Ties and attributes can be meta-dated with timestamps to signal the lifespan of relations. I will show an example later on.
Let’s enumerate the more invasive database changes that we think about on a daily level, where we are used to deal with databases that are designed foremost to remove redundancy and are often in 3rd normal form.
|Conventional database||Anchor modeling|
|A new column||This changes the structure of a table. An addition of a column is still moderately easy. We alter the existing table. Sometimes you can keep your application alive, but it is most likely unable to query the table for some time. Rollbacks are possible but require some effort.||Using anchor modeling you add an attribute table and, if required, a knot table to your database. Existing tables are left untouched, so most RDBMS will stay up and running.|
|Removal of a column||This changes the structure of a table. A column removal is difficult. The table is locked during the transaction. The application needs migration to be able to handle the new table design. Rollbacks are not possible or require a lot of preparation.||Using anchor modeling removing the attribute of an entity is substantially easier. By default everything is designed to be immutable. So the removal of a relation between an entity and an attribute will actually not cause an update in the database schema. Inserts to the knot table is enough to couple or decouple a relation. There is also the option to use the temporal aspect of an anchor model design.|
|Removal of a table||This changes the structure of a schema. A table removal is difficult. The table is locked during the transaction. Related tables often as well. It is destructive in nature. The application needs migration. Rollbacks are not possible or require a lot of preparation.||Using anchor modeling the removal of an entity is substantially more easy. By default everything is designed to be immutable. So the removal of a table will actually not cause an update in the database schema. The relations can simply be invalidated.|
In the example above the rent and telephone number are marked as temporal (indicated by a dot in the design), as well as the relation between a tenant and a house. This means that when things change, the current reality can be replaced by a new one. By leveraging the power of SQL select statements, combined with parameterized views one can even travel in time to see what the state of our database looked liked on Christmas 2015! And thanks to the dispersed character of the data, this is agnostic of both the structure or the content of your tables.
So, enough with all those nice features. Let’s get some drawbacks on the table. Well, for starters there’s of course the mental effort it takes to fill your head with the number of tables used by a 6th normal form database. I think one of the biggest contributions to the enormous success of RDBMS is the degree to which the solution fits in with our mental (3rd normal form) model of the world.
It is perfectly possible (I have seen this first-hand) for people to adopt these design principles and write down properly structured tables. However, most people will need a tool for this. Luckily most RDB design tools will work and the more advanced ones like Vertica offer features to support this work for uses-cases where the landscape will be too complex to grasp.
Your performance behaviour will vary. I am not a professional DBA nor do I profile databases on a daily basis. So I will refrain from making bold performance claims. One thing to note is that Anchor modeling is built with SQL execution optimization in mind. It will use many advanced features that modern systems offer. One of the most important is table elimination. This limits the number of products you can use (taken from www.anchormodeling.com).
|Microsoft SQL Server 2005||full|
|Microsoft SQL Server 2008||full|
|Oracle 10gR2 Express Edition*||partial|
|Oracle 11gR1 Enterprise/Express Edition||full|
|IBM DB2 v9.5||full|
|PostgreSQL v8.4 beta||full|
|MySQL v6.0.10 alpha||none|
So far we have gone over the concepts of Anchor modeling and seen some nice features that Anchor modeled databases can offer. By leveraging the temporal nature of a database design a development team is potentially better able to adapt to change. The temporal nature of a relation in this design makes schema evolution non-disruptive. You will no longer be restricted by the decisions of your past self (or a colleague). Furthermore the concept of Anchor modeling delivers a set of rules that implicitly work towards a proper design.
A second important lesson I took from looking into this is that I, as a 30-year old developer, have become biased by the maturity of RDBMSs. I normally work with technology that helps me abstract the knowledge I need when working with databases – like Hibernate – and in general lost interest in what problems databases can solve for me. Added to this is of course a movement towards the hip and buzzing NoSQL solutions over the past years. In the development scene I just don’t see many developers blogging (for example) about what’s new and hip in this latest PostgreSQL release. Or how a new release opens up nice technological opportunities. So this new kind of modeling definitely renewed my interest.
The third big takeaway is that the world beyond 3rd normal form has come within reach and should be part of our toolset when we face problems in practice. Anchor modeling is not something that you have to adopt wholesale. It is a design approach that helps you to deal with change. You could for example only apply it to the parts of the system that have a high rate of change, like your product entity.
In the end it’s a matter of making a proper design decision, and also Anchor modeling is not a silver bullet 😉
A lot of credits due to the nice reads and introduction at:
- Regardt, O., Rönnbäck, L., Bergholtz, M., Johannesson, P., & Wohed, P. (2009, November). Anchor Modeling. In ER (pp. 234-250).