Finally time for the almost final blog post on this series. It is the sixth lecture from the MongoDB course I am doing at 10gen education. As it is good tradition by now using film titles for the blog I will of course keep up this tradition. Nevertheless I can already say that the final blog post on this series will be a bit different from all the others as is the final lecture of the course.
But let’s start diving into the topics of this post. Those are:
- Durability of writes,
- Availability and Fault Tolerance and
- Horizontal Scaling.
Ok, so the first thing to talk about it …
Durability of Writes
Let’s take a look at how a MongoDB-system looks like if there is only one instance (one mongod) running.
MongoDB – Single Instance
There are three processes shown in the above figure. One is the mongod-process which is the MongoDB database server process. Then there is an Application shown that acts as an example for any application using MongoDB and there is the mongo shell, which is in principle also an application using MongoDB. All of those might be running on the same or on different machines.
Now the way MongoDB works is that there is (and must be) a driver to access the mongod-server. No big surprise here I would say as it is the same with any other (relational) database. The driver is used to send commands from the application to the mongod-server. What is important to know here is that MongoDB is using a “fire and forget” approach per default (this has only changed in the most recent driver versions). This means any errors of an update- or insert-operation will not be reported back.
“Now if your want to know that your write was actually received by the server, that it got it, that it processed it, that it – not that it completed it – but that at least it didn’t violate any indexes and seems ok syntactically then you issue a getlasterror-call.” -Quotes from the course
Of course there is a possibility to get information on the execution of commands and that is issuing a getlasterror-call to the mongod-server. As written in the above quote it does not tell if it was completed, but it shows if everything was syntactically correct. The mongo shell is issuing this command automatically after each insert and update. For an application one has to decide whether to configure the driver to make this call or do it manually after certain operations.
But there is more to it, as there are two parameters available to the getlasterror-method that pretty much control the amount of information received. Those are:
- w – This tells the driver to wait for the write to be acknowledged. It also ensures no indexes are violated. Nevertheless the data can still be lost as it is not necessarily already persisted to disc.
- j – This stands for journal-mode. It tells the driver to wait until the journal has been committed to disk. Once this has happended it is quite sure that the write will be persistend unless there are any disc-failures.
“Waiting for the write to be acknowledged. Now that does not mean that mongo did the write to disc. There is still a window of vulnerability here. It may have gotten the write and made a change in-memory and then power fails and that write will be lost forever.” -Quotes from the course
The following table gives an overview of these paramters that define the write-concern. (It basically defines how concerned you are about knowing that your writes have been completed.)
|0||0||This is “fire and forget”.|
|1||0||This will wait for an acknowledgement that the write was received and no indexes have been violated. Data can still be lost. This was formally known as “safe-mode” and should be the default nowadays in the drivers.|
|1||1||The most save configuration by waiting for the write to the journal to be completed.|
|0||1||Basically the same as above.|
There is still one more thing 🙂 to consider here. Even though we are setting our write-concern to journaling-mode there can still be network errors. Most applications using a database are running in a client/server-environment and even though it is typically a very rare event it could be that the acknowledgement send back to our application got lost.
“There are these network errors and you can never be completely sure what exactly happened.” -Quotes from the course
Of course this again is not a problem that is specific to MongoDB but that could happen with any database system. In the end it is up to the design of the application how to deal with this kind of uncertainty.
Replication (Replica Sets)
Replication is the answer to the problems of ensuring availability and fault tolerance of the MongoDB database system. (The same is of course true for almost any database system.)
In the mongo world the concept used are so-called Replica Sets. The following figure depicts this.
MongoDB – Replica Set
The replica set always has one primary node and several secondary nodes. Data is shared among the nodes, but for the application this looks as one mongod-instance. A replica set must have at least three nodes. If the primary node fails then there is an election process among the remaining nodes which results in a new primary node being elected. A node is also having a priority which is relevant in the election process. If the priority is set to zero for a node that node cannot act as the primary node.
But there is more to replica sets and this is that there are different types of nodes. So far we have only talked about regular nodes which can be a primary or a secondary node. The following table gives an overview on the types of nodes and their purpose.
|Type||Allowed to vote||Can become primary||Description|
|Regular||yes||yes||This is the most typical kind of node. It can act as a primary or secondary node.|
|Arbiter||yes||no||Arbiter nodes are only there for voting purposes. They can be used to ensure that there is a certain amount of nodes in a replica set even though there are not that many physical servers.|
|Delayed||yes||no||Often used as a desaster recovery node. The data stored here is usually a few hours behind the real working data.|
|Hidden||no||no||Often used for analytics in the replica set.|
Having a replica set all writes must go to the primary node, but it is possible to do reads also from secondary nodes. The reason for this could be the wish to do some read-scaling.
“The lag between any two nodes is not guaranteed because the replication is asynchronous. You wanna keep that in mind.” -Quotes from the course
In this case it is important to know that when reading from a secondary node there is no guarantee that the data read is up-to-date. When doing reads and writes to the primary node only the data will be guaranteed to be consistent.
“Some other systems have a weaker form of consistency, something that is called ‘eventual consistency’. And evetual consistency menas that eventually you’re be able to read what you wrote but there is no guarantee you can read it within a particular timeframe. And the problem with eventual consistency is it is pretty hard to reason about […] it is a little disconcerting to write session information – or some other information – into the database and then read it back out and get a different value.” -Quotes from the course
In order to ensure this there will be a – typically short – period of time during a failover where it is not possible to perform any writes. This is the timeframe between the moment where the primary node went down and the moment a new primary node has been elected.
Failover and Rollback
Let’s take a closer look at the way failover and rollback is happening in MongoDB. Assuming the primary node goes down while there is a write in progress. In this case a new primary node will be elected that does not know anything of this write and the data will thus be lost. If now the failed node – that was previously the primary one – comes back online as a secondary node it will detect the unfinished write and do a rollback on it. The data that is rolled back this way will be written to a specific log file but not be recovered automatically.
“When failover occurs inside a replica set there is a pretty good chance that your app is going to see at least on exception on the connection.” -Quotes from the course
The above problem can be solved for almost all cases by using a write-concern that waits until the majority of the nodes has been updated. This can be defined in the corresponding driver used in the implementation. So let’s revisit the write-concern right away with respect to replica sets.
Write Concern in the context of Replica Sets
In the first paragraph of this blog post we have been talking about setting the write-concern (those w=0/1 and j=0/1). Basically now in the context of replica sets the values set there become a more distinct meaning. For the w-parameter it now basically means the amount of nodes that have acknowledged a write. If having three nodes “two” might be a good value here. There is also a short notation for this by using w=’majority’. For the journal-parameter the value of one is still the best that can be done. It means the data is written to the journal of the primary node.
Basically there are three different ways to configure the write concern:
- For the entire connection (using the driver).
- For a specific collection (using the driver).
- Configure defaults for this directly in the replica set.
So much for replica sets and let’s jump to the final topic of this blog post and the whole blog series which is …
Sharding is the way that MongoDB is using to scale out. The basic idea is to split collections over several nodes and then accessing them via a node that acts as a router. For the application only this router is visible.
“Ok, congratulations, we are going to talk about sharding which is our approach to horizontal scalability. But more importantly sharding is the final topic of the developer course. So you have made it a tremedous distance in the last six weeks, so congratulations.” -Quotes from the course
The following figure depicts a sharded environment. The different shards can be individual mongod-instances, but it is much more likely that those are replica sets of their own.
MongoDB – Sharding
Whether or not sharding is used is a decision that is taken on database-level. So either the whole installation is sharded or there is no sharding at all. Nevertheless it is not mandatory that all collections support the sharding (how this support looks like comes in a few seconds), but if they do not support it they are simply running entirely on shard 1.
To take advantage of a sharded environment a collection must declare a so-called shard-key. This key must be defined from a field (or several fields) from the document. The shard-key must be carefully chosen as it is used to split the collection into chunks. Those chunks are then distributed to the individual nodes. If there are too few chunks or the distribution is very uneven this might lead to suboptimal performance.
There can be more than one router-instance (mongos) if scaling is required here. The application will then connect to one of those mongos-instances and has to do a failover pretty similar to the way it is done in replica sets if the used router-instance goes down.
The mongos-instance is using the shard-key to determine the used chunk and thus the used node. This means that the shard-key must be present in all writes to the database. In reads it can be ommitted, but this will have a bad impact on the performance as it means the mongos-instance needs to request all shards to find the result.
As it is very important let’s sum up the implications with respect to the shard-key:
- Every document has to define a shard-key.
- The value of the shard-key is immutable.
- The shard-key must be part of an index and it must be the first field in that index.
- There can be no unique index unless the shard-key is part of it and is then the first field.
- Reads done without specifying the shard-key will lead to requests to all the different shards.
- The shard-key must offer sufficient cardinality to be able to utilize all shards.
Ok, that was it for the sharding and almost for this complete MongoDB blog series. It was really fun attending the class and blogging on it. And I have really learned quite a lot (not only on MongoDB, but also on databases in general).
But: This is not the end yet. I have some ideas for a final blog post of this series even though there is not really a seventh lecture with new stuff to blog about. Hopefully it will not take too long to come up with the final one as basically now is a quite good time for some blogging ;-).
The MongoDB class series
Part 1 – MongoDB: First Contact
Part 2 – MongoDB: Second Round
Part 3 – MongoDB: Close Encounters of the Third Kind
Part 4 – MongoDB: I am Number Four
Part 5 – MongoDB: The Fith Element
Part 6 – MongoDB: The Sixth Sense
Part 7 – MongoDB: Tutorial Overview and Ref-Card
Java Supplemental Series
Part 1 – MongoDB: Supplemental – GRIDFS Example in Java
Part 2 – MongoDB: Supplemental – A complete Java Project – Part 1
Part 3 – MongoDB: Supplemental – A complete Java Project – Part 2