Joins and Schema Validation in MongoDB 3.2

Version 3.2 of the NoSQL database MongoDB introduces, among others, two interesting new features that I’d like to explore in this blog post.

Joins

The logical namespaces where documents are stored are called collections in MongoDB. Up to now, every type of query, aggregation and even map/reduce job operated on exactly one of these collections.

In version 3.2 the aggregation framework introduces a kind of fetch join that enables you to load documents from more than one collection. Let’s assume the following schema …

customers:
{
_id: ObjectId(...),
forename: "...",
name: "...",
...
}

orders:
{
_id: ObjectId(...),
customer_id: ObjectId(...), // foreign key to customers
total: ...,
items: [ ... ],
...
}

… and the need to query the customers together with their orders. We use this test data set:

db.customers.insert( {_id: "c1", forename: "Tobias", name: "Trelle"} );
db.orders.insert( {_id:"o1", customer_id:"c1", total: 11.50, items:[{desc: "Item 1"}, {desc: "Item 2"}]} );
db.orders.insert( {_id:"o2", customer_id:"c1", total: 42.95, items:[{desc: "Item 2"}, {desc: "Item 3"}]} );

Our fetch join from the customers collection to the orders collection uses $lookup, a new pipeline stage of the aggregation framework:

db.customers.aggregate( [
   {$match: {_id:"c1"}}, 
   {$lookup: {
       from: "orders", 
       localField: "_id", 
       foreignField: "customer_id", 
       as: "orders"}
   }
] )

The resulting customer document holds the array of orders in the joined field orders:

{ 
"_id" : "c1", 
"forename" : "Tobias", 
"name" : "Trelle", 
"orders" : [ 
   { 
   "_id" : "o1", 
   "customer_id" : "c1", 
   "total": 11.5, 
   "items" : [ { "desc" : "Item 1" }, { "desc" : "Item 2" } ] 
   }, 
   { 
   "_id" : "o2", 
   "customer_id" : "c1", 
   "total" : 42.95, 
   "items" : [ { "desc" : "Item 2" }, { "desc" : "Item 3" } ] 
   } 
] 
}

Right now, the join condition is an equality match on exactly one field on each side (localField and foreignField). This may become more general in future versions, e.g. with multi-field join conditions.
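To make the semantics of $lookup concrete, here is a minimal sketch in plain JavaScript (runnable with Node.js, no MongoDB involved) that mimics the stage on in-memory arrays: for each input document it collects all documents whose foreignField equals the input's localField and stores them under the as field. The function lookup, the use of an array instead of a collection name for from, and the extra customer c3 are hypothetical. Like the real stage, it behaves as a left outer join, so a customer without orders is kept with an empty array; it does not reproduce the server's rules for null, missing or array-valued fields.

```javascript
// Plain-JavaScript sketch of the $lookup semantics on in-memory arrays.
// Not how the server executes the stage: it models plain scalar equality only;
// the server's rules for null, missing and array-valued fields are not reproduced.
function lookup(input, { from, localField, foreignField, as }) {
  return input.map((doc) => ({
    ...doc,
    // Left outer join: every input document is kept; no matches -> [].
    [as]: from.filter((other) => other[foreignField] === doc[localField]),
  }));
}

const customers = [
  { _id: "c1", forename: "Tobias", name: "Trelle" },
  { _id: "c3", forename: "Jane", name: "Doe" }, // hypothetical customer without orders
];
const orders = [
  { _id: "o1", customer_id: "c1", total: 11.5, items: [{ desc: "Item 1" }, { desc: "Item 2" }] },
  { _id: "o2", customer_id: "c1", total: 42.95, items: [{ desc: "Item 2" }, { desc: "Item 3" }] },
];

// "from" is an array here, not a collection name as in the real stage.
const spec = { from: orders, localField: "_id", foreignField: "customer_id", as: "orders" };

// Counterpart of [{$match: {_id: "c1"}}, {$lookup: ...}]
const result = lookup(customers.filter((c) => c._id === "c1"), spec);
console.log(result[0].orders.map((o) => o._id).join(", ")); // o1, o2

// Without the $match stage every customer is kept, c3 with an empty array.
const all = lookup(customers, spec);
console.log(all.map((c) => `${c._id}: ${c.orders.length} orders`).join(", ")); // c1: 2 orders, c3: 0 orders
```

The single localField/foreignField pair is exactly the one-field-per-side restriction described above: the filter compares one value from each document.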

Schema Validation

One very fundamental characteristic of document orientation in MongoDB was schemalessness, i.e. the absence of a validation mechanism that enforces a schema on the documents of a collection. There were neither mandatory fields nor type checks on the fields of a document.

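Now you can define a so-called validator at the collection level that can perform type checks and even semantic checks (if you are following along, drop the customers collection from the join example first, since createCollection fails if the collection already exists):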

db.createCollection("customers", {
   validator: { 
      name: {$type: "string"}, 
      age: {$type: "int", $gte: 18 }
   }
})

We define expected types for the fields name and age. This also makes them mandatory, since a missing field does not match $type. For the field age we additionally require a value >= 18. The syntax is essentially the same as that of find query filters. An invalid document is rejected with an error message:

db.customers.insert({_id:"c2", name: "Trelle", age: NumberInt(8)})
WriteResult({
        "nInserted" : 0,
        "writeError" : {
                "code" : 121,
                "errmsg" : "Document failed validation"
        }
})

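You have to provide an age >= 18 to successfully become a customer. Note the NumberInt: plain numbers in the mongo shell are stored as doubles and would not pass the int type check.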

db.customers.insert({_id:"c1", name: "Trelle", age: NumberInt(25)})
WriteResult({ "nInserted" : 1 })
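The validator's rules can be modeled in a few lines of plain JavaScript (Node.js, no server needed). This minimal sketch assumes a simplified type model: the hypothetical Int32 class and NumberInt helper stand in for the shell's 32-bit integers, and plain JS numbers stand in for doubles. Only $type and $gte are handled, which is enough to reproduce the two inserts above and to show why a plain 25 or a missing name is rejected.

```javascript
// Plain-JavaScript sketch of how the validator above classifies documents.
// Not MongoDB code: Int32 and NumberInt are hypothetical stand-ins for the
// shell's 32-bit integer type, and plain JS numbers play the role of doubles.
class Int32 {
  constructor(value) {
    this.value = value;
  }
}
const NumberInt = (v) => new Int32(v);

// BSON-like type alias of a value; null means the field is missing.
function typeAlias(v) {
  if (v === undefined) return null;
  if (v === null) return "null";
  if (v instanceof Int32) return "int";
  if (typeof v === "number") return "double";
  if (typeof v === "string") return "string";
  return "object";
}

// Numeric value used for comparisons (ints and doubles compare by value).
const numeric = (v) => (v instanceof Int32 ? v.value : v);

// Check one field value against a condition such as {$type: "int", $gte: 18}.
// Only $type and $gte are modeled; any other operator raises an error.
function matchesCondition(value, cond) {
  for (const [op, arg] of Object.entries(cond)) {
    if (op === "$type") {
      if (typeAlias(value) !== arg) return false;
    } else if (op === "$gte") {
      const n = numeric(value);
      if (typeof n !== "number" || !(n >= arg)) return false;
    } else {
      throw new Error(`operator not modeled in this sketch: ${op}`);
    }
  }
  return true;
}

// A document is valid only if every field condition holds (implicit AND).
function validate(doc, validator) {
  return Object.entries(validator).every(([field, cond]) =>
    matchesCondition(doc[field], cond)
  );
}

const validator = {
  name: { $type: "string" },
  age: { $type: "int", $gte: 18 },
};

console.log(validate({ _id: "c2", name: "Trelle", age: NumberInt(8) }, validator)); // false
console.log(validate({ _id: "c1", name: "Trelle", age: NumberInt(25) }, validator)); // true
console.log(validate({ _id: "c3", name: "Trelle", age: 25 }, validator)); // false: a double, not an int
console.log(validate({ _id: "c4", age: NumberInt(30) }, validator)); // false: name is missing
```

The implicit AND across fields mirrors how a find filter combines conditions on different fields.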

Conclusion

Fetch joins give you a lot more freedom when designing your schema. You are no longer forced to design purely around your queries, and you need less denormalization. Of course, joins come at a price: they impact performance, in MongoDB, too.

Schema validation helps you ensure the semantic consistency of your data. MongoDB can now act as an additional validating instance for your business data. Validation, too, has an impact on performance.

With these two new features MongoDB continues to become more and more enterprise-ready. It evidently wants to be as powerful as its relational counterparts, which have offered joins and validation since the beginning of the IT age. MongoDB is becoming an all-purpose database. Let’s see how long this remains compatible with the basic idea behind the NoSQL movement …

All details and more new features can be found in the release notes of version 3.2.
