
MongoDB: Second Round


My class on MongoDB is continuing and so is this blog series. I not only survived first contact with MongoDB but also found its concepts quite intriguing, so I am curious what comes next.

MongoDB – Week 2

I am learning more about schemas for a schema-less database. This sounds quite confusing, doesn’t it? What is meant here is that MongoDB does not have a fixed schema the way traditional relational databases do. Nevertheless there is something called a document-oriented schema. So even when using MongoDB one still has to think about how to structure the data to be stored, although things are far less restrictive here. Some basic considerations:

  • Is it wise, or even possible, to put all data into one collection?
  • Is there a danger of hitting MongoDB’s 16MB limit for a single document?
  • Which data really belongs in one document?
  • What about changing some data in a central place (using references) vs. changing it in all the documents?
  • It all comes down to the question “To embed or not to embed?”.

I have to admit that in this section some well-groomed habits of mine related to database design are put to the test.
The example in the class is based on a blog. One document is more or less one blog entry, and the tags of an entry are simply stored inside that “blog document”. The moment I saw this my mind started screaming: “What if a tag name changes? I would need to change it in all the documents. We need references for sure!” I had to drink a coffee first to calm down before I could continue with the class.
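Here is a minimal plain-JavaScript sketch of the trade-off behind that reaction. It uses no MongoDB at all, and all field names and data are made up for illustration. It compares tag names embedded in each post with tags kept in a separate list and referenced by id.

```javascript
// Toy in-memory model of "embed or reference" for blog tags (no MongoDB involved).

// Embedded: each post carries its tag names directly.
const embeddedPosts = [
  { _id: 1, title: "First contact", tags: ["mongodb", "nosql"] },
  { _id: 2, title: "Second round", tags: ["mongodb", "crud"] },
];

// Referenced: posts store tag ids; the names live in a separate "collection".
const tags = [
  { _id: "t1", name: "mongodb" },
  { _id: "t2", name: "nosql" },
  { _id: "t3", name: "crud" },
];
const referencedPosts = [
  { _id: 1, title: "First contact", tagIds: ["t1", "t2"] },
  { _id: 2, title: "Second round", tagIds: ["t1", "t3"] },
];

// Renaming an embedded tag means rewriting every post that contains it.
function renameEmbeddedTag(posts, oldName, newName) {
  let touched = 0;
  for (const post of posts) {
    const i = post.tags.indexOf(oldName);
    if (i !== -1) {
      post.tags[i] = newName;
      touched++;
    }
  }
  return touched; // number of documents that had to change
}

// Renaming a referenced tag changes exactly one document...
function renameReferencedTag(tagDocs, oldName, newName) {
  const tag = tagDocs.find((t) => t.name === oldName);
  if (tag) tag.name = newName;
  return tag ? 1 : 0;
}

// ...but reading a post now needs a second lookup (an application-side "join").
function tagNamesOf(post, tagDocs) {
  return post.tagIds.map((id) => tagDocs.find((t) => t._id === id).name);
}

console.log(renameEmbeddedTag(embeddedPosts, "mongodb", "MongoDB")); // 2
console.log(renameReferencedTag(tags, "mongodb", "MongoDB")); // 1
console.log(tagNamesOf(referencedPosts[1], tags).join(", ")); // MongoDB, crud
```

Neither variant is “right”. Embedding makes reads simple and pays on rare renames, while referencing makes renames cheap and pays on every read.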

This is a topic I consider really crucial (at least for myself and my understanding of this NoSQL stuff), so I would like to share some thoughts on it. By lucky coincidence I am currently working on a project where we – naturally – have to do some relational database design as part of the application development. A lot of thought goes into that design: whether certain “joins” will perform well enough later on or whether we will run into trouble, and when to store data redundantly versus when to normalise further. And in the end quite a few changes are required in the implementation once we decide to change the database design, not to mention the need to change the database schema in all the environments.

Now with the very limited knowledge I gained on MongoDB so far I am wondering how things would change when switching from DB2 to MongoDB:

  • First of all, I would have no clue what is relevant for the performance of MongoDB. Hopefully that comes later in the class.
  • I have the feeling that the document-oriented schema design would follow the data structures used in the implementation much more closely, and not the other way around as with a relational database.
  • Fundamental schema changes would be less painful.
  • I would probably use fewer references than I do in a relational database design, even though I am not quite sure why I think so.

The points above are a snapshot of how I believe programming with MongoDB would change my approach to database programming. Hopefully I will find out more concretely towards the end of the class.

The Mongo Shell, JavaScript & BSON

I think I was not crystal clear about this in my last post as I was myself not 100% sure yet. But the Mongo shell is indeed a complete interactive JavaScript interpreter. And putting aside the things I associate with JavaScript when it comes to web-development, it is really a nice idea. No need to learn an entirely new language and JavaScript seems to be a quite natural choice when a lot of JSON processing is involved.

“If you happen to be a JavaScript expert and you catch me doing things that happen not to be stylistically desirable in JavaScript I apologise in advance.” – Quotes from the course

When it comes to the Mongo shell, I am really starting to like it a lot. Being an old Unix guy, I find it nice that the Mongo shell implements features from the good old bash. For example, it offers autocompletion (using the tab key), it lets you scroll through the command history (using the up and down arrows) and you can jump to the beginning and end of a line (using CTRL-a and CTRL-e). I think I really like it!

“Be careful when you use programs in a language like JavaScript or Perl which might not have the expressiveness or the type expressiveness to faithfully represent all the types that MongoDB can store.” – Quotes from the course

MongoDB stores data internally in Binary JSON (BSON) format. Looking back at the database systems I have worked with, I was never that interested in how they store their data internally. Normally I would like this to be a black box. But on the other hand, knowing a bit about it cannot hurt.

So basically MongoDB supports all the types that BSON supports. This is also a requirement for programming languages that want to offer drivers for MongoDB: they have to support the BSON data types, and so does the Mongo shell. An example:

> db.startrek.save({stardate:new Date()})
> db.startrek.find().pretty()
{
	"_id" : ObjectId("5092a74cddb9073fa519a954"),
	"stardate" : ISODate("2012-11-01T16:46:04.409Z")
}

In application code one would of course work with the corresponding types of the programming language in use. Anyway, there is a nice quote from the lecture on the JavaScript Date type that I have to share.

“Date … which is for reasons having to do with esoteric details of JavaScript will print out with a constructor we call ISODate.” – Quotes from the course
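The ISODate(...) wrapper is only how the shell prints a date. The string inside it is the ISO-8601 UTC form that plain JavaScript also produces with Date.prototype.toISOString(). The sketch below is ordinary JavaScript, not shell code. It rebuilds the stardate from the example above, and the comment about BSON’s internal representation refers to the BSON specification’s UTC datetime type.

```javascript
// Plain JavaScript (not the Mongo shell). BSON stores a date as a 64-bit count of
// milliseconds since the Unix epoch; the shell displays it as ISODate("...").
const stardate = new Date(Date.UTC(2012, 10, 1, 16, 46, 4, 409)); // months are 0-based: 10 = November

console.log(stardate.toISOString()); // 2012-11-01T16:46:04.409Z
console.log(stardate.getTime()); // the millisecond count that would be stored
```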

There is one more thing I am starting to understand about using MongoDB. Operations are performed entirely through methods/functions of the programming language APIs. They are not written in a separate language like SQL (which in the worst case ends up as strings inside some programming language), nor in some mixture of the two, as with Java JPA queries for example. I do not have a full grip on this yet, but if things continue in that direction I think it would be really beneficial.

IFUR previously known as CRUD

The CRUD operations are well known to anyone implementing applications based on databases (or rather, storing data in them). Obviously those operations must be possible with MongoDB as well to use it in any meaningful way. This is not a relational database, thus the terminology is slightly different ;-):

Create => Insert
Read => Find
Update => Update
Delete => Remove

Now we are getting closer to the nuts & bolts of this. And not sure if I mentioned it already: working with the Mongo shell gets more and more fun the more details of how to use it are revealed. A few more very basic things have become clear to me in this week’s lessons, so let’s take a look at the following commands given in the shell:

> db
test
> 
> db.ships.insert({'name':'USS Defiant','operator':'Starfleet','type':'escort','class':'Defiant'})
> db.ships.findOne()
{
	"_id" : ObjectId("5092b4e9ddb9073fa519a955"),
	"name" : "USS Defiant",
	"operator" : "Starfleet",
	"type" : "escort",
	"class" : "Defiant"
}

We are working with an implicitly given object named “db”, which represents the database. Unless something is done in the shell to change this, we are connected to the “test” database by default. The next level is the collection we are working on. In the above example this is the collection “ships”, which was created automatically when we inserted the first document into it.
Using the “findOne” command we can select a single document from a collection. If we do not specify any parameters we simply get one document back: the first one in the collection’s natural order, which is not an order to rely on.

“In MongoDB we call update update.” – Quotes from the course
Reminded me immediately of how to pronounce Linux :-).

Now part of the fun begins, as we can select documents by passing (part of) a document in as a search parameter. So let’s create a few more documents and give it a try afterwards:

> db.ships.insert({'name':'USS Enterprise-D','operator':'Starfleet','type':'Explorer','class':'Galaxy'})
> db.ships.insert({'name':'USS Prometheus','operator':'Starfleet','class':'Prometheus'})
> db.ships.findOne({'name':'USS Defiant'})
{
	"_id" : ObjectId("5092b4e9ddb9073fa519a955"),
	"name" : "USS Defiant",
	"operator" : "Starfleet",
	"type" : "escort",
	"class" : "Defiant"
}

A MongoDB expert will most likely recognise right away that I omitted one value, namely the type, in the last insert statement. As it was not available from Memory Alpha, I took advantage of the non-fixed schema right away.

The “findOne()” command shows how to search for a document by using part of the document as the input. This is fundamentally different from the SQL approach, where the query itself is a string. And I have to say it starts to feel quite natural working this way. Using a second parameter we can specify which elements of a document should be displayed:

> db.ships.findOne({'name':'USS Defiant'}, {'class':true})
{ "_id" : ObjectId("5092b4e9ddb9073fa519a955"), "class" : "Defiant" }
> db.ships.findOne({'name':'USS Defiant'}, {'class':true,'_id':false})
{ "class" : "Defiant" }

When the second parameter is passed to “findOne”, only those elements whose keys are set to true are shown. The “_id” is an exception: it is displayed by default unless explicitly suppressed, as shown in the second example above.
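To make the “query by example” idea concrete, here is a toy model in plain JavaScript of what findOne(query, projection) does over an in-memory array. It is not MongoDB’s implementation. The helper names are hypothetical, small numbers stand in for the ObjectIds, only equality conditions are handled, and projection values are limited to true/false.

```javascript
// Toy model of findOne(query, projection) over an in-memory array.
const ships = [
  { _id: 1, name: "USS Defiant", operator: "Starfleet", type: "escort", class: "Defiant" },
  { _id: 2, name: "USS Enterprise-D", operator: "Starfleet", type: "Explorer", class: "Galaxy" },
  { _id: 3, name: "USS Prometheus", operator: "Starfleet", class: "Prometheus" },
];

// Query by example: every key in the query must have an equal value in the document.
function matches(doc, query) {
  return Object.keys(query).every((key) => doc[key] === query[key]);
}

// Inclusion projection: keep keys set to true, plus _id unless it is explicitly false.
function project(doc, projection) {
  if (!projection || Object.keys(projection).length === 0) return doc;
  const out = {};
  if (projection._id !== false && "_id" in doc) out._id = doc._id;
  for (const [key, include] of Object.entries(projection)) {
    if (key !== "_id" && include === true && key in doc) out[key] = doc[key];
  }
  return out;
}

// Returns the first matching document (projected), or null if nothing matches.
function findOneIn(docs, query = {}, projection) {
  const doc = docs.find((d) => matches(d, query));
  return doc === undefined ? null : project(doc, projection);
}

console.log(JSON.stringify(findOneIn(ships, { name: "USS Defiant" }, { class: true })));
// {"_id":1,"class":"Defiant"}
console.log(JSON.stringify(findOneIn(ships, { name: "USS Defiant" }, { class: true, _id: false })));
// {"class":"Defiant"}
```

Because matches() checks every key, a query with several fields (for example operator and class) only matches documents where all of them are equal.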

“The value automatically generated for the ObjectId has very high probability of being unique … so long as time continues to move forward.” – Quotes from the course

Digging deeper into MongoDB’s query language, it turns out that the “find()” method takes very similar arguments to the “findOne()” method. Of course “find()” is designed to return more than one document. If a find returns a lot of documents, the shell shows them in batches (20 by default), and the “it” command (iterate) loads the next batch.
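The shell’s paging can be pictured as a generator that hands out one slice of 20 results at a time, with each “it” asking for the next slice. This is a hypothetical plain-JavaScript model of what the shell prints. It is not how the server actually transfers cursor batches.

```javascript
// Hypothetical model of the shell's display paging: 20 results per "page".
function* batches(results, batchSize = 20) {
  for (let start = 0; start < results.length; start += batchSize) {
    yield results.slice(start, start + batchSize);
  }
}

const results = Array.from({ length: 45 }, (_, i) => ({ _id: i }));
const pager = batches(results);

console.log(pager.next().value.length); // 20 (initial output of find())
console.log(pager.next().value.length); // 20 (first "it")
console.log(pager.next().value.length); // 5 (second "it")
console.log(pager.next().done); // true (nothing left)
```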

Obviously, and not unlike SQL, query operators are needed to make more precise queries. So far we have been matching documents basically using concrete sample documents. The following examples show how such queries look with the “find()” method.

> db.ships.find({operator:'Starfleet'}, {'name':true, '_id':false})
{ "name" : "USS Defiant" }
{ "name" : "USS Enterprise-D" }
{ "name" : "USS Prometheus" }
> db.ships.find({operator:'Starfleet', 'class':'Galaxy'}, {'name':true, '_id':false})
{ "name" : "USS Enterprise-D" }

The second query above shows how to build a query using more than one value from the document: a document is only returned if all of the given values match. Now on to the first operators.

> db.ships.find({class:{$gte:'P'}}, {'name':true, '_id':false})
{ "name" : "USS Prometheus" }
> db.ships.find({class:{$lt:'F'}}, {'name':true, '_id':false})
{ "name" : "USS Defiant" }

Using the operators $gt, $gte, $lt and $lte it is possible to compare values in a document. As the values in the above example are strings, the comparison is lexicographical.
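For plain ASCII strings, JavaScript’s < and >= compare character codes. That gives the same order as the UTF-8 byte order MongoDB uses for strings by default. The plain-JavaScript sketch below reproduces the two queries above on the class values. For characters outside the Basic Multilingual Plane, JavaScript’s UTF-16 ordering can differ from UTF-8 byte order, so treat this as an ASCII-only illustration.

```javascript
// ASCII-only illustration of byte-order string comparison (plain JavaScript).
const classes = ["Defiant", "Galaxy", "Prometheus"];

const atLeastP = classes.filter((c) => c >= "P"); // like {class: {$gte: 'P'}}
const beforeF = classes.filter((c) => c < "F"); // like {class: {$lt: 'F'}}

console.log(atLeastP.join(", ")); // Prometheus
console.log(beforeF.join(", ")); // Defiant

// Byte order is case-sensitive: every uppercase ASCII letter sorts before
// every lowercase one, which is not what most human locales expect.
console.log("Zeta" < "alpha"); // true
```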

“If it so happens that there is a locale for which sorting things in UTF-8 byte order happens to be correct then MongoDB happens to agree with that as well.” – Quotes from the course

> db.ships.find().pretty()
{
	"_id" : ObjectId("5092b4e9ddb9073fa519a955"),
	"name" : "USS Defiant",
	"operator" : "Starfleet",
	"type" : "escort",
	"class" : "Defiant"
}
{
	"_id" : ObjectId("5092b7e7ddb9073fa519a956"),
	"name" : "USS Enterprise-D",
	"operator" : "Starfleet",
	"type" : "Explorer",
	"class" : "Galaxy"
}
{
	"_id" : ObjectId("5092b821ddb9073fa519a957"),
	"name" : "USS Prometheus",
	"operator" : "Starfleet",
	"class" : "Prometheus"
}
> db.ship.find({'type' : {$exists:true}})
> db.ships.find({type:{$exists:true}}).pretty()
{
	"_id" : ObjectId("5092b4e9ddb9073fa519a955"),
	"name" : "USS Defiant",
	"operator" : "Starfleet",
	"type" : "escort",
	"class" : "Defiant"
}
{
	"_id" : ObjectId("5092b7e7ddb9073fa519a956"),
	"name" : "USS Enterprise-D",
	"operator" : "Starfleet",
	"type" : "Explorer",
	"class" : "Galaxy"
}

The $exists operator is quite useful for finding all documents that have a certain attribute. As it is very easy to omit attributes in MongoDB, this operator might be used quite often.
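$exists asks whether a key is present in a document, not what its value is. A field explicitly set to null therefore still matches {$exists: true}. The plain-JavaScript sketch below models that on the ships from above plus one hypothetical document with a null type. It is an illustration, not MongoDB’s matcher.

```javascript
// Toy model of {type: {$exists: true|false}}: only the presence of the key counts.
const ships = [
  { name: "USS Defiant", type: "escort" },
  { name: "USS Enterprise-D", type: "Explorer" },
  { name: "USS Prometheus" }, // no type at all
  { name: "Hypothetical ship", type: null }, // key present, value null
];

function fieldExists(doc, field) {
  return Object.prototype.hasOwnProperty.call(doc, field);
}

const withType = ships.filter((s) => fieldExists(s, "type")).map((s) => s.name);
const withoutType = ships.filter((s) => !fieldExists(s, "type")).map((s) => s.name);

console.log(withType.join(", ")); // USS Defiant, USS Enterprise-D, Hypothetical ship
console.log(withoutType.join(", ")); // USS Prometheus
```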

But the $exists example in the shell also shows one thing that is not so nice about this whole “things get created when they do not exist” approach. Over the query “db.ship.find({'type' : {$exists:true}})” I almost lost my confidence that I had understood anything correctly about how this works, until I realised that I had missed an “s” in the name of the collection (“ship” instead of “ships”). This does not result in any error message at all. Querying a collection that does not exist simply returns nothing, and had I made the same typo in an insert, a new collection “ship” would have been created silently. This makes finding the actual problem quite hard and is definitely something to keep an eye on when writing queries with MongoDB.

Ok, the next operator is going to be my favourite one; I knew it from the second I heard “$regex”.

> db.ships.find({name : {$regex:'^USS\\sE'}}).pretty()
{
	"_id" : ObjectId("5092b7e7ddb9073fa519a956"),
	"name" : "USS Enterprise-D",
	"operator" : "Starfleet",
	"type" : "Explorer",
	"class" : "Galaxy"
}

With the $regex operator it is possible to use Perl-style regular expressions. That’s really great (did I mention 8 years of programming in Perl?), even though it has to be used with care with respect to performance.
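The double backslash in '^USS\\sE' comes from string escaping. Inside a JavaScript string literal, \\ is a single backslash, so the regular expression engine receives ^USS\sE: “USS”, one whitespace character, then “E”, anchored at the start. The plain-JavaScript sketch below uses the built-in RegExp instead of MongoDB’s PCRE engine, which behaves the same for a pattern this simple.

```javascript
// Plain JavaScript: what the string '^USS\\sE' actually contains, and what it matches.
const pattern = "^USS\\sE";
console.log(pattern); // ^USS\sE

const names = ["USS Defiant", "USS Enterprise-D", "USS Prometheus"];
const re = new RegExp(pattern);
console.log(names.filter((n) => re.test(n)).join(", ")); // USS Enterprise-D

// In the shell, a regex literal avoids the double escaping:
//   db.ships.find({name: /^USS\sE/})
```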

One interesting aspect of MongoDB, which is totally different from traditional database systems, is that you can store values of different types for the same attribute. For example, I could have created an entry where the name of a starship is 42 (I did not, though). The $type operator can be used to select only those documents where a certain element has a value of a certain type.

> db.ships.find({name : {$type:2}}).pretty()
{
	"_id" : ObjectId("5092b4e9ddb9073fa519a955"),
	"name" : "USS Defiant",
	"operator" : "Starfleet",
	"type" : "escort",
	"class" : "Defiant"
}
{
	"_id" : ObjectId("5092b7e7ddb9073fa519a956"),
	"name" : "USS Enterprise-D",
	"operator" : "Starfleet",
	"type" : "Explorer",
	"class" : "Galaxy"
}
{
	"_id" : ObjectId("5092b821ddb9073fa519a957"),
	"name" : "USS Prometheus",
	"operator" : "Starfleet",
	"class" : "Prometheus"
}
> db.ships.find({name : {$type:3}}).pretty()
>

The type numbers used there are those from the BSON specification. 2 stands for string and 3 for embedded document, which is why the second query returns nothing. Somehow I was not able to find a list on the official BSON specification page, but here you can find a list together with further examples on the different MongoDB operators.
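A small plain-JavaScript sketch shows how a value relates to the number that $type looks at. It maps a few JavaScript values to their BSON type numbers (1 double, 2 string, 3 embedded document, 4 array, 8 boolean, 9 date, 10 null, per the BSON specification) and filters a few hypothetical documents the way {name: {$type: 2}} would. It covers only these types and is not MongoDB code. One detail worth knowing: the shell stores plain JavaScript numbers as doubles (type 1).

```javascript
// Simplified sketch: map a JavaScript value to the BSON type number $type would see.
const BSON_TYPE = {
  double: 1, // plain JavaScript numbers are stored as doubles by the shell
  string: 2,
  object: 3, // embedded document
  array: 4,
  boolean: 8,
  date: 9,
  null: 10,
};

function bsonTypeOf(value) {
  if (value === null) return BSON_TYPE.null;
  if (Array.isArray(value)) return BSON_TYPE.array;
  if (value instanceof Date) return BSON_TYPE.date;
  switch (typeof value) {
    case "number":
      return BSON_TYPE.double;
    case "string":
      return BSON_TYPE.string;
    case "boolean":
      return BSON_TYPE.boolean;
    case "object":
      return BSON_TYPE.object;
    default:
      throw new Error(`no BSON type mapping in this sketch for ${typeof value}`);
  }
}

// Hypothetical documents: a string name, a numeric name, an embedded-document name.
const ships = [
  { name: "USS Defiant" },
  { name: 42 },
  { name: { prefix: "USS", suffix: "Voyager" } },
];

// Rough equivalents of {name: {$type: 2}} and {name: {$type: 3}} on this array.
console.log(ships.filter((s) => bsonTypeOf(s.name) === 2).length); // 1
console.log(ships.filter((s) => bsonTypeOf(s.name) === 3).length); // 1
```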

For reference here once again an overview on the query-operators mentioned in this blog post:

Operator | Description | Sample query
--- | --- | ---
$gt | greater than | db.ships.find({class:{$gt:'P'}})
$gte | greater than or equal | db.ships.find({class:{$gte:'P'}})
$lt | less than | db.ships.find({class:{$lt:'P'}})
$lte | less than or equal | db.ships.find({class:{$lte:'P'}})
$exists | whether an attribute exists or not | db.ships.find({type:{$exists:true}})
$regex | Perl-style pattern matching | db.ships.find({name : {$regex:'^USS\\sE'}})
$type | search by the BSON type of an element in the document | db.ships.find({name : {$type:2}})

I fear listing all operators mentioned in the course would exceed the limits of this blog post, but I hope this gives you an idea of the read (or should I say find) part of MongoDB.

Ok, after Insert and Find we come to Update (which would correspond to Update in SQL ;-)). The first thing you can do is replace an entire document with a new one.

> db.ships.update({name : 'USS Prometheus'}, {name : 'USS Something'})
> db.ships.find({name : 'USS Something'}).pretty()
{ "_id" : ObjectId("5092b821ddb9073fa519a957"), "name" : "USS Something" }

What happened? As I only gave one element for the new document, namely the “name”, the whole document was replaced by it, and all other elements that were not explicitly specified have been removed. Comparing with some of the searches above, we can nevertheless see that the “_id” value was not changed. This is probably more of a marginal use case, as typically one wants to change certain elements and keep the rest of the document intact.
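The behaviour of an update without operators can be modelled in a few lines of plain JavaScript. The first matching document is replaced by the new one, and only its _id is carried over. The helper names are hypothetical, the number 3 stands in for the ObjectId, and the sketch assumes the replacement has no _id of its own.

```javascript
// Toy model of update(query, newDocument) without update operators.
const ships = [
  { _id: 3, name: "USS Prometheus", operator: "Starfleet", class: "Prometheus" },
];

function matches(doc, query) {
  return Object.keys(query).every((key) => doc[key] === query[key]);
}

// Replaces the first matching document completely, keeping only its _id.
function replaceFirst(docs, query, replacement) {
  const i = docs.findIndex((d) => matches(d, query));
  if (i === -1) return 0; // nothing matched, nothing changed
  docs[i] = { _id: docs[i]._id, ...replacement };
  return 1;
}

replaceFirst(ships, { name: "USS Prometheus" }, { name: "USS Something" });
console.log(JSON.stringify(ships[0])); // {"_id":3,"name":"USS Something"}
```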

To change only certain elements, we can use the $set operator.

> db.ships.update({name : 'USS Something'}, {$set : {operator : 'Starfleet', class : 'Prometheus'}})
> db.ships.find({name : 'USS Something'}).pretty()
{
	"_id" : ObjectId("5092b821ddb9073fa519a957"),
	"class" : "Prometheus",
	"name" : "USS Something",
	"operator" : "Starfleet"
}
> db.ships.update({name : 'USS Something'}, {$set : {name : 'USS Prometheus'}})
> db.ships.find({name : 'USS Prometheus'}).pretty()
{
	"_id" : ObjectId("5092b821ddb9073fa519a957"),
	"class" : "Prometheus",
	"name" : "USS Prometheus",
	"operator" : "Starfleet"
}

The above example should show quite well how the $set operator works. By specifying certain attributes in the update command, one can add entirely new attributes to the document or alter existing ones without affecting the rest of the document.
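In plain-JavaScript terms, $set behaves like merging the listed fields into the existing document. Named fields are added or overwritten, and everything else is kept. The sketch below is a hypothetical helper handling only top-level fields (no dotted paths). Field order in MongoDB’s output may differ from the order shown here.

```javascript
// Toy model of {$set: {...}} for top-level fields: merge, keep everything else.
function setFields(doc, fields) {
  return { ...doc, ...fields };
}

const ship = { _id: 3, name: "USS Something" };
const updated = setFields(ship, { operator: "Starfleet", class: "Prometheus" });
const renamed = setFields(updated, { name: "USS Prometheus" });

console.log(JSON.stringify(renamed));
// {"_id":3,"name":"USS Prometheus","operator":"Starfleet","class":"Prometheus"}
```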


Time for a short wrap-up, even though the Remove part is still missing. I really have to say that, although I have been working with SQL for a long time now, I am persuaded by the simplicity of MongoDB’s “query language”. To be honest, I already like it more than SQL. I was wondering why that would be, and the simple reason seems to be: it comes very close to the way I as a developer think, so it feels a lot closer to programming than SQL does. Of course that does not say which is better in the end, a traditional SQL database or MongoDB. Besides the fact that I lack the real-life experience with MongoDB to judge that, it will heavily depend on the given use case anyway. But the point stands that working this way feels very natural, and I think it is easier to learn and make quick progress here than it might be with SQL.


One more example on updating documents: if one simply wants to remove a value from a document, this can be done using the $unset operator.

> db.ships.update({name : 'USS Prometheus'}, {$unset : {operator : 1}})
> db.ships.find({name : 'USS Prometheus'}).pretty()
{
	"_id" : ObjectId("5092b821ddb9073fa519a957"),
	"class" : "Prometheus",
	"name" : "USS Prometheus"
}

At first sight this looks a bit strange, doesn’t it? Why is the “: 1” needed, and why can we not simply write something like “db.ships.update({name : 'USS Prometheus'}, {$unset : {operator}})”? The easy answer: in a JSON-style document every key needs a value, so the syntax would not be valid otherwise. $unset ignores that value anyway, so “1” is simply a convenient placeholder; any other value would remove the field just the same.
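The point that only the key names matter for $unset can be shown with a hypothetical plain-JavaScript helper that deletes every key listed in the spec and never looks at the values. Top-level fields only. This is an illustration, not MongoDB’s implementation.

```javascript
// Toy model of {$unset: {field: <anything>}}: only the key names are used.
function unsetFields(doc, spec) {
  const out = { ...doc };
  for (const key of Object.keys(spec)) delete out[key];
  return out;
}

const ship = { _id: 3, class: "Prometheus", name: "USS Prometheus", operator: "Starfleet" };

console.log(JSON.stringify(unsetFields(ship, { operator: 1 })));
// {"_id":3,"class":"Prometheus","name":"USS Prometheus"}
console.log(JSON.stringify(unsetFields(ship, { operator: "" })));
// {"_id":3,"class":"Prometheus","name":"USS Prometheus"}
```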

Remove! As with the update command, we can re-use the syntax of the find command to specify which documents to remove. This makes the different actions very intuitive and consistent. Nevertheless there are a few distinctive features. Using the remove command without any parameter will remove all documents from that collection (one by one).
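Because remove reuses the find-style query, its behaviour can be modelled with the same matching idea: every document matching the query is deleted, and an empty query matches everything. The helper names in this plain-JavaScript sketch are hypothetical. It only mirrors the one-by-one removal described above and is not MongoDB’s implementation.

```javascript
// Toy model of remove(query) on an in-memory array.
const ships = [
  { name: "USS Defiant", type: "escort" },
  { name: "USS Enterprise-D", type: "Explorer" },
  { name: "USS Prometheus" },
];

function matches(doc, query) {
  return Object.keys(query).every((key) => doc[key] === query[key]);
}

// Removes matching documents one at a time and returns how many were removed.
function removeMatching(docs, query = {}) {
  let removed = 0;
  for (let i = docs.length - 1; i >= 0; i--) {
    if (matches(docs[i], query)) {
      docs.splice(i, 1);
      removed++;
    }
  }
  return removed;
}

console.log(removeMatching(ships, { name: "USS Prometheus" })); // 1
console.log(ships.map((s) => s.name).join(", ")); // USS Defiant, USS Enterprise-D
console.log(removeMatching(ships)); // 2 (empty query: everything goes)
console.log(ships.length); // 0
```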

“Each individual document removal is atomic with respect to a concurrent reader or writer. No client will see a document half removed.” – Quotes from the course

Instead of using remove without any parameters, one could consider using the drop command. This also removes all documents from the collection, but faster, as it does not work document by document. Another difference is that drop also removes the collection’s indexes. For bigger collections it is recommended to drop the collection and recreate the indexes on the new, empty collection instead of removing all documents one by one.

> db.ships.find().pretty()
{
	"_id" : ObjectId("5092b4e9ddb9073fa519a955"),
	"name" : "USS Defiant",
	"operator" : "Starfleet",
	"type" : "escort",
	"class" : "Defiant"
}
{
	"_id" : ObjectId("5092b7e7ddb9073fa519a956"),
	"name" : "USS Enterprise-D",
	"operator" : "Starfleet",
	"type" : "Explorer",
	"class" : "Galaxy"
}
{
	"_id" : ObjectId("5092b821ddb9073fa519a957"),
	"class" : "Prometheus",
	"name" : "USS Prometheus"
}
> db.ships.remove({name : 'USS Prometheus'})
> db.ships.find().pretty()
{
	"_id" : ObjectId("5092b4e9ddb9073fa519a955"),
	"name" : "USS Defiant",
	"operator" : "Starfleet",
	"type" : "escort",
	"class" : "Defiant"
}
{
	"_id" : ObjectId("5092b7e7ddb9073fa519a956"),
	"name" : "USS Enterprise-D",
	"operator" : "Starfleet",
	"type" : "Explorer",
	"class" : "Galaxy"
}

There are probably no surprises in the remove example shown above. As I would like to keep my Star Trek ships collection, I will not demonstrate the drop command ;-).

Ok, there are a few more things in the course this week, but I have the feeling this blog post is already quite long and I do not want to discourage anyone from reading it. (And it is quite late already and there is not much blogging time left for this week ;-).) I still enjoy this exercise a lot and already have some ideas on how to continue, besides the fact that the course material for the third week will be released soon. I am curious what will come next. And then there is the homework from the course (the Python part), which I will probably not cover here, but doing some MongoDB programming in Java would be nice. Hmm, I would say: stay tuned!

To be continued …


The MongoDB class series

Part 1 – MongoDB: First Contact
Part 2 – MongoDB: Second Round
Part 3 – MongoDB: Close Encounters of the Third Kind
Part 4 – MongoDB: I am Number Four
Part 5 – MongoDB: The Fifth Element
Part 6 – MongoDB: The Sixth Sense
Part 7 – MongoDB: Tutorial Overview and Ref-Card

Java Supplemental Series

Part 1 – MongoDB: Supplemental – GRIDFS Example in Java
Part 2 – MongoDB: Supplemental – A complete Java Project – Part 1
Part 3 – MongoDB: Supplemental – A complete Java Project – Part 2

Thomas Jaspers

Long-term experience in agile software projects using
Java enterprise technologies. Interested in test automation tools and concepts.


Comments

  • 9 November 2012 by Kili Liam

    Thanks again Thomas!

    I agree with you when you say that’s a lot more intuitive to think “in MongoDB” than in SQL terms.

    I’m also really enjoying your blog! And yes, “… MongoDB-programming in Java would be nice.”, very, very nice!

    Bye,

    KL

  • Thomas Jaspers

    Thanks a lot for the extra motivation 🙂 … time is – as usual – a bit of a limiting factor, but I am quite sure some MongoDB Java hacking/blogging will happen quite soon.

    Cheers
    – Thomas

  • Great post Thomas!

    I recently started working on Mongo with Node. Both use JavaScript as the standard language, which makes it really easy to work with because there is no longer a context switch between the programming language and SQL.

  • Thomas Jaspers

    Hi Tim,

    thanks a lot 🙂 … it’s in my .plan to take a look into Node.js … but currently my class and blogging about it keeps me quite busy.

    Cheers
    – Thomas

  • Tobias Trelle

    14 November 2012 by Tobias Trelle

    AFAIK, the acronym CRUD is *not* derived from the names of SQL commands.

    In SQL terms CRUD would be ISUD: (I)nsert, (S)elect, (U)pdate, (D)elete

  • Thomas Jaspers

    Very true, my bad 🙂
