Why you should use MongoDB. Not always, though

This morning I read «this blog post» and thought I would share my view here rather than in the endless list of comments in the Hacker News thread or on the author's blog. Before proceeding, I suggest you read the article.

First of all, I don't think the Diaspora team did proper data modelling in the beginning. If they had, they would never have dared to use a pure document-oriented database. Quoting from the article…

What could possibly go wrong?

There is a really important difference between Diaspora’s social data and the Mongo-ideal TV show data that no one noticed at first.

With TV shows, each box in the relationship diagram is a different type. TV shows are different from seasons are different from episodes are different from reviews are different from cast members. None of them is even a sub-type of another type.

But with social data, some of the boxes in the relationship diagram are the same type. In fact, all of these green boxes are the same type — they are all Diaspora users.

You didn’t see that in the beginning? Ask an 8th-grade student to develop a social network and he will definitely see this. What I really couldn’t believe is that you even considered duplicating data as an option for this problem.

What’s missing from MongoDB is a SQL-style join operation, which is the ability to write one query that mashes together the activity stream and all the users that the stream references.

Oooh. Amazing discovery. When you choose a database like MongoDB over SQL, you accept some tradeoffs, such as giving up JOIN. I don’t think that stops a developer from writing custom code to achieve the same result. That way it’s more flexible.
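To make "custom code" concrete, here is a minimal sketch of such a hand-written join in Python. It is not Diaspora's code: `users` and `activity_stream` are plain in-memory stand-ins for two MongoDB collections, and every field name is made up for illustration. With a real driver, step 2 would typically be a single extra query using `$in` on the collected ids.

```python
# Hypothetical stand-ins for two collections; all field names are assumptions.
users = {
    1: {"_id": 1, "name": "alice"},
    2: {"_id": 2, "name": "bob"},
}
activity_stream = [
    {"post_id": 10, "author_id": 1, "text": "hello"},
    {"post_id": 11, "author_id": 2, "text": "hi there"},
    {"post_id": 12, "author_id": 1, "text": "bye"},
]


def join_stream_with_users(stream, users_by_id):
    """Attach the referenced user document to each stream entry."""
    # Step 1: collect the distinct user ids the stream refers to.
    wanted = {entry["author_id"] for entry in stream}
    # Step 2: fetch those users in one batch (with a real driver, one $in query).
    fetched = {uid: users_by_id[uid] for uid in wanted if uid in users_by_id}
    # Step 3: stitch the two result sets together in application code.
    return [{**entry, "author": fetched.get(entry["author_id"])} for entry in stream]


joined = join_stream_with_users(activity_stream, users)
```

Each user is stored once and only referenced by id, so nothing has to be duplicated inside the stream.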

The MongoDB docs tell you what it’s good at, without emphasizing what it’s not good at.

Did you even read the MongoDB docs? At least go through this and this. And that’s just the fundamentals.

Cache Invalidation As A Service

What if there is no backing store? What if you skip step 1? What if the cache is all you have?

When MongoDB is all you have, it’s a cache with no backing store behind it. It will become inconsistent. Not eventually consistent — just plain, flat-out inconsistent, for all time. At that point, you have no options. Not even a nuclear one. You have no way to regenerate the data in a consistent state.

Here the author uses two different scenarios to prove the point. In the first case, there is an inconsistent in-memory cache and a persistent backing store like MySQL. In the second case, there’s just a cache. Here I assume the author meant that MongoDB acts as both the cache and the backing store.

In Write Request part 1 you update the MongoDB data that acts as the backing store (it is persisted). In Write Request part 2 you update the MongoDB data that acts as the cache.

What happens if that step 2 background job fails partway through?

You still have the data saved in the user’s posts document. You can always delete the entire activity stream record from your cache and regenerate it from the posts.
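A rough sketch of that recovery path, again in Python with in-memory stand-ins rather than a real database. Everything here is hypothetical: `posts` plays the persisted per-user posts, `stream_cache` plays the denormalized activity stream, and `fan_out=False` simulates the part 2 background job failing.

```python
# Hypothetical in-memory stand-ins: `posts` plays the persisted store,
# `stream_cache` plays the denormalized activity stream.
posts = {}          # author -> list of that author's posts
stream_cache = {}   # reader -> cached activity stream
follows = {"bob": ["alice"], "carol": ["alice", "bob"]}  # reader -> followed


def write_post(author, text, fan_out=True):
    # Part 1: save the post in the author's own document (the source of truth).
    seq = sum(len(p) for p in posts.values())  # global write order, for sorting
    post = {"author": author, "text": text, "seq": seq}
    posts.setdefault(author, []).append(post)
    # Part 2: copy it into the followers' streams; this is the step that may fail.
    if fan_out:
        for reader, followed in follows.items():
            if author in followed:
                stream_cache.setdefault(reader, []).append(post)
    return post


def rebuild_stream(reader):
    # Throw away the cached stream and regenerate it from the saved posts.
    stream_cache.pop(reader, None)
    rebuilt = [p for a in follows.get(reader, []) for p in posts.get(a, [])]
    rebuilt.sort(key=lambda p: p["seq"])
    stream_cache[reader] = rebuilt
    return rebuilt


write_post("alice", "first")
write_post("bob", "second", fan_out=False)  # simulate part 2 failing
stale = list(stream_cache.get("carol", []))  # missing bob's post
fresh = rebuild_stream("carol")              # recovered from `posts`
```

The rebuild only works because part 1 always writes to `posts` first; the stream is derived data and can be thrown away at any time.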

What if there is no backing store? What if you skip step 1? What if the cache is all you have?

What? Did you even consider dropping the backing store? I mean, no PERSISTENT storage? So how can I even relate this scenario to the first one?

The End

An article with a title like that one will definitely get more clicks, but it also conveys the wrong idea. I suggest the author rename it to

"How blind the Diaspora team was (is)?" (JK)