I have a client looking for the skinny on MongoDB: pros, cons, best-fit use cases, best practices. If anyone has hands-on, objective experience, I'd like to discuss. I'm also interested in other Big Data experience around document and key-value stores.
MongoDB is schema-free and has no referential integrity whatsoever, so it is best suited to scenarios that deal with facts rather than transactions (although recent releases do support multi-document transactions).
If your data consists mainly of facts and its volume is very high (terabytes or more), then MongoDB could be your engine. You can replicate your data, adding redundancy to your persistence and improving high availability. You can also shard your data, choosing criteria to divide it so queries can be directed at just the dataset you need. Picking a meaningful shard key is perhaps one of the most important decisions when you're dealing with a cluster, and it's crucial to get it right so that each query goes to the shard most likely to contain the data being fetched; otherwise, the engine will have to look for your data on all of the shards.
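To make the targeted-vs-broadcast distinction concrete, here is a minimal sketch of the routing decision a cluster makes. The shard key, field names, and queries are all hypothetical, and the logic is deliberately simplified:

```python
# Sketch: how a shard key determines query routing (illustrative only;
# the shard key and the queries below are hypothetical examples).

SHARD_KEY = ("customer_id", "created_at")  # a compound shard key

def is_targeted(query_filter: dict) -> bool:
    """A query can be routed to specific shards only if it constrains
    the shard key starting from its first field; otherwise the router
    must broadcast it to every shard (scatter-gather)."""
    return SHARD_KEY[0] in query_filter

# Routed only to the shard(s) owning customer 42's chunks:
targeted = {"customer_id": 42, "status": "open"}
# Broadcast to all shards, since the shard-key prefix is absent:
scattered = {"status": "open"}

print(is_targeted(targeted))   # True
print(is_targeted(scattered))  # False
```

In practice the router (mongos) does this chunk lookup for you; the point is that queries missing the shard-key prefix pay the scatter-gather cost.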
MongoDB has a lot of drivers for programmatic access, covering the most common programming languages: C/C++/C#, Java, Scala, Ruby, Kotlin, Python, Go, etc. For any of those drivers, when you ask for a connection, you actually get a pool of connections, so the driver handles the connect/disconnect/reconnect policy for you.
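A driver-agnostic sketch of what that pooling means in practice. This is not any driver's actual API, just an illustration of the pattern the drivers implement for you (lazy creation, check-out/check-in, reuse instead of reconnect):

```python
import queue

class ConnectionPool:
    """Minimal sketch of the pooling a MongoDB driver does for you:
    connections are created lazily, checked out for one operation,
    and returned for reuse instead of being closed."""

    def __init__(self, factory, max_size=10):
        self._factory = factory            # creates a new "connection"
        self._idle = queue.LifoQueue(max_size)
        self._created = 0
        self._max = max_size

    def acquire(self):
        try:
            return self._idle.get_nowait()  # reuse an idle connection
        except queue.Empty:
            if self._created < self._max:
                self._created += 1
                return self._factory()      # lazily open a new one
            return self._idle.get()         # block until one is free

    def release(self, conn):
        self._idle.put(conn)

# Usage with a stand-in "connection" object:
pool = ConnectionPool(factory=lambda: object(), max_size=2)
c1 = pool.acquire()
pool.release(c1)
c2 = pool.acquire()
print(c1 is c2)  # True: the connection was reused, not reopened
```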
MongoDB can be configured to handle a variety of scenarios, such as:
- scenarios where write throughput is critical: run a standalone MongoDB instance and turn off the op-log feature
- read consistency, even in a cluster: configure the write concern so that API calls return only once a write has been acknowledged by the primary node and at least one secondary, or by all secondaries
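The durability/latency knob mentioned above is expressed as a write-concern document. These are the standard shapes a driver sends; the three-member replica set is an assumed example:

```python
# Write-concern documents, as a driver would send them. Which one you
# choose trades latency against durability.

fire_and_forget  = {"w": 0}                       # no acknowledgement at all
ack_primary_only = {"w": 1}                       # acknowledged by the primary
ack_majority     = {"w": "majority", "j": True}   # majority of members, journaled
ack_all_members  = {"w": 3}                       # every member of a 3-node replica set

# With pymongo, for instance, this would be applied roughly as:
#   from pymongo.write_concern import WriteConcern
#   coll = db.events.with_options(write_concern=WriteConcern(w="majority"))
```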
Other interesting features:
- Reference (lookup) tables can be persisted in collections, and as long as those collections remain un-sharded, you can simulate a join operation using the $lookup stage of the aggregation framework, which is very, very powerful.
- You can set up a Map/Reduce process from its internal query language (Wow! I mean, Wow!)
- When you configure a cluster, any operation triggered from its internal query language will be executed in parallel on all of the nodes (again... Wow!)
- There are several types of indexes available. One curious thing: in compound (multi-field) indexes, the order in which the fields are declared matters at query time. In some engines, the mere presence of a field in an index, regardless of its position, is enough for the index to be used; in MongoDB, if the field used as a filter is not a prefix of the index, the query falls back to what is effectively a full collection scan.
- Date/time fields are always persisted in UTC
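To illustrate the $lookup join mentioned in the list above: the pipeline below is the real aggregation shape, while the collections and the tiny in-memory evaluator are illustrative stand-ins for what the server does:

```python
# A $lookup stage simulating a left outer join between an (un-sharded)
# orders collection and a customers reference collection.

pipeline = [
    {"$lookup": {
        "from": "customers",          # the reference collection
        "localField": "customer_id",
        "foreignField": "_id",
        "as": "customer",
    }}
]

orders = [{"_id": 1, "customer_id": 10, "total": 99}]
customers = [{"_id": 10, "name": "ACME"}]

def run_lookup(stage, left, right):
    """Evaluate a single $lookup stage in memory, the way the server
    would: attach matching right-side docs as an array field."""
    spec = stage["$lookup"]
    out = []
    for doc in left:
        matches = [r for r in right
                   if r[spec["foreignField"]] == doc[spec["localField"]]]
        out.append({**doc, spec["as"]: matches})
    return out

joined = run_lookup(pipeline[0], orders, customers)
print(joined[0]["customer"][0]["name"])  # ACME
```

On the server you would run the same pipeline with `db.orders.aggregate(pipeline)`; unmatched left-side documents simply get an empty array, which is why this behaves as a left outer join.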
And so on, and so on.
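The compound-index prefix rule from the list above can be sketched as follows. This is a simplified model (it only decides whether a filter is exactly a contiguous prefix of the index), and the field names are hypothetical:

```python
# Simplified sketch of MongoDB's compound-index prefix rule: an index
# on (region, city, status) can serve filters on region, or
# region+city, or all three, but a filter that skips "region" cannot
# use the index and degenerates into a full collection scan.

INDEX = ("region", "city", "status")

def can_use_index(filter_fields: set, index=INDEX) -> bool:
    """Simplified: True when the filtered fields form a contiguous
    prefix of the index, starting from its first field."""
    prefix_len = 0
    for field in index:
        if field in filter_fields:
            prefix_len += 1
        else:
            break
    return prefix_len > 0 and filter_fields <= set(index[:prefix_len])

print(can_use_index({"region"}))          # True
print(can_use_index({"region", "city"}))  # True
print(can_use_index({"city"}))            # False -> collection scan
```

The real planner is more nuanced (it can still use an index partially for some non-contiguous filters), but the prefix rule is the one that bites: queries that never touch the index's leading field get no help from it.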
MongoDB best fits non-transactional scenarios where it's important to cross-query facts over time. It supports an extremely high volume of data while remaining steady and solid. It's not a key-value engine: for such scenarios, Redis or Cassandra will fit better when querying over the network; for embedded apps, Tokyo Cabinet could be a nice choice as a key-value store.
I hope this will be helpful.
Thanks for your time.
Is one more resilient than the other?
How supportive are the communities?
Which use cases are better for each?