MapR Review

Because of their POSIX-compliant file system, they can support read/write files rather than the WORM (write once, read many) storage of their competitors

What is our primary use case?

As an independent consultant, I support multiple use cases depending on the client's need(s).

How has it helped my organization?

To be clear, all three of the main vendors are capable of supporting multiple big data use cases. They differ in scalability, supported tools, and the stability of those tools.

MapR has MapR-DB, which supports most of HBase’s APIs except for coprocessors. Leaving them out was a conscious decision, and it helped to improve security and stability. MapR has also written most of their tools in C/C++ instead of Java. Since MapR supports external tools, solutions are somewhat portable, except for those that rely on things like Kerberos and HBase coprocessors.

Note: MapR does support Kerberos; however, there are solutions that rely on a combination of products which are still not very well supported.

What is most valuable?

MapR’s strength comes from their file system. Because they start with the raw disk, they are able to expose the storage through various APIs and can lock down and secure the file system better than the Apache derivatives, which store the file blocks on top of the Linux file system.

Because of MapR’s POSIX-compliant file system, they can support read/write files, whereas their competitors offer WORM (write once, read many) storage.
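
To make the read/write versus WORM distinction concrete, here is a minimal Python sketch of an in-place update using ordinary POSIX-style file calls. It uses a local temporary file as a stand-in. The assumption is that the cluster file system is mounted as a regular path (for example, over NFS), and the file contents and function names are hypothetical. On write-once storage, the same change would typically mean rewriting the whole file.

```python
import os
import tempfile


def overwrite_in_place(path, offset, data):
    """Overwrite bytes at a given offset without rewriting the rest of the file."""
    # "r+b" opens an existing file for reading and writing without truncating it.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


def demo():
    """Create a small record, patch one field in place, and return the result."""
    # Hypothetical record; a temp file stands in for a path on a mounted volume.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"status=PENDING;id=42\n")
        # Replace the 7-byte value "PENDING" with a 7-byte padded "DONE".
        overwrite_in_place(path, len(b"status="), b"DONE   ")
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Only the seven bytes of the status field are touched; the rest of the file stays as it was on disk.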

In addition, they remodeled how blocks are tracked and stored. As a result, the NameNode isn’t a single point of failure, and you can store several orders of magnitude more small files before a volume runs into issues.

Note: This is a cluster volume, not the entire cluster. Fill up the NameNode with lots of small files on an Apache release, and you lose the entire cluster.
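
A rough back-of-the-envelope sketch shows why small files hurt a NameNode-based design. It assumes the commonly quoted rule of thumb that each file and each block costs on the order of 150 bytes of NameNode heap. That figure is an assumption used for illustration, not a measured value, and the function name is hypothetical.

```python
# Assumption: rule-of-thumb heap cost per namespace object (file or block).
BYTES_PER_OBJECT = 150


def namenode_heap_bytes(num_files, blocks_per_file=1,
                        bytes_per_object=BYTES_PER_OBJECT):
    """Estimate NameNode heap needed to track num_files files.

    Each file contributes one file object plus one object per block.
    Small files typically occupy a single block each.
    """
    objects = num_files * (1 + blocks_per_file)
    return objects * bytes_per_object


if __name__ == "__main__":
    files = 100_000_000  # one hundred million small files
    gigabytes = namenode_heap_bytes(files) / 1e9
    print(f"{files:,} small files -> about {gigabytes:.0f} GB of heap")
```

Under these assumptions the metadata for the whole namespace has to fit in the memory of a single process, which is why the file count, rather than the data volume, becomes the limit.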

What needs improvement?

All products have room for improvement. Because of MapR-FS, they have an incredible advantage in terms of stability, cross cluster replication, and extensibility to create products like MapR-DB (Binary and JSON tables) and MapR Streams. 

One weakness for MapR is the Kerberos support. This is not much of an issue unless you rely on products in a secure environment which only support Kerberos. In practice, this mostly comes up with HBase. The lack of coprocessors is also an issue because it limits MapR-DB in terms of server-side extensibility. While MapR can improve on this, and is doing so with their next release(s), if they were to implement coprocessors, the implementation would most likely not be compatible with Apache’s release.

For how long have I used the solution?

More than five years.

What do I think about the stability of the solution?

Outside of human error, MapR is probably the most stable of the major releases. One thing is true of all releases: if you push the cluster to extremes or attempt to solve a use case which is on the fringe of what the framework is capable of, you will run into trouble. Customers have to set their level of expectations.

What do I think about the scalability of the solution?

No issues. However, when you reach a certain scale, you tend to hit hardware limits (disk IO and networking). This is true of all releases. MapR scales the best and, unlike the Apache releases, does not require Federation.

As we look towards future design, scalability becomes less of an issue. You will see more of a movement towards separate storage/compute models, and this will lead to multiple data lakes rather than a single large ocean. This is also due to potential data governance rules and to corporate enterprise structure.

Which solution did I use previously and why did I switch?

I support multiple vendors and their solutions.

How was the initial setup?

In the big data space, there is a constant level of complexity which requires having knowledgeable staff. It's never as simple as one would think.

What's my experience with pricing, setup cost, and licensing?

Caveat Emptor!

There are three main vendors of 'Hadoop' and each has its strengths and weaknesses. It's possible to build similar solutions on top of their platforms; however, you have to consider the ease and cost of development and maintenance.

In all of the solutions, enterprises get into trouble when they do not have trained and competent staff. It's important to have a blended staff and to not always jump on the latest technology. Take your time, do your homework, and then make and live with your decision.

Disclosure: My company has a business relationship with this vendor other than being a customer: As an independent consultant, I have relationships with all of the major vendors in the 'Big Data' space. I support various clients, who use various vendors in this space. I am a real user with real world experience and have openly expressed my opinions on the technology. I am also the Founder of CHUG (Chicago area Hadoop User Group) and have focused on this space since 2009.