What is most valuable?
Cloudera Hadoop provides the scalable data architecture organizations need to manage growing data volumes, but it lacks an intuitive GUI for business users. Oracle Big Data Discovery (BDD) lets business users explore and analyze the data in that Hadoop cluster to uncover data of interest.
The scalable data storage of Hadoop is the most critical feature, but without Oracle Big Data Discovery that data is difficult for business users to access without significant IT support. BDD relies on Spark and Hive to function, so those are the next most valuable features of Cloudera Hadoop for me.
How has it helped my organization?
Using Oracle Endeca Information Discovery has enabled our clients to search and explore unstructured data so they can answer unexpected questions as soon as they come up. This has been a game changer, since it dramatically reduces the delay when new data sources are introduced or when new business questions need to be answered.
Hadoop as a big data repository is difficult for non-technical users to access but provides a potential gold mine of data insight. Oracle Big Data Discovery's ability to let business users explore that large volume of data gives them a significant advantage.
What needs improvement?
Oracle Big Data Discovery allows business users to interact with data in Hadoop and to transform it into a different format on the Hadoop cluster. This proprietary format can sit within the Hadoop cluster, but is not fault tolerant and query load is not distributed using native Hadoop technologies.
The more BDD can leverage those technologies, the more robust and responsive its analytics will be. The second point is that users identify and transform data of interest themselves, so they do not need to wait on IT development. However, the transformations available are fairly simple.
Adding R at some point as a user-driven interface within Oracle Big Data Discovery would allow users to do more advanced data analysis. Currently this requires Hadoop programming, which is not a barrier for technical staff but is not accessible to business users.
There are some details of BDD's configuration that should be improved as the product matures. The main technical constraint is that Oracle Big Data Discovery is designed to work with subsets of the data on Hadoop. Although the number of records can be increased, performance suffers.
This means that if you have one billion records in your Hadoop cluster, you might only ingest a few million for analysis at a time. The positive side is that analysis can be throwaway, so you can repeat this as many times as you need.
For how long have I used the solution?
I've used BDD for about six months, since v1.0 was released, and its predecessor, Oracle Endeca Information Discovery, for approximately four years. Cloudera Hadoop, which I've used for just over a year, sits underneath Oracle Big Data Discovery. BDD gives business users a web browser interface to the Hadoop cluster, which fills what I think is a critical gap in the Hadoop offering. It leverages Hive and Spark to let users search, explore, and visualize data from a Hadoop cluster. This is the area we are most engaged with as a professional services company.
What do I think about the stability of the solution?
Oracle Big Data Discovery depends on either Cloudera or Hortonworks Hadoop, both of which are stable and scalable base deployments.
How are customer service and technical support?
As with most big corporations, engaging with Oracle on technical support can be challenging. Because Big Data Discovery is a new product that seems to have a higher priority, I hope Oracle's support and development of it will improve on what we experienced with Endeca Information Discovery.
Which solution did I use previously and why did I switch?
Tableau is very popular as a tool for business users to visualize data. However, Oracle Big Data Discovery's built-in text enrichment, native support for unstructured data, and very robust search engine gave it advantages for data discovery that outweighed Tableau's strengths in data visualization. Solr provided excellent search, but did not make text enrichment or interactive visualizations as easy.
How was the initial setup?
The initial setup is relatively straightforward, but because this is a new product in the market, the support community is still immature and only a few organizations have any real product knowledge.
What about the implementation team?
As a professional services company, we do the implementation work on client sites ourselves. Regardless of who does the implementation, make sure it aligns with how your organization strategically intends to use the product. Be prepared to include training as part of the implementation so your target audience can take advantage of it.
What's my experience with pricing, setup cost, and licensing?
Licensing costs are currently very competitive as Oracle looks to establish a market presence for BDD. Organizations that are not seeing a strong return on their Hadoop investment, or that are struggling to make that data accessible, should take advantage of the early pricing options.
What other advice do I have?
Aim to roll it out to a large cross section of your business users and structure your procedures to encourage throwaway analytics. You can create traditional dashboards and static reports with it, but these depend on a fixed data structure, which makes them inherently inflexible to change. Combining Hadoop's strength in storing unstructured data with BDD's ability to explore, search, and visualize that data means users can start exploring their data quickly.
Which version of this solution are you currently using?