What is most valuable?
Scale-out, analytical functions, ML.
How has it helped my organization?
We are an HP partner. A SQL-based compute platform like Vertica enables far less human overhead in operations and analytics.
What needs improvement?
More ML, both data prep, models, evaluation and workflow. Improved support for deep analytics/ predictive modelling with machine learning algorithms. This area of analytics need a stack of functionality in order to support the scenario. The needed functionality includes:
- Data preparation. Scaling, centering, removing skewness, gap filling, pivoting, feature selection and feature generation
- Algorithms/models. Non-linear models in general. More specifically, penalized models, tree/rule-based models (incl. ensambles), SVM, MARS, Neural networks, K-nearest neighbours, Naïve bayes, etc.
- Support the concept of a “data processing pipeline” with data prep. + model. One would typically use “a pipeline” as the overall logical unit used to produce predictions/scoring.
- Automatic model evaluation/tuning. With algorithms requiring tuning, support for automated testing of different settings/tuning parameters is very useful. Should include (k fold) cross validation and bootstrap for model evaluation
- Some sort of hooks to use external models in a pipeline i.e. data prep in Vertica + model from Spark/R.
- Parity functionality for the Java SDK compared to C++. Today the C++ SDK is the most feature rich. The request is to bring (and keep) the Java SDK up to feature parity with C++.
- Streaming data and notifications/alerts. Streaming data is starting to get well supported with the Kafka integration. Now we just need a hook to issue notifications on streaming data. That is, running some sort of evaluation on incoming records (as they arrive to the Vertica tables) and possibly raising a notification.
For how long have I used the solution?
What was my experience with deployment of the solution?
What do I think about the stability of the solution?
What do I think about the scalability of the solution?
Which solution did I use previously and why did I switch?
Postgresql, MySQL, SQL Server. Switched because of scalability and reliability, analytics functionality. V being a better engineered product.
How was the initial setup?
Straightforward. Good docs helped a lot.
What's my experience with pricing, setup cost, and licensing?
Its reasonably priced for non-trivial data problems.
Which other solutions did I evaluate?
Yes, Hadoop / Spark, SQL Server.
What other advice do I have?
See additional functionality above.
Which version of this solution are you currently using?