Pervasive is best known for its data integration products but has recently been developing and releasing a series of products focused on analytics. RushAnalyzer is a combination of the KNIME data mining workbench (reviewed here) and Pervasive DataRush, a platform for parallelization and automatic scaling of data manipulation and analysis (reviewed here).
In the combined product, the base KNIME workbench has been extended for faster processing of larger data sets (big data), with a particular focus on use by analysts without any skills in parallelism or Hadoop programming. Pervasive has added parallelized KNIME nodes that include data access, data preparation and analytic modeling routines. Because KNIME supports extensions, you still define the modeling process in KNIME’s interface, but these processes can use the DataRush nodes to access and process larger volumes of data, read and write Hadoop-based data, and automatically take full advantage of multi-core, multi-processor servers and clusters (including operations on Amazon’s EMR).
The parallelized and distributable DataRush operators include:
- I/O – JDBC, Delimited text, Log files, HDFS, HBase, Sparse data and PMML
- Analytics – Association rules, Classifiers (including Decision Trees, Naïve Bayes and SVM learners and predictors), Clustering (Recommenders and k-Means), Feature selection and Regression
- Transformations – Aggregate, filter, manipulate
- Data Profiling – Binning, percentiles, data quality metrics, pass/fail rules
- Data Matching – Fuzzy matching, encoding, clustering
DataRush itself is also extensible, so users can add their own operators, which can then be used just like other KNIME nodes.
All these nodes support multi-core and multi-processor environments and push processing either to the local desktop machine or to available servers/blades. If data is being read from or loaded into Hadoop clusters, then the RushAnalyzer nodes execute within the nodes of the Hadoop cluster itself, pushing the function out to the Hadoop environment. While an all-Pervasive process executes fastest, the product can support mixed KNIME flows where not everything is a Pervasive node. One particular feature is that such a mixed environment can stream data between nodes without creating a local copy, something not available in the base KNIME product. The acceleration provided by this feature opens up more options to stage the data for multiple iterations on multiple models.
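The difference between streaming data between nodes and creating a local copy at each step can be illustrated with a small, generic sketch. This is not the DataRush or KNIME API: the function names (`read_rows`, `filter_rows`, `total_amount`) and the row layout are hypothetical, and Python generators stand in for streaming nodes purely for illustration.

```python
from typing import Iterable, Iterator


def read_rows(n: int) -> Iterator[dict]:
    """Hypothetical source node: yields rows one at a time."""
    for i in range(n):
        yield {"id": i, "amount": float(i % 7)}


def filter_rows(rows: Iterable[dict], min_amount: float) -> Iterator[dict]:
    """Hypothetical filter node: passes rows through as they arrive."""
    for row in rows:
        if row["amount"] >= min_amount:
            yield row


def total_amount(rows: Iterable[dict]) -> float:
    """Hypothetical aggregate node: consumes rows and sums one field."""
    return sum(row["amount"] for row in rows)


def staged_total(n: int, min_amount: float) -> float:
    """Each step materializes a full local copy before the next one runs."""
    staged = list(read_rows(n))                    # copy 1: all rows
    kept = list(filter_rows(staged, min_amount))   # copy 2: filtered rows
    return total_amount(kept)


def streamed_total(n: int, min_amount: float) -> float:
    """Rows flow from node to node; no intermediate copy is built."""
    return total_amount(filter_rows(read_rows(n), min_amount))


if __name__ == "__main__":
    print(staged_total(100, 3.0), streamed_total(100, 3.0))
```

Both versions produce the same result; the streamed version simply never holds the whole data set in memory between steps, which is the general idea behind passing data from node to node without an intermediate copy.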
Pervasive is one of the vendors listed in our Decision Management Systems Platform Technologies report.
4/29/13: Pervasive RushAnalyzer is now called Actian RushAnalytics.