Written by Nigel Magson & Andy Hindson
We were delighted at being given the opportunity to review KXEN. In analyst terms it’s a bit like being given the chance to drive a Ferrari, so not one we were going to turn down. Andy has been using some of the KXEN modules for a number of years, whilst the rest of the team are used to analytic tools such as SAS, SPSS, FastStats or Smartmodeller. Read on for our overview of one of their key products, InfiniteInsight, and forgive us if we sound too much like Jeremy Clarkson.
KXEN (originally Knowledge Extraction Engines) is a market-leading predictive analytics and data mining software business. It has sales offices around the world and is headquartered in San Francisco, with research and development based in Paris. It is recognised by the key industry analysts, including Forrester and Gartner.
The main markets they service are Financial Services, Telecoms and Retail, where customer and data volumes are high and consequently the opportunity to add value through predictive analytics is greatest. These are also the organisations which tend to have large analytical teams and similarly sized budgets to support them and the tools they require. Consequently, KXEN’s product development direction has been in support of these key markets, including the development of the (Modelling) Factory: the deployment and configuration of automated models that run in real time on an organisation’s systems infrastructure. They readily admit they’re not the tool for small datasets.
KXEN positions its tools as key enablers of maximum productivity for data-mining specialists and business analysts alike. More recently they have developed InfiniteInsight Genius, which they claim puts data mining capability into the hands of Marketers by simplifying the modelling process through a GUI that guides the Marketer through each step.
KXEN also provides an API for their predictive analytical engine which has been widely adopted by Marketing Services Providers (MSPs), including Alterian, Experian and Neolane, and the database and business intelligence community, including Oracle, Teradata and Sybase.
InfiniteInsight Product Set
InfiniteInsight is the name that describes and prefixes all the KXEN family of modelling products. The version available for this review (version 6.0.0) had the following modules:
The KXEN modules that will not be reviewed here are:
The InfiniteInsight Scorer module is self-explanatory: it enables the KXEN User to apply the model(s) they have built in a number of ways, including scoring directly on the database, or enterprise-class deployment, which integrates the score back into an Organisation’s operational systems, e.g. a Call Centre or Web Site.
KXEN has invested in this area and supports all the major databases (Teradata, Sybase, Netezza, SQL Server, IBM DB2, etc.) and the main statistical modelling packages (SAS and SPSS).
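To make the idea of “applying a score” concrete, here is a minimal sketch of what deployed scoring often boils down to: a set of exported model coefficients applied to a new record. The coefficient names and values are invented for illustration and are not KXEN’s actual export format.

```python
import math

# Hypothetical coefficients exported from a trained propensity model,
# to be applied in-database or in application code (e.g. a Call Centre app)
coefficients = {"intercept": -2.0, "recency": -0.03, "frequency": 0.5}

def score(record):
    """Return a propensity score in [0, 1] for one customer record."""
    z = coefficients["intercept"] + sum(
        coefficients[k] * v for k, v in record.items())
    return 1.0 / (1.0 + math.exp(-z))  # logistic transform of the linear score

p = score({"recency": 10, "frequency": 4})
```

In enterprise deployment the same arithmetic is typically generated as SQL or application code so it can run where the operational data lives.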
InfiniteInsight Factory is primarily aimed at the high end of the predictive analytics market and enables the full automation or industrialisation of modelling processes which are configured and deployed to run 24/7.
KXEN customers deploy InfiniteInsight for a variety of predictive analytics tasks, including optimisation of the customer lifecycle: acquisition, cross-sell and up-sell campaigns, and customer retention. Additionally, in Financial Services it can play a key role in reducing risk and fraud, while within Telecoms the focus is churn management, social network analysis and cross-sell of further services.
Look and Feel
We felt that the InfiniteInsight GUI looks dated and would benefit from an overhaul, with due consideration given to incorporating a sleeker modelling workflow. Navigation isn’t always straightforward, as the menu naming isn’t obvious. That said, as with most software, once you know what you’re doing you forget about this. On the plus side, InfiniteInsight does provide a rich set of features and options that can be tailored and tweaked to address the nuances presented by different data scenarios.
Start InfiniteInsight and you’re presented with the Modelling Assistant pictured above.
KXEN has recently beefed up the Explorer module, which enables the Analyst to create the analysis or modelling dataset. This includes some perplexing function names (probably a legacy of the French translation?), such as data manipulation, which is the functionality to merge datasets together, and Perform an Event Log Aggregation, which is actually the process of aggregating child data up to the parent table, e.g. summing transaction values for customers into a new table. The latter does possess some powerful options for creating date-based aggregations across time periods that may be of interest for modelling: years, quarters, months and days. Again, this is fine once you’ve got used to it.
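For readers unfamiliar with the idea, an event log aggregation of this kind can be sketched in a few lines of pandas. The transaction data here is invented, but the shape of the operation, rolling child (transaction) rows up to the parent (customer) level with a date-based breakdown, is the same as what Explorer generates.

```python
import pandas as pd

# Hypothetical transaction log: one row per purchase (the child table)
tx = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2, 3],
    "tx_date": pd.to_datetime(
        ["2012-01-15", "2012-03-02", "2012-02-10",
         "2012-02-28", "2012-04-01", "2012-03-20"]),
    "amount": [25.0, 40.0, 10.0, 15.0, 30.0, 50.0],
})

# Aggregate child rows up to the customer (parent) level,
# with a quarterly breakdown, i.e. a date-based aggregation
tx["quarter"] = tx["tx_date"].dt.to_period("Q").astype(str)
per_quarter = tx.pivot_table(index="customer_id", columns="quarter",
                             values="amount", aggfunc="sum", fill_value=0)
per_quarter["total_spend"] = per_quarter.sum(axis=1)
```

Each row of `per_quarter` is now one customer with a column per quarter, ready to join onto a modelling dataset as explanatory variables.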
It should be noted that data manipulation creates SQL code that has to be executed on the source database, so creation of the analysis dataset is external to KXEN InfiniteInsight, although the actual modelling process is internal.
There are some useful features available within Explorer for defining multiple modelling variables to build concurrently through the use of wildcards, but we suspect that many Analysts will opt to prepare their modelling dataset using different tools.
The Social component naming is slightly misleading, as it has nothing to do with social media such as LinkedIn, Facebook or Twitter. It provides functionality to create links and map relationships within transactional data and display these networks of influence. This is very powerful and primarily aimed at the Telecoms market, where the nature and volume of the data supports its use, but we could see that it would have applications to other areas, such as social media, if the supporting data were available.
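The underlying idea of mapping links within transactional data can be illustrated with a toy sketch (the call records are invented, and this is a crude simplification of what the Social module actually computes): build a link map between subscribers from call-detail records, then use connectivity as a rough proxy for influence.

```python
from collections import defaultdict

# Hypothetical call-detail records: (caller, callee) pairs
calls = [("A", "B"), ("A", "C"), ("B", "C"),
         ("D", "A"), ("E", "A"), ("C", "E")]

# Build an undirected link map between subscribers
links = defaultdict(set)
for caller, callee in calls:
    links[caller].add(callee)
    links[callee].add(caller)

# A subscriber's degree (number of distinct contacts) is a simple
# proxy for their influence within the network
degree = {node: len(contacts) for node, contacts in links.items()}
most_connected = max(degree, key=degree.get)
```

Real social network analysis layers far more on top of this (communities, leaders, followers), but the starting point is always this kind of link extraction from transactions.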
The Toolkit allows the User to review and visualise existing datasets (Open the Data Viewer), transfer a data source to another location or format (Perform a Data Transfer), or export a list of distinct values (List Distinct Values in Data Set). The final option is to generate statistics on the variables in the data set (Get Descriptive Statistics for a Data Set). Again, apart perhaps from the Descriptive Statistics option, we suspect that the Analyst will be using different tools for the basic data processing tasks available.
Modeler is the key module for defining and building the different types of modelling scenarios, and it is the focus of the rest of the review. The first task is to define the dataset you wish to build a model upon. If a modelling dataset has been previously defined then this can be selected, a new dataset can be chosen, or Explorer can be used to define the data. Each variable in the data set is assigned a type (nominal, ordinal or continuous), and this type dictates how the variable will be treated and encoded during modelling.
With the modelling dataset defined, the Analyst selects which type of model they want to create: Classification/(Ridge) Regression, Clustering, Time Series or Association Rules (Next Best Offer). We’ll use the regression model to illustrate how Modeler works: a target variable is selected along with a set of explanatory variables.
At the heart of Modeler is a set of algorithms which have harnessed Structural Risk Minimisation (SRM). SRM delivers efficiencies to the modelling process in terms of the Analyst’s time, as they do not need to worry about:
As a consequence the modelling process is faster and more efficient, as all the above are time consuming activities if checked and validated from first principles.
The question, therefore, is how does InfiniteInsight Modeler achieve this? The answer is that much of the modelling grunt work is automated. In a traditional approach, preparing data would account for 40-60% of the modelling time; here this is drastically reduced, as Modeler automatically encodes data as it is loaded. For instance, continuous variables are assigned to 20 “bins”, each containing 5% of the data; this is configurable but is generally left as is.
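This equal-frequency binning is a standard technique and easy to demonstrate. The sketch below uses invented data and pandas rather than KXEN’s own encoder, but shows the same idea: 20 bins, each holding roughly 5% of the records, regardless of how skewed the variable is.

```python
import numpy as np
import pandas as pd

# Hypothetical skewed continuous variable, e.g. customer income
rng = np.random.default_rng(0)
income = pd.Series(rng.lognormal(mean=10, sigma=0.5, size=1000))

# Equal-frequency ("quantile") binning: 20 bins, ~5% of records in each
binned = pd.qcut(income, q=20, labels=False)

counts = binned.value_counts()
```

Equal-frequency bins are robust to outliers and skew, which is one reason automated encoders favour them over fixed-width bins.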
The modelling dataset is also automatically split, based upon the cutting strategy, into Estimation, Validation and Test partitions. Various cutting strategies are available and the help provides guidance as to which may be most appropriate for your particular modelling scenario; there is also the option to configure your own specific cutting strategy.
Estimation generates the different models, Validation selects the best model among those generated, incorporating only those explanatory variables which make a significant contribution, and Test verifies the performance of the selected model on unseen data. This is the “hold-out” data and enables the calculation of the model’s robustness.
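A simple random cutting strategy can be sketched as follows. The proportions here are illustrative, not KXEN’s defaults: each row is independently assigned to one of the three partitions.

```python
import numpy as np

rng = np.random.default_rng(42)
n_rows = 10_000

# Illustrative random cutting strategy:
# ~75% estimation, ~20% validation, ~5% test (hold-out)
draw = rng.random(n_rows)
partition = np.where(draw < 0.75, "estimation",
            np.where(draw < 0.95, "validation", "test"))
```

The key point is that the Test partition is never touched during model building, so performance measured on it is an honest estimate of how the model will behave on new data.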
A model diagnostics report is created which, with experience, is easy to interpret; the two key measures are Ki (Model Quality) and Kr (Model Robustness). Additional information is on hand, including contributions by explanatory variables, so it is simple to see which variable is contributing most to the model, and, of course, the obligatory gains curve.
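For readers who haven’t met one, a gains curve is straightforward to compute from model scores and actual outcomes. The scores and outcomes below are invented; the mechanics, rank by score and accumulate the share of responders captured, are the standard ones.

```python
import numpy as np

# Hypothetical model scores and actual outcomes for 10 customers
scores  = np.array([0.9, 0.8, 0.75, 0.6, 0.55, 0.4, 0.35, 0.3, 0.2, 0.1])
actuals = np.array([1,   1,   0,    1,   0,    1,   0,    0,   0,   0  ])

# Rank customers by descending score, then take the cumulative
# share of all responders captured at each depth of the list
order = np.argsort(-scores)
cum_responders = np.cumsum(actuals[order]) / actuals.sum()
```

Here contacting the top 40% of customers by score captures 75% of responders; the further the curve sits above the diagonal (random targeting), the better the model.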
We didn’t pursue the analyst’s track test to see whether we could build a better model in a conventional tool; the point is really irrelevant. KXEN will build great models given the right data, and it will do this faster than a conventional analyst ever could. It can update and redeploy faster, and it can do it over multiple models. Case closed. If you only ever build a couple of models and can spend plenty of time over them, don’t bother with KXEN.
Picking up our car analogy: as one of the supercars of the analytics world, of course we want one, despite and because of its occasional quirks. KXEN software is an excellent addition to the customer insight team’s toolbox, where it would comfortably sit alongside traditional analytical software used to explore features discovered in InfiniteInsight, and tools to build and engineer modelling datasets. As an overall component of your CRM architecture, it will increase the speed and efficiency of the Modelling / Analysis team, enabling them to quickly understand whether a particular scenario can feasibly be modelled.
KXEN commercials place it at the high end of analytical engines, but if you are considering this then, a bit like a supercar, price won’t be your only interest. By recognising the commercial benefits of delivering powerful models fast, from speed and reliability of updating through to ease of deployment, a convincing business case could be constructed to counter the clearly higher price tag of such a solution. The purchase decision for InfiniteInsight won’t be made on the GUI or its data preparation functionality. What it will be judged on is its ability to build quality models that deliver efficiencies and demonstrable ROI. Modeler delivers this through the algorithms at the heart of the product, which have harnessed Structural Risk Minimisation (SRM). With KXEN software installed and operating, the business might find itself in a position to decide whether it wants more productivity from the existing modelling team or to achieve the same with a smaller team. Hopefully for the analysts out there, the former!
KXEN’s product positioning claims that it is aimed at marketers, but our experience is that marketers are not the users of such tools. We see this as still very much the domain of the marketing analyst or statistician, who will be tasked with the modelling work, as marketers do not generally have the data wherewithal or statistical background needed to execute the full modelling lifecycle. The software, however good, still requires the modelling scenario to be properly framed, after which an appropriate modelling universe needs to be defined along with the target variable and potential explanatory variables. All these actions require ‘hands-on’ data work to engineer a modelling dataset that is ‘fit for purpose’.
Of course, data is the key ingredient for any modelling work, and the quality of that data is paramount to the success of the model, as is the breadth and volume of data available within the organisation in an accessible form. If these criteria are met then KXEN InfiniteInsight will undoubtedly provide the quickest answers and deliver quality models.
Addendum - Structural Risk Minimisation
The primary challenge for statisticians has been to build highly accurate models that are also reliable. This is particularly challenging with the advent of Big Data where there are high volumes of potential variables to use. Traditional statistics generally only produce an accurate model with a few variables, so an expert is needed to reduce the number of variables before building a model. The more variables there are, the more difficult it can be to build a reliable model. Only the expertise of the statistician or competent analyst guarantees the reliability of the model.
SRM was a breakthrough in mathematics and statistics made by the Russian mathematicians Vladimir Vapnik and Alexey Chervonenkis, which for the first time makes it possible to automatically build reliable and accurate models. In contrast to traditional statistical models, SRM models become more accurate and are still reliable as the number of variables is increased. Model Accuracy and Reliability are determined by the data, not by the expert. Certainly worth a further read if you are interested in how the engine works.