WhereScape RED Review
Significant architectural issues


Primary Use Case

Enterprise Data Warehouse and enterprise data staging.  Automation and documentation of all data warehousing processes.

Improvements to My Organization

It did not pass our usability, reliability, sustainability, or performance tests, so there was no potential benefit other than minor documentation advantages over our current automated documentation solution.

Valuable Features

Documentation has excellent potential: the tool can generate data lineage and a complete set of rich documentation, with output in multiple formats.

The tool supports multiple target update methods.

Support for multiple design methodologies including Data Vault.

Room for Improvement

In my opinion, the tool requires a much different architectural approach to be effective. It has fundamental architectural flaws that prevent it from being a practical solution for even moderate volumes of data.

The largest issue is that it violates basic database data-movement physics through its reliance on the infamous RBAR, or "row by agonizing row", cursor operations within T-SQL. Efficient database code, in contrast, applies operations to whole sets of data rather than row by row, using SQL joins and set-based queries. Note that this is a common problem with code generators: they end up trading scalability for flexibility. Ironically, this also makes them unable to take advantage of the much-touted "push down" operations that are one of their big selling points. The tool does avoid the traditional ETL lookup-cache component issue (sometimes referred to as blocking transactions) - that much is true. However, it ends up doing something similar, just inside the database and with no easy way to override it. Experienced ETL developers know not to use blocking transactions or RBAR if they can possibly help it, and with traditional ETL tools you either take advantage of built-in push-down capabilities or pre-join the data in views or data sources to avoid this kind of issue.
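To make the distinction concrete, here is a minimal T-SQL sketch using hypothetical stage_customer and dim_customer tables (illustrative only, not code produced by the tool): the RBAR version touches the target one row at a time from a cursor, while the set-based version applies the same change in a single joined UPDATE.

-- RBAR: one UPDATE per row driven by a cursor (the pattern to avoid)
DECLARE @key INT, @name VARCHAR(100);
DECLARE cur CURSOR FAST_FORWARD FOR
    SELECT customer_key, customer_name FROM stage_customer;
OPEN cur;
FETCH NEXT FROM cur INTO @key, @name;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE dim_customer SET customer_name = @name WHERE customer_key = @key;
    FETCH NEXT FROM cur INTO @key, @name;
END;
CLOSE cur;
DEALLOCATE cur;

-- Set-based: the same change expressed as one joined UPDATE over the whole batch
UPDATE d
SET    d.customer_name = s.customer_name
FROM   dim_customer AS d
JOIN   stage_customer AS s
  ON   s.customer_key = d.customer_key;

On any non-trivial row count the second form is dramatically faster, because the engine can optimize the whole set with joins and parallelism instead of issuing one statement per row.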

Other constraints include...

- Repository only supports a single target database

- No support for change data capture or delta detection - that must be custom coded (a hand-rolled delta-detection sketch follows this list)

- Must be used to stage data or it requires a redundant copy of staged data. This ended up more than doubling our storage

- Relies 100% upon target database server with no ability to scale ETL to separate server

- Relies 100% on the target database engine to do all the work (along with everything else the database must do like reporting)

- Cannot acquire data from sources other than flat files. It relies upon linked servers (which many shops don't allow) or calling an ETL tool such as SSIS in the background for data movement. In that case you're still relying on an ETL tool for the ELT portion, so you're actually using two different tools - WhereScape plus a third-party ETL tool - instead of just one.
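To show what "custom coded" delta detection meant in practice, here is a hedged T-SQL sketch with illustrative table and column names (not code generated by the tool): a hash of the incoming attributes is compared to the stored hash, and only new or changed rows are applied.

-- Hand-rolled delta detection / upsert sketch (illustrative names only)
MERGE dim_customer AS tgt
USING (
    SELECT customer_key,
           customer_name,
           city,
           HASHBYTES('SHA2_256', CONCAT(customer_name, N'|', city)) AS row_hash
    FROM   stage_customer
) AS src
    ON tgt.customer_key = src.customer_key
WHEN MATCHED AND tgt.row_hash <> src.row_hash THEN
    UPDATE SET customer_name = src.customer_name,
               city          = src.city,
               row_hash      = src.row_hash
WHEN NOT MATCHED BY TARGET THEN
    INSERT (customer_key, customer_name, city, row_hash)
    VALUES (src.customer_key, src.customer_name, src.city, src.row_hash);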

Specific list of areas for improvement:

* Performance (this is the area that needs the most improvement)

* Efficient storage utilization

* Logging

* Count validations (data movements don't validate row counts, which is 101-level basics for preventing data leaks; see the sketch after this list)

* Scheduler reliability and transparency

* Ability to separate data acquisition/staging processing from target schema population

* Providing automatic record count validations

* Support for change data capture

* Support for in-line transformational procedure language that extends beyond database procedures

* Documentation lineage accuracy and completeness

* Support for sharing metadata repository across multiple targets

* Improvements in staging architecture requiring fewer copies of source data

* Supporting transparency into post-generation customizations ("post-gens")

* Providing encapsulated pre- and post-events for custom code that doesn't get wiped out when you re-generate supported code from the tool.

* Better support for task orchestration in a top-down way that can be easily understood.

* Support for direct parameter passing from job schedulers.
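As an illustration of the count-validation point above, the kind of guard we expected the tool to generate is only a few lines of T-SQL. This is a sketch with hypothetical table names, not the tool's code:

-- Post-load sanity check: compare rows extracted vs. rows landed and fail loudly on a mismatch
DECLARE @src_rows BIGINT = (SELECT COUNT_BIG(*) FROM load_customer);    -- rows written by the extract
DECLARE @tgt_rows BIGINT = (SELECT COUNT_BIG(*) FROM stage_customer);   -- rows landed in staging

IF @src_rows <> @tgt_rows
BEGIN
    DECLARE @msg NVARCHAR(400) =
        CONCAT('Row count mismatch: extracted ', @src_rows, ' rows but staged ', @tgt_rows, ' rows.');
    THROW 50001, @msg, 1;   -- abort the job so the leak is caught immediately
END;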

Use of Solution

Trial/evaluations only.

Stability Issues

The scheduler often hangs with nothing to inform the user why it isn't working. Sometimes this is simply due to a developer starting a scheduler from their PC, which integrates with the repository but isn't obvious when looking at the shared scheduler. Other issues seemed to be resolved only by re-generating the process from scratch and hoping for a different result (which often there is), especially if the scheduler hangs. It is very easy to choose an incorrect option when generating processes, only to end up with something that just hangs with no error message. Then you're knee-deep in database procedural language trying to debug the tool's generated code.

The sheer volume of resources it consumes easily overwhelms a server and causes the database to freeze when resource constraints are exceeded. The tool generates an unexpected and seemingly unpredictable amount of staging data (both in and out of the database, depending on the data movement option chosen), consuming very large amounts of disk space with redundant copies of data. This tool will not promote friendly relations with your system administrators.

Scalability Issues

The product was deemed unscalable in terms of data storage and performance.

Customer Service and Technical Support

Customer Service:

Very good

Technical Support:

Very good, although this was a POC so I can't speak for production support after the product is purchased.

Previous Solutions

We currently use a traditional ETL tool that meets our requirements. The test scenario involved around 250 GB of source data with a resulting 50 GB (compressed) star schema target. Our ETL tool can talk to all of our sources, including multiple cloud sources, using SOAP and REST web services, whereas this tool is just a database procedure generator with no adapter capability. It instead relies on ETL tools to do that work, so it doesn't eliminate ETL tools; you end up supporting not one tool but two - in licensing, support, and expertise - which we expected to significantly increase total support costs.

Initial Setup

The tool relies on ODBC connections, which complicates security in an AD environment compared to using OLE DB.

It requires a developer to either access the server directly or set up file shares. When developing from a PC but running the scheduler on a server, it was often confusing to know which files need to be available from the PC versus from the server. The tool requires a great deal of attention to disk usage and location, especially when using native database load utilities, because it first unloads all the data into temporary files during the load, which can easily consume your server's available space in unpredictable ways. Getting the environment functioning just for the training class was unexpectedly difficult.

The metadata repository is tied to a single target database and cannot be shared with any other target, which prevents use beyond a siloed data mart. This means data lineage lives in a separate silo repository for each target.

Implementation Team

The software vendor was the lead implementer.

ROI

The ROI was expected to be very negative to the point of failure.

Pricing, Setup Cost and Licensing

Factor in the price of specialized consultants who know this product; they're hard to find and expensive.

Other Solutions Considered

Our existing traditional mainstream ETL tool and the previous one have been functioning without problems, but the concept of a self-documenting tool that provides a common framework and shared metadata repository was a compelling alternative. This tool was already in use in our shop in a limited way, so it raised the question of whether it could serve an expanded role.

Advice: Before choosing any tool, do your homework, compare features to competitors, and test the tool in a real-life volume scenario. Software is never what it's advertised to be, from any vendor. Don't take others' word for it, especially if they have a vested interest in you using the tool. I know of testimonials about this tool and others that are based on wishful thinking and not reality. Caveat emptor, my friends. So the real question is: how far off is it, and does it have show-stoppers that will prevent its use in your environment? Fail to take this precaution at your and your organization's peril.

Other Advice

The compelling selling point for this and similar "data mart in a box" tools is that they provide pre-built frameworks and documentation. The opportunity is that staff with little data warehousing experience can get up to speed easily and not have to worry about building logging, orchestration, and error frameworks, or even know much about data warehouse design.

Unfortunately, this tool's basic architecture has severe flaws which make it likely impractical for most real-world marts or data warehouses. It has significant cons in the following areas:

- Metadata repository is specific to a single target database (cannot be shared across multiple targets)

- Unreliable lineage

- Lacks the capability to interface with application APIs

- Lacks out-of-box CDC control capabilities

- Poor/outdated GUI (looks like early 1990s)

- Lack of available resources in the marketplace who can operate this tool

- Lack of customer base and information available

- Primitive logging and error handling

- Forced duplication of staging data multiple times

- Inability to separate source data capture from targets (tightly couples sources with targets)

- Stores all objects in a single target database, causing a number of issues such as backup/recovery complications, fragmentation, and limited options for organizing objects

Disclosure: I am a real user, and this review is based on my own experience and opinions.

10 Comments

Thanks for this detailed description of your PoC with Wherescape Red, very informative.

Like (0) - 14 April 16

Interesting

Like (0) - 28 April 16
C Lewis (Real User, Top 5)

I am sorry to hear that your POC did not yield the results you were looking for. Having said that it occurs to me that perhaps you were looking for a new ETL tool and not a Data Warehouse management tool like WhereScape RED.

We have successfully deployed WhereScape RED to manage a multi-terabyte data warehouse with over 50 subject areas, the largest of which holds 4+ billion transaction rows. After 3 years of use I am extremely satisfied with its capabilities. Having said this, there are clear areas in my mind where this tool excels (assuming proper usage of the tool, explained below):
- Standardizing common load logic for typical data warehouse structures
- Documenting structure and dependencies
- Ability to leverage modern features of underlying RDBMS
- Deployment packages to manage change

I would not consider using WhereScape as an ETL tool. At its most basic, it is a metadata management tool for the creation of DDL and the auto-generation of stored procedures to load tables. Yes, you can create custom stored procedures and generate SSIS packages, but in my mind these features are in support of the need to extract data more efficiently from source systems and should not be used as alternatives to a proper ETL tool or custom development project. In other words, you probably will need an ETL tool in addition to WhereScape RED.

WhereScape RED's features, however, do require the discipline and training to use the product in a prescribed manner, and I will admit this can be a real challenge. Not because the tool is difficult but because people are difficult. I have very capable developers who would either: a) say they don't need a tool because they can code it better (probably true), or b) not want to spend the extra effort to use the product as designed because there is a quicker/easier way. Each of these cases typically leads to a developer writing a custom stored procedure within WhereScape (which is absolutely supported).

However, in both of these cases you lose most of the benefits I described above. To balance the needs of my team and the desire for a better managed data warehouse, we have separated (as much as possible) the heavy ETL from the loading of the data warehouse. This approach has worked well for us, but it should be noted that we are a heavy batch-load shop, as our typical loads range from 100K to 1 billion rows processed.

As for the data being staged multiple times, in my opinion this is a feature of the product. WhereScape attempts to move the data to the target platform and then leverage the investment in the database platform. For us this works fantastically with our data loads; I can move very large tables around with ease (at the cost of disk). For us the disk space is a small price to pay for the performance we gain. Furthermore, by using native database features like table compression and columnstore indexes we can greatly minimize disk I/O, and these features are available through WhereScape RED.
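To put that in concrete terms, the underlying SQL Server features look roughly like this (an illustrative sketch with made-up table names, independent of how RED exposes them):

-- Page compression on a staging table to reduce its footprint
ALTER TABLE stage_sales REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Clustered columnstore index on a large fact table to cut disk I/O for scans
CREATE CLUSTERED COLUMNSTORE INDEX ccix_fact_sales ON fact_sales;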

Lastly, I will agree that WhereScape needs to allow the separation of its metadata from the data managed in the data warehouse. Additionally, we need the ability to manage all objects in any database we choose. I am confident, given recent releases of WhereScape RED, that we will see this feature in the future.

Like (3) - 03 June 16
GaryM (Real User, Top 5, Popular)

C Lewis, that's great you've been successful at using the product. I'm completely baffled how you made it work with any volume of data, but I'm glad to hear someone can make the tool work for them.

Like (0) - 14 June 16
Joe Horton (Real User)

Gary, that is an exhaustive description of your POC!

We have been using WhereScape (WS) for nearly 2 years now, and how many ETL/ELT tools can take a data source with, say, 600 tables and get it up and running in well under a week? NOT very many, and with that comes the auditability, traceability, and documentation - things that sometimes get left behind because you're in a rush to production!

Now, we are nowhere near the size of C Lewis; in fact our whole target area from WS is at the moment about 408 GB. We have a few tables with in excess of 200 million rows, but most are under 1 million.

I do agree with some of the features you point out as lacking. My biggest annoyance is the lack of proper baked-in code for change data capture - but with this shortcoming you build one stored procedure that is reusable across your objects. In one fell swoop, because it is all metadata driven, you can easily tell each object to make use of a reusable stored procedure to accomplish CDC.

I was on a big govt Informatica project about 4 years ago - and even this industry-known tool has its own challenges with some of the basic things you were frustrated with in WS. SSIS has its shortcomings. If you sat a seasoned Informatica developer against a WS developer and let them go side by side on 500 tables, Informatica wouldn't even come close on speed to completion.

With each release there are improvements, as with most products out there, and there will be more things to refine and do better in the future. I really believe the metadata-driven side of this is their best strength.

In the old days I was an advocate of following the Kimball methodology - in fact I was a professor at the college teaching this course - but WS turns this notion on its head. Instead of spending so much time gathering requirements, and telling the customer "no" when they were asked what they wanted brought into the DW and responded "well, everything", you can now stand up a small cube within an hour or a few hours to let your customer browse some of their subject-area data, and when you see their eyes light up it's pretty rewarding. I was really good with other ETL tools, but I could never, NEVER get from source to a cube within hours - or have the whole 500 tables productionized within one week - something that would really have taken me a couple of months using my old best practices and reusable components with other ETL tools.

They have some things to improve on - but they all do. In fact your analysis was pretty spot on for some of the weaknesses, but their strengths outweigh most of these issues.

I really do recommend anyone looking for such a tool to have a look at it. I've been in this industry for over 18 years, and I've never come across a company with such great customer support. They do answer the phone, email - and will WebEx at a moment's notice to help you out.

Like (1) - 18 November 16
GaryM (Real User, Top 5, Popular)

Joe,

Actually, we spent 6 weeks with the vendor doing the work, and the requirements and all of the coding and staging were already done, so the amount of actual work to do was relatively small. And there was pretty much nothing usable at the end of that.

The issues that the consultant was having were clearly all around the constraints of the tool, and he quickly had to do significant custom coding to produce something that would actually work. And that's an expert in the tool.

As to ETL tool shortcomings, certainly. You can read my review of SSIS. It's hardly a perfect tool, but it's easy to use and meets my expectations more than any other ETL tool I've used. I've used about a half dozen ETL tools in my career and they share this common attribute - none are perfect. But there are really significant differences in terms of quality and reliability. I don't look for perfection but rather for software doing what it is advertised to do and whether it's a good fit for the given circumstances. All software works to some extent. The larger question is what kind of resourcing it takes to support it and whether it's a good fit for a given organization given the data sources, targets, whether mart or enterprise, data volume, frequency of update, existing knowledge of staff, etc. Also - does it address the concerns? I have no concern about producing 500 of the same patterned staging tables - I can generate that with XML. What's difficult are design challenges such as de-duplication, householding, data cleansing, master data management, and the list goes on and on - none of which is addressed with this tool. In my view this tool simply makes what's already really easy possibly just as easy, at best. At worst it won't work at all (for example, I have cloud-based sources, which are plentiful with mainstream ETL tools and non-existent with WhereScape).

Once again, I'm glad someone else was able to use the tool and feels it has value. I don't know how, but it's good to know it's possible in certain circumstances. I can only speak for my own experience.

Like (0) - 19 November 16
RayP (Real User)

Great summary, Gary. Thanks for the thorough review.

Like (0) - 24 April 17
GaryM (Real User, Top 5, Popular)

An update regarding this software review: it is now 2 years old, so if you are evaluating this software you should consider any improvements made since that time.

I still stand by my initial assessment of the software that was current 2 years ago. I would also point out that I had no vested interest in either a positive or negative review, as many reviewers do (selling vendor, consultants, resellers, etc.), and that it was based on my 20 years of data warehousing experience.

Like (0) - 06 July 17
Jim Mihalick (User, Top 10)

Thanks for the review, Gary, and sorry to hear you didn't have a positive experience with RED, but I've been using it for 5 years now on dozens of clients, from big data engagements with many terabytes of data down to small ones, and I've never seen any significant issues with it. It always seems to perform as well as the underlying DB would if you had to do the same thing directly on the DB anyway. WhereScape just makes it quicker and easier for you to get to the same result. You do have to know what you're doing with the tool. WhereScape RED lets you do just about anything you can think of (I've never hit a barrier with a single client using it), but many of the things the tool lets you do are things you really shouldn't do, and you don't have to do them to stay inside the rails with the product and be successful.

I really don't understand your RBAR points. Although WhereScape RED can perform in an RBAR mode, no one who understands the tool would ever use that method. RED EXCELS at set-based data updates; that is the preferred method. If you found yourself producing RBAR code (which RED will gladly do if you request it), then there was a serious disconnect in getting to the correct features of the product that drive set-based data update activities on the DB. One thing everyone needs to understand about the tool is that it doesn't do the data input/output/transformation work - the DB does the work. The DB is the MOST efficient place to do the ELT DW transformation process (no network is needed, no row-by-row activities, etc.). It can't get any faster than that (how would that be possible if everything starts, transforms, and stays at the DB?). I can absolutely get any data size to work in the tool. But you do need to know how RED works, and sadly a lot of people don't take the time to understand how to operate it fully and successfully in its space.

Sorry to hear you had somebody from the vendor who, you felt, didn't get the tool working off the shelf without customizing the effort significantly (which always slows down progress). But I've been to many clients where the consultants who worked on the WhereScape RED implementation were supposedly experts with the tool, yet upon evaluation they had made many mistakes with the implementation and led the clients down dangerous paths that don't fare well. So perhaps if you all had the opportunity to work with someone who really knows the tool well, you would have come away with a different opinion. The tool does support change-data-capture-like behavior; it's again a matter of working with the tool enough to understand how to make that happen. I had this conversation with someone just the other day and they said the same thing; I explained to them how they could get it to work, and they hadn't realized they needed to be part of the solution to make that happen. But it is doable within the boundaries of RED's off-the-shelf capabilities.

As for the extensive data footprint in the stage tables that Gary mentions: doing multiple stages to duplicate and transform data in steps has all kinds of benefits for resolving data issues that occur in the transformation data flow, so that is a GOOD thing about the tool. It creates componentized, reusable, more easily understood development results. It is MUCH cheaper to speed up the time it takes to resolve data issues this way than the additional cost of the disk space to support the extra data. If you can speed up data issue resolution 100x, that is a far better use of your money than the additional disk space costs. But if you really don't like all the duplicate data (and I mean data duplicated with different transformations applied from one stage step to the next, but basically a duplicated set of data), then that is easily solvable from the scheduler: you can truncate data behind you as the process proceeds so you aren't filling your DB. It's super fast, and efficient in speed and disk savings. You just need to know where to enable that; it's easy once you find it and know it's there. The tool is full of gems like this, but like any robust tool there are A LOT of things you can do with it, and it takes time to understand them all. RED is usually not the problem; it's the person using the tool. Work with a very experienced person and you'll not only learn more about the tool, but you'll see an entirely different resulting experience.

I had a client where just one stage in RED was taking 4 hours to run. I looked at it for 2 minutes and said they would get a significant decrease in time just by flipping on a feature of RED that wasn't enabled (the code wasn't being generated by RED because of it). They turned that on in 10 seconds, ran it again, and it went from 4 hours to ~20 minutes, with less than 2 minutes of work. I can likely get it down to a minute or less with a little more effort. However, I've been around the tool long enough to know how to improve performance.

I'm not one to shoot down a reviewer and I hope you don't take it this way, because the time you took to write all of that down was impressive and you thought about it a lot, but from someone that has used this tool for quite some time now, I don't agree with many of your points. They're just not completely accurate from my perspective. Not sure which version of RED you were POC'ing, but some of your comments could be true with older versions, but of course many of those have been solved with newer versions of the product. At the same time though, many of your concerns have never been issues with the product.

It's a great tool, and I'm 10x faster with RED getting to the same result as I was with SSIS, or DataStage or Informatica, Talend, etc. Head to head, I crush any major competing tool I've seen at my client sites on the market. In a number of cases, I've done in 2 or 3 days what took the client's prior team 6 weeks to get accomplished. RED is just that incredible if you take the time to understand it.

Just get trained by someone that knows the tool inside and out and all the best practices with the tool. You won't be sorry at all that you invested in it.

Best Regards,
and Happy Data Warehousing

Like (0) - 20 February 18

Hi Gary,
This is a great review and I really appreciate the detailed evaluation. We are currently looking for a DWA tool and this was very helpful. I understand that it may be different for each organization based on their needs, but I am just wondering if you found a DWA tool you would recommend?

Like (0) - 27 February 18