
WhereScape RED Review
Significant architectural issues

Valuable Features:

The documentation feature has excellent potential: it can generate data lineage and a complete set of rich documentation, with output in multiple formats.

The tool supports multiple target update methods.

Support for multiple design methodologies, including Data Vault, is impressive, although we didn't test that feature.

Improvements to My Organization:

It did not pass our usability, reliability, sustainability, or performance tests, so there was no potential benefit other than minor documentation advantages over our current automated documentation solution.

Room for Improvement:

In my opinion, the tool requires a different architecture that would make it a more practical solution.

Constraints include:

- The repository only supports a single target database

- No support for change data capture or delta detection!

- It must be used to stage data, or it requires a redundant copy of the staged data

- Relies 100% upon the target database server, with no ability to scale ETL out to a separate server

- Relies 100% upon generated procedural SQL (T-SQL, PL/SQL, etc.), so it inherits the performance limitations of the enforced use of cursors (see the sketch after this list)

- Relies 100% on the database engine to do all the work, with no ability to offload some of it

- Use of external tools such as SSIS for acquiring data into the staging area is possible, but they run in the background, so you lose some of the normal visibility those tools provide. You also lose the multitude of add-ons, such as adapters, that some ETL tools offer.

- It cannot actually acquire data from sources other than flat files. It supports linked servers (which many shops don't allow) or calling an ETL tool such as SSIS in the background for data movement. In that case you're still relying on an ETL tool for the ELT portion, so you're actually using two different tools - WhereScape and SSIS, etc. - instead of just one. Realize then that instead of supporting just an ETL tool, you now have to support both WhereScape and an ETL tool.
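
To illustrate the cursor limitation (table and column names here are just examples), compare the row-by-row processing this kind of generated procedural SQL produces with the equivalent set-based statement the database engine can actually optimize:

    -- Row-by-row cursor processing over hypothetical staging data
    DECLARE @id INT, @amount MONEY;
    DECLARE cur CURSOR FOR SELECT id, amount FROM stage_sales;
    OPEN cur;
    FETCH NEXT FROM cur INTO @id, @amount;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        UPDATE dim_sales SET amount = @amount WHERE id = @id;
        FETCH NEXT FROM cur INTO @id, @amount;
    END;
    CLOSE cur;
    DEALLOCATE cur;

    -- The equivalent set-based statement, which the optimizer can plan and parallelize
    UPDATE t
    SET    t.amount = s.amount
    FROM   dim_sales t
    JOIN   stage_sales s ON s.id = t.id;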

Specific list of areas for improvement:

* Performance (this is the area that needs the most improvement)  

* Efficient storage utilization

* Logging

* Count validations (data movements don't validate record counts, which is 101-level basics for preventing data leaks; see the sketch after this list)

* Scheduler reliability and transparency

* Ability to separate data acquisition/staging processing from target schema population

* Providing automatic record count validations

* Support for change data capture

* Support for in-line transformational procedure language that extends beyond database procedures

* Documentation lineage accuracy and completeness

* Support for sharing metadata repository across multiple targets

* Improvements in staging architecture requiring fewer copies of source data

* Supporting transparency into post-gens

* Providing encapsulated pre- and post-events for custom code that doesn't get wiped out when you re-generate supported code from the tool

* Better support for task orchestration in a top-down way that can be easily understood.

* Support for direct parameter passing from job schedulers.
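
As a rough sketch of what I mean by count validation (table names here are just examples), a generated load should compare source and target row counts and fail loudly on any mismatch:

    -- Hypothetical staging and target tables: stage_orders, fact_orders
    DECLARE @source_count BIGINT, @target_count BIGINT;
    SELECT @source_count = COUNT(*) FROM stage_orders;
    SELECT @target_count = COUNT(*) FROM fact_orders;
    IF @source_count <> @target_count
        THROW 50001, 'Row count mismatch between staging and target.', 1;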

Use of Solution:

A six-week POC doing thorough analysis using specific test cases and comparing against an existing production application built with a more traditional ETL tool.

Current version as of June 2015

Deployment Issues:

The tool could not meet minimum performance or productivity requirements and showed a significant lack of architectural integrity, so it was never deployed past the POC. An existing five-hour process turned into seven days, even after significant performance tuning. The storage space required kept grossly exceeding that of the existing process it was supposed to replace. Even the on-premises training class was awkward, as getting the tool working for all the students was an ongoing problem.

Stability Issues:

The scheduler often hangs with nothing to inform the user why it isn't working. Sometimes this is simply due to a developer starting a scheduler from their PC, which integrates with the repository but isn't obvious when looking at the shared scheduler. Other issues seemed resolvable only by re-generating the process from scratch and hoping for a different result (which there often is), especially if the scheduler hangs. It is very easy to choose an incorrect option when generating processes, only to end up with something that just hangs with no error message. Then you're knee-deep in database procedural language, trying to debug the tool's generated code.

The sheer resources it consumes easily overwhelm a server and cause the database to freeze when resource constraints are exceeded. The tool generates an unexpected and seemingly unpredictable amount of staging data (both in and out of the database, depending on the data movement option chosen), consuming very large amounts of disk space with redundant copies of data. This tool will not promote friendly relations with your system administrators.

Scalability Issues:

The product was deemed unscalable in terms of data storage and performance. 

Customer Service:

Very good

Technical Support:

Very good, although this was a POC, so I can't speak to production support after the product is purchased.

Previous Solutions:

We currently use a traditional ETL tool that meets our requirements. The test scenario involved around 250 GB of source data with a resulting 50 GB (compressed) star schema target. Our ETL tool can talk to all our sources, including multiple cloud sources via SOAP and REST web services, whereas this tool is just a database procedure generator with no adapter capability. It relies on ETL tools to do that work, so it doesn't eliminate them; you end up having to support not one tool but two - in licensing, support, and expertise - which was expected to cause a significant increase in total support costs.

Initial Setup:

The tool relies on ODBC connections, which complicates security in an AD environment compared to using OLEDB.

It requires a developer to either access the server directly or set up file shares. When developing from a PC but working with a scheduler on a server, it was often confusing to know which files need to be available from the PC versus from the server. The tool requires a great deal of attention to disk usage and location, especially when using native database load utilities, as it first unloads all the data into temporary files during the load, which can easily consume your server's available space in unpredictable ways. Getting the environment functioning just for the training class was unexpectedly difficult.

The metadata repository is unique to a single target database and cannot be shared with any other target, which prevents use beyond a silo data mart. This means data lineage lives in a separate silo repository for each target.

Implementation Team:

The software vendor was the lead implementer.

ROI:

The ROI was expected to be very negative to the point of failure.

Cost and Licensing Advice:

Factor in the price of specialized consultants who know this product. They're hard to find and expensive.

Other Solutions Considered:

The existing traditional mainstream ETL tool and its predecessor have been functioning without problems, but the concept of a self-documenting tool providing a common framework and shared metadata repository was a compelling alternative. This tool was already in use in our shop in a limited way, so it raised the question of whether it could serve an expanded role.

Advice: Before choosing any tool, do your homework: compare features against competitors and test the tool in a real-life volume scenario. Software is never what it's advertised to be, from any vendor. Don't take others' word for it, especially if they have a vested interest in you using the tool. I know of testimonials about this tool and others that are based on wishful thinking and not reality. Caveat emptor, my friends. The real question is how far off it is, and whether it has show-stoppers that will prevent its use in your environment. Fail to take this precaution at your own and your organization's peril.

Other Advice:

The compelling selling point for this and similar "data mart in a box" tools is that they provide pre-built frameworks and documentation. The opportunity is that staff with little data warehousing experience can get up to speed easily, without having to build logging, orchestration, and error-handling frameworks, or even know much about data warehouse design.

Unfortunately, this tool's basic architecture has severe flaws that make it likely impractical for most real-world marts or data warehouses. It has significant cons in the following areas:

- Metadata repository is specific to a single target database (cannot be shared across multiple targets)

- Unreliable lineage

- Lacks the capability to interface with application APIs

- Lacks out-of-box CDC control capabilities

- Poor/outdated GUI (looks like early 1990s)

- Lack of available resources in the marketplace who can operate this tool

- Lack of customer base and information available

- Primitive logging and error handling

- Forced duplication of staging data multiple times

- Inability to separate source data capture from targets (tightly couples sources with targets)

- Stores all objects in a single target database, causing a number of issues such as backup/recovery complications, fragmentation, and limited organization options for objects

Disclosure: I am a real user, and this review is based on my own experience and opinions.

8 Comments


Thanks for this detailed description of your PoC with WhereScape RED; very informative.

14 April 2016

Interesting

28 April 2016
C Lewis (Real User):

I am sorry to hear that your POC did not yield the results you were looking for. Having said that, it occurs to me that perhaps you were looking for a new ETL tool and not a data warehouse management tool like WhereScape RED.

We have successfully deployed WhereScape RED to manage a multi-terabyte data warehouse with over 50 subject areas, the largest of which holds 4+ billion transaction rows. After 3 years of use, I am extremely satisfied with its capabilities. Having said this, there are clear areas in my mind where this tool excels (assuming proper usage of the tool, as explained below):
- Standardizing common load logic for typical data warehouse structures
- Documenting structure and dependencies
- Ability to leverage modern features of underlying RDBMS
- Deployment packages to manage change

I would not consider using WhereScape as an ETL tool. At its most basic, it is a metadata management tool for creating DDL and auto-generating stored procedures to load tables. Yes, you can create custom stored procedures and generate SSIS packages, but in my mind these features exist to support extracting data more efficiently from source systems and should not be used as alternatives to a proper ETL tool or custom development project. In other words, you will probably need an ETL tool in addition to WhereScape RED.

WhereScape RED's features, however, do require the discipline and training to use the product in a prescribed manner, and I will admit this can be a real challenge. Not because the tool is difficult, but because people are difficult. I have very capable developers who would either: a) say they don't need a tool because they can code it better (probably true), or b) not want to spend the extra effort to use the product as designed because there is a quicker/easier way. Each of these cases typically leads to a developer writing a custom stored procedure within WhereScape (which is absolutely supported).

However, in both of these cases you lose most of the benefits I described above. To balance the needs of my team and the desire for a better-managed data warehouse, we have separated (as much as possible) the heavy ETL from the loading of the data warehouse. This approach has worked well for us, but it should be noted that we are a heavy batch-load shop, as our typical loads range from 100K to 1 billion rows processed.

As for the data being staged multiple times, in my opinion this is a feature of the product. WhereScape attempts to move the data to the target platform and then leverage the investment in the database platform. This works fantastically with our data loads; I can move very large tables around with ease (at the cost of disk). For us, the disk space is a small price to pay for the performance we gain. Furthermore, by using native database features like table compression and columnstore indexes, we can greatly minimize disk I/O, and these features are available through WhereScape RED.
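
As a quick illustration of those two SQL Server features (the table name here is just an example), note that they are alternatives, since a clustered columnstore index brings its own compression:

    -- Option 1: page compression on a rowstore table
    ALTER TABLE dbo.fact_transactions REBUILD WITH (DATA_COMPRESSION = PAGE);

    -- Option 2: a clustered columnstore index, which stores and compresses by column
    -- (assumes the table is a heap; otherwise rebuild with DROP_EXISTING)
    CREATE CLUSTERED COLUMNSTORE INDEX ccix_fact_transactions
        ON dbo.fact_transactions;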

Lastly, I will agree that WhereScape needs to allow for the separation of their metadata from the data managed in the data warehouse. Additionally, we need the ability to manage all objects in any database we choose. I am confident, given recent releases of WhereScape RED, that we will see this feature in the future.

3 June 2016
GaryM (Real User):

C Lewis, it's great that you've been successful at using the product. I'm completely baffled how you made it work with any volume of data, but I'm glad to hear someone can make the tool work for them.

14 June 2016
Joe Horton (Real User):

Gary, that is an exhaustive description of your POC!

We have been using WhereScape (WS) for nearly 2 years now, and how many ETL/ELT tools can take a data source with, say, 600 tables and get it up and running in well under a week? Not very many - and with that come auditability, traceability, and documentation: things that sometimes get left behind because you're in a rush to production!

Now, we are nowhere near the size of C Lewis; in fact, our whole target area from WS is at the moment about 408 GB. We have a few tables with in excess of 200 million rows, but most are under 1 million.

I do agree with some of the features you point out as lacking. My biggest annoyance is the lack of properly baked-in code for change data capture - but given this shortcoming, you can build one stored procedure that is reusable across your objects. In one fell swoop, because it is all metadata driven, you can easily tell each object to make use of that reusable stored procedure to accomplish CDC (a rough sketch follows below).
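
As a rough sketch of that idea (the procedure, table, and column names here are just examples), a reusable delta-detection procedure might look something like this:

    -- Inserts only the staging rows whose keys are not already in the target.
    CREATE PROCEDURE dbo.usp_load_new_rows
        @stage_table  SYSNAME,
        @target_table SYSNAME,
        @key_column   SYSNAME
    AS
    BEGIN
        DECLARE @sql NVARCHAR(MAX) =
            N'INSERT INTO ' + QUOTENAME(@target_table) +
            N' SELECT s.* FROM ' + QUOTENAME(@stage_table) + N' s' +
            N' WHERE NOT EXISTS (SELECT 1 FROM ' + QUOTENAME(@target_table) + N' t' +
            N' WHERE t.' + QUOTENAME(@key_column) + N' = s.' + QUOTENAME(@key_column) + N');';
        EXEC sp_executesql @sql;
    END;

Because the procedure is driven entirely by the names passed in, each object can call the same one with its own table and key drawn from the metadata.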

I was on a big government Informatica project about 4 years ago, and even that industry-known tool has its own challenges with some of the basic things that frustrated you with WS. SSIS has its shortcomings, too. If you sat a seasoned Informatica developer against a WS developer and let them go side by side on 500 tables, Informatica wouldn't even come close in time to completion.

With each release there are improvements, as with most products out there, and there will always be things to refine and do better in the future. I really believe the metadata-driven side of this is their best strength.

In the old days I was an advocate of following the Kimball methodology; in fact, I was a professor at the college teaching this course. But WS turns this notion on its head. Instead of spending so much time gathering requirements, and telling the customer "no" when you asked what they wanted brought into the DW and they responded "well, everything", you can now, within an hour or a few, stand up a small cube to let your customer browse some of their subject-area data - and when you see their eyes light up, it's pretty rewarding. I was really good with other ETL tools, but I could never, NEVER get from source to a cube within hours - or potentially have all 500 tables productionized within one week - something that would really have taken me a couple of months using my old best practices and reusable components with other ETL tools.

They have some things to improve on - but all tools do. In fact, your analysis was pretty spot-on for some of the weaknesses, but their strengths outweigh most of these issues.

I really do recommend that anyone looking for such a tool have a look at it. I've been in this industry for over 18 years, and I've never come across a company with such great customer support. They answer the phone and email, and will WebEx at a moment's notice to help you out.

18 November 2016
GaryM (Real User):

Joe,

Actually, we spent 6 weeks and the vendor was doing the work; the requirements and all the coding and staging were already done, so the amount of actual work to do was relatively small. And there was pretty much nothing usable at the end of it.

The issues the consultant was having were clearly all around the constraints of the tool, and he quickly had to do significant custom coding to produce something that would actually work. And that's an expert in the tool.

As to ETL tool shortcomings, certainly. You can read my review of SSIS. It's hardly a perfect tool, but it's easy to use and meets my expectations more than any other ETL tool I've used. I've used about a half dozen ETL tools in my career and they share this common attribute - none are perfect. But there are really significant differences in terms of quality and reliability. I don't look for perfection, but rather for software that does what it is advertised to do and is a good fit for the given circumstances. All software works to some extent. The larger question is what kind of resourcing it takes to support and whether it is a good fit for a given organization, considering the data sources, targets, whether mart or enterprise, data volume, frequency of update, existing knowledge of staff, etc. Also: does it address the concerns? I have no concern about producing 500 instances of the same staging pattern - I can generate that with XML. What's difficult are design challenges such as de-duplication, householding, data cleansing, and master data management - the list goes on and on - none of which this tool addresses. In my view, this tool simply makes what's already really easy possibly just as easy, at best. At worst, it won't work at all (for example, I have cloud-based sources, which are plentifully supported by mainstream ETL tools and non-existent in WhereScape).

Once again, I'm glad someone else was able to use the tool and feels it has value. I don't know how, but it's good to know it's possible in certain circumstances. I can only speak from my own experience.

19 November 2016
RayP (Real User):

Great summary, Gary. Thanks for the thorough review.

24 April 2017
GaryM (Real User):

An update regarding this review: it is now 2 years old, so if you are evaluating this software, you should consider any improvements made since that time.

I still stand by my initial assessment of the software that was current 2 years ago. I would also point out that, unlike many reviewers (selling vendors, consultants, resellers, etc.), I had no vested interest in either a positive or negative review, and that it was based on my 20 years of data warehousing experience.

6 July 2017