What is our primary use case?
I am responsible for both a combination of documenting our existing data models and using erwin Data Modeler as a primary visual design tool to design and document data models that we implement for our production services.
My primary role is to document our databases using erwin to work with people and ensure that there is logically referential integrity from the perspective of the data models. I also generate the data definition language (DDL) changes necessary to maintain our data models and databases up to our client requirements in terms of their data, analytics, and whatever data manipulation that they want to do. I use erwin a lot.
It is either installed locally or accessed through a server, depending on where I have been. I have had either a single application license or pooled license that I would acquire when I open up erwin from a server.
How has it helped my organization?
We get data from many different sources where I work. We have many clients. The data is all conceptually related. There are primary subject area domains common across most of our clients. However, the physical sources of the data, or how the data is defined and organized, often vary significantly from client to client. Therefore, data modeling tools like erwin provide us with the ability to create a visual construct from a subject area perspective of the data. We then use that as a source to normalize the data conceptually and standardized concepts that are documented or defined differently across our sources. Once we get the data, we can then treat the data that has been managed somewhat disparately from a common conceptual framework, which is quite important.
At the moment, for what I'm doing, the interface to the physical database is really critical. erwin generally is good for databases. It is comfortable in generating a variety of versions of data models into DDL formats. That works fine.
What has been useful, I have been able to reverse engineer our existing data models to document explicitly referential integrity relationships, primary/foreign keys in the model, and create ERDs that are subject area-based which our clients can use when working with our databases. The reality is that our databases are not explicitly documented in the DDL with primary/foreign key relationships. You can't look at the DDL and explicitly understand the primary/foreign key relationships that exist between our tables, so the referential integrity is not easily understood. erwin has allowed me to explicitly document that and create ERDs. This has made it easier for our clients to consume our databases for their own purposes.
What is most valuable?
Its visualization is the most valuable feature. The ability to make global changes throughout the data model. Data models are reasonably large: They are hundreds, and in some cases thousands, of tables and attributes. With any data model, there are many attributes that are common from a naming perspective and a data type perspective. It is possible with erwin to make global changes across all of the tables, columns, or attributes, whether you are doing it logically or physically. Also, we use it to set naming standards, then attempt to enforce naming standards and changes in naming from between the logical version of the data models and the physical versions of the data models, which is very advantageous. It also provides the ability to document primary/foreign key relationships and standardize them along with being able to review conceptually the data model names and data types, then visualize that across fairly large data models.
The solution’s visual data models for helping to overcome data source complexity and enabling understanding and collaboration around maintenance and usage is very important because you can create or define document subject areas within enterprise data models. You can create smaller subsets to be able to document those visually, assess the integrity, and review the integrity of the data models with the primary clients or the users of the data. It can also be used to establish communications that are logically and conceptually correct from a business expert perspective along with maintaining the physical and logical integrity of the data from a data management perspective.
What needs improvement?
We are not using erwin's ability to compare and synchronize data sources with data models in terms of accuracy and speed for keeping them in sync to the fullest extent. Part of it is related to the sources of the data and databases that we are now working with and the ability of erwin to interface with those database platforms. There are some issues right now. Historically, erwin worked relatively well with major relational databases, like Oracle, SQL Server, Informix, and Sybase. Now, we are migrating our platforms to the big data platforms: Hadoop, Hive, and HBase. It is only the more recent versions of erwin that have the ability to interface successfully with the big data platforms. One of the issues that we have right now is that we haven't been able to upgrade the version that we currently have of erwin, which doesn't do a very good job of interfacing with our Hive and Hadoop environments. I believe the 2020 version is more successful, but I haven't been able to test that.
Much of what I do is documenting what we have. I am trying to document our primary data sources and databases in erwin so we have a common platform where we can visually discuss and make changes to the database. In the past couple of years, erwin has kind of supported importing or reverse engineering data models from Hive into erwin, but not necessarily exporting data models or forward generating the erwin-documented data models into Hive or Hadoop (based on my experience). I think the newest versions are better adapted to do that. It is an area of concern and a bit of frustration on my part at this time. I wish I had the latest version of erwin, either the 2020 R1 or R2 version, to see if I could be more successful in importing and exporting data models between erwin and Hive.
erwin generally fails to successfully reverse engineer our Oracle Databases into erwin data models. The way that they are engineered on our side, the syntax is correct from an Oracle perspective, but it seems to be very difficult for erwin to interpret. What I end up doing is using Oracle Data Modeler to reverse engineer into the Oracle data model, then forward engineer the DDL into an Oracle syntax, and importing that DDL into erwin in order to successfully bring in most of the information from our physical data models. That is a bit of a challenge.
There are other characteristics of erwin, as far as interfacing directly with the databases, that we don't do. Historically, while erwin has existed, the problem is the people that I work with and who have done most of the data management and database creation are engineers. Very few of them have any understanding of data modeling tools and don't work conceptually from that perspective. They know how to write DDL syntax for whether it's SQL Server, Oracle, or Sybase, but they don't have much experience using a data modeling tool like erwin. They don't trust erwin nor would they trust any of its competitors. I trust erwin a lot more than our engineers do. The most that they trust the solution to do is to document and be able to see characteristics of the database, which are useful in terms of discussing the database from a conceptual perspective and with clients, rather than directly engineering the database via erwin.
erwin is more of a tool to document what exists, what potentially will exist, and create code that engineers can then harvest and manage/manipulate to their satisfaction. They can then use it to make changes directly to our databases. Currently, when the primary focus is on Hive databases or Hadoop environment, where there is no direct engineering at this point between erwin and those databases, any direct or indirect engineering at the moment is still with our Oracle Database.
For how long have I used the solution?
I have been using the solution on and off for 20 to 30 years.
What do I think about the stability of the solution?
It is pretty stable. Personally, I haven't run into any real glitches or problems with the output, the ability to import data when it does work correctly, the export/creation of DDL, or generation of reports.
We are trying to upgrade. This has been going on now for several months. We're trying to upgrade to the 2020 version. Originally, it was 2020 R1, but I think at this point people are talking about the 2020 R2 version. Now, I'm not part of our direct communications with erwin in regards to Data Modeler, but there are some issues that erwin is currently working on that are issues for my company. This have prevented us from upgrading immediately to the 2020 version.
What do I think about the scalability of the solution?
This gets down to how you do your data modeling. If you do your data modeling in a conceptually correct manner, scaling isn't an issue. If you don't do your data modeling very well, then you are creating unnecessary complexities. Things can get a bit awkward. This isn't an erwin issue, but more a consequence of who is using the product.
In the area that I'm working right now, I'm the only user. Within the company, there are other people and areas using the solution probably far more intimately in regards to their databases. I really don't know the number of licenses out there.
How are customer service and technical support?
The problem is that our issues are related to interfacing erwin Data Modeler with the Hadoop Hive environments. The issues have always been either what I was trying to do was not fully supported by our version of erwin Data Modeler. People have certainly tried to help, but there's only so much that they could tell me. So, it's been difficult. I am hoping that I can get back to people with some better answers once the newest version of erwin is available to us.
Which solution did I use previously and why did I switch?
The people who were previously responsible for the database development were very good engineers who knew how to write SQL. They could program anything themselves that they wanted to program. However, I really don't think that they really understood data modeling as such. They just wrote the code. Our code and models are still developing and not necessarily conformed to good data modeling practices.
How was the initial setup?
In the past, I was involved in the initial setup. In traditional environments, it sets up pretty easily. In my current environment, where I'm trying to get it as intimately integrated with our big data platforms as possible, I'm finding it quite frustrating. However, I'm using an older version and think that is probably a significant part of the problem.
What was our ROI?
In other environments where I've worked, the solution’s ability to generate database code from a model for a wide array of data sources cuts development time. In this environment, erwin is not very tightly integrated into the development cycle. It is used more for documentation purposes at this point and for creating a nascent code which down the road gets potentially implemented. While it's not used that way at my current company, I think it would be better if it were, but there is a culture here that probably will prevent that from ever occurring.
What's my experience with pricing, setup cost, and licensing?
An issue right now would be that erwin doesn't have a freely available browser (that I am aware of) for people who are not data modelers or data engineers that a consumer could use to look at the data models and play with it. This would not be to make any changes, but just to visually look at what exists. There are other products out there which do have end user browsers available and allow them to access data models via the data modeling tool.
Which other solutions did I evaluate?
There is another tool now that people are using. It is not really a data modeling tool. It is more of a data model visualization tool, and that's SchemaSpy. We don't do data modeling with that. You get a visualization of the existing physical database. But that's where the engineers live, and that's what they think is great. This is a cultural, conceptual, understanding issue due to a lack of understanding and appreciation of what good data modeling tools do that I can't see changing based on the current corporate organization.
What other advice do I have?
It is the only meaningful way to do any data modeling. It is impossible to conceptualize and document complex data environments and the integration between different data subject areas. You can write all the code or DDL you want, but it's absolutely impossible to maintain any sort of conceptual or logical integrity across a large complex enterprise environment without using a tool like erwin.
You want to look at what you are trying to accomplish with erwin before implementing it.
- Does the product have the ability to support or accomplish that?
- Based on the technologies that you have decided you want to use to manage your data, how intimately does it integrate with those technologies?
From my perspective of using the traditional relational databases, I think erwin probably works pretty well.
For the newer database technologies, such as the Hadoop environment databases, it's not clear to me how successful erwin is. However, I'm not talking from the perspective of somebody who has been aggressively using the latest version. I don't have access to it, so I'm afraid my concerns or issues may not be valid at this point. I will find out when we finally implement the latest erwin version.
I would give the solution a seven or eight (out of 10).
Which deployment model are you using for this solution?