SAP Data Hub Review

Good push-down approach, on-premise connection, and integration with SAP products, but needs better performance and integration with other solutions


What is our primary use case?

Our use case is built on our IoT implementations. We capture a lot of data for a warehouse and manufacturing company, including data about how its machines are performing. For example, for a dishwasher that this company sells to different clients, we capture data about how each client uses it. We store this data in IoT big data lakes.

From the data lakes, we integrate with the company's SAP S/4HANA system via SAP Data Hub. We pull data from SAP HANA and present it as graphs in SAP Analytics Cloud. Their data resides in SAP HANA, and we use SAP Data Hub for smooth integration with the SAP HANA on-premise products. We are using the SaaS version of SAP Data Hub.

How has it helped my organization?

We combined the data about the usage of their products with their business data from SAP HANA, which included sales order data, purchase order data, and data about completed sales and their manufacturing. We then produced specific outputs in SAP Analytics Cloud, such as which buttons on the dishwasher are pressed most frequently, which temperature settings customers choose, and which product versions sell best and are used most widely.

Based on those graphs, they are changing their production line. They have removed a few products that were not selling in large volumes, and for products that were selling, they have removed features that were rarely used, which reduced costs. For example, one dishwasher offered two hot water features and cold water features, but users only ever used one of the hot water features. Based on this data, they can remove the unused feature to reduce the cost of the product and sell more.

What is most valuable?

Its connection to on-premise products is the most valuable feature. We mostly use the on-premise connection, which is seamless, and this is why we prefer this solution over others. We use it most for orchestration when data comes from different categories of sources. Its other features are very similar to what is available in open-source tools.

Their push-down approach is the biggest advantage: they push most of the processing down to the data source itself. It works somewhat like a serverless model, in that the data is not processed inside a product such as Data Hub but where it originates. If the data comes from HANA, whether it is being captured or processed for analytics, orchestration, or management, the processing runs in the HANA database and only the result is passed on. Because the data is processed where it sits, this approach increases the processing speed somewhat. That's the best part. I have used another product that first captured the data and then processed it. Data Hub works the other way around: it processes the data first, passes on the result, and then applies its own manipulations.
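To illustrate the difference between the two approaches, here is a minimal Python sketch that uses an in-memory SQLite database as a stand-in for the source system. It is only an analogy, not how Data Hub implements push-down, and the table `dishwasher_usage` and its columns are hypothetical.

```python
import sqlite3
from collections import Counter


def make_source() -> sqlite3.Connection:
    """Create an in-memory table standing in for the source system (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dishwasher_usage (device_id TEXT, button TEXT)")
    rows = [("d1", "hot_water"), ("d1", "eco"), ("d2", "hot_water"), ("d3", "cold_water")]
    conn.executemany("INSERT INTO dishwasher_usage VALUES (?, ?)", rows)
    return conn


def pull_then_process(conn: sqlite3.Connection) -> dict:
    # Every raw row leaves the source; aggregation happens in the integration layer.
    rows = conn.execute("SELECT button FROM dishwasher_usage").fetchall()
    return dict(Counter(button for (button,) in rows))


def push_down(conn: sqlite3.Connection) -> dict:
    # The GROUP BY runs inside the source database; only the small result is returned.
    result = conn.execute(
        "SELECT button, COUNT(*) FROM dishwasher_usage GROUP BY button"
    ).fetchall()
    return dict(result)


if __name__ == "__main__":
    conn = make_source()
    print(pull_then_process(conn))
    print(push_down(conn))
```

Both functions return the same counts; the difference is where the work happens. With push-down, only the aggregated rows leave the source, which is why the approach helps most when the source holds far more rows than the result.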

They lead in terms of business functions. In my experience, no other solution comes with business functions already implemented for business analysis. They have a lot of prebuilt business functions for machine learning and orchestration, which we can use directly to get analysis from existing data. Since most of the enterprise data already sits in SAP systems, that's a major advantage.

What needs improvement?

In 2018, connecting it to outside sources, such as IoT products or IoT-enabled Hadoop big data platforms, was somewhat complex. It was not smooth at the beginning, and it was unstable. The initial data load took a long time. Sometimes the connection broke, and we had to restart the process, which was a major issue, although they might have improved this by now. It is very smooth with the SAP HANA on-premise system, SAP Cloud Platform, and SAP Analytics Cloud, possibly because these are their own products and they know how to integrate them. For Hadoop, they might have used open-source technologies, which could be why it was breaking at that time.

I think they provide limited embedded integration because they want us to use their other products. For example, they don't want to fold SAP Analytics Cloud into Data Hub; they want us to use SAP Analytics Cloud separately rather than inside Data Hub. On the integration side, it lacks real-time analytics, and it is slow. They should embed SAP Analytics Cloud inside Data Hub or support some kind of built-in analysis. They do provide some analysis, but it is not extensive and is mostly based on open source, so we need a lot of developers or data scientists to implement Python algorithms. It would be better if they provided their own prebuilt algorithms, with connectors and drop-down menus so that we could simply configure them. More embedded integration would make things much quicker and would also improve process efficiency and processing power.

Its performance needs improvement. It is a little slow, and it is not the best in the market; there are other products that are much better. In terms of technology and performance, it is slower than Microsoft's and other data orchestration products. I haven't used those products, but I have read about them, their settings, and the processing times they claim. For Azure Purview, they say that they can copy, manage, or transform data within milliseconds, and that they can transform 100 gigabytes of data within three to five seconds, which, in my experience, SAP cannot match. SAP generally takes a long time to process that much data. However, I have never tested Azure myself.

For how long have I used the solution?

It came out in 2017, and I first worked on it in 2018. At that time, it was not stable enough for my use case. I used it for a POC, not for full-blown application development, because the client was evaluating a lot of different tools. It was new in the market, and people rarely trust a solution in the beginning. I have been using it as a full-fledged solution for the past one and a half years. They have kept working on it, and they have pretty much stabilized it.

What do I think about the stability of the solution?

It was not stable when I first used it in 2018, but they have pretty much stabilized it now.

What do I think about the scalability of the solution?

I never got a chance to scale this particular product, but it is based on a microservice architecture, so it should scale easily on a cloud platform via microservices and Kubernetes containers. Most SAP products are now built on these technologies, so I expect it to be scalable.

How are customer service and technical support?

It is easy to get a reply for all SAP products. We have raised support tickets. For a medium-priority ticket, they generally reply within five to six hours, sometimes within a day. For a high-priority ticket, they generally get back within one hour and call us directly. Based on my experience so far, their support is really good, though it might also depend on how many products you buy from them.

How was the initial setup?

It is a cloud solution. I didn't use an on-premise setup, and you may run into some complexity with one. On the cloud, there is not much to it because it is software as a service. We just bought the license and started using it on their cloud platform. It is just an activation. Purchasing is handled by our Program Management Office (PMO), so we don't deal with costs.

They provide the product already set up; SAP only takes care of their part of the setup. We got the solution from them within seven or eight hours and then started using it. The main work, which we have to do ourselves, is configuring and integrating our different systems.

In terms of implementation strategy, we had roughly four phases: discovery, joining, implementation, and management or orchestration. First, we identified the sources we needed data from and the output we wanted. Next, we put all the pieces together and connected them. We then determined which data needed to be referenced and what filtering and transformation were required, because we don't use all the data. Finally, we produced the output and used orchestration to manage it.
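The four phases can be pictured as a chain of steps. The following Python sketch is a hypothetical illustration, not SAP Data Hub's pipeline API: the source names, record fields, filter rule, and the `run_pipeline` helper are all invented for the example.

```python
from typing import Callable, List

Record = dict


def discover(_=None) -> dict:
    """Phase 1 (discovery): the sources we decided we need (hypothetical sample data)."""
    return {
        "iot_lake": [
            {"device_id": "d1", "button": "hot_water"},
            {"device_id": "d2", "button": "eco"},
            {"device_id": "d9", "button": "cold_water"},  # no matching sale
        ],
        "sales": [
            {"device_id": "d1", "model": "A"},
            {"device_id": "d2", "model": "B"},
        ],
    }


def join(sources: dict) -> List[Record]:
    """Phase 2 (joining): connect IoT usage events to sales data on device_id."""
    model_by_device = {s["device_id"]: s["model"] for s in sources["sales"]}
    return [
        {**event, "model": model_by_device[event["device_id"]]}
        for event in sources["iot_lake"]
        if event["device_id"] in model_by_device
    ]


def transform(records: List[Record]) -> List[Record]:
    """Phase 3 (implementation): filter and reshape, since not all data is used."""
    return [
        {"model": r["model"], "button": r["button"]}
        for r in records
        if r["button"] != "eco"  # hypothetical filter rule, chosen for illustration
    ]


def run_pipeline(steps: List[Callable]):
    """Phase 4 (orchestration): run the steps in order, passing each output forward."""
    data = None
    for step in steps:
        data = step(data)
        print(f"{step.__name__}: {len(data)} item(s)")
    return data


if __name__ == "__main__":
    print(run_pipeline([discover, join, transform]))
```

Keeping each phase as a separate function mirrors the sequence described above: the filter rule in `transform` can change without touching discovery or joining, and the orchestrator only needs to know the order of the steps.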

What about the implementation team?

The number of people involved in deploying or implementing this solution depends on the implementation. For data science and machine learning, we had a senior data scientist and a junior data scientist. We did the development and integration ourselves with two people, one junior and one senior. We also needed a Hadoop or data lake person for any information from Hadoop. We had one person for SAP Analytics Cloud, one management person, and one deployment person to configure the solution. Overall, we had around 10 to 11 people for the implementation. Later on, we only needed two people for management. They work in rotating shifts: one is available at night and one in the morning.

What other advice do I have?

If you have a very large landscape with a wide variety of SAP products only, it can be a good solution, but still don't adopt it blindly. Even if all your products are from SAP, don't assume SAP Data Hub will be the best option. There are a lot of solutions on the market, and you need to analyze your options properly.

It integrates seamlessly with SAP solutions but not with solutions from other vendors. Go for it only if you have several SAP solutions, such as HANA, SAP Data Services, SAP API, and SAP SDI (Smart Data Integration). In that kind of setup, it could be a central place to orchestrate and manage all these SAP services, which other products may not give you. If you have only one or two SAP products and the rest come from other vendors, it is not a good product to go for.

I would rate SAP Data Hub a six out of ten.

Which deployment model are you using for this solution?

Private Cloud
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner