What is our primary use case?
We use ActiveBatch to run the data warehouse production batch schedule, which is 24/7. We run, on average, about 200 distinct workflows each day to update the warehouse. And once the warehouse tables are loaded, we trigger our business intelligence reports and our analytics reports. We also use ActiveBatch to run a software tool called iCEDQ for data quality, as well as some Alteryx jobs.
Our production servers are in a co-location, and the solution is deployed onsite there.
How has it helped my organization?
Before we had ActiveBatch, we used the Informatica Workflow Scheduler. To chain workflows, we had to start the downstream workflow and have it wait for a trigger file signaling that the first one had completed. So "Workflow B" would be waiting for a control file that said "Workflow A" is done. Reruns were a problem: sometimes we would create a control file by mistake, which would throw off the next day's run, and we'd have to do manual reruns. With ActiveBatch, it's very easy to say, "Workflow A is done, run B," and onward: "Run C, Run D," as soon as they're done. You don't need to worry about whether a control file was created or how long the job is going to wait. It gives you much simpler, easier-to-understand control of the flow of jobs as they run.
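To make the "A is done, run B" idea concrete, here is a minimal Python sketch of completion-based chaining. It is only an illustration and not ActiveBatch's or Informatica's API: the function name `run_in_dependency_order`, the workflow names, and the status strings are all hypothetical, and it assumes Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set


def run_in_dependency_order(
    deps: Dict[str, Set[str]],
    actions: Dict[str, Callable[[], bool]],
) -> Dict[str, str]:
    """Run each workflow only after all of its upstream workflows succeeded.

    deps maps a workflow name to the set of workflows it depends on.
    actions maps a workflow name to a callable that returns True on success.
    Both are hypothetical stand-ins for real scheduler objects.
    """
    status: Dict[str, str] = {}
    # static_order() yields every workflow after all of its predecessors.
    for name in TopologicalSorter(deps).static_order():
        upstream = deps.get(name, set())
        if all(status.get(u) == "succeeded" for u in upstream):
            status[name] = "succeeded" if actions[name]() else "failed"
        else:
            # An upstream failure stops the chain; no control file is involved.
            status[name] = "skipped"
    return status


# Hypothetical chain: A -> B -> C -> D
deps = {"B": {"A"}, "C": {"B"}, "D": {"C"}}
actions = {name: (lambda: True) for name in "ABCD"}
print(run_in_dependency_order(deps, actions))
```

Because each step is gated on the recorded result of the previous one, a stray or missing control file cannot start a downstream workflow at the wrong time.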
Using ActiveBatch hasn't really reduced our code base because we would be developing these workflows in Informatica if we weren't using ActiveBatch. But scheduling something new and integrating it into the batch schedule are much simpler and save us a little bit of time, now that we have everything developed, for the most part. We may go a month, or even four or five months, without adding anything to the schedule, but it gives us an easier understanding of the flow of the data and helps us make sure dependencies are met in a more straightforward fashion than through the Informatica scheduler.
ActiveBatch hasn't really improved our job success rate. If a job fails, we still get our failure messages from Informatica, and in some cases from ActiveBatch. The biggest issue we were having was the timing of all of the downstream applications that depend on the warehouse, and the biggest benefit is that ActiveBatch has greatly improved that.
It has saved man-hours, although it has not reduced headcount. The savings come from the situations where we would have issues and our old scheduling solution would break down because of them. With ActiveBatch, we don't have to worry about how to start the downstream applications based on the warehouse. I would estimate it saves us about 20 hours per month.
What is most valuable?
One of the valuable features is the ability to trigger workflows, one after another, based on success, without having to worry about overlapping workflows.
The ability to integrate our BI, analytics, and our data quality jobs is also valuable. We used to have everything set up just based on time: Run the data warehouse until five in the morning, run BI at 5:30 in the morning. There were times that we missed the deadline so that when the BI jobs would run, the data would be incomplete, or we had a big gap in time where we were missing out on starting early. It has really saved us a lot of man-hours compared to when we would have a data issue and we would have to manually restart all of the downstream jobs, after the warehouse.
ActiveBatch also provides us with a single pane of glass for end-to-end visibility of workflows. That simplifies the process when we check to see if things have run or how they're running. The Map View feature makes it easy to see what the dependencies are. It's helpful to have a visual, top-down look, from start to finish, at what flows are running when you need to look into that.
In terms of the unlimited bandwidth, as far as I can tell it's handled all of our volume without any issues whatsoever. For the analytics stuff and the business intelligence stuff, I don't keep track of how many jobs they have running each day. I can only really check the warehouse, but as far as I can tell it has handled the total volume of our needs without any issue whatsoever.
We use event triggers and file events, and one job we have uses email triggers. Especially for the business side, if they have a list of call center people or a list of promotions or some costing information that they need loaded into the warehouse, it allows us to say to them, "We don't need a dummy file and we don't need a blank file. Whenever you have a file ready to go, just put it on a shared drive and the job will automatically pick it up." So it simplifies our interactions with the business and allows them more flexibility to get their work done. The triggering doesn't so much reduce delays but it alleviates the need either to have the business create a dummy file or to code the job in such a way that if it doesn't find a file to run each day, it won't error-out or have to send an informational message. If we get a file a day, or if we get five files in a day, or if we only get one file every six months, the job just runs when the business has the data available, without our having to worry about it.
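The file-trigger behavior described above can be sketched in plain Python. This is a generic illustration, not how ActiveBatch implements file events: the function `process_new_files`, the `*.csv` pattern, and the `load` callback are assumptions made for the example.

```python
from pathlib import Path
from typing import Callable, List, Set


def process_new_files(
    drop_dir: Path,
    seen: Set[str],
    load: Callable[[Path], None],
) -> List[str]:
    """Load every file in drop_dir that has not been handled yet.

    Returns the names picked up on this pass. An empty directory simply
    yields an empty list: no dummy file is needed and nothing errors out.
    """
    picked_up: List[str] = []
    for path in sorted(drop_dir.glob("*.csv")):  # hypothetical file pattern
        if path.name not in seen:
            load(path)            # e.g. hand the file to the warehouse load job
            seen.add(path.name)   # remember it so it is not loaded twice
            picked_up.append(path.name)
    return picked_up
```

Calling this on a schedule, or from a file-system watcher, handles one file a day, five files a day, or one file every six months with the same logic.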
What needs improvement?
We also use an Oracle trigger, although we've had inconsistent performance with the Oracle trigger. It had to do with the timing of the Oracle logs. The Oracle trigger function wouldn't work because Oracle had a lock on the archive log file. We have had a couple of cases where we had to remove that Oracle trigger function from our schedule. But we still use it for some cases.
The thing I've noticed the most is the Help function. It's very difficult, at times, to find examples of how to do something. The Help function will explain what the tool does, but we're not a Windows shop at the data warehouse. Our data warehouse jobs actually run on Linux servers. Finding things for Linux-based solutions is not as easy as it is for Windows-based solutions. I would like to see more examples, and more non-Windows examples as well, in the Help.
For how long have I used the solution?
I have been using ActiveBatch for almost five years.
What do I think about the stability of the solution?
Stability has been excellent. In the four or five years we've used it, I can't even think of a time when the scheduler went down. We use two agents for production, and a scheduler and two agents for testing, and I can think of maybe three times that we had to reboot one of the agents. But I can't think of a time when the scheduler actually went down.
What do I think about the scalability of the solution?
It seems very scalable. We use a very small portion of the functionality and the available types of jobs. Of the job steps in the library, we only use about 2 or 3 percent of them. We bought it for a specific purpose and it served our purpose quite well.
How are customer service and technical support?
We have used the technical support. On a scale of one to 10, I'd give the Knowledge Base a six or seven. I would give the actual support folks an eight-and-a-half or nine.
It just depends on who you get to respond to your question or to your issue. We've had folks that have been excellent and have pinpointed the problem right away and given us a clear solution to our problems. And there have been times when we have gotten someone who doesn't quite understand the product and it feels like we're providing them more answers than they're providing us. That's been rare but I can think of at least one case where we had to say, "Can you put somebody else on or ask for some help on our question?" And they eventually did, but it was kind of frustrating. But for the most part, it's been fine.
Which solution did I use previously and why did I switch?
Ninety-five percent of the warehouse jobs that we used to run as Informatica jobs have been replaced with ActiveBatch. We have a couple of jobs with some specialized logic that we haven't taken the time to figure out how to do in ActiveBatch yet. Of the 200 workflows we run a day, about 190 run through ActiveBatch.
What was our ROI?
We have seen ROI with the solution. It has simplified the warehouse job flow, our analytics workflow, as well as our business intelligence and data quality workflows. I don't know the exact cost per year of the solution, but it has simplified and made things much easier to understand in terms of dependencies among our data flows.
What other advice do I have?
The breakthrough for us was when we were able to take completely different software tools and integrate them into one long flow of data. We have our Informatica jobs, which then trigger some PL/SQL jobs in ActiveBatch, but they also trigger Alteryx jobs, and Alteryx is its own software tool. ActiveBatch can also integrate and execute iCEDQ, which is its own software, as well as Tableau. The ability to trigger those jobs from completely different software tools, in one flow, has saved us a lot of time and a lot of headaches.
Don't be afraid to dig in and try things. I said one of the weaknesses is the Help, but the Help function has helped me figure a few things out. We have jobs that update the pager email to go from an offsite pager to an onsite pager and back again. So don't be afraid to take the time to try to figure something different out. There are some useful things in the Help.
I'm the primary person using ActiveBatch in the warehouse. A month ago, we had a lot more people using it, but in the travel industry we've already had some severe layoffs. There were 10 people using ActiveBatch. They were all data analysts or data quality analysts, and I am the data warehouse developer. There were also business intelligence developers.
Which deployment model are you using for this solution?