What is our primary use case?
It handles all of our scheduling. All our batch workload runs through it. We use MOVEit for our file transfers, so we don't use Stonebranch for that.
We have MOVEit integrated with our scheduling, so we run commands to MOVEit from scheduling.
We're running about 46,000 tasks per day.
How has it helped my organization?
Stonebranch has advanced the digital transformation at our company through the dashboard. We run a customized dashboard for our operations team, where we can quickly see issues. We can see running tasks. We rely heavily on the started, finished, late-started, and late-finished statuses. We also have some gauges where we can see jobs that exceed their average runtime or estimated end time. It's helped our operations team see issues ahead of time, instead of four hours later when we've already gone past the point of no return.
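To make the gauge idea concrete, here is a minimal Python sketch of how a late start or an overrun against average runtime could be flagged. It is an illustration only: `TaskRun`, `flags`, and the flag strings are made-up names, not anything from the Stonebranch product, and the average runtime is assumed to be supplied by the caller.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class TaskRun:
    """One scheduled run of a task (hypothetical structure for illustration)."""
    name: str
    planned_start: datetime
    actual_start: Optional[datetime]  # None if the task has not started yet
    average_runtime: timedelta        # historical average, supplied by the caller


def flags(run: TaskRun, now: datetime) -> List[str]:
    """Return the dashboard flags that apply to one task run at time `now`."""
    result = []
    if run.actual_start is None:
        # Not started yet: it is late only if the planned start has passed.
        if now > run.planned_start:
            result.append("late start")
        return result
    if run.actual_start > run.planned_start:
        result.append("late start")
    if now - run.actual_start > run.average_runtime:
        result.append("exceeds average runtime")
    return result
```

Checking the flags while a job is still running is what lets operations react before the deadline instead of hours afterward.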
In terms of our system operators, we're bringing guys off the street that pick this stuff up within two weeks, and they're flying with it. It just seems like it's so easy, once they get the baseline down. Then it's just boom, and they're off and running. There's some work to get that initial understanding, where to go to find what you want, and then these guys are flying with it. Our older people that have been here for a few years were the ones that struggled with the new technology.
What is most valuable?
Their agents are the simplest. They're easy to install and easy to get up and running. We set up a particular kind of access on our servers for ubroker and then I have the directory created by my Unix admin. After that, I don't have to get them involved anymore. I can install and upgrade, and I can name the aliases on the agent. That's one of the nice features if we need a passive environment for an agent. If our primary goes down, I can bring up the passive one and I don't have to change anything in the scheduling world. It will start running from that new server.
The agents have treated me very well. I have found the agents to be so much simpler when compared to ESP. I haven't been exposed to how the other tools run their agents, but Stonebranch's agents are by far the simplest I've seen to download and install. I'm up and running within half an hour.
Task monitors work extremely well. We haven't had any issues with them, meaning jobs that wait for another job to finish. We do have some that look into the future, but most of ours look backward. We have some that look back two days. The job running on a Friday looks for something that ran on Wednesday, sees that it ran successfully, and the schedule keeps right on going.
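The look-back behavior can be sketched in a few lines of Python. This is a toy model under my own assumptions: `monitor_satisfied`, the history tuples, and the `"SUCCESS"` status string are hypothetical and do not reflect how the product stores or evaluates task history.

```python
from datetime import datetime, timedelta


def monitor_satisfied(history, task_name, now, look_back):
    """Return True if task_name finished successfully within [now - look_back, now].

    history is an iterable of (name, status, finished_at) tuples.
    """
    window_start = now - look_back
    return any(
        name == task_name
        and status == "SUCCESS"
        and window_start <= finished_at <= now
        for name, status, finished_at in history
    )
```

For example, a Friday-night job with a two-day window is satisfied by a successful run late on Wednesday, while a one-day window would keep waiting.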
They brought in a web service task, which saves you running an agent on a server. I can send HTTP commands directly to servers, which can start processes on that server itself, based on a file coming in. There is an agent cost, but there's a certificate that you have to put on. There's a little more background work for me, because I have to keep the certificates up to date.
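As a rough picture of what sending an HTTP command with a certificate involves, here is a sketch using only the Python standard library. Everything specific in it is an assumption for illustration: the `/start` path, the JSON payload shape, and the function names are invented, and this is not the product's web service task configuration.

```python
import json
import ssl
import urllib.request


def build_start_request(base_url, process_name, file_path):
    """Build (but do not send) a POST asking a server to start a process.

    The "/start" path and the payload fields are hypothetical.
    """
    body = json.dumps({"process": process_name, "file": file_path}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/start",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, ca_file):
    """Send the request, verifying the server against the given CA certificate file."""
    context = ssl.create_default_context(cafile=ca_file)
    with urllib.request.urlopen(request, context=context, timeout=30) as response:
        return response.status
```

The `ca_file` argument is where the certificate upkeep shows up: when the certificate expires or is replaced, the file the scheduler trusts has to be refreshed as well.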
What needs improvement?
One hiccup we've had is due to the fact that we have other internal scheduling tools. We're able to talk to them, but we have trouble with some of the networking between them, so we're still trying to work out the kinks there.
Also, there's the z/OS agent. We've had trouble with GDGs (generation data groups) during recovery. Say we have a job that fails on a Saturday and there are other jobs that update that generation. If they go to fix the one from Friday, it picks up right where it left off. It doesn't know about the later generations that were created. We've been trying to have Stonebranch correct that for us, and that's probably the biggest open issue. The z/OS agents are also the hardest ones to install and upgrade. Mainframe, in general, seems to be a hurdle, in my opinion.
For how long have I used the solution?
More than five years.
What do I think about the stability of the solution?
In terms of resiliency, we run high availability, so I have two controllers, one passive and one primary-active. We switch every month for patching, and the passive one takes over without an issue.
With our database patching, we can see when that stops and when the controller goes into a pause. But less than 15 seconds later, we're back up and running again. There are no job failures associated. It takes off right away. In the patching world, we've seen a significant improvement.
What do I think about the scalability of the solution?
It has high scalability. It's easy to use, it's easy to run with, it's easy to get it turned on and going.
How are customer service and technical support?
I found tech support to be very knowledgeable. They seemed to go above and beyond. I will admit, my former copilot here actually started working for Stonebranch. She went beyond expectations but she's no longer working with support. She moved on to a storage administrative role. But overall, they seem to be very knowledgeable. Within hours, they're getting back to you on a fix. Most of the time, they will provide the fix right upfront or tell you what you need or how to do it. I'm very comfortable with their expertise.
Which solution did I use previously and why did I switch?
The Universal Controller is what I'm running. I like it for its web interface. There were four or five products we were reviewing. We came from an ESP shop but we didn't like their web interface. We were leaning toward a web interface and this was the only tool that had it. I believe a couple of them are close, but we didn't like their features. We liked this one better. We have the Controller on our own server, inside the company. I have read some of the cloud material and we have other products going in that direction, but I haven't been asked to go that way.
We had an elective in 2013 to get off the mainframe, so we jumped to get going with a different scheduling tool but, guess what, we're still on the mainframe.
How was the initial setup?
I wasn't very Unix-savvy when I started, but Stonebranch came in and showed me how to do it. One of the hurdles we had was that we went with their 5.1 version, and then they had to completely change their mapping, and I ended up doing that all by myself. I ended up copying everything over into a new version, I promoted everything we had over into the new one, and it took me less than an hour to do it. So the conversion to an entirely new server was very easy.
We were 97 percent effective when we converted, but they converted most of it for us, upfront. They were onsite to help us with the conversion. We had a couple of kinks with them. ESP had some inherited dependencies that we overlooked and that was the biggest hurdle we had. We had to break some of the connections for predecessors and successors, but they built all that the same night we went live. We were able to get that going and fixed. There was an AIX agent we had some issues with, but an hour later I had the new version installed and up and running and we were on track.
Our initial deployment took us a little over a year. Stonebranch was onsite. They started converting. We ended up identifying some 50 schedules that were stand-alones, where they didn't impact anything. In the space of seven months we turned them on, and then our peak window hit and we couldn't do any changes from November 1st to January 1st. We waited until after our peak window and I believe it was during the first week in February that we went with everything else. We got a taste of what was happening, and then we put everybody else in.
Our implementation strategy was to get everything converted. We did that first seven months by ourselves; we just turned things on and let them run. We had three people from Stonebranch onsite for our go-live night. They worked eight-hour shifts. My co-worker and I ended up pulling two 12-hour shifts, and then we had a third person who helped us out in between them, so we could at least get a bite to eat, or walk away, or unload some of the issues that we were seeing. But most of them were pretty minor. We met our SLA on opening morning for our batch processing. We were not behind.
It went very smoothly. There are always going to be some hurdles you have to figure out, but we were expecting bigger hurdles, and we didn't see those really big hurdles.
What about the implementation team?
The Stonebranch reps were extremely knowledgeable. It didn't take much for them to figure out what we wanted, how to do it. Danny Provo was one of the primary guys for us. We had some unique things that we had to have converted, and they came up with a solution for every one of them.
What's my experience with pricing, setup cost, and licensing?
When we reviewed this solution against other vendors, Stonebranch blew everybody out of the water in terms of cost.
There is a maintenance cost that is required every year.
Which other solutions did I evaluate?
We evaluated BMC and we looked at CA's products - we already had CA in house. Tivoli was another we looked at. There were four or five on the list, and we dropped it down to three pretty quickly.
BMC would have been in the running, but they were... "arrogant" is the word I want to say. They just rubbed us the wrong way. I think they have a great tool, but the sales pitch that they gave us did more chopping of other products than selling their own.
What other advice do I have?
If I were to go to another company, this would probably be the tool I would push for. It's a very sound product. I feel Stonebranch is on the technical edge. I've been to a couple of their conferences. They are going into areas and blowing my mind with where they're going with some of this stuff. They're trying to stay on top of the cutting edge.
When you go to their conferences, you hear how other people are utilizing the tools. Something might spark a concept, where I say, "Maybe I can do that."
We use ServiceNow as our problem management tool, so I'm trying to automate tickets to go into that, but we haven't made it that far yet. We send an email on every task failure over to a public folder, and that's what our operations team copies and pastes from. Then they update another gauge in our dashboard so they know that somebody's working on that. Then we have some warning issues. We have things that go into a Defined state, because they could be a sub-workflow of a main workflow. Or we have workflows that stack up behind each other because they have the same name. We use resources to control everything. If we do have a maintenance window, I use a resource and set it to zero, so any workload coming in after that waits for our operations team to release it or get the okay after the maintenance window has been performed.
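The maintenance-window trick of setting a resource to zero can be modeled as a simple counting limit. The sketch below is my own simplification: the `Resource` class and its methods are hypothetical, and it ignores task completion entirely, so it only shows the hold-and-release behavior.

```python
class Resource:
    """A counting limit: a task may start only while units are available."""

    def __init__(self, limit):
        self.limit = limit
        self.in_use = 0
        self.waiting = []  # tasks held back, in arrival order

    def request(self, task):
        """Try to start a task; queue it and return False if no unit is free."""
        if self.in_use < self.limit:
            self.in_use += 1
            return True
        self.waiting.append(task)
        return False

    def set_limit(self, limit):
        """Change the limit and return the waiting tasks that can now start."""
        self.limit = limit
        released = []
        while self.waiting and self.in_use < self.limit:
            released.append(self.waiting.pop(0))
            self.in_use += 1
        return released
```

Dropping the limit to zero does not fail anything. New work simply queues until operations raises the limit again.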
I'm the primary for the maintenance. I have a backup, but he's more my MQ guy. I support MQ as well. I do all the maintenance on the controller, so it's one person primarily doing it all. We have three production-control people who do batch scheduling, for new schedules and obsoletions. They rotate their duties every week. One does scheduling, and one handles user requests, the ad-hoc requests that come in on a daily basis to be inserted into the schedules to run. We have a schedule that we call Production Control and that's where all our user requests go; users who want to run this or that today insert it there and run it.
We have about 120 users. They include our DevOps team. We used some business services to lock down some of their pseudo test schedules. We run a production internet environment, and the data that comes out of that actually goes into our development environment, for their testing. We use business services to lock that down. They have eight people who can update tasks, create tasks, etc. That's the only place we're using business services.
We have seven groups. The Administrative group and the "Everything" group come with the tool. But then we created seven more groups. We strayed away from the default groups and made our own. We have ops-wise-admin, which is the administrative group. We have an ops-wise-all group, which is read-only. Somebody can get into that group and they can see ops-wise, they just can't make changes. Developers is our biggest group. In production, they only have read access, but in our development areas they have full-blown access. We manipulated the permissions to help control production over development. Ops-wise-IT is another group similar to ops-wise-all. I don't know why we had to have that one to give IT some extra abilities, but that's what we did. And we have an operations group for our system operators. They have capabilities to restart workload based on a programmer's request, a plan of failure. They can make modifications to the active instance, but they can't make modifications to the definition. That's how our change control comes into play. Production control has the same access as ops-wise-admin, but they just can't do upgrades.
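The group setup described above boils down to a lookup from group and environment to allowed actions. Here is a small Python sketch of that idea. The action names (`read`, `edit_definition`, `edit_instance`, `upgrade`) and the table layout are assumptions made for illustration, not the product's actual permission model.

```python
READ = {"read"}
FULL = {"read", "edit_definition", "edit_instance"}

# (group, environment) -> allowed actions; action names are hypothetical.
PERMISSIONS = {
    ("ops-wise-admin", "production"): FULL | {"upgrade"},
    ("production-control", "production"): FULL,
    ("operations", "production"): {"read", "edit_instance"},
    ("ops-wise-all", "production"): READ,
    ("developers", "production"): READ,
    ("developers", "development"): FULL,
}


def allowed(group, environment, action):
    """Return True if the group may perform the action in the environment."""
    return action in PERMISSIONS.get((group, environment), set())
```

Separating `edit_instance` from `edit_definition` is what lets operators restart a running workload without being able to change what the task is defined to do.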
In terms of the prospect of increasing our usage of the solution, we're looking into cloud options, but I haven't been asked to go that route. Doing it would be a matter of putting an agent out there in the cloud world. Security is the biggest hurdle for me, sometimes; trying to get access. Some of our servers are behind firewalls. It's usually a matter of talking to the right people to get the job done, but I probably have seven agents that are behind firewalls and working just fine.
I run four controllers, but I have six in place. I have two that are high-availability. We were struggling and this is probably an issue with Stonebranch. We had developers who were making changes in our test and development areas, and then we would promote them up to production, but we started having conflicts with sysids. What would happen was that a developer would make a copy of what he wanted to change, and he would go back and rename the original task to "old" or something like that, and then rename the new one to the originally named task. The sysids were now out of sync. Sometimes they would bundle up okay, but once we started seeing a larger volume of them, we started having bundling issues and failures. We elected to go with what we call our change-control environment. It's almost a mimic of our production environment, but now our production control team actually updates the original task upon request. They make the changes in their development and then they submit a change request to have this copied into production or updated into our change-control environment, so we can keep the sysids from getting out of sync. Sysids were probably one of our bigger hurdles, after the fact.
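The copy-and-rename problem is easier to see with a toy model. In the Python sketch below, each environment is a dictionary from sysid to task name, and a conflict is a name that exists in both environments under different sysids. How the product actually bundles and matches records is not something I can state; `sysid_conflicts` and the `id-1` style identifiers are purely illustrative.

```python
def sysid_conflicts(source, target):
    """Return task names present in both environments under different sysids.

    source and target map sysid -> task name.
    """
    target_by_name = {name: sysid for sysid, name in target.items()}
    return sorted(
        name
        for sysid, name in source.items()
        if name in target_by_name and target_by_name[name] != sysid
    )


# Production and development start in sync.
prod = {"id-1": "load_orders"}
dev = dict(prod)

# A developer copies the task, renames the original to "old",
# then gives the copy the original name.
dev["id-2"] = "load_orders_copy"
dev["id-1"] = "load_orders_old"
dev["id-2"] = "load_orders"

conflicts = sysid_conflicts(dev, prod)
```

After the rename, `load_orders` in development carries `id-2` while production still has it under `id-1`, which is the kind of mismatch that updating the original task in place avoids.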
There are no agents running on our product-control system. We variable-ized all our agent definitions and we variable-ized all our credentials. With scheduling, if you hard-code the agent name or the credential, it will actually bundle it up like that, but if you variable-ize them, you can keep them unique between the two systems. In production, this is a production credential, but in test they use an LE-dev credential name. When we go to move that up, it still thinks it's just LE, because we variable-ized it.
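Here is a minimal Python sketch of the variable approach, using `string.Template` from the standard library. The variable names (`agent_le`, `cred_le`), the agent names, and the dictionary layout are all invented for the example; only the `LE` and `LE-dev` credential names come from the description above.

```python
from string import Template

# One task definition, promoted unchanged between environments.
TASK_DEFINITION = {
    "name": "load_orders",
    "agent": "${agent_le}",
    "credential": "${cred_le}",
}

# Each environment supplies its own values for the same variable names.
ENVIRONMENTS = {
    "production": {"agent_le": "le-prod-agent", "cred_le": "LE"},
    "test": {"agent_le": "le-dev-agent", "cred_le": "LE-dev"},
}


def resolve(definition, environment):
    """Substitute an environment's variable values into a task definition."""
    values = ENVIRONMENTS[environment]
    return {key: Template(text).substitute(values) for key, text in definition.items()}
```

Because the definition only ever contains the variable reference, moving it between environments changes nothing in the task itself. The environment decides what the reference resolves to.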
Especially when going to a new server - if they want to rebuild a whole new server - all I do is install a new agent as "_new," and the alias name will be whatever, and then, at go-live, I just swap the names and scheduling isn't impacted at all. It's pretty sweet the way that works, using the aliases.
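The alias swap can be pictured as one level of indirection between the name the schedules use and the server behind it. The sketch below is a simplified model with hypothetical names (`AgentRegistry`, `billing`, `billing_new`); it does not represent how agents actually register with the controller.

```python
class AgentRegistry:
    """Maps the alias that schedules reference to the host running the agent."""

    def __init__(self):
        self._alias_to_host = {}

    def register(self, alias, host):
        self._alias_to_host[alias] = host

    def resolve(self, alias):
        return self._alias_to_host[alias]

    def swap(self, alias_a, alias_b):
        """Exchange the hosts behind two aliases, as done at go-live."""
        hosts = self._alias_to_host
        hosts[alias_a], hosts[alias_b] = hosts[alias_b], hosts[alias_a]


registry = AgentRegistry()
registry.register("billing", "old-server")
registry.register("billing_new", "new-server")
registry.swap("billing", "billing_new")
```

Tasks keep referring to `billing` throughout, so nothing in the schedules has to change when the server behind that alias does.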
I can remember with ESP, we had to have tons of schedule changes and agent name changes to the new one, whereas ops-wise took a lot of that away with the use of variables.