When your company’s growing like a beanstalk on steroids (and weight gain supplements), and you live by Lean principles and practice JIT operations, you need to figure out how to efficiently manage a rapidly growing staff, and the IT infrastructure that comes with it, with a small, capable team of tech ops engineers.
A lot of ops people are rockstars when it comes to automating their production, testing, and dev environments and their customer-facing infrastructure, and the Conductor team is certainly in that category. But sitting in a standup (yes, sitting; legs were tired) a few months ago, we had this update from Ryan (the engineer who runs our internal IT):
Ryan: “Yesterday I provisioned two machines for new hires. Today I have some more to do.”
Me: “How many are you doing today?”
Me: “What about tomorrow?”
Me: “And the rest of the week?”
Ryan: “Two, two, and two.”
If you’ve ever managed or worked in a help desk, IT service bureau, or NOC, this should sound familiar. We had 10 new machines to provision that week, and one guy to do them, two per day. Quick math can tell you why this moved my hairline back a millimeter or two:
10 computers × 4 hours per computer = 40 hours of work
40-hour work week – 40 hours of work = 0 hours of time to deal with the rest of the internal service requests
0 hours spent on service requests per week = 0 satisfied, productive, fellow employees who made those requests in the first place.
This was an extreme case, but if it continued, we would be saturated: unable to cover any of the other business-critical internal IT services we provide, and one out-of-band request away from missing our OLAs (Operational Level Agreements). Hiring wasn’t going to slow down, and we had to find a way to keep up with our normal ops work while still supporting the high rate of growth we were seeing.
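The back-of-the-envelope math above can be sketched in a few lines of Python, which makes it easy to see how quickly the slack disappears as hiring picks up. The 4 hours per machine and the 40-hour week come from the numbers above; the function name and the lighter five-machine week are made up for illustration.

```python
# Figures taken from the week described above.
HOURS_PER_MACHINE = 4
WORK_WEEK_HOURS = 40


def hours_left_for_requests(machines, hours_per_machine=HOURS_PER_MACHINE,
                            work_week=WORK_WEEK_HOURS):
    """Return the hours one engineer has left for service requests
    after provisioning `machines` computers by hand in a single week."""
    provisioning_hours = machines * hours_per_machine
    # Capacity cannot go below zero; anything beyond that is backlog.
    return max(work_week - provisioning_hours, 0)


if __name__ == "__main__":
    # The week from the standup (10 machines) and a hypothetical lighter week (5).
    for machines in (10, 5):
        print(machines, "machines:", hours_left_for_requests(machines), "hours left")
```

Any week with ten or more new hires leaves nothing for the rest of the queue, which is the saturation point described above.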
Everyone in the meeting (almost at once): “Hey, why don’t we just automate it?”
The answer to the capacity problem came in the form of automation, and automation came in the form of two easy-to-use, reliable, and proven technologies that many ops engineers already use and love: Puppet and The Foreman.
The Foreman is a complete lifecycle management tool for physical and virtual servers.1
Puppet Open Source is a flexible, customizable framework available under the Apache 2.0 license designed to help system administrators automate the many repetitive tasks they regularly perform.2
The idea was to stop looking at these purely as server management tools, and to apply the same configuration management principles to our employees’ computers. This gave us a number of advantages: