What is our primary use case?
We use it for provisioning and ongoing configuration management. We provision boxes with Chef by taking a base AMI that already has Chef installed and already has the appropriate credentials to connect to the main server. The new box can then pull its configuration and deploy it. In addition, Chef runs every five minutes, so any unexpected changes to the configuration get automatically reverted.
This means you get developers who go into the box and change something, thinking it will be okay. Then, they come to you asking, "Why isn't this change that I'm making working?" We have to explain, "Because you shouldn't be going into the box in the first place."
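The revert behavior described above comes from convergence: each run compares what is on the box with what was declared and overwrites any difference. The following is only a minimal Python sketch of that idea, not Chef's actual implementation. The `converge` function, the setting names, and the dictionary standing in for a box are all hypothetical and exist only for illustration.

```python
def converge(desired: dict, actual: dict) -> list:
    """Make `actual` match `desired` in place; return the keys that were reset."""
    reverted = []
    for key, value in desired.items():
        if actual.get(key) != value:
            actual[key] = value      # overwrite the drifted value
            reverted.append(key)
    return reverted


# Hypothetical declared configuration, as it would be held centrally.
desired = {"nginx_worker_processes": 4, "app_log_level": "info"}

# A developer logs in to the box and changes a value by hand.
box = dict(desired)
box["app_log_level"] = "debug"

# The next scheduled run puts the declared value back.
reverted = converge(desired, box)
print(reverted)
```

A second call to `converge` right afterwards would find nothing to change, which is why a manual edit on the box never survives longer than one run interval.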
How has it helped my organization?
One thing that we've been able to do is implement a tiered permission model, allowing developers and their managers to perform their own operations in lower environments. This means a manager can go in and make changes to a whole environment, whereas a developer with less access may only be able to change individual components or upgrade the version of software that they have control over. This allows us to return some of the control to the developers, which saves our nights and weekends.
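As a rough illustration of the tiered model described above, here is a small Python sketch. It is not how Chef stores or enforces permissions. The role names, the `LOWER_ENVIRONMENTS` set, and the `can_change` helper are assumptions made up for this example.

```python
from dataclasses import dataclass

# Hypothetical names for the lower environments where self-service is allowed.
LOWER_ENVIRONMENTS = {"dev", "qa"}


@dataclass(frozen=True)
class User:
    name: str
    role: str                                  # "manager" or "developer"
    owned_components: frozenset = frozenset()  # components a developer controls


def can_change(user: User, environment: str, component: str = None) -> bool:
    """Return True if `user` may change `component` in `environment`.

    `component=None` means a change to the whole environment.
    """
    if environment not in LOWER_ENVIRONMENTS:
        return False                           # higher environments stay with operations
    if user.role == "manager":
        return True                            # managers may change the whole environment
    if user.role == "developer":
        return component is not None and component in user.owned_components
    return False


manager = User("maria", "manager")
developer = User("dev1", "developer", frozenset({"billing-api"}))

print(can_change(manager, "qa"))                   # whole-environment change
print(can_change(developer, "qa", "billing-api"))  # change to an owned component
print(can_change(developer, "qa"))                 # whole-environment change by a developer
```

The point of keeping the check in one function is that the front end presented to developers can call it before touching anything, so the rules live in one place.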
What is most valuable?
One advantage Chef has over Ansible is that your operations can be entirely headless, meaning that they can interact directly with the Chef database using shared credentials. They never need any type of human-facing interface, so you don't have to go into a GUI or use a command line tool. You can hit the database directly with a library.
With Ansible, a lot of the operations require some type of front-facing tool, e.g., a command line or a GUI. For a smaller-scale operation managing fewer than 100 nodes, this might be fine, and it might even be more helpful if you want to transfer some of the power over to your developers to perform certain operations.
However, if you're handy enough with the DSL and you can present your own front-facing interface to your developers, then you can have a lot more granular control with Chef over which operations developers can perform and which they can't.
What needs improvement?
One of the biggest things that I miss is that in Chef 11 and earlier, organizations could be managed directly through the Chef control command line utility. While we prefer to interact directly with the database, there is still some value in having access to the command line utility. That functionality is still present and still in the documentation, but it has been broken since Chef 12: the operation reports success when you perform it, but it doesn't actually work. We are now looking at Chef 14, and they already have Chef 15 in the pipeline, but there appears to be no effort to fix it.
It would be nice to have an update to Chef Zero that was more geared toward containers. Before Docker took hold, there was something called Chef Zero Vagrant, a plugin for Vagrant that would provision your developers' local copies of their environment for local testing. This was great for the technology of the time, but we haven't seen it evolve now that containerization technology has moved forward.
For how long have I used the solution?
More than five years.
What do I think about the stability of the solution?
It all seems to be very solid and stable.
What do I think about the scalability of the solution?
We have rolled out around 500 nodes. Part of the reason why we have stuck with it is that it managed to effectively scale with us and stay stable at the same time.
How is customer service and technical support?
I've contacted them before about the same issues that I have mentioned for improvement. Because Chef is being developed by a hybrid team of open source contributors, as well as the Chef core team, I am not sure my communications have gone to the right people yet.
What about the implementation team?
The integration and configuration of AWS within our environment is a whole other skill set. Any configuration management or infrastructure as code will have a learning curve. Integrating it requires rearchitecting, not necessarily of the design, but certainly of the philosophy by which you approach it. That is part of the benefit as well: you can develop a new way of thinking among the developers, who will assist in producing code that is automated, scalable, easier to write automated tests for, etc.
I don't know if its adoption can be made easier, since it is already a significant change, which is a good thing.
What's my experience with pricing, setup cost, and licensing?
When we're rolling out a new server, we're not using the AWS Marketplace AMI, we're using our own AMI, but we are paying them a licensing fee.
We went the AWS route because we are fully cloud-based anyway. It was something that people who came before me were already familiar with, so it was a lot easier for me to get buy-in.
The price per node is a little weird. It doesn't scale along with your organization. If you're truly utilizing Chef to its fullest, then the number of nodes in use on any particular day might scale or change based on your Auto Scaling groups. How do you keep track of that or audit it? Then, how do you appropriately license it? It's difficult.
All you can do is communicate to them what's happening and agree on something that you're both comfortable with. However, if you're doing that, then what's the point of having the per-node model in the first place? It would be better to move to a fixed-pricing model.
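To make the auditing problem above concrete, here is a minimal Python sketch that summarizes node counts sampled over time into a daily peak and a daily average. The sample values are invented purely for illustration, and `daily_node_usage` is a hypothetical helper, not part of Chef or AWS.

```python
from collections import defaultdict
from statistics import mean


def daily_node_usage(samples):
    """samples: iterable of (day, node_count) pairs.

    Returns {day: (peak_nodes, average_nodes)}.
    """
    by_day = defaultdict(list)
    for day, count in samples:
        by_day[day].append(count)
    return {day: (max(counts), mean(counts)) for day, counts in by_day.items()}


# Made-up samples: node counts observed at different times on two days.
samples = [
    ("day-1", 40), ("day-1", 55), ("day-1", 70),
    ("day-2", 40), ("day-2", 42),
]

usage = daily_node_usage(samples)
for day, (peak, average) in usage.items():
    print(day, "peak:", peak, "average:", average)
```

Even in this toy data, the peak and the average for the same day differ noticeably, which is exactly the ambiguity a per-node license has to resolve when nodes come and go with Auto Scaling.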
Which other solutions did I evaluate?
We have also looked at Ansible, Puppet, and SaltStack. They all sort of have managed solutions which you can potentially purchase. Puppet definitely has a sort of old school thought process working behind it.
Over two to three years, we have not seen a stable release of Salt. They have some good ideas, but it isn't stable enough yet to use in a production environment.
Make sure that the operations crew has a background in Ruby if you're going to choose Chef. If you have a Python crew, then look at Ansible as a potential option. I think Ansible is catching up and will surpass Chef in pretty much every way sometime in the next 12 to 18 months.
For now, though, Chef Automate is still the most reliable solution.
What other advice do I have?
At the top level, Chef is integrated with Terraform, which delivers whole entities and groups of nodes. Those nodes are then individually provisioned with Chef. The integration is seamless.
I've run my own Chef server before. We've also gone completely headless with Chef Zero, where we distribute the code directly to the box during provisioning. We've used Chef pretty much every way that it can be used.
The AWS software is good. There is definitely value for somebody who is trying to understand it and be able to have a deployment of it for observation. Coming into it, there's a lot to understand, as with any technology.
If I were coming into it now or trying to bring somebody up to speed, it would be good to have an already functioning setup of the server in an AWS instance, where you can interact with the NoSQL database and play with some of the available tools to understand how they work. It will be very similar to the way Chef Automate works in general. Therefore, I do see value in it.