Improving Automation in Systems Engineering

When I joined Bronto in 2013, I felt we had a reasonably modern procedure for provisioning new systems:

  1. Work with the requesting team to determine the resources needed.
  2. Define the system resources in Foreman and slather on some puppet classes.
  3. Push the shiny ‘Build’ button and wait.
  4. Tackle all of the fiddly little bits that Foreman wasn’t handling at the time.
  5. Complete peer review and system turnover.

This request might take a business day or two to process, longer if something languished in peer review, exposed some technical debt, or just lead to a yak shave. It was a ‘good enough’ solution in an environment where these sorts of requests were infrequent, and we were well aware of the rough edges that needed to be filed off this process when the time was right.

Knowing that our developers were pushing to transition to a more service-oriented architecture and break down the remaining pieces of the old, monolithic code base, we knew it was time to streamline this process before it became a pain point. Requests for new systems were going to be more frequent and more urgent, and we needed to get ahead of the problem by devoting the time to make things better.

After dredging up the relevant improvement requests from our backlog, we tackled the task of filing off those rough edges by:

Writing custom Foreman hooks: This helped with the worst of the manual tasks and freed us from the pain of having to manually update Nagios, LDAP, and any number of additional integration points within the infrastructure.

Automating the peer review process: Borrowing the idea of test-driven development, we’ve finally reached the point where we have test-driven infrastructure. By writing a set of system tests and launching them from another Foreman hook, we were able to replace manual peer review with automation. Results are then announced in a chat room for cross-team visibility.

Writing an ad hoc API for RackTables: RackTables was a great early solution, but we’re approaching the point where it’s no longer a good fit. Although we’re not quite ready for a new solution to datacenter asset management, being able to programmatically twiddle the information in RackTables was a win.

Creating Architect to batch-provision virtual machines: Since VMs tend to be requested in groups to create a resilient service, we wrote a tool to automate away the repetitive tasks. Architect gathers the system requirements, creates a configuration file, and then selects suitable hypervisors and builds out each of the systems. This has been a huge win when it comes to fulfilling requests for 10 or more systems at once.

While there are additional automation improvements we’d like to make, these efforts have allowed us to more rapidly respond to the needs of our development teams and generate systems in minutes instead of days. System provisioning is a core function of our team, and we are always looking for ways to improve our abilities in that area.

Leave a comment