At Caplin, we have 7 ESX 4.0 data stores, filled to the brim with Linux and Windows VMs. We use these VMs to test and deploy artifacts from many different projects, all under the management of a ThoughtWorks GO server. For our continuous integration strategy to work, it’s vital that plenty of build agents are available to give quick feedback about failing builds.
Thus far, automating the tests and deployments that run on the VMs has been relatively simple: GO agent services act as proxies, carrying out scripted commands on whichever machine you want the build to run on. The problem with automating these tasks is the constant battle with the state of the build environment.
Saving the environment!
At Caplin, we use VMware vSphere to manage the VM farm. This gives us the option to snapshot each machine in a known state. In theory, an automated build should pass when it runs on a clean environment. However, browsers cache information, Maven builds cache artifacts, files get deleted, and people add or remove programs. Very quickly, a VM can become a complete mess.
Via the GUI, it’s very simple to click “Revert to snapshot”. However, this is hardly convenient to do on a regular basis, especially when you have 50 VMs you want to revert at a time.
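As a rough sketch of what the scripted equivalent of that click might look like in PowerCLI: the server name, the VM naming pattern and the snapshot name below are placeholders rather than our real setup, and depending on the PowerCLI release the cmdlets first have to be loaded with either `Add-PSSnapin VMware.VimAutomation.Core` or a module import.

```powershell
# Assumed placeholders: server name, VM name pattern and snapshot name.
$server       = 'vcenter.example.local'
$vmPattern    = 'build-agent-*'
$snapshotName = 'clean'

# Prompts for credentials if none are stored for this server.
Connect-VIServer -Server $server | Out-Null

foreach ($vm in (Get-VM -Name $vmPattern)) {
    # Look up the named snapshot on this VM; skip the VM if it has none.
    $snapshot = Get-Snapshot -VM $vm -Name $snapshotName -ErrorAction SilentlyContinue |
        Select-Object -First 1
    if (-not $snapshot) {
        Write-Warning "No snapshot '$snapshotName' on $($vm.Name); skipping"
        continue
    }

    # Set-VM -Snapshot performs the revert; -Confirm:$false suppresses the prompt.
    Set-VM -VM $vm -Snapshot $snapshot -Confirm:$false | Out-Null
    Write-Host "Reverted $($vm.Name) to '$snapshotName'"
}

Disconnect-VIServer -Server $server -Confirm:$false
```

Because the loop is driven by a name pattern, the same few lines cover one VM or fifty.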
Rolling out changes
Another headache is upgrading all of your VMs. Should you wish to update Java across a group of VMs, you have to:
- Revert every VM to snapshot
- Either manually install Java yourself, or run a loop over SSH
- Delete every snapshot
- Create new snapshots
Doing this manually is a colossal job that can take several hours, and is subject to human error.
Another option is to edit the template that the VM is based on, delete all the VMs and redeploy them. Again, a mammoth task to repeat through the GUI.
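The revert, install, delete and re-snapshot steps listed above could be scripted along the following lines. This is only a sketch under several assumptions: an existing `Connect-VIServer` session, Linux guests with VMware Tools installed, and a VM pattern, snapshot name and install command that are purely illustrative.

```powershell
# Assumed placeholders: VM pattern, snapshot name and the guest install command.
$vmPattern    = 'build-agent-*'
$snapshotName = 'clean'
$installCmd   = 'yum -y install java-1.7.0-openjdk'   # hypothetical; use your own installer
$guestCred    = Get-Credential                        # account inside the guest OS

foreach ($vm in (Get-VM -Name $vmPattern)) {
    $old = Get-Snapshot -VM $vm -Name $snapshotName -ErrorAction SilentlyContinue |
        Select-Object -First 1
    if (-not $old) {
        Write-Warning "No snapshot '$snapshotName' on $($vm.Name); skipping"
        continue
    }

    # 1. Revert to the known-good snapshot.
    Set-VM -VM $vm -Snapshot $old -Confirm:$false | Out-Null

    # The guest must be powered on, with VMware Tools running, to accept a script.
    if ((Get-VM -Id $vm.Id).PowerState -ne 'PoweredOn') {
        Start-VM -VM $vm | Out-Null
    }
    Wait-Tools -VM $vm -TimeoutSeconds 300 | Out-Null

    # 2. Run the install inside the guest via VMware Tools.
    Invoke-VMScript -VM $vm -ScriptText $installCmd -ScriptType Bash -GuestCredential $guestCred

    # 3. Delete the old snapshot, then 4. capture the upgraded state under the same name.
    Remove-Snapshot -Snapshot $old -Confirm:$false
    New-Snapshot -VM $vm -Name $snapshotName -Description 'Clean state after Java update' |
        Out-Null
}
```

A real rollout would probably also check the exit code returned by `Invoke-VMScript` before replacing the snapshot, so that a failed install never becomes the new clean state.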
Let the computer do the work
When we raised our concerns with VMware about the challenges we were facing, they suggested that we script our tasks using PowerCLI. PowerCLI is a snap-in that sits on top of Windows PowerShell. It’s a well-documented set of cmdlets that can do anything that vSphere manager can do, and more besides. It’s 100% script-based, which makes it perfect for automation.
I had never used PowerShell before, so I was sent on a two-day course in London with a VMware specialist who could teach me to use PowerCLI effectively.
Back to school
The course I went on was titled “VMware vSphere: Automation with vSphere PowerCLI”. When I entered the classroom, I was very surprised to find that I was the only person taking part that week. It was not advertised as 1-on-1 tutoring, but given my lack of PowerShell experience (I’m a bash man usually), it turned out to be a very convenient arrangement.
I picked up the basics of PowerShell very quickly by following a training handbook that gave you tasks to follow. I found it to be a very effective method of learning: the trainer briefly explained the concepts of each section and answered my varied questions about the tasks and about how PowerCLI could be used at Caplin.
The first day mainly covered the creation of virtual hardware (datastores and virtual network interfaces). I wasn’t quite as interested in this aspect of PowerCLI, but it was a great way to get to grips with the PowerShell syntax.
The second day was when I made the most notes. It covered reverting a VM (or a group of VMs listed in a CSV file) to a specified snapshot, deploying VMs, checking whether VM hardware was consistent, running remote scripts on VMs via VMware Tools, and producing reports about the VMs.
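To give a flavour of two of those topics, here is a minimal sketch of a CSV-driven revert followed by a simple hardware report. It is not the course material: the file names and the `VMName` and `SnapshotName` column headings are made up for illustration, and an existing `Connect-VIServer` session is assumed.

```powershell
# Assumed CSV layout (hypothetical file and column names):
#   VMName,SnapshotName
#   build-agent-01,clean
#   build-agent-02,clean
$rows = Import-Csv -Path '.\vms-to-revert.csv'

foreach ($row in $rows) {
    $vm       = Get-VM -Name $row.VMName
    $snapshot = Get-Snapshot -VM $vm -Name $row.SnapshotName | Select-Object -First 1

    # Revert each listed VM to the snapshot named in its row.
    Set-VM -VM $vm -Snapshot $snapshot -Confirm:$false | Out-Null
}

# Fetch the listed VMs once and reuse them for both reports.
$vms = Get-VM -Name ($rows | ForEach-Object { $_.VMName })

# Report: one line per VM with its power state and CPU count.
$vms | Select-Object Name, PowerState, NumCpu |
    Export-Csv -Path '.\vm-report.csv' -NoTypeInformation

# Consistency check: more than one group means the VMs differ in CPU count.
$vms | Group-Object -Property NumCpu | Select-Object Name, Count
```

Keeping the VM list in a CSV file means the set of machines can change without touching the script.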
I received my certificate for completing the course material, and left the classroom with a confident grasp of PowerShell and PowerCLI.
Sadly, the course that I went on no longer exists (I was the last one to take the course in London!). However, they are planning to start a new course that uses an updated version of ESX. I was highly impressed with the VMware course material and the professionalism of the instructor, and I would happily recommend them to anyone who is interested in learning.
The aftermath
Currently we have a pipeline in GO dedicated to reverting a group of VMs to snapshot every Saturday morning, which has increased the stability of the build. A more ambitious task that I’ve undertaken is to automatically revert a specific VM when a build pipeline fails in Jenkins. I have published the solution online: simply search for “vmware revert jenkins slave to a snapshot” and it should be one of the top results.
PowerCLI is an invaluable tool for anyone managing more than one VM at a time. I find it hard to remember how we managed without it.