In the cloud, nobody can hear you stream
Since its inception over two years ago, the Caplin Managed Services team has been responsible for building and maintaining Caplin’s SaaS cloud offerings. Our cloud infrastructure is managed by a DevOps toolchain built primarily on Terraform, automated builds, and containerised deployments, with the goal of streamlining as much of the deployment and upgrade process as possible.
In the Managed Services team, monitoring is paramount, and we’re always looking for ways to improve our monitoring so that we can intervene earlier to avert failure and downtime. To keep an eye on our AWS deployments we use a variety of tools, both AWS-managed (e.g. CloudWatch) and developed in-house (the “Status Page”).
Despite the relative speed with which our rollout and monitoring toolkits have matured, at times our monitoring stance felt reactive rather than proactive, so we have been looking to increase the number of push notifications coming from our deployments. For this HackDay, I wanted to improve the speed of our response to logged events by automating the search for key log messages and notifying our staff through the rich interface of Slack. I therefore decided to investigate the feasibility of generating and pushing notifications to the company Slack directly from the cloud.
I was already somewhat familiar with AWS Lambda (a way of running code in the cloud without needing a server or container to provide a runtime) from previous work in the Managed Services team, and after further investigation decided it would be the ideal AWS tool for my use case.
I began by spinning up a simple ECS Fargate cluster in our AWS sandbox environment and deploying a demo Liberator container there. The Liberator was rigged to throw a NOTIFY-level message within a few minutes of starting up, which it would then print to the CloudWatch logs. I then created an AWS Lambda function in the same AWS account and configured a trigger for it, designed to fire whenever a NOTIFY-level message was printed to the CloudWatch logs.
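The trigger itself is a CloudWatch Logs subscription filter pointing at the Lambda function. A minimal sketch of setting one up with boto3 might look like the following — note that the log group name, region, account ID, and function name are all placeholders, not our real resources:

```python
# Sketch: wiring a CloudWatch Logs subscription filter to a Lambda function.
# All resource names below are hypothetical placeholders.
FILTER_PARAMS = {
    "logGroupName": "/ecs/liberator-demo",  # hypothetical ECS log group
    "filterName": "notify-level-messages",
    # Filter pattern: match any log event containing the term NOTIFY
    "filterPattern": '"NOTIFY"',
    "destinationArn": (
        "arn:aws:lambda:eu-west-1:123456789012:function:slack-notifier"
    ),
}

def create_subscription_filter(params=FILTER_PARAMS):
    """Create the subscription filter. Requires AWS credentials, boto3,
    and a resource policy on the Lambda allowing CloudWatch Logs to
    invoke it (lambda add-permission)."""
    import boto3  # imported lazily so the sketch can be read offline
    boto3.client("logs").put_subscription_filter(**params)

if __name__ == "__main__":
    create_subscription_filter()
```

In practice we would express this in Terraform alongside the rest of our infrastructure, but the parameters are the same.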
Next came the question of which runtime to use for my Lambda function. I decided on Python, as it is our go-to high-level scripting language in Managed Services and generally lends itself well to DevOps use cases. After a bit of trial and error I was able to decipher the Lambda event object (a JSON payload sent by the trigger to the Lambda function) and parse it using Python's json module. Using the Python requests package and the Caplin Slack webhook endpoint, I wrote a function culminating in a push notification to the #silence_of_the_lambdas Slack channel (which I created specially for the occasion).
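The fiddly part is that CloudWatch Logs delivers its payload base64-encoded and gzipped under `event["awslogs"]["data"]`. A sketch of the decoding and posting logic, with a placeholder webhook URL (the real Caplin endpoint is private), might look like this:

```python
import base64
import gzip
import json

def decode_log_events(event):
    """Unpack the CloudWatch Logs trigger payload: it arrives
    base64-encoded and gzipped under event["awslogs"]["data"],
    and decompresses to JSON with a "logEvents" list."""
    payload = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(payload))

def post_to_slack(text):
    """Push a message to Slack via an incoming webhook.
    The URL below is a placeholder, not a real endpoint."""
    import requests  # lazy import: only needed when actually posting
    requests.post(
        "https://hooks.slack.com/services/T000/B000/XXXX",
        json={"text": text},
        timeout=5,
    )

def handler(event, context):
    """Lambda entry point: extract the logged messages and notify Slack."""
    data = decode_log_events(event)
    messages = [e["message"] for e in data["logEvents"]]
    post_to_slack("Liberator logged:\n" + "\n".join(messages))
```

The subscription filter has already matched on NOTIFY by the time the event arrives, so the handler only needs to unpack and forward the messages.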
Results and future work
Once I had zipped up and deployed the Lambda function, it was time to put it to the test. I fired up the Liberator ECS container, and within a minute or two (the time it took for the container to throw the NOTIFY-level message) I received an automated notification from the Slack webhook user I had prepared for the HackDay.
However, I soon discovered a limitation: my Lambda function could only parse one trigger event per log file, meaning that sending multiple notifications for a given container was impossible. Upon further research, it came to my attention that the AWS Simple Notification Service (SNS) could resolve this issue by acting as a message bus. Sending multiple push notifications via SNS (perhaps even using SNS's SMS support to push notifications to a mobile phone, or integrating with AWS Chatbot) is an approach that certainly warrants further investigation. Although HackDay was only 24 hours, at Caplin we regularly hold self-directed project days; I'll be using these to investigate SNS and see how it could help us build our own cloud alerting solution.
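As a rough sketch of the fan-out idea: instead of posting straight to Slack, the Lambda would publish each message to an SNS topic, and any number of subscribers (a Slack webhook Lambda, SMS, AWS Chatbot) would receive it independently. The topic ARN below is a placeholder:

```python
# Sketch of the SNS fan-out idea; the topic ARN is a placeholder.
TOPIC_ARN = "arn:aws:sns:eu-west-1:123456789012:liberator-notify"

def build_notification(log_group, message):
    """Shape the SNS publish parameters. SNS subject lines are
    limited to 100 characters, so truncate defensively."""
    return {
        "TopicArn": TOPIC_ARN,
        "Subject": ("NOTIFY from " + log_group)[:100],
        "Message": message,
    }

def publish(notification):
    """Publish to the topic; requires AWS credentials and boto3
    when run for real."""
    import boto3  # lazy import so the sketch can be read offline
    boto3.client("sns").publish(**notification)
```

Each subscription (Lambda, SMS, email, and so on) then gets its own copy of the message, which sidesteps the one-notification-per-container limit in the current design.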