One of my earliest jobs was as an admin for an MSP. We’d routinely generate alerts that weren’t actionable, lacked context, and, for most of our customers, were considered noise. From a monitoring perspective, it was bad. Customers didn’t trust the alerts they received and often resorted to having some additional monitoring product installed on their systems. It’s safe to say that our auto-generated tickets and emails were largely ignored.
In an effort to avoid repeating the mistakes of the past, I want to ensure that my alerts are actionable and context-heavy. Thankfully, I’ve found a couple of tools that go a long way toward that: Sensu and Gremlin. I’ll introduce those tools today, and we’ll pick up on how they work together in the next post. It’s worth mentioning that I am a Sensu employee.
With that out of the way, let’s get to the tools!
The Tools
Sensu
If you’ve not used Sensu before, allow me to make a formal introduction. At Sensu, we talk about it as a “monitoring event pipeline.” The concept is similar to a CI/CD pipeline, except that instead of releasing software, I’m sending monitoring event data. The goal of the pipeline is that by the time I receive a monitoring event or alert, I know beyond a shadow of a doubt that what I have in front of me has been verified and gives me exactly what I need to act on the data.
For this series, I’ll be using Sensu as my monitoring tool of choice.
Gremlin
If you’re in IT, you’re probably familiar with Gremlins.
![gremlin gif][1]
Yes, those ones. They’ve been known to cause many an issue, but in this case, I’m talking about this [Gremlin][2] in particular. Gremlin is a [chaos engineering][3] tool that allows you to run targeted attacks on your infrastructure. These can be anything from a time-drift attack to more complicated types of attacks. The goal here will be to apply the principles of chaos engineering to uncover any weaknesses in our Sensu deployment and ensure that it can withstand real-world conditions.
We’ll also use Gremlin to introduce conditions that will generate Sensu alerts. By introducing those conditions, we’ll be able to check that the alerts generated follow the [CASE][4] method.
The Setup
Sensu
I’ve already set up Sensu in my own environment (which is Ubuntu 18.04), so I’m not going to walk through that here. However, if you don’t have a working Sensu deployment, you’ll want to check out the [Sensu installation doc][5] so that you can get all of the various Sensu components installed. It’s worth noting that for some of our later testing, we’ll be using a clustered deployment. For that, you’ll want to take a look at the [clustering doc][6].
Gremlin
Just like with Sensu, we’ll need to install Gremlin’s agent so we can start doing pseudo-nefarious stuff, like performing attacks on our test boxen. In this case, since I’m using Ubuntu 18.04 as my test box of choice, I’ll follow [Gremlin’s installation guide][7] for Ubuntu (though it’s written for Ubuntu 16.04, it should still work in our case).
Next Steps
Once you’ve got both Sensu and Gremlin installed, let’s run a couple of tests to make sure things are working like we expect them to.
Sensu
One of the cool things about Sensu is that you can monitor anything and you can have alerts generated from any number of things, not just the [community plugins][8] or [assets][8] Sensu offers. We can create some ad-hoc alerts using the [agent API][9] just to see what an alert might look like in our dashboard. To do that, run the following on your test VM:
```shell
curl -X POST \
  -H 'Content-Type: application/json' \
  -d '{
    "check": {
      "metadata": {
        "name": "mysql-backup-job"
      },
      "status": 0,
      "output": "mysql backup initiated",
      "ttl": 25200
    }
  }' \
  http://127.0.0.1:3031/events
```
That command creates a mock event and sends it to the agent API. Now, this might be useful if I had some sort of code that monitored a MySQL backup job and emitted this message. In our case, it’s just for us to make sure that we’ve set up and configured Sensu correctly. A successful test should leave you with an event that looks something like this:
BOOM! 💥 Our test worked! Let’s just run a quick sample attack with Gremlin now.
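One more Sensu aside before the attack: the curl command above always sends `"status": 0`, which is an OK result. To see what a non-OK event looks like, the same request can be wrapped in a small helper that takes the status as an argument. This is only a sketch under a few assumptions: `build_event` and `send_event` are hypothetical helper names (not part of Sensu), the agent API is listening on the same `127.0.0.1:3031` address used above, and the check name and output contain no characters that need JSON escaping.

```shell
#!/usr/bin/env sh
# Sketch: build and post ad-hoc events to the Sensu agent API.
# build_event and send_event are hypothetical helpers, not Sensu commands.

# Same agent API address as the curl example above; override it if your
# agent listens somewhere else.
AGENT_EVENTS_URL="${AGENT_EVENTS_URL:-http://127.0.0.1:3031/events}"

# build_event NAME STATUS OUTPUT TTL
# Prints the JSON body for an event. STATUS follows the usual check
# convention: 0 = OK, 1 = warning, 2 = critical.
# Assumption: NAME and OUTPUT contain no double quotes or backslashes,
# because no JSON escaping is done here.
build_event() {
  printf '{"check":{"metadata":{"name":"%s"},"status":%d,"output":"%s","ttl":%d}}' \
    "$1" "$2" "$3" "$4"
}

# send_event NAME STATUS OUTPUT TTL
# Pipes the generated body to curl, which reads it from stdin via "-d @-".
send_event() {
  build_event "$@" | curl -sS -X POST \
    -H 'Content-Type: application/json' \
    -d @- \
    "$AGENT_EVENTS_URL"
}
```

With those helpers loaded, `send_event mysql-backup-job 2 'mysql backup failed' 25200` would post a critical result for the same check, and sending status `0` again afterwards should resolve it. The `25200` is the same TTL in seconds (7 hours) used in the original command.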
Gremlin
Just like we tested Sensu to make sure we’re able to receive events, we’re going to test our Gremlin agent. You can see me run the attack below:
[Screen recording: Screen Recording 2019-06-14 at 03.25 PM.mov]
There we have it! Both Sensu and Gremlin are working like we expect them to. In the next post, I’m going to dig a bit more into the “why” of using a chaos engineering tool like Gremlin to test monitoring tools like Sensu.
[1]: https://media.giphy.com/media/BqPljBK6V9ZPG/giphy.gif
[2]: https://www.gremlin.com/
[3]: https://principlesofchaos.org/
[4]: http://onemogin.com/monitoring/case-method-better-monitoring-for-humans.html
[5]: https://docs.sensu.io/sensu-go/5.9/installation/install-sensu/#install-the-sensu-backend
[6]: https://docs.sensu.io/sensu-go/5.9/guides/clustering/
[7]: https://www.gremlin.com/community/tutorials/how-to-install-and-use-gremlin-on-ubuntu-16-04/