We subject our clusters to a lot of automated tests in the widest sense – monitoring, health checks, load tests, penetration tests, vulnerability scans, the list goes on – but every so often I come across test cases that are not well served by any of them. They are usually specific to the way a cluster is used, or how the organisation operating it works. Sometimes there is no objectively correct or incorrect answer, no obvious expected value to specify in our assert statements. I will look at three examples to explain why I think these tests are worth your while.
The first test concerns the OpenShift default of letting all authenticated users create (or, more accurately, request) projects. Let’s say we want to deny non-admin users this power. How do we make sure we have complied with this rule?
Second, our architecture may require an application scaled to three pods to be distributed across three data centre zones for high availability. We need a test that shows that the built infrastructure matches the architectural requirement.
Third, let’s assume we have just experienced an unplanned downtime. Communication between two projects has failed. Clearly remediation comes first, but how would the administrator go about writing a test that makes sure the pod network is configured correctly?
The three scenarios have a number of things in common. Each requires direct access to the cluster state held in the master’s etcd database. That interaction alone makes these tests relatively expensive in performance terms, so they should run about once a day, preferably at a time of reduced load, rather than every hour. An additional run is perhaps most useful after cluster maintenance or upgrades. We will look at sample implementations of these tests in just a moment.
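To give a flavour of what such tests might look like, here is a minimal sketch of the first two scenarios as shUnit2 test functions. It rests on several assumptions: the function names, the project `myproject` and the selector `app=myapp` are made up for illustration, the zone is assumed to be recorded in the node label `failure-domain.beta.kubernetes.io/zone` (newer clusters may use `topology.kubernetes.io/zone` instead), and the exact output of `oc adm policy who-can` may differ between releases.

```shell
#!/bin/sh
# Illustrative sketch only: helper names, project and labels are assumptions.

# Count the lines mentioning the broad "system:authenticated" groups among
# the subjects allowed to create project requests. Prints 0 if there are none.
count_self_provisioners() {
  oc adm policy who-can create projectrequests | grep -c 'system:authenticated'
}

# Print the number of distinct zones hosting the pods that match a selector.
# $1 = project, $2 = label selector
count_zones() {
  for node in $(oc get pods -n "$1" -l "$2" \
      -o jsonpath='{.items[*].spec.nodeName}'); do
    # Assumed zone label; jsonpath needs the dots in the key escaped.
    oc get node "$node" \
      -o jsonpath='{.metadata.labels.failure-domain\.beta\.kubernetes\.io/zone}'
    echo
  done | sort -u | grep -c .
}

# shUnit2 picks up every function whose name starts with "test".
test_project_requests_restricted() {
  assertEquals "authenticated users can still request projects" \
    0 "$(count_self_provisioners)"
}

test_pods_spread_across_zones() {
  assertEquals "pods are not spread across three zones" \
    3 "$(count_zones myproject app=myapp)"
}

# A real test script would end by sourcing the framework:
#   . shunit2
```

Keeping the `oc` calls in small helper functions means the counting logic can be checked against canned output without touching a live cluster.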
How much work will creating tests like these involve? Thankfully, very little. If we are unsure what to test, a quick glance at our operational guidelines or architecture documentation will help us get started. Writing tests will come naturally to anyone familiar with OpenShift, and should take no more than five minutes in most cases. Kubernetes gives us all the tools we need to implement our test runner.
The CronJob object triggers nightly test runs. The payload is a lightweight single-container pod that bundles Kate Ward’s unit test framework shUnit2, the oc client, and assorted tools (curl, psql, mysql, jq, awk). All test scripts are taken from a ConfigMap mounted at launch. The ConfigMap in turn is generated from a folder of test scripts in Git. We will return to the scripts themselves shortly.
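The moving parts described above could be wired together roughly as follows. This is a sketch under assumptions, not the actual setup: the object name `openshift-unit`, the image `example.com/openshift-unit:latest` and the 02:00 schedule are placeholders, `batch/v1beta1` is the CronJob API version of the Kubernetes 1.8 era (current clusters use `batch/v1`), and newer clients expect `--dry-run=client` rather than the bare `--dry-run` flag.

```shell
#!/bin/sh
# Illustrative sketch only: object names, image and schedule are placeholders.

# Regenerate the ConfigMap from a folder of test scripts (e.g. a Git checkout).
# $1 = folder containing the test scripts
sync_test_scripts() {
  oc create configmap openshift-unit --from-file="$1" --dry-run -o yaml \
    | oc apply -f -
}

# Print a CronJob manifest that mounts the ConfigMap at /etc/openshift-unit.d.
# $1 = cron schedule, e.g. "0 2 * * *" for a nightly run at 02:00
render_cronjob() {
  schedule=$1
  cat <<EOF
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: openshift-unit
spec:
  schedule: "${schedule}"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: openshift-unit
            image: example.com/openshift-unit:latest
            volumeMounts:
            - name: tests
              mountPath: /etc/openshift-unit.d
          volumes:
          - name: tests
            configMap:
              name: openshift-unit
EOF
}
```

With a checkout of the test scripts in `./tests`, `sync_test_scripts ./tests` followed by `render_cronjob "0 2 * * *" | oc apply -f -` would create or update both objects. The pod would also need a service account with enough read access to query cluster state, which this sketch leaves out.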
For now the CronJob object waits for the appointed hour, then triggers a test run. shUnit2 processes the test suite (consisting of all test scripts in /etc/openshift-unit.d) and then reports results. Due to a limitation of the CronJob API prior to Kubernetes 1.8, the pod reports success (exit code zero) even when tests fail: returning an error would lead to constant redeployments and considerable load on the cluster.
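One way to get this "always report success" behaviour is a small wrapper around the test scripts. The sketch below is an assumption about how such an entrypoint might look, not the actual runner: the function name `run_suite` is invented, and it presumes that each script sources shUnit2 itself and therefore exits non-zero when one of its tests fails.

```shell
#!/bin/sh
# Illustrative sketch only: run_suite is a hypothetical entrypoint helper.

# Run every test script in a folder, report failures on stdout, and always
# return zero so that the job is not treated as failed and restarted.
# $1 = folder containing the test scripts
run_suite() {
  dir=$1
  failed=0
  for script in "$dir"/*; do
    # Skip anything that is not a regular file (or a symlink to one).
    [ -f "$script" ] || continue
    if ! sh "$script"; then
      failed=$((failed + 1))
      echo "FAILED: $script"
    fi
  done
  echo "suite finished, $failed failing script(s)"
  # Deliberately hide the failure count from the exit status.
  return 0
}
```

The container's entrypoint would then call `run_suite /etc/openshift-unit.d`. Because the exit status no longer carries any information, the results have to be read from the pod's log output.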