In the second half I show how you can test your Logstash configuration. First, however, I want to show why automatic tests for configuration files are important. Feel free to skip this part if you already know this.
Configuration is source code and should be tested
Writing automatic tests for source code is one of the core activities of a software developer. One reason is that such automatic tests allow the developer to express the expected semantics of the code under test. Another reason is that such tests make sure that future code changes do not break existing behavior. While there is some discussion about whether tests should be written before or after the code, it is generally accepted that source code should be tested in an automatic way.
The above statement is true for cases which are obviously source code. There are, however, other cases. Besides source code there are pure data files like /etc/passwd, which a human can judge "correct" or "incorrect" just by looking at them. But there are also files which are processed by a program, and for these it is harder for a human to decide whether the file content in question exhibits the desired semantics. One large group of such processed files is configuration files. Their expressive power of course varies between simple (like a crontab) and complex, which may include variables and control flow (e.g. sendmail, emacs, puppet, mule). In general the expressive power of the config file depends on the configuration demands of the application: more numerous and more complex configuration options yield a more complex config file.
Should configuration files be tested in an automatic way? Yes, for the same reasons as other source code: automatic tests verify that the first version has the expected behavior and that later changes do not break any of this behavior.
Are configuration files tested as often as "classical" source code? The answer is sadly: no, not as frequently. One reason is that changes occur less frequently. While one can imagine that "classical" code changes all the time, the situation of configuration files may be perceived differently: the config only has to be checked when there are new requirements (rarely) or after an update (application or OS), which also does not happen that often. Because such changes are infrequent, a manual approach looks plausible, as it also avoids the setup costs of automatic tests. However, such statements about the frequency are losing their validity these days:
- Because of horizontal scaling (scale out), multiple instances with a very similar configuration are required. To meet this goal, provisioning tools like Puppet and Ansible have been created. With these tools, however, it is easy to roll out changes, and therefore the change frequency usually increases.
- For common tasks the problem domain is better understood these days. Therefore COTS and open source software can be used and customized using configuration. This leads to larger and more complex configurations. In such cases software engineering procedures like refactoring become important. But the application of such procedures is dangerous without tests in general and slow and costly if such changes must be tested manually.
There is Logstash and Logstash has a configuration file (or directory). How can these be tested? An internet search shows that Logstash supports RSpec. One can embed the config in the RSpec file (two projects) or load the config files during test execution time (one project). It is also possible to run the test inside docker (another two projects and a third one which uses Go in addition).
Not being a Ruby person, I have some problems with RSpec, especially with the escaping. In addition, quite a few statements are required to construct a test case. There is also the matter of dependencies: RSpec requires rake, some of the solutions mentioned above need Docker, and one even needs Go.
What properties does an ideal solution have?
- the format of the test cases is easy to understand and hard to get wrong
- the test execution is not complex to avoid problems caused by the test runner
- the test execution is very similar to the real non-test execution of Logstash
- no extra dependencies (e.g. there is no native support for docker on Mac)
I have sketched out a variant which IMHO comes close to such an ideal solution. Key points are:
- the Logstash config is adjusted in a minimal and controlled way
- Logstash is run like in production
- test case inputs and expected outputs are JSON files of events that you drop into a directory
- the test runner compares the complete output with the expected output (after formatting both)
- it depends only on Logstash, bash and a vanilla Python installation
- clone the repo
- run the demo:
- change the Logstash config in
- add new test cases in
- rerun the test:
cd test; ./run_tests.sh
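The comparison step mentioned in the key points (format both outputs, then compare them completely) can be illustrated with a minimal Python sketch. This is not the code from the repository. The function names `normalize` and `diff_events` are hypothetical, and the sketch assumes that Logstash writes one JSON event per line.

```python
import difflib
import json


def normalize(line):
    # Parse one JSON event and re-serialize it with sorted keys and fixed
    # indentation, so key order and whitespace do not affect the comparison.
    return json.dumps(json.loads(line), sort_keys=True, indent=2)


def diff_events(actual_lines, expected_lines):
    # Assumption: each non-empty line holds exactly one JSON event.
    actual = [normalize(l) for l in actual_lines if l.strip()]
    expected = [normalize(l) for l in expected_lines if l.strip()]
    # Compare the complete formatted output line by line.
    # An empty result means the test case passes.
    return list(difflib.unified_diff(
        "\n".join(expected).splitlines(),
        "\n".join(actual).splitlines(),
        "expected", "actual", lineterm=""))
```

Because both sides are formatted the same way before the diff, a failure points at the fields that actually differ instead of at incidental differences in key order.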
There are two points worth mentioning:
- Where are the input JSON files coming from? One approach is to disable any filter configuration in Logstash, restart Logstash and copy the JSON from Kibana. This is advised for the first contact with a new log source. Afterwards it is possible to create new test cases using copy, paste and modify.
- The code in the repository is not a final framework or application but more like a sketch. Depending on how you distribute your Logstash configuration and deploy the Logstash server(s), more adjustments are needed. There are too many ways of doing deployment today to try to cover them all. There are at least:
- create a package of the software and install it on the target server
- use a custom script to copy the config files to the target and restart Logstash
- use tools like puppet or chef to perform the copy and restart tasks
- create a virtual container (like a docker image) and deploy it on a host
In addition there are multiple ways to trigger the deployment:
- something like source to image from OpenShift
- a CI server
In this regard some manual adjustments are required to fit your specific case.
Test Test Test
Configuration files should be tested to make sure that the behavior is the expected one and that changes to the application or configuration do not break that behavior. Also, as shown above, the Logstash configuration can be tested easily. In other words: there is nothing preventing you from just doing it.