
Measuring your OpenStack Cloud with Gnocchi and Ceph storage backend

15.7.2018 | 4 minutes of reading time

To solve our performance problems with Gnocchi and the whole OpenStack telemetry stack, we tried Gnocchi with Ceph as backend, starting with OpenStack-Ansible Newton. The experience wasn’t good: sooner or later we saw slow requests and stuck PGs in our Ceph cluster. In one case, only deleting the Gnocchi pool saved the cluster.

As a result, we switched back to MongoDB as the storage backend for Ceilometer. It was not performing well, but at least it did not put our whole storage cluster at risk.

This left us with our performance problems, but then we stumbled upon several performance tests for Gnocchi, one of them done by Julien Danjou, the main developer of Gnocchi. They got us thinking about what went wrong with our setup.

So with OpenStack-Ansible Pike and a new cloud, we gave Gnocchi another try. After our experience with Gnocchi and Ceph, we didn’t want to take the published performance tests at face value. And as every setup is a bit different, we set up a simple performance test of our own. We started 700 VMs over time and then got a cup of coffee. OK, more than one cup. After some days we experienced the same problems with Ceph we already knew: more and more slow requests.

As we use OpenStack-Ansible for our cloud with a three-controller setup, we deployed Gnocchi on each of our controllers. The default parameters of OpenStack-Ansible use the file driver as the storage backend and MySQL as the coordination backend. We changed the storage backend to Ceph and kept the rest of the default settings.

gnocchi_storage_driver: ceph
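Beyond the driver itself, the os_gnocchi role also lets you pin the Ceph pool and client Gnocchi should use. A minimal sketch, assuming the role exposes `gnocchi_storage_ceph_pool` and `gnocchi_storage_ceph_username` (check the role defaults of your OpenStack-Ansible version, the variable names may differ):

```yaml
# user_variables.yml (sketch; variable names and values are assumptions)
gnocchi_storage_driver: ceph
gnocchi_storage_ceph_pool: metrics      # dedicated pool for Gnocchi aggregates
gnocchi_storage_ceph_username: gnocchi  # Ceph client; needs rwx on that pool
```

Using a dedicated pool also makes it possible to drop only Gnocchi's data in an emergency, as we had to do once, without touching other pools.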

The MySQL backend is not a recommended coordination backend according to tooz (https://docs.openstack.org/tooz/latest/user/drivers.html), so we used Zookeeper instead. As OpenStack-Ansible cannot include a role for everything, we had to integrate the Zookeeper role (https://github.com/openstack/ansible-role-zookeeper.git) into our setup ourselves:

conf.d:
zookeeper_hosts:
{% for server in groups['control_nodes'] %}
  {{ server }}:
    ip: {{ hostvars[server]['ansible_default_ipv4']['address'] }}
{% endfor %}
7
env.d:
component_skel:
  zookeeper_server:
    belongs_to:
      - zookeeper_all

container_skel:
  zookeeper_container:
    belongs_to:
      - infra_containers
      - shared-infra_containers
    contains:
      - zookeeper_server
    properties:
      service_name: zookeeper

Now we could set up Zookeeper as coordination backend for Gnocchi:

gnocchi_coordination_url: "zookeeper://{% for host in groups['zookeeper_all'] %}{{ hostvars[host]['container_address'] }}:2181{% if not loop.last %},{% endif %}{% endfor %}"

gnocchi_pip_packages:
  - cryptography
  - redis
  - gnocchiclient
# this is what we want:
#   - "gnocchi[mysql,ceph,ceph_alternative_lib,redis]"
# but as there is no librados >= 12.2 pip package, we first have to install Gnocchi without the alternative lib support.
# After adding the Ceph repo to the Gnocchi container, python-rados >= 12.2.0 is installed and linked automatically,
# and Gnocchi will pick up the features present in the rados lib in use.
  - "gnocchi[mysql,ceph,redis]"
  - keystonemiddleware
  - python-memcached
# additional pip packages needed for the Zookeeper coordination backend
  - tooz
  - lz4
  - kazoo
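To make the Jinja expression above less abstract, here is a small Python sketch of what `gnocchi_coordination_url` renders to; the three container addresses are hypothetical values, not taken from our setup:

```python
# Sketch of the rendered gnocchi_coordination_url, assuming three
# Zookeeper containers with these (hypothetical) addresses.
hosts = ["172.29.236.11", "172.29.236.12", "172.29.236.13"]

# Mirrors the Jinja loop: one "host:2181" entry per container, comma-separated.
url = "zookeeper://" + ",".join(f"{host}:2181" for host in hosts)
print(url)
# zookeeper://172.29.236.11:2181,172.29.236.12:2181,172.29.236.13:2181
```

Listing all three nodes lets the tooz/kazoo client fail over to another Zookeeper server if one goes down.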

A word of caution: the name of the Ceph alternative lib implementation (ceph_alternative_lib) varies between Gnocchi versions.

With a proper coordination backend in place, the work is distributed across all metric processors (gnocchi-metricd) on all controllers.

But that didn’t solve our problem either. The bottleneck still seemed to be our Ceph cluster. Searching the web, we found a lot of bug tickets showing that other people experienced the same problem, and those tickets put us on the right track: newer versions of Gnocchi can split the storage of your data. You can use one storage driver for incoming short-lived data and another for long-term storage.
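On a deployed node, this split ends up in gnocchi.conf roughly as follows. This is a sketch with hypothetical pool, user, and address values, not a verbatim copy of our config:

```ini
[storage]
# long-term aggregate storage stays on Ceph
driver = ceph
ceph_pool = metrics
ceph_username = gnocchi

[incoming]
# short-lived incoming measures go to Redis instead of Ceph
driver = redis
redis_url = redis://172.29.239.10:26379?sentinel=master01
```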

The next step was to set up the storage layer for our incoming data. We chose Redis, as recommended, from the list of supported backends. To set up the Redis cluster, we chose this Ansible role. Next, we had to configure Gnocchi with OpenStack-Ansible to use the Redis cluster as incoming storage:

gnocchi_conf_overrides:
  incoming:
    driver: redis
    redis_url: redis://{{ hostvars[groups['redis-master'][0]]['ansible_default_ipv4']['address'] }}:{{ hostvars[groups['redis-master'][0]]['redis_sentinel_port'] }}?sentinel=master01{% for host in groups['redis-slave'] %}&sentinel_fallback={{ hostvars[host]['ansible_default_ipv4']['address'] }}:{{ hostvars[host]['redis_sentinel_port'] }}{% endfor %}

gnocchi_distro_packages:
  - apache2
  - apache2-utils
  - libapache2-mod-wsgi
  - git
  - build-essential
  - python-dev
  - libpq-dev
  - python-rados
# additional package for the Python Redis client
  - python-redis
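As with the coordination URL, a small Python sketch shows what the `redis_url` template above renders to; the Sentinel master and fallback addresses here are hypothetical values:

```python
# Sketch of the rendered redis_url, assuming one Sentinel master and two
# fallbacks at these (hypothetical) addresses, all on Sentinel port 26379.
master = ("172.29.239.10", 26379)
slaves = [("172.29.239.11", 26379), ("172.29.239.12", 26379)]

# Mirrors the Jinja template: master first, then one
# sentinel_fallback parameter per remaining Sentinel node.
url = (
    f"redis://{master[0]}:{master[1]}?sentinel=master01"
    + "".join(f"&sentinel_fallback={host}:{port}" for host, port in slaves)
)
print(url)
```

The `sentinel=master01` parameter names the monitored master group, and the fallback entries let the client locate a new master after a Sentinel failover.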

We ran our performance test again and, eureka, no more slow requests in Ceph. Our performance test comprised 700 VMs with one vCPU and one GB of RAM each. We weren’t interested in the VMs themselves, only in the telemetry data they generate. We assume it will take some time for our cloud to grow beyond 700 VMs. In the meantime, our cluster may evolve (for example, our Ceph currently has only SSD journals, no SSD storage), Gnocchi will evolve, and our knowledge about Gnocchi and Ceph will grow. So we expect our current setup to cope with the upcoming load, and it gives us enough time to experiment with more hints from this talk to aim for 10,000 VMs. We hope this article helps other people integrate Gnocchi into their OpenStack setup.
