A Zabbix active agent2 template for monitoring containers and container logs

The problem

Zabbix (up to v7.2) doesn’t seem to be able to monitor the Docker engine and containers in active mode, nor can it collect container logs. That is surprising, since it collects a wide range of container, image and runtime-engine metrics in passive mode. The solution is a custom agent2 template (I’ll refer to agent2 simply as “the agent” from here on) that patches the Docker functionality to work in active mode.

A necessary disclaimer: I’m simplifying statements for the sake of brevity. E.g., when I write that something can’t be done, it only means that I couldn’t get it done, not that it’s objectively impossible.

There are several levels of complexity:

  • The agent template that ships with Zabbix implements Docker metrics collection only in passive mode.
  • I want to use active mode rather than passive mode, because my monitored hosts are behind firewalls and can’t accept incoming connections. This is common when monitoring servers that aren’t in your own data centre, e.g. servers belonging to customers.
  • Although the agent uses the Docker engine API to collect metrics, it doesn’t do so for logs; you have to find out where the Docker engine persists logs on the file system and read them like regular log files.
  • Log files must be collected with the logrt[] item key, which works only in active mode.
  • In summary: getting Docker metrics works only in passive mode and getting Docker logs works only in active mode, but we can’t do both at the same time with the default template.
  • No solution I am aware of has been documented publicly.

The solution

I’m not publishing a ready-to-use template because Zabbix keeps evolving and old templates don’t seem to work with newer installations. Instead, I’m documenting the process I followed to get Docker log collection and monitoring working in active mode. The plan is to export the default agent template, patch it, and re-import it, overwriting the old one.

1. Export the agent2 Docker template

2. Edit it with a YAML editor

3. Find the items that need to change from passive (the default, so the type isn’t written in the template) to active:

How do you find these keys? The thorough but laborious way is to write down all items and draw a dependency tree. Items of type: DEPENDENT reference a parent (master) item. Once you have done that, you’ll find items that don’t have parents. These are the root items, and they need to change to type: ZABBIX_ACTIVE. In 7.2, these item keys are: docker.containers, docker.data_usage, docker.images, docker.info, docker.ping, docker.containers.discovery[false], docker.container_info["{#NAME}",full], docker.container_stats["{#NAME}"], docker.images.discovery. Note that docker.containers.discovery[false] and docker.images.discovery are discovery rules and the two {#NAME} keys are item prototypes, so look under discovery_rules (and their item_prototypes) as well as under items.

These changes allow Docker metrics collection in active mode.
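
If you’d rather not draw the dependency tree by hand, a short script can do the walk. The sketch below is my own illustration, not something that ships with Zabbix. It assumes the export has already been parsed into Python dicts (e.g. with PyYAML’s yaml.safe_load), that templates sit under zabbix_export -> templates as in a YAML export, and that an entity without a type field is passive. It treats every item, discovery rule or item prototype that has no master_item and is passive as a root and switches it to ZABBIX_ACTIVE. The function names and the stand-in keys in the demo are hypothetical.

```python
from typing import Any, Iterator

PASSIVE = "ZABBIX_PASSIVE"  # default type; omitted from YAML exports
ACTIVE = "ZABBIX_ACTIVE"


def iter_entities(template: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield every item, discovery rule and item prototype of one template."""
    yield from template.get("items", [])
    for rule in template.get("discovery_rules", []):
        yield rule
        yield from rule.get("item_prototypes", [])


def is_root(entity: dict[str, Any]) -> bool:
    """A root has no master item and is (explicitly or implicitly) passive."""
    return "master_item" not in entity and entity.get("type", PASSIVE) == PASSIVE


def make_roots_active(export: dict[str, Any]) -> list[str]:
    """Switch every root entity to ZABBIX_ACTIVE in place; return the changed keys."""
    changed = []
    for template in export["zabbix_export"]["templates"]:
        for entity in iter_entities(template):
            if is_root(entity):
                entity["type"] = ACTIVE
                changed.append(entity["key"])
    return changed


# Tiny stand-in for a parsed export (real exports carry many more fields).
export = {"zabbix_export": {"templates": [{
    "items": [
        {"key": "docker.info"},
        {"key": "example.dependent", "type": "DEPENDENT",
         "master_item": {"key": "docker.info"}},
    ],
    "discovery_rules": [{
        "key": "docker.images.discovery",
        "item_prototypes": [
            {"key": "example.prototype", "type": "DEPENDENT",
             "master_item": {"key": "example.master"}},
        ],
    }],
}]}}

changed = make_roots_active(export)
print(changed)
```

To use it on a real export, load the file with yaml.safe_load, call make_roots_active() and write the result back with yaml.safe_dump(export, f, sort_keys=False, allow_unicode=True). Compare the returned keys with the list above before importing, since a round trip through PyYAML drops the original formatting.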

4. Add this to the item_prototypes list of the container discovery rule (the one with key docker.containers.discovery[false]) under discovery_rules:

- uuid: f853858a7a0a4bd287749a0b9fc0136f
  name: 'Container {#NAME} logs'
  key: 'logrt["/var/lib/docker/containers/{#ID}/{#ID}-json.log"]'
  value_type: LOG
  description: 'Container logs'
  type: ZABBIX_ACTIVE
  tags:
    - tag: component
      value: container-logs
    - tag: container
      value: '{#NAME}'

This entry adds, to every discovered container, a new item that streams the container’s logs. It assumes Docker’s default json-file logging driver, which writes each container’s log to /var/lib/docker/containers/<container ID>/<container ID>-json.log. Some background: the tricky part, which took me forever, was figuring out that {#ID} is the container ID. AFAIK, that’s not documented anywhere. I had to exec into an agent container and inspect the JSON with:

zabbix_agent2 -t "docker.container_info[/somecontainername]"

which spits out, among other things, the ID:

[s|{"Id":"abcdefg12345678","Created":12345678,"Path":"somecontainername",…]
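
The mapping from that ID to the log file is mechanical, which is why the item prototype can be built from {#ID} alone. The sketch below pulls "Id" out of the zabbix_agent2 -t output and builds the matching logrt[] key. It assumes the output has the [s|…] shape shown above, that the JSON contains an "Id" field, and that Docker uses its default data root /var/lib/docker with the json-file driver. The helper names and the sample string are made up for illustration.

```python
import json

# Assumption: default Docker data root and the default json-file logging driver.
CONTAINERS_DIR = "/var/lib/docker/containers"


def container_id(test_output: str) -> str:
    """Extract "Id" from `zabbix_agent2 -t` output shaped like '<key>  [s|{...}]'."""
    start = test_output.index("[s|") + len("[s|")
    end = test_output.rindex("]")  # closing bracket of the [s|...] wrapper
    return json.loads(test_output[start:end])["Id"]


def logrt_key(cid: str) -> str:
    """Build the logrt[] key for a container's json-file log."""
    return f'logrt["{CONTAINERS_DIR}/{cid}/{cid}-json.log"]'


# Shortened, made-up sample of the agent's test output.
sample = ('docker.container_info[/web]  '
          '[s|{"Id":"abcdefg12345678","Created":"2024-05-01T10:00:00Z","Name":"/web"}]')
cid = container_id(sample.strip())
print(logrt_key(cid))
```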

A surprising side effect of getting the {#ID} wrong (e.g., depending on where you’re googling, it comes up as {#CONTAINERID} or {#FCONTAINERID}) is that the item silently fails to register on the host during discovery, without throwing any errors.
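
Because a wrong macro fails silently, it can pay off to check the prototypes before importing. This sketch scans the key and name of each item prototype in a discovery rule for LLD macros and reports any that aren’t on an allow-list. The allow-list ({#ID} and {#NAME}) is an assumption based on the macros used in this post, not an authoritative list of what docker.containers.discovery emits; the function name is hypothetical.

```python
import re
from typing import Any

# Assumption: the LLD macros this post relies on; extend if your version emits more.
KNOWN_MACROS = {"{#ID}", "{#NAME}"}
LLD_MACRO = re.compile(r"\{#[A-Z0-9_.]+\}")


def unknown_macros(rule: dict[str, Any], known: set[str] = KNOWN_MACROS) -> dict[str, set[str]]:
    """Map each prototype key to the LLD macros in its key/name that aren't in `known`."""
    problems: dict[str, set[str]] = {}
    for proto in rule.get("item_prototypes", []):
        text = proto.get("key", "") + " " + proto.get("name", "")
        unknown = set(LLD_MACRO.findall(text)) - known
        if unknown:
            problems[proto["key"]] = unknown
    return problems


rule = {
    "key": "docker.containers.discovery[false]",
    "item_prototypes": [
        {"name": "Container {#NAME} logs",
         "key": 'logrt["/var/lib/docker/containers/{#ID}/{#ID}-json.log"]'},
        {"name": "Container {#NAME} logs (typo)",
         "key": 'logrt["/var/lib/docker/containers/{#CONTAINERID}/{#CONTAINERID}-json.log"]'},
    ],
}
print(unknown_macros(rule))
```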

5. Save the template

6. Import it into Zabbix and watch out for errors.

7. Make /var/lib/docker/containers accessible to the agent and restart it.

You might have to wait a bit for the Docker logs item to show up on containers, because discovery runs periodically rather than instantly.
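
If the log items appear but no data arrives, permissions are a likely culprit, since the agent typically runs as an unprivileged user rather than root. The sketch below lists container directories or json-file logs that the current user can’t read; the idea is to run it as the user the agent runs as. It assumes the default /var/lib/docker/containers layout with <id>/<id>-json.log; the function name is hypothetical, and the demo runs against a throwaway directory instead of the real one.

```python
import os
import tempfile


def unreadable_logs(containers_dir: str = "/var/lib/docker/containers") -> list[str]:
    """List container dirs or <id>-json.log files the current user can't read.

    Raises PermissionError/FileNotFoundError if containers_dir itself isn't listable.
    """
    problems = []
    for entry in os.scandir(containers_dir):
        if not entry.is_dir(follow_symlinks=False):
            continue
        # Reading files inside a directory also needs execute (search) permission on it.
        if not os.access(entry.path, os.R_OK | os.X_OK):
            problems.append(entry.path)
            continue
        log = os.path.join(entry.path, f"{entry.name}-json.log")
        if os.path.isfile(log) and not os.access(log, os.R_OK):
            problems.append(log)
    return problems


# Demo on a throwaway directory laid out like /var/lib/docker/containers.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "abc123"))
    open(os.path.join(tmp, "abc123", "abc123-json.log"), "w").close()
    demo_result = unreadable_logs(tmp)
print(demo_result)
```

On a real host, calling unreadable_logs() with no arguments as the agent’s user should return an empty list once the directory is accessible.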

Epilogue

Things I tried and failed with:

  • Declaring the item under the top-level “items” key: we need one item per container, but a top-level template item is created once per host, bypassing discovery.
  • Using the container’s {#NAME} instead of {#ID}: logrt[] (and any other Zabbix mechanism I’m aware of) doesn’t collect the logs over the Docker API or a container console; it reads the log files written by the Docker engine. The path of those files is built from the container ID, not the container name, so we need the ID.
  • Faking a passive agent through a proxy: logrt[] is available only with the active agent, and faking it with a proxy (active agent talking to a passive proxy) didn’t cut it either. This one has a certain Fermat’s Last Theorem vibe to it; let me know if you’d like a follow-up post about chaining an agent with a proxy.

Things to watch out for:

  • Running Zabbix agent2 as a container requires mounting /var/lib/docker/containers into the agent container at the same path (the logrt[] key refers to that path); otherwise it can’t read the logs.
  • You’ll probably have to tune agent timeouts and buffer sizes, especially for large log volumes.
  • If you’re only after specific events, e.g. errors, consider passing a regular expression to logrt[] so that only matching lines are collected.
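
Before putting a regular expression into the item key, you can try it against a few log lines. Each line of a json-file log is a JSON object of the form {"log":"…\n","stream":"stderr","time":"…"}, and logrt[] matches its regexp parameter against each raw line, so a pattern can target either the message text or the stream field. The sketch below uses Python’s re module, which is close to but not identical with the PCRE-style syntax Zabbix uses, so treat a match here as a hint rather than proof. A key such as logrt["/var/lib/docker/containers/{#ID}/{#ID}-json.log","ERROR"] would be the corresponding item; the helper names and sample lines are illustrative.

```python
import json
import re
from typing import Iterable


def matching_lines(lines: Iterable[str], pattern: str) -> list[str]:
    """Return the raw log lines that the regular expression matches anywhere."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]


def message(line: str) -> str:
    """Decode one json-file log line and return its message without the trailing newline."""
    return json.loads(line)["log"].rstrip("\n")


sample = [
    '{"log":"GET /health 200\\n","stream":"stdout","time":"2024-05-01T10:00:00.000000000Z"}',
    '{"log":"ERROR: database timeout\\n","stream":"stderr","time":"2024-05-01T10:00:01.000000000Z"}',
]

# Match on message content, or on the stream field of the JSON wrapper.
for pattern in ("ERROR", r'"stream":"stderr"'):
    for line in matching_lines(sample, pattern):
        print(pattern, "->", message(line))
```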
