May 05 2019

Projects Monitoring a Synology with Graphite and Tessera

Table of Contents

    Overview

    With an always-on home server storing and serving access to all my media, of course I'm going to set up to store and graph operational time-series metrics on it.

    Docker Install

    Synology's DSM operating system provides an easy one-click install option for setting up Docker itself, through their Package Manager. It also happens to include one of the nicest UIs I've seen for Docker, if you prefer to manage things through a UI. I find it occasionally helpful, but I generally ssh into my synology and use the command line (zsh forever!).

    What to run?

    For dashboards, Grafana is probably the most well-rounded open-source dashboard system for time series metrics around, but for this I'm going to run my own open-source dashboard system, Tessera. I recently started updating it again, and I have more long term plans for enhancing both the front-end and the server, so it makes sense to use it for my day-to-day.

    For the time series database, Tessera currently only supports Graphite, so that's what I'm using.

    For collecting metrics, it's hard to beat telegraf as an all around reliable Swiss Army knife of collection agents, with support for many sources and many storage servcices.

    Setting up the Containers

    You can set everything up manually with the docker command line one container at a time, or capture a multi-container setup in a declarative form using an orcestration utility like docker-compose. I initially set up the containers with the CLI to debug them (over & over), then captured the entire configuration in a docker-compose file.

    Host Storage

    Containers are inherently transient - everything persistent should be stored either in a docker volume, or on the filesystem of the host, mapped into the containers. I chose to put all my monitoring data at /volume1/monitoring on the Synology host. This is a share I've set up so I can easily change the config files remotely without ssh'ing into the box.

    Network & Volumes

    First define a volume for Graphite storage, and a named network for the monitoring containers to share.

    version: '3'
    
    networks:
      monitoring:
        driver: bridge
    
    volumes:
      graphite-storage:
    

    Services

    Graphite

    This is pretty straightforward - use the volume declared above for storage, and map all the Graphite ports into a specified range. I've set it to use 8100 for the primary Graphite web app port, and mapped all the other open ports into the 81xx range as well.

      graphite:
        container_name: graphite
        image: graphiteapp/graphite-statsd
        restart: always
        ports:
          - "8100:80"
          - "8103-8104:2003-2004"
          - "8123-8124:2023-2024"
          - "8125:8125/udp"
          - "8126:8126"
        volumes:
          - graphite-storage:/opt/graphite/storage
        networks:
          - monitoring
    

    Telegraf

    Telegraf requires that quite a few things be mapped from the host into the container.

    • Mapping the docker.sock socket into the container allows telegraf to monitor Docker, and record stats on overall Docker usage and per-container metrics.
    • Synology exposes lots of data via SNMP, so we're also mapping the Synology MIB files in for telegraf to read
    • Mapping various host filesystem paths in for telegraf to monitor. Telegraf will get CPU, memory, disk, etc... stats from the mapped /proc filesystem
    • Finally, map the config file, stored on the docker host
    Custom Metrics

    I built a custom Docker container to run for telegraf, which adds a simple shell script to run via the [[inputs.exec]] input to gather some custom metrics.

    FROM telegraf
    
    COPY metrics.sh /metrics.sh
    

    See below for what's in the custom metrics, and here's the docker-compose.yml section for telegraf:

      telegraf:
        container_name: telegraf
        image: telegraf-syno
        restart: always
        volumes:
           - /var/run/docker.sock:/var/run/docker.sock
           - /usr/share/snmp/mibs:/usr/share/snmp/mibs
           - /etc/snmp/snmpd.conf:/etc/snmp/snmpd.conf
           - /:/hostfs:ro
           - /etc:/hostfs/etc:ro
           - /proc:/hostfs/proc:ro
           - /sys:/hostfs/sys:ro
           - /var/run/utmp:/var/run/utmp:ro
           - ${MONITORING_ROOT}/conf/telegraf/telegraf.conf:/etc/telegraf/telegraf.conf
        environment:
          HOST_ETC: /hostfs/etc
          HOST_PROC: /hostfs/proc
          HOST_SYS: /hostfs/sys
          HOST_MOUNT_PREFIX: /hostfs
        networks:
          - monitoring
    

    Tessera

    Tessera's UI fetches data directly from Graphite in your browser, so it has to be pointed to the exposed graphite port on the Synology's hostname. A future container repackaging tessera will handle proxying that data to eliminate that, and more long-term my plans for Tessera include handling the metrics queries on the backend instead of in the browser.

    Because it's under heavy development, I tend to run Tessera locally instead of on the Synology.

      tessera:
        container_name: tessera
        image: tesserametrics/tessera-simple
        restart: always
        ports:
          - "8400:5000"
        environment:
          GRAPHITE_URL: "http://riptalon.local:8100"
        networks:
          - monitoring
    

    Telegraf Config

    Telegraf has a very extensive set of inputs, which are all self-documenting through a generated .conf file, so I'm not going to go over most of them here, just the Synology-specific parts.

    SNMP

    Setting up an SNMP config for all the points to be pulled is pretty tedious, so I mostly copied it from this reddit thread. You will need to enable SNMP on your Synology, of course.

    Media Indexing

    When you copy photos and videos to the Synology, it will index them for the Photo Station or Moments apps to display, which can take quite a while with a lot of data (it took about 5 days to index my photo collection). I wanted to monitor that progress, which can be tracked by some files in /var/spool on the Synology. Those are mapped into the telegraf container as /hostfs/var/spool.

    #!/bin/sh
    
    input_photo=/hostfs/var/spool/conv_progress_photo
    input_video=/hostfs/var/spool/conv_progress_video
    input_index_queue=/hostfs/var/spool/syno_indexing_queue
    
    total=$(cat "$input_photo"|grep 'total='|cut -f2 -d"=")
    total_thumb=$(cat "$input_photo"|grep 'total_thumb='|cut -f2 -d"=")
    completed=$(cat "$input_photo"|grep 'completed='|cut -f2 -d"=")
    completed_thumb=$(cat "$input_photo"|grep 'completed_thumb='|cut -f2 -d"=")
    
    time=$(date +%s)
    
    echo "synoindexer.photo.completed $completed $time"
    echo "synoindexer.photo.completed_thumb $completed_thumb $time"
    echo "synoindexer.photo.total $total $time"
    echo "synoindexer.photo.total_thumb $total_thumb $time"
    
    total=$(cat "$input_video"|grep 'total='|cut -f2 -d"=")
    completed=$(cat "$input_video"|grep 'completed='|cut -f2 -d"=")
    
    echo "synoindexer.video.completed $completed $time"
    echo "synoindexer.video.total $total $time"
    

    This grabs stats on how many photos and videos have been indexed and generated thumbnails for, and how many are remaining. It only reports non-zero data while indexing is in progress. I'm simply outputing to stdout in Graphite line format, which is handled by this telefraf input config:

    [[inputs.exec]]
      commands = ["/metrics.sh"]
      templates = ["measurement.measurement.field"]
      data_format = "graphite"
    

    The metrics.sh file is simply built into a custom Docker image derived from the official telegraf image.