Building a DNS Incident Timer

February 4, 2026 tech, projects

raspberry-pidockerprometheuspythondns

If you’ve worked in tech long enough, you’ve experienced it: the dreaded DNS failure. Whether it’s a misconfigured record, an expired zone transfer, or the classic “it’s always DNS” moment, these incidents have a way of humbling even the most seasoned engineers.

At some point, I thought: what if we could track these moments in a fun, tangible way?

Enter the DNS Incident Timer—a physical “days since last incident” counter that sits on a shelf, counting the seconds, minutes, hours, and days since someone last broke DNS. And when it inevitably gets reset? It plays an audio clip. Because sometimes you just need a little fanfare when things go sideways.

The Hardware

The setup is pretty straightforward:

Raspberry Pi (any model with GPIO support)
64x32 RGB LED Matrix with an Adafruit HAT
Physical button for the satisfying reset experience
Speaker connected to the 3.5mm jack

The LED matrix displays two lines: “DAYS SINCE DNS” in white, and the elapsed time in red (years, months, days, hours, minutes, seconds). It’s gloriously over-the-top for what’s essentially a timer.

The Software Stack

Here’s where things get interesting. I wanted this to be:

Persistent — survives reboots and remembers the last reset time
Accessible — viewable and resettable from a web browser
Observable — because what’s an ops project without Prometheus metrics?
Containerized — because I’m not a monster

The core is a Python application that handles:

GPIO button input with proper debouncing
RGB matrix rendering at 1Hz
A Flask web server for remote access
State persistence via JSON file
Audio playback via ALSA

def format_duration(self, duration: timedelta) -> Tuple[str, str]:
    """Format duration into two display lines."""
    total_seconds = int(duration.total_seconds())

    years = duration.days // 365
    months = (duration.days % 365) // 30
    days = (duration.days % 365) % 30
    hours = total_seconds // 3600 % 24
    minutes = total_seconds % 3600 // 60
    seconds = total_seconds % 60

    line1 = f"{years:02d}y {months:02d}mo {days:02d}d"
    line2 = f"{hours:02d}h {minutes:02d}m {seconds:02d}s"
    return (line1, line2)

The Docker Journey

Running Python directly on a Pi is fine, but I wanted something more reproducible. Docker on a Raspberry Pi, however, comes with its own set of… let’s call them “learning opportunities.”

The biggest challenge was audio. Getting ALSA to work inside a Docker container on a Pi requires:

Mounting /dev/snd into the container
Setting up udev rules on the host for proper device permissions
A shell wrapper that waits for audio devices before starting Python

That last one was particularly fun to debug. Turns out, when a container starts, the audio devices might not be fully enumerated by the time Python’s ALSA library tries to cache them. The fix was an entrypoint script:

#!/bin/bash
# Wait for audio devices to be fully available
MAX_ATTEMPTS=60
ATTEMPT=0
echo "Waiting for audio devices..."
while [ $ATTEMPT -lt $MAX_ATTEMPTS ]; do
    if aplay -l 2>&1 | grep -q "Headphones"; then
        echo "Audio devices ready after $((ATTEMPT + 1)) seconds"
        break
    fi
    ATTEMPT=$((ATTEMPT + 1))
    sleep 1
done
exec python dns_counter.py "$@"

The Web Interface

The web UI is minimal but functional—a dark theme with a red accent (fitting for incidents), showing the same elapsed time as the physical display. Click the reset button, and it:

Resets the timer
Plays audio through your browser speakers
Plays audio through the Pi’s speakers
Records a Prometheus metric

Speaking of metrics…

Prometheus Integration

Because why build something if you can’t graph it?

RESET_COUNTER = Counter(
    'dnsfail_resets_total',
    'Total number of timer resets',
    ['source']  # 'button' or 'web'
)

AUDIO_PLAYBACK_ERRORS = Counter(
    'dnsfail_audio_errors_total',
    'Total number of audio playback errors'
)

Now I can track how often we’re resetting (and whether it’s via the physical button or the web UI), plus monitor for audio failures. The /metrics endpoint exposes everything in standard Prometheus format.

Lessons Learned

A few things I picked up along the way:

Docker audio on Pi is tricky — device permissions and timing matter more than you’d expect
gpiod v1 vs v2 APIs are different — if you’re writing GPIO code, detect the version and handle both
Browser autoplay policies are strict — audio triggered by a button click works, but preloading helps

What’s Next?

Right now it’s sitting on my shelf, quietly counting away. A few ideas for future iterations:

Integration with actual alerting systems to auto-reset on incidents
Historical tracking of incident frequency
Maybe a leaderboard? (“Congratulations, February was our worst month!”)

Wrapping Up

Is this a practical project? Arguably not. Is it satisfying to press that button and hear the audio play while watching the counter reset to zeros? Absolutely.

Sometimes the best projects are the ones that make you smile while solving a problem that didn’t really need solving. If you want to build your own incident timer, the code’s all on GitHub.

Cheers!