GoAccess is a great log parsing tool


A year ago I wrote about analyzing Planet Golang’s traffic with Julia. It was cool as an experiment, but I wanted something that I could run quicker, everywhere, and without depending on a single log format. I found GoAccess.

Most of my websites are static and hosted on AWS S3, served via CloudFront. Naturally, I wanted something that could parse CloudFront logs. I also run a home server (wow, is that post due for an update) which serves a bunch of things via nginx.

GoAccess can cope with both log formats, and more. In fact, it can handle any format, as long as you give it a format descriptor. Here it is, live-parsing logs from nginx:

Would you look at that, most of the chatter is going to my Mastodon instance. Yes, I self-host, which is a story in and of itself, but maybe for a different day.
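To make "descriptor" a bit more concrete: GoAccess builds its parser from a format string made of specifiers such as %h (client host), %d and %t (date and time), %r (request line), %s (status), and %^ (skip this field). The sketch below writes out the usual combined format by hand instead of using the predefined COMBINED name. It assumes nginx is logging in its default combined format; the function name parse_nginx_log is made up for illustration.

```shell
# Hand-written equivalent of GoAccess's predefined COMBINED format.
# Assumption: nginx writes its default "combined" access log format.
LOG_FORMAT='%h %^[%d:%t %^] "%r" %s %b "%R" "%u"'
DATE_FORMAT='%d/%b/%Y'   # e.g. 10/Oct/2023
TIME_FORMAT='%H:%M:%S'   # e.g. 13:55:36

# Hypothetical helper: open the given access log in the terminal dashboard.
parse_nginx_log() {
    goaccess "$1" \
        --log-format="$LOG_FORMAT" \
        --date-format="$DATE_FORMAT" \
        --time-format="$TIME_FORMAT"
}
```

With that in place, something like parse_nginx_log /var/log/nginx/access.log should bring up the dashboard, assuming the log lives at nginx's usual default path. For a differently shaped log, only the three format strings would need to change.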

This instance of GoAccess runs on the server itself, so I can just pop in over SSH and take a look at the logs. I do it slightly differently for CloudFront.

I wrote this Makefile:

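.PHONY: sync stats

# Mirror the CloudFront logs from S3 into ./logs.
sync:
	aws s3 sync --profile pjw "s3://paweljw.al-logs/" logs/

# Decompress every log, drop lines matching any pattern in exclude.txt,
# and build the HTML report.
# gzcat is the macOS/BSD name; on most Linux systems use zcat or gzip -dc.
stats:
	find logs -name "*.gz" | \
	    xargs gzcat | \
	    grep --invert-match --file=exclude.txt | \
	    goaccess \
	        --log-format CLOUDFRONT \
	        --date-format CLOUDFRONT \
	        --time-format CLOUDFRONT \
	        --ignore-crawlers \
	        --ignore-status=301 \
	        --ignore-status=302 \
	        --output index.html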

The sync command pulls the CloudFront logs from S3 to a local directory. There’s surprisingly little of it: some 290 MB gzipped. S3 download speeds are not amazing, but sync, unlike cp, only fetches files that are new or changed, so only the initial import is painful.

After that, I feed all of those .gz files into GoAccess, selecting the predefined CLOUDFRONT log, date, and time formats, dropping redirects and crawlers, and writing a nice HTML dashboard to index.html. Said dashboard looks like this for this website:

It’s jam-packed with info, and I can refresh it via make sync && make stats. It’ll take maybe a minute, so I can satisfy my curiosity on a whim… instead of booting up Jupyter and messing around with Julia or Python.
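The filtering step in the stats target can be tried on its own, without GoAccess or real logs. The sketch below is a minimal stand-in under a few assumptions: the two log lines are made up and heavily simplified (they are not real CloudFront entries), the /healthcheck pattern is only an example of what exclude.txt might contain, and gzip -dc is used in place of gzcat so it runs on Linux as well as macOS.

```shell
# Scratch directory with one gzipped "log" and an exclude list.
workdir=$(mktemp -d)
mkdir -p "$workdir/logs"

# Two made-up, simplified log lines (not the real CloudFront layout).
printf '%s\n' 'GET /posts/goaccess/ 200' 'GET /healthcheck 200' \
    | gzip > "$workdir/logs/sample.gz"

# One pattern per line; any log line matching a pattern is dropped.
printf '%s\n' '/healthcheck' > "$workdir/exclude.txt"

# Same shape as the Makefile pipeline, minus the final goaccess step.
# -v and -f are the short forms of --invert-match and --file.
kept=$(find "$workdir/logs" -name '*.gz' -print0 \
    | xargs -0 gzip -dc \
    | grep -v -f "$workdir/exclude.txt")

printf '%s\n' "$kept"
rm -rf "$workdir"
```

Only the /posts/goaccess/ line survives the filter. One thing to watch for: a blank line in exclude.txt acts as an empty pattern that matches every line, so the inverted match would then drop everything.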

GoAccess made it into my go-to toolbox, and I wholeheartedly suggest you give it a whirl if you work with any logs at all.

Built with ❤ by Paweł J. Wal in 2023. Hugo helped.

Blog contents, except where otherwise noted, are CC BY-SA 4.0. Code of this blog is MIT.
