A year ago I wrote about analyzing Planet Golang’s traffic with Julia. It was cool as an experiment, but I wanted something I could run faster, anywhere, and without depending on a single log format. Then I found GoAccess.
Most of my websites are static and hosted on AWS S3, served via CloudFront. Naturally, I wanted something that could parse CloudFront logs. I also run a home server (wow, is that post due for an update), which serves a bunch of things via nginx.
GoAccess can cope with both log formats, and more. In fact, it can cope with any format, as long as you give it a format descriptor. Here it is, live-parsing logs from nginx:

Would you look at that: most of the chatter is going to my Mastodon instance. Yes, I self-host, which is a story in and of itself, but maybe for a different day.
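For the nginx side, a format descriptor is just a few strings passed on the command line. The sketch below spells out nginx's default `combined` format by hand, which is what GoAccess's predefined `COMBINED` shortcut already covers; the log path is an assumption (a common distro default), and a customized nginx `log_format` would need a matching string.

```shell
# Sketch only: /var/log/nginx/access.log is an assumed path.
# %h = client host, %d/%t = date/time, %r = request line, %s = status,
# %b = response size, %R = referer, %u = user agent, %^ = skip a field.
goaccess /var/log/nginx/access.log \
  --log-format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u"' \
  --date-format='%d/%b/%Y' \
  --time-format='%T'
```

Run from an SSH session on the server, this should open the interactive terminal dashboard over that log.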
This instance of GoAccess is installed on the server itself, so I can just pop in over SSH and take a look at the logs. For CloudFront, I do it slightly differently.
I wrote this Makefile:
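```makefile
.PHONY: sync stats

# Mirror the CloudFront log bucket into ./logs; only new files are downloaded.
sync:
	aws s3 sync --profile pjw "s3://paweljw.al-logs/" logs/

# Decompress every log, drop lines matching exclude.txt, and render an HTML report.
# (gzcat ships with macOS; on most Linux systems use zcat or gzip -dc instead.)
stats:
	find logs -name "*.gz" | \
		xargs gzcat | \
		grep --invert-match --file=exclude.txt | \
		goaccess \
			--log-format CLOUDFRONT \
			--date-format CLOUDFRONT \
			--time-format CLOUDFRONT \
			--ignore-crawlers \
			--ignore-status=301 \
			--ignore-status=302 \
			--output index.html
```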
The `sync` command pulls the CloudFront logs from S3 into a local directory. There’s surprisingly little of it: some 290MB gzipped. S3 download speeds are not amazing, but since `sync` (unlike `cp`) only copies files that are new or have changed, only the initial import is painful.
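After that, I decompress all of those `.gz` files, filter out any line matching a pattern in `exclude.txt`, and feed the rest into GoAccess. It uses the `CLOUDFRONT` log, date, and time formats, drops redirects (301s and 302s) and crawlers, and writes a nice HTML dashboard to `index.html`. Said dashboard looks like this for this website: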

It’s jam-packed with info, and I can refresh it with `make sync && make stats`. It takes maybe a minute, so I can satisfy my curiosity on a whim… instead of booting up Jupyter and messing around with Julia or Python.
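The `grep --invert-match --file=exclude.txt` stage of the `stats` target is easy to try on its own. Below is a minimal sketch, assuming a hypothetical `filter_logs` helper and made-up patterns (a documentation-range IP and a probe path); the real contents of `exclude.txt` aren't shown here, and the sample lines are simplified stand-ins rather than real tab-separated CloudFront records.

```shell
#!/usr/bin/env bash
set -euo pipefail

# filter_logs EXCLUDE_FILE (hypothetical helper)
# Reads log lines on stdin and prints only those matching none of the
# patterns listed, one per line, in EXCLUDE_FILE. This mirrors the
# Makefile's `grep --invert-match --file=exclude.txt` step.
filter_logs() {
  grep --invert-match --file="$1"
}

# Throwaway exclude file with two illustrative patterns.
exclude_file=$(mktemp)
trap 'rm -f "$exclude_file"' EXIT
printf '%s\n' '203.0.113.7' '/wp-login.php' > "$exclude_file"

# Simplified log lines; only the first matches neither pattern.
printf '%s\n' \
  'GET /index.html 198.51.100.1' \
  'GET /index.html 203.0.113.7' \
  'GET /wp-login.php 198.51.100.2' |
  filter_logs "$exclude_file"
# prints: GET /index.html 198.51.100.1
```

Keep in mind that grep treats each line of the exclude file as a basic regular expression, so the dots in an IP address match any character; adding `--fixed-strings` makes the patterns literal.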
GoAccess made it into my go-to toolbox, and I wholeheartedly suggest you give it a whirl if you work with any logs at all.