A year ago I wrote about analyzing Planet Golang’s traffic with Julia. It was cool as an experiment, but I wanted something that I could run quicker, everywhere, and without depending on a single log format. I found GoAccess.
Most of my websites are static and hosted on AWS S3, served via CloudFront. Naturally, I wanted something that could parse CloudFront logs. I also run a home server (wow, is that post due for an update) which serves a bunch of things via nginx.
GoAccess can cope with both log formats, and more. In fact it can cope with any format, as long as you can give it a descriptor for it. Here it is, live-parsing logs from nginx:
Would you look at that, most chatter is going into my Mastodon instance. Yes, I self-host, which is a story in and of itself - but maybe for a different day.
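For the curious, here is roughly what such a descriptor could look like for nginx. This is a minimal sketch, not my actual setup: the file name goaccess-nginx.conf is made up, and the format strings assume nginx is writing its default "combined" access log layout.

```shell
# Hypothetical config file name; the format strings below assume nginx's
# default "combined" access log layout.
cat > goaccess-nginx.conf <<'EOF'
log-format %h %^[%d:%t %^] "%r" %s %b "%R" "%u"
date-format %d/%b/%Y
time-format %H:%M:%S
EOF
```

Something like goaccess /var/log/nginx/access.log --config-file=goaccess-nginx.conf should then pick it up (the log path is an assumption too). For the stock layout, the predefined COMBINED format does the same job with less typing; writing the descriptor by hand only pays off once your log format is customized.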
This instance of goaccess is installed on the server itself, so I can just pop in over SSH and take a look at the logs. I do it slightly differently for CloudFront.
I wrote this Makefile:
sync:
	aws s3 sync --profile pjw "s3://paweljw.al-logs/" logs/

stats:
	find logs -name "*.gz" | \
		xargs gzcat | \
		grep --invert-match --file=exclude.txt | \
		goaccess \
			--log-format CLOUDFRONT \
			--date-format CLOUDFRONT \
			--time-format CLOUDFRONT \
			--ignore-crawlers \
			--ignore-status=301 \
			--ignore-status=302 \
			--output index.html
The sync target pulls the CloudFront logs from S3 to a local directory. There’s surprisingly little of it - some 290MB gzipped. S3 download speeds are not amazing, but since aws s3 sync only fetches files that are new or changed, unlike cp, only the initial import is painful.
After that, I feed all of those .gz files into GoAccess, specifying the CLOUDFRONT log, date and time formats, dropping redirects and crawlers, and outputting a nice HTML dashboard to index.html. Said dashboard looks like this for this website:
It’s jam-packed with info, and I can refresh it via make sync && make stats. It’ll take maybe a minute, so I can satisfy my curiosity on a whim… instead of booting up Jupyter and messing around with Julia or Python.
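One step in that pipeline deserves a closer look: the grep that drops unwanted lines before GoAccess ever sees them. Here is a small sketch of how it behaves. The patterns and the sample lines are made up for illustration (my real exclude.txt is not shown here), and the demo uses its own file names so it won’t clobber anything.

```shell
# Hypothetical exclusion patterns, one per line; the real exclude.txt
# is not shown in this post.
cat > exclude-demo.txt <<'EOF'
/healthcheck
^#
EOF

# Three made-up lines standing in for decompressed log output.
printf '%s\n' \
  '#Version: 1.0' \
  '2024-01-01 GET /healthcheck 200' \
  '2024-01-01 GET /posts/goaccess 200' > sample-demo.log

# Keep only the lines that match none of the patterns in the file.
grep --invert-match --file=exclude-demo.txt sample-demo.log > kept-demo.log
cat kept-demo.log
```

Only the /posts/goaccess line survives. One thing to watch for: a blank line in the pattern file is an empty pattern, which matches every line, so with --invert-match it would throw away the entire log.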
GoAccess made it into my go-to toolbox, and I wholeheartedly suggest you give it a whirl if you work with any logs at all.