Parsing Apache access logs with Python

April 8th, 2007

I operate an Apache web server. Occasionally I want to see what’s happening on that server: which documents are being viewed, which links folks are following to this site, and so on. Beyond these basics, I also want to know when my web server is being probed by hackers for vulnerabilities that they might exploit. I haven’t found any free or commercial log analyzer that does exactly what I want.

So I decided to write my own, in Python of course, because that’s been my hack-tools-quickly language of choice for a while now. One of the basic things any Apache log analyzer needs is a bit of code that can parse the Apache access log format. After a superficial bit of Googling, I didn’t see any libraries with a clean enough interface.

Given that this sort of thing isn’t rocket science, I spent a few minutes blasting something out. The resulting module is called apachelogs. I tried to make the API for apachelogs as simple and straightforward as possible. For instance, the following code is all that is required to open a log file, count the number of 40x responses therein, and print the result.

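import apachelogs

if __name__ == '__main__':
    # Open the log file; iterating over it yields one parsed entry per line.
    alf = apachelogs.ApacheLogFile('data/access.log.1')
    num_40xs = 0
    for log_line in alf:
        # The response code is compared as a string, so '40' matches 400-409.
        if log_line.http_response_code.startswith('40'):
            num_40xs += 1
    # The parenthesized form works as a statement in Python 2 and a call in Python 3.
    print("Saw %d 40x responses." % num_40xs)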
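
If you don’t have the module handy, here is a rough, standalone sketch of the kind of parsing involved. It is not the apachelogs implementation. It assumes the log uses Apache’s common or combined log format, and the names LOG_PATTERN, parse_line, count_40x and SAMPLE are made up for this illustration. The sample lines are invented, too.

```python
import re

# Assumed layout (common/combined log format):
#   host ident user [time] "request" status size "referer" "user-agent"
# The trailing referer and user-agent fields are optional so that plain
# common-format lines match as well. Escaped quotes inside a quoted field
# are not handled by this simple pattern.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)")?'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def count_40x(lines):
    """Count the lines whose status code starts with '40'."""
    count = 0
    for line in lines:
        fields = parse_line(line)
        if fields and fields['status'].startswith('40'):
            count += 1
    return count


# Invented sample lines, for illustration only.
SAMPLE = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" '
    '200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"',
    '127.0.0.1 - - [10/Oct/2000:13:56:01 -0700] "GET /missing.html HTTP/1.0" '
    '404 209 "-" "Mozilla/4.08"',
    '127.0.0.1 - - [10/Oct/2000:13:56:05 -0700] "GET /admin HTTP/1.0" 403 199',
]

if __name__ == '__main__':
    print("Saw %d 40x responses." % count_40x(SAMPLE))
```

Lines that don’t match the pattern are skipped rather than raising an error, which seems like the friendlier default when a log contains the odd malformed entry.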

