Download AWS S3 Logs with Python & boto
I've written a Python script that automates downloading Amazon S3 logs so they can be processed with AWStats.
I've started moving the static content for some of my sites to Amazon Web Services, using S3 and CloudFront for delivery. I've enabled logging for my CloudFront distributions as well as my public S3 buckets, and I wanted cron to download the logs to my server automatically for processing with AWStats.
To make this happen, I've written a script in Python with the boto module that downloads all generated log files to a local folder and then deletes them from the S3 bucket when done. The log files in the local folder can then be processed further with logresolvemerge and AWStats.
You need the boto module installed for this to work. I'm working on Ubuntu 10.04, where boto can be installed by running:
sudo apt-get install python-boto
The script takes a number of command-line arguments, which are listed in the doc header. Each of them can be given a default value at the top of the get_logs class. If you set default values in the script, the command-line arguments are still useful whenever you need to override a default for a single run.
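A minimal sketch of that override pattern, assuming a plain dictionary of defaults instead of class attributes. The names `parse_overrides` and `DEFAULTS` and the `/tmp/aws-logs/` path are made up for illustration and are not part of the script; only `getopt.getopt` is the real standard library call the script also uses.

```python
import getopt

# Hypothetical defaults, standing in for the class attributes in the script.
DEFAULTS = {"bucket": "eqxlogs", "prefix": "logs/", "local": "/tmp/aws-logs/"}


def parse_overrides(argv, defaults=DEFAULTS):
    """Return a copy of the defaults with any command-line overrides applied."""
    settings = dict(defaults)  # copy, so the defaults themselves stay untouched
    opts, _args = getopt.getopt(argv, "b:p:l:", ["bucket=", "prefix=", "local="])
    for opt, arg in opts:
        if opt in ("-b", "--bucket"):
            settings["bucket"] = arg
        elif opt in ("-p", "--prefix"):
            settings["prefix"] = arg
        elif opt in ("-l", "--local"):
            settings["local"] = arg
    return settings
```

Calling `parse_overrides(["--prefix=logs/cdn.example.com/"])` changes only the prefix and leaves the other settings at their defaults, which is the behaviour I wanted for occasional one-off runs.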
get-aws-logs.py
#!/usr/bin/env python
"""Download and delete log files for AWS S3 / CloudFront

Usage: python get-aws-logs.py [options]

Options:
  -b ..., --bucket=...    AWS Bucket
  -p ..., --prefix=...    AWS Key Prefix
  -a ..., --access=...    AWS Access Key ID
  -s ..., --secret=...    AWS Secret Access Key
  -l ..., --local=...     Local Download Path
  -h, --help              Show this help
  -d                      Show debugging information while parsing

Examples:
  get-aws-logs.py -b eqxlogs
  get-aws-logs.py --bucket=eqxlogs
  get-aws-logs.py -p logs/cdn.example.com/
  get-aws-logs.py --prefix=logs/cdn.example.com/

This program requires the boto module for Python to be installed.
"""

__author__ = "Johan Steen (https://blog.bitbebop.com/)"
__version__ = "0.5.0"
__date__ = "28 Nov 2010"

import getopt
import os
import sys

import boto

_debug = 0


class get_logs:
    """Download log files from the specified bucket and path and then
    delete them from the bucket.

    Uses: http://boto.s3.amazonaws.com/index.html
    """
    # Set default values
    AWS_BUCKET_NAME = '{bucket}'
    AWS_KEY_PREFIX = '{prefix}'
    AWS_ACCESS_KEY_ID = '{access key}'
    AWS_SECRET_ACCESS_KEY = '{secret key}'
    LOCAL_PATH = '{local path}'

    # Don't change below here
    s3_conn = None
    bucket_list = None

    def __init__(self):
        # Assign to the instance; plain names here would only be locals
        self.s3_conn = None
        self.bucket_list = None

    def start(self):
        """Connect, get file list, copy and delete the logs"""
        self.s3Connect()
        self.getList()
        self.copyFiles()

    def s3Connect(self):
        """Creates a S3 Connection Object"""
        self.s3_conn = boto.connect_s3(self.AWS_ACCESS_KEY_ID,
                                       self.AWS_SECRET_ACCESS_KEY)

    def getList(self):
        """Connects to the bucket and then gets a list of all keys
        available with the chosen prefix"""
        bucket = self.s3_conn.get_bucket(self.AWS_BUCKET_NAME)
        self.bucket_list = bucket.list(self.AWS_KEY_PREFIX)

    def copyFiles(self):
        """Creates the local folder if it doesn't already exist, then
        downloads all keys and deletes them from the bucket"""
        # Using makedirs as it's recursive
        if not os.path.exists(self.LOCAL_PATH):
            os.makedirs(self.LOCAL_PATH)
        for key in self.bucket_list:
            # Get the log filename (the last part of the key name)
            filename = str(key.key).split('/')[-1]
            if not filename:
                # Key name ends with '/': a folder placeholder, not a log
                continue
            # os.path.join works with or without a trailing slash on LOCAL_PATH
            local_file = os.path.join(self.LOCAL_PATH, filename)
            # Check if the file exists locally, if not: download it
            if not os.path.exists(local_file):
                key.get_contents_to_filename(local_file)
                if _debug:
                    print("Downloaded from bucket: " + filename)
            # Check that the file is downloaded, if so: delete from bucket
            if os.path.exists(local_file):
                key.delete()
                if _debug:
                    print("Deleted from bucket: " + filename)


def usage():
    print(__doc__)


def main(argv):
    global _debug
    try:
        opts, args = getopt.getopt(
            argv, "hb:p:l:a:s:d",
            ["help", "bucket=", "prefix=", "local=", "access=", "secret="])
    except getopt.GetoptError:
        usage()
        sys.exit(2)
    logs = get_logs()
    for opt, arg in opts:
        if opt in ("-h", "--help"):
            usage()
            sys.exit()
        elif opt == '-d':
            _debug = 1
        elif opt in ("-b", "--bucket"):
            logs.AWS_BUCKET_NAME = arg
        elif opt in ("-p", "--prefix"):
            logs.AWS_KEY_PREFIX = arg
        elif opt in ("-a", "--access"):
            logs.AWS_ACCESS_KEY_ID = arg
        elif opt in ("-s", "--secret"):
            logs.AWS_SECRET_ACCESS_KEY = arg
        elif opt in ("-l", "--local"):
            logs.LOCAL_PATH = arg
    logs.start()


if __name__ == "__main__":
    main(sys.argv[1:])
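The download-then-delete step can be tried out without touching a real bucket. The following is only a sketch under a few assumptions: `FakeKey` is a hypothetical stand-in that mimics the three members of a boto key the script relies on (`key`, `get_contents_to_filename()` and `delete()`), and `fetch_and_delete` is an illustrative function name, not something from boto. As a precaution, the sketch also skips keys whose name ends in a slash, since those have no filename to download.

```python
import os
import tempfile


class FakeKey(object):
    """Hypothetical stand-in for a boto key, for local experiments only."""

    def __init__(self, name, data):
        self.key = name
        self.data = data
        self.deleted = False

    def get_contents_to_filename(self, filename):
        with open(filename, "wb") as handle:
            handle.write(self.data)

    def delete(self):
        self.deleted = True


def fetch_and_delete(keys, local_path):
    """Download each key unless it already exists locally, then delete it."""
    if not os.path.exists(local_path):
        os.makedirs(local_path)
    downloaded = []
    for key in keys:
        filename = str(key.key).split("/")[-1]
        if not filename:
            continue  # key name ends in "/", nothing to download
        target = os.path.join(local_path, filename)
        if not os.path.exists(target):
            key.get_contents_to_filename(target)
            downloaded.append(filename)
        # Only delete from the "bucket" once the local copy is there
        if os.path.exists(target):
            key.delete()
    return downloaded


# Try it against a throwaway folder with one log file and one placeholder key
demo_dir = os.path.join(tempfile.mkdtemp(), "logs")
demo_keys = [FakeKey("logs/cdn.example.com/a.gz", b"one"),
             FakeKey("logs/cdn.example.com/", b"")]
demo_result = fetch_and_delete(demo_keys, demo_dir)
```

Because the delete only happens after the local file is confirmed to exist, a failed download leaves the log in the bucket for the next run. It does not verify that the local copy is complete, so treat it as a starting point rather than a finished safeguard.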
Keep in mind that I'm pretty new to both Linux and Python, so I bet parts of this could be solved better, more simply, or more elegantly than what I've done, and made more fail-safe.
Feel free to suggest improvements that can be made to the code.