Modules
Amazon maintains its own Python module, Boto3, for communicating with the S3 API.
Install it with pip:
# pip3 install boto3
This is a powerful tool, able to perform any kind of request you need against your buckets, and it is the recommended starting point for using S3 with Python.
Note that if you only need to perform simple PUT or GET requests, you can get just as far with a more general tool like requests against specific object paths.
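For example, fetching a publicly readable object could be as simple as this sketch (the bucket and object names are placeholders; a private object would require a presigned URL or signed headers instead):

import requests

# Plain GET against a public object path; this only works if the
# object is publicly readable
url = "https://s3-oslo.educloud.no/<bucket name>/<object name>"
response = requests.get(url)
response.raise_for_status()
data = response.content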
Configuring access
The simplest way to gain access is to set up access key pairs in ~/.aws/credentials as explained in the starter guide; Boto3 will automatically pick up the configured profiles from this file.
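A credentials file with a default profile and one named profile might look like this (all key values are placeholders):

[default]
aws_access_key_id = <access key>
aws_secret_access_key = <secret key>

[myproject]
aws_access_key_id = <access key>
aws_secret_access_key = <secret key>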
Then you can create a Python object which sets up an authorized connection to your bucket like this:
import boto3

# Mutable parameters
endpoint_url = "https://s3-oslo.educloud.no"
profile_name = "<profile name>"
bucket_name = "<bucket name>"

# Set up a connection with the correct profile
session = boto3.Session(profile_name=profile_name)
s3 = session.resource('s3', endpoint_url=endpoint_url)
bucket = s3.Bucket(bucket_name)
If you only have a [default] profile configured, you may exclude profile_name altogether in boto3.Session().
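In that case the session line simply reduces to:

session = boto3.Session()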
The resulting bucket object can then be used to run queries against the specified bucket.
High-level vs. low-level interactions
The Boto3 module has two different interfaces for communicating with a bucket:
Client, the "low-level" interface, maps closely to the underlying API and can be used to perform practically any action you'll need.
Resource, on the other hand, is a more abstracted (and modern) interface with a considerably more user-friendly syntax for high-level interactions like listing, deleting, uploading, and downloading.
Usually you can get away with using Resource only, and it is what we'll use in most of the following examples.
If you want or need to use Client, it is configured similarly to what was shown previously:
session = boto3.Session(profile_name=profile)
s3 = session.client('s3', endpoint_url=endpoint)
You can read more about the differences in the Boto3 documentation.
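To illustrate the difference, here is a sketch of the low-level equivalent of the Resource listing shown later (the bucket name is a placeholder, and without pagination only the first 1000 keys are returned):

# Low-level listing with Client
response = s3.list_objects_v2(Bucket='<bucket name>')
for obj in response.get('Contents', []):
    print(obj['Key'])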
Example queries
Uploading files
At its simplest:
bucket.upload_file(FILE_PATH, OBJECT_NAME)
As a function where the object name defaults to the file name if not provided, and an error is logged if the upload fails:
import logging
import os

from botocore.exceptions import ClientError

def upload_file(bucket, file_name, object_name=None):
    # If no object name is provided, use the file name
    if object_name is None:
        object_name = os.path.basename(file_name)
    # Attempt to upload the file
    try:
        bucket.upload_file(file_name, object_name)
    except ClientError as e:
        logging.error(e)
        return False
    return True
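Used with the bucket object from the setup above (the file path is just an example):

if upload_file(bucket, 'results/output.csv'):
    print('Upload succeeded')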
For more advanced options, consult AWS' guide.
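For instance, upload_file accepts an ExtraArgs dictionary for per-object settings; a small sketch setting the content type:

bucket.upload_file(FILE_PATH, OBJECT_NAME,
                   ExtraArgs={'ContentType': 'text/csv'})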
Listing contents
To list all object names in a bucket:
bucket = s3.Bucket('1003-green-markusor-test')

for obj in bucket.objects.all():
    print(obj.key)

Output
-------
bar.txt
baz.md
foo.txt
mappe/
mappe/abc.txt
mappe/def.md
mappe/ghj.json
With filtering on a prefix (e.g. a directory name):
for obj in bucket.objects.filter(Prefix='testdir/'):
    print(obj.key)

Output
-------
testdir/
testdir/abc.txt
testdir/def.md
testdir/ghj.json
You can also iterate with Pythonic conditions, e.g. here where we only want to see Markdown (.md) files:
for obj in bucket.objects.all():
    if obj.key.endswith('.md'):
        print(obj.key)

Output
-------
baz.md
mappe/def.md
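The objects yielded by these iterators also carry metadata such as size (in bytes) and last_modified, so conditions need not be limited to the key; a small sketch:

# Only show objects larger than 100 MB
for obj in bucket.objects.all():
    if obj.size > 100 * 1024 * 1024:
        print(obj.key, obj.size)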
Downloading files
At its simplest (note the reversed order: object name first, then file path):
bucket.download_file(OBJECT_NAME, FILE_PATH)
Alternatively, you can stream the binary contents of an object directly into a new file using Client:
s3 = session.client('s3', endpoint_url=endpoint)

# Unlike the Resource methods, Client requires the bucket name explicitly
with open(FILE_PATH, 'wb') as f:
    s3.download_fileobj(BUCKET_NAME, OBJECT_NAME, f)
The latter method will start a multi-threaded multipart download if required, and can therefore be more efficient for retrieving large objects (several GB).
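If you need to tune this behaviour, the transfer methods accept a Config parameter; a sketch with assumed threshold values:

from boto3.s3.transfer import TransferConfig

# Start multipart transfers for objects above 64 MB,
# using up to 10 concurrent threads
config = TransferConfig(multipart_threshold=64 * 1024 * 1024,
                        max_concurrency=10)

with open(FILE_PATH, 'wb') as f:
    s3.download_fileobj(BUCKET_NAME, OBJECT_NAME, f, Config=config)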