Modules
Amazon maintains its own Python module, Boto3, for communicating with the S3 API.
Install it with pip:
# pip3 install boto3
This is a powerful tool, able to perform any kind of request you need against your buckets, and it is the recommended starting point for using S3 with Python.
Note that if you only need to perform simple PUT or GET requests, you can get just as far with a more general tool like requests against specific object paths.
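For example, fetching a publicly readable object could be as simple as this sketch (the bucket and object names are placeholders; a private object would require a presigned URL or signed headers instead):

import requests

# Plain GET against a public object path; this only works if the
# object is publicly readable
url = "https://s3-oslo.educloud.no/<bucket name>/<object name>"
response = requests.get(url)
response.raise_for_status()
data = response.content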
Configuring access
The simplest way to gain access is to set up access key pairs in ~/.aws/credentials as explained in the starter guide; Boto3 will automatically pick up the configured profiles from this file.
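A credentials file with a default profile and one named profile might look like this (all key values are placeholders):

[default]
aws_access_key_id = <access key>
aws_secret_access_key = <secret key>

[myproject]
aws_access_key_id = <access key>
aws_secret_access_key = <secret key>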
Then you can create a Python object which sets up an authorized connection to your bucket like this:
import boto3

# Mutable parameters
endpoint_url = "https://s3-oslo.educloud.no"
profile_name = "<profile name>"
bucket_name = "<bucket name>"

# Set up a connection with the correct profile
session = boto3.Session(profile_name=profile_name)
s3 = session.resource('s3', endpoint_url=endpoint_url)
bucket = s3.Bucket(bucket_name)
If you only have a [default] profile configured, you may exclude profile_name altogether in boto3.Session().
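In that case the session line simply reduces to:

session = boto3.Session()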
The resulting bucket object can then be used to run queries against the specified bucket.
High-level vs. low-level interactions
The Boto3 module has two different interfaces for communicating with a bucket:
Client, the "low-level" interface, maps closely to the underlying API and can be used to perform practically any action you'll need.
Resource, on the other hand, is a more abstracted (and modern) interface with a considerably more user-friendly syntax for high-level interactions like listing, deleting, uploading, and downloading.
Usually you can get away with using Resource only, and it is what we'll use in most of the following examples.
If you want or need to use Client, it is configured similarly to what was shown previously:
session = boto3.Session(profile_name=profile)
s3 = session.client('s3', endpoint_url=endpoint)
You can read more about the differences in the Boto3 documentation.
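To illustrate the difference, here is a sketch of the low-level equivalent of the Resource listing shown later (the bucket name is a placeholder, and without pagination only the first 1000 keys are returned):

# Low-level listing with Client
response = s3.list_objects_v2(Bucket='<bucket name>')
for obj in response.get('Contents', []):
    print(obj['Key'])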
Example queries
Uploading files
At its simplest:
bucket.upload_file(FILE_PATH, OBJECT_NAME)
As a function where the object name defaults to the file name if not provided, and an error is logged if the upload fails:
import logging
import os

from botocore.exceptions import ClientError

def upload_file(bucket, file_name, object_name=None):
    # If no object name is provided, use the file name
    if object_name is None:
        object_name = os.path.basename(file_name)
    # Attempt to upload the file
    try:
        bucket.upload_file(file_name, object_name)
    except ClientError as e:
        logging.error(e)
        return False
    return True
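Used with the bucket object from the setup above (the file path is just an example):

if upload_file(bucket, 'results/output.csv'):
    print('Upload succeeded')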
For more advanced options, consult AWS' guide.
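For instance, upload_file accepts an ExtraArgs dictionary for per-object settings; a small sketch setting the content type:

bucket.upload_file(FILE_PATH, OBJECT_NAME,
                   ExtraArgs={'ContentType': 'text/csv'})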
Listing contents
To list all object names in a bucket:
bucket = s3.Bucket('1003-green-markusor-test')

for obj in bucket.objects.all():
    print(obj.key)

Output
-------
bar.txt
baz.md
foo.txt
mappe/
mappe/abc.txt
mappe/def.md
mappe/ghj.json
With filtering on a prefix (e.g. a directory name):
for obj in bucket.objects.filter(Prefix='testdir/'):
    print(obj.key)

Output
-------
testdir/
testdir/abc.txt
testdir/def.md
testdir/ghj.json
You can also iterate with Pythonic conditions, e.g. here where we only want to see Markdown (.md) files:
for obj in bucket.objects.all():
    if obj.key.endswith('.md'):
        print(obj.key)

Output
-------
baz.md
mappe/def.md
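The objects yielded by these iterators also carry metadata such as size (in bytes) and last_modified, so conditions need not be limited to the key; a small sketch:

# Only show objects larger than 100 MB
for obj in bucket.objects.all():
    if obj.size > 100 * 1024 * 1024:
        print(obj.key, obj.size)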
Downloading files
At its simplest (note the reversed order: object name first, then file path):
bucket.download_file(OBJECT_NAME, FILE_PATH)
Alternatively, you can stream the binary contents of an object directly into a new file using Client:
s3 = session.client('s3', endpoint_url=endpoint)

# Unlike the Resource methods, Client requires the bucket name explicitly
with open(FILE_PATH, 'wb') as f:
    s3.download_fileobj(BUCKET_NAME, OBJECT_NAME, f)
The latter method will start a multi-threaded multipart download if required, and can therefore be more efficient for retrieving large objects (several GB).
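If you need to tune this behaviour, the transfer methods accept a Config parameter; a sketch with assumed threshold values:

from boto3.s3.transfer import TransferConfig

# Start multipart transfers for objects above 64 MB,
# using up to 10 concurrent threads
config = TransferConfig(multipart_threshold=64 * 1024 * 1024,
                        max_concurrency=10)

with open(FILE_PATH, 'wb') as f:
    s3.download_fileobj(BUCKET_NAME, OBJECT_NAME, f, Config=config)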