In this guest post Giuseppe shares what he learned from having to clean up a large number of objects in an S3 bucket. He introduces us to some boto3, as well as the moto and freezegun libraries he used to test his code. Enter Giuseppe ...
Delete S3 objects
This is a bit of code I wrote for a much bigger script used to monitor and clean up objects inside an S3 bucket. The rest of the script is proprietary and unfortunately cannot be shared.
The script.py module contains the cleanup() function. It uses boto3 to connect to AWS, pull a list of all the objects contained in a specific bucket, and then delete all the ones older than n days.
I have included a few examples of creating a boto3.client, which is what the function expects as its first argument. The other arguments are used to build the path to the directory inside the S3 bucket where the files are located. In AWS terms this path is called a Prefix.
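To make the client and the Prefix a little more concrete, here is a minimal sketch. The helper names `build_prefix` and `make_client`, the optional profile and region arguments, and the folder names are assumptions for illustration and are not taken from the original script.

```python
def build_prefix(*parts):
    """Join path components into an S3 key prefix that ends with '/'."""
    cleaned = [part.strip("/") for part in parts if part and part.strip("/")]
    return "/".join(cleaned) + "/" if cleaned else ""


def make_client(profile_name=None, region_name=None):
    """Return an S3 client, optionally built from a named profile (assumed setup)."""
    import boto3  # third-party: pip install boto3

    session = boto3.Session(profile_name=profile_name, region_name=region_name)
    return session.client("s3")


# Hypothetical folder names, used only to show the shape of a prefix.
print(build_prefix("mock-root-prefix", "mock-sub-prefix"))
```

A prefix is just the leading part of the object keys, so the trailing slash matters: without it, `backups` would also match keys under `backups-old/`.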
Since the bucket can hold more than 1000 objects, which is the limit for a single GET Bucket (List Objects) v2 request, I used a paginator to pull the entire list. Object removal follows the same principle and processes batches of 1000 objects.
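The pagination and batching idea can be sketched roughly as follows. This is not the proprietary script: the function names, the strict "older than" comparison, and the `now` parameter are my own assumptions, and only the `get_paginator("list_objects_v2")` and `delete_objects` calls come from boto3 itself.

```python
from datetime import datetime, timedelta, timezone


def chunks(items, size=1000):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def find_old_keys(client, bucket, prefix, days, now=None):
    """Return keys under `prefix` last modified more than `days` days ago."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    paginator = client.get_paginator("list_objects_v2")
    old_keys = []
    # Each page holds at most 1000 objects; "Contents" is missing on empty pages.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["LastModified"] < cutoff:
                old_keys.append(obj["Key"])
    return old_keys


def delete_keys(client, bucket, keys):
    """Delete `keys` in batches of up to 1000 and return how many succeeded."""
    deleted = 0
    for batch in chunks(keys, 1000):
        response = client.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": key} for key in batch], "Quiet": True},
        )
        # In quiet mode the response only reports the keys that failed.
        deleted += len(batch) - len(response.get("Errors", []))
    return deleted
```

Collecting the keys first and deleting afterwards keeps the two steps easy to test separately, at the cost of holding the full key list in memory.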
Testing the code
Now this was all good and fun, but the really interesting part was how to unit test this code; see test_script.py.
After some research I found moto, the Mock AWS Services library. It's brilliant! Using this library, the test mocks access to the S3 bucket and creates several objects in it. You can leave the dummy AWS credentials in the script, as they won't be needed.
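For readers who have not used moto before, a minimal sketch of a mocked bucket might look like this. The function names and the three-object loop are illustrative assumptions. The import fallback is there because moto 5 replaced the per-service `mock_s3` decorator with `mock_aws`.

```python
import os


def set_dummy_credentials():
    """Give boto3 fake credentials so no real AWS account can be touched."""
    os.environ["AWS_ACCESS_KEY_ID"] = "testing"
    os.environ["AWS_SECRET_ACCESS_KEY"] = "testing"
    os.environ["AWS_DEFAULT_REGION"] = "us-east-1"


def count_mocked_objects(bucket="my-mock-bucket",
                         prefix="mock-root-prefix/mock-sub-prefix/"):
    """Create a bucket and three objects in moto's in-memory S3, then count them."""
    import boto3  # third-party: pip install boto3 moto
    try:
        from moto import mock_aws as mock_s3  # moto 5 and later
    except ImportError:
        from moto import mock_s3  # older moto releases

    set_dummy_credentials()
    with mock_s3():
        client = boto3.client("s3", region_name="us-east-1")
        client.create_bucket(Bucket=bucket)
        for index in range(1, 4):
            key = f"{prefix}test_object_{index:02d}"
            client.put_object(Bucket=bucket, Key=key, Body=b"x")
        listing = client.list_objects_v2(Bucket=bucket, Prefix=prefix)
        return listing["KeyCount"]
```

Everything inside the `with` block talks to moto's in-memory backend, so the bucket and its objects disappear as soon as the block exits.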
At this point I wanted to create multiple objects with different timestamps in the mocked S3 environment, but unfortunately I discovered that this was not possible. Once an S3 object is created, its creation date (metadata) cannot be easily altered; see the object-metadata docs for reference.
Enter another awesome library called freezegun. I ended up using freeze_time in my tests to mock the date/time and create S3 objects with different timestamps. This way we can safely experiment with the logic of cleanup(), that is, keeping objects newer than n days and deleting everything older within the prefix.
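The freeze_time trick can be sketched as below. The helper names are hypothetical. The sketch also assumes the client is talking to a moto-mocked bucket, because it relies on moto stamping each object with the (frozen) local clock; I would not expect it to work against real AWS.

```python
from datetime import datetime, timedelta


def daily_timestamps(newest, count):
    """Return `count` datetimes one day apart, from `newest` going backwards."""
    return [newest - timedelta(days=offset) for offset in range(count)]


def put_objects_with_dates(client, bucket, prefix, timestamps):
    """Upload one small object per timestamp while the clock is frozen there."""
    from freezegun import freeze_time  # third-party: pip install freezegun

    for index, moment in enumerate(timestamps, start=1):
        # Inside this block "now" is `moment`, so the mocked object is
        # created with that date as its LastModified value.
        with freeze_time(moment):
            key = f"{prefix}test_object_{index:02d}"
            client.put_object(Bucket=bucket, Key=key, Body=b"x")


# Six dates, 2019-08-29 back to 2019-08-24, matching the test output's range.
for moment in daily_timestamps(datetime(2019, 8, 29), 6):
    print(moment.date())
```

Freezing the clock once per object, rather than once for the whole test, is what gives each object its own timestamp.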
Here is the test script's output:
$ python test_script.py
mock-root-prefix/mock-sub-prefix/test_object_01 2019-08-29 00:00:00+00:00
mock-root-prefix/mock-sub-prefix/test_object_02 2019-08-28 00:00:00+00:00
mock-root-prefix/mock-sub-prefix/test_object_03 2019-08-27 00:00:00+00:00
mock-root-prefix/mock-sub-prefix/test_object_04 2019-08-26 00:00:00+00:00
mock-root-prefix/mock-sub-prefix/test_object_05 2019-08-25 00:00:00+00:00
mock-root-prefix/mock-sub-prefix/test_object_06 2019-08-24 00:00:00+00:00
<class 'botocore.client.S3'>
Cleanup S3 backups
Working in the bucket: my-mock-bucket
The prefix is: mock-root-prefix/mock-sub-prefix/
The threshold (n. days) is: 4
Total number of files in the bucket: 7
Number of files to be deleted: 3
Deleting the files from the bucket ...
Deleted: 3
Left to delete: 0
.
----------------------------------------------------------------------
Ran 1 test in 0.798s

OK
Again you can find the code for this project here.
Keep Calm and Code in Python!