Back Up Your Digital Life To AWS S3
More and more of our lives are purely digital, and many of us have outsourced the storage and backup of those digital lives to products and services that make it easy. There is no doubt these services come with real benefits: ease of use, redundancy, and experienced teams ensuring reliability and security well beyond our own capabilities. Some of these services are even free, but they come at a cost to our privacy. What if I told you that you could get most of those benefits with a small investment, without giving up your privacy?
There are a lot of ways to solve this problem; this article outlines just one of them. It is not meant to be THE recommended way, only an option, leaving you to decide what works best for you.
Why AWS?
There are other solutions, so why AWS?
- Privacy
- Reliability
- Reasonable Cost
AWS started as a tool for businesses, but in keeping with the Amazon ethos it is offered as a self-service tool available to individuals. The benefit you get from AWS’s lineage is that businesses pay for reliable and private storage, because that is where they keep customer data. Those businesses are subject to laws, such as GDPR, and contractual constraints that keep AWS intensely focused on privacy and reliability; if AWS fails at either, it ceases to be valuable to businesses. AWS also continuously improves its services and drives costs down for customers.
AWS S3 is available worldwide with an excellent track record for durability and availability. It is not the cheapest option, but it is extremely competitive and one of the best values.
Why NOT AWS?
This is definitely not for everyone. AWS services require significant learning and are geared towards businesses and developers; they are not “consumer facing” simple and easy.
This article assumes you have some basic experience computering things, i.e. a tech bent. No, I would not recommend this for my Grandma, but that’s mostly because she loves Korn Shell and that has been a major sticking point at Thanksgiving for years.
AWS is also not the cheapest. There are cheaper and easier solutions if you want to “set it and forget it”.
Make It Happen
Everything that follows assumes that you have:
- Set up an AWS account
- Downloaded your AWS credentials and set up the AWS CLI with them (there’s a quick sanity check after this list)
- Are comfortable on the command line
- Some understanding of AWS, or a curiosity to learn more; not everything will be explained in this short article
- Bonus: scheduling, and creating an IAM user with credentials-only access and least privilege for your backup process
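A quick sanity check that your credentials and CLI are wired up correctly is to ask AWS who you are. If this prints your account ID and user ARN, you’re ready to go:
aws sts get-caller-identity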
AWS S3 stores data in what’s called a bucket. From here on out we’ll call our bucket stuff-backup, and the data we will store there will come from a folder named my-stuff.
Create And Configure Your Bucket
- Create
aws s3api create-bucket --bucket stuff-backup --acl private
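One caveat: if your CLI’s default region is anything other than us-east-1, create-bucket also needs a location constraint naming that region. For example, with eu-west-1 standing in for your region:
aws s3api create-bucket --bucket stuff-backup --acl private --region eu-west-1 --create-bucket-configuration LocationConstraint=eu-west-1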
Recommended practice is to create a bucket for each set of things you want to store instead of one bucket for all of them. So for photos and documents you would want to create two buckets: photos-backup and documents-backup.
- Set the bucket as non-public, which prevents any public access to your data. Don’t. Skip. This. Bit.
aws s3api put-public-access-block --bucket stuff-backup --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
- Enable versioning, which keeps versions of your items as they change and allows you to use storage tiers to reduce costs
aws s3api put-bucket-versioning --bucket stuff-backup --versioning-configuration Status=Enabled
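To confirm it took effect, read the setting back; it should report Status Enabled:
aws s3api get-bucket-versioning --bucket stuff-backup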
- Enable default encryption. This ensures every file stored in the bucket is encrypted at rest using AES-256 with S3-managed keys.
aws s3api put-bucket-encryption --bucket stuff-backup --server-side-encryption-configuration '{"Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]}'
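If you would rather use a KMS key you manage instead of the S3-managed keys, the same call accepts an aws:kms configuration. A sketch, with the key ARN as a placeholder you’d swap for your own:
aws s3api put-bucket-encryption --bucket stuff-backup --server-side-encryption-configuration '{"Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "aws:kms", "KMSMasterKeyID": "<your-kms-key-arn>"}}]}'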
- Add a lifecycle policy. This uses a file (below) to move your backups to the S3 Glacier storage tier and old versions to Glacier Deep Archive. This means your files will take longer to get back (hours for Glacier, up to 48 hours for Deep Archive), but your costs will be orders of magnitude lower.
aws s3api put-bucket-lifecycle-configuration --bucket stuff-backup --lifecycle-configuration file://bucket-lifecycle.json
Lifecycle Policy bucket-lifecycle.json
{
  "Rules": [
    {
      "ID": "Upload",
      "Status": "Enabled",
      "Filter": {
        "Prefix": ""
      },
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "GLACIER"
        }
      ]
    },
    {
      "ID": "old-versions-to-glacier",
      "Status": "Enabled",
      "Filter": {
        "Prefix": ""
      },
      "NoncurrentVersionTransitions": [
        {
          "NoncurrentDays": 60,
          "StorageClass": "DEEP_ARCHIVE"
        }
      ]
    }
  ]
}
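Once applied, you can read the configuration back to confirm both rules are in place:
aws s3api get-bucket-lifecycle-configuration --bucket stuff-backup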
Back It Up
Use the AWS S3 command line to back up my-stuff to S3. The --delete flag here makes sure that files deleted locally are marked as deleted in S3. Since we set up versioning, they are still there.
aws s3 sync --delete my-stuff s3://stuff-backup
Depending on how much you have to back up, this may take a while the first time. sync will only upload changes, so subsequent runs will be fast and efficient.
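If you’d like to preview what sync will upload or delete before it touches anything, it supports a dry run:
aws s3 sync --dryrun --delete my-stuff s3://stuff-backup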
Bonus Material
Scheduling
To make this valuable, run the command with your scheduling tool of choice. For me that’s a systemd timer on Linux.
Create the service (/etc/systemd/system/stuff-backup.service), which runs a shell script to do the work:
[Unit]
Description=Backup Stuff

[Service]
Type=oneshot
User=root
ExecStart=/root/stuff-backup.sh

[Install]
WantedBy=default.target
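The unit points at /root/stuff-backup.sh, which can be a thin wrapper around the sync command from earlier. A minimal sketch, assuming your data lives at /root/my-stuff (adjust the path to match reality):
#!/usr/bin/env bash
# Fail fast on errors, unset variables, and broken pipes.
set -euo pipefail

# Mirror the local folder to S3; --delete marks locally removed
# files as deleted in the bucket (versioning keeps the old copies).
aws s3 sync --delete /root/my-stuff s3://stuff-backup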
Create the timer (/etc/systemd/system/stuff-backup.timer):
[Unit]
Description=Scheduled stuff-backup

[Timer]
Persistent=true
OnCalendar=*-*-* 02:35:00
Unit=stuff-backup.service

[Install]
WantedBy=timers.target
Enable and start the timer (the service itself doesn’t need enabling, since the timer triggers it):
systemctl enable stuff-backup.timer
systemctl start stuff-backup.timer
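To verify the schedule is registered and see when the next run will fire:
systemctl list-timers stuff-backup.timer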
Security
It’s always a good idea to follow least privilege for your processes, and this backup process is no exception. Create a separate IAM user with access-key (programmatic) credentials only, and give it permissions for only what it needs.
Here I will assume you created that IAM user and put it into an IAM group named backups. You should execute your backup script or commands under those credentials.
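If you haven’t created the group and user yet, all of it can be done from the CLI; the user name stuff-backup-user here is just an example:
aws iam create-group --group-name backups
aws iam create-user --user-name stuff-backup-user
aws iam add-user-to-group --group-name backups --user-name stuff-backup-user
aws iam create-access-key --user-name stuff-backup-user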
Create a policy for your new bucket
aws iam create-policy --policy-name stuff-s3-backup-access --policy-document file://access-policy.json --query 'Policy.Arn' --output text
This creates a new policy called stuff-s3-backup-access, assigns it the rights from the access-policy.json file (below), and grants the minimum access needed, scoped to our newly created stuff-backup bucket. It will output the value (ARN) needed in the next step.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:GetObject",
        "s3:GetBucketLocation",
        "s3:PutObject",
        "s3:PutObjectAcl"
      ],
      "Resource": [
        "arn:aws:s3:::stuff-backup",
        "arn:aws:s3:::stuff-backup/*"
      ]
    }
  ]
}
Assign the policy to your group
Remember your backups group? This command gives all members of that group that policy: the least-privilege access necessary to do the job of backing up your data. You’ll need the ARN output from the previous command, when you created the policy.
aws iam attach-group-policy --group-name backups --policy-arn <copied-from-previous-command-output>
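Finally, to actually run your backups under the new credentials rather than your own, one option is to store the access key as a named CLI profile and point the sync at it; the profile name backup here is arbitrary:
aws configure --profile backup
AWS_PROFILE=backup aws s3 sync --delete my-stuff s3://stuff-backup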