18 November 2015
Do not use Amazon Glacier with files smaller than 200KB
If this is not obvious to you (it was not obvious to me), read on.
Precision is important when you start multiplying by millions and trillions. For consistency and precision, binary units are used throughout this article: 1 KB is 2^10 (1,024) bytes, and 1 GB is 2^30 bytes.
An interesting relationship follows: there are 2^20 KB in a GB, since 2^30 / 2^10 = 2^20 = 1,048,576.
What if, instead of storing one 1GB file, we stored 1,048,576 1KB files? The total amount of data stored is still just 1GB. In S3, it will cost $0.36 per year (I can’t find any information about metadata charges in S3, which would make this higher). When migrated to Glacier, it will cost $58.08 for the first year, and $5.65 each year thereafter.
There are 3 factors to the cost of storage in Glacier: storage of the data itself, storage of per-object metadata, and request fees.
Notice that the first factor is the only one typically considered. Indeed, with large files, metadata storage and request fees are minuscule. However, with small files, they can become significant.
To see what is going on here, let’s decompose the cost of archiving 1,048,576 (or 2^20) 1KB files to Glacier and storing them for one year.
First, the request fee: $0.05 per 1,000 requests. It doesn’t sound like a lot, but 2^20 files times $0.05/1,000 per file is $52.43.
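To see how quickly the request fee grows as files shrink, here is a minimal sketch that computes the one-time fee for archiving 1GB of data split into files of a given size. The function name `request_fee_per_gb` is made up for illustration, and the price is the $0.05 per 1,000 requests quoted above.

```python
# Price assumed from this article: $0.05 per 1,000 Glacier requests.
REQUEST_FEE_PER_1000 = 0.05
KB_PER_GB = 2 ** 20  # binary units, as used in this article


def request_fee_per_gb(file_size_kb: float) -> float:
    """One-time fee, in dollars, to archive 1GB split into files of file_size_kb."""
    files = KB_PER_GB / file_size_kb  # one request per file
    return files * REQUEST_FEE_PER_1000 / 1000


if __name__ == "__main__":
    for size_kb in (1, 200, 1024, KB_PER_GB):
        fee = request_fee_per_gb(size_kb)
        print(f"{size_kb:>8} KB files: ${fee:.5f} per GB archived")
```

The fee is charged per file, not per byte, so it is inversely proportional to file size: the same gigabyte costs a fraction of a cent to archive as one file and over fifty dollars as a million files.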
Second, metadata storage is also billable. From Amazon’s documentation:
For each object archived to Amazon Glacier, Amazon S3 uses 8 KB of storage for the name of the object and other metadata. Amazon S3 stores this metadata so that you can get a real-time list of your archived objects by using the Amazon S3 API (see Get Bucket (List Objects)). You are charged standard Amazon S3 rates for this additional storage.
For each archived object, Amazon Glacier adds 32 KB of storage for index and related metadata. This extra data is necessary to identify and restore your object. You are charged Amazon Glacier rates for this additional storage.
So for each 1KB file, there is 8KB of metadata stored in S3, and 32KB of metadata stored in Glacier. The cost of metadata for our 1KB files dwarfs the cost of the data itself.
2^20 files times 8KB of S3 metadata is 8GB; at $0.03 per GB per month for 12 months, that comes to $2.88.
2^20 files times 32KB of Glacier metadata is 32GB; at $0.007 per GB per month for 12 months, that comes to $2.69.
The cost of the actual data in the files, 1GB at the same Glacier rate, is just $0.08.
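The decomposition above can be reproduced with a short script. This is a sketch under the article's assumptions: S3 at $0.03 per GB per month, and Glacier at $0.007 per GB per month (the rate that reproduces the $2.69 and $0.08 figures). The function name `glacier_costs` and the dictionary keys are made up for illustration.

```python
KB_PER_GB = 2 ** 20  # binary units, as used in this article

# Rates assumed from this article (late 2015); check current pricing before reuse.
S3_PER_GB_MONTH = 0.03
GLACIER_PER_GB_MONTH = 0.007
REQUEST_FEE = 0.05 / 1000   # dollars per archived object
S3_METADATA_KB = 8          # per object, billed at S3 rates
GLACIER_METADATA_KB = 32    # per object, billed at Glacier rates


def glacier_costs(num_files: int, file_size_kb: float, months: int = 12) -> dict:
    """Break the cost of archiving num_files files into its components."""
    s3_meta_gb = num_files * S3_METADATA_KB / KB_PER_GB
    glacier_meta_gb = num_files * GLACIER_METADATA_KB / KB_PER_GB
    data_gb = num_files * file_size_kb / KB_PER_GB
    return {
        "requests": num_files * REQUEST_FEE,  # one-time
        "s3_metadata": s3_meta_gb * S3_PER_GB_MONTH * months,
        "glacier_metadata": glacier_meta_gb * GLACIER_PER_GB_MONTH * months,
        "data": data_gb * GLACIER_PER_GB_MONTH * months,
    }


if __name__ == "__main__":
    costs = glacier_costs(2 ** 20, 1)
    for name, dollars in costs.items():
        print(f"{name:>16}: ${dollars:.2f}")
    print(f"{'first year':>16}: ${sum(costs.values()):.2f}")
    print(f"{'later years':>16}: ${sum(costs.values()) - costs['requests']:.2f}")
```

The request fee is paid once, which is why the first year costs $58.08 and each later year only $5.65.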
A 200KB file takes about a year to break even when migrated to Glacier. Smaller files take longer; larger files shorter. With lots of small files, it is best to just leave them in S3 until they can be removed. Deletes are free!
For a file of a given size, the break-even point is the number of months it takes for the monthly savings from Glacier to repay the one-time request fee. If the answer is negative, Glacier will never break even.
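The break-even calculation can be sketched as a small function. This is a reconstruction from the prices and metadata sizes quoted in this article, not necessarily the exact formula behind the original calculator. The function name `months_to_break_even` is made up for illustration, and the Glacier rate of $0.007 per GB per month is an assumption.

```python
import math

KB_PER_GB = 2 ** 20  # binary units, as used in this article

# Rates assumed from this article (late 2015).
S3_PER_GB_MONTH = 0.03
GLACIER_PER_GB_MONTH = 0.007
REQUEST_FEE = 0.05 / 1000   # one-time, dollars per archived object
S3_METADATA_KB = 8
GLACIER_METADATA_KB = 32


def months_to_break_even(file_size_bytes: float) -> float:
    """Months until migrating one file to Glacier pays for itself.

    A negative result means Glacier costs more than S3 every month,
    so the migration never breaks even.
    """
    size_kb = file_size_bytes / 1024
    s3_monthly = size_kb / KB_PER_GB * S3_PER_GB_MONTH
    glacier_monthly = (
        S3_METADATA_KB / KB_PER_GB * S3_PER_GB_MONTH
        + (GLACIER_METADATA_KB + size_kb) / KB_PER_GB * GLACIER_PER_GB_MONTH
    )
    monthly_savings = s3_monthly - glacier_monthly
    if monthly_savings == 0:
        return math.inf  # costs are identical, so the fee is never repaid
    return REQUEST_FEE / monthly_savings


if __name__ == "__main__":
    for size_kb in (1, 100, 200, 1024):
        months = months_to_break_even(size_kb * 1024)
        print(f"{size_kb:>5} KB: {months:8.1f} months")
```

Under these assumptions a 200KB file breaks even after a little over 12 months, and a 1KB file returns a negative number because its 40KB of metadata costs more each month than the file would cost in S3.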
Here’s a plot showing file size on the vertical axis, and months until break even on the horizontal axis. Under about 100KB, Glacier makes no sense.
I suspect there are errors in my math. Please e-mail me with corrections.