The Rub

Glacier Costlier than S3 for Small Files

Rule of Thumb

Do not use Amazon Glacier with files smaller than 200KB.

If this is not obvious to you (it was not obvious to me), read on.

Math Review

Precision is important when you start multiplying by millions and trillions. For consistency and precision, the following units are used throughout this article.

  • KB: 1,024 bytes, expressed as 2^10
  • MB: 1,048,576 bytes, expressed as 2^20
  • GB: 1,073,741,824 bytes, expressed as 2^30

One relationship is worth noting: there are 2^20 KB in a GB:

  • 2^30/2^10 = 2^20
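The same units can be written down as constants, which makes the relationship easy to check. This is only a small sketch; the names KB, MB, GB and kb_per_gb are chosen here for illustration.

```python
# Binary units as defined in this article.
KB = 2**10  # 1,024 bytes
MB = 2**20  # 1,048,576 bytes
GB = 2**30  # 1,073,741,824 bytes

# Number of KB in a GB: 2^30 / 2^10 = 2^20.
kb_per_gb = GB // KB
print(kb_per_gb)  # 1048576
```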

Example

Based on standard pricing, a 1GB file will cost $0.36 per year in S3, and $0.084 per year in Glacier.

What if, instead of storing one 1GB file, we stored 1,048,576 1KB files? The total amount of data stored is still just 1GB. In S3 it will cost $0.36 per year (I can’t find any information about metadata charges in S3, which would make this higher). When migrated to Glacier, it will cost $58.08 for the first year, and $5.65 each year thereafter.

Cost Factors

There are 3 factors to the cost of storage in Glacier:

  • The raw storage used
    • $.03/GB/mo for S3
    • $.007/GB/mo for Glacier
  • The metadata storage used
    • 8KB/file for S3 (when using Glacier)
    • 32KB/file for Glacier
  • Request fees
    • $.05 per 1,000 Glacier archive requests

Notice that the first factor is the only one typically considered. Indeed, with large files, metadata storage and request fees are minuscule. However, with small files, they can become significant.
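To make the effect of file size concrete, here is a rough per-file model built only from the rates and metadata sizes listed above. The function names (glacier_monthly_cost, overhead_share) are made up for this sketch, and it assumes every file is billed exactly as the list describes, with no minimums or rounding.

```python
KB_PER_GB = 2**20          # 2^20 KB in a GB

S3_RATE = 0.03             # $/GB/month
GLACIER_RATE = 0.007       # $/GB/month
S3_META_KB = 8             # S3 metadata per archived file
GLACIER_META_KB = 32       # Glacier metadata per archived file
REQUEST_FEE = 0.05 / 1000  # one-time $ per archive request


def glacier_monthly_cost(size_kb):
    """Recurring monthly cost in dollars of one archived file of size_kb KB."""
    data = size_kb * GLACIER_RATE / KB_PER_GB
    glacier_meta = GLACIER_META_KB * GLACIER_RATE / KB_PER_GB
    s3_meta = S3_META_KB * S3_RATE / KB_PER_GB
    return data + glacier_meta + s3_meta


def overhead_share(size_kb):
    """Fraction of the recurring monthly cost that pays for metadata, not data."""
    total = glacier_monthly_cost(size_kb)
    data = size_kb * GLACIER_RATE / KB_PER_GB
    return (total - data) / total


print(f"1KB file: {overhead_share(1):.1%} of the monthly cost is metadata")
print(f"1GB file: {overhead_share(2**20):.4%} of the monthly cost is metadata")
print(f"one-time request fee per file: ${REQUEST_FEE:.5f}")
```

Under these assumptions, nearly all of the recurring cost of a 1KB file is metadata, while for a 1GB file the metadata share is negligible.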

The Math

To see what is going on here, let’s decompose the cost of archiving 1,048,576 (or 2^20) 1KB files to Glacier and storing them for one year.

First, the request fee: $0.05 per 1,000 requests. It doesn’t sound like a lot, but 2^20 files times $0.05/1,000 per file comes to $52.43.

Second, metadata storage is also billable.

For each object archived to Amazon Glacier, Amazon S3 uses 8 KB of storage for the name of the object and other metadata. Amazon S3 stores this metadata so that you can get a real-time list of your archived objects by using the Amazon S3 API (see Get Bucket (List Objects)). You are charged standard Amazon S3 rates for this additional storage.

For each archived object, Amazon Glacier adds 32 KB of storage for index and related metadata. This extra data is necessary to identify and restore your object. You are charged Amazon Glacier rates for this additional storage.

So for each 1KB file, there is 8KB of metadata stored in S3, and 32KB of metadata stored in Glacier. The cost of metadata for our 1KB files dwarfs the cost of the data itself.

2^20 files times 8KB S3 metadata times $.03/mo times 12 months comes to $2.88.

2^20 files times 32KB Glacier metadata times $.007/mo times 12 months comes to $2.69.

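The cost of the actual data in the files is just $.08. Added to the request fee and the two metadata charges, that gives $58.08 for the first year; without the one-time request fee, each later year costs $5.65.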
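The decomposition can be checked with a few lines of arithmetic. This sketch uses only the figures quoted in this article; the variable names are illustrative.

```python
FILES = 2**20      # number of 1KB files
KB_PER_GB = 2**20  # 2^20 KB in a GB
MONTHS = 12

# One-time fee: $0.05 per 1,000 archive requests, one request per file.
request_fee = FILES * 0.05 / 1000

# Recurring charges for one year, converting KB to GB before applying the rate.
s3_meta = FILES * 8 / KB_PER_GB * 0.03 * MONTHS        # 8KB per file at S3 rates
glacier_meta = FILES * 32 / KB_PER_GB * 0.007 * MONTHS  # 32KB per file at Glacier rates
data = FILES * 1 / KB_PER_GB * 0.007 * MONTHS           # the 1KB files themselves

first_year = request_fee + s3_meta + glacier_meta + data
later_years = s3_meta + glacier_meta + data

print(f"request fee:      ${request_fee:.2f}")
print(f"S3 metadata:      ${s3_meta:.2f}")
print(f"Glacier metadata: ${glacier_meta:.2f}")
print(f"data:             ${data:.2f}")
print(f"first year:       ${first_year:.2f}")
print(f"each later year:  ${later_years:.2f}")
```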

Recommendations

A 200KB file takes about a year to break even when migrated to Glacier. Smaller files take longer; larger files shorter. With lots of small files, it is best to just leave them in S3 until they can be removed. Deletes are free!

Enter a file size in bytes, and see how many months until its Glacier migration breaks even. If the answer is negative, Glacier will never break even.

(source formula for the widget)
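For readers who want to reproduce the calculation offline, here is a sketch of a break-even formula derived from the cost factors in this article. It is not necessarily the exact formula behind the widget, and the function name months_to_break_even is hypothetical. It divides the one-time request fee by the monthly saving from moving one file out of S3.

```python
KB = 2**10
KB_PER_GB = 2**20

S3_RATE = 0.03             # $/GB/month
GLACIER_RATE = 0.007       # $/GB/month
S3_META_KB = 8             # S3 metadata per archived file
GLACIER_META_KB = 32       # Glacier metadata per archived file
REQUEST_FEE = 0.05 / 1000  # one-time $ per archive request


def months_to_break_even(size_bytes):
    """Months until the request fee is repaid by lower monthly storage cost.

    A negative result means Glacier costs more every month, so the
    migration never breaks even.
    """
    size_kb = size_bytes / KB
    s3_monthly = size_kb * S3_RATE / KB_PER_GB
    glacier_monthly = (
        (size_kb + GLACIER_META_KB) * GLACIER_RATE + S3_META_KB * S3_RATE
    ) / KB_PER_GB
    monthly_saving = s3_monthly - glacier_monthly
    if monthly_saving == 0:
        return float("inf")  # costs are identical, so the fee is never repaid
    return REQUEST_FEE / monthly_saving


for size_kb in (1, 100, 200, 1024):
    months = months_to_break_even(size_kb * KB)
    print(f"{size_kb}KB: {months:.1f} months")
```

Under these assumptions a 200KB file breaks even after a little more than 12 months, and the result turns negative for files smaller than roughly 20KB.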

Here’s a plot showing file size on the vertical axis, and months until break even on the horizontal axis. Under about 100KB, Glacier makes no sense.

Size vs Time until Glacier break even

Errors

I suspect there are errors in my math. Please edit on GitHub or e-mail me with corrections.