Database performance is greatly affected by the performance of the underlying storage. For reads, having a lot of RAM can speed things up, but for write-heavy workloads, the bottleneck is the SSD or hard drive the database runs on. AWS has tons of storage options, so which one is best for you?
Database-focused EC2 instances
Besides the underlying storage, many other factors affect database performance. AWS has many different classes of instances, each with individual size tiers.
The most database-oriented instances are the R5 series. These are optimized for memory performance, in RAM speed and size as well as EBS performance. They offer a high ratio of memory to cores, with up to 768 GB of RAM on the r5.24xlarge.
There is also the r5d series, a subclass of R5 that provides direct local disk rather than EBS. The largest tier has four 900 GB NVMe SSDs. That is less capacity than EBS can provide.
There is also the D3 series, which provides the greatest amount of local hard drive storage for an EC2 instance, up to 336 TB on the D3en variant. If you want to run a particularly large database that stores a lot of data, D3 may work best for you.
EBS volume types
EBS has a few different tiers. The most common is gp3, a general-purpose, SSD-backed volume that offers solid performance at a higher price than hard-drive volumes.
gp3 is the latest generation; it replaces gp2 and offers up to 4x better performance with PCIe Gen 4 SSDs.
The older gp2 volumes use a burst bucket model. Depending on the size of the volume, it earns “I/O credits” over time, which are spent automatically on IOPS, or input/output operations per second. This allows for quick bursts of performance when needed, but if you need steady, consistent performance it’s not a good idea to rely on it. gp3 drops the bucket in favor of a flat baseline of 3,000 IOPS at any volume size, with more available for a fee. There is also a maximum number of IOPS; for both gp2 and gp3 it is 16,000.
gp2 volumes earn I/O credits at a rate of 3 per GB per second. This means that if you have a volume larger than 1 TB, your bucket will always be full and you don’t have to worry about burst performance. Anything smaller than that, and once the bucket is empty you’re limited to the baseline performance set by how many credits you earn.
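The burst arithmetic can be sketched in a few lines of Python. This is a rough model, not an official calculator: the constants below (a 100 IOPS floor, a 3,000 IOPS burst rate, and a bucket of 5.4 million credits) are assumptions taken from AWS's published figures for gp2 burst-bucket volumes and are not stated in this article.

```python
# Assumed gp2 burst-bucket constants (from AWS's published figures,
# not from this article). Treat them as illustrative.
BASELINE_IOPS_PER_GIB = 3
MIN_BASELINE_IOPS = 100
MAX_IOPS = 16_000
BURST_IOPS = 3_000
BUCKET_CREDITS = 5_400_000


def baseline_iops(size_gib: int) -> int:
    """Steady-state IOPS a volume of this size earns."""
    earned = BASELINE_IOPS_PER_GIB * size_gib
    return max(MIN_BASELINE_IOPS, min(earned, MAX_IOPS))


def burst_seconds(size_gib: int) -> float:
    """Seconds a full bucket lasts while bursting at BURST_IOPS.

    Returns infinity when the baseline already meets the burst rate,
    because the bucket then refills at least as fast as it drains.
    """
    drain_rate = BURST_IOPS - baseline_iops(size_gib)
    if drain_rate <= 0:
        return float("inf")
    return BUCKET_CREDITS / drain_rate


if __name__ == "__main__":
    for size in (100, 500, 1000, 2000):
        print(size, baseline_iops(size), burst_seconds(size))
```

Under these assumptions a 100 GB volume can burst for a little over half an hour before falling back to 300 IOPS, while a 1,000 GB volume never runs dry.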
What this means in practice is that if you need extra performance, you want to use the second SSD-based volume, io2, also known as Provisioned IOPS SSD. It literally lets you buy disk performance directly, delivered on your EBS volume. The best tier, io2 Block Express, offers up to 4,000 MB/s per volume and 7,500 MB/s per instance.
That’s up to four times the performance of gp3, but only if you can afford it. That performance is expensive, and you have to pay for every bit of it. A top-tier io2 volume can easily cost thousands of dollars per month, more than the EC2 instance it is attached to. That’s on top of the 83% increase in storage costs per GB.
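To get a feel for how the bill adds up, here is a minimal cost sketch. It assumes a flat per-IOPS rate, which simplifies the tiered pricing AWS actually uses, and the two rates are placeholder values for illustration only. The function name and parameters are hypothetical; check the current AWS price list for your region before relying on any number.

```python
def monthly_cost(size_gb: float, iops: int,
                 price_per_gb: float, price_per_iops: float) -> float:
    """Monthly cost of a provisioned-IOPS volume: storage plus IOPS charges."""
    return size_gb * price_per_gb + iops * price_per_iops


# Placeholder rates in USD per month, for illustration only.
EXAMPLE_PRICE_PER_GB = 0.125
EXAMPLE_PRICE_PER_IOPS = 0.065

if __name__ == "__main__":
    cost = monthly_cost(1000, 32_000,
                        EXAMPLE_PRICE_PER_GB, EXAMPLE_PRICE_PER_IOPS)
    print(f"${cost:,.2f} per month")
```

With these placeholder rates, a 1,000 GB volume provisioned at 32,000 IOPS comes to roughly $2,200 per month, and almost all of that is the IOPS charge rather than the storage itself.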
io2 is an option for customers who need every bit of performance available. Unless you are maxing out your drive, general-purpose gp3 volumes will work well for most people.
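The choice between the two SSD types comes down to a handful of request parameters. The sketch below builds them as a plain dictionary. The helper name `volume_request` is hypothetical, while the keys (`VolumeType`, `Size`, `AvailabilityZone`, `Iops`, `Throughput`) follow the EC2 CreateVolume API as exposed by boto3. The validation rules are a simplified assumption, not the full set AWS enforces.

```python
from typing import Any, Dict, Optional


def volume_request(volume_type: str, size_gib: int, availability_zone: str,
                   iops: Optional[int] = None,
                   throughput_mibps: Optional[int] = None) -> Dict[str, Any]:
    """Build keyword arguments for an EC2 CreateVolume call (hypothetical helper)."""
    params: Dict[str, Any] = {
        "VolumeType": volume_type,
        "Size": size_gib,
        "AvailabilityZone": availability_zone,
    }
    if iops is not None:
        # Provisioned IOPS only apply to gp3, io1, and io2 volumes.
        if volume_type not in ("gp3", "io1", "io2"):
            raise ValueError(f"{volume_type} does not accept provisioned IOPS")
        params["Iops"] = iops
    if throughput_mibps is not None:
        # A separate throughput setting is a gp3-only option.
        if volume_type != "gp3":
            raise ValueError(f"{volume_type} does not accept a throughput setting")
        params["Throughput"] = throughput_mibps
    return params


if __name__ == "__main__":
    print(volume_request("gp3", 500, "us-east-1a"))
    print(volume_request("io2", 500, "us-east-1a", iops=20_000))
```

The resulting dictionary can be passed to boto3 as `ec2.create_volume(**params)`. Keeping the request as data makes it easy to review what you are about to pay for before anything is created.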
Hard drive volumes
There are two main hard drive EBS volume types: Throughput Optimized HDD (st1) and Cold HDD (sc1). The names speak for themselves: st1 is optimized for decent sequential read speeds (although with terrible random performance, as with all hard drives). For non-critical applications that store large amounts of data, sc1 offers a very low cost per GB.
Both volume types also use the burst bucket model, but the credits buy throughput rather than IOPS, up to a fixed MB/s figure based on the volume size.
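How throughput scales with size can be sketched as follows. The per-TB baseline, burst, and cap figures are assumptions taken from AWS's published numbers for st1 and sc1, not from this article, and the function and table names are hypothetical.

```python
from typing import Tuple

# Assumed throughput figures in MB/s (from AWS's published numbers,
# not from this article). Treat them as illustrative.
HDD_PROFILES = {
    "st1": {"base_per_tb": 40, "burst_per_tb": 250, "cap": 500},
    "sc1": {"base_per_tb": 12, "burst_per_tb": 80, "cap": 250},
}


def hdd_throughput(volume_type: str, size_tb: float) -> Tuple[float, float]:
    """Return (baseline, burst) throughput in MB/s for an HDD volume."""
    profile = HDD_PROFILES[volume_type]
    baseline = min(profile["base_per_tb"] * size_tb, profile["cap"])
    burst = min(profile["burst_per_tb"] * size_tb, profile["cap"])
    return baseline, burst


if __name__ == "__main__":
    for volume_type in ("st1", "sc1"):
        for size_tb in (1, 4, 16):
            print(volume_type, size_tb, hdd_throughput(volume_type, size_tb))
```

Under these assumptions, a larger volume earns both a higher baseline and a higher burst ceiling, until it reaches the per-volume cap.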
However, for databases, random read and write performance can matter a lot, as can latency. It’s 2020, and your users shouldn’t have to wait for a drive to spin up and a magnetic read head to seek just to fetch some basic data. Not to mention how it would handle complex SQL queries that could slow the disk to a crawl.
For anything user-facing, performance matters and you should use an SSD. The only case where a hard drive makes sense is a read-heavy application where the database is small enough to be kept mostly in memory, but even then it would be small enough that the modest premium of a standard gp3 volume would be worth it.
However, for big data, analytics, and other internal databases, the database can be so large that the cost of storage is too high to run on SSDs. If you’re building a high-capacity data lake or a multi-server cluster, you might not care much about a slightly slower disk, especially if it saves you money in the process.