Disks and Disk Models

Memory Hierarchy

As we go down the hierarchy, we get things that are cheaper and slower.

  • We've got some memory that's resident on the CPU itself. A very small number of registers are very fast to access, but very expensive and therefore limited.

  • There is also some cache memory, which is fast to look up but more expensive.

  • The computing that you've done so far has probably assumed that the data is sitting in main memory, also known as RAM (random access memory).

  • If you have more data than can fit in main memory, it's going to sit externally, and that will often be on a disk, such as a hard disk or an SSD. Modern systems usually have some flash memory in the mix as well.

  • If we have really huge amounts of data that we don't need very often, we might archive it on magnetic tape.

Essentially, the things above the line use electricity to store the data, and when the power goes off, it's lost. The things below the line use magnetism or other physical phenomena, and when the power is off, the data is preserved.

This is the distinction between volatile and non-volatile memory: whether the power needs to stay on in order for the stored bits to be preserved.


We think of the data that we work with in databases as mainly living on disk. One reason is that we often have very large data sets that won't all fit in main memory at the same time. But beyond that, even if they do fit, we have the volatility issue.

However, to work with the data, for example to execute a query, we need to get it to the CPU, which means moving it from disk to main memory, then possibly into cache and/or registers, and finally doing some computation with it.

Disk Model

We've got a bunch of platters that are stacked up, and they all rotate very fast. The set of tracks at the same radius across all the platters is called a cylinder. We've also got an access arm with a read/write head for each platter, but the heads all move together: they can swing in and out to a different radius from the center.

We're not going to read 1 bit or 1 byte at a time. The disk is divided into sectors of 512 bytes each, which is basically the amount that the read/write head reads in one go. A bunch of sectors that sit together at the same radius on a platter is called a track.

  • To read data, the disk needs to swivel the access arm so the head is over the track holding the data, wait for the right sector to rotate underneath the head, and then read the data. In fact, we're not going to read just a single sector at a time: we'll have logical units called blocks, each comprising a number of sectors. In other words, we'll read a block at a time.

  • Disk access latency (how long does it take to get to the data?): swivel + rotation (moving the arm into place and waiting while the disk rotates to the right place)

  • Max transfer rate per rotation (how fast can we read once positioned?): all the data that fits on one track (reading a whole run of contiguous bytes sitting next to each other on the same track)

Files modeled as having sequential layout

  • Seek time is related to the speed of the electric motor in the access arm (=moving the arm)

  • Rotational latency depends on rotations per second: 7200 RPM = 120 RPS; how fast is the disk rotating? (=waiting until it spins around so that the very beginning of the file is under the disk head)

  • Transfer cost depends on how much data fits on a track and on the RPS (revolutions per second); the sketch below puts these three components together.
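To make these components concrete, here is a minimal Python sketch that puts seek, rotation, and transfer together. The function name is my own, and the sample figures (7200 RPM, 5ms seek, 80 KB/ms transfer) are the ones used in the worked example later in this section.

```python
def file_read_ms(size_kb, seek_ms, rpm, rate_kb_per_ms):
    """Sequential layout: one seek + average rotational latency + transfer."""
    rps = rpm / 60                      # e.g. 7200 RPM -> 120 RPS
    avg_rotation_ms = 1000 / (2 * rps)  # on average we wait half a rotation
    transfer_ms = size_kb / rate_kb_per_ms
    return seek_ms + avg_rotation_ms + transfer_ms

# Figures from the worked example below: 7200 RPM, 5 ms seek, 80 KB/ms
print(file_read_ms(400, seek_ms=5, rpm=7200, rate_kb_per_ms=80))  # ~14.17 ms
```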

More Details

There are various things that we won't worry about:

  1. The disk arm's speed isn't constant; we just use an estimate of how long it takes, on average, to get the arm positioned.

  2. There will typically be some memory allocated as a buffer, where we read things we think we might want so that they're already in memory, and so forth.

  3. The disk also has to share the bus connecting it to main memory, and there may be contention on that bus (SCSI, master/slave on IDE, SATA).

  4. Directory lookups may have significant costs.

  5. The elevator algorithm is used to schedule requests under high load.

Today's disks

Disks are many orders of magnitude slower than any work with the data once it's in main memory. Even if we use bad algorithms to work with it once it's in memory, that's usually much more efficient than the I/O.

  • 70-120 MB/s (cheap SATA disks)

  • 1-6TB capacity at $70-$200

  • Access time between 5ms and 10ms (super slow compared to the time it takes to move data from main memory to the CPU, or for the CPU to operate on it)

The overwhelming cost here is going to come from moving the data from disk into main memory, and we're basically going to ignore what happens to it once it's in memory. That cost will have two components: access time and transfer time.

Disk Model

  • Access time: time to find the start of the data (~5-10ms)

  • Transfer time: the amount of data divided by the transfer rate (= amount of data / rate)

Note: the transfer rate is the rate (in MB/s) at which data is retrieved after that (~50-120MB/s)
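As a sketch of this two-component model in code (the default figures below are just illustrative values from the ranges quoted above, not measurements of any particular drive):

```python
def disk_read_ms(size_kb, access_ms=10.0, rate_kb_per_ms=80.0):
    """Access time (find the start of the data) + transfer time (size / rate)."""
    return access_ms + size_kb / rate_kb_per_ms
```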

SSD: Solid State Drives

They have random access to the data in roughly constant time, rather than sequential access. An SSD is generally faster than a magnetic disk, but it's still much slower than accessing the data once it's in main memory.

It also wears out: if you repeatedly write to the same portion of a flash drive, it's going to wear out and not work anymore.

They use less power. In data centers there's going to be some mix of hard disks and solid-state drives, but we're just going to worry about hard disks.

Disk Performance Modeling

Time to read blocks: t_R = 10ms (access time) + 5ms (the amount of data that we want to read divided by the transfer rate: 400KB / 80KB/ms) = 15ms

Note: we're typically going to read at least one block at a time, and a block is going to be 4, 8, or 16 KB, that order of magnitude.
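Plugging the numbers above into the model as a quick sanity check (80 KB/ms is the 80 MB/s transfer rate expressed per millisecond, ignoring the 1000-vs-1024 distinction):

```python
access_ms = 10.0         # time to find the start of the data
size_kb = 400.0          # amount of data we want to read
rate_kb_per_ms = 80.0    # 80 MB/s expressed as KB per millisecond

t_r = access_ms + size_kb / rate_kb_per_ms
print(t_r)               # 10 + 5 = 15.0 ms
```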

Disk Models

1. Block Model

Let's say it takes k ms to read each block of size 4KB (or 8KB, or 16KB; the block size is fixed).

Under this model it takes 10k ms to read 40KB (even if the blocks are in the same file), and it's a poor estimate because we're counting the access time over and over again, even though we might not need another access to get the next block: if we've got a bunch of blocks that are contiguous to one another, we do one access followed by a bunch of transfers without additional accesses. A small sketch follows below.
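A minimal sketch of the block model's estimate; k_ms stands for the fixed per-block cost k, and the concrete value used here is the one derived in the worked example below.

```python
def block_model_ms(n_blocks, k_ms):
    """Block model: every block is charged the full per-block time k,
    even when consecutive blocks are contiguous on disk."""
    return n_blocks * k_ms

# Reading 40 KB as ten 4 KB blocks costs 10k ms under this model.
k_ms = 9.2167                      # per-block time from the example below
print(block_model_ms(10, k_ms))    # ~92.17 ms, overcounting the access cost
```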

2. Seek or Latency Transfer-Rate (LTR) Model

In this case we're only going to count the access time once for the file. Therefore it takes "access time + transfer time" to read a file.

We're going to assume that the file is laid out sequentially on the disk, so that having accessed the first block, we can leave the disk arm in place as the disk rotates and pick up the second block, the third block, etc.

If all we're doing is a bunch of small reads scattered all over the place (small random reads), the choice of model doesn't matter much. But if we're reading large contiguous chunks of data sequentially, this model gives us a much more precise estimate.

Example: 7200 RPM, 5ms Seek, 80MB/s Transfer Rate

1. Block Model: 4KB Block Size

5ms (seek) + 1000/240ms (average rotational latency) + 0.05ms (read the block of data once the disk head is positioned correctly)

Why 240? 120 RPS (= 7200 RPM / 60) means 240 half-rotations per second, and on average we have to wait half a rotation to get the right part of the disk located under the disk head.

Let's dive into the calculation.

1000/240ms?

When the hard disk tries to access data, it doesn't know in advance where the data is located with respect to the current position of the disk head. The data could be just a little bit ahead of the current position, or it could be almost a full rotation away.

On average, however, the data will be about half a rotation away: half a rotation is the middle ground between the best-case scenario (data just ahead) and the worst-case scenario (data almost a full rotation away). It won't be exactly half a rotation every time, but over many reads it averages out to about half.

That's why the model assumes half a rotation to get to the right part of the disk located under the disk head. This makes the model simpler and gives a reasonable approximation for calculations.

The number 240 refers to the number of half-rotations per second. The drive spins at 120 full rotations per second, but each full rotation consists of two half-rotations.

Hence, the time for one average half-rotation is 1 / 240 seconds, or 1000 / 240 milliseconds (since 1 second = 1000 milliseconds).
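The same rotational-latency arithmetic as a few lines of Python (a sketch):

```python
rpm = 7200
rps = rpm / 60                      # 120 full rotations per second
half_rotations_per_s = 2 * rps      # 240 half-rotations per second
avg_rotational_latency_ms = 1000 / half_rotations_per_s
print(avg_rotational_latency_ms)    # 1000/240 ≈ 4.1667 ms
```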

0.05ms?

If the drive has a transfer rate of 80MB/s (MegaBytes per second), it means that the drive can read 80 * 1024 * 1024 = 83,886,080 bytes per second. To find out how much time it takes to read a 4KB (KiloByte) block, we need to divide the block size by the transfer rate.

First, convert 4KB to bytes: 4 * 1024 = 4096 bytes.

Then, divide this by the number of bytes the drive can read per second: 4096 / 83,886,080 = 0.000048828125 seconds.

Finally, convert seconds to milliseconds by multiplying by 1000: 0.000048828125 * 1000 = 0.048828125 milliseconds.

So, it takes approximately 0.05 milliseconds to read a 4KB block at a transfer rate of 80MB/s.
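And the same transfer-time arithmetic, step by step (a sketch):

```python
rate_bytes_per_s = 80 * 1024 * 1024   # 80 MB/s = 83,886,080 bytes per second
block_bytes = 4 * 1024                # 4 KB = 4,096 bytes
transfer_s = block_bytes / rate_bytes_per_s
print(transfer_s * 1000)              # 0.048828125 ms, i.e. ~0.05 ms
```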

  • ≈9.2167ms to read one block (5 + 1000/240 + 0.05)

  • 100 * 9.2167 ≈ 921.67ms to read a 400KB file

Note: vast overestimate of the actual time needed because we're counting that access time many times.

The calculation 100 * 9.2167 ≈ 921.67ms for reading a 400KB file is an overestimate because it assumes that for each 4KB block read (100 blocks in total for a 400KB file), the disk head needs to seek to a new position and wait for the disk to rotate to the correct place. This is often not the case, especially for files that are stored contiguously on the disk: there the disk head can remain in place and read consecutive blocks without moving, significantly reducing the total access time. In the worst case (for example, if every block is scattered randomly around the disk), the estimate could be accurate. That is why it's described above as a "vast overestimate of the actual time needed because we're counting that access time many times".

2. Seek Model: much more realistic

  • (5ms seek + 1000/240ms rotational latency) + 5ms transfer (= 100 blocks * 0.05ms)

  • ≈14.17ms to read the 400KB file

By comparison, the block model:

  • Simpler (no reasoning about data layout involved)

  • OK if mostly small random reads and writes (transactions)

  • Not good for search engine and data mining workloads

But we're not going to use it (a side-by-side comparison of the two models is sketched below).
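Here is that comparison for the 400KB example, as a sketch using the figures from the example above:

```python
seek_ms = 5.0
rot_ms = 1000 / 240      # average rotational latency, ~4.17 ms
block_xfer_ms = 0.05     # one 4 KB block at 80 MB/s
n_blocks = 100           # 400 KB in 4 KB blocks

per_block_ms = seek_ms + rot_ms + block_xfer_ms                 # ~9.217 ms
block_model_ms = n_blocks * per_block_ms                        # ~921.7 ms
seek_model_ms = (seek_ms + rot_ms) + n_blocks * block_xfer_ms   # ~14.17 ms

print(block_model_ms, seek_model_ms)   # the block model overcounts ~65x
```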

Disk Workload Examples

  • TRANSACTIONS: with many small reads and writes (a typical DB transaction workload), either the block or the seek model is fine.

  • OLAP, search engines, and scientific processing: for reading and writing large files, the block model is not good at all, so use the seek model.

  • INTERACTIVE (e.g., UNIX users): with many repeated reads and few writes, caching works well (a motivation for log-structured file systems).
