In computer systems, one of the main challenges is persisting data despite failures such as power loss or system crashes. One of the oldest storage media still in use is the hard disk drive (HDD). In this blog post, let's take a deep dive into this well-known storage medium.

What is a Hard Disk Drive (HDD)?

An HDD is a non-volatile storage device commonly used in computers and other digital devices to store and retrieve data. Non-volatile means the data survives when power is turned off. To the rest of a computer system, an HDD is an input/output (I/O) device.

What is the technology for storing data persistently?

An HDD stores data in binary form; essentially, it remembers a stream of 1s and 0s. It uses magnetic storage technology: bits are recorded as patterns of magnetization in a thin magnetic coating on the surface of the disk platters.

The Interface

The basic unit of interaction with an HDD is the sector. A disk is divided into fixed-size sectors (typically 512 bytes to a few KB), and each sector can be read or written as a unit. The sectors are numbered from 0 to n - 1, so the disk can be viewed as an array of sectors. Multi-sector operations are possible. However, the only guarantee drive manufacturers make is that a single-sector write is atomic: it either completes entirely or fails entirely. If power is lost during a multi-sector write, only some of the sectors may be updated, which is known as a torn write.
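The following minimal Python sketch makes the "array of sectors" view concrete. The `SimulatedDisk` class, its methods, and the 512-byte `SECTOR_SIZE` are hypothetical names chosen for illustration; they are not a real driver API. The sketch also computes which sectors a byte range touches. This shows why a small write that straddles a sector boundary is not covered by the single-sector atomicity guarantee.

```python
SECTOR_SIZE = 512  # assumed sector size in bytes; many modern drives use 4096


class SimulatedDisk:
    """Toy model of the HDD interface: an array of n fixed-size sectors.

    Hypothetical class for illustration only; real drives are accessed
    through the operating system's block layer, not an API like this.
    """

    def __init__(self, num_sectors: int, sector_size: int = SECTOR_SIZE):
        self.sector_size = sector_size
        # Sectors are addressed 0 .. n-1, exactly like array indices.
        self.sectors = [bytes(sector_size) for _ in range(num_sectors)]

    def read_sector(self, lba: int) -> bytes:
        return self.sectors[lba]

    def write_sector(self, lba: int, data: bytes) -> None:
        if len(data) != self.sector_size:
            raise ValueError("a write must cover exactly one whole sector")
        # Replacing the whole slot in one step models the atomicity
        # guarantee: a sector holds either the old or the new contents.
        self.sectors[lba] = data


def sectors_for_range(offset: int, length: int,
                      sector_size: int = SECTOR_SIZE) -> range:
    """Return the sector numbers touched by the byte range [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // sector_size
    last = (offset + length - 1) // sector_size
    return range(first, last + 1)
```

For example, a 100-byte write at byte offset 1000 touches sectors 1 and 2. A crash in the middle of that write could therefore leave one sector updated and the other not.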

The Mechanical structure

Stack of disk platters, heads and the HDD chassis

Diagrams to illustrate the HDD structure

A single track plus head. Numbers represent individual sectors on the track.
An HDD with one platter containing a single track. The track has 4 sectors, and the head is at sector 4. The platter rotates at a constant speed around the spindle.
An HDD with one platter containing 2 tracks. The head seeks from one track to the other and then starts reading the data. Numbers represent sectors on the track.
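The single-track and two-track figures above can be turned into a small timing model. The sketch below rests on simplifying assumptions, not on how real drive firmware works:

- Sectors pass under the head in increasing order, with wrap-around.
- The head sits at the start of a sector.
- Sector numbering is aligned across tracks.
- Seek time grows linearly with the number of tracks crossed, set by a hypothetical `seek_ms_per_track` parameter.

The function names are also made up for illustration.

```python
def rotational_wait_ms(head_sector: int, target_sector: int,
                       sectors_per_track: int, rpm: float) -> float:
    """Time for target_sector to rotate under the head on the same track."""
    ms_per_rotation = 60_000 / rpm  # one full revolution, in milliseconds
    # Sectors that must pass under the head first, wrapping past the last one.
    sectors_to_wait = (target_sector - head_sector) % sectors_per_track
    return sectors_to_wait * ms_per_rotation / sectors_per_track


def access_time_ms(head_track: int, head_sector: int,
                   target_track: int, target_sector: int,
                   sectors_per_track: int, rpm: float,
                   seek_ms_per_track: float) -> float:
    """Seek to the target track, then wait for the target sector.

    Hypothetical linear seek model: seek time = tracks crossed * seek_ms_per_track.
    """
    ms_per_sector = 60_000 / rpm / sectors_per_track
    seek_ms = abs(target_track - head_track) * seek_ms_per_track
    # The platter keeps rotating during the seek, so find the angular
    # position (in units of sectors) under the head when the seek ends.
    landed_at = (head_sector + seek_ms / ms_per_sector) % sectors_per_track
    wait_ms = ((target_sector - landed_at) % sectors_per_track) * ms_per_sector
    return seek_ms + wait_ms
```

Take a track with 4 sectors spinning at 6,000 RPM, so each sector takes 2.5 ms to pass. In that case, `rotational_wait_ms(0, 3, 4, 6_000)` returns 7.5 ms. In the two-track case the platter keeps spinning while the head seeks. If the target sector has already passed by the time the seek finishes, the head must wait for it to come around again.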

Physics behind reading and writing data on the disk

Architecture of an HDD

Numbers to remember as a System Architect/Software Engineer

Sample Specs

Cheetah is a performance drive, whereas Barracuda is a capacity drive.

Latency math

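T(I/O) = T(seek) + T(rotation) + T(transfer)

Here T(seek) is the time to move the head to the track holding the requested data. T(rotation) is the time spent waiting for the target sector to rotate under the head. T(transfer) is the time to read or write the data once the head is in position.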
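The small Python calculator below shows how the three terms combine. It models T(rotation) as half a revolution, which is the average wait for a randomly placed sector. `DiskSpec`, `io_time_ms`, and `io_rate_mb_per_s` are hypothetical names. The parameters in the example (4 ms average seek, 15,000 RPM, 125 MB/s transfer rate) are assumed, illustrative values, not figures from a specific datasheet.

```python
from dataclasses import dataclass


@dataclass
class DiskSpec:
    """Hypothetical drive parameters for illustration, not from any datasheet."""
    avg_seek_ms: float        # average time to move the head to the right track
    rpm: float                # spindle speed in revolutions per minute
    transfer_mb_per_s: float  # sustained media transfer rate (MB = 10**6 bytes)


def io_time_ms(spec: DiskSpec, size_bytes: int) -> float:
    """T(I/O) = T(seek) + T(rotation) + T(transfer), using average values."""
    t_seek = spec.avg_seek_ms
    # Average rotational delay: half of one revolution (60,000 ms per minute).
    t_rotation = 0.5 * 60_000 / spec.rpm
    # Transfer time: bytes divided by bytes per second, converted to ms.
    t_transfer = size_bytes / (spec.transfer_mb_per_s * 1e6) * 1000
    return t_seek + t_rotation + t_transfer


def io_rate_mb_per_s(spec: DiskSpec, size_bytes: int) -> float:
    """Effective rate R(I/O) = size / T(I/O), in MB/s."""
    return (size_bytes / 1e6) / (io_time_ms(spec, size_bytes) / 1000)


spec = DiskSpec(avg_seek_ms=4.0, rpm=15_000, transfer_mb_per_s=125.0)
print(f"T(I/O) for a random 4 KB read: {io_time_ms(spec, 4096):.3f} ms")
print(f"Effective rate: {io_rate_mb_per_s(spec, 4096):.2f} MB/s")
```

With these assumed numbers, one random 4 KB read takes about 6.03 ms, and only about 0.03 ms of that is spent transferring data. The effective rate is therefore roughly 0.68 MB/s, even though the media can stream at 125 MB/s.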

Summary

So, how do we use HDDs efficiently?
- T(seek) + T(rotation) is the positioning time spent before any data can be read. To use the HDD efficiently, this time needs to be minimized.
- T(rotation) is often smaller than T(seek). It can be reduced, though not eliminated, by buying a disk with a higher RPM (a high-performance disk). On average the drive waits half a revolution: about 4.2 ms at 7,200 RPM versus 2 ms at 15,000 RPM.
- T(seek) can be amortized by reading large amounts of data per I/O. It can also be reduced by issuing I/Os sequentially, that is, by reading or writing sectors that are physically close to each other.
- Now it's easy to understand why random I/O performs poorly on an HDD. A random workload may pay T(seek) + T(rotation) on nearly every I/O, and that cost dominates the total time. This is where the ideas of data locality on disk and of random vs. sequential I/O performance originate.
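The effect of these points can be estimated with a simplified model. It assumes, as a rough approximation, that a random workload pays the full positioning time T(seek) + T(rotation) on every I/O. A sequential workload pays it once and then streams contiguous sectors. The function names and the example values (6 ms positioning time, 125 MB/s transfer rate) are hypothetical.

```python
def workload_time_ms(num_ios: int, io_bytes: int, positioning_ms: float,
                     transfer_mb_per_s: float, sequential: bool) -> float:
    """Total time to read num_ios blocks of io_bytes each (simplified model)."""
    # Pure data-transfer time is the same for both access patterns.
    transfer_ms = num_ios * io_bytes / (transfer_mb_per_s * 1e6) * 1000
    # Random: every I/O pays T(seek) + T(rotation).
    # Sequential: the head is positioned once, then reads adjacent sectors.
    positionings = 1 if sequential else num_ios
    return positionings * positioning_ms + transfer_ms


def throughput_mb_per_s(num_ios: int, io_bytes: int, positioning_ms: float,
                        transfer_mb_per_s: float, sequential: bool) -> float:
    """Effective throughput in MB/s (MB = 10**6 bytes)."""
    total_mb = num_ios * io_bytes / 1e6
    seconds = workload_time_ms(num_ios, io_bytes, positioning_ms,
                               transfer_mb_per_s, sequential) / 1000
    return total_mb / seconds


for sequential in (False, True):
    label = "sequential" if sequential else "random"
    t = workload_time_ms(1000, 4096, 6.0, 125.0, sequential)
    r = throughput_mb_per_s(1000, 4096, 6.0, 125.0, sequential)
    print(f"{label:>10}: {t:9.3f} ms, {r:7.2f} MB/s")
```

With those assumed values, reading 1,000 blocks of 4 KB takes about 6,033 ms randomly but about 39 ms sequentially. That is a difference of more than 100x for the same amount of data.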

It's important to note that HDDs are slower and have much higher latency than solid-state drives (SSDs). SSDs offer significantly higher IOPS, higher throughput, and much lower latency. This makes them the preferred choice for performance-critical applications and workloads that require rapid data access.

We will dive into SSDs in depth in the next blog post.

References