In computer systems, one of the main challenges is persisting data despite various failures. One of the oldest media still in use for persisting information is the Hard Disk Drive (HDD). In this blog post, let's take a deep dive into this famous storage medium.
What is a Hard Disk Drive (HDD)?
It is a non-volatile storage device commonly used in digital devices to store and retrieve data. To the rest of a computer system, it appears as an input/output (IO) device.
What is the technology for storing data persistently?
Data is persistently stored on the HDD in binary format. Essentially, it remembers a stream of 1s and 0s. An HDD uses magnetic storage technology to store the data.
The Interface
The basic unit of interaction with an HDD is the sector. A disk is divided into sectors (generally 512 bytes to a few KB each). Each sector is a unit that can be read or written. The sectors are numbered from 0 to n - 1, so the disk can be viewed as an array of sectors. Multi-sector operations are possible; however, the only guarantee drive manufacturers make is that a single-sector write is atomic (it either completes entirely or fails entirely).
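To make the "array of sectors" view concrete, here is a minimal in-memory sketch. The class name `SectorDisk`, its methods, and the 512-byte sector size are assumptions made for illustration; a real drive is accessed through the operating system's block layer, not through an object like this.

```python
SECTOR_SIZE = 512  # bytes; an assumed value, many drives use 4096 instead


class SectorDisk:
    """Toy model of a disk as an array of fixed-size sectors (hypothetical)."""

    def __init__(self, num_sectors: int, sector_size: int = SECTOR_SIZE):
        self.num_sectors = num_sectors
        self.sector_size = sector_size
        self._data = bytearray(num_sectors * sector_size)

    def _check(self, sector: int) -> None:
        # Sectors are numbered 0 to n - 1, exactly like array indices.
        if not 0 <= sector < self.num_sectors:
            raise IndexError(f"sector {sector} out of range 0..{self.num_sectors - 1}")

    def read_sector(self, sector: int) -> bytes:
        self._check(sector)
        start = sector * self.sector_size
        return bytes(self._data[start:start + self.sector_size])

    def write_sector(self, sector: int, payload: bytes) -> None:
        self._check(sector)
        # The unit of writing is one whole sector, never a partial one.
        if len(payload) != self.sector_size:
            raise ValueError("a write must cover exactly one whole sector")
        start = sector * self.sector_size
        self._data[start:start + self.sector_size] = payload


disk = SectorDisk(num_sectors=8)
disk.write_sector(3, b"A" * SECTOR_SIZE)
print(disk.read_sector(3)[:4])  # prints b'AAAA'
```

The sketch only models addressing. It says nothing about atomicity on power loss, which is a property of the physical drive.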
The Mechanical structure
Stack of disk platters, heads and the HDD chassis
Disk: A disk has one or more platters to store the data persistently.
Platter: A circular hard surface on which data is stored persistently.
Spindle: The platters are all bound to the spindle at the centre.
Motor: The spindle is connected to a brushless DC (BLDC) motor, which rotates all the platters at a constant speed (typically 5,400 to 15,000 RPM, depending on the drive).
Surface: A platter has two sides, called surfaces.
Track: Data is encoded in concentric circles called tracks on each surface of the platter. A single surface may contain thousands of tracks. Latest HDDs can have more than 500k tracks.
Sector: Each track is divided into sectors. Inner tracks have fewer sectors than outer tracks on the surface of the platter, because their circumference is shorter.
Data in the sector:
- Preamble to identify the sector
- Address of this sector
- User data
- Error correcting code (ECC)
Head stack assembly:
- One arm per surface, i.e. one arm below and one arm above each platter.
- One read head and one write head at the end of each arm.
- The head position can be adjusted anywhere along the radius of the platter to read/write from different tracks.
Diagrams to illustrate the HDD structure
An HDD with one platter containing a single track. The track has 4 sectors. The head is at sector 4. The platter rotates at constant speed around the spindle.
An HDD with 1 platter which contains 2 tracks. The head seeks from one track to another and then starts reading the data. Numbers represent sectors on the track.
Physics behind reading/writing the data to the disk
The disk platter is made of an aluminium-magnesium alloy coated with a special magnetic functional layer, which stores bits using the direction of the magnetic field. The magnetic functional layer is a cobalt, chromium, tantalum alloy about 120 nm thick. It contains small magnetic regions whose direction can be manipulated by an external magnetic field.
Writing data (1 bit): Done by manipulating the magnetization of a localized domain, essentially changing the direction of the magnetic field of a single cell.
Reading data (1 bit): Done by reading the direction of the magnetic field from a single cell.
The key is that even if the head moves or the disk loses and regains power, the direction of the magnetic field in the region remains the same. This is how persistence is achieved.
Important Note: I have tried to keep this conceptually simple. In practice, a bit is encoded in the change in magnetic field direction between two adjacent cells rather than in the direction of a single cell. We can skip the details at this layer and below for now.
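The idea of storing bits as changes between adjacent cells can be sketched in a few lines. The convention used here (a reversal of direction means 1, no reversal means 0) and the function names are assumptions for illustration; real drives layer more elaborate coding and signal processing on top of this basic idea.

```python
def encode_transitions(bits, start_direction=0):
    """Turn data bits into a sequence of cell magnetization directions.

    Assumed convention: a 1 flips the direction relative to the previous
    cell, a 0 keeps it. Directions are written as 0 and 1.
    """
    directions = []
    current = start_direction
    for bit in bits:
        if bit == 1:
            current ^= 1  # reverse the magnetization for this cell
        directions.append(current)
    return directions


def decode_transitions(directions, start_direction=0):
    """Recover data bits by comparing each cell with the one before it."""
    bits = []
    previous = start_direction
    for direction in directions:
        bits.append(1 if direction != previous else 0)
        previous = direction
    return bits


data = [1, 0, 1, 1, 0, 0, 1]
cells = encode_transitions(data)
print(cells)                              # prints [1, 1, 0, 1, 1, 1, 0]
print(decode_transitions(cells) == data)  # prints True
```

Note that the reader never needs to know the absolute direction of a cell, only whether it differs from its neighbour.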
Architecture of a HDD
Don’t be surprised to find out that an HDD by itself has at least the following components. Modern HDDs run sophisticated disk scheduling algorithms to get the maximum IO performance.
Microprocessor: Runs the functional program (firmware) of the disk, including the disk scheduling algorithm.
DRAM: Scratch pad for the processor and buffer for user data.
BLDC motor power driver.
SATA driver and port to connect to the host system.
Numbers to remember as a System Architect/Software Engineer
Typical capacity of consumer HDDs: 1–8 TB (enterprise drives go higher)
Throughput (Data Transfer Rate):
- The sequential read/write throughput of a consumer HDD is typically in the range of 100 to 200 megabytes per second (MB/s).
- Enterprise-grade HDDs may offer slightly higher sequential throughput, ranging from 150 to 250 MB/s or more.
Latency:
- The average seek time for consumer HDDs is often in the range of 4 to 10 milliseconds (ms). This is the time it takes for the drive’s read/write head to move to the desired track.
- Rotational latency, which is the time it takes for the target data sector to rotate under the read/write head, can add an additional 2 to 6 ms on average.
- Therefore, the total latency for a consumer HDD might be in the range of 6 to 16 ms for random I/O operations.
- Enterprise HDDs may have slightly lower seek and rotational latencies, but they are still relatively high compared to solid-state drives (SSDs).
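The 2 to 6 ms rotational latency range follows directly from the spindle speed: on average the target sector is half a rotation away from the head. A small sketch, where the function name and the list of RPM values are chosen for illustration:

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half of one full rotation, in ms."""
    full_rotation_ms = 60_000 / rpm  # 60,000 ms per minute / rotations per minute
    return full_rotation_ms / 2


for rpm in (5400, 7200, 10000, 15000):
    print(f"{rpm} RPM: {avg_rotational_latency_ms(rpm):.2f} ms")
```

This gives roughly 5.56 ms at 5,400 RPM, 4.17 ms at 7,200 RPM, 3.00 ms at 10,000 RPM and 2.00 ms at 15,000 RPM.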
IOPS (Input/Output Operations Per Second):
- For random read or write operations, a typical consumer-grade HDD may offer around 50 to 200 IOPS.
- Enterprise-grade HDDs designed for data centers and server environments can provide higher IOPS, often in the range of 100 to 300 IOPS or more.
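The IOPS figures are consistent with the latency numbers: if each random IO costs roughly one seek plus one rotational delay, the drive can complete about 1000 / latency-in-ms operations per second. The sketch below assumes one request is serviced at a time and ignores caching, command queueing and transfer time; the function name is hypothetical.

```python
def estimate_random_iops(seek_ms: float, rotation_ms: float) -> float:
    """Rough random IOPS, assuming each IO pays a full seek plus rotation."""
    service_time_ms = seek_ms + rotation_ms
    return 1000.0 / service_time_ms


# The two ends of the 6 to 16 ms total latency range for consumer drives.
print(f"{estimate_random_iops(4, 2):.1f}")   # prints 166.7
print(f"{estimate_random_iops(10, 6):.1f}")  # prints 62.5
```

Both results fall inside the 50 to 200 IOPS range quoted for consumer drives.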
Note that these numbers vary a lot across manufacturers and disk models. I have pasted sample specs of two different HDDs, one optimized for capacity and the other for performance.
Sample Specs
Cheetah is a performance drive, whereas Barracuda is a capacity drive.
Latency math
T (I/O) = T (seek) + T (rotation) + T (transfer)
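As a worked example of this formula, the sketch below computes each term for a single 4 KB random read. The drive parameters (9 ms average seek, 7,200 RPM, 100 MB/s transfer rate) are made-up illustrative values, not the specs of any particular model, and the function name is hypothetical.

```python
def io_time_ms(seek_ms, rpm, transfer_mb_per_s, request_bytes):
    """Return (seek, rotation, transfer, total) in milliseconds for one IO."""
    rotation_ms = (60_000 / rpm) / 2  # average: half a rotation
    transfer_ms = request_bytes / (transfer_mb_per_s * 1_000_000) * 1000
    total_ms = seek_ms + rotation_ms + transfer_ms
    return seek_ms, rotation_ms, transfer_ms, total_ms


seek, rotation, transfer, total = io_time_ms(
    seek_ms=9.0, rpm=7200, transfer_mb_per_s=100, request_bytes=4096
)
print(f"T(seek)     = {seek:.2f} ms")
print(f"T(rotation) = {rotation:.2f} ms")
print(f"T(transfer) = {transfer:.2f} ms")
print(f"T(I/O)      = {total:.2f} ms")
```

With these assumed numbers, the transfer itself takes about 0.04 ms out of a total of about 13.21 ms, so almost all of the time goes into positioning the head.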
Summary
So how do we use HDDs efficiently?
- T(seek) + T(rotation) is the time required before the disk can start reading the data. For efficient use of the HDD, this needs to be minimized.
- T(rotation) is relatively small and can be reduced (though not eliminated) by purchasing a disk with a higher RPM (a high-performance disk).
- T(seek) can be amortized by reading large amounts of data at a time or by issuing IOs sequentially (reading/writing sectors that are physically close to each other).
- Now it's easy to understand why random IOs on an HDD may not perform well. A seek may be required for nearly every IO, which reduces disk performance. This is where the concepts of data locality on the disk and random vs sequential IO performance originate.
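To see the amortization effect in numbers, the sketch below computes the effective throughput when every request pays one positioning cost (seek plus rotation) before transferring its data. The 13 ms positioning time and 100 MB/s peak rate are assumed illustrative values, and the function name is hypothetical.

```python
def effective_throughput_mb_per_s(request_bytes, positioning_ms, peak_mb_per_s):
    """Throughput achieved when each request pays one seek + rotation."""
    transfer_s = request_bytes / (peak_mb_per_s * 1_000_000)
    total_s = positioning_ms / 1000 + transfer_s
    return (request_bytes / 1_000_000) / total_s


KIB = 1024
MIB = 1024 * KIB

for label, size in (("4 KiB", 4 * KIB), ("1 MiB", MIB), ("100 MiB", 100 * MIB)):
    rate = effective_throughput_mb_per_s(size, positioning_ms=13, peak_mb_per_s=100)
    print(f"{label:>8} per seek: {rate:.2f} MB/s")
```

Under these assumptions, 4 KiB requests reach only about 0.3 MB/s, 1 MiB requests about 45 MB/s, and 100 MiB requests about 99 MB/s, which is close to the peak rate. The larger the amount of data read per seek, the less the positioning cost matters.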
It’s important to note that HDDs are slower and have higher latency than solid-state drives (SSDs). SSDs offer significantly higher IOPS, faster throughput, and much lower latency, which makes them a preferred choice for performance-critical applications and workloads that require rapid data access.
We will take an in-depth look at SSDs in the next blog post.
References
All of the above information comes from reading multiple textbooks, blogs, educational videos, etc. Please feel free to drop a message if you find any misleading information. Always ready to learn! I highly recommend reading the following chapter if you have some more time to explore: https://pages.cs.wisc.edu/~remzi/OSTEP/file-disks.pdf