Tarık Yusuf Yıldızcı
6 min read · Dec 3, 2024

When working with databases, it’s important to make sure data is handled safely and reliably. That’s where the ACID principles come in: Atomicity, Consistency, Isolation, and Durability. These principles help ensure that database transactions are processed correctly, even if something goes wrong, like a crash or power failure. In this post, we’ll take a closer look at each of these principles and why they’re so important for building reliable systems.

Consistency

Consistency ensures that every transaction leaves the database in a valid state, respecting the rules defined on it, such as constraints and foreign key relationships. Consider the following tables, where a post references a user that doesn't exist:

select * from posts;

| id  | title           | user_id |
| --- | --------------- | ------- |
| 1   | Some Post Title | 2       |

select * from users;

| id  | username       |
| --- | -------------- |
| 1   | john_appleseed |

In this case, the database’s consistency is violated because the `user_id` value in the posts table references a non-existent user. Without a user with an ID of 2 in the `users` table, this foreign key relationship is broken, leading to an inconsistent state.

One way to maintain consistency in such situations is by using features like cascading deletes. For instance, if a user is deleted from the `users` table, a cascading delete can automatically remove all associated posts in the `posts` table, preserving data integrity.
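Here is a minimal sketch of how such a constraint might be declared, using standard SQL (the column names match the example above; exact syntax varies slightly between database systems):

```sql
-- The foreign key enforces that posts.user_id must reference an existing user,
-- and ON DELETE CASCADE removes a user's posts automatically when the user is deleted.
CREATE TABLE users (
    id       INTEGER PRIMARY KEY,
    username TEXT NOT NULL
);

CREATE TABLE posts (
    id      INTEGER PRIMARY KEY,
    title   TEXT NOT NULL,
    user_id INTEGER NOT NULL REFERENCES users (id) ON DELETE CASCADE
);

-- With this constraint in place, inserting a post with user_id = 2 fails
-- unless user 2 exists, and deleting a user also deletes their posts:
DELETE FROM users WHERE id = 1;
```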

Consistency plays a critical role in preventing errors and ensuring the data reflects real-world relationships and constraints. By enforcing these rules, the database guarantees that every transaction moves the database from one valid state to another, never leaving it in a partially valid or invalid state.

Consistency in Reads

In scaled database architectures, maintaining consistency in reads becomes more challenging. When databases are scaled horizontally with replicas, data is often copied across multiple servers to handle higher read loads. However, this introduces the risk of read inconsistencies because replicas may not be updated simultaneously. For example, if a write operation updates the primary server but the change hasn’t propagated to the replicas yet, a query to one of the replicas could return outdated data.
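As a concrete (and PostgreSQL-specific) illustration, a standby server exposes how far behind the primary it is; a query like the following, run on the replica, approximates that lag:

```sql
-- PostgreSQL-specific illustration, assuming a streaming-replication standby:
-- how long ago was the last transaction replayed on this replica?
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;
```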

Eventual Consistency

Eventual consistency is commonly used in distributed systems where performance and availability are prioritized over immediate consistency. It allows replicas to process updates asynchronously, which reduces latency and improves system scalability. However, this comes with the trade-off that reads might temporarily return stale data.

Isolation

Isolation ensures that transactions operate independently of one another, preventing unintended interactions that could compromise data integrity. It controls how and when the changes made by one transaction become visible to others, thereby safeguarding against common read phenomena that can occur when multiple transactions access the same data simultaneously.

Read Phenomena

1. Dirty Reads: This occurs when a transaction reads uncommitted changes made by another transaction. If the other transaction rolls back, the data read might no longer be valid, leading to inconsistencies.

2. Non-Repeatable Reads: In this scenario, a transaction reads the same row multiple times but gets different values, because another transaction modifies and commits the row between the reads.

3. Phantom Reads: This happens during range queries when new records are added or existing ones are deleted by another transaction after an initial read, causing subsequent reads to return different results.

4. Lost Updates: This occurs when two transactions update the same data concurrently and one transaction’s update overwrites the other’s without either being aware of the conflict. For example, if one transaction adds 5 and another adds 10, the first addition might be “lost” if the conflict is not handled; the sketch after this list shows exactly this interleaving.
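To make the lost-update case concrete, here is a hedged sketch of two sessions interleaving against a hypothetical `accounts` table (the table, its columns, and the starting balance of 100 are invented for this illustration). At the default isolation level of most systems, nothing stops the second commit from clobbering the first:

```sql
-- accounts contains one row: (id = 1, balance = 100).

-- Session A:
BEGIN;
SELECT balance FROM accounts WHERE id = 1;   -- A reads 100

-- Session B:
BEGIN;
SELECT balance FROM accounts WHERE id = 1;   -- B also reads 100

-- Session A adds 5 to the value it read and commits:
UPDATE accounts SET balance = 105 WHERE id = 1;
COMMIT;

-- Session B adds 10 to the value *it* read and commits:
UPDATE accounts SET balance = 110 WHERE id = 1;
COMMIT;

-- Final balance: 110 instead of 115; Session A's +5 is lost.
-- Expressing the change relative to the current value avoids this particular trap:
UPDATE accounts SET balance = balance + 5 WHERE id = 1;
```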

Isolation Levels

Isolation is managed through various levels, which define the degree to which one transaction is isolated from the effects of others (a usage sketch follows the list):

1. Read Uncommitted: No isolation is enforced. Transactions can read any changes, whether committed or uncommitted, leading to potential dirty reads.

2. Read Committed: A transaction only sees changes that have been committed by other transactions. Prevents dirty reads but does not address non-repeatable or phantom reads.

3. Repeatable Read: Ensures that if a row is read once in a transaction, it will not change until the transaction ends. Prevents dirty and non-repeatable reads but not phantom reads.

4. Snapshot: A snapshot of the database is taken at the start of the transaction, and all operations are performed on this consistent view. Prevents dirty, non-repeatable, and phantom reads for the snapshot’s duration.

5. Serializable: Provides the highest level of isolation by ensuring that transactions are executed in a fully sequential manner as if they were processed one at a time. Prevents all read phenomena but comes at the cost of concurrency and performance.
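Most relational databases let you pick the isolation level per transaction. The sketch below uses standard SQL syntax as PostgreSQL accepts it; other systems differ in the details (MySQL, for example, expects `SET TRANSACTION` to be issued before the transaction starts):

```sql
-- Run one transaction at REPEATABLE READ instead of the default level.
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM posts WHERE user_id = 1;
-- Rows already read keep the same values for the rest of the transaction.
COMMIT;

-- The strictest level, trading concurrency for correctness:
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- ...
COMMIT;
```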

Isolation Implementation in DBMS

Database systems implement isolation levels using two primary strategies, often combined in practice (both are sketched after the list):

1. Pessimistic Concurrency Control: Uses locks (row, table, or page locks) to prevent other transactions from accessing the data being modified. Ensures high isolation but may lead to deadlocks or reduced concurrency.

2. Optimistic Concurrency Control: Transactions operate without locks, but conflicts are checked at commit time. If a conflict is detected, one of the transactions fails and must be retried.

3. Combination: For example, Repeatable Read often uses row locks to prevent other transactions from modifying data that has already been read.
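As an illustration of both approaches, here is a hedged SQL sketch reusing the hypothetical `accounts` table from earlier: `SELECT ... FOR UPDATE` is the classic pessimistic pattern, while a `version` column checked at write time (invented for this example) is a common way to implement optimistic control in application code:

```sql
-- Pessimistic: lock the row up front; other writers block until we commit.
BEGIN;
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
UPDATE accounts SET balance = balance + 5 WHERE id = 1;
COMMIT;

-- Optimistic: read without locking, then make the write conditional on the row
-- not having changed. If the UPDATE affects zero rows, a concurrent writer won
-- and the application retries the whole operation.
BEGIN;
SELECT balance, version FROM accounts WHERE id = 1;   -- suppose version = 7
UPDATE accounts
SET    balance = 105, version = version + 1
WHERE  id = 1 AND version = 7;
COMMIT;
```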

Isolation levels primarily affect how one transaction interacts with other transactions. A transaction always reads its own writes, regardless of the isolation level. Also, different database management systems (DBMS) implement these isolation levels differently, and nuances in their behavior should be considered when designing an application.

Durability

Durability ensures that once a transaction is committed, its changes are permanently saved and will persist even in the event of a system crash, power failure, or other unexpected events. This property guarantees that the database remains consistent and no committed data is lost, providing a critical layer of reliability.

To achieve durability, database systems employ techniques that ensure changes are recorded in non-volatile storage, such as disks or SSDs. Let’s explore some of the common durability mechanisms.

Durability Techniques

Write-Ahead Logging (WAL)

- Changes made during a transaction are first written to a write-ahead log, which is stored in durable storage.

- The log records the deltas (changes) rather than directly updating the data on disk. Once the changes are safely logged, the transaction is considered committed.

- WAL ensures that even if a crash occurs, the database can recover by replaying the logged changes to bring the system back to a consistent state.

- This approach minimizes disk I/O, as logging small deltas is less expensive than updating the entire dataset.
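How the WAL is exposed is system-specific. As one hedged, PostgreSQL-flavored example, you can inspect the WAL configuration and the current write position directly:

```sql
-- PostgreSQL-specific illustration: inspect WAL behavior.
SHOW wal_level;                -- how much information is written to the WAL
SELECT pg_current_wal_lsn();   -- current write position in the WAL (PostgreSQL 10+)
```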

Async Snapshots

- Periodic snapshots of the database are taken and written to durable storage asynchronously.

- While WAL tracks incremental changes, snapshots provide a full copy of the database at specific points in time. Combining the two keeps recovery manageable: restore the most recent snapshot, then replay only the log entries written after it.

Append-Only File (AOF)

- In this approach, all changes are appended to a file in sequential order.

- Unlike WAL, which logs low-level deltas, an AOF records every operation applied to the database, and recovery replays those operations in order. This can provide finer granularity but may require more storage.

OS Cache

Modern operating systems use a cache to improve write performance. When a database issues a write request, the data is first written to the OS cache (memory) and then flushed to disk in batches. While this speeds up operations, it introduces a risk: if a crash occurs before the cache is flushed, the changes may be lost, compromising durability.

fsync Command

- The `fsync` system call forces the OS to write data directly from the cache to disk immediately, bypassing deferred writes.

- While this ensures durability, it is slower and introduces latency due to the overhead of frequent disk writes.

Batch Writing

- Many systems balance durability and performance by batching writes to disk, reducing the number of `fsync` calls while still maintaining a reasonable level of safety.
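As a concrete, PostgreSQL-specific illustration of this trade-off (other systems expose similar knobs), the `synchronous_commit` setting controls whether a COMMIT waits for its WAL record to reach disk or lets flushes be batched in the background:

```sql
-- 'on' (the default): COMMIT returns only after the WAL record is flushed to
-- durable storage, so a committed transaction survives a crash.
SET synchronous_commit = on;

-- 'off': COMMIT returns immediately and flushes happen in batches; a crash in
-- the short window before the flush can lose the latest commits, but it never
-- corrupts the database. Throughput improves at the cost of strict durability.
SET synchronous_commit = off;
```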

Conclusion

The ACID principles — Atomicity, Consistency, Isolation, and Durability — serve as the foundation for reliable and robust database systems. Each principle addresses a specific challenge in handling transactions, ensuring that data integrity is maintained even in complex, concurrent, or failure-prone environments. From grouping operations into atomic transactions to enforcing rules for consistency, managing concurrency with isolation levels, and safeguarding data with durability mechanisms, ACID provides a roadmap for building trustworthy systems.

Understanding these principles is crucial for developers, architects, and database administrators who strive to design systems that can scale, perform, and handle failures gracefully. Whether you’re optimizing for high performance or prioritizing absolute reliability, ACID principles provide the flexibility to meet diverse requirements. By mastering them, you can ensure your database remains the backbone of a dependable and resilient application.
