§Journaling for Crash Consistency
File systems must ensure data consistency in the presence of crashes or power failures. When a system crash or power failure occurs, in-progress file operations may leave the file system in an inconsistent state, where metadata and data blocks are only partially updated. This can lead to file corruption, orphaned blocks, or even complete data loss. Thus, modern file systems must guard against these scenarios to ensure durability and recoverability.
To address this, modern file systems employ journaling. Journaling provides crash-consistency by recording intended changes to a special log (called the journal) before applying them to the main file system. In the event of a crash, the journal can be replayed to recover to a consistent state. This significantly reduces the risk of data corruption and allows faster recovery after unclean shutdowns, without the need for full file system checks.
In this approach, all intended updates, such as block allocations, inode changes, or directory modifications, are first written to the journal. Only after the log is safely persisted to disk are the actual file system structures updated. If a crash occurs, the system replays the journal to restore a consistent state. This “intent before action” protocol makes recovery predictable and bounded.
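The ordering described above can be sketched with an in-memory model. This is only an illustration under stated assumptions: `Disk`, `log`, and `apply` are hypothetical names rather than KeOS types, and a real disk would also need write barriers between the steps.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory stand-in for a disk: a journal region plus the
/// main file system blocks.
#[derive(Default)]
struct Disk {
    journal: Vec<(usize, Vec<u8>)>, // logged (block address, new contents)
    journal_valid: bool,            // set only after the whole log is written
    blocks: BTreeMap<usize, Vec<u8>>,
}

impl Disk {
    /// Step 1: record the intent in the journal, then mark the log as valid.
    fn log(&mut self, updates: &[(usize, Vec<u8>)]) {
        self.journal = updates.to_vec();
        self.journal_valid = true;
    }

    /// Step 2: copy the logged updates to their home locations, then clear
    /// the log. Calling this after a crash is exactly the replay step.
    fn apply(&mut self) {
        if !self.journal_valid {
            return; // nothing was fully logged, so nothing is applied
        }
        for (addr, data) in &self.journal {
            self.blocks.insert(*addr, data.clone());
        }
        self.journal_valid = false;
        self.journal.clear();
    }
}
```

If the system stops between `log` and `apply`, the main blocks are untouched and a later `apply` finishes the update. If it stops before `log` completes, `journal_valid` is still false and the partial log is ignored.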
§Journaling in KeOS
To explore the fundamentals of crash-consistent file systems, KeOS implements a minimal metadata journaling mechanism using the well-known technique of write-ahead logging. This mechanism ensures that updates to file system structures are made durable and recoverable.
The journaling mechanism is anchored by a journal superblock, which
includes a commited flag. This flag indicates whether the journal area
currently holds valid, committed journal data that has not yet been
checkpointed.
Journaling in KeOS is structured around four key stages: metadata updates, commit, checkpoint, and recovery.
§1. Metadata Updates
In KeOS, journaling is tightly integrated with the RunningTransaction
struct, which acts as the central abstraction for managing write-ahead
logging of file system changes. All journaled operations must be serialized
through this structure to ensure consistency.
Internally, RunningTransaction is protected by a SpinLock on the
journal superblock, enforcing global serialization of journal writes.
This design guarantees that only one transaction may be in progress at any
given time, preventing concurrent updates to the same block, which could
otherwise result in a corrupted or inconsistent state.
Crucially, KeOS uses Rust’s strong type system to enforce this safety at
compile time: without access to an active RunningTransaction, it is
impossible to write metadata blocks. All metadata modifications must be
submitted explicitly via the submit() method, which stages the changes for
journaling.
If you forget to submit modified blocks through RunningTransaction, the
kernel will panic with a clear error message, catching the issue early
and avoiding silent corruption. This design provides both safety and
transparency, making metadata updates robust and auditable.
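One way such a "no transaction, no metadata write" rule and the panic on a forgotten submit could be modeled is sketched below. Everything here is an assumption made for illustration: `Transaction`, `BlockGuard`, `checkout`, and `data_mut` are hypothetical names, and the real `RunningTransaction` API and its panic mechanism may look different.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a running transaction.
struct Transaction {
    staged: BTreeMap<usize, Vec<u8>>, // block address -> staged contents
}

/// A metadata block checked out for modification. It can only be created
/// through a `Transaction`, so code without one cannot modify metadata.
struct BlockGuard {
    addr: usize,
    data: Vec<u8>,
    dirty: bool,
}

impl Transaction {
    fn begin() -> Self {
        Transaction { staged: BTreeMap::new() }
    }

    /// Hands out a guard for the block at `addr` holding its current contents.
    fn checkout(&self, addr: usize, current: Vec<u8>) -> BlockGuard {
        BlockGuard { addr, data: current, dirty: false }
    }
}

impl BlockGuard {
    /// Mutable access marks the block as modified.
    fn data_mut(&mut self) -> &mut Vec<u8> {
        self.dirty = true;
        &mut self.data
    }

    /// Stages the modified block in the transaction and disarms the drop check.
    fn submit(mut self, tx: &mut Transaction) {
        tx.staged.insert(self.addr, std::mem::take(&mut self.data));
        self.dirty = false;
    }
}

impl Drop for BlockGuard {
    fn drop(&mut self) {
        // A modified block that was never submitted is a bug: fail loudly.
        if self.dirty && !std::thread::panicking() {
            panic!("block {} was modified but never submitted", self.addr);
        }
    }
}
```

In this sketch the compile-time part comes from `checkout` being the only constructor of `BlockGuard`, while the forgotten-submit case is caught at run time by the `Drop` implementation.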
§2. Commit Phase: RunningTransaction::commit
In the commit phase, KeOS records all pending modifications to a dedicated journal area before applying them to their actual on-disk locations.
A transaction begins with a TxBegin block, which contains a list of
logical block addresses that describe where the updates will eventually be
written. This is followed by the journal data blocks, which contain the
actual contents to be written to the specified logical blocks. Once all data
blocks have been written, a TxEnd block is appended to mark the
successful conclusion of the transaction. This write-ahead logging
discipline guarantees that no update reaches the main file system until its
full intent is safely recorded in the journal.
Journal blocks are written through the JournalWriter struct. It carries a
marker type that represents the current stage of the commit phase, which
forces you to write the journal blocks in the correct order.
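The staged writer can be illustrated with the typestate pattern. The marker and method names below follow this page, but the signatures, the `JournalBlock` enum, the `new` constructor, and the in-memory output are assumptions for illustration only; the skeleton you are given will differ.

```rust
use std::marker::PhantomData;

// Marker types for the three commit stages.
struct TxBegin;
struct Block;
struct TxEnd;

/// Hypothetical model of what lands in the journal area, in order.
#[derive(Debug, PartialEq)]
enum JournalBlock {
    Begin(Vec<usize>), // logical block addresses of the targets
    Data(Vec<u8>),     // contents destined for one target block
    End,
}

struct JournalWriter<Stage> {
    updates: Vec<(usize, Vec<u8>)>, // (target address, new contents)
    out: Vec<JournalBlock>,         // stands in for the journal area
    _stage: PhantomData<Stage>,
}

impl JournalWriter<TxBegin> {
    fn new(updates: Vec<(usize, Vec<u8>)>) -> Self {
        JournalWriter { updates, out: Vec::new(), _stage: PhantomData }
    }

    /// Writes the TxBegin block and moves the writer to the data stage.
    fn write_tx_begin(mut self) -> JournalWriter<Block> {
        let addrs = self.updates.iter().map(|(addr, _)| *addr).collect();
        self.out.push(JournalBlock::Begin(addrs));
        JournalWriter { updates: self.updates, out: self.out, _stage: PhantomData }
    }
}

impl JournalWriter<Block> {
    /// Writes one journal data block per target, in TxBegin order.
    fn write_blocks(mut self) -> JournalWriter<TxEnd> {
        for (_, data) in &self.updates {
            self.out.push(JournalBlock::Data(data.clone()));
        }
        JournalWriter { updates: self.updates, out: self.out, _stage: PhantomData }
    }
}

impl JournalWriter<TxEnd> {
    /// Appends the TxEnd block and returns the finished journal contents.
    fn write_tx_end(mut self) -> Vec<JournalBlock> {
        self.out.push(JournalBlock::End);
        self.out
    }
}
```

Because each method consumes the writer and returns the next stage, calling `write_blocks` before `write_tx_begin`, or `write_tx_end` twice, does not compile.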
§3. Checkpoint Phase: Journal::checkpoint
After a transaction is fully committed, the system proceeds to
checkpoint the journal. During checkpointing, the journaled data blocks
are copied from the journal area to their final destinations in the main
file system (i.e., to the logical block addresses specified in the TxBegin
block).
Once all modified blocks have been written to their final locations, the
system clears the journal by resetting the commited flag in the journal
superblock. This indicates that the journal no longer needs to be replayed
after a crash.
In modern file systems, checkpointing is typically performed
asynchronously in the background to minimize the latency of system calls
like write() or fsync(). This allows the file system to acknowledge the
operation as complete once the journal is committed, without waiting for the
final on-disk update.
However, for simplicity in this project, checkpointing is done synchronously: the file system waits until all journaled updates are copied to their target locations before clearing the journal. This simplifies correctness, avoids the need for background threads or deferred work mechanisms, and reduces the work needed to keep the on-disk state and the committed data consistent.
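A synchronous checkpoint can be sketched as follows. The `Journal` struct here is a hypothetical in-memory model, not the KeOS on-disk layout, and the `commited` field simply mirrors the flag spelling used on this page.

```rust
/// Hypothetical in-memory model of a committed journal.
struct Journal {
    commited: bool,     // flag kept in the journal superblock
    targets: Vec<usize>, // logical block addresses listed in TxBegin
    data: Vec<Vec<u8>>,  // journaled contents, in the same order as `targets`
}

/// Copies every journaled block to its home location, then clears the flag.
/// `disk` is indexed by logical block address.
fn checkpoint(journal: &mut Journal, disk: &mut [Vec<u8>]) {
    if !journal.commited {
        return; // nothing committed, nothing to copy
    }
    for (addr, block) in journal.targets.iter().zip(&journal.data) {
        disk[*addr] = block.clone();
    }
    // Cleared only after every block has reached its final location, so a
    // crash before this point leaves the journal available for a retry.
    journal.commited = false;
}
```

The order matters: clearing the flag first would leave a window in which a crash loses updates that were neither in place nor recoverable.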
§4. Recovery: Journal::recovery
If a crash occurs before the checkpointing phase completes, KeOS recovers the file system during the next boot. It begins by inspecting the journal superblock to determine whether a committed transaction exists.
If the committed flag is set and a valid TxBegin/TxEnd pair is
present, this indicates a completed transaction whose changes have not yet
been checkpointed. In this case, KeOS retries the checkpointing. If the
journal is not marked as committed, the system discards the journal
entirely. This rollback ensures consistency by ignoring partially written
or aborted transactions.
This recovery approach is both bounded and idempotent: it scans only the small, fixed-size journal area, avoiding costly full file system traversal, and it can safely retry recovery without side effects if interrupted again.
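The recovery decision can be sketched on a small in-memory model. The types and the `parse` helper are hypothetical. The sketch also assumes that a journal whose flag is set but whose TxBegin/TxEnd pair is malformed is discarded, a case the text above does not spell out.

```rust
#[derive(Clone, Debug, PartialEq)]
enum JournalBlock {
    TxBegin(Vec<usize>), // target logical block addresses
    Data(Vec<u8>),
    TxEnd,
}

/// Hypothetical in-memory model of the journal superblock and journal area.
struct Journal {
    commited: bool,
    area: Vec<JournalBlock>,
}

/// Returns the (address, contents) pairs if the area holds a well-formed
/// TxBegin, one data block per address, and a TxEnd.
fn parse(area: &[JournalBlock]) -> Option<Vec<(usize, Vec<u8>)>> {
    let (first, rest) = area.split_first()?;
    let addrs = match first {
        JournalBlock::TxBegin(addrs) => addrs,
        _ => return None,
    };
    if rest.len() < addrs.len() + 1 || rest[addrs.len()] != JournalBlock::TxEnd {
        return None;
    }
    let mut updates = Vec::new();
    for (addr, block) in addrs.iter().zip(rest) {
        match block {
            JournalBlock::Data(data) => updates.push((*addr, data.clone())),
            _ => return None,
        }
    }
    Some(updates)
}

/// Replays a committed transaction or discards the journal.
/// Returns true if a transaction was replayed.
fn recovery(journal: &mut Journal, disk: &mut [Vec<u8>]) -> bool {
    let replayed = match (journal.commited, parse(&journal.area)) {
        (true, Some(updates)) => {
            // Writing whole blocks is idempotent: repeating it is harmless.
            for (addr, data) in updates {
                disk[addr] = data;
            }
            true
        }
        _ => false, // uncommitted or malformed: roll back by ignoring it
    };
    journal.commited = false;
    journal.area.clear();
    replayed
}
```

Because each replayed write overwrites a whole block with the same contents, running the replay again after an interrupted checkpoint or an interrupted recovery produces the same final state.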
§Implementation Requirements
You need to implement the following:
- Journal::recovery
- Journal::checkpoint
- JournalWriter::<TxBegin>::write_tx_begin
- JournalWriter::<Block>::write_blocks
- JournalWriter::<TxEnd>::write_tx_end
After implementing these functions, move on to the last section of
KeOS.
Structs§
- Block
- Marker type for the second phase of a journal commit: writing the metadata blocks.
- Journal
- A structure representing the journal metadata used for crash consistency.
- JournalWriter
- A staged writer for committing a transaction to the journal.
- RunningTransaction
- Represents an in-progress file system transaction using write-ahead journaling.
- TxBegin
- Marker type for the first phase of a journal commit: TxBegin.
- TxEnd
- Marker type for the final phase of a journal commit: TxEnd.