// keos_project5/ffs/journal.rs
//! # Journaling for Crash Consistency.
//!
//! File systems must ensure data consistency in the presence of crashes or
//! power failures. When a system crash or power failure occurs, in-progress
//! file operations may leave the file system in an inconsistent state, where
//! metadata and data blocks are only partially updated. This can lead to file
//! corruption, orphaned blocks, or even complete data loss. Thus, modern file
//! systems must guard against these scenarios to ensure durability and
//! recoverability.
//!
//! To address this, modern file systems employ **journaling**. Journaling
//! provides crash consistency by recording intended changes to a special log
//! (called the **journal**) before applying them to the main file system. In
//! the event of a crash, the journal can be replayed to recover a consistent
//! state. This significantly reduces the risk of data corruption and allows
//! faster recovery after unclean shutdowns, without the need for full
//! file system checks.
//!
//! In this approach, all intended updates, such as block allocations, inode
//! changes, or directory modifications, are first written to the journal.
//! Only after the log is safely persisted to disk are the actual file system
//! structures updated. This "intent before action" protocol makes recovery
//! predictable and bounded.
//!
//! ## Journaling in KeOS
//!
//! To explore the fundamentals of crash-consistent file systems, **KeOS
//! implements a minimal metadata journaling mechanism** using the well-known
//! technique of **write-ahead logging**. This mechanism ensures that
//! updates to file system structures are made durable and recoverable.
//!
//! The journaling mechanism is anchored by a **journal superblock**, which
//! includes a `commited` flag. This flag indicates whether the journal area
//! currently holds valid, committed journal data that has not yet been
//! checkpointed.
//!
//! Journaling in KeOS is structured around four key stages: **metadata
//! updates**, **commit**, **checkpoint**, and **recovery**.
//!
//! ### 1. Metadata Updates
//!
//! In KeOS, journaling is tightly integrated with the [`RunningTransaction`]
//! struct, which acts as the central abstraction for managing write-ahead
//! logging of file system changes. All journaled operations must be serialized
//! through this structure to ensure consistency.
//!
//! Internally, [`RunningTransaction`] is protected by a `SpinLock` on the
//! journal superblock, enforcing **global serialization** of journal writes.
//! This design guarantees that only one transaction may be in progress at any
//! given time, preventing concurrent updates to the same block, which could
//! otherwise result in a corrupted or inconsistent state.
//!
//! Crucially, KeOS uses Rust’s strong type system to enforce this safety at
//! compile time: without access to an active [`RunningTransaction`], it is
//! **impossible** to write metadata blocks. All metadata modifications must be
//! submitted explicitly via the `submit()` method, which stages the changes for
//! journaling.
//!
//! If you forget to submit modified blocks through [`RunningTransaction`], the
//! kernel will **panic** with a clear error message, catching the issue early
//! and avoiding silent corruption. This design provides both safety and
//! transparency, making metadata updates robust and auditable.
//!
//! ### 2. Commit Phase: [`RunningTransaction::commit`]
//!
//! In the commit phase, KeOS records all pending modifications to a dedicated
//! **journal area** before applying them to their actual on-disk locations.
//!
//! A transaction begins with a **`TxBegin` block**, which contains a list of
//! logical block addresses that describe where the updates will eventually be
//! written. This is followed by the **journal data blocks**, which contain the
//! actual contents to be written to the specified logical blocks. Once all data
//! blocks have been written, a **`TxEnd` block** is appended to mark the
//! successful conclusion of the transaction. This write-ahead logging
//! discipline guarantees that no update reaches the main file system until its
//! full intent is safely recorded in the journal.
//!
//! You write journal blocks through the [`JournalWriter`] struct. It is
//! parameterized by a marker type that represents the current stage of the
//! commit phase, forcing you to write journal blocks in the correct order.
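//!
//! The type-state idea can be sketched in isolation as follows. This is a
//! minimal, hypothetical model and not the KeOS implementation: the names
//! `Writer`, `Begin`, `Blocks`, and `End` are invented for illustration, and
//! the "journal" is an in-memory list of strings rather than disk blocks.
//!
//! ```rust
//! use std::marker::PhantomData;
//!
//! // Hypothetical stage markers, mirroring the three commit stages.
//! struct Begin;
//! struct Blocks;
//! struct End;
//!
//! // A toy writer that appends to an in-memory log instead of a disk.
//! struct Writer<Stage> {
//!     log: Vec<String>,
//!     _stage: PhantomData<Stage>,
//! }
//!
//! impl Writer<Begin> {
//!     fn new() -> Self {
//!         Writer { log: Vec::new(), _stage: PhantomData }
//!     }
//!
//!     // Consumes the `Begin` writer and returns one in the `Blocks` stage.
//!     fn write_tx_begin(mut self) -> Writer<Blocks> {
//!         self.log.push("TxBegin".to_string());
//!         Writer { log: self.log, _stage: PhantomData }
//!     }
//! }
//!
//! impl Writer<Blocks> {
//!     // Appends `n` data records, then moves on to the `End` stage.
//!     fn write_blocks(mut self, n: usize) -> Writer<End> {
//!         for i in 0..n {
//!             self.log.push(format!("Block{i}"));
//!         }
//!         Writer { log: self.log, _stage: PhantomData }
//!     }
//! }
//!
//! impl Writer<End> {
//!     // Appends the end marker and hands the finished log back.
//!     fn write_tx_end(mut self) -> Vec<String> {
//!         self.log.push("TxEnd".to_string());
//!         self.log
//!     }
//! }
//!
//! fn main() {
//!     let log = Writer::<Begin>::new()
//!         .write_tx_begin()
//!         .write_blocks(2)
//!         .write_tx_end();
//!     assert_eq!(log, ["TxBegin", "Block0", "Block1", "TxEnd"]);
//!     // `Writer::<Begin>::new().write_blocks(2)` would not compile, because
//!     // `write_blocks` is only defined for `Writer<Blocks>`.
//! }
//! ```
//!
//! Because each method consumes `self` and returns the next stage, writing
//! the stages out of order is a compile-time error rather than a runtime bug.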
//!
//! ### 3. Checkpoint Phase: [`Journal::checkpoint`]
//!
//! After a transaction is fully committed, the system proceeds to
//! **checkpoint** the journal. During checkpointing, the journaled data blocks
//! are copied from the journal area to their final destinations in the main
//! file system (i.e., to the logical block addresses specified in the `TxBegin`
//! block).
//!
//! Once all modified blocks have been written to their final locations, the
//! system clears the journal by resetting the `commited` flag in the journal
//! superblock. This indicates that the journal no longer needs to be replayed
//! after a crash.
//!
//! In modern file systems, checkpointing is typically performed
//! **asynchronously** in the background to minimize the latency of system calls
//! like `write()` or `fsync()`. This allows the file system to acknowledge the
//! operation as complete once the journal is committed, without waiting for the
//! final on-disk update.
//!
//! However, for simplicity in this project, **checkpointing is done
//! synchronously**: the file system waits until all journaled updates are
//! copied to their target locations before clearing the journal. This
//! simplifies correctness, avoids the need for background threads or
//! deferred work mechanisms, and reduces the work needed to maintain a
//! consistent view between the disk and committed data.
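//!
//! The ordering that matters during a checkpoint (copy first, clear the flag
//! last) can be sketched with an in-memory model. Everything here is a
//! hypothetical stand-in: `ToyJournal`, the 4-byte "blocks", and the
//! `HashMap` acting as the disk are assumptions for illustration, not the
//! KeOS data structures.
//!
//! ```rust
//! use std::collections::HashMap;
//!
//! // Hypothetical in-memory stand-in for the journal area.
//! struct ToyJournal {
//!     commited: bool,
//!     // Target addresses (as listed in `TxBegin`) paired with the journaled
//!     // contents.
//!     entries: Vec<(u64, [u8; 4])>,
//! }
//!
//! fn checkpoint(journal: &mut ToyJournal, disk: &mut HashMap<u64, [u8; 4]>) {
//!     if !journal.commited {
//!         return; // Nothing committed, so there is nothing to copy.
//!     }
//!     // 1. Copy every journaled block to its final location.
//!     for (lba, data) in &journal.entries {
//!         disk.insert(*lba, *data);
//!     }
//!     // 2. Only then clear the flag. If a crash happened before this line,
//!     //    the flag would still be set and the copy could be repeated.
//!     journal.commited = false;
//! }
//!
//! fn main() {
//!     let mut disk = HashMap::new();
//!     let mut journal = ToyJournal {
//!         commited: true,
//!         entries: vec![(7, [1, 2, 3, 4])],
//!     };
//!     checkpoint(&mut journal, &mut disk);
//!     assert_eq!(disk[&7], [1, 2, 3, 4]);
//!     assert!(!journal.commited);
//! }
//! ```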
//!
//! ### 4. Recovery: [`Journal::recovery`]
//!
//! If a crash occurs before the checkpointing phase completes, KeOS
//! **recovers** the file system during the next boot. It begins by inspecting
//! the journal superblock to determine whether a committed transaction exists.
//!
//! If the `commited` flag is set and a valid `TxBegin`/`TxEnd` pair is
//! present, this indicates a completed transaction whose changes have not yet
//! been checkpointed. In this case, KeOS retries the **checkpointing**. If the
//! journal is not marked as committed, the system discards the journal
//! entirely. This rollback ensures consistency by ignoring partially written
//! or aborted transactions.
//!
//! This recovery approach is both **bounded** and **idempotent**: it scans only
//! the small, fixed-size journal area, avoiding costly full file system
//! traversal, and it can safely retry recovery without side effects if
//! interrupted again.
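//!
//! The replay-or-discard decision can be sketched as a pure function. This is
//! a hypothetical model: `JournalState`, `Action`, and `decide` are invented
//! names, and treating a `TxBegin`/`TxEnd` pair as "valid" when both carry
//! the same transaction ID is an assumption, as is discarding a journal that
//! is flagged as committed but has no valid pair.
//!
//! ```rust
//! // Hypothetical summary of what recovery reads from the journal area.
//! struct JournalState {
//!     commited: bool,
//!     begin_tx_id: Option<u64>, // `None` if no valid `TxBegin` was found.
//!     end_tx_id: Option<u64>,   // `None` if no valid `TxEnd` was found.
//! }
//!
//! #[derive(Debug, PartialEq)]
//! enum Action {
//!     Replay,  // Re-run the checkpoint.
//!     Discard, // Ignore the journal contents.
//! }
//!
//! fn decide(state: &JournalState) -> Action {
//!     match (state.commited, state.begin_tx_id, state.end_tx_id) {
//!         // Assumption: the pair is valid when both IDs match.
//!         (true, Some(begin), Some(end)) if begin == end => Action::Replay,
//!         // Not committed, or (by assumption) no valid pair: discard.
//!         _ => Action::Discard,
//!     }
//! }
//!
//! fn main() {
//!     let complete = JournalState {
//!         commited: true,
//!         begin_tx_id: Some(4),
//!         end_tx_id: Some(4),
//!     };
//!     let torn = JournalState {
//!         commited: false,
//!         begin_tx_id: Some(5),
//!         end_tx_id: None,
//!     };
//!     assert_eq!(decide(&complete), Action::Replay);
//!     assert_eq!(decide(&torn), Action::Discard);
//!     // The decision depends only on the on-disk state, so asking again
//!     // after an interrupted recovery gives the same answer.
//!     assert_eq!(decide(&complete), Action::Replay);
//! }
//! ```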
//!
//! ## Implementation Requirements
//! You need to implement the following:
//! - [`Journal::recovery`]
//! - [`Journal::checkpoint`]
//! - [`JournalWriter::<TxBegin>::write_tx_begin`]
//! - [`JournalWriter::<Block>::write_blocks`]
//! - [`JournalWriter::<TxEnd>::write_tx_end`]
//!
//! After implementing these functions, move on to the last [`section`] of
//! KeOS.
//!
//! [`section`]: mod@crate::advanced_file_structs

use crate::ffs::{
    FastFileSystemInner, JournalIO, LogicalBlockAddress,
    disk_layout::{JournalSb, JournalTxBegin, JournalTxEnd},
};
use alloc::{boxed::Box, vec::Vec};
use core::cell::RefCell;
use keos::{KernelError, sync::SpinLockGuard};

/// A structure representing the journal metadata used for crash consistency.
///
/// Journaling allows the file system to recover from crashes by recording
/// changes in a write-ahead log before committing them to the main file system.
/// This ensures that partially written operations do not corrupt the file
/// system state.
///
/// The `Journal` struct encapsulates the journaling superblock.
/// It is responsible for managing the checkpointing process, which commits
/// durable changes and clears completed transactions.
///
/// # Fields
/// - `sb`: The journal superblock, containing configuration and state of the
///   journal.
pub struct Journal {
    /// Journal superblock.
    pub sb: Box<JournalSb>,
}

impl Journal {
    /// Recovers any committed but not yet checkpointed transactions from the
    /// journal.
    ///
    /// This function is invoked during file system startup to ensure
    /// metadata consistency in the event of a system crash or power failure.
    /// It scans the on-disk journal area for valid transactions and re-applies
    /// them to the file system metadata.
    ///
    /// If no complete transaction is detected, the journal is left unchanged.
    /// If a partial or corrupt transaction is found, it is safely discarded.
    ///
    /// # Parameters
    /// - `ffs`: A reference to the core file system state, used to apply
    ///   recovered metadata.
    /// - `io`: The journal I/O interface used to read journal blocks and
    ///   perform recovery writes.
    ///
    /// # Returns
    /// - `Ok(())` if recovery completed successfully or no action was needed.
    /// - `Err(KernelError)` if an unrecoverable error occurred during recovery.
    pub fn recovery(
        &mut self,
        ffs: &FastFileSystemInner,
        io: &JournalIO,
    ) -> Result<(), KernelError> {
        todo!()
    }

    /// Commits completed journal transactions to the file system.
    ///
    /// This method performs the **checkpoint** operation: it flushes completed
    /// transactions from the journal into the main file system, ensuring their
    /// effects are permanently recorded.
    ///
    /// # Parameters
    /// - `ffs`: A reference to the file system core (`FastFileSystemInner`),
    ///   needed to apply changes to metadata blocks.
    /// - `io`: An object for performing I/O operations related to the journal.
    /// - `debug_journal`: If true, enables debug logging for checkpointing.
    ///
    /// # Returns
    /// - `Ok(())`: If checkpointing succeeds and all transactions are flushed.
    /// - `Err(KernelError)`: If I/O or consistency errors are encountered.
    pub fn checkpoint(
        &mut self,
        ffs: &FastFileSystemInner,
        io: &JournalIO,
        debug_journal: bool,
    ) -> Result<(), KernelError> {
        if self.sb.commited != 0 {
            let mut block = Box::new([0; 4096]);
            let tx_begin = JournalTxBegin::from_io(io, ffs.journal().start + 1)?;
            if debug_journal {
                println!("[FFS-Journal]: Transaction #{} [", tx_begin.tx_id);
            }
            for (idx, slot) in tx_begin.lbas.iter().enumerate() {
                if let Some(slot) = slot {
                    if debug_journal {
                        println!("[FFS-Journal]: #{:04}: {:?},", idx, slot);
                    }
                    todo!();
                } else {
                    break;
                }
            }
            if debug_journal {
                println!("[FFS-Journal]: ] Checkpointed.");
            }
            self.sb.commited = 0;
            self.sb.writeback(io, ffs)?;
        }
        Ok(())
    }
}

/// Represents an in-progress file system transaction using write-ahead
/// journaling.
///
/// A `RunningTransaction` buffers metadata updates to disk blocks before they
/// are permanently written, ensuring crash consistency. When a transaction is
/// committed, the buffered blocks are flushed to the journal area first. Once
/// the journal write completes, the updates are applied to the actual metadata
/// locations on disk.
///
/// Transactions are used to group file system changes atomically — either all
/// updates in a transaction are committed, or none are, preventing partial
/// updates.
///
/// # Fields
/// - `tx`: A buffer that stores staged metadata writes as a list of (LBA, data)
///   tuples.
/// - `journal`: A locked handle to the global `Journal`, used during commit.
/// - `tx_id`: Unique identifier for the current transaction.
/// - `io`: The journal I/O interface used for block-level reads/writes.
/// - `debug_journal`: Enables logging of journal operations for debugging.
/// - `ffs`: A reference to the file system's core structure.
pub struct RunningTransaction<'a> {
    tx: RefCell<Vec<(LogicalBlockAddress, Box<[u8; 4096]>)>>,
    journal: Option<SpinLockGuard<'a, Journal>>,
    tx_id: u64,
    io: Option<JournalIO<'a>>,
    debug_journal: bool,
    pub ffs: &'a FastFileSystemInner,
}

impl<'a> RunningTransaction<'a> {
    /// Begins a new journaled transaction.
    ///
    /// Initializes the transaction state and prepares to buffer metadata
    /// writes.
    ///
    /// # Parameters
    /// - `name`: A label for the transaction, useful for debugging.
    /// - `ffs`: The file system core structure.
    /// - `io`: The journal I/O interface for block operations.
    /// - `debug_journal`: Enables verbose logging if set to `true`.
    #[inline]
    pub fn begin(
        name: &str,
        ffs: &'a FastFileSystemInner,
        io: JournalIO<'a>,
        debug_journal: bool,
    ) -> Self {
        let mut journal = ffs.journal.as_ref().map(|journal| journal.lock());
        let tx_id = journal
            .as_mut()
            .map(|j| {
                let tx_id = j.sb.tx_id;
                j.sb.tx_id += 1;
                tx_id
            })
            .unwrap_or(0);
        if debug_journal && journal.is_some() {
            println!("[FFS-Journal]: Transaction #{} \"{}\" [", tx_id, name);
        }
        RunningTransaction {
            tx: RefCell::new(Vec::new()),
            journal,
            io: Some(io),
            tx_id,
            debug_journal,
            ffs,
        }
    }

    /// Buffers a metadata block modification for inclusion in the transaction.
    ///
    /// The actual write is deferred until `commit()` is called.
    ///
    /// # Parameters
    /// - `lba`: The logical block address where the metadata will eventually be
    ///   written.
    /// - `data`: A boxed page of data representing the new metadata contents.
    /// - `ty`: The type name of the metadata (for debugging).
    #[inline]
    pub fn write_meta(&self, lba: LogicalBlockAddress, data: Box<[u8; 4096]>, ty: &str) {
        if self.debug_journal {
            println!(
                "[FFS-Journal]: #{:04}: {:20} - {:?},",
                self.tx.borrow_mut().len(),
                ty.split(":").last().unwrap_or("?"),
                lba
            );
        }
        self.tx.borrow_mut().push((lba, data));
    }

    /// Commits the transaction to the journal and applies changes to disk.
    ///
    /// This method performs the following steps:
    /// 1. Writes all staged metadata blocks to the journal region on disk.
    /// 2. Updates the journal superblock.
    /// 3. Checkpoints the journal.
    ///
    /// # Returns
    /// - `Ok(())`: If the transaction was successfully committed and
    ///   checkpointed.
    /// - `Err(KernelError)`: If an I/O or consistency error occurred.
    pub fn commit(mut self) -> Result<(), KernelError> {
        // In a real file system, there are further optimizations to reduce
        // disk I/O, such as merging repeated writes to the same LBA into one
        // journal block.
        let (io, tx, journal, tx_id, ffs, debug_journal) = (
            self.io.take().unwrap(),
            core::mem::take(&mut *self.tx.borrow_mut()),
            self.journal.take(),
            self.tx_id,
            self.ffs,
            self.debug_journal,
        );

        if let Some(journal) = journal {
            if debug_journal {
                println!("[FFS-Journal]: ] Commited.");
            }
            let (mut journal, io) = JournalWriter::new(tx, journal, io, ffs, tx_id)
                .write_tx_begin()?
                .write_blocks()?
                .write_tx_end()?;

            // In a real file system, checkpointing is performed asynchronously
            // by a kernel thread.
            //
            // To keep the implementation simple, KeOS checkpoints the journaled
            // updates synchronously, right after the commit.
            let result = journal.checkpoint(ffs, &io, debug_journal);
            journal.unlock();
            result
        } else {
            // When journaling is not supported, write the metadata directly to
            // its final locations.
            for (lba, block) in tx.into_iter() {
                io.write_metadata_block(lba, block.as_array().unwrap())?;
            }
            Ok(())
        }
    }
}

impl Drop for RunningTransaction<'_> {
    fn drop(&mut self) {
        if let Some(journal) = self.journal.take() {
            journal.unlock();
        }
    }
}

/// Marker type for the first phase of a journal commit: TxBegin.
///
/// Used with [`JournalWriter`] to enforce commit stage ordering via the type
/// system.
pub struct TxBegin {}

/// Marker type for the second phase of a journal commit: writing the metadata
/// blocks.
///
/// Ensures that [`JournalWriter::write_tx_begin`] must be called before
/// [`JournalWriter::write_blocks`].
pub struct Block {}

/// Marker type for the final phase of a journal commit: TxEnd.
///
/// Ensures that [`JournalWriter::write_blocks`] is completed before finalizing
/// the transaction.
pub struct TxEnd {}

/// A staged writer for committing a transaction to the journal.
///
/// `JournalWriter` uses a type-state pattern to enforce the correct sequence of
/// journal writes:
/// - `JournalWriter<TxBegin>`: Can only call [`JournalWriter::write_tx_begin`].
/// - `JournalWriter<Block>`: Can only call [`JournalWriter::write_blocks`].
/// - `JournalWriter<TxEnd>`: Can only call [`JournalWriter::write_tx_end`].
///
/// This staged API ensures that transactions are written in the correct order
/// and prevents accidental misuse.
pub struct JournalWriter<'a, WriteTarget> {
    /// Staged list of (LBA, data) pairs representing metadata blocks to commit.
    tx: Vec<(LogicalBlockAddress, Box<[u8; 4096]>)>,

    /// A lock-protected handle to the journal structure.
    journal: SpinLockGuard<'a, Journal>,

    /// I/O interface for reading/writing disk blocks.
    io: JournalIO<'a>,

    /// Reference to the filesystem's core state.
    ffs: &'a FastFileSystemInner,

    /// Unique identifier of the transaction.
    tx_id: u64,

    /// Internal index tracking progress through `tx`.
    index: usize,

    /// Phantom data used to track the current commit stage.
    _write_target: core::marker::PhantomData<WriteTarget>,
}

impl<'a> JournalWriter<'a, TxBegin> {
    /// Creates a new `JournalWriter` in the initial `TxBegin` stage.
    ///
    /// This prepares the writer for the staged commit sequence of the given
    /// transaction.
    ///
    /// # Parameters
    /// - `tx`: The list of metadata blocks to be written.
    /// - `journal`: A locked handle to the global journal state.
    /// - `io`: The disk I/O interface.
    /// - `ffs`: A reference to the file system.
    /// - `tx_id`: A unique ID assigned to the transaction.
    ///
    /// # Returns
    /// A `JournalWriter` instance in the `TxBegin` state.
    pub fn new(
        tx: Vec<(LogicalBlockAddress, Box<[u8; 4096]>)>,
        journal: SpinLockGuard<'a, Journal>,
        io: JournalIO<'a>,
        ffs: &'a FastFileSystemInner,
        tx_id: u64,
    ) -> Self {
        Self {
            tx,
            journal,
            io,
            ffs,
            tx_id,
            index: 0,
            _write_target: core::marker::PhantomData,
        }
    }

    /// Writes the `TxBegin` marker to the journal.
    ///
    /// This signals the start of a journaled transaction. Must be called before
    /// writing the data blocks.
    ///
    /// # Returns
    /// A `JournalWriter` in the `Block` stage.
    pub fn write_tx_begin(mut self) -> Result<JournalWriter<'a, Block>, KernelError> {
        let mut tx_begin = JournalTxBegin::new(self.tx_id);
        todo!();
        Ok(JournalWriter {
            tx: self.tx,
            journal: self.journal,
            ffs: self.ffs,
            io: self.io,
            tx_id: self.tx_id,
            index: self.index,
            _write_target: core::marker::PhantomData,
        })
    }
}

impl<'a> JournalWriter<'a, Block> {
    /// Writes all staged metadata blocks to the journal.
    ///
    /// Each block is written sequentially to a dedicated journal area.
    /// This must be called after `write_tx_begin()` and before finalizing with
    /// `write_tx_end()`.
    ///
    /// # Returns
    /// A `JournalWriter` in the `TxEnd` stage.
    pub fn write_blocks(mut self) -> Result<JournalWriter<'a, TxEnd>, KernelError> {
        todo!();
        Ok(JournalWriter {
            tx: self.tx,
            journal: self.journal,
            ffs: self.ffs,
            io: self.io,
            tx_id: self.tx_id,
            index: self.index,
            _write_target: core::marker::PhantomData,
        })
    }
}

impl<'a> JournalWriter<'a, TxEnd> {
    /// Writes the `TxEnd` block and completes the transaction by updating the
    /// journal superblock.
    ///
    /// This signals a successfully completed transaction and allows recovery
    /// mechanisms to apply the journal contents to the actual file system
    /// metadata.
    ///
    /// # Returns
    /// - The locked journal and I/O handle, to checkpoint the journal.
    /// - `Err(KernelError)` if the final commit stage fails.
    pub fn write_tx_end(
        mut self,
    ) -> Result<(SpinLockGuard<'a, Journal>, JournalIO<'a>), KernelError> {
        let tx_end = JournalTxEnd::new(self.tx_id);
        // In a real file system, this TxEnd block is usually omitted to reduce
        // disk I/O.
        todo!();

        // Mark the transaction as committed in the JournalSb.
        let Self {
            mut journal,
            io,
            ffs,
            ..
        } = self;
        journal.sb.commited = 1;
        match journal.sb.writeback(&io, ffs) {
            Ok(_) => Ok((journal, io)),
            Err(e) => {
                journal.unlock();
                Err(e)
            }
        }
    }
}