keos_project5/ffs/journal.rs
//! # Journaling for Crash Consistency.
//!
//! File systems must ensure data consistency in the presence of crashes or
//! power failures. When a system crash or power failure occurs, in-progress
//! file operations may leave the file system in an inconsistent state, where
//! metadata and data blocks are only partially updated. This can lead to file
//! corruption, orphaned blocks, or even complete data loss. Thus, modern file
//! systems must guard against these scenarios to ensure durability and
//! recoverability.
//!
//! To address this, modern file systems employ **journaling**. Journaling
//! provides crash consistency by recording intended changes to a special log
//! (called the journal) before applying them to the main file system. In the
//! event of a crash, the journal can be replayed to recover to a consistent
//! state. This significantly reduces the risk of data corruption and allows
//! faster recovery after unclean shutdowns, without the need for full
//! file system checks.
//!
//! In this approach, all intended updates, such as block allocations, inode
//! changes, or directory modifications, are first written to the journal.
//! Only after the log is safely persisted to disk are the actual file system
//! structures updated. This method provides a clear "intent before action"
//! protocol, making recovery predictable and bounded.
//!
//! ## Journaling in KeOS
//!
//! To explore the fundamentals of crash-consistent file systems, **KeOS
//! implements a minimal metadata journaling mechanism** using the well-known
//! technique of **write-ahead logging**. This mechanism ensures that
//! updates to file system structures are made durable and recoverable.
//!
//! The journaling mechanism is anchored by a **journal superblock**, which
//! includes a `commited` flag. This flag indicates whether the journal area
//! currently holds valid, committed journal data that has not yet been
//! checkpointed.
//!
//! Journaling in KeOS is structured around four key stages: **metadata
//! updates**, **commit**, **checkpoint**, and **recovery**.
//!
//! ### 1. Metadata Updates
//!
//! In KeOS, journaling is tightly integrated with the [`RunningTransaction`]
//! struct, which acts as the central abstraction for managing write-ahead
//! logging of file system changes. All journaled operations must be serialized
//! through this structure to ensure consistency.
//!
//! Internally, [`RunningTransaction`] is protected by a `SpinLock` on the
//! journal superblock, enforcing **global serialization** of journal writes.
//! This design guarantees that only one transaction may be in progress at any
//! given time, preventing concurrent updates to the same block, which could
//! otherwise result in a corrupted or inconsistent state.
//!
//! Crucially, KeOS uses Rust’s strong type system to enforce this safety at
//! compile time: without access to an active [`RunningTransaction`], it is
//! **impossible** to write metadata blocks. All metadata modifications must be
//! submitted explicitly via the `submit()` method, which stages the changes for
//! journaling.
//!
//! If you forget to submit modified blocks through [`RunningTransaction`], the
//! kernel will **panic** with a clear error message, catching the issue early
//! and avoiding silent corruption. This design provides both safety and
//! transparency, making metadata updates robust and auditable.
//!
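/*!
As a rough illustration of the two guarantees above, the following
self-contained sketch gates writes behind a transaction value and panics when
a modified block is dropped without being submitted. `Tx`, `MetaBlock`, and
every method shown are hypothetical stand-ins, not the KeOS types, and the
sketch uses `std` rather than the kernel's `alloc`.

```rust
use std::cell::RefCell;

/// Hypothetical transaction: collects (address, contents) pairs.
pub struct Tx {
    staged: RefCell<Vec<(u64, Vec<u8>)>>,
}

impl Tx {
    /// Starts an empty transaction.
    pub fn begin() -> Self {
        Tx { staged: RefCell::new(Vec::new()) }
    }

    /// Number of blocks staged so far.
    pub fn staged_len(&self) -> usize {
        self.staged.borrow().len()
    }
}

/// Hypothetical metadata block that tracks whether it has unsubmitted edits.
pub struct MetaBlock {
    lba: u64,
    data: Vec<u8>,
    dirty: bool,
}

impl MetaBlock {
    /// Creates a clean, zero-filled block for the given address.
    pub fn new(lba: u64) -> Self {
        MetaBlock { lba, data: vec![0; 8], dirty: false }
    }

    /// Grants mutable access and remembers that the block was modified.
    pub fn data_mut(&mut self) -> &mut [u8] {
        self.dirty = true;
        &mut self.data
    }

    /// Stages the block in `tx`. The `&Tx` parameter is the compile-time gate.
    pub fn submit(mut self, tx: &Tx) {
        let data = std::mem::take(&mut self.data);
        tx.staged.borrow_mut().push((self.lba, data));
        self.dirty = false;
    }
}

impl Drop for MetaBlock {
    /// Catches a forgotten `submit` at run time instead of losing the update.
    fn drop(&mut self) {
        if self.dirty && !std::thread::panicking() {
            panic!("modified block {} dropped without submit", self.lba);
        }
    }
}
```

Because `submit` takes `&Tx`, a caller that holds no transaction has nothing
to pass and the call does not type-check.
*/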
//! ### 2. Commit Phase: [`RunningTransaction::commit`]
//!
//! In the commit phase, KeOS records all pending modifications to a dedicated
//! **journal area** before applying them to their actual on-disk locations.
//!
//! A transaction begins with a **`TxBegin` block**, which contains a list of
//! logical block addresses that describe where the updates will eventually be
//! written. This is followed by the **journal data blocks**, which contain the
//! actual contents to be written to the specified logical blocks. Once all data
//! blocks have been written, a **`TxEnd` block** is appended to mark the
//! successful conclusion of the transaction. This write-ahead logging
//! discipline guarantees that no update reaches the main file system until its
//! full intent is safely recorded in the journal.
//!
//! You write journal blocks with the [`JournalWriter`] struct. It is
//! parameterized by a marker type that represents the current stage of the
//! commit phase, which forces you to write journal blocks in the correct
//! order.
//!
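/*!
The stage ordering described above is an instance of the type-state pattern.
The sketch below is a minimal, in-memory model of it, not the KeOS
implementation: `Writer`, `Record`, and the markers `Begin`, `Blocks`, and
`End` are hypothetical names, and the "journal" is a plain `Vec` instead of
disk blocks.

```rust
use std::marker::PhantomData;

/// Hypothetical stage markers, one per commit stage.
pub struct Begin;
pub struct Blocks;
pub struct End;

/// One record in the toy journal.
#[derive(Debug, PartialEq)]
pub enum Record {
    TxBegin { tx_id: u64, lbas: Vec<u64> },
    Data(Vec<u8>),
    TxEnd { tx_id: u64 },
}

/// Toy writer. `Stage` exists only at the type level.
pub struct Writer<Stage> {
    tx_id: u64,
    staged: Vec<(u64, Vec<u8>)>,
    log: Vec<Record>,
    _stage: PhantomData<Stage>,
}

impl<Stage> Writer<Stage> {
    /// Moves the writer to another stage without touching its data.
    fn advance<Next>(self) -> Writer<Next> {
        Writer {
            tx_id: self.tx_id,
            staged: self.staged,
            log: self.log,
            _stage: PhantomData,
        }
    }
}

impl Writer<Begin> {
    pub fn new(tx_id: u64, staged: Vec<(u64, Vec<u8>)>) -> Self {
        Writer { tx_id, staged, log: Vec::new(), _stage: PhantomData }
    }

    /// Records the target addresses, then unlocks the data stage.
    pub fn write_tx_begin(mut self) -> Writer<Blocks> {
        let lbas = self.staged.iter().map(|(lba, _)| *lba).collect();
        self.log.push(Record::TxBegin { tx_id: self.tx_id, lbas });
        self.advance()
    }
}

impl Writer<Blocks> {
    /// Appends the block contents in the same order as the address list.
    pub fn write_blocks(mut self) -> Writer<End> {
        for (_, data) in &self.staged {
            self.log.push(Record::Data(data.clone()));
        }
        self.advance()
    }
}

impl Writer<End> {
    /// Appends the end marker and hands back the finished log.
    pub fn write_tx_end(mut self) -> Vec<Record> {
        self.log.push(Record::TxEnd { tx_id: self.tx_id });
        self.log
    }
}
```

Each stage method consumes `self` and exists on exactly one `Writer<Stage>`,
so a call such as `Writer::new(..).write_blocks()` is rejected by the
compiler rather than detected at run time.
*/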
//! ### 3. Checkpoint Phase: [`Journal::checkpoint`]
//!
//! After a transaction is fully committed, the system proceeds to
//! **checkpoint** the journal. During checkpointing, the journaled data blocks
//! are copied from the journal area to their final destinations in the main
//! file system (i.e., to the logical block addresses specified in the `TxBegin`
//! block).
//!
//! Once all modified blocks have been written to their final locations, the
//! system clears the journal by resetting the `commited` flag in the journal
//! superblock. This indicates that the journal no longer needs to be replayed
//! after a crash.
//!
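/*!
The ordering above (copy every block first, clear the flag last) can be
sketched on a toy in-memory disk. `ToyDisk` and this free-standing
`checkpoint` function are assumptions made for illustration; the real
[`Journal::checkpoint`] reads the addresses from the `TxBegin` block and
works on 4096-byte blocks.

```rust
/// Hypothetical disk model: one byte stands in for one block.
pub struct ToyDisk {
    /// Mirrors the `commited` flag of the journal superblock.
    pub commited: bool,
    /// Journaled updates as (target block index, new contents).
    pub journal: Vec<(usize, u8)>,
    /// The main file system area.
    pub main: Vec<u8>,
}

/// Copies journaled blocks to their targets, then clears the flag.
/// Returns how many blocks were copied.
pub fn checkpoint(disk: &mut ToyDisk) -> usize {
    if !disk.commited {
        // Nothing committed, so nothing may be applied.
        return 0;
    }
    let mut copied = 0;
    for &(lba, value) in &disk.journal {
        disk.main[lba] = value;
        copied += 1;
    }
    // Cleared only after every copy has landed: a crash before this line
    // leaves the flag set, so the copies are simply repeated on next boot.
    disk.commited = false;
    copied
}
```

Clearing the flag before the copies would open a window in which a crash
loses committed updates, which is why the flag write comes last.
*/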
//! In modern file systems, checkpointing is typically performed
//! **asynchronously** in the background to minimize the latency of system calls
//! like `write()` or `fsync()`. This allows the file system to acknowledge the
//! operation as complete once the journal is committed, without waiting for the
//! final on-disk update.
//!
//! However, for simplicity in this project, **checkpointing is done
//! synchronously**: the file system waits until all journaled updates are
//! copied to their target locations before clearing the journal. This
//! simplifies correctness, avoids the need for background threads or
//! deferred work mechanisms, and reduces the work of maintaining a consistent
//! view between the disk and committed data.
//!
//! ### 4. Recovery: [`Journal::recovery`]
//!
//! If a crash occurs before the checkpointing phase completes, KeOS
//! **recovers** the file system during the next boot. It begins by inspecting
//! the journal superblock to determine whether a committed transaction exists.
//!
//! If the `commited` flag is set and a valid `TxBegin`/`TxEnd` pair is
//! present, this indicates a completed transaction whose changes have not yet
//! been checkpointed. In this case, KeOS retries the **checkpointing**. If the
//! journal is not marked as committed, the system discards the journal
//! entirely. This rollback ensures consistency by ignoring partially written
//! or aborted transactions.
//!
//! This recovery approach is both **bounded** and **idempotent**: it scans only
//! the small, fixed-size journal area, avoiding costly full file system
//! traversal, and it can safely retry recovery without side effects if
//! interrupted again.
//!
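/*!
To make the idempotence claim concrete, the sketch below simulates a
checkpoint that loses power part-way through and then runs recovery twice.
`Disk`, `interrupted_checkpoint`, and `recover` are hypothetical names, one
byte stands in for one block, and discarding the journal is modeled by
emptying a `Vec`; none of this is the KeOS on-disk format.

```rust
/// Hypothetical disk model for the recovery walk-through.
#[derive(Clone, Debug, PartialEq)]
pub struct Disk {
    /// Mirrors the `commited` flag of the journal superblock.
    pub commited: bool,
    /// Journaled updates as (target block index, new contents).
    pub journal: Vec<(usize, u8)>,
    /// The main file system area.
    pub main: Vec<u8>,
}

/// Simulates a checkpoint that crashes after `n` copies.
/// The flag is never cleared, exactly as after a real power loss.
pub fn interrupted_checkpoint(disk: &mut Disk, n: usize) {
    for &(lba, value) in disk.journal.iter().take(n) {
        disk.main[lba] = value;
    }
}

/// Boot-time recovery: replay a committed journal, discard anything else.
pub fn recover(disk: &mut Disk) {
    if disk.commited {
        // Re-copying a block that already landed writes the same bytes
        // again, so repeating this loop is harmless.
        for &(lba, value) in &disk.journal {
            disk.main[lba] = value;
        }
        disk.commited = false;
    }
    // Committed or not, the journal contents are no longer needed.
    disk.journal.clear();
}
```

Running `recover` a second time finds the flag cleared and the journal empty,
so it changes nothing; an uncommitted journal is dropped without touching the
main area.
*/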
//! ## Implementation Requirements
//! You need to implement the following:
//! - [`Journal::recovery`]
//! - [`Journal::checkpoint`]
//! - [`JournalWriter::<TxBegin>::write_tx_begin`]
//! - [`JournalWriter::<Block>::write_blocks`]
//! - [`JournalWriter::<TxEnd>::write_tx_end`]
//!
//! After implementing these functions, move on to the last [`section`] of
//! KeOS.
//!
//! [`section`]: mod@crate::advanced_file_structs

use crate::ffs::{
    FastFileSystemInner, JournalIO, LogicalBlockAddress,
    disk_layout::{JournalSb, JournalTxBegin, JournalTxEnd},
};
use alloc::{boxed::Box, vec::Vec};
use core::cell::RefCell;
use keos::{KernelError, sync::SpinLockGuard};

/// A structure representing the journal metadata used for crash consistency.
///
/// Journaling allows the file system to recover from crashes by recording
/// changes in a write-ahead log before committing them to the main file system.
/// This ensures that partially written operations do not corrupt the file
/// system state.
///
/// The `Journal` struct encapsulates the journal superblock. It is
/// responsible for managing the checkpointing process, which applies committed
/// changes to the main file system and clears completed transactions.
///
/// # Fields
/// - `sb`: The journal superblock, containing configuration and state of the
///   journal.
pub struct Journal {
    /// Journal superblock.
    pub sb: Box<JournalSb>,
}

impl Journal {
    /// Recovers any committed but not yet checkpointed transactions from the
    /// journal.
    ///
    /// This function is invoked during file system startup to ensure
    /// metadata consistency in the event of a system crash or power failure.
    /// It scans the on-disk journal area for valid transactions and re-applies
    /// them to the file system metadata.
    ///
    /// If no complete transaction is detected, the journal is left unchanged.
    /// If a partial or corrupt transaction is found, it is safely discarded.
    ///
    /// # Parameters
    /// - `ffs`: A reference to the core file system state, used to apply
    ///   recovered metadata.
    /// - `io`: The journal I/O interface used to read journal blocks and
    ///   perform recovery writes.
    ///
    /// # Returns
    /// - `Ok(())` if recovery completed successfully or no action was needed.
    /// - `Err(KernelError)` if an unrecoverable error occurred during recovery.
    pub fn recovery(
        &mut self,
        ffs: &FastFileSystemInner,
        io: &JournalIO,
    ) -> Result<(), KernelError> {
        todo!()
    }

    /// Commits completed journal transactions to the file system.
    ///
    /// This method performs the **checkpoint** operation: it flushes completed
    /// transactions from the journal into the main file system, ensuring their
    /// effects are permanently recorded.
    ///
    /// # Parameters
    /// - `ffs`: A reference to the file system core (`FastFileSystemInner`),
    ///   needed to apply changes to metadata blocks.
    /// - `io`: An object for performing I/O operations related to the journal.
    /// - `debug_journal`: If true, enables debug logging for checkpointing.
    ///
    /// # Returns
    /// - `Ok(())`: If checkpointing succeeds and all transactions are flushed.
    /// - `Err(KernelError)`: If I/O or consistency errors are encountered.
    pub fn checkpoint(
        &mut self,
        ffs: &FastFileSystemInner,
        io: &JournalIO,
        debug_journal: bool,
    ) -> Result<(), KernelError> {
        if self.sb.commited != 0 {
            let mut block = Box::new([0; 4096]);
            let tx_begin = JournalTxBegin::from_io(io, ffs.journal().start + 1)?;
            if debug_journal {
                println!("[FFS-Journal]: Transaction #{} [", tx_begin.tx_id);
            }
            for (idx, slot) in tx_begin.lbas.iter().enumerate() {
                if let Some(slot) = slot {
                    if debug_journal {
                        println!("[FFS-Journal]: #{:04}: {:?},", idx, slot);
                    }
                    todo!();
                } else {
                    break;
                }
            }
            if debug_journal {
                println!("[FFS-Journal]: ] Checkpointed.");
            }
            self.sb.commited = 0;
            self.sb.writeback(io, ffs)?;
        }
        Ok(())
    }
}

/// Represents an in-progress file system transaction using write-ahead
/// journaling.
///
/// A `RunningTransaction` buffers metadata updates to disk blocks before they
/// are permanently written, ensuring crash consistency. When a transaction is
/// committed, the buffered blocks are flushed to the journal area first. Once
/// the journal write completes, the updates are applied to the actual metadata
/// locations on disk.
///
/// Transactions are used to group file system changes atomically: either all
/// updates in a transaction are committed, or none are, preventing partial
/// updates.
///
/// # Fields
/// - `tx`: A buffer that stores staged metadata writes as a list of (LBA, data)
///   tuples.
/// - `journal`: A locked handle to the global `Journal`, used during commit.
/// - `tx_id`: Unique identifier for the current transaction.
/// - `io`: The journal I/O interface used for block-level reads/writes.
/// - `debug_journal`: Enables logging of journal operations for debugging.
/// - `ffs`: A reference to the file system's core structure.
pub struct RunningTransaction<'a> {
    tx: RefCell<Vec<(LogicalBlockAddress, Box<[u8; 4096]>)>>,
    journal: Option<SpinLockGuard<'a, Journal>>,
    tx_id: u64,
    io: Option<JournalIO<'a>>,
    debug_journal: bool,
    pub ffs: &'a FastFileSystemInner,
}

impl<'a> RunningTransaction<'a> {
    /// Begins a new journaled transaction.
    ///
    /// Initializes the transaction state and prepares to buffer metadata
    /// writes.
    ///
    /// # Parameters
    /// - `name`: A label for the transaction, useful for debugging.
    /// - `ffs`: The file system core structure.
    /// - `io`: The journal I/O interface for block operations.
    /// - `debug_journal`: Enables verbose logging if set to `true`.
    #[inline]
    pub fn begin(
        name: &str,
        ffs: &'a FastFileSystemInner,
        io: JournalIO<'a>,
        debug_journal: bool,
    ) -> Self {
        let mut journal = ffs.journal.as_ref().map(|journal| journal.lock());
        let tx_id = journal
            .as_mut()
            .map(|j| {
                let tx_id = j.sb.tx_id;
                j.sb.tx_id += 1;
                tx_id
            })
            .unwrap_or(0);
        if debug_journal && journal.is_some() {
            println!("[FFS-Journal]: Transaction #{} \"{}\" [", tx_id, name);
        }
        RunningTransaction {
            tx: RefCell::new(Vec::new()),
            journal,
            io: Some(io),
            tx_id,
            debug_journal,
            ffs,
        }
    }

    /// Buffers a metadata block modification for inclusion in the transaction.
    ///
    /// The actual write is deferred until `commit()` is called.
    ///
    /// # Parameters
    /// - `lba`: The logical block address where the metadata will eventually be
    ///   written.
    /// - `data`: A boxed page of data representing the new metadata contents.
    /// - `ty`: A type string name of the metadata (for debugging).
    #[inline]
    pub fn write_meta(&self, lba: LogicalBlockAddress, data: Box<[u8; 4096]>, ty: &str) {
        if self.debug_journal {
            println!(
                "[FFS-Journal]: #{:04}: {:20} - {:?},",
                self.tx.borrow_mut().len(),
                ty.split(":").last().unwrap_or("?"),
                lba
            );
        }
        self.tx.borrow_mut().push((lba, data));
    }

    /// Commits the transaction to the journal and applies changes to disk.
    ///
    /// This method performs the following steps:
    /// 1. Writes all staged metadata blocks to the journal region on disk.
    /// 2. Updates the journal superblock.
    /// 3. Checkpoints the journal.
    ///
    /// # Returns
    /// - `Ok(())`: If the transaction was successfully committed and
    ///   checkpointed.
    /// - `Err(KernelError)`: If an I/O or consistency error occurred.
    pub fn commit(mut self) -> Result<(), KernelError> {
        // A real file system applies further optimizations to reduce disk I/O,
        // such as merging repeated writes to the same LBA into one journal block.
        let (io, tx, journal, tx_id, ffs, debug_journal) = (
            self.io.take().unwrap(),
            core::mem::take(&mut *self.tx.borrow_mut()),
            self.journal.take(),
            self.tx_id,
            self.ffs,
            self.debug_journal,
        );

        if let Some(journal) = journal {
            if debug_journal {
                println!("[FFS-Journal]: ] Commited.");
            }
            let (mut journal, io) = JournalWriter::new(tx, journal, io, ffs, tx_id)
                .write_tx_begin()?
                .write_blocks()?
                .write_tx_end()?;

            // In a real file system, checkpointing is performed asynchronously
            // by a kernel thread.
            //
            // To keep the implementation simple, KeOS checkpoints the journaled
            // updates synchronously, right after the commit.
            let result = journal.checkpoint(ffs, &io, debug_journal);
            journal.unlock();
            result
        } else {
            // When journaling is not supported, write the metadata directly to
            // its final locations.
            for (lba, block) in tx.into_iter() {
                io.write_metadata_block(lba, block.as_array().unwrap())?;
            }
            Ok(())
        }
    }
}

impl Drop for RunningTransaction<'_> {
    fn drop(&mut self) {
        if let Some(journal) = self.journal.take() {
            journal.unlock();
        }
    }
}

/// Marker type for the first phase of a journal commit: TxBegin.
///
/// Used with [`JournalWriter`] to enforce commit stage ordering via the type
/// system.
pub struct TxBegin {}

/// Marker type for the second phase of a journal commit: writing the metadata
/// blocks.
///
/// Ensures that [`JournalWriter::write_tx_begin`] must be called before
/// [`JournalWriter::write_blocks`].
pub struct Block {}

/// Marker type for the final phase of a journal commit: TxEnd.
///
/// Ensures that [`JournalWriter::write_blocks`] is completed before finalizing
/// the transaction.
pub struct TxEnd {}

/// A staged writer for committing a transaction to the journal.
///
/// `JournalWriter` uses a type-state pattern to enforce the correct sequence of
/// journal writes:
/// - `JournalWriter<TxBegin>`: Can only call [`JournalWriter::write_tx_begin`].
/// - `JournalWriter<Block>`: Can only call [`JournalWriter::write_blocks`].
/// - `JournalWriter<TxEnd>`: Can only call [`JournalWriter::write_tx_end`].
///
/// This staged API ensures that transactions are written in the correct order
/// and prevents accidental misuse.
pub struct JournalWriter<'a, WriteTarget> {
    /// Staged list of (LBA, data) pairs representing metadata blocks to commit.
    tx: Vec<(LogicalBlockAddress, Box<[u8; 4096]>)>,

    /// A lock-protected handle to the journal structure.
    journal: SpinLockGuard<'a, Journal>,

    /// I/O interface for reading/writing disk blocks.
    io: JournalIO<'a>,

    /// Reference to the filesystem's core state.
    ffs: &'a FastFileSystemInner,

    /// Unique identifier of the transaction.
    tx_id: u64,

    /// Internal index tracking progress through `tx`.
    index: usize,

    /// Phantom data used to track the current commit stage.
    _write_target: core::marker::PhantomData<WriteTarget>,
}

impl<'a> JournalWriter<'a, TxBegin> {
    /// Creates a new `JournalWriter` in the initial `TxBegin` stage.
    ///
    /// This prepares the writer for the staged commit sequence of the given
    /// transaction.
    ///
    /// # Parameters
    /// - `tx`: The list of metadata blocks to be written.
    /// - `journal`: A locked handle to the global journal state.
    /// - `io`: The disk I/O interface.
    /// - `ffs`: A reference to the file system.
    /// - `tx_id`: A unique ID assigned to the transaction.
    ///
    /// # Returns
    /// A `JournalWriter` instance in the `TxBegin` state.
    pub fn new(
        tx: Vec<(LogicalBlockAddress, Box<[u8; 4096]>)>,
        journal: SpinLockGuard<'a, Journal>,
        io: JournalIO<'a>,
        ffs: &'a FastFileSystemInner,
        tx_id: u64,
    ) -> Self {
        Self {
            tx,
            journal,
            io,
            ffs,
            tx_id,
            index: 0,
            _write_target: core::marker::PhantomData,
        }
    }

    /// Writes the `TxBegin` marker to the journal.
    ///
    /// This signals the start of a journaled transaction. Must be called before
    /// writing the data blocks.
    ///
    /// # Returns
    /// A `JournalWriter` in the `Block` stage.
    pub fn write_tx_begin(mut self) -> Result<JournalWriter<'a, Block>, KernelError> {
        let mut tx_begin = JournalTxBegin::new(self.tx_id);
        todo!();
        Ok(JournalWriter {
            tx: self.tx,
            journal: self.journal,
            ffs: self.ffs,
            io: self.io,
            tx_id: self.tx_id,
            index: self.index,
            _write_target: core::marker::PhantomData,
        })
    }
}

impl<'a> JournalWriter<'a, Block> {
    /// Writes all staged metadata blocks to the journal.
    ///
    /// Each block is written sequentially to a dedicated journal area.
    /// This must be called after `write_tx_begin()` and before finalizing with
    /// `write_tx_end()`.
    ///
    /// # Returns
    /// A `JournalWriter` in the `TxEnd` stage.
    pub fn write_blocks(mut self) -> Result<JournalWriter<'a, TxEnd>, KernelError> {
        todo!();
        Ok(JournalWriter {
            tx: self.tx,
            journal: self.journal,
            ffs: self.ffs,
            io: self.io,
            tx_id: self.tx_id,
            index: self.index,
            _write_target: core::marker::PhantomData,
        })
    }
}

impl<'a> JournalWriter<'a, TxEnd> {
    /// Writes the `TxEnd` block and completes the transaction by updating the
    /// journal superblock.
    ///
    /// This signals a successfully completed transaction and allows recovery
    /// mechanisms to apply the journal contents to the actual file system
    /// metadata.
    ///
    /// # Returns
    /// - The locked journal and I/O handle, to checkpoint the journal.
    /// - `Err(KernelError)` if the final commit stage fails.
    pub fn write_tx_end(
        mut self,
    ) -> Result<(SpinLockGuard<'a, Journal>, JournalIO<'a>), KernelError> {
        let tx_end = JournalTxEnd::new(self.tx_id);
        // In a real file system, this TxEnd block is usually omitted to reduce
        // disk I/O.
        todo!();

        // Mark the transaction as committed in the JournalSb.
        let Self {
            mut journal,
            io,
            ffs,
            ..
        } = self;
        journal.sb.commited = 1;
        match journal.sb.writeback(&io, ffs) {
            Ok(_) => Ok((journal, io)),
            Err(e) => {
                journal.unlock();
                Err(e)
            }
        }
    }
}