keos_project5/page_cache/
mod.rs

//! # Page Cache.
//!
//! The **page cache** is a fundamental component of the operating system’s file
//! system infrastructure. It functions as a high-speed, memory-resident buffer
//! that caches frequently accessed file data, thereby significantly reducing
//! the overhead of disk I/O operations.
//!
//! When a process reads from a file, the kernel first checks whether the
//! requested data is present in the page cache. If it is, the data is served
//! directly from memory, bypassing the much slower disk access path. If not,
//! the system fetches the data from disk, inserts it into the cache, and then
//! returns it to the process. This mechanism greatly enhances system
//! responsiveness and throughput, especially in workloads involving repeated or
//! sequential file access patterns.
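//!
//! As a toy illustration of this decision (plain Rust with a `HashMap` standing
//! in for the cache and a stubbed disk path; not the KeOS API):
//!
//! ```rust,ignore
//! use std::collections::HashMap;
//!
//! fn read_block_from_disk(_block: u64) -> [u8; 4096] {
//!     [0u8; 4096] // stand-in for the slow device read
//! }
//!
//! fn cached_read(cache: &mut HashMap<u64, [u8; 4096]>, block: u64, buf: &mut [u8; 4096]) {
//!     if let Some(page) = cache.get(&block) {
//!         buf.copy_from_slice(page); // hit: served from memory
//!     } else {
//!         let page = read_block_from_disk(block); // miss: fetch from disk
//!         buf.copy_from_slice(&page);
//!         cache.insert(block, page); // keep it for future readers
//!     }
//! }
//! ```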
//!
//! Because disk I/O is orders of magnitude slower than memory access, the page
//! cache is essential for delivering high performance in modern operating
//! systems. Without it, every file read or write would necessitate direct disk
//! access, resulting in substantial latency and performance degradation.
//!
//! Once the page cache is introduced, file data can be directly mapped into a
//! process’s virtual memory space through `mmap()`. Instead of allocating
//! memory and copying file contents manually, `mmap()` establishes a direct
//! mapping to pages backed by the page cache. This integration enables
//! efficient, demand-paged file I/O: page faults during memory access are
//! resolved by loading data from the page cache. If the page is not yet cached,
//! the kernel fetches it from disk into the cache, then maps it into the
//! process. When mapped pages are modified, they are marked as *dirty*, and
//! later written back to disk as part of the page cache’s write-back policy.
//! This mechanism ensures consistency while minimizing redundant I/O
//! operations.
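//!
//! A hedged sketch of that fault path in terms of this module's
//! [`PageCacheState::do_mmap`] (here `fault_va`, `region_for`, and `map_at` are
//! hypothetical stand-ins for the fault-handling machinery of the earlier
//! projects):
//!
//! ```rust,ignore
//! fn resolve_file_backed_fault(
//!     state: &mut PageCacheState,
//!     fault_va: Va,
//! ) -> Result<(), KernelError> {
//!     // Which (file, block) pair backs the faulting address?
//!     let (file, fba) = region_for(fault_va)?;
//!     // Served from the cache; on a miss, loaded from disk first.
//!     let page = state.do_mmap(file, fba)?;
//!     // Map the cached page directly into the process -- no copy.
//!     map_at(fault_va, page)
//! }
//! ```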
//!
//! To maintain consistency and durability, the page cache uses a **write-back**
//! policy. Modifications to cached file data are initially applied in memory,
//! and the affected pages are marked as dirty. These pages are later flushed to
//! disk, either explicitly via the `fsync()` system call or automatically by
//! background kernel threads. This approach optimizes performance by deferring
//! costly write operations, while still ensuring data persistence.
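//!
//! In terms of this module's API, the policy reads as follows (a sketch;
//! `state`, `file`, `fba`, and `buf` are assumed to be in scope):
//!
//! ```rust,ignore
//! // The write lands only in the cached page; the slot is marked dirty and
//! // at least 4096 bytes are scheduled for write-back.
//! state.do_write(file.clone(), fba, &buf, 4096)?;
//! // ...arbitrarily later: the fsync path flushes every dirty slot of the file.
//! state.do_writeback(file)?;
//! ```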
//!
//! Since memory is a limited resource, the kernel must eventually evict
//! pages from the page cache to make room for new data. This requires a cache
//! eviction policy that decides which pages to reclaim based on usage
//! patterns—commonly using heuristics such as Least Recently Used (LRU).
//! Eviction is a critical aspect of page cache design, as it balances memory
//! pressure with the goal of retaining useful data in cache to maximize
//! performance.
//!
//! The page cache also enables **readahead**, a performance optimization that
//! preemptively loads file pages into memory based on access patterns. When the
//! kernel detects sequential file access, it predicts future reads and
//! asynchronously fetches additional pages into the cache. This significantly
//! reduces the number of page faults and improves I/O latency, making readahead
//! particularly effective for streaming large files or reading large datasets.
//!
//! Implementing a page cache is a critical step toward building a modern,
//! high-performance operating system. It not only enhances file system
//! efficiency but also provides insight into how the OS bridges the performance
//! gap between fast volatile memory and slow persistent storage.
//!
//! ## Page Cache in KeOS
//!
//! Your goal is to extend KeOS with a page cache that bridges file I/O
//! operations with memory-backed caching. The template provides three major
//! abstractions (a construction sketch follows the list):
//!
//! - [`PageCacheInner`]: An internal wrapper that coordinates low-level
//!   interactions between the cache and the storage layer.
//!
//! - [`PageCacheState`]: This is the high-level cache manager. It embeds an
//!   [`LRUCache`] keyed by `(InodeNumber, FileBlockNumber)` and manages up to
//!   512 slots (2 MiB). It is the central entry point for page-cache-aware file
//!   I/O.
//!
//! - [`Slot`]: A [`Slot`] represents a single cached file block. It contains
//!   the owning [`RegularFile`], the corresponding file block number (`FBA`),
//!   the backing [`Page`], and metadata such as the write-back size. Dirty
//!   slots track modifications and are eventually flushed back to disk.
//!
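//! Assembled, these pieces are constructed in one call (a sketch; `SomeFs` is a
//! placeholder for a concrete [`FileSystem`] implementation, not a type
//! provided by this crate):
//!
//! ```rust,ignore
//! // `fs` is some value implementing `FileSystem`; mounting it behind the
//! // cache allocates the shared `PageCacheState` and spawns the background
//! // readahead thread.
//! let cache: PageCache<SomeFs> = PageCache::new(fs);
//! ```
//!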
//! ### Readahead Policy
//!
//! KeOS employs a simple readahead policy: when a file block is read, the cache
//! preemptively loads up to 16 subsequent blocks. This heuristic is designed to
//! optimize sequential access workloads (e.g., file scans or streaming),
//! reducing future read latency and improving throughput. Random workloads
//! remain unaffected, since readahead is limited and opportunistic.
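//!
//! A sketch of this policy in terms of [`PageCacheState`] (here `next_block`,
//! `is_cached`, and `load_block` are hypothetical helpers, not template API):
//!
//! ```rust,ignore
//! let ino = file.0.ino();
//! let mut next = fba;
//! for _ in 0..16 {
//!     next = next_block(next); // successor of the block just read
//!     if is_cached(state, (ino, next)) {
//!         continue; // existing cached slots are never overwritten
//!     }
//!     match load_block(&file, next) {
//!         // Insert the prefetched block; the LRU may evict a cold slot.
//!         Ok(page) => state.insert((ino, next), Slot::new(file.clone(), next, page)),
//!         Err(_) => break, // stop at end-of-file or on I/O error
//!     }
//! }
//! ```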
//!
//! ### Cache Replacement: LRU
//!
//! [`PageCacheState`] relies on a Least-Recently-Used (LRU) policy to manage
//! memory pressure. When the cache reaches capacity, the least recently used
//! slot is evicted. If the slot is dirty, its contents are flushed back to disk
//! before eviction. This policy balances simplicity and efficiency by retaining
//! hot (recently accessed) pages while discarding cold ones. All of these
//! functionalities are provided by the [`LRUCache`] struct.
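//!
//! Using only the operations this module itself relies on, the capacity bound
//! behaves as follows (a sketch with a tiny capacity for illustration):
//!
//! ```rust,ignore
//! let mut lru: LRUCache<u32, &str, 2> = LRUCache::new();
//! lru.put(1, "a");
//! lru.put(2, "b");
//! lru.put(3, "c"); // at capacity: the coldest entry (key 1) is dropped
//! // For `Slot` values, being dropped runs `Slot`'s `Drop` implementation,
//! // which is where dirty contents get flushed back to disk.
//! ```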
//!
//! ### Workflow
//!
//! 1. **Read**: On a read request, the cache checks for an existing slot. If
//!    present, data is served from memory; otherwise, the block is loaded from
//!    disk, inserted into the cache, and readahead is triggered (a sketch
//!    follows this list).
//!
//! 2. **Write**: Writes update the cached slot in place. The slot is marked
//!    dirty and write-back occurs lazily, either via explicit sync or eviction.
//!
//! 3. **mmap**: Pages can be directly mapped into user space from the page
//!    cache. Faults are resolved by pulling in the corresponding slot.
//!
//! 4. **Unlink**: When a file is deleted, all its slots are invalidated without
//!    flushing, ensuring consistency with the file system state.
//!
//! 5. **Writeback**: Dirty slots are flushed either explicitly (via `fsync`) or
//!    opportunistically during eviction. This ensures persistence while
//!    reducing redundant disk I/O.
//!
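//! One plausible shape for the read step, expressed with this module's types (a
//! sketch only; [`PageCache::read`] is left for you to implement, the
//! locking/`unlock` pattern mirrors [`PageCache::new`], and the channel's
//! `send` is assumed to mirror `recv`):
//!
//! ```rust,ignore
//! // Foreground: serve the block synchronously through the shared state.
//! let mut guard = self.0.inner.lock();
//! let read_any = guard.do_read(file.clone(), fba, buf)?;
//! guard.unlock();
//! // Background: hand the 16-block readahead off to the [Readahead] thread.
//! let _ = self.0.request.send((file.clone(), fba));
//! Ok(read_any)
//! ```
//!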
//! The following diagram depicts the workflow of the page cache subsystem of
//! KeOS.
//! ```text
//!            +-------------------------------+
//!            |       Process issues          |
//!            |  read(), write(), or mmap()   |
//!            +---------------+---------------+
//!                            |
//!                            v
//!                 +-------------------+
//!                 |  Check Slot in    |
//!                 |  PageCacheState   |
//!                 +--------+----------+
//!                  Hit            |   Miss
//!                   |             |
//!                   v             v
//!            +------------+   +----------------------+
//!            | Serve data |   | Load block from disk |
//!            | from cache |   +-----------+----------+
//!            +-----+------+               |
//!                  |                      v
//!                  |             +----------------------+
//!                  |             | Insert block as Slot |
//!                  |             |  into LRUCache       |
//!                  |             |  Trigger readahead   |
//!                  |             +-----------+----------+
//!                  |                         |
//!                  +-------------------------+
//!                  |
//!                  v
//!           +-------------------+
//!           | Is this a write?  |
//!           +--------+----------+
//!                  Yes
//!                   |
//!                   v
//!     +---------------+---------+
//!     | Mark Slot dirty (defer  |
//!     | writeback to disk)      |
//!     +-----------+-------------+
//! ```
//!
//! ## Implementation Requirements
//!
//! You need to implement the following:
//! - [`PageCacheState::readahead`]
//! - [`PageCacheState::do_read`]
//! - [`PageCacheState::do_write`]
//! - [`PageCacheState::do_mmap`]
//! - [`Slot::read_page`]
//! - [`Slot::write_page`]
//! - [`Slot::writeback`]
//! - [`Slot::drop`]
//! - [`PageCache::read`]
//!
//! After implementing these functionalities, move on to the next [`section`].
//!
//! [`section`]: mod@crate::ffs
use crate::lru::LRUCache;
use alloc::{string::ToString, sync::Arc};
use core::ops::{Deref, DerefMut};
use keos::{
    KernelError,
    channel::{Sender, channel},
    fs::{FileBlockNumber, InodeNumber, RegularFile, traits::FileSystem},
    mm::Page,
    thread::{JoinHandle, ThreadBuilder},
};
use keos_project4::sync::mutex::Mutex;

pub mod overlaying;

/// A single entry in the page cache.
///
/// A [`Slot`] corresponds to one file block stored in memory.
pub struct Slot {
    /// The file this slot belongs to.
    pub file: RegularFile,
    /// The file block number this slot represents.
    pub fba: FileBlockNumber,
    /// The backing page containing the block’s data.
    pub page: Page,
    /// The number of bytes to write back if the slot is dirty. If the slot is
    /// clean, this is `None`.
    pub writeback_size: Option<usize>,
}

impl Slot {
    /// Create a new slot for the given file, block, and backing page.
    ///
    /// By default, the slot is clean (i.e., `writeback_size` is `None`).
    pub fn new(file: keos::fs::RegularFile, fba: FileBlockNumber, page: Page) -> Self {
        Slot {
            file,
            fba,
            page,
            writeback_size: None,
        }
    }

    /// Copy the page contents into the provided buffer.
    ///
    /// The buffer must be exactly 4096 bytes long, representing a full
    /// page. This method does not trigger I/O.
    pub fn read_page(&self, buf: &mut [u8; 4096]) {
        todo!()
    }

    /// Update the page contents from the provided buffer.
    ///
    /// Marks the slot as dirty with a write-back of at least `min_size` bytes.
    /// The buffer must be exactly 4096 bytes long.
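    ///
    /// For instance (a sketch, assuming a clean slot beforehand): overwriting
    /// the first ten bytes keeps the rest of the page intact while recording
    /// how many bytes must survive write-back:
    ///
    /// ```rust,ignore
    /// let mut buf = [0u8; 4096];
    /// slot.read_page(&mut buf);          // start from the current contents
    /// buf[..10].copy_from_slice(b"helloworld");
    /// slot.write_page(&buf, 10);         // the slot is now dirty
    /// assert_eq!(slot.writeback_size, Some(10));
    /// ```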
    pub fn write_page(&mut self, buf: &[u8; 4096], min_size: usize) {
        todo!()
    }

    /// Write back the dirty portion of the page to the underlying file.
    ///
    /// - If `writeback_size` is `Some(size)`, the slot is dirty and `size` is
    ///   the desired minimum file size (in bytes) after write-back.
    /// - On success, clears `writeback_size` to `None`.
    ///
    /// If the slot is clean, this does not trigger any I/O.
    pub fn writeback(&mut self) -> Result<(), keos::KernelError> {
        todo!()
    }
}

impl Drop for Slot {
    fn drop(&mut self) {
        // Called on eviction.
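        // A dirty slot (`writeback_size.is_some()`) should have its contents
        // flushed here; `Drop` cannot surface errors, so this best-effort
        // write-back typically ignores I/O failures.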
        todo!()
    }
}

/// The global page cache state.
///
/// [`PageCacheState`] wraps an [`LRUCache`] mapping `(InodeNumber,
/// FileBlockNumber)` to [`Slot`] entries. It enforces a bounded capacity of 512
/// slots, corresponding to 2 MiB of cached file data.
///
/// This state is protected by a [`Mutex`] inside [`PageCacheInner`], allowing
/// concurrent access from multiple threads with safe eviction.
#[repr(transparent)]
pub struct PageCacheState(
    LRUCache<(InodeNumber, FileBlockNumber), Slot, 512>, // 2 MiB
);

impl Deref for PageCacheState {
    type Target = LRUCache<(InodeNumber, FileBlockNumber), Slot, 512>;
    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

impl DerefMut for PageCacheState {
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut self.0
    }
}

impl PageCacheState {
    /// Perform readahead on sequential file blocks.
    ///
    /// Reads up to **16 consecutive blocks** after the given `fba`
    /// (file block address) into the cache.
    ///
    /// Existing cached slots are not overwritten.
    pub fn readahead(&mut self, file: keos::fs::RegularFile, fba: FileBlockNumber) {
        todo!()
    }

    /// Insert a new [`Slot`] into the page cache.
    ///
    /// Associates the given `(inode, fba)` pair with the slot.
    /// If the cache is at capacity, the least-recently-used slot
    /// will be automatically evicted (writing back its contents if dirty).
    pub fn insert(&mut self, id: (InodeNumber, FileBlockNumber), slot: Slot) {
        self.0.put(id, slot);
    }

    /// Read a file block into the provided buffer.
    ///
    /// - If the block is cached, copies directly from the page cache.
    /// - If the block is not cached, loads it **synchronously** from the file
    ///   system, inserts it into the cache, and copies it into the buffer.
    ///
    /// This method does not trigger read-ahead requests.
    ///
    /// Returns `Ok(true)` if at least one byte was read.
    pub fn do_read(
        &mut self,
        file: keos::fs::RegularFile,
        fba: FileBlockNumber,
        buf: &mut [u8; 4096],
    ) -> Result<bool, keos::KernelError> {
        todo!()
    }

    /// Write a file block through the page cache.
    ///
    /// Updates (or inserts) the slot corresponding to the `(file, fba)`
    /// pair with the provided buffer. The slot is marked dirty, and at least
    /// `min_size` bytes are scheduled for write-back.
    ///
    /// This method does not immediately flush to disk; explicit
    /// [`PageCacheState::do_writeback`] or eviction is required for
    /// persistence.
    pub fn do_write(
        &mut self,
        file: keos::fs::RegularFile,
        fba: FileBlockNumber,
        buf: &[u8; 4096],
        min_size: usize,
    ) -> Result<(), keos::KernelError> {
        todo!()
    }

    /// Provide a memory-mapped page for the given file block.
    ///
    /// - If the block is cached, returns a clone of the backing [`Page`].
    /// - If not, loads the block into the cache and returns the new [`Page`].
    ///
    /// This allows direct access to the cached page memory.
    pub fn do_mmap(
        &mut self,
        file: keos::fs::RegularFile,
        fba: FileBlockNumber,
    ) -> Result<Page, keos::KernelError> {
        todo!()
    }

    /// Remove all slots associated with a given file.
    ///
    /// Slots are dropped without flushing dirty data back to the file system.
    /// This is typically used during file unlink (deletion), where data
    /// persistence is no longer required.
    pub fn do_unlink(&mut self, file: keos::fs::RegularFile) {
        let ino = file.0.ino();
        // Remove all slots associated with this file without write-back:
        // clearing `writeback_size` keeps `Slot::drop` from flushing.
        self.0.retain(|(id_ino, _), v| {
            if *id_ino == ino {
                v.writeback_size = None;
                false
            } else {
                true
            }
        });
    }

    /// Write back all dirty slots belonging to the given file.
    ///
    /// Ensures that all cached modifications to the file are persisted
    /// to the underlying file system.
    pub fn do_writeback(&mut self, file: keos::fs::RegularFile) -> Result<(), keos::KernelError> {
        let ino = file.0.ino();
        // Write back all slots associated with this file.
        self.0
            .iter_mut()
            .filter(|((id_ino, _), _)| *id_ino == ino)
            .for_each(|(_, slot)| {
                let _ = slot.writeback();
            });

        Ok(())
    }
}

/// Internal representation of a [`PageCache`].
pub struct PageCacheInner<FS: FileSystem> {
    /// The file system that the page cache operates on.
    pub fs: FS,
    /// The shared state of the page cache.
    pub inner: Arc<Mutex<PageCacheState>>,
    /// Channel for sending read-ahead requests to the background thread.
    pub request: Sender<(keos::fs::RegularFile, FileBlockNumber)>,
    /// Join handle for the read-ahead thread.
    _readahead_thread: JoinHandle,
}

/// A reference-counted handle to the page cache.
///
/// [`PageCache`] wraps an [`Arc`] around [`PageCacheInner`], allowing
/// safe sharing across multiple threads. It provides methods to
/// construct a cache and to read pages with read-ahead support.
pub struct PageCache<FS: FileSystem>(pub Arc<PageCacheInner<FS>>);

impl<FS: FileSystem> Clone for PageCache<FS> {
    fn clone(&self) -> Self {
        Self(self.0.clone())
    }
}

impl<FS: FileSystem> PageCache<FS> {
    /// Create a new page cache associated with the given file system.
    ///
    /// Spawns a background thread to service read-ahead requests.
    pub fn new(fs: FS) -> Self {
        info!("Mounting {} to PageCache.", core::any::type_name::<FS>());
        let (request, rx) = channel(100);
        let inner = Arc::new(Mutex::new(PageCacheState(LRUCache::new())));
        let cloned_inner = inner.clone();
        let _readahead_thread = ThreadBuilder::new("[Readahead]".to_string()).spawn(move || {
            println!(
                "Start [Readahead] (TID: {})",
                keos::thread::Current::get_tid()
            );
            // Service read-ahead requests until the channel is closed.
            while let Ok((file, fba)) = rx.recv() {
                let mut guard = cloned_inner.lock();
                guard.readahead(file, fba);
                guard.unlock();
            }
        });
        PageCache(Arc::new(PageCacheInner {
            fs,
            inner,
            request,
            _readahead_thread,
        }))
    }

    /// Read a page from the cache or underlying file system.
    ///
    /// A read-ahead request for subsequent pages is issued to the
    /// background thread.
    pub fn read(
        &self,
        file: &keos::fs::RegularFile,
        fba: FileBlockNumber,
        buf: &mut [u8; 4096],
    ) -> Result<bool, KernelError> {
        // TODO:
        // 1. Read the requested block synchronously.
        // 2. Send a read-ahead request to the readahead thread.
        todo!()
    }
}

impl<FS: FileSystem> Drop for PageCacheInner<FS> {
    fn drop(&mut self) {
        if keos::PANIC_DEPTH.load(core::sync::atomic::Ordering::SeqCst) == 0 {
            let readahead_tid = self._readahead_thread.tid;
            println!(
                "Stop [Readahead] (TID: {}) / success: {}",
                readahead_tid,
                keos::thread::kill_by_tid(readahead_tid, 0).is_ok()
            );
        }
    }
}