// keos_project5/page_cache/mod.rs
//! # Page Cache.
//!
//! The **page cache** is a fundamental component of the operating system’s file
//! system infrastructure. It functions as a high-speed, memory-resident buffer
//! that caches frequently accessed file data, thereby significantly reducing
//! the overhead of disk I/O operations.
//!
//! When a process reads from a file, the kernel first checks whether the
//! requested data is present in the page cache. If it is, the data is served
//! directly from memory, bypassing the much slower disk access path. If not,
//! the system fetches the data from disk, inserts it into the cache, and then
//! returns it to the process. This mechanism greatly enhances system
//! responsiveness and throughput, especially in workloads involving repeated or
//! sequential file access patterns.
//!
//! Because disk I/O is orders of magnitude slower than memory access, the page
//! cache is essential for delivering high performance in modern operating
//! systems. Without it, every file read or write would necessitate direct disk
//! access, resulting in substantial latency and performance degradation.
//!
//! Once the page cache is introduced, file data can be directly mapped into a
//! process’s virtual memory space through `mmap()`. Instead of allocating
//! memory and copying file contents manually, `mmap()` establishes a direct
//! mapping to pages backed by the page cache. This integration enables
//! efficient, demand-paged file I/O: page faults during memory access are
//! resolved by loading data from the page cache. If the page is not yet cached,
//! the kernel fetches it from disk into the cache, then maps it into the
//! process. When mapped pages are modified, they are marked as *dirty*, and
//! later written back to disk as part of the page cache’s write-back policy.
//! This mechanism ensures consistency while minimizing redundant I/O
//! operations.
//!
//! To maintain consistency and durability, the page cache uses a **write-back**
//! policy. Modifications to cached file data are initially applied in memory,
//! and the affected pages are marked as dirty. These pages are later flushed to
//! disk, either explicitly via the `fsync()` system call or automatically by
//! background kernel threads. This approach optimizes performance by deferring
//! costly write operations, while still ensuring data persistence.
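//!
//! As a minimal sketch of this write-back flow (using the `Slot` fields defined
//! later in this module; `copy_into_page` and `write_block` are hypothetical
//! helpers standing in for the page-copy and disk-write APIs you will actually
//! use, so treat this as pseudocode):
//!
//! ```ignore
//! // write(): mutate the cached page in memory and mark the slot dirty.
//! copy_into_page(&mut slot.page, buf);                      // hypothetical helper
//! slot.writeback_size = Some(min_size);
//!
//! // fsync() or eviction: flush the dirty page, then mark the slot clean.
//! if let Some(size) = slot.writeback_size.take() {
//!     write_block(&slot.file, slot.fba, &slot.page, size)?; // hypothetical helper
//! }
//! ```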
//!
//! Since memory is a limited resource, the kernel must eventually evict
//! pages from the page cache to make room for new data. This requires a cache
//! eviction policy that decides which pages to reclaim based on usage
//! patterns—commonly using heuristics such as Least Recently Used (LRU).
//! Eviction is a critical aspect of page cache design, as it balances memory
//! pressure with the goal of retaining useful data in cache to maximize
//! performance.
//!
//! The page cache also enables **readahead**, a performance optimization that
//! preemptively loads file pages into memory based on access patterns. When the
//! kernel detects sequential file access, it predicts future reads and
//! asynchronously fetches additional pages into the cache. This significantly
//! reduces the number of page faults and improves I/O latency, making readahead
//! particularly effective for streaming large files or reading large datasets.
//!
//! Implementing a page cache is a critical step toward building a modern,
//! high-performance operating system. It not only enhances file system
//! efficiency but also provides insight into how the OS bridges the performance
//! gap between fast volatile memory and slow persistent storage.
//!
//! ## Page Cache in KeOS
//!
//! Your goal is to extend KeOS to support a page cache that bridges file I/O
//! operations with memory-backed caching. The template provides three major
//! abstractions:
//!
//! - [`PageCacheInner`]: An internal wrapper that coordinates low-level
//!   interactions between the cache and the storage layer.
//!
//! - [`PageCacheState`]: This is the high-level cache manager. It embeds an
//!   [`LRUCache`] keyed by `(InodeNumber, FileBlockNumber)` and manages up to
//!   512 slots (~2 MiB). It is the central entry point for page-cache-aware
//!   file I/O.
//!
//! - [`Slot`]: A [`Slot`] represents a single cached file block. It contains
//!   the owning [`RegularFile`], the corresponding file block number (`FBA`),
//!   the backing [`Page`], and metadata such as the write-back size. Dirty
//!   slots track modifications and are eventually flushed back to disk.
//!
//! ### Readahead Policy
//!
//! KeOS employs a simple readahead policy: when a file block is read, the cache
//! preemptively loads up to 16 subsequent blocks. This heuristic is designed to
//! optimize sequential access workloads (e.g., file scans or streaming),
//! reducing future read latency and improving throughput. Random workloads
//! remain unaffected, since readahead is limited and opportunistic.
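//!
//! A sketch of this heuristic (pseudocode; `next_fba` and `load_block` are
//! hypothetical helpers standing in for the block arithmetic and disk read you
//! will actually use, and the cache lookup method is likewise assumed):
//!
//! ```ignore
//! // Preload up to 16 blocks following `fba`, skipping already-cached ones.
//! let mut next = fba;
//! for _ in 0..16 {
//!     next = next_fba(next);
//!     if cache.get(&(ino, next)).is_none() {
//!         // Stop early once `next` runs past the end of the file.
//!         let page = load_block(&file, next)?;
//!         cache.insert((ino, next), Slot::new(file.clone(), next, page));
//!     }
//! }
//! ```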
//!
//! ### Cache Replacement: LRU
//!
//! [`PageCacheState`] relies on a Least-Recently-Used (LRU) policy to manage
//! memory pressure. When the cache reaches capacity, the least recently used
//! slot is evicted. If the slot is dirty, its contents are flushed back to disk
//! before eviction. This policy balances simplicity and efficiency by retaining
//! hot (recently accessed) pages while discarding cold ones. All these
//! functionalities are provided by the [`LRUCache`] struct.
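//!
//! For instance, with a capacity of two slots the policy behaves as follows
//! (`put` is the insertion method used later in this module; `get` is an
//! assumed lookup that refreshes recency):
//!
//! ```ignore
//! cache.put(a, slot_a);   // cache (LRU -> MRU): [a]
//! cache.put(b, slot_b);   // cache: [a, b]
//! cache.get(&a);          // touching `a` refreshes it: [b, a]
//! cache.put(c, slot_c);   // full: evicts `b`, the least recently used;
//!                         // if `slot_b` was dirty, it is flushed on drop
//! ```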
//!
//! ### Workflow
//!
//! 1. **Read**: On a read request, the cache checks for an existing slot. If
//!    present, data is served from memory; otherwise, the block is loaded from
//!    disk, inserted into the cache, and readahead is triggered.
//!
//! 2. **Write**: Writes update the cached slot in place. The slot is marked
//!    dirty and write-back occurs lazily, either via explicit sync or eviction.
//!
//! 3. **mmap**: Pages can be directly mapped into user space from the page
//!    cache. Faults are resolved by pulling in the corresponding slot.
//!
//! 4. **Unlink**: When a file is deleted, all its slots are invalidated without
//!    flushing, ensuring consistency with the file system state.
//!
//! 5. **Writeback**: Dirty slots are flushed either explicitly (via `fsync`) or
//!    opportunistically during eviction. This ensures persistence while
//!    reducing redundant disk I/O.
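//!
//! Stitching the steps above together, a cache-aware read roughly takes the
//! following shape (pseudocode; the lock and channel usage mirrors the template
//! below, but the exact control flow is up to your implementation):
//!
//! ```ignore
//! fn read(file, fba, buf) -> Result<bool, KernelError> {
//!     let mut state = inner.lock();
//!     let hit = state.do_read(file.clone(), fba, buf)?; // cache hit or sync load
//!     state.unlock();
//!     let _ = request.send((file, fba));                // opportunistic readahead
//!     Ok(hit)
//! }
//! ```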
//!
//! The following diagram depicts the workflow of the page cache subsystem of
//! KeOS.
//! ```text
//! +-------------------------------+
//! |        Process issues         |
//! |  read(), write(), or mmap()   |
//! +---------------+---------------+
//!                 |
//!                 v
//!       +-------------------+
//!       |   Check Slot in   |
//!       |  PageCacheState   |
//!       +--------+----------+
//!     Hit        |        Miss
//!       |                     |
//!       v                     v
//! +------------+  +----------------------+
//! | Serve data |  | Load block from disk |
//! | from cache |  +-----------+----------+
//! +-----+------+              |
//!       |                     v
//!       |         +----------------------+
//!       |         | Insert block as Slot |
//!       |         |    into LRUCache     |
//!       |         |  Trigger readahead   |
//!       |         +-----------+----------+
//!       |                     |
//!       +----------+----------+
//!                  |
//!                  v
//!         +-------------------+
//!         |  Is this a write? |
//!         +--------+----------+
//!                  | Yes
//!                  v
//!     +-------------------------+
//!     | Mark Slot dirty (defer  |
//!     | writeback to disk)      |
//!     +-------------------------+
//! ```
//!
//! ## Implementation Requirements
//! You need to implement the following:
//! - [`PageCacheState::readahead`]
//! - [`PageCacheState::do_read`]
//! - [`PageCacheState::do_write`]
//! - [`PageCacheState::do_mmap`]
//! - [`Slot::read_page`]
//! - [`Slot::write_page`]
//! - [`Slot::writeback`]
//! - [`Slot::drop`]
//! - [`PageCache::read`]
//!
//! After implementing these functionalities, move on to the next [`section`].
//!
//! [`section`]: mod@crate::ffs
use crate::lru::LRUCache;
use alloc::{string::ToString, sync::Arc};
use core::ops::{Deref, DerefMut};
use keos::{
    KernelError,
    channel::{Sender, channel},
    fs::{FileBlockNumber, InodeNumber, RegularFile, traits::FileSystem},
    mm::Page,
    thread::{JoinHandle, ThreadBuilder},
};
use keos_project4::sync::mutex::Mutex;

pub mod overlaying;

/// A single entry in the page cache.
///
/// A [`Slot`] corresponds to one file block stored in memory.
pub struct Slot {
    /// The file this slot belongs to.
    pub file: RegularFile,
    /// The file block number this slot represents.
    pub fba: FileBlockNumber,
    /// The backing page containing the block’s data.
    pub page: Page,
    /// Size to be written back if dirtied. If the slot is clean, this is
    /// `None`.
    pub writeback_size: Option<usize>,
}

impl Slot {
    /// Create a new slot for the given file, block, and backing page.
    ///
    /// By default, the slot is clean (i.e., `writeback_size` is `None`).
    pub fn new(file: keos::fs::RegularFile, fba: FileBlockNumber, page: Page) -> Self {
        Slot {
            file,
            fba,
            page,
            writeback_size: None,
        }
    }

    /// Copy the page contents into the provided buffer.
    ///
    /// The buffer must be exactly 4096 bytes long, representing a full
    /// page. This method does not trigger I/O.
    pub fn read_page(&self, buf: &mut [u8; 4096]) {
        todo!()
    }

    /// Update the page contents from the provided buffer.
    ///
    /// Marks the slot as dirty with a write-back of at least `min_size` bytes.
    /// The buffer must be exactly 4096 bytes long.
    pub fn write_page(&mut self, buf: &[u8; 4096], min_size: usize) {
        todo!()
    }

    /// Write back the dirty portion of the page to the underlying file.
    ///
    /// - If `writeback_size` is `Some(size)`, the slot is dirty and `size` is
    ///   the desired minimum file size (in bytes) after write-back.
    /// - On success, resets `writeback_size` to `None`.
    ///
    /// If the slot is clean, this does not trigger any I/O.
    pub fn writeback(&mut self) -> Result<(), keos::KernelError> {
        todo!()
    }
}

impl Drop for Slot {
    fn drop(&mut self) {
        // Called on eviction.
        todo!()
    }
}

/// The global page cache state.
///
/// [`PageCacheState`] wraps an [`LRUCache`] mapping `(InodeNumber,
/// FileBlockNumber)` to [`Slot`] entries. It enforces a bounded capacity of 512
/// slots, corresponding to 2 MiB of cached file data.
///
/// This state is protected by a [`Mutex`] inside [`PageCacheInner`], allowing
/// concurrent access from multiple threads with safe eviction.
#[repr(transparent)]
pub struct PageCacheState(
    LRUCache<(InodeNumber, FileBlockNumber), Slot, 512>, // 2 MiB
);

impl Deref for PageCacheState {
    type Target = LRUCache<(InodeNumber, FileBlockNumber), Slot, 512>;
    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

impl DerefMut for PageCacheState {
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut self.0
    }
}

impl PageCacheState {
    /// Perform readahead on sequential file blocks.
    ///
    /// Reads up to **16 consecutive blocks** after the given `fba`
    /// (file block address) into the cache.
    ///
    /// Existing cached slots are not overwritten.
    pub fn readahead(&mut self, file: keos::fs::RegularFile, fba: FileBlockNumber) {
        todo!()
    }

    /// Insert a new [`Slot`] into the page cache.
    ///
    /// Associates the given `(inode, fba)` pair with the slot.
    /// If the cache is at capacity, the least-recently-used slot
    /// will be automatically evicted (writing back its contents if dirty).
    pub fn insert(&mut self, id: (InodeNumber, FileBlockNumber), slot: Slot) {
        self.0.put(id, slot);
    }

    /// Read a file block into the provided buffer.
    ///
    /// - If the block is cached, copies directly from the page cache.
    /// - If the block is not cached, loads it **synchronously** from the file
    ///   system, inserts it into the cache, and copies it into the buffer.
    ///
    /// This method does not trigger read-ahead requests.
    ///
    /// Returns `Ok(true)` if any bytes were read.
    pub fn do_read(
        &mut self,
        file: keos::fs::RegularFile,
        fba: FileBlockNumber,
        buf: &mut [u8; 4096],
    ) -> Result<bool, keos::KernelError> {
        todo!()
    }

    /// Write a file block through the page cache.
    ///
    /// Updates (or inserts) the slot corresponding to the `(file, fba)`
    /// pair with the provided buffer. The slot is marked dirty, and at least
    /// `min_size` bytes are scheduled for write-back.
    ///
    /// This method does not immediately flush to disk; an explicit
    /// [`PageCacheState::do_writeback`] or eviction is required for
    /// persistence.
    pub fn do_write(
        &mut self,
        file: keos::fs::RegularFile,
        fba: FileBlockNumber,
        buf: &[u8; 4096],
        min_size: usize,
    ) -> Result<(), keos::KernelError> {
        todo!()
    }

    /// Provide a memory-mapped page for the given file block.
    ///
    /// - If the block is cached, returns a clone of the backing [`Page`].
    /// - If not, loads the block into the cache and returns the new [`Page`].
    ///
    /// This allows direct access to the cached page memory.
    pub fn do_mmap(
        &mut self,
        file: keos::fs::RegularFile,
        fba: FileBlockNumber,
    ) -> Result<Page, keos::KernelError> {
        todo!()
    }

    /// Remove all slots associated with a given file.
    ///
    /// Slots are dropped without flushing dirty data back to the file system.
    /// This is typically used during file unlink (deletion), where data
    /// persistence is no longer required.
    pub fn do_unlink(&mut self, file: keos::fs::RegularFile) {
        let ino = file.0.ino();
        // Remove all slots associated with this file without write-back.
        self.0.retain(|(id_ino, _), v| {
            if *id_ino == ino {
                // Mark the slot clean so dropping it skips the flush.
                v.writeback_size = None;
                false
            } else {
                true
            }
        });
    }

    /// Write back all dirty slots belonging to the given file.
    ///
    /// Ensures that all cached modifications to the file are persisted
    /// to the underlying file system.
    pub fn do_writeback(&mut self, file: keos::fs::RegularFile) -> Result<(), keos::KernelError> {
        let ino = file.0.ino();
        // Write back all slots associated with this file.
        self.0
            .iter_mut()
            .filter(|((id_ino, _), _)| *id_ino == ino)
            .for_each(|(_, slot)| {
                let _ = slot.writeback();
            });

        Ok(())
    }
}

/// Internal representation of a [`PageCache`].
pub struct PageCacheInner<FS: FileSystem> {
    /// The file system that the page cache operates on.
    pub fs: FS,
    /// The shared state of the page cache.
    pub inner: Arc<Mutex<PageCacheState>>,
    /// Channel for sending read-ahead requests to the background thread.
    pub request: Sender<(keos::fs::RegularFile, FileBlockNumber)>,
    /// Join handle for the read-ahead thread.
    _readahead_thread: JoinHandle,
}

/// A reference-counted handle to the page cache.
///
/// [`PageCache`] wraps an [`Arc`] around [`PageCacheInner`], allowing
/// safe sharing across multiple threads. It provides methods to
/// construct a cache and to read pages with read-ahead support.
pub struct PageCache<FS: FileSystem>(pub Arc<PageCacheInner<FS>>);

impl<FS: FileSystem> Clone for PageCache<FS> {
    fn clone(&self) -> Self {
        Self(self.0.clone())
    }
}

impl<FS: FileSystem> PageCache<FS> {
    /// Create a new page cache associated with the given file system.
    ///
    /// Spawns a background thread to service read-ahead requests.
    pub fn new(fs: FS) -> Self {
        info!("Mounting {} to PageCache.", core::any::type_name::<FS>());
        let (request, rx) = channel(100);
        let inner = Arc::new(Mutex::new(PageCacheState(LRUCache::new())));
        let cloned_inner = inner.clone();
        let _readahead_thread = ThreadBuilder::new("[Readahead]".to_string()).spawn(move || {
            println!(
                "Start [Readahead] (TID: {})",
                keos::thread::Current::get_tid()
            );
            while let Ok((file, fba)) = rx.recv() {
                let mut guard = cloned_inner.lock();
                guard.readahead(file, fba);
                guard.unlock();
            }
        });
        PageCache(Arc::new(PageCacheInner {
            fs,
            inner,
            request,
            _readahead_thread,
        }))
    }

    /// Read a page from the cache or underlying file system.
    ///
    /// A read-ahead request for subsequent pages is issued to the
    /// background thread.
    pub fn read(
        &self,
        file: &keos::fs::RegularFile,
        fba: FileBlockNumber,
        buf: &mut [u8; 4096],
    ) -> Result<bool, KernelError> {
        // TODO:
        // 1. Read the requested block synchronously.
        // 2. Send a read-ahead request to the readahead thread.
        todo!()
    }
}

impl<FS: FileSystem> Drop for PageCacheInner<FS> {
    fn drop(&mut self) {
        if keos::PANIC_DEPTH.load(core::sync::atomic::Ordering::SeqCst) == 0 {
            let readahead_tid = self._readahead_thread.tid;
            println!(
                "Stop [Readahead] (TID: {}) / success: {}",
                readahead_tid,
                keos::thread::kill_by_tid(readahead_tid, 0).is_ok()
            );
        }
    }
}