§Page Cache
The page cache is a fundamental component of the operating system’s file system infrastructure. It functions as a high-speed, memory-resident buffer that caches frequently accessed file data, thereby significantly reducing the overhead of disk I/O operations.
When a process reads from a file, the kernel first checks whether the requested data is present in the page cache. If it is, the data is served directly from memory, bypassing the much slower disk access path. If not, the system fetches the data from disk, inserts it into the cache, and then returns it to the process. This mechanism greatly enhances system responsiveness and throughput, especially in workloads involving repeated or sequential file access patterns.
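To make the hit/miss logic concrete, here is a minimal sketch of a read-through cache in Rust. The std containers and the BlockDevice trait are illustrative placeholders only, not KeOS types.

```rust
use std::collections::HashMap;

const BLOCK_SIZE: usize = 4096;

/// Placeholder for a slow storage device (illustrative, not a KeOS type).
trait BlockDevice {
    fn read_block(&self, block: u64) -> [u8; BLOCK_SIZE];
}

/// A toy read-through cache: a hit is served from memory,
/// a miss is filled from disk and then served.
struct Cache<D: BlockDevice> {
    device: D,
    cached: HashMap<u64, [u8; BLOCK_SIZE]>,
}

impl<D: BlockDevice> Cache<D> {
    fn read(&mut self, block: u64) -> &[u8; BLOCK_SIZE] {
        if !self.cached.contains_key(&block) {
            // Miss: fetch from the slow device and insert into the cache.
            let data = self.device.read_block(block);
            self.cached.insert(block, data);
        }
        // Hit (or freshly filled slot): serve directly from memory.
        &self.cached[&block]
    }
}
```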
Because disk I/O is orders of magnitude slower than memory access, the page cache is essential for delivering high performance in modern operating systems. Without it, every file read or write would necessitate direct disk access, resulting in substantial latency and performance degradation.
Once the page cache is introduced, file data can be directly mapped into a process’s virtual memory space through mmap(). Instead of allocating memory and copying file contents manually, mmap() establishes a direct mapping to pages backed by the page cache. This integration enables efficient, demand-paged file I/O: page faults during memory access are resolved by loading data from the page cache. If the page is not yet cached, the kernel fetches it from disk into the cache, then maps it into the process. When mapped pages are modified, they are marked as dirty and later written back to disk as part of the page cache’s write-back policy. This mechanism ensures consistency while minimizing redundant I/O operations.
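The sketch below illustrates that fault-resolution path under simplified, hypothetical names (PageCache, get_or_load); the real kernel would additionally install the page’s frame into the process’s page table.

```rust
use std::collections::HashMap;

const PAGE_SIZE: usize = 4096;

/// Illustrative stand-in for a 4 KiB cached page, not the KeOS Page type.
type Page = [u8; PAGE_SIZE];

struct PageCache {
    // Slots keyed by (inode number, file block number).
    slots: HashMap<(u64, u64), Page>,
}

impl PageCache {
    /// Return the cached page for (inode, block), filling it on a miss.
    fn get_or_load(&mut self, inode: u64, block: u64) -> &Page {
        self.slots.entry((inode, block)).or_insert_with(|| {
            // On a miss, a real kernel reads the block from disk here.
            [0u8; PAGE_SIZE]
        })
    }
}

/// Resolve a page fault on a file-backed mapping from the page cache.
fn handle_fault(cache: &mut PageCache, inode: u64, map_offset: usize, fault_offset: usize) {
    // Translate the faulting offset into a file block number.
    let block = ((map_offset + fault_offset) / PAGE_SIZE) as u64;
    // The same cached page backs both file I/O and the mapping, so no
    // private copy is made; the kernel would now map this page's frame
    // into the process's page table.
    let _page = cache.get_or_load(inode, block);
}
```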
To maintain consistency and durability, the page cache uses a write-back policy. Modifications to cached file data are initially applied in memory, and the affected pages are marked as dirty. These pages are later flushed to disk, either explicitly via the fsync() system call or automatically by background kernel threads. This approach optimizes performance by deferring costly write operations, while still ensuring data persistence.
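A minimal sketch of this dirty-tracking scheme, again with placeholder types: writes touch only memory and set a dirty bit, and an fsync-style pass flushes whatever is dirty.

```rust
use std::collections::HashMap;

const PAGE_SIZE: usize = 4096;

struct Slot {
    data: [u8; PAGE_SIZE],
    dirty: bool,
}

struct PageCache {
    slots: HashMap<u64, Slot>, // keyed by file block number
}

impl PageCache {
    /// Apply a write in memory only; persistence is deferred.
    fn write(&mut self, block: u64, offset: usize, bytes: &[u8]) {
        let slot = self.slots.entry(block).or_insert(Slot {
            data: [0; PAGE_SIZE],
            dirty: false,
        });
        slot.data[offset..offset + bytes.len()].copy_from_slice(bytes);
        slot.dirty = true; // remember that the on-disk copy is now stale
    }

    /// fsync-style flush: push every dirty slot back to disk.
    /// `write_block` stands in for the actual disk-write path.
    fn sync(&mut self, mut write_block: impl FnMut(u64, &[u8; PAGE_SIZE])) {
        for (block, slot) in self.slots.iter_mut().filter(|(_, s)| s.dirty) {
            write_block(*block, &slot.data);
            slot.dirty = false;
        }
    }
}
```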
Since memory is a limited resource, the kernel must eventually evict pages from the page cache to make room for new data. This requires a cache eviction policy that decides which pages to reclaim based on usage patterns—commonly using heuristics such as Least Recently Used (LRU). Eviction is a critical aspect of page cache design, as it balances memory pressure with the goal of retaining useful data in cache to maximize performance.
The page cache also enables readahead, a performance optimization that preemptively loads file pages into memory based on access patterns. When the kernel detects sequential file access, it predicts future reads and asynchronously fetches additional pages into the cache. This significantly reduces the number of page faults and improves I/O latency, making readahead particularly effective for streaming large files or reading large datasets.
Implementing a page cache is a critical step toward building a modern, high-performance operating system. It not only enhances file system efficiency but also provides insight into how the OS bridges the performance gap between fast volatile memory and slow persistent storage.
§Page Cache in KeOS
Your goal is to extend KeOS with a page cache that bridges file I/O operations with memory-backed caching. The template provides three major abstractions (a hypothetical sketch of their shapes follows this list):

- PageCacheInner: An internal wrapper that coordinates low-level interactions between the cache and the storage layer.
- PageCacheState: The high-level cache manager. It embeds an LRUCache keyed by (InodeNumber, FileBlockNumber) and manages up to 512 slots (~2MiB). It is the central entry point for page-cache-aware file I/O.
- Slot: A Slot represents a single cached file block. It contains the owning RegularFile, the corresponding file block number (FBA), the backing Page, and metadata such as the write-back size. Dirty slots track modifications and are eventually flushed back to disk.
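The following is only a guess at how these abstractions might fit together, derived from the descriptions above; the actual field names, types, and layout are defined by the KeOS template.

```rust
use std::collections::HashMap;

type InodeNumber = u64;
type FileBlockNumber = u64;

struct Page;        // a 4 KiB physical page (placeholder)
struct RegularFile; // handle to the owning file (placeholder)

/// Internal wrapper coordinating cache and storage (details elided).
struct PageCacheInner {}

/// One cached file block (field names are hypothetical).
struct Slot {
    file: RegularFile,     // owning file
    fba: FileBlockNumber,  // which block of the file this caches
    page: Page,            // backing memory
    writeback_size: usize, // how many bytes to flush on writeback
    dirty: bool,           // modified since the last flush?
}

/// Stand-in for the template's LRUCache: up to 512 slots (~2 MiB).
struct LruCache {
    capacity: usize,
    entries: HashMap<(InodeNumber, FileBlockNumber), Slot>,
    // ...plus recency bookkeeping, elided here.
}

/// High-level manager and entry point for cache-aware file I/O.
struct PageCacheState {
    cache: LruCache,
}
```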
§Readahead Policy
KeOS employs a simple readahead policy: when a file block is read, the cache preemptively loads up to 16 subsequent blocks. This heuristic is designed to optimize sequential access workloads (e.g., file scans or streaming), reducing future read latency and improving throughput. Random workloads remain unaffected, since readahead is limited and opportunistic.
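As an illustration, a readahead pass under this policy might look like the sketch below; READAHEAD_WINDOW, is_cached, and load_async are hypothetical names standing in for the template’s actual helpers.

```rust
const READAHEAD_WINDOW: u64 = 16;

/// After serving block `fba`, prefetch up to the next 16 blocks
/// that exist in the file and are not yet cached.
fn readahead(
    fba: u64,
    file_blocks: u64, // total number of blocks in the file
    is_cached: impl Fn(u64) -> bool,
    mut load_async: impl FnMut(u64),
) {
    // Random workloads touch few sequential successors, so a
    // mispredicted window wastes at most 16 block reads.
    for next in (fba + 1)..(fba + 1 + READAHEAD_WINDOW).min(file_blocks) {
        if !is_cached(next) {
            load_async(next);
        }
    }
}
```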
§Cache Replacement: LRU
PageCacheState relies on a Least-Recently-Used (LRU) policy to manage memory pressure. When the cache reaches capacity, the least recently used slot is evicted. If the slot is dirty, its contents are flushed back to disk before eviction. This policy balances simplicity and efficiency by retaining hot (recently accessed) pages while discarding cold ones. All these functionalities are provided by the LRUCache struct.
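For intuition, here is a minimal LRU cache with flush-before-evict behavior. It is deliberately simple (linear-time touch, no duplicate-key handling) and only mirrors the spirit of the template’s LRUCache, not its interface.

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

struct Lru<K: Copy + Eq + Hash, V> {
    capacity: usize,
    map: HashMap<K, V>,
    order: VecDeque<K>, // front = least recently used
}

impl<K: Copy + Eq + Hash, V> Lru<K, V> {
    /// Record an access: move the key to the most-recently-used position.
    fn touch(&mut self, key: K) {
        if let Some(pos) = self.order.iter().position(|k| *k == key) {
            self.order.remove(pos);
            self.order.push_back(key);
        }
    }

    /// Insert a new entry, evicting the coldest one at capacity.
    /// `flush` stands in for writeback of a dirty victim.
    fn insert(&mut self, key: K, value: V, mut flush: impl FnMut(&K, &V)) {
        if self.map.len() == self.capacity {
            if let Some(old) = self.order.pop_front() {
                if let Some(victim) = self.map.remove(&old) {
                    // Flush before the slot disappears from the cache.
                    flush(&old, &victim);
                }
            }
        }
        self.map.insert(key, value);
        self.order.push_back(key);
    }
}
```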
§Workflow
- Read: On a read request, the cache checks for an existing slot. If present, data is served from memory; otherwise, the block is loaded from disk, inserted into the cache, and readahead is triggered.
- Write: Writes update the cached slot in place. The slot is marked dirty and write-back occurs lazily, either via explicit sync or eviction.
- mmap: Pages can be directly mapped into user space from the page cache. Faults are resolved by pulling in the corresponding slot.
- Unlink: When a file is deleted, all its slots are invalidated without flushing, ensuring consistency with the file system state (a sketch of this invalidation follows the list).
- Writeback: Dirty slots are flushed either explicitly (via fsync) or opportunistically during eviction. This ensures persistence while reducing redundant disk I/O.
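The unlink step above is the one path that discards cached data instead of flushing it. A sketch of that invalidation over a (InodeNumber, FileBlockNumber)-keyed map, with placeholder types:

```rust
use std::collections::HashMap;

/// Drop every slot belonging to the deleted inode *without* writeback,
/// since the data no longer has a destination on disk.
fn invalidate_inode<V>(slots: &mut HashMap<(u64, u64), V>, inode: u64) {
    // retain() keeps only entries whose key does not match the dead inode;
    // removed values are dropped here, bypassing the dirty-flush path.
    slots.retain(|(ino, _fba), _| *ino != inode);
}
```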
The following diagram depicts the workflow of the page cache subsystem of KeOS.
+-------------------------------+
|        Process issues         |
|  read(), write(), or mmap()   |
+---------------+---------------+
                |
                v
       +-------------------+
       |   Check Slot in   |
       |  PageCacheState   |
       +--------+----------+
       Hit |              | Miss
           |              |
           v              v
     +------------+   +----------------------+
     | Serve data |   | Load block from disk |
     | from cache |   +-----------+----------+
     +-----+------+               |
           |                      v
           |          +----------------------+
           |          | Insert block as Slot |
           |          |    into LRUCache     |
           |          |  Trigger readahead   |
           |          +-----------+----------+
           |                      |
           +----------------------+
                      |
                      v
             +-------------------+
             |  Is this a write? |
             +--------+----------+
                      | Yes
                      v
          +-------------------------+
          | Mark Slot dirty (defer  |
          |   writeback to disk)    |
          +-------------------------+

§Implementation Requirements
You need to implement the following:

- PageCacheState::readahead
- PageCacheState::do_read
- PageCacheState::do_write
- PageCacheState::do_mmap
- Slot::read_page
- Slot::write_page
- Slot::writeback
- Slot::drop
- PageCache::read
After implementing these functionalities, move on to the next section.
Modules§
- overlaying - An overlaying mechanism for applying the page cache to any file system.
Structs§
- PageCache - A reference-counted handle to the page cache.
- PageCacheInner - Internal representation of a PageCache.
- PageCacheState - The global page cache state.
- Slot - A single entry in the page cache.