Module page_cache

§Page Cache

The page cache is a fundamental component of the operating system’s file system infrastructure. It functions as a high-speed, memory-resident buffer that caches frequently accessed file data, thereby significantly reducing the overhead of disk I/O operations.

When a process reads from a file, the kernel first checks whether the requested data is present in the page cache. If it is, the data is served directly from memory, bypassing the much slower disk access path. If not, the system fetches the data from disk, inserts it into the cache, and then returns it to the process. This mechanism greatly enhances system responsiveness and throughput, especially in workloads involving repeated or sequential file access patterns.
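
The read path can be pictured as a read-through lookup. Below is a minimal, self-contained sketch in plain Rust; `Disk`, `read_block`, and `BLOCK_SIZE` are hypothetical stand-ins for the real storage layer, not KeOS APIs:

```rust
use std::collections::HashMap;

const BLOCK_SIZE: usize = 4096; // assumed block size for illustration

/// Stand-in for the storage layer; a real kernel would issue disk I/O here.
struct Disk;

impl Disk {
    fn read_block(&self, block: u64) -> [u8; BLOCK_SIZE] {
        [block as u8; BLOCK_SIZE] // dummy payload so the sketch runs
    }
}

struct ReadThroughCache {
    disk: Disk,
    cached: HashMap<u64, [u8; BLOCK_SIZE]>,
}

impl ReadThroughCache {
    /// Serve from memory on a hit; on a miss, fetch the block from
    /// disk, insert it into the cache, then serve it.
    fn read(&mut self, block: u64) -> &[u8; BLOCK_SIZE] {
        if !self.cached.contains_key(&block) {
            let data = self.disk.read_block(block);
            self.cached.insert(block, data);
        }
        &self.cached[&block]
    }
}

fn main() {
    let mut cache = ReadThroughCache { disk: Disk, cached: HashMap::new() };
    let first = cache.read(7)[0];  // miss: goes to "disk"
    let second = cache.read(7)[0]; // hit: served from memory
    assert_eq!(first, second);
}
```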

Because disk I/O is orders of magnitude slower than memory access, the page cache is essential for delivering high performance in modern operating systems. Without it, every file read or write would necessitate direct disk access, resulting in substantial latency and performance degradation.

Once the page cache is introduced, file data can be directly mapped into a process’s virtual memory space through mmap(). Instead of allocating memory and copying file contents manually, mmap() establishes a direct mapping to pages backed by the page cache. This integration enables efficient, demand-paged file I/O: page faults during memory access are resolved by loading data from the page cache. If the page is not yet cached, the kernel fetches it from disk into the cache, then maps it into the process. When mapped pages are modified, they are marked as dirty, and later written back to disk as part of the page cache’s write-back policy. This mechanism ensures consistency while minimizing redundant I/O operations.
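
A page fault on a file-backed mapping can then be resolved entirely through the cache. The following is a schematic user-space simulation, with hypothetical names (`handle_fault`, `lookup_or_fill`) and `HashMap`s standing in for the page table and the cache; it is not actual fault-handling code:

```rust
use std::collections::HashMap;

const PAGE_SIZE: usize = 4096;

/// Stand-in for the page-cache lookup: return the cached page for a
/// file block, filling it (as if from disk) on a miss.
fn lookup_or_fill(cache: &mut HashMap<u64, Vec<u8>>, fba: u64) -> Vec<u8> {
    cache
        .entry(fba)
        .or_insert_with(|| vec![fba as u8; PAGE_SIZE]) // disk read would go here
        .clone()
}

/// Schematic fault handler for an mmap'd file: translate the faulting
/// virtual page number to a file block, resolve it through the cache,
/// and install it into the (simulated) page table.
fn handle_fault(
    page_table: &mut HashMap<u64, Vec<u8>>,
    cache: &mut HashMap<u64, Vec<u8>>,
    vpn: u64,
    mapping_base_fba: u64,
) {
    let fba = mapping_base_fba + vpn;
    let page = lookup_or_fill(cache, fba);
    page_table.insert(vpn, page);
}

fn main() {
    let (mut page_table, mut cache) = (HashMap::new(), HashMap::new());
    handle_fault(&mut page_table, &mut cache, 3, 100); // faults in file block 103
    assert!(cache.contains_key(&103) && page_table.contains_key(&3));
}
```

In a real kernel the mapping shares the cached frame rather than copying it; the clone here only keeps the simulation simple.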

To maintain consistency and durability, the page cache uses a write-back policy. Modifications to cached file data are initially applied in memory, and the affected pages are marked as dirty. These pages are later flushed to disk, either explicitly via the fsync() system call or automatically by background kernel threads. This approach optimizes performance by deferring costly write operations, while still ensuring data persistence.
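
In code, the policy amounts to mutating the cached copy and setting a flag. A minimal sketch, with a hypothetical `WriteBackCache` and a `println!` standing in for the actual disk write:

```rust
use std::collections::HashMap;

const BLOCK_SIZE: usize = 4096;

struct Slot {
    data: [u8; BLOCK_SIZE],
    dirty: bool,
}

struct WriteBackCache {
    slots: HashMap<u64, Slot>,
}

impl WriteBackCache {
    /// Writes land in memory only; the slot is just marked dirty.
    /// (Caller keeps offset + bytes.len() within the block.)
    fn write(&mut self, fba: u64, offset: usize, bytes: &[u8]) {
        let slot = self.slots.entry(fba).or_insert(Slot {
            data: [0; BLOCK_SIZE],
            dirty: false,
        });
        slot.data[offset..offset + bytes.len()].copy_from_slice(bytes);
        slot.dirty = true; // deferred: no disk I/O yet
    }

    /// fsync-style flush: push every dirty slot to disk, then clear the flag.
    fn sync(&mut self) {
        for (fba, slot) in self.slots.iter_mut().filter(|(_, s)| s.dirty) {
            println!("flushing block {fba} to disk"); // real disk write goes here
            slot.dirty = false;
        }
    }
}

fn main() {
    let mut cache = WriteBackCache { slots: HashMap::new() };
    cache.write(0, 0, b"hello"); // fast: memory only
    cache.sync();                // slow path deferred until here
}
```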

Since memory is a limited resource, the kernel must eventually evict pages from the page cache to make room for new data. This requires a cache eviction policy that decides which pages to reclaim based on usage patterns—commonly using heuristics such as Least Recently Used (LRU). Eviction is a critical aspect of page cache design, as it balances memory pressure with the goal of retaining useful data in cache to maximize performance.

The page cache also enables readahead, a performance optimization that preemptively loads file pages into memory based on access patterns. When the kernel detects sequential file access, it predicts future reads and asynchronously fetches additional pages into the cache. This significantly reduces the number of page faults and improves I/O latency, making readahead particularly effective for streaming large files or reading large datasets.

Implementing a page cache is a critical step toward building a modern, high-performance operating system. It not only enhances file system efficiency but also provides insight into how the OS bridges the performance gap between fast volatile memory and slow persistent storage.

§Page Cache in KeOS

Your goal is to extend KeOS with a page cache that bridges file I/O operations and memory-backed caching. The template provides three major abstractions (a structural sketch in code follows the list):

  • PageCacheInner: An internal wrapper that coordinates low-level interactions between the cache and the storage layer.

  • PageCacheState: This is the high-level cache manager. It embeds an LRUCache keyed by (InodeNumber, FileBlockNumber) and manages up to 512 slots (~2MiB). It is the central entry point for page-cache-aware file I/O.

  • Slot: A Slot represents a single cached file block. It contains the owning RegularFile, the corresponding file block number (FBA), the backing Page, and metadata such as the write-back size. Dirty slots track modifications and are eventually flushed back to disk.
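
The shapes below are inferred purely from the descriptions above: every field and type name is a guess for illustration, and the actual KeOS template will differ in detail.

```rust
#![allow(dead_code)] // skeleton only

type InodeNumber = u64;     // assumed representation
type FileBlockNumber = u64; // assumed representation

const CACHE_SLOTS: usize = 512; // 512 slots * 4 KiB pages ~= 2 MiB

struct Page([u8; 4096]); // stand-in for the kernel's Page type
struct RegularFile;      // stand-in for the owning file handle

/// One cached file block, as described above.
struct Slot {
    file: RegularFile,     // owning RegularFile
    fba: FileBlockNumber,  // which file block this slot caches
    page: Page,            // the backing memory page
    writeback_size: usize, // bytes to write back when flushed
    dirty: bool,           // set on modification, cleared on flush
}

/// Placeholder for the template's LRUCache (see the LRU section below).
struct LRUCache<K, V> {
    entries: Vec<(K, V)>,
}

/// The high-level cache manager, keyed by (inode, file block).
struct PageCacheState {
    lru: LRUCache<(InodeNumber, FileBlockNumber), Slot>,
}

fn main() {
    println!("capacity: {CACHE_SLOTS} slots");
}
```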

§Readahead Policy

KeOS employs a simple readahead policy: when a file block is read, the cache preemptively loads up to 16 subsequent blocks. This heuristic is designed to optimize sequential access workloads (e.g., file scans or streaming), reducing future read latency and improving throughput. Random workloads remain unaffected, since readahead is limited and opportunistic.
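
The window computation itself is simple. A sketch of how the prefetch targets might be chosen, where `READAHEAD_WINDOW`, `readahead_targets`, and the `is_cached` predicate are illustrative names rather than KeOS APIs:

```rust
const READAHEAD_WINDOW: u64 = 16; // KeOS prefetches up to 16 blocks ahead

/// After serving `fba`, pick the next blocks to prefetch, stopping at
/// end of file and skipping blocks that are already resident.
fn readahead_targets(fba: u64, file_blocks: u64, is_cached: impl Fn(u64) -> bool) -> Vec<u64> {
    (fba + 1..=fba + READAHEAD_WINDOW)
        .take_while(|&b| b < file_blocks)
        .filter(|&b| !is_cached(b))
        .collect()
}

fn main() {
    // A 20-block file: reading block 0 prefetches blocks 1..=16,
    // minus the two that are already cached.
    let cached = [3u64, 5];
    let targets = readahead_targets(0, 20, |b| cached.contains(&b));
    println!("{targets:?}"); // [1, 2, 4, 6, 7, ..., 16]
}
```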

§Cache Replacement: LRU

PageCacheState relies on a Least-Recently-Used (LRU) policy to manage memory pressure. When the cache reaches capacity, the least recently used slot is evicted. If the slot is dirty, its contents are flushed back to disk before eviction. This policy balances simplicity and efficiency by retaining hot (recently accessed) pages while discarding cold ones. All these functionalities are provided by the LRUCache struct.
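
A toy version of the eviction rule, with a `println!` standing in for the write-back; the template's LRUCache is generic over keys and values, but the flush-before-evict behavior sketched here is the same idea:

```rust
use std::collections::{HashSet, VecDeque};

/// Toy LRU keyed by block number.
struct LruSketch {
    order: VecDeque<u64>, // front = least recently used
    dirty: HashSet<u64>,
    capacity: usize,
}

impl LruSketch {
    fn touch(&mut self, fba: u64) {
        self.order.retain(|&b| b != fba); // move to most-recent position
        self.order.push_back(fba);
        if self.order.len() > self.capacity {
            let victim = self.order.pop_front().unwrap();
            if self.dirty.remove(&victim) {
                println!("flush block {victim} before eviction"); // write-back here
            }
        }
    }
}

fn main() {
    let mut lru = LruSketch { order: VecDeque::new(), dirty: HashSet::new(), capacity: 2 };
    lru.dirty.insert(1);
    lru.touch(1);
    lru.touch(2);
    lru.touch(3); // capacity exceeded: block 1 is LRU and dirty, so it flushes
}
```

A production LRU typically pairs a map with an intrusive list for O(1) touches; the linear retain above just keeps the sketch short.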

§Workflow

  1. Read: On a read request, the cache checks for an existing slot. If present, data is served from memory; otherwise, the block is loaded from disk, inserted into the cache, and readahead is triggered.

  2. Write: Writes update the cached slot in place. The slot is marked dirty and write-back occurs lazily, either via explicit sync or eviction.

  3. mmap: Pages can be directly mapped into user space from the page cache. Faults are resolved by pulling in the corresponding slot.

  4. Unlink: When a file is deleted, all its slots are invalidated without flushing, ensuring consistency with the file system state (sketched after this list).

  5. Writeback: Dirty slots are flushed either explicitly (via fsync) or opportunistically during eviction. This ensures persistence while reducing redundant disk I/O.
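
Unlink (item 4) is the one path that drops cached data without flushing it. A minimal sketch over a plain map, assuming the (InodeNumber, FileBlockNumber) keying described earlier:

```rust
use std::collections::HashMap;

type InodeNumber = u64;
type FileBlockNumber = u64;

struct Slot; // contents omitted; see the skeleton above

/// On unlink, drop every slot belonging to the inode without
/// write-back: the file is gone, so flushing its dirty pages
/// would be wasted work.
fn invalidate_inode(
    cache: &mut HashMap<(InodeNumber, FileBlockNumber), Slot>,
    ino: InodeNumber,
) {
    cache.retain(|&(slot_ino, _), _| slot_ino != ino);
}

fn main() {
    let mut cache = HashMap::from([((1, 0), Slot), ((1, 1), Slot), ((2, 0), Slot)]);
    invalidate_inode(&mut cache, 1);
    assert_eq!(cache.len(), 1); // only inode 2's slot survives
}
```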

The following diagram depicts the workflow of the page cache subsystem of KeOS.

           +-------------------------------+
           |       Process issues          |
           |  read(), write(), or mmap()   |
           +---------------+---------------+
                           |
                           v
                +-------------------+
                |  Check Slot in    |
                |  PageCacheState   |
                +--------+----------+
                 Hit            |   Miss
                  |             |
                  v             v
           +------------+   +----------------------+
           | Serve data |   | Load block from disk |
           | from cache |   +-----------+----------+
           +-----+------+               |
                 |                      v
                 |             +----------------------+
                 |             | Insert block as Slot |
                 |             |  into LRUCache       |
                 |             |  Trigger readahead   |
                 |             +-----------+----------+
                 |                         |
                 +-------------------------+
                 |
                 v
          +-------------------+
          | Is this a write?  |
          +--------+----------+
                 Yes
                  |
                  v
    +---------------+---------+
    | Mark Slot dirty (defer  |
    | writeback to disk)      |
    +-----------+-------------+

§Implementation Requirements

You need to implement the following functionalities:

After implementing them, move on to the next section.

Modules§

overlaying
An overlaying mechanism for applying the page cache to any file system.

Structs§

PageCache
A reference-counted handle to the page cache.
PageCacheInner
Internal representation of a PageCache.
PageCacheState
The global page cache state.
Slot
A single entry in the page cache.