§File state of a process.
One of the kernel’s primary responsibilities is managing process state. A process is an instance of a program being executed; it abstracts a machine by encompassing various states such as memory allocation, CPU registers, and the files it operates on. These states are crucial for the kernel to allocate resources, prioritize tasks, and manage the process lifecycle (including creation, execution, suspension, and termination). The kernel handles system calls by evaluating the current state of the calling process, checking resource availability, and ensuring that the requested operation is carried out safely and efficiently. Among these states, this project focuses on the kernel’s interaction with the file system.
§File
A file primarily refers to an interface for accessing disk-based data. At its core, a file is a sequential stream of bytes. Most file systems have two primary types of files:
- Regular files: These contain user or system data, typically organized as a sequence of bytes. They can store text, binary data, executable code, and more. Regular files are the most common form of file used by applications for reading and writing data.
- Directories: A directory is a special kind of file that contains mappings from human-readable names (filenames) to other files or directories. Directories form the backbone of the file system’s hierarchical structure, allowing files to be organized and accessed via paths.
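To make the directory-as-mapping idea concrete, here is a minimal sketch of path resolution over a toy in-memory tree. The Node type and the resolve function are hypothetical illustrations, not part of KeOS; a real file system resolves each name against directory entries stored on disk rather than nested in-memory maps.

```rust
use std::collections::BTreeMap;

/// A toy file system node (hypothetical): either a regular file holding bytes,
/// or a directory mapping names to child nodes.
#[derive(Debug)]
enum Node {
    Regular(Vec<u8>),
    Directory(BTreeMap<String, Node>),
}

/// Resolves a slash-separated path such as "/home/notes.txt" by looking up
/// each component in the directory reached so far.
fn resolve<'a>(root: &'a Node, path: &str) -> Option<&'a Node> {
    let mut current = root;
    // Empty components (from leading, trailing, or repeated '/') are skipped.
    for component in path.split('/').filter(|c| !c.is_empty()) {
        match current {
            // A missing name ends the lookup with None via `?`.
            Node::Directory(entries) => current = entries.get(component)?,
            // A regular file has no children, so the path cannot continue.
            Node::Regular(_) => return None,
        }
    }
    Some(current)
}
```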
Processes interact with files through file descriptors, which serve as handles to open file objects. File descriptors provide an indirection layer that allows user programs to perform operations like reading, writing, seeking, and closing without exposing the internal details of file objects. This indirection also plays a crucial security role: the actual file objects reside in kernel space and are never directly accessible from user space. By using descriptors as opaque references, the operating system enforces strict isolation between user and kernel memory, preventing accidental or malicious tampering with sensitive kernel-managed resources.
File descriptors are small integer values, typically starting from 0, that index into the process’s file descriptor table. This table holds references to open file objects, including metadata like the file’s location, access mode (e.g., read or write), and other details necessary for I/O operations. When a process issues a file operation (e.g., reading, writing, or seeking), it provides the appropriate file descriptor as an argument to the system call. The kernel uses this descriptor to access the corresponding entry in the table and perform the requested operation.
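The descriptor-table indirection can be sketched as a map from small integers to kernel-side objects. The FdTable type and its methods are hypothetical, and the lowest-free-number allocation follows the POSIX open rule; whether KeOS's FileStruct::install_file must allocate descriptors the same way is an assumption to check against the skeleton code.

```rust
use std::collections::BTreeMap;

/// A toy per-process descriptor table (hypothetical): small integers index open objects.
struct FdTable<T> {
    entries: BTreeMap<usize, T>,
}

impl<T> FdTable<T> {
    fn new() -> Self {
        FdTable { entries: BTreeMap::new() }
    }

    /// Installs `obj` at the lowest unused descriptor and returns that number.
    fn install(&mut self, obj: T) -> usize {
        let mut fd = 0;
        // Keys iterate in ascending order, so the first gap is the lowest free fd.
        for &used in self.entries.keys() {
            if used != fd {
                break;
            }
            fd += 1;
        }
        self.entries.insert(fd, obj);
        fd
    }

    /// Looks up a descriptor supplied by user space; unknown values yield None
    /// instead of touching any kernel object.
    fn get(&self, fd: usize) -> Option<&T> {
        self.entries.get(&fd)
    }

    /// Removes the entry, so the descriptor number can be reused later.
    fn close(&mut self, fd: usize) -> Option<T> {
        self.entries.remove(&fd)
    }
}
```

Because user space only ever holds the integer, a bogus descriptor simply fails the lookup; it can never be turned into a pointer to kernel memory.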
§“Everything is a File”
Beyond disk-based data, the file abstraction is applied uniformly across a wide range of system resources. “Everything is a file” is a Unix-inspired design principle that simplifies system interaction by treating various resources, including devices, sockets, and processes, as files. While not an absolute rule, this philosophy influences many Unix-based systems, encouraging the representation of objects as file descriptors and enabling interaction through standard I/O operations. This approach provides a unified and consistent way to handle different types of system objects.
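One way to express "everything is a file" in Rust is a shared trait that different resources implement, so generic code can move bytes without knowing what is on either end. The names below (FileLike, MemFile, NullDevice, copy_all) are hypothetical and use trait objects; KeOS itself models open files with the FileKind enum, so treat this as an illustration of the principle rather than of KeOS's design.

```rust
/// A uniform byte-stream interface (hypothetical), in the spirit of "everything is a file".
trait FileLike {
    /// Reads up to `buf.len()` bytes; returns how many were read (0 means end-of-file).
    fn read(&mut self, buf: &mut [u8]) -> usize;
    /// Writes bytes from `buf`; returns how many were accepted.
    fn write(&mut self, buf: &[u8]) -> usize;
}

/// An in-memory "regular file" with a read cursor.
struct MemFile {
    data: Vec<u8>,
    pos: usize,
}

impl FileLike for MemFile {
    fn read(&mut self, buf: &mut [u8]) -> usize {
        let remaining = &self.data[self.pos.min(self.data.len())..];
        let n = remaining.len().min(buf.len());
        buf[..n].copy_from_slice(&remaining[..n]);
        self.pos += n;
        n
    }

    fn write(&mut self, buf: &[u8]) -> usize {
        // Append-only for simplicity; the read cursor is not affected.
        self.data.extend_from_slice(buf);
        buf.len()
    }
}

/// A device like /dev/null: reads hit end-of-file at once, writes are discarded.
struct NullDevice;

impl FileLike for NullDevice {
    fn read(&mut self, _buf: &mut [u8]) -> usize {
        0
    }
    fn write(&mut self, buf: &[u8]) -> usize {
        buf.len()
    }
}

/// Copies everything from `src` to `dst` without knowing what either one is.
/// Assumes `dst` accepts every byte it is given, as both types above do.
fn copy_all(src: &mut dyn FileLike, dst: &mut dyn FileLike) -> usize {
    let mut buf = [0u8; 4];
    let mut total = 0;
    loop {
        let n = src.read(&mut buf);
        if n == 0 {
            return total;
        }
        total += dst.write(&buf[..n]);
    }
}
```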
A key aspect of this principle is the existence of standard file descriptors:
- Standard Input (stdin) - File Descriptor 0: Used for reading input data (e.g., keyboard input or redirected file input).
- Standard Output (stdout) - File Descriptor 1: Used for writing output data (e.g., printing to the terminal or redirecting output to a file).
- Standard Error (stderr) - File Descriptor 2: Used for writing error messages separately from standard output.
Another important mechanism following this design is the pipe, which allows interprocess communication by connecting the output of one process to the input of another. Pipes function as a buffer between processes, facilitating seamless data exchange without requiring intermediate storage in a file. For example, executing:
ls | grep "file"
connects the ls command’s output to the grep command’s input through a pipe.
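A pipe can be modeled as a byte queue shared by a read end and a write end. This single-threaded sketch uses Rc<RefCell<...>> and hypothetical type names; unlike a real kernel pipe, it returns None instead of blocking when the buffer is empty, and it reports end-of-file once the write end has been dropped.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::rc::Rc;

/// Shared state of a toy pipe: a byte queue plus whether the write end is open.
struct PipeInner {
    buffer: VecDeque<u8>,
    writer_open: bool,
}

struct PipeReader(Rc<RefCell<PipeInner>>);
struct PipeWriter(Rc<RefCell<PipeInner>>);

/// Creates a connected (read end, write end) pair, like the pipe system call.
fn pipe() -> (PipeReader, PipeWriter) {
    let inner = Rc::new(RefCell::new(PipeInner { buffer: VecDeque::new(), writer_open: true }));
    (PipeReader(Rc::clone(&inner)), PipeWriter(inner))
}

impl PipeWriter {
    /// Appends all of `data` to the shared buffer (this toy buffer is unbounded).
    fn write(&self, data: &[u8]) -> usize {
        self.0.borrow_mut().buffer.extend(data.iter().copied());
        data.len()
    }
}

impl Drop for PipeWriter {
    // Closing the write end lets the reader observe end-of-file.
    fn drop(&mut self) {
        self.0.borrow_mut().writer_open = false;
    }
}

impl PipeReader {
    /// Returns Some(n) for n bytes read, Some(0) at end-of-file, or None when the
    /// buffer is empty but the writer is still open (a real kernel would block).
    fn read(&self, buf: &mut [u8]) -> Option<usize> {
        let mut inner = self.0.borrow_mut();
        if inner.buffer.is_empty() {
            return if inner.writer_open { None } else { Some(0) };
        }
        let n = buf.len().min(inner.buffer.len());
        for (slot, byte) in buf.iter_mut().zip(inner.buffer.drain(..n)) {
            *slot = byte;
        }
        Some(n)
    }
}
```

The bytes never touch a file: ls would hold the write end as its stdout and grep the read end as its stdin.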
§Files in KeOS
You need to extend KeOS to support the following file-related system calls:
- open: Open a file.
- read: Read data from a file.
- write: Write data to a file.
- close: Close an open file.
- seek: Set the file pointer to a specific position.
- tell: Get the current position of the file.
- pipe: Create an interprocess communication channel.
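Of the calls above, seek and tell are the ones that depend on a per-file position that read advances. The following minimal sketch assumes a POSIX-style interface (an offset plus a Whence of Set, Cur, or End); the OpenFile and Whence names and the string errors are hypothetical, and KeOS's actual FileStruct::seek signature may differ.

```rust
/// Reference point for a seek, mirroring POSIX SEEK_SET / SEEK_CUR / SEEK_END.
enum Whence {
    Set,
    Cur,
    End,
}

/// A toy open regular file (hypothetical): contents plus the current position.
struct OpenFile {
    data: Vec<u8>,
    pos: usize,
}

impl OpenFile {
    /// Reads from the current position and advances it by the bytes read.
    /// A position past the end reads 0 bytes and stays where it is.
    fn read(&mut self, buf: &mut [u8]) -> usize {
        let start = self.pos.min(self.data.len());
        let n = (self.data.len() - start).min(buf.len());
        buf[..n].copy_from_slice(&self.data[start..start + n]);
        self.pos += n;
        n
    }

    /// Moves the position; rejects results that would fall before offset 0.
    fn seek(&mut self, offset: i64, whence: Whence) -> Result<usize, &'static str> {
        let base = match whence {
            Whence::Set => 0,
            Whence::Cur => self.pos as i64,
            Whence::End => self.data.len() as i64,
        };
        let new_pos = base.checked_add(offset).ok_or("overflow")?;
        if new_pos < 0 {
            return Err("invalid offset");
        }
        self.pos = new_pos as usize;
        Ok(self.pos)
    }

    /// Reports the current position without changing it.
    fn tell(&self) -> usize {
        self.pos
    }
}
```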
To manage this state, KeOS keeps per-process file state in a struct
called FileStruct, which corresponds to the Linux kernel’s
struct files_struct. Through this struct, you manage the file
descriptors that represent open files within a process. Since many system
interactions are built around file descriptors, understanding this principle
will help you design efficient and flexible system call handlers for file
operations.
You need to implement the system call handlers on top of the FileStruct
struct, which manages the file state of a process. For example, it contains
the process’s current working directory (cwd) and a table of file descriptors,
which maps each file descriptor (fd) to a specific FileKind state. When
handling system calls, you must update this state accordingly, ensuring that
the correct file state is used for each operation. To store the mapping between
file descriptors and FileKind states, KeOS uses the BTreeMap provided by the
alloc::collections module. You might refer to the channel and
teletype modules when implementing stdio and channel I/O.
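Putting these pieces together, a FileStruct-like type holds the cwd and a BTreeMap from descriptors to file kinds, and each handler dispatches on the kind it finds. The simplified FileStruct, FileKind, and FsError below are hypothetical stand-ins with invented variants (console output is captured in a Vec so it can be inspected); they are not the KeOS definitions. Here std's BTreeMap plays the role of alloc::collections::BTreeMap, which it re-exports.

```rust
use std::collections::BTreeMap;

/// Hypothetical error values for this sketch (not KeOS's KernelError).
#[derive(Debug, PartialEq)]
enum FsError {
    BadFileDescriptor,
    NotWritable,
}

/// A simplified stand-in for FileKind: what a descriptor refers to.
enum FileKind {
    Stdin,
    Console(Vec<u8>), // captured output, standing in for the terminal
    Regular { data: Vec<u8>, pos: usize },
}

/// A simplified stand-in for FileStruct: the cwd plus the descriptor table.
struct FileStruct {
    cwd: String,
    files: BTreeMap<usize, FileKind>,
}

impl FileStruct {
    /// A new process starts with the three standard descriptors installed.
    fn new() -> Self {
        let mut files = BTreeMap::new();
        files.insert(0, FileKind::Stdin);
        files.insert(1, FileKind::Console(Vec::new())); // stdout
        files.insert(2, FileKind::Console(Vec::new())); // stderr
        FileStruct { cwd: String::from("/"), files }
    }

    /// Dispatches a write on whatever kind of object `fd` names.
    fn write(&mut self, fd: usize, buf: &[u8]) -> Result<usize, FsError> {
        match self.files.get_mut(&fd).ok_or(FsError::BadFileDescriptor)? {
            FileKind::Stdin => Err(FsError::NotWritable),
            FileKind::Console(out) => {
                out.extend_from_slice(buf);
                Ok(buf.len())
            }
            FileKind::Regular { data, pos } => {
                // Overwrite or extend at the current position, then advance it.
                let end = *pos + buf.len();
                if data.len() < end {
                    data.resize(end, 0);
                }
                data[*pos..end].copy_from_slice(buf);
                *pos = end;
                Ok(buf.len())
            }
        }
    }
}
```

An enum keeps every file kind visible in one match, so the compiler flags any handler that forgets a variant.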
As mentioned before, the kernel requires careful error handling. It must report errors in a stable and reliable manner without causing system crashes.
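A common way to keep failures from crashing the kernel is to make every handler return a Result and convert it to a return value at a single system call entry point. In this sketch the KernelError variants, the negative-errno convention, and the numeric codes (borrowed from Linux's EBADF, EFAULT, and EINVAL) are assumptions for illustration; consult KeOS for its actual error type and return convention.

```rust
/// Hypothetical kernel error type; variants and codes are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum KernelError {
    BadFileDescriptor,
    BadAddress,
    InvalidArgument,
}

impl KernelError {
    /// Linux errno numbering, used here only as an example convention.
    fn errno(self) -> isize {
        match self {
            KernelError::BadFileDescriptor => 9, // EBADF
            KernelError::BadAddress => 14,       // EFAULT
            KernelError::InvalidArgument => 22,  // EINVAL
        }
    }
}

/// Converts a handler's Result into a register value: non-negative on success,
/// negated errno on failure. No error path panics.
fn to_return_value(result: Result<usize, KernelError>) -> isize {
    match result {
        Ok(n) => n as isize,
        Err(e) => -e.errno(),
    }
}

/// A toy handler: closing a descriptor that is not open is an error, not a crash.
fn sys_close(open_fds: &mut Vec<usize>, fd: usize) -> Result<usize, KernelError> {
    let index = open_fds
        .iter()
        .position(|&f| f == fd)
        .ok_or(KernelError::BadFileDescriptor)?;
    open_fds.remove(index);
    Ok(0)
}
```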
§User Memory Access
The kernel MUST NOT trust user input. A user might maliciously or mistakenly pass invalid values as system call arguments. If such an argument is an invalid memory address or a kernel address, directly accessing it can crash the kernel or open a security vulnerability.
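The first line of defense is a range check on the raw address. This sketch assumes a hypothetical USER_TOP boundary matching the lower half of a 48-bit x86-64 address space, not a value taken from KeOS. A range check alone does not prove that the pages are actually mapped, which is part of what the uaccess types handle.

```rust
/// Hypothetical exclusive upper bound of user space (lower half of 48-bit x86-64).
const USER_TOP: u64 = 0x0000_8000_0000_0000;

/// Returns true only if [addr, addr + len) lies entirely in user space.
/// `checked_add` rejects ranges that would wrap around the address space,
/// and a null base address is rejected outright.
fn is_user_range(addr: u64, len: u64) -> bool {
    match addr.checked_add(len) {
        Some(end) => addr != 0 && end <= USER_TOP,
        None => false,
    }
}
```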
To safely interact with user-space memory when handling system calls, KeOS
provides the uaccess module:
- UserPtrRO: A read-only user-space pointer, used for safely retrieving structured data from user memory.
- UserPtrWO: A write-only user-space pointer, used for safely writing structured data back to user memory.
- UserCString: Reads null-terminated strings from user space (e.g., file paths).
- UserU8SliceRO: Reads byte slices from user space (e.g., the buffer passed to write, whose contents go into a file).
- UserU8SliceWO: Writes byte slices to user space (e.g., the buffer passed to read, which receives a file’s contents).
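Splitting read-only and write-only access into separate types lets the compiler reject a handler that writes through a buffer it should only read. The sketch below simulates user memory with a byte vector; UserMemory, SliceRo, SliceWo, and BadAddress are hypothetical names, not the KeOS uaccess API.

```rust
/// Simulated user address space (hypothetical): bytes mapped starting at `base`.
struct UserMemory {
    base: u64,
    bytes: Vec<u8>,
}

#[derive(Debug, PartialEq)]
struct BadAddress;

impl UserMemory {
    /// Translates [addr, addr + len) to an index range, or fails if any byte is unmapped.
    fn range(&self, addr: u64, len: usize) -> Result<std::ops::Range<usize>, BadAddress> {
        let offset = addr.checked_sub(self.base).ok_or(BadAddress)?;
        let end = offset.checked_add(len as u64).ok_or(BadAddress)?;
        if end > self.bytes.len() as u64 {
            return Err(BadAddress);
        }
        // Both bounds are now within bytes.len(), so the casts are lossless.
        Ok(offset as usize..end as usize)
    }
}

/// Read-only view of a user buffer: it has `read` but no `write` method.
struct SliceRo {
    addr: u64,
    len: usize,
}

/// Write-only view of a user buffer: it has `write` but no `read` method.
struct SliceWo {
    addr: u64,
    len: usize,
}

impl SliceRo {
    /// Copies the whole buffer out of user memory.
    fn read(&self, mem: &UserMemory) -> Result<Vec<u8>, BadAddress> {
        Ok(mem.bytes[mem.range(self.addr, self.len)?].to_vec())
    }
}

impl SliceWo {
    /// Copies `data` into user memory; `data` must fit in the declared length.
    fn write(&self, mem: &mut UserMemory, data: &[u8]) -> Result<usize, BadAddress> {
        if data.len() > self.len {
            return Err(BadAddress);
        }
        let range = mem.range(self.addr, data.len())?;
        mem.bytes[range].copy_from_slice(data);
        Ok(data.len())
    }
}
```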
These types help prevent unsafe memory access and ensure proper bounds
checking before performing read/write operations. When an error occurs during
the check, they return an Err containing KernelError::BadAddress. You can
simply combine the ? operator with these methods to propagate the error to
the system call entry. You should therefore never use unsafe code
directly for accessing user-space memory. Instead, use these safe
abstractions, which provide built-in validation and access control, reducing
the risk of undefined behavior, security vulnerabilities, and kernel
crashes.
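The ? pattern looks like this in a handler. This sketch reimplements a C-string read over simulated user memory; read_user_cstring, sys_open, and this KernelError enum are hypothetical stand-ins for UserCString and the real handlers, and the fixed fd 3 merely stands in for installing the opened file.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum KernelError {
    BadAddress,
    NoSuchFile,
}

/// Simulated user memory (an assumption standing in for real page tables).
struct UserMemory {
    base: u64,
    bytes: Vec<u8>,
}

/// Reads a NUL-terminated string at `addr`, failing with BadAddress if the
/// address is unmapped or the string runs off the mapped region without a NUL.
fn read_user_cstring(mem: &UserMemory, addr: u64) -> Result<String, KernelError> {
    let offset = addr.checked_sub(mem.base).ok_or(KernelError::BadAddress)?;
    if offset >= mem.bytes.len() as u64 {
        return Err(KernelError::BadAddress);
    }
    let tail = &mem.bytes[offset as usize..];
    let nul = tail.iter().position(|&b| b == 0).ok_or(KernelError::BadAddress)?;
    // Non-UTF-8 paths are rejected only to keep the sketch short.
    String::from_utf8(tail[..nul].to_vec()).map_err(|_| KernelError::BadAddress)
}

/// A toy open handler: every fallible step ends in `?`, so the first error
/// is returned to the system call entry instead of panicking.
fn sys_open(mem: &UserMemory, path_addr: u64, existing: &[&str]) -> Result<usize, KernelError> {
    let path = read_user_cstring(mem, path_addr)?;
    if !existing.iter().any(|&name| name == path) {
        return Err(KernelError::NoSuchFile);
    }
    Ok(3) // pretend the file was installed at fd 3
}
```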
§Implementation Requirements
You need to implement the following:
- FileStruct::install_file
- FileStruct::open
- FileStruct::read
- FileStruct::write
- FileStruct::seek
- FileStruct::tell
- FileStruct::close
- FileStruct::pipe
This completes Project 1.
§Structs
- File: The File struct represents an abstraction over a file descriptor in the operating system.
- FileDescriptor: Represents an index into a process’s file descriptor table.
- FileStruct: The FileStruct represents the filesystem state for a specific process, which corresponds to the Linux kernel’s struct files_struct.
§Enums
- FileKind: The type of a file in the filesystem.