Module file_struct


§File state of a process.

One of the kernel’s primary responsibilities is managing process state. A process is an instance of a program in execution; it abstracts a machine by encompassing various states, such as memory allocation, CPU registers, and the files it operates on. The kernel relies on this state to allocate resources, prioritize tasks, and manage the process lifecycle (creation, execution, suspension, and termination). When handling a system call, the kernel evaluates the current state of the calling process, checks resource availability, and ensures that the requested operation is carried out safely and efficiently. Among these responsibilities, this project focuses on the kernel’s interaction with the file system.

§File

A file primarily refers to an interface for accessing data stored on disk. At its core, a file is a sequential stream of bytes. Most file systems have two primary types of files:

  • Regular files: These contain user or system data, typically organized as a sequence of bytes. They can store text, binary data, executable code, and more. Regular files are the most common form of file used by applications for reading and writing data.

  • Directories: A directory is a special kind of file that contains mappings from human-readable names (filenames) to other files or directories. Directories form the backbone of the file system’s hierarchical structure, allowing files to be organized and accessed via paths.
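To make the idea of directories mapping names to files concrete, here is a minimal sketch of path resolution over a toy in-memory tree. The Node type and the resolve function are hypothetical illustrations, not KeOS’s file-system API, and the sketch deliberately ignores `.`, `..`, permissions, and symbolic links.

```rust
use std::collections::BTreeMap;

/// A toy file-system node (hypothetical, not KeOS's): a regular file holds
/// bytes, and a directory maps names to child nodes.
enum Node {
    Regular(Vec<u8>),
    Directory(BTreeMap<String, Node>),
}

/// Resolves `path` one component at a time, starting from `root`.
/// Returns None if a component is missing or if a regular file is
/// treated as a directory.
fn resolve<'a>(root: &'a Node, path: &str) -> Option<&'a Node> {
    let mut cur = root;
    // Empty components (from a leading, trailing, or doubled '/') are skipped.
    for name in path.split('/').filter(|c| !c.is_empty()) {
        match cur {
            Node::Directory(entries) => cur = entries.get(name)?,
            // A regular file has no children, so the lookup fails.
            Node::Regular(_) => return None,
        }
    }
    Some(cur)
}
```

Each path component is one name lookup in the current directory, which is why a hierarchy of directories is enough to give every file a path.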

Processes interact with files through file descriptors, which serve as handles to open file objects. File descriptors provide a layer of indirection that lets user programs read, write, seek, and close files without exposing the internal details of file objects. This indirection also plays a crucial security role: the actual file objects reside in kernel space and are never directly accessible from user space. By using descriptors as opaque references, the operating system enforces strict isolation between user and kernel memory, preventing accidental or malicious tampering with sensitive kernel-managed resources.

File descriptors are small integer values, typically starting from 0, that index into the process’s file descriptor table. This table holds references to open file objects, including metadata like the file’s location, access mode (e.g., read or write), and other details necessary for I/O operations. When a process issues a file operation (e.g., reading, writing, or seeking), it provides the appropriate file descriptor as an argument to the system call. The kernel uses this descriptor to access the corresponding entry in the table and perform the requested operation.
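The following minimal sketch shows how a small integer descriptor can index directly into a per-process table, and how the access mode recorded at open time gates an operation. FdTable, OpenFile, Mode, and FdError are hypothetical names chosen for illustration; KeOS itself keys its table with a BTreeMap rather than a Vec, as described later in this module.

```rust
/// Access mode recorded when a file is opened (hypothetical model).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Mode {
    ReadOnly,
    WriteOnly,
    ReadWrite,
}

/// Per-open-file metadata: current offset and access mode.
#[derive(Debug)]
struct OpenFile {
    offset: usize,
    mode: Mode,
}

#[derive(Debug, PartialEq)]
enum FdError {
    BadDescriptor,
    NotPermitted,
}

/// A descriptor table indexed directly by descriptor number;
/// `None` marks a closed slot.
struct FdTable {
    slots: Vec<Option<OpenFile>>,
}

impl FdTable {
    /// Resolves `fd` (an untrusted integer from user space) to its entry,
    /// rejecting negative, out-of-range, and closed descriptors.
    fn get_mut(&mut self, fd: i32) -> Result<&mut OpenFile, FdError> {
        let idx = usize::try_from(fd).map_err(|_| FdError::BadDescriptor)?;
        self.slots
            .get_mut(idx)
            .and_then(|slot| slot.as_mut())
            .ok_or(FdError::BadDescriptor)
    }

    /// Checks a write request against the access mode, then advances the offset.
    fn write(&mut self, fd: i32, len: usize) -> Result<usize, FdError> {
        let file = self.get_mut(fd)?;
        if file.mode == Mode::ReadOnly {
            return Err(FdError::NotPermitted);
        }
        // A real kernel would copy `len` bytes here; this sketch only moves the offset.
        file.offset += len;
        Ok(len)
    }
}
```

Because the descriptor is just an index, validating it (non-negative, in range, slot occupied) is the first thing every handler has to do before touching the entry.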

§“Everything is a File”

Beyond disk-backed data, the file abstraction is applied uniformly across a wide range of system resources. “Everything is a file” is a Unix-inspired design principle that simplifies system interaction by treating various resources, including devices, sockets, and processes, as files. While not an absolute rule, this philosophy influences many Unix-based systems, encouraging the representation of objects as file descriptors and enabling interaction through standard I/O operations. This approach provides a unified and consistent way to handle different types of system objects.
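One way to express this principle in Rust is a shared trait implemented by unrelated resources, so that generic code can move bytes between them without knowing what they are. The FileOps trait, Console, MemFile, and copy_all below are hypothetical illustrations of that idea, not KeOS types.

```rust
/// A hypothetical uniform interface: every resource exposes the same operations.
trait FileOps {
    fn read(&mut self, buf: &mut [u8]) -> usize;
    fn write(&mut self, buf: &[u8]) -> usize;
}

/// A toy device that records everything written to it (like a console).
struct Console {
    output: Vec<u8>,
}

impl FileOps for Console {
    fn read(&mut self, _buf: &mut [u8]) -> usize {
        0 // This toy console never has input available.
    }
    fn write(&mut self, buf: &[u8]) -> usize {
        self.output.extend_from_slice(buf);
        buf.len()
    }
}

/// A toy regular file backed by memory, read sequentially from `pos`.
struct MemFile {
    data: Vec<u8>,
    pos: usize,
}

impl FileOps for MemFile {
    fn read(&mut self, buf: &mut [u8]) -> usize {
        let n = buf.len().min(self.data.len() - self.pos);
        buf[..n].copy_from_slice(&self.data[self.pos..self.pos + n]);
        self.pos += n;
        n
    }
    fn write(&mut self, buf: &[u8]) -> usize {
        self.data.extend_from_slice(buf); // appends; this toy ignores `pos` on write
        buf.len()
    }
}

/// Copies everything from `src` to `dst` without knowing what either one is.
fn copy_all(src: &mut dyn FileOps, dst: &mut dyn FileOps) -> usize {
    let mut buf = [0u8; 4];
    let mut total = 0;
    loop {
        let n = src.read(&mut buf);
        if n == 0 {
            return total; // a zero-length read signals end of input
        }
        total += dst.write(&buf[..n]);
    }
}
```

For example, `copy_all(&mut file, &mut console)` streams a file to the console with the same loop that would copy between any two resources implementing the trait.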

A key aspect of this principle is the existence of standard file descriptors:

  • Standard Input (stdin) - File Descriptor 0: Used for reading input data (e.g., keyboard input or redirected file input).
  • Standard Output (stdout) - File Descriptor 1: Used for writing output data (e.g., printing to the terminal or redirecting output to a file).
  • Standard Error (stderr) - File Descriptor 2: Used for writing error messages separately from standard output.

Another important mechanism following this design is the pipe, which allows interprocess communication by connecting the output of one process to the input of another. Pipes function as a buffer between processes, facilitating seamless data exchange without requiring intermediate storage in a file. For example, executing:

ls | grep "file"

connects the ls command’s output to the grep command’s input through a pipe.
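A pipe can be modeled as a FIFO byte buffer shared by a read end and a write end. The sketch below is single-threaded and non-blocking and uses hypothetical names (pipe, PipeReader, PipeWriter); it is not how KeOS implements pipes, and a real kernel pipe must also block readers on an empty buffer and handle concurrent access.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::rc::Rc;

/// Shared byte queue standing in for the kernel's pipe buffer.
type Buffer = Rc<RefCell<VecDeque<u8>>>;

struct PipeWriter(Buffer);
struct PipeReader(Buffer);

/// Creates a connected (read end, write end) pair that share one buffer.
fn pipe() -> (PipeReader, PipeWriter) {
    let buf: Buffer = Rc::new(RefCell::new(VecDeque::new()));
    (PipeReader(buf.clone()), PipeWriter(buf))
}

impl PipeWriter {
    /// Appends all of `data` to the buffer and returns the byte count.
    fn write(&self, data: &[u8]) -> usize {
        self.0.borrow_mut().extend(data.iter().copied());
        data.len()
    }
}

impl PipeReader {
    /// Drains up to `buf.len()` bytes in FIFO order; returns the number read.
    fn read(&self, buf: &mut [u8]) -> usize {
        let mut q = self.0.borrow_mut();
        let n = buf.len().min(q.len());
        for (dst, src) in buf.iter_mut().zip(q.drain(..n)) {
            *dst = src;
        }
        n
    }
}
```

Writing a listing such as `"Cargo.toml\nfile.rs\nmain.rs\n"` into the write end, then reading it back and keeping only lines containing `file`, mimics `ls | grep "file"` without any intermediate file.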

§Files in KeOS

You need to extend KeOS to support the following system calls built on the file abstraction:

  • open: Open a file.
  • read: Read data from a file.
  • write: Write data to a file.
  • close: Close an open file.
  • seek: Set the file pointer to a specific position.
  • tell: Get the current position of the file.
  • pipe: Create an interprocess communication channel.
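To illustrate how read, write, seek, and tell share a single file position, here is a sketch over an in-memory regular file. The RegularFile type and the Whence enum (modeled on POSIX SEEK_SET, SEEK_CUR, and SEEK_END) are assumptions for illustration; the exact signatures of KeOS’s seek and tell handlers may differ.

```rust
/// Hypothetical seek origins, modeled on POSIX SEEK_SET / SEEK_CUR / SEEK_END.
enum Whence {
    Set,
    Cur,
    End,
}

/// An open regular file: its contents plus the position shared by all operations.
struct RegularFile {
    data: Vec<u8>,
    pos: usize,
}

impl RegularFile {
    /// Reads from the current position and advances it; returns 0 at or past EOF.
    fn read(&mut self, buf: &mut [u8]) -> usize {
        if self.pos >= self.data.len() {
            return 0;
        }
        let n = buf.len().min(self.data.len() - self.pos);
        buf[..n].copy_from_slice(&self.data[self.pos..self.pos + n]);
        self.pos += n;
        n
    }

    /// Writes at the current position, extending the file if needed.
    fn write(&mut self, buf: &[u8]) -> usize {
        let end = self.pos + buf.len();
        if end > self.data.len() {
            self.data.resize(end, 0); // zero-fills any gap left by seeking past EOF
        }
        self.data[self.pos..end].copy_from_slice(buf);
        self.pos = end;
        buf.len()
    }

    /// Moves the position; returns the new position, or None if it would be negative.
    fn seek(&mut self, offset: i64, whence: Whence) -> Option<usize> {
        let base = match whence {
            Whence::Set => 0,
            Whence::Cur => self.pos as i64,
            Whence::End => self.data.len() as i64,
        };
        let new = base.checked_add(offset)?;
        if new < 0 {
            return None;
        }
        self.pos = new as usize;
        Some(self.pos)
    }

    /// Reports the current position without changing it.
    fn tell(&self) -> usize {
        self.pos
    }
}
```

For a file containing `hello`, reading three bytes leaves the position at 3; seeking to one byte before the end and writing `O!` overwrites the last byte and extends the file to `hellO!`.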

To manage file state, KeOS keeps a per-process structure called FileStruct, which corresponds to the Linux kernel’s struct files_struct. Through this struct, you manage the file descriptors that represent open files within a process. Since many system interactions are built around file descriptors, understanding this principle will help you design efficient and flexible system call handlers for file operations.

You need to implement the system call handlers using the FileStruct struct, which manages the file state of a process. For example, it contains the process’s current working directory (cwd) and a file descriptor table that maps each file descriptor (fd) to its FileKind state. When handling a system call, you must update this state accordingly, ensuring that the correct file state is used for each operation. To store the mapping between file descriptors and FileKind states, KeOS uses the BTreeMap provided by the alloc::collections module. You may refer to the channel and teletype modules when implementing channel I/O and stdio, respectively.
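A minimal sketch of this layout, assuming simplified stand-ins for FileDescriptor and FileKind (the real KeOS definitions differ, and the variants here are hypothetical). It uses std::collections::BTreeMap, which is the same type that alloc::collections provides, so the code runs outside the kernel. New descriptors take the lowest unused number, following the POSIX rule for open.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for KeOS's FileDescriptor.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct FileDescriptor(i32);

/// Hypothetical stand-in for KeOS's FileKind.
#[derive(Debug, PartialEq)]
enum FileKind {
    Stdin,
    Stdout,
    Stderr,
    Regular { path: String, pos: usize },
}

struct FileStruct {
    cwd: String,
    files: BTreeMap<FileDescriptor, FileKind>,
}

impl FileStruct {
    /// A new process starts with descriptors 0, 1, and 2 bound to stdio.
    fn new() -> Self {
        let mut files = BTreeMap::new();
        files.insert(FileDescriptor(0), FileKind::Stdin);
        files.insert(FileDescriptor(1), FileKind::Stdout);
        files.insert(FileDescriptor(2), FileKind::Stderr);
        FileStruct { cwd: String::from("/"), files }
    }

    /// Installs `kind` at the lowest unused descriptor and returns that descriptor.
    fn install(&mut self, kind: FileKind) -> FileDescriptor {
        let mut fd = 0;
        // BTreeMap keys iterate in ascending order, so the first gap is the lowest free fd.
        for key in self.files.keys() {
            if key.0 != fd {
                break;
            }
            fd += 1;
        }
        let fd = FileDescriptor(fd);
        self.files.insert(fd, kind);
        fd
    }

    /// Removes the entry for `fd`; returns None if it was not open.
    fn close(&mut self, fd: FileDescriptor) -> Option<FileKind> {
        self.files.remove(&fd)
    }
}
```

An ordered map makes the lowest-free search a single ascending scan, and closing a descriptor leaves a gap that the next install reuses (closing fd 1 and then opening a file yields fd 1 again).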

As mentioned before, the kernel requires careful error handling. It must report errors in a stable and reliable manner, without crashing the system.
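As a sketch of this style, each handler returns a Result, and a single conversion at the system call boundary turns errors into return values instead of panicking. SysError, sys_close, and to_return_value are hypothetical, and the negated Linux errno encoding (EBADF = 9, EFAULT = 14, EINVAL = 22) is an assumption borrowed from Linux, not necessarily KeOS’s convention.

```rust
/// Hypothetical error type for this sketch; KeOS defines its own KernelError.
#[derive(Debug, PartialEq)]
enum SysError {
    BadFileDescriptor,
    BadAddress,
    InvalidArgument,
}

impl SysError {
    /// Linux errno numbers, used here only as an example encoding.
    fn errno(&self) -> isize {
        match self {
            SysError::BadFileDescriptor => 9, // EBADF
            SysError::BadAddress => 14,       // EFAULT
            SysError::InvalidArgument => 22,  // EINVAL
        }
    }
}

/// Converts a handler's result into the value returned to user space:
/// non-negative on success, a negated errno on failure.
fn to_return_value(result: Result<usize, SysError>) -> isize {
    match result {
        Ok(n) => n as isize,
        Err(e) => -e.errno(),
    }
}

/// A toy close(): an unknown descriptor is reported through Err, not a panic,
/// so one bad request cannot bring down the kernel.
fn sys_close(open_fds: &mut Vec<i32>, fd: i32) -> Result<usize, SysError> {
    let idx = open_fds
        .iter()
        .position(|&x| x == fd)
        .ok_or(SysError::BadFileDescriptor)?;
    open_fds.remove(idx);
    Ok(0)
}
```

Closing the same descriptor twice then yields 0 on the first call and -9 (EBADF under this encoding) on the second, while the kernel keeps running.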

§User Memory Access

The kernel MUST NOT trust user input. A user program might maliciously or mistakenly pass invalid values as system call arguments. If such a value is an invalid memory address or a kernel address, directly accessing it can crash the kernel or open security vulnerabilities.
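A first line of defense is a range check on the untrusted pointer and length. The sketch below assumes that user space ends at 0x0000_8000_0000_0000 (the top of the lower canonical half on x86-64 with 4-level paging); USER_END, BadAddress, and check_user_range are hypothetical names. A range check alone does not prove that the pages are mapped, so it complements, rather than replaces, the checks performed by the uaccess module.

```rust
/// Assumed exclusive upper bound of user-space addresses (x86-64, 4-level paging).
/// KeOS's actual memory layout may differ.
const USER_END: usize = 0x0000_8000_0000_0000;

#[derive(Debug, PartialEq)]
struct BadAddress;

/// Checks that [addr, addr + len) is non-null, does not wrap around the
/// address space, and lies entirely below USER_END.
fn check_user_range(addr: usize, len: usize) -> Result<(), BadAddress> {
    if addr == 0 {
        return Err(BadAddress);
    }
    // checked_add catches ranges that overflow past usize::MAX.
    let end = addr.checked_add(len).ok_or(BadAddress)?;
    if end > USER_END {
        return Err(BadAddress);
    }
    Ok(())
}
```

The overflow check matters: without it, a huge `len` could wrap `addr + len` back to a small value and make a kernel range look like a user range.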

To safely interact with user-space memory when handling system calls, KeOS provides the uaccess module:

  • UserPtrRO: A read-only user-space pointer, used for safely retrieving structured data from user memory.
  • UserPtrWO: A write-only user-space pointer, used for safely writing structured data back to user memory.
  • UserCString: Read null-terminated strings from user-space (e.g., file paths).
  • UserU8SliceRO: Read byte slices from user space (e.g., the source buffer of a write system call).
  • UserU8SliceWO: Write byte slices to user space (e.g., the destination buffer of a read system call).

These types help prevent unsafe memory access and ensure proper bounds checking before performing read or write operations. When a check fails, the methods return Err(KernelError::BadAddress), so you can simply combine the ? operator with these methods to propagate the error to the system call entry. You should therefore never use unsafe code directly to access user-space memory. Instead, use these safe abstractions, which provide built-in validation and access control, reducing the risk of undefined behavior, security vulnerabilities, and kernel crashes.
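The sketch below illustrates the ? pattern with a toy address space. UserMemory, read_cstring, ToyError, and sys_open are hypothetical stand-ins, not the uaccess API (whose UserCString reads through the real page tables); only the shape of the error propagation is meant to carry over.

```rust
/// Hypothetical model of a process address space: user memory is one byte
/// array mapped at `base`. (The real uaccess types go through page tables.)
struct UserMemory {
    base: usize,
    bytes: Vec<u8>,
}

/// Local stand-in error type; only BadAddress mirrors an error named in the text.
#[derive(Debug, PartialEq)]
enum ToyError {
    BadAddress,
    InvalidArgument,
    NoSuchEntry,
}

impl UserMemory {
    /// Copies a NUL-terminated string starting at `addr`. Fails with
    /// BadAddress if `addr` lies outside the mapping or no NUL follows it.
    fn read_cstring(&self, addr: usize) -> Result<String, ToyError> {
        let start = addr.checked_sub(self.base).ok_or(ToyError::BadAddress)?;
        let tail = self.bytes.get(start..).ok_or(ToyError::BadAddress)?;
        let len = tail.iter().position(|&b| b == 0).ok_or(ToyError::BadAddress)?;
        String::from_utf8(tail[..len].to_vec()).map_err(|_| ToyError::InvalidArgument)
    }
}

/// A toy open(): each `?` returns the error to the caller immediately, so the
/// rest of the handler only ever sees a validated path.
fn sys_open(mem: &UserMemory, path_addr: usize, known: &[&str]) -> Result<usize, ToyError> {
    let path = mem.read_cstring(path_addr)?;
    let idx = known
        .iter()
        .position(|p| *p == path)
        .ok_or(ToyError::NoSuchEntry)?;
    Ok(idx + 3) // pretend descriptors 0..=2 are already taken by stdio
}
```

A path pointer below the mapping, or one whose string runs off the end without a NUL byte, makes sys_open return BadAddress from its first line without any further work.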

§Implementation Requirements

You need to implement the following:

This completes project 1.

Structs§

File
The File struct represents an abstraction over a file descriptor in the operating system.
FileDescriptor
Represents an index into a process’s file descriptor table.
FileStruct
The FileStruct represents the filesystem state of a specific process, corresponding to the Linux kernel’s struct files_struct.

Enums§

FileKind
The type of a file in the filesystem.