keos_project2/loader/mod.rs
//! ## ELF Loading
//!
//! When you run a program on a modern operating system, the kernel needs to
//! know **how to take the program stored on disk and place it into memory so it
//! can start running**. The file format that describes this mapping is called
//! **ELF (Executable and Linkable Format)**.
//!
//! Think of ELF as a "blueprint" for a program’s memory layout. It tells the
//! kernel where each part of the program (code, data, uninitialized variables)
//! should go in memory, what permissions they need (read, write, execute), and
//! where the program should begin execution.
//!
//! An ELF file contains:
//! - **ELF header** – a small table of contents that points to the rest of the
//!   file and gives the program’s entry point (where execution starts).
//! - **Program headers** – a list of **segments**, each describing a chunk of
//!   the file that should be loaded into memory.
//!
//! In KeOS, we only care about the **program headers**, because they tell us
//! how to build the process’s memory image. Each program header (`Phdr`)
//! describes one segment:
//! - **Virtual address** ([`Phdr::p_vaddr`]) – Where in memory the segment
//!   should go.
//! - **Memory size** ([`Phdr::p_memsz`]) – How big the memory region should be
//!   in total.
//! - **File size** ([`Phdr::p_filesz`]) – How many bytes come from the file.
//! - **File offset** ([`Phdr::p_offset`]) – Where in the file the bytes are
//!   stored.
//! - **Permissions** ([`Phdr::p_flags`]) – What permissions the region needs.
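//!
//! As an illustration of the permissions field, the sketch below decodes a
//! raw `p_flags` word using the flag bits from the ELF specification
//! (`PF_X = 1`, `PF_W = 2`, `PF_R = 4`). The `SegmentPerm` type and the
//! `decode_flags` function are hypothetical names used only for this
//! example; they are not the KeOS `Permission` type or `Phdr::permission`.
//!
//! ```rust
//! /// ELF segment flag bits, as defined by the ELF specification.
//! const PF_X: u32 = 0x1;
//! const PF_W: u32 = 0x2;
//! const PF_R: u32 = 0x4;
//!
//! /// Hypothetical stand-in for a page permission set.
//! #[derive(Debug, Clone, Copy, PartialEq, Eq)]
//! pub struct SegmentPerm {
//!     pub read: bool,
//!     pub write: bool,
//!     pub execute: bool,
//! }
//!
//! /// Decodes the raw `p_flags` word into individual permission bits.
//! pub fn decode_flags(p_flags: u32) -> SegmentPerm {
//!     SegmentPerm {
//!         read: p_flags & PF_R != 0,
//!         write: p_flags & PF_W != 0,
//!         execute: p_flags & PF_X != 0,
//!     }
//! }
//! ```
//!
//! For example, `decode_flags(0x5)` describes a segment that is readable and
//! executable but not writable, which is typical for code.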
//!
//! You can iterate through the [`Phdr`]s with [`Elf::phdrs()`], which returns
//! an iterator over [`Phdr`] entries. The most important type of header is
//! [`PType::Load`], meaning “this segment must be loaded into memory.” To load
//! it, KeOS does the following:
//! 1. Allocate memory at the given virtual address using [`MmStruct::do_mmap`].
//! 2. Copy `p_filesz` bytes from the ELF file (starting at `p_offset`) into
//!    that memory.
//! 3. If `p_memsz > p_filesz`, fill the extra space with zeros. This is how
//!    ELF represents the **`.bss` section**, which holds uninitialized global
//!    variables.
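//!
//! The copy-then-zero-fill rule in steps 2 and 3 can be sketched on plain
//! byte buffers. This is only an illustration under simplifying assumptions:
//! `Segment` and `segment_image` are hypothetical names, and a `Vec<u8>`
//! stands in for the pages that KeOS would obtain through
//! `MmStruct::do_mmap`.
//!
//! ```rust
//! /// Hypothetical, simplified view of a loadable program header.
//! pub struct Segment {
//!     pub offset: usize, // corresponds to p_offset
//!     pub filesz: usize, // corresponds to p_filesz
//!     pub memsz: usize,  // corresponds to p_memsz
//! }
//!
//! /// Builds the in-memory image of one segment from the raw file bytes.
//! /// Returns `None` if the header is inconsistent with the file.
//! pub fn segment_image(file: &[u8], seg: &Segment) -> Option<Vec<u8>> {
//!     // A segment can never take more bytes from the file than it occupies.
//!     if seg.filesz > seg.memsz {
//!         return None;
//!     }
//!     let end = seg.offset.checked_add(seg.filesz)?;
//!     let src = file.get(seg.offset..end)?;
//!     // Start from an all-zero region; the untouched tail plays the role
//!     // of `.bss`.
//!     let mut image = vec![0u8; seg.memsz];
//!     image[..seg.filesz].copy_from_slice(src);
//!     Some(image)
//! }
//! ```
//!
//! With a six-byte file `[1, 2, 3, 4, 5, 6]` and a segment with offset 2,
//! file size 3, and memory size 5, the resulting image is `[3, 4, 5, 0, 0]`.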
//!
//! By repeating this for every loadable segment, the kernel reconstructs the
//! program’s expected memory image: code in `.text`, constants in `.rodata`,
//! variables in `.data`, and zero-initialized memory in `.bss`. When this is
//! done, the program’s virtual memory matches exactly what the compiler and
//! linker prepared, and the kernel can safely jump to the entry point to start
//! execution.
//!
//! There are some pitfalls when loading an ELF file:
//! - Mappings must start at a page-aligned address. If `p_vaddr` is not
//!   page-aligned, round it down to a page boundary and adjust the file
//!   offset and sizes accordingly.
//! - Ensure segments do not overwrite existing mappings like the stack or
//!   kernel memory.
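//!
//! The rounding in the first pitfall can be sketched as plain address
//! arithmetic. The sketch assumes 4 KiB pages; `PAGE_SIZE` and `page_span`
//! are hypothetical names for this example and are not part of the KeOS API.
//!
//! ```rust
//! /// Page size assumed for this sketch (4 KiB).
//! const PAGE_SIZE: usize = 0x1000;
//!
//! /// Returns `(page_base, in_page_offset, span)` for a segment that starts
//! /// at `vaddr` and occupies `memsz` bytes:
//! /// - `page_base` is `vaddr` rounded down to a page boundary,
//! /// - `in_page_offset` is how far `vaddr` lies into that page,
//! /// - `span` is the page-aligned number of bytes that must be mapped.
//! pub fn page_span(vaddr: usize, memsz: usize) -> (usize, usize, usize) {
//!     let in_page = vaddr & (PAGE_SIZE - 1);
//!     let base = vaddr - in_page;
//!     // The mapping now starts `in_page` bytes earlier, so it must cover
//!     // those extra bytes too; round the total up to whole pages.
//!     let span = (in_page + memsz + PAGE_SIZE - 1) & !(PAGE_SIZE - 1);
//!     (base, in_page, span)
//! }
//! ```
//!
//! With these definitions, `page_span(0x401234, 0x2000)` gives a base of
//! `0x401000`, an in-page offset of `0x234`, and a span of `0x3000` bytes.
//! The file offset has to be moved back by the same in-page offset so that
//! file bytes still land at their intended virtual addresses.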
//!
//! ## State on Program Startup
//!
//! The KeOS user-space C library (`kelibc`) defines `_start()`, located in
//! `kelibc/entry.c`, as the program entry point. It calls `main()` and
//! exits when `main()` returns. The kernel must set up the registers and the
//! user program's stack correctly before execution, passing arguments
//! according to the standard calling convention. [`Registers`] contains the
//! CPU state at program launch, including the instruction pointer, stack
//! pointer, and general-purpose registers.
//!
//! **Example command:** `/bin/ls -l foo bar`
//!
//! 1. Split the command into words: `"/bin/ls"`, `"-l"`, `"foo"`, `"bar"`.
//! 2. Copy the argument strings to the top of the stack (order does not
//!    matter).
//! 3. Align the stack pointer down to an 8-byte boundary for performance,
//!    then push the addresses of the strings so that `argv[0]` ends up at the
//!    lowest address and a null sentinel (`argv[argc] = NULL`) sits directly
//!    above `argv[argc - 1]`.
//! 4. Set `%rdi = argc` (argument count) and `%rsi = argv` (argument array).
//! 5. Push a fake return address so that the entry function sees the same
//!    stack layout as any other called function.
//!
//! **Example stack layout before execution:**
//!
//! | Address    | Name             | Data        | Type        |
//! | ---------- | ---------------- | ----------- | ----------- |
//! | 0x4747fffc | argv\[3\]\[...\] | 'bar\0'     | char\[4\]   |
//! | 0x4747fff8 | argv\[2\]\[...\] | 'foo\0'     | char\[4\]   |
//! | 0x4747fff5 | argv\[1\]\[...\] | '-l\0'      | char\[3\]   |
//! | 0x4747ffed | argv\[0\]\[...\] | '/bin/ls\0' | char\[8\]   |
//! | 0x4747ffe8 | word-align       | 0           | uint8_t\[\] |
//! | 0x4747ffe0 | argv\[4\]        | 0           | char *      |
//! | 0x4747ffd8 | argv\[3\]        | 0x4747fffc  | char *      |
//! | 0x4747ffd0 | argv\[2\]        | 0x4747fff8  | char *      |
//! | 0x4747ffc8 | argv\[1\]        | 0x4747fff5  | char *      |
//! | 0x4747ffc0 | argv\[0\]        | 0x4747ffed  | char *      |
//! | 0x4747ffb8 | return address   | 0           | void (*) () |
//!
//! The stack pointer (`%rsp`) is initialized to `0x4747ffb8`. The first two
//! argument registers, `%rdi` and `%rsi`, should be `4` and `0x4747ffc0`,
//! respectively. The user program stack always starts at `0x47480000` in KeOS,
//! and always grows downward.
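//!
//! The layout in the table can be reproduced with a small simulation. This
//! is a standalone sketch, not the KeOS `StackBuilder`: `SimStack`, `layout`,
//! `STACK_SIZE`, and the `Vec<u8>` backing store are hypothetical, and the
//! strings are pushed last-argument-first only to match the table (the order
//! of the strings themselves does not matter).
//!
//! ```rust
//! /// Top of the user stack in KeOS, as stated above.
//! const STACK_TOP: usize = 0x4748_0000;
//! /// Size of the simulated stack region (an assumption for this sketch).
//! const STACK_SIZE: usize = 0x1000;
//!
//! /// Hypothetical stack simulator backed by a byte vector.
//! pub struct SimStack {
//!     pub sp: usize,
//!     mem: Vec<u8>,
//! }
//!
//! impl SimStack {
//!     pub fn new() -> Self {
//!         SimStack { sp: STACK_TOP, mem: vec![0; STACK_SIZE] }
//!     }
//!
//!     /// Copies `bytes` just below the stack pointer; returns their address.
//!     pub fn push_bytes(&mut self, bytes: &[u8]) -> usize {
//!         self.sp -= bytes.len();
//!         let start = self.sp - (STACK_TOP - STACK_SIZE);
//!         self.mem[start..start + bytes.len()].copy_from_slice(bytes);
//!         self.sp
//!     }
//!
//!     /// Pushes `s` followed in memory by a NUL byte; returns its address.
//!     pub fn push_str(&mut self, s: &str) -> usize {
//!         self.push_bytes(&[0]); // terminator goes first: the stack grows down
//!         self.push_bytes(s.as_bytes())
//!     }
//!
//!     /// Pushes an 8-byte little-endian word.
//!     pub fn push_usize(&mut self, value: usize) -> usize {
//!         self.push_bytes(&(value as u64).to_le_bytes())
//!     }
//!
//!     /// Rounds the stack pointer down to a multiple of `to` (a power of two).
//!     pub fn align(&mut self, to: usize) {
//!         self.sp &= !(to - 1);
//!     }
//!
//!     /// Reads back the 8-byte word stored at `addr`.
//!     pub fn read_u64(&self, addr: usize) -> u64 {
//!         let start = addr - (STACK_TOP - STACK_SIZE);
//!         let mut buf = [0u8; 8];
//!         buf.copy_from_slice(&self.mem[start..start + 8]);
//!         u64::from_le_bytes(buf)
//!     }
//! }
//!
//! /// Lays out `args` on the stack and returns `(rsp, argc, argv)`.
//! pub fn layout(stack: &mut SimStack, args: &[&str]) -> (usize, usize, usize) {
//!     // Copy the strings, last argument first, so `argv[0]` lands lowest.
//!     let mut addrs = Vec::with_capacity(args.len());
//!     for arg in args.iter().rev() {
//!         addrs.push(stack.push_str(arg));
//!     }
//!     stack.align(8);
//!     stack.push_usize(0); // argv[argc] = NULL
//!     // `addrs` holds argv[argc - 1] down to argv[0]; pushing in that order
//!     // leaves argv[0] at the lowest address.
//!     for &addr in &addrs {
//!         stack.push_usize(addr);
//!     }
//!     let argv = stack.sp;
//!     let rsp = stack.push_usize(0); // fake return address
//!     (rsp, args.len(), argv)
//! }
//! ```
//!
//! Calling `layout` on a fresh `SimStack` with the four words of the example
//! command yields `rsp = 0x4747ffb8`, `argc = 4`, and `argv = 0x4747ffc0`,
//! matching the table.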
//!
//! [`StackBuilder`] is a utility for constructing user-space stacks. It
//! provides:
//!
//! - [`StackBuilder::push_usize`] – Pushes a `usize` value (e.g., pointers like
//!   `argv[]`).
//! - [`StackBuilder::push_str`] – Pushes a null-terminated string and returns
//!   its address.
//! - [`StackBuilder::align`] – Aligns the stack pointer for proper memory
//!   access.
//!
//! You can use these methods to set up the stack in
//! [`LoadContext::build_stack`].
//!
//! #### Launching a Process
//!
//! After loading the program into memory and setting up the stack, the kernel
//! switches to user mode and begins execution. This is done using
//! [`Registers::launch`], which causes the CPU to change the privilege level
//! to user mode and start executing from `rip`. You don't need to implement
//! this functionality; it is already implemented outside this module.
//!
//! ## Implementation Requirements
//! You need to implement the following:
//! - [`Phdr::permission`]
//! - [`StackBuilder::push_bytes`]
//! - [`LoadContext::load_phdr`]
//! - [`LoadContext::build_stack`]
//!
//! This ends project 2.
//!
//! [`Registers::launch`]: ../../keos/syscall/struct.Registers.html#method.launch

#[allow(dead_code)]
pub mod elf;
pub mod stack_builder;

use crate::{mm_struct::MmStruct, pager::Pager};
#[cfg(doc)]
use elf::Phdr;
use elf::{Elf, PType};
#[cfg(doc)]
use keos::mm::page_table::Permission;
use keos::{
    KernelError,
    addressing::{PAGE_MASK, Va},
    fs::RegularFile,
    syscall::Registers,
};
use stack_builder::StackBuilder;

/// A context that holds the necessary state for loading and initializing a user
/// program.
///
/// `LoadContext` is used while loading an ELF binary into memory. It
/// encapsulates both the memory layout for the program and its initial register
/// state, allowing the loader to fully prepare the user-space
/// execution context.
pub struct LoadContext<P: Pager> {
    /// Virtual memory layout for the new user program.
    pub mm_struct: MmStruct<P>,
    /// Initial CPU register values for the user process, including the
    /// instruction pointer.
    pub regs: Registers,
}

impl<P: Pager> LoadContext<P> {
    /// Loads program headers ([`Phdr`]s) from an ELF binary into memory.
    ///
    /// This function iterates over the ELF program headers and maps the
    /// corresponding segments into the process's memory space. It ensures
    /// that each segment is correctly mapped according to its permissions
    /// and alignment requirements.
    ///
    /// # Parameters
    /// - `elf`: The ELF binary representation containing program headers.
    ///
    /// # Returns
    /// - `Ok(())` on success, indicating that all segments were successfully
    ///   loaded.
    /// - `Err(KernelError)` if any error occurs during the loading process,
    ///   such as an invalid memory mapping, insufficient memory, or an
    ///   unsupported segment type.
    ///
    /// # Behavior
    /// - Iterates over all program headers using [`Elf::phdrs`].
    /// - Maps each segment into memory if its type is [`PType::Load`].
    /// - Applies appropriate memory permissions using [`Phdr::permission`].
    /// - Ensures proper alignment and memory allocation before mapping.
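    ///
    /// The tail-zeroing step at the end of this function can be illustrated
    /// in isolation. The sketch below is a standalone approximation:
    /// `PAGE_SIZE` and `zero_page_tail` are hypothetical names, 4 KiB pages
    /// are assumed, and a plain byte array stands in for a user page.
    ///
    /// ```rust
    /// /// Page size assumed for this sketch (4 KiB).
    /// const PAGE_SIZE: usize = 0x1000;
    ///
    /// /// Zeroes the part of `page` that lies at or after the address `bss`,
    /// /// where `bss` is the end of the file-backed data. Nothing happens if
    /// /// `bss` is already page-aligned.
    /// pub fn zero_page_tail(page: &mut [u8; PAGE_SIZE], bss: usize) {
    ///     let in_page = bss & (PAGE_SIZE - 1);
    ///     if in_page != 0 {
    ///         page[in_page..].fill(0);
    ///     }
    /// }
    /// ```
    ///
    /// For `bss = 0x401234`, bytes `0x234..0x1000` of the page are cleared
    /// and bytes `0..0x234` are left untouched.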
    pub fn load_phdr(&mut self, elf: Elf) -> Result<(), KernelError> {
        // Highest end address of file-backed data seen across all loadable
        // segments.
        let mut bss = Va::new(0).unwrap();

        for phdr in elf.phdrs().map_err(|_| KernelError::InvalidArgument)? {
            if phdr.type_ == PType::Load {
                let (vaddr, memsz, filesz, fileofs, perm): (Va, _, _, _, _) =
                    (todo!(), todo!(), todo!(), todo!(), phdr.permission());
                bss = bss.max(vaddr + filesz as usize);
                todo!()
            }
        }

        // If the file-backed data ends in the middle of a page, zero the rest
        // of that page.
        if bss.into_usize() & PAGE_MASK != 0 {
            self.mm_struct
                .get_user_page_and(bss, |mut page, _| {
                    page.inner_mut()[bss.into_usize() & PAGE_MASK..].fill(0);
                })
                .unwrap();
        }
        Ok(())
    }

    /// Builds a user stack and initializes it with arguments.
    ///
    /// This function sets up a new stack for the process by allocating memory,
    /// pushing program arguments (`argv`), and preparing the initial register
    /// state in `self.regs`.
    ///
    /// # Parameters
    /// - `arguments`: A slice of string slices representing the command-line
    ///   arguments (`argv`).
    ///
    /// # Returns
    /// - `Ok(())` on success, indicating that the stack has been built
    ///   correctly.
    /// - `Err(KernelError)` if an error occurs during memory allocation or
    ///   argument copying.
    ///
    /// # Behavior
    /// - Pushes the argument strings onto the stack.
    /// - Sets up `argv` and `argc` for the process.
    /// - Aligns the stack pointer to ensure proper function call execution.
    /// - Updates `self.regs` with the initial stack pointer, the argument
    ///   count (`argc`), and the address of `argv`.
    ///
    /// # Safety
    /// - The function must be called before transferring control to user space.
    /// - The memory layout should follow the standard calling convention for
    ///   argument passing.
    pub fn build_stack(&mut self, arguments: &[&str]) -> Result<(), KernelError> {
        let Self {
            mm_struct: mm_state,
            regs,
        } = self;
        let mut builder = StackBuilder::new(mm_state)?;
        todo!()
    }

    /// Creates a new memory state and initializes a user process from an ELF
    /// executable.
    ///
    /// This function loads an ELF binary into memory, sets up the program
    /// headers, and constructs the initial user stack with arguments. It
    /// also prepares the register state for execution.
    ///
    /// # Parameters
    /// - `file`: A reference to the ELF executable file.
    /// - `args`: A slice of string slices representing the command-line
    ///   arguments (`argv`).
    ///
    /// # Returns
    /// - `Ok(Self)` on success: the load context, holding the initialized
    ///   memory state (`mm_struct`) and the initial register values (`regs`).
    /// - `Err(KernelError)` if an error occurs while loading the ELF file or
    ///   setting up memory.
    ///
    /// # Behavior
    /// - Parses the ELF file and validates its format.
    /// - Loads program headers ([`PType::Load`]) into memory.
    /// - Allocates and builds the user stack.
    /// - Initializes the register state (`rip` -> entry point, `rsp` -> stack
    ///   pointer, arg1 (`rdi`) -> the number of arguments, arg2 (`rsi`) ->
    ///   address of the argument vector).
    pub fn load(mut self, file: &RegularFile, args: &[&str]) -> Result<Self, KernelError> {
        if let Some(elf) = elf::Elf::from_file(file) {
            *self.regs.rip() = elf.header.e_entry as usize;
            self.load_phdr(elf)?;
            self.build_stack(args)?;

            Ok(self)
        } else {
            Err(KernelError::InvalidArgument)
        }
    }
}