Indexed and Extensible Files
The basic file system allocates files as a single extent, making it vulnerable to external fragmentation, that is, it is possible that an n-block file cannot be allocated even though n blocks are free (i.e. external fragmentation). Eliminate this problem by modifying the on-disk inode structure.
In practice, this probably means using an index structure with direct, indirect, and doubly indirect blocks. In previous semesters, most students adopted something like what you have learned as Berkeley UNIX FFS with multi-level indexing. However, to make your life easier, we make you implement it in an easier way: FAT. You must implement FAT with given skeleton code. Your code MUST NOT contain any code for multi-level indexing (FFS in the lecture). You will receive 0pts for file growth parts.
NOTE: You can assume that the file system partition will not be larger than 8 MB. You must support files as large as the partition (minus metadata). Each inode is stored in one disk sector, limiting the number of block pointers that it can contain.
Indexing large files with FAT (File Allocation Table)
Warning: this document assumes you have already understood the basic principle of general filesystem and FAT from the lecture. If not, please go back to the lecture note and understand what is filesystem and FAT.
In the basic filesystem that you have used for the previous projects, a file was stored as a contiguous single chunk over multiple disk sectors. Let's call contiguous chunk as cluster, since a cluster (chunk) can contain one or more contiguous disk sectors. In this point of view, the size of a cluster in the basic file system was equal to the size of a file that stored in the cluster.
To mitigate external fragmentation, we can shrink the size of cluster (recall the page size in
virtual memory). For simplicity, in our skeleton code, we fixed number of sectors in a cluster
as 1
. When we use smaller clusters like it, a cluster might not enough to store the entire file.
In this case, we need multiple clusters for a file, so we need a data structure to index the clusters
for a file in the inode. One of the easiest way is to use linked-list (a.k.a chain). An inode
can contain the sector number of the first block of the file, and the first block
may contain the sector number of the second block. This naïve approach, however, is too slow
because we have to read every block of the file, even though what we really need was only the
last block. To overcome this, FAT (File Allocation Table) puts the connectivity of blocks in
a fixed-size File Allocation Table rather than the blocks themselves. Since FAT only contains the
connectivity value rather than the actual data, its size is small enough to be cached in DRAM.
As a result, we can just read corresponding entries in the table.
You will implement inode indexing with the skeleton code provided in filesys/fat.c
. This
section briefly describes what already-implemented functions in fat.c
do and what you
should implement.
First of all, 6 functions in fat.c
(i.e. fat_init()
, fat_open()
, fat_close()
, fat_create()
,
and fat_boot_create()
) initialize and format disks at the boot time, so you don't
have to modify them. However, you'll need to write fat_fs_init()
, and briefly understanding what they
do could be helpful.
cluster_t fat_fs_init (void);
Initialize FAT file system. You have to initialize
fat_length
anddata_start
field offat_fs
.fat_length
stores how many clusters in the filesystem anddata_start
stores in which sector we can start to store files. You may want to exploit some values stored infat_fs->bs
. Also, you may want to initialize some other useful data in this function.
cluster_t fat_create_chain (cluster_t clst);
Extend a chain by appending a cluster after the cluster specified in
clst
(cluster indexing number). Ifclst
is equal to zero, then create a new chain. Return the cluster number of newly allocated cluster.
void fat_remove_chain (cluster_t clst, cluster_t pclst);
Starting from
clst
, remove clusters from a chain.pclst
should be the direct previous cluster in the chain. This means, after the execution of this function,pclst
should be the last element of the updated chain. Ifclst
is the first element in the chain,pclst
should be 0.
void fat_put (cluster_t clst, cluster_t val);
Update FAT entry pointed by cluster number
clst
toval
. Since each entry in FAT points the next cluster in a chain (if exist; otherwiseEOChain
), this could be used to update connectivity.
cluster_t fat_get (cluster_t clst);
Return in which cluster number the given cluster
clst
points.
disk_sector_t cluster_to_sector (cluster_t clst);
Translates a cluster number
clst
into the corresponding sector number and return the sector number.
You may wish to exploit these functions in filesys.c
and inode.c
to augment the basic file system.
File Growth
Implement extensible files. In the basic file system, the file size is specified when the file is created. In most modern file systems, however, a file is initially created with size 0 and is then expanded every time a write is made off the end of the file. Your file system must allow this.
There should be no predetermined limit on the size of a file, except that a file cannot exceed the size of the file system (minus metadata). This also applies to the root directory file, which should now be allowed to expand beyond its initial limit of 16 files.
User programs are allowed to seek beyond the current end-of-file (EOF). The seek itself does not extend the file. Writing at a position past EOF extends the file to the position being written, and any gap between the previous EOF and the start of the write must be filled with zeros. A read starting from a position past EOF returns no bytes.
Writing far beyond EOF can cause many blocks to be entirely zero. Some file systems allocate and write real data blocks for these implicitly zeroed blocks. Other file systems do not allocate these blocks at all until they are explicitly written. The latter file systems are said to support "sparse files." You may adopt either allocation strategy in your file system.