team-karma team mailing list archive

ASDFS

To: team-karma@xxxxxxxxxxxxxxxxxxx
From: arvind iyer <iyer.arvind.sundaram@xxxxxxxxx>
Date: Fri, 18 Sep 2009 14:26:30 +0530
*Please treat this as preliminary documentation of ASDFS*


*ASDFS*

*Requirement of a File System*

There are several situations in scientific computing where serialized data
needs to be written to files for later retrieval. When the individual data
items are small (say a few bytes each) but numerous, it makes little sense
to write each item to a separate file, because every file occupies at least
one block of the underlying file system (e.g. with 4 KB blocks in ext3, an
8-byte item takes up 512 times the space it needs). One option, of course,
is to arrange the data in a single large structure and write it to a single
file; however, this has a serious limitation: data in the middle of the file
cannot grow at runtime. The requirement is hence clear: keep a single file
on the native file system and implement a linked-block architecture inside
it, so that the data can expand. In effect, we create a file system with the
favourable properties enumerated below.
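
The 512x figure above can be checked with a few lines of Python. This is only a rough sketch: the helper name allocated_bytes is hypothetical, and it assumes the file system allocates whole 4096-byte blocks and ignores inode and directory metadata.

```python
import math

def allocated_bytes(payload: int, block_size: int = 4096) -> int:
    """Space taken by a file of `payload` bytes when whole blocks are allocated.

    Assumption: at least one block per file; metadata overhead is ignored.
    """
    return max(1, math.ceil(payload / block_size)) * block_size

payload = 8  # one serialized double
overhead = allocated_bytes(payload) // payload
print(overhead)  # 512
```

For a million such 8-byte items this works out to roughly 4 GB on disk for 8 MB of actual data, which is the waste a single container file is meant to avoid.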

*Features*


   1. *Linked Block Structure:* The file system under consideration shall
   have a linked block structure. This effectively means that one will be able
   to open several independent file streams (as if they were separate
   files), but inside the same file on the native file system. This is the
   most basic feature needed.
   2. *Multiple Block Sizes:* Every file system faces a trade-off in choosing
   the block size. A small block size wastes less space in partially filled
   blocks, but leads to a larger number of blocks when a large file is
   written. A large block size leads to fewer blocks, but wastes space when
   there are many small files. Typically one would wish to have smaller blocks
   for smaller data and larger blocks for larger data. Hence the current
   implementation supports multiple block sizes, so that one may choose the
   block size that best suits the data.
   3. *Atomic Blocks:* In several cases the size of the data is fixed and
   will never expand. E.g. writing a single double requires exactly 8 bytes,
   and a spatial complex vector of three complex numbers, each made of two
   4-byte floats, requires 3*2*4 = 24 bytes. Such data never expands, so the
   *linking of blocks* feature makes little sense for it. For such cases a
   number of *Atomic Blocks* shall be provided. These will typically be small.
   4. *Directory-File Structure:* A directory and file structure very
   similar to that of existing file systems shall be implemented. This will
   allow easy management of the streams. We plan to use hashed and tree
   structures for directory entries, for quicker name lookup.
   5. *Large Data Size:* We plan to support a huge file size of up to
   128 TB.
   6. *Hard Links:* These are useful when the same content is guaranteed to
   appear in many locations in the file system. A typical example is the
   header information of images of the same size and depth.
   7. *FUSE mountable:* Since we have almost all the features of a file
   system, it makes sense to make it FUSE mountable too, initially read-only
   and later, if possible, read-write. This will make the data stored in the
   file system easily accessible to other applications.
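
To make the linked block idea from item 1 concrete, here is a minimal in-memory sketch in Python. It is not the ASDFS design: the Container class, the 6-byte block header (next-block index plus bytes used), the END marker and the in-memory stream table are all assumptions made for illustration, and a bytearray stands in for the single file on the native file system. The last two lines only confirm the fixed record sizes quoted for atomic blocks in item 3.

```python
import struct

HEADER = struct.Struct("<IH")  # assumed layout: next block index, bytes used
END = 0xFFFFFFFF               # assumed marker for the last block of a chain

class Container:
    """Several independent streams stored as block chains in one buffer."""

    def __init__(self, block_size=32):
        self.block_size = block_size
        self.capacity = block_size - HEADER.size  # payload bytes per block
        self.data = bytearray()   # stands in for the single native file
        self.chains = {}          # stream name -> [first block, last block]

    def _new_block(self):
        index = len(self.data) // self.block_size
        self.data += HEADER.pack(END, 0) + bytes(self.capacity)
        return index

    def _header(self, index):
        return HEADER.unpack_from(self.data, index * self.block_size)

    def _set_header(self, index, nxt, used):
        HEADER.pack_into(self.data, index * self.block_size, nxt, used)

    def append(self, name, payload):
        """Append bytes to a stream, linking in new blocks as needed."""
        if name not in self.chains:
            block = self._new_block()
            self.chains[name] = [block, block]
        last = self.chains[name][1]
        while payload:
            nxt, used = self._header(last)
            room = self.capacity - used
            if room == 0:
                # Current block is full: allocate a new one and link it.
                fresh = self._new_block()
                self._set_header(last, fresh, used)
                self.chains[name][1] = last = fresh
                continue
            chunk, payload = payload[:room], payload[room:]
            start = last * self.block_size + HEADER.size + used
            self.data[start:start + len(chunk)] = chunk
            self._set_header(last, nxt, used + len(chunk))

    def read(self, name):
        """Follow the chain of a stream and return its full contents."""
        out = bytearray()
        block = self.chains[name][0]
        while block != END:
            nxt, used = self._header(block)
            start = block * self.block_size + HEADER.size
            out += self.data[start:start + used]
            block = nxt
        return bytes(out)

c = Container(block_size=32)
c.append("a", b"x" * 40)   # spills over into a second block
c.append("b", b"y" * 10)   # a second stream, interleaved in the same buffer
c.append("a", b"z" * 30)   # stream "a" keeps growing after "b" was written
stream_a = c.read("a")
stream_b = c.read("b")

# Fixed-size records of the kind item 3 describes:
double_size = struct.calcsize("<d")    # one double
vector_size = struct.calcsize("<6f")   # three complex numbers, two floats each
```

Because each block carries a pointer to the next block of its stream, stream "a" can grow after stream "b" has already been written behind it, which is exactly what a single flat structure in one file cannot do.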


*...to be continued. Prasad, could you edit/enhance the above?*


-- 
================================
Work while you are alive, you can rest later
Follow ups

ASDFS
From: arvind iyer, 2009-09-20