scheme shell
about
download
support
resources
docu
links
 
scsh.net

From: Olin Shivers <shivers@tokyo.cc.gatech.edu>
Newsgroups: comp.lang.scheme.scsh
Subject: file-name-as-directory & filename grammar
Date: 10 Mar 2001 20:36:33 -0500
Organization: College of Computing, Georgia Tech
Message-ID: <k1snkljdcu.fsf@tokyo.cc.gatech.edu>

A while ago, Noah Friedman raised the issue of file-name-as-directory's behaviour on the empty string. Scsh defines f-n-a-d to return "/" in this case; Noah felt it should return "./".

This is something that is discussed at some length in the scsh manual, but Noah was not convinced. This issue has been raised again by Mike Sperber, in the context of the release we're assembling, so I have written up the following explication that may help explain the why of scsh's design.

Filenames are strings that belong to some grammar defining all the syntactically legal filenames. This grammar, if it is a good one, does more than specify the set of legal strings that are filenames -- its structure mirrors the structure of the associated filesystem tree that the filenames index. [Note TrivialGrammar] Operations on the strings that are filenames should mirror operations on these parse trees should mirror operations on the associated filesystem tree; effectively, the filename string ops are implicitly ops on the associated parse trees -- there's an isomorphism. Furthermore, other things being equal, the simpler grammar is the better grammar.

Here is a rather simple (five productions) grammar that fills the bill:

    name ::= [^/]+	; A non-empty sequence of chars not containing 
		    	; slash (or nul). This includes "." and "..".

    afile ::= "" | adir name 	; Absolute file name
    adir  ::= afile "/"		; Absolute directory

    rfile ::= rdir name		; Relative file name
    rdir  ::= "" | rfile "/"	; Relative directory
  • This grammar gives a very simple invariant: to locate a relative file or directory R within an absolute or relative directory D, just append them: D,R. R determines whether the result is a dir or file; D determines whether the result is absolute or relative.

    So here is a general resolve function:

        resolve(cwd, file) =
          if FILE is absolute then return FILE
          else return concat(CWD, FILE)

    Note that FILE is a filename (afile or rfile) and CWD is a directory (adir or rdir).

  • The empty-string afile is the root directory (as a file name)
  • The empty-string rdir is the current-working directory.
  • Dot & dot-dot are not special in the grammar. This is reflected in the fact that they are not special in the "kernel filesystem" navigation code & typedefs presented below in ML (rather, they are handled by the primitive "semantic" function FETCH). Filename simplification that eliminates .'s and ..'s exploits their semantics; again, at the syntactic level, they are just another filename.

    The five-line grammar above is not only very simple, it also structurally reflects the semantics of what we are doing with filesystems. That is, if you build an AST for the grammar, then parsing a filename into an AST gives a structure that is useful for navigating the Unix filesystem.

    In fact, the code below shows a simple ML AST for the grammar above, and some filesystem navigation functions you might write for moving around the directory structure inside a kernel.

  • Some alternate productions, that give equivalent grammars
    
      + afile ::= "" | afile "/" name	; Can eliminate AFILE's need for ADIR.
      + adir  ::= "/" rdir 			; Alternate production for ADIR
  • Any proposal for manipulating filenames needs to proceed from a grammar explaining filenames, both absolute and relative. So, I'd like to see the grammar your system is using, when you propose alternate definitions for functions that manipulate strings in your grammar.

    -Olin


(* ML-style code defining an AST for the filename grammar and a simple
** navigation function to retrieve entries from the "kernel's" filesystem
** data structure. We assume the kernel's directory data structure can be
** indexed with a FETCH operation.
*)
datatype Afile = afroot | afpath of Adir*Name (* afile ::= "" | adir name *)
type Adir = Afile                             (* adir  ::= afile "/"      *)
type Rfile = Rdir * Name		      (* rfile ::= rdir name      *)
datatype Rdir = rdroot | rdpath of Rfile      (* rdir  ::= "" | rfile "/" *)

datatype Filename = afn of Afile | rfn of Rfile
datatype Dir = ad of Adir | rd of Rdir

(* Fetch the node of ROOT indexed by an absolute file name *)
fun afn2entry(root, afroot) = rootdir
  | afn2entry(root, afpath(adir,name)) = fetch(afn2entry(root,adir), name)

(* Fetch the node of CWD indexed by a relative file name *)
fun rf2entry(cwd, (rd,n)) = fetch(rd2entry(cwd,rd), n)

(* Fetch the note of ROOT indexed by a relative directory. *)
fun rd2entry(cwd, rdroot) = cwd
  | rd2entry(cwd, rdpath(rdir,name)) = fetch(rd2entry(cwd,rdir), name)

[Note TrivialGrammar]
In fact, useful structure is the sole benefit of a good grammar for filenames, since filenames are sufficiently simple that any string is a syntactically legal filename. So the trivial grammar given by regexp

  .*

is sufficient to define the set of legal strings.

In the grammars we're constructing, we're ignoring other issues such as how long a filename can be, in order to focus on the directory-structure issues; it's not relevant to the discussion at hand.


Up