Running scsh

Scsh is currently implemented on top of Scheme 48, a freely-available Scheme implementation written by Jonathan Rees and Richard Kelsey. Scheme 48 uses a byte-code interpreter for good code density, portability and medium efficiency. It is R5RS-compliant and has a module system designed by Jonathan Rees.

Scsh's design is not Scheme 48 specific, although the current implementation is necessarily so. Scsh is intended to be implementable in other Scheme implementations. The Scheme 48 virtual machine that scsh uses is a specially modified version; standard Scheme 48 virtual machines cannot be used with the scsh heap image.

There are several different ways to invoke scsh. You can run it as an interactive Scheme system, with a standard read-eval-print interaction loop. Scsh can also be invoked as the interpreter for a shell script by putting a ``#!/usr/local/bin/scsh -s'' line at the top of the shell script.

Descending a level, it is also possible to invoke the underlying virtual machine byte-code interpreter directly on dumped heap images. Scsh programs can be pre-compiled to byte-codes and dumped as raw, binary heap images. Writing heap images strips out unused portions of the scsh runtime (such as the compiler, the debugger, and other complex subsystems), reducing memory demands and saving loading and compilation times. The heap image format allows for an initial #!/usr/local/lib/scsh/scshvm trigger on the first line of the image, making heap images directly executable as another kind of shell script.

Finally, scsh's static linker system allows dumped heap images to be compiled to a raw Unix a.out(5) format, which can be linked into the text section of the vm binary. This produces a true Unix executable binary file. Since the byte codes comprising the program are in the file's text section, they are not traced or copied by the garbage collector, do not occupy space in the vm's heap, and do not need to be loaded and linked at startup time. This reduces the program's startup time, memory requirements, and paging overhead.

This chapter will cover these various ways of invoking scsh programs.

11.1  Scsh command-line switches

When the scsh top-level starts up, it scans the command line for switches that control its behaviour. These arguments are removed from the command line; the remaining arguments can be accessed as the value of the scsh variable command-line-arguments.

11.1.1  Scripts and programs

The scsh command-line switches provide sophisticated support for the authors of shell scripts and programs; they also allow the programmer to write programs that use the Scheme 48 module system.

There is a difference between a script, which performs its action as it is loaded, and a program, which is loaded/linked, and then performs its action by having control transferred to an entry point (e.g., the main() function in C programs) that was defined by the load/link operation.

A script, by the above definition, cannot be compiled by the simple mechanism of loading it into a scsh process and dumping out a heap image -- it executes as it loads. It does not have a top-level main()-type entry point.

It is more flexible and useful to implement a system as a program than as a script. Programs can be compiled straightforwardly; they can also export procedural interfaces for use by other Scheme packages. However, scsh supports both the script and the program style of programming.
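The distinction can be made concrete with a minimal sketch of the same greeting written both ways. The two file names and the invocations shown in the comments are hypothetical; the entry-point convention (main receives the argument list with the program name at its head) follows the ekko example in the section on the meta argument.

```scheme
;; ---- greet-script.scm (hypothetical): script style ----
;; The work is done by top-level forms, as the file is loaded.
;; Possible invocation:  scsh -s greet-script.scm World
(display "Hello, ")
(display (car command-line-arguments))   ; assumes at least one argument
(newline)

;; ---- greet-program.scm (hypothetical): program style ----
;; Loading this file only defines main; nothing runs until control
;; is transferred to the entry point.
;; Possible invocation:  scsh -e main -s greet-program.scm World
(define (main args)
  (display "Hello, ")
  (display (cadr args))                  ; (car args) is the program name
  (newline))
```

Only the second file can be loaded into a scsh process and dumped as a heap image without the greeting being printed at load time.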

11.1.2  Inserting interpreter triggers into scsh programs

When Unix tries to execute an executable file whose first 16 bits are the character pair ``#!'', it treats the file not as machine-code to be directly executed by the native processor, but as source code to be executed by some interpreter. The interpreter to use is specified immediately after the ``#!'' sequence on the first line of the source file (along with one optional initial argument). The kernel reads in the name of the interpreter, and executes that instead. The interpreter is passed the source filename as its first argument, with the original arguments following. Consult the Unix man page for the exec system call for more information.

Scsh allows Scheme programs to have these triggers placed on their first line. Scsh treats the character sequence ``#!'' as a block-comment sequence,15 and skips all following characters until it reads the comment-terminating sequence newline/exclamation-point/sharp-sign/newline (i.e., the sequence ``!#'' occurring on its own line).

In this way, the programmer can arrange for an initial


#!/usr/local/bin/scsh -s
!#
header appearing in a Scheme program to be ignored when the program is loaded into scsh.

11.1.3  Module system

Scsh uses the Scheme 48 module system, which defines packages, structures, and interfaces.

Package
A package is an environment -- that is, a set of variable/value bindings. You can evaluate Scheme forms inside a package, or load a file into a package. Packages export sets of bindings; these sets are called structures.

Structure
A structure is a named view on a package -- a set of bindings. Other packages can open the structure, importing its bindings into their environment. Packages can provide more than one structure, revealing different portions of the package's environment.

Interface
An interface is the ``type'' of a structure. An interface is the set of names exported by a structure. These names can also be marked with other static information (e.g., advisory type declarations, or syntax information).

More information on the Scheme 48 module system can be found in the file module.ps in the doc directory of the Scheme 48 and scsh releases.

Programming Scheme with a module system is different from programming in older Scheme implementations, and the associated development problems are consequently different. In Schemes that lack modular abstraction mechanisms, everything is accessible; the major problem is preventing name-space conflicts. In Scheme 48, name-space conflicts vanish; the major problem is that not all bindings are accessible from every place. It takes a little extra work to specify what packages export which values.

It may take you a little while to get used to the new style of program development. Although scsh can be used without referring to the module system at all, we recommend taking the time to learn and use it. The effort will pay off in the construction of modular, factorable programs.

11.1.3.1  Module warning

Most scsh programs will need to import from the scheme structure as well as from the scsh structure. However, putting both of these structures in the same open clause is a bad idea, because the structures scheme and scsh export some I/O function names in common but with different definitions. The current implementation of the module system does not recognize this as an error but silently overwrites the exports of one structure with the exports of the other. If the scheme structure overwrites the exports of the scsh structure, the program will access the R5RS definitions of the I/O functions, which is not what you want.

Previous versions of this manual suggested listing scheme and scsh in a specific order in the open clause of a structure to ensure that the definitions from scsh overwrite the ones from scheme. This approach is error-prone and fragile: a simple change in the implementation of the module system will render thousands of programs useless. Starting with release 0.6.3, scsh provides a better means to deal with this problem: the structure scheme-with-scsh provides all the exports of the modules scheme and scsh but exports the right denotations for the I/O functions in question. To make a long story short:

Scsh programs should open the structure scheme-with-scsh if they need access to the exports of scheme and scsh.

For programs which should run in versions of scsh prior to release 0.6.3, programmers should make sure to always put the scsh reference first.
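As an illustration of this recommendation, here is a minimal sketch of a module definition written in the Scheme 48 module language. The structure name hello, its main procedure, and the file name are hypothetical; a file like this would be loaded into the config package (for instance with the -lm switch) rather than into the user package.

```scheme
;; hello-module.scm (hypothetical file name)
;; Defines a structure named hello that exports a single binding, main.
(define-structure hello (export main)
  ;; One structure supplies both the scheme and scsh exports, with the
  ;; scsh definitions of the conflicting I/O functions.
  ;; For releases before 0.6.3, the advice above would instead give
  ;; (open scsh scheme), with scsh listed first.
  (open scheme-with-scsh)
  (begin
    (define (main args)
      (display "Hello from the hello structure")
      (newline))))
```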

11.1.4  Library directories search facility

Scsh's command line switches allow loading of code not present in the script file or the heap image at startup. To relieve the user from specifying full path names and to improve flexibility, scsh offers the library directories path list. This list contains directories in which scsh searches automatically for a file name argument of the -ll or -lel switch.

This section describes the programmatic interface to the library directories search facility. In addition, various command line switches for scsh modify the library directories path list. Section 11.1.5 describes these switches and the switches to actually load files.

Another way to change the library directories path list is the environment variable $SCSH_LIB_DIRS. If this variable is set, scsh uses it to set the library directories path list. The value of this environment variable is treated as a sequence of s-expressions, which are ``read'' from the string. A string names a directory; #f is replaced by the default list of directories.

A $SCSH_LIB_DIRS assignment of this form

SCSH_LIB_DIRS='"." "/usr/contrib/lib/scsh/" #f "/home/shivers/lib/scsh"'

would produce this list of strings for the library-directories list:
          ("." "/usr/contrib/lib/scsh/" 
          "/usr/local/lib/scsh/modules/" 
          "/home/shivers/lib/scsh")

It is a startup error if reading the $SCSH_LIB_DIRS environment variable causes a read error, or produces a value that isn't a list of strings or #f.

default-lib-dirs         string list 

The default list of library directories. The original value of this variable is ("$prefix/lib/scsh/modules/"). Starting with version 0.6.5, the --with-lib-dirs-list option of the configure script changes this value for a new installation.

(find-library-file file lib-dirs script-file)     --->     undefined         (procedure) 
Searches the list of library directories lib-dirs for file and returns its full path. The script-file argument is used to resolve references to the directory of the current script.

When searching for a directory containing a given library module, nonexistent or read-protected directories are silently ignored; it is not an error to have them in the library-directories list.

Directory search can be recursive. A directory name that ends with a slash is recursively searched.

(lib-dirs)     --->     string list         (procedure) 
Returns the current library directories path list.

(lib-dirs-prepend-script-dir!)     --->     undefined         (procedure) 
(lib-dirs-append-script-dir!)     --->     undefined         (procedure) 
Add the directory of the current script file to the beginning or end of the library-directories path list, respectively.

(lib-dirs-prepend! dir)     --->     undefined         (procedure) 
(lib-dirs-append! dir)     --->     undefined         (procedure) 
Add directory dir to the beginning or end of the library-directories path list, respectively.

(clear-lib-dirs!)     --->     undefined         (procedure) 
Set the library-directories path list to the empty list.

(reset-lib-dirs!)     --->     undefined         (procedure) 
Set the library-directories path list to system default, i.e. to the value of default-lib-dirs.
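The procedures above might be combined as in the following sketch. It assumes that these procedures are visible in the package where the code is evaluated; the helper show-lib-dirs is hypothetical, and the directory names are simply the ones from the $SCSH_LIB_DIRS example.

```scheme
;; Hypothetical helper: print the current path list with a label.
(define (show-lib-dirs label)
  (display label)
  (write (lib-dirs))
  (newline))

(show-lib-dirs "initial:  ")

;; Searched before every directory already on the list.
(lib-dirs-prepend! "/home/shivers/lib/scsh")

;; Searched last; the trailing slash asks for a recursive search.
(lib-dirs-append! "/usr/contrib/lib/scsh/")
(show-lib-dirs "modified: ")

;; Discard the changes and return to the value of default-lib-dirs.
(reset-lib-dirs!)
(show-lib-dirs "reset:    ")
```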

11.1.5  Switches

The scsh top-level takes command-line switches in the following format:

scsh [meta-arg] [switch ...] [end-option arg1 ... argn]
where
meta-arg:    \ script-file-name

switch:      -e entry-point         Specify top-level entry-point.
             -o structure           Open structure in current package.
             -m structure           Switch to package.
             -n new-package         Switch to new package.

             -lm module-file-name   Load module into config package.
             -le exec-file-name     Load module into exec package.
             -l file-name           Load file into current package.

             -ll module-file-name   As in -lm, but search the library path list.
             -lel exec-file-name    As in -le, but search the library path list.
             +lp dir                Add dir to front of library path list.
             lp+ dir                Add dir to end of library path list.
             +lpe dir               +lp, with env var and ~user expansion.
             lpe+ dir               lp+, with env var and ~user expansion.
             +lpsd                  Add script-file's dir to front of path list.
             lpsd+                  Add script-file's dir to end of path list.
             -lp-clear              Clear library path list to ().
             -lp-default            Reset library path list to system default.

             -ds                    Do script.
             -dm                    Do script module.
             -de                    Do script exec.

end-option:  -s script
             -sfd num
             -c exp
             -

These command-line switches essentially provide a little linker language for linking a shell script or a program together with Scheme 48 modules or Scheme 48 exec programs 16. The command-line processor serially opens structures and loads code into a given package. Switches that side-effect a package operate on a particular ``current'' package; there are switches to change this package. (These switches provide functionality equivalent to the interactive ,open ,load ,in and ,new commands.) Except where indicated, switches specify actions that are executed in a left-to-right order. The initial current package is the user package, which is completely empty and opens (imports the bindings of) the R5RS and scsh structures.

If the Scheme process is started up in an interactive mode, then the current package in force at the end of switch scanning is the one inside which the interactive read-eval-print loop is started.

The command-line switch processor works in two passes: it first parses the switches, building a list of actions to perform, then the actions are performed serially. The switch list is terminated by one of the end-option switches. The argi arguments occurring after an end-option switch are passed to the scsh program as the value of command-line-arguments and the tail of the list returned by (command-line). That is, an end-option switch separates switches that control the scsh ``machine'' from the actual arguments being passed to the scsh program that runs on that machine.
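As a small illustration of where the arguments after an end-option switch end up, the following sketch prints both views of the argument list. The file name show-args.scm and the invocation in the comment are hypothetical.

```scheme
;; Hypothetical invocation:  scsh -s show-args.scm a b c
;; Everything after the end option (-s plus its script name) is handed
;; to the program instead of being interpreted by scsh.

;; Only the program's own arguments; for the invocation above this
;; should be the list ("a" "b" "c").
(write command-line-arguments)
(newline)

;; The full list; its tail is the same list of arguments.
(write (command-line))
(newline)
```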

The following switches and end options are defined:

11.1.6  The meta argument

The scsh switch parser takes a special command-line switch, a single backslash called the ``meta-argument,'' which is useful for shell scripts. If the initial command-line argument is a ``\'' argument, followed by a filename argument fname, scsh will open the file fname and read more arguments from the second line of this file. This list of arguments will then replace the ``\'' argument -- i.e., the new arguments are inserted in front of fname, and the argument parser resumes argument scanning. This is used to overcome a limitation of the #! feature: the #! line can only specify a single argument after the interpreter. For example, we might hope the following scsh script, ekko, would implement a simple-minded version of the Unix echo program:


#!/usr/local/bin/scsh -e main -s
!#
(define (main args)
  (map (lambda (arg) (display arg) (display " "))
       (cdr args))
  (newline))
The idea would be that the command
ekko Hi there.
would be expanded by the exec(2) kernel call into

/usr/local/bin/scsh -e main -s ekko Hi there.
In theory, this would cause scsh to start up, load in file ekko, call the entry point on the command-line list
(main '("ekko" "Hi" "there."))
and exit.

Unfortunately, the Unix exec(2) syscall's support for scripts is not very general or well-designed. It will not handle multiple arguments; the #! line is usually required to contain no more than 32 characters; it is not recursive. If these restrictions are violated, most Unix systems will not provide accurate error reporting, but either fail silently, or simply incorrectly implement the desired functionality. These are the facts of Unix life.

In the ekko example above, our #! trigger line has three arguments (``-e'', ``main'', and ``-s''), so it will not work. The meta-argument is how we work around this problem. We must instead invoke the scsh interpreter with the single \ argument, and put the rest of the arguments on line two of the program. Here's the correct program:


#!/usr/local/bin/scsh \
-e main -s
!#
(define (main args) 
  (map (lambda (arg) (display arg) (display " "))
       (cdr args))
  (newline))
Now, the invocation starts as
ekko Hi there.
and is expanded by exec(2) into
    
/usr/local/bin/scsh \ ekko Hi there.
When scsh starts up, it expands the ``\'' argument into the arguments read from line two of ekko, producing this argument list:

-e main -s ekko Hi there.
^^^^^^^^^^^^^^^
Expanded from \ ekko
With this argument list, processing proceeds as we intended.

11.1.6.1  Secondary argument syntax

Scsh uses a very simple grammar to encode the extra arguments on the second line of the scsh script. The only special characters are space, tab, newline, and backslash.

You have to construct these line-two argument lines carefully. In particular, beware of trailing spaces at the end of the line -- they'll give you extra trailing empty-string arguments. Here's an example:

#!/bin/interpreter \
foo bar  quux\ yow

would produce the arguments
("foo" "bar" "" "quux yow")
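The splitting behaviour in this example can be modelled with a short sketch. It is not scsh's actual parser: it assumes only that each space terminates an argument, that a backslash makes the following character literal, and that a newline ends the list. Tab handling and any other escape sequences are left out. The name parse-secondary-args is hypothetical.

```scheme
;; Split a line-two argument string into a list of argument strings.
(define (parse-secondary-args line)
  ;; Close the argument being built (its characters are held in
  ;; reverse order) and push it onto the reversed list of arguments.
  (define (finish current args)
    (cons (list->string (reverse current)) args))
  (let loop ((chars (string->list line))
             (current '())
             (args '()))
    (cond
     ;; End of input or newline: close the last argument and stop.
     ((or (null? chars) (char=? (car chars) #\newline))
      (reverse (finish current args)))
     ;; Every space terminates an argument, even an empty one.
     ((char=? (car chars) #\space)
      (loop (cdr chars) '() (finish current args)))
     ;; Backslash: take the next character literally.
     ((and (char=? (car chars) #\\) (pair? (cdr chars)))
      (loop (cddr chars) (cons (cadr chars) current) args))
     ;; Any other character is part of the current argument.
     (else
      (loop (cdr chars) (cons (car chars) current) args)))))

;; The backslash is doubled here only because this is a Scheme string
;; literal; the argument line itself contains a single backslash.
(write (parse-secondary-args "foo bar  quux\\ yow"))
(newline)
;; prints ("foo" "bar" "" "quux yow")
```

Under the same assumptions, a line with a trailing space such as "foo " yields ("foo" ""), which is the extra empty-string argument the warning above describes.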

11.1.7  Examples

Note that the sort example can be compiled into a Unix program by loading the file into an scsh process, and dumping a heap with top-level top. Even if we don't want to export the sort's functionality as a subroutine library, it is still useful to write the sort program with the module language. The command line design allows us to run this program as either an interpreted script (given the #! args in the header) or as a compiled heap image.

11.1.8  Process exit values

Scsh ignores the value produced by its top-level computation when determining its exit status code. For example, a scsh process whose top-level is specified by a -c exp switch or a -e entry switch ignores the value produced by evaluating exp or calling entry, respectively. If the top-level computation completes with no errors, the scsh process exits with an exit code of 0.

To return a specific exit status, use the exit procedure explicitly, e.g.,


scsh -c \
  "(exit (status:exit-val (run (| (fmt) (mail shivers)))))"

11.2  The scsh virtual machine

To run the Scheme 48 implementation of scsh, you run a specially modified copy of the Scheme 48 virtual machine with a scsh heap image. The scsh binary is actually nothing but a small cover program that invokes the byte-code interpreter on the scsh heap image for you. This allows you to simply start up an interactive scsh from a command line, as well as write shell scripts that begin with the simple trigger

#!/usr/local/bin/scsh -s

You can also directly execute the virtual machine, which takes its own set of command-line switches. For example, this command starts the vm up with a 1Mword heap (split into two semispaces):

scshvm -o scshvm -h 1000000 -i scsh.image arg1 arg2 ...
The vm peels off initial vm arguments up to the -i heap image argument, which terminates vm argument parsing. The rest of the arguments are passed off to the scsh top-level. Scsh's top-level removes scsh switches, as discussed in the previous section; the rest show up as the value of command-line-arguments.

Directly executing the vm can be useful to specify non-standard switches, or invoke the virtual machine on special heap images, which can contain pre-compiled scsh programs with their own top-level procedures.

11.2.1  VM arguments

The vm takes arguments in the following form:

scshvm [meta-arg] [vm-options+] [end-option scheme-args]
where
meta-arg:    \ filename

vm-option:   -h heap-size-in-words
             -s stack-size-in-words
             -o object-file-name

end-option:  -i image-file-name
             -

The vm's meta-switch ``\ filename'' is handled the same as scsh's meta-switch, and serves the same purpose.

11.2.1.1  VM options

The -o object-file-name switch tells the vm where to find relocation information for its foreign-function calls. Scsh will use a pre-compiled default if it is not specified. Scsh must have this information to run, since scsh's syscall interfaces are done with foreign-function calls.

The -h and -s options tell the vm how much space to allocate for the heap and stack. The heap size value is the total number of words allocated for the heap; this space is then split into two semi-spaces for Scheme 48's stop-and-copy collector.

11.2.1.2  End options

End options terminate argument parsing. The -i switch is followed by the name of a heap image for the vm to execute. The image-file-name string is also taken to be the name of the program being executed by the VM; this name becomes the head of the argument list passed to the heap image's top-level entry point. The tail of the argument list is constructed from all following arguments.

The - switch terminates argument parsing without giving a specific heap image; the vm will start up using a default heap (whose location is compiled into the vm). All the following arguments comprise the tail of the list passed off to the heap image's top-level procedure.

Notice that you are not allowed to pass arguments to the heap image's top-level procedure (e.g., scsh) without delimiting them with -i or - flags.

11.2.2  Stripped image

Besides the standard image, scsh.image, scsh also ships with the much smaller image stripped-scsh.image. This image contains the same code as the standard image but has almost all debugging information removed. stripped-scsh.image is intended for standalone programs where startup time and memory consumption matter more than the ability to debug the Scheme code. To use the image, the VM has to be called directly, and the path to the image must be given after the -i argument.

11.2.3  Inserting interpreter triggers into heap images

Scheme 48's heap image format allows for an informational header: when the vm loads in a heap image, it ignores all data occurring before the first control-L character (ASCII 12). This means that you can insert a ``#!'' trigger line into a heap image, making it a form of executable ``shell script.'' Since the vm requires multiple arguments to be given on the command line, you must use the meta-switch. Here's an example heap-image header:


#!/usr/local/lib/scsh/scshvm \
-o /usr/local/lib/scsh/scshvm -i
... Your heap image goes here ...

11.2.4  Inserting a double-level trigger into Scheme programs

If you're a nerd, you may enjoy doing a double-level machine shift in the trigger line of your Scheme programs with the following magic:


#!/usr/local/lib/scsh/scshvm \
-o /usr/local/lib/scsh/scshvm -i /usr/local/lib/scsh/scsh.image -s
!#
... Your Scheme program goes here ...

11.3  Compiling scsh programs

Scsh allows you to create a heap image with your own top-level procedure. Adding the pair of lines


#!/usr/local/lib/scsh/scshvm \
-o /usr/local/lib/scsh/scshvm -i
to the top of the heap image will turn it into an executable Unix file.

You can create heap images with the following two procedures.

(dump-scsh-program main fname)     --->     undefined         (procedure) 
This procedure writes out a scsh heap image. When the heap image is executed by the Scheme 48 vm, it will call the main procedure, passing it the vm's argument list. When main returns an integer value i, the vm exits with exit status i. The Scheme vm will parse command-line switches as described in section 11.2.1; remaining arguments form the tail of the command-line list that is passed to main. (The head of the list is the name of the program being executed by the vm.) Further argument parsing (as described for scsh in section 11.1.5) is not performed.

The heap image created by dump-scsh-program has unused code and data pruned out, so small programs compile to much smaller heap images.
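A minimal sketch of how dump-scsh-program might be used, based on the ekko example from the section on the meta argument. The image file name ekko.image is hypothetical, and the code is assumed to be evaluated inside a running scsh.

```scheme
;; Entry point: echo the arguments, skipping the program name that
;; sits at the head of the list.
(define (main args)
  (for-each (lambda (arg) (display arg) (display " "))
            (cdr args))
  (newline)
  0)   ; an integer return value becomes the vm's exit status

;; Write a heap image whose top-level procedure is main.
(dump-scsh-program main "ekko.image")
```

The resulting image would then be handed to the vm with the -i end option, for example as scshvm -o scshvm -i ekko.image Hi there.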

(dump-scsh fname)     --->     undefined         (procedure) 
This procedure writes out a heap image with the standard scsh top-level. When the image is resumed by the vm, it will parse and execute scsh command-line switches as described in section 11.1.5.

You can use this procedure to write out custom scsh heap images that have specific packages preloaded and start up in specific packages.

Unfortunately, Scheme 48 does not support separate compilation of Scheme files or Scheme modules. The only way to compile is to load source and then dump out a heap image. One occasionally hears rumours that this is being addressed by the Scheme 48 development team.

11.4  Standard file locations

Because the scshvm binary is intended to be used for writing shell scripts, it is important that the binary be installed in a standard place, so that shell scripts can dependably refer to it. The standard directory for the scsh tree should be /usr/local/lib/scsh/. Whenever possible, the vm should be located in

/usr/local/lib/scsh/scshvm
and a scsh heap image should be located in
/usr/local/lib/scsh/scsh.image
The top-level scsh program should be located in
/usr/local/lib/scsh/scsh
with a symbolic link to it from
/usr/local/bin/scsh

The Scheme 48 image format allows heap images to have #! triggers, so scsh.image should have a #! trigger of the following form:


#!/usr/local/lib/scsh/scshvm \
-o /usr/local/lib/scsh/scshvm -i
... heap image goes here ...


15 Why a block-comment instead of an end-of-line delimited comment? See the section on meta-args.

16 See the Section ``Command programs'' in the Scheme 48 manual for a description of the exec language.