Date: Tue, 18 Feb 2003 09:35:54 +0100
From: Michel Schinz <Michel.Schinz@epfl.ch>
Subject: Re: Usage of Shivers' SRE regular expression notation

zw@netspeed-tech.com (zhaoway) writes:

> Olin Shivers' SRE regular expression notation
>
> http://www.ai.mit.edu/~shivers/sre.txt
>
> Could you please provide some usage examples for SRE? I want to
> understand what the advantage of such a notation is compared with
> plain string notation. Olin Shivers has given explanations. Here I
> would like to know of some example projects using SRE.

I use them to parse various IMAP responses. A really nice feature of
SREs in that case is the ability to build big regexps by composing
small ones.

An example: IMAP defines something called an atom, which is somewhat
like a LISP atom, i.e. a Scheme symbol. I have the following regexp
for them:

(define atom-rx (rx (+ (~ control ("(){ %*") #\" #\\))))

A nice feature of SRE which is useful here is that you do not need to
quote meta-characters. The double quote and the backslash would both
need escaping inside a string, but only because they are special
characters in scsh string literals, not because of the regexp
notation. Here I decided to write them as character constants instead.

As a comparison, the Posix equivalent of the above is (as given by
scsh's regexp->posix-string, with the literal control characters DEL
and Ctrl-A shown here as ^? and ^A):

"[^\"%()*\\{^?^A- ]+"

Now, an IMAP flag is (loosely) defined as an optional backslash
followed by an atom. The SRE is therefore defined as:

(define flag-rx (rx (? #\\) ,atom-rx))

Finally, what I call an extended flag is either the string "\*" or a
flag. This gives me the following SRE:

(define ext-flag-rx (rx (| "\\*" ,flag-rx)))

whose Posix equivalent is the daunting

"\\\\\\*|\\\\?[^\"%()*\\{^?^A- ]+"

This kind of definition of big regular expressions from small ones
can more or less be achieved with strings by careful string
concatenation, provided you take care to put parentheses where they
are needed. It's a lot more error-prone though, and ugly (if you care
about this point). And it becomes a nightmare when the regular
expressions you manipulate include groups ("submatches" in scsh) that
you want to extract later.
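
To illustrate the point about submatches, here is a minimal sketch of
extracting the atom part of a flag. It assumes scsh's regexp-search and
match:substring; flag-name is a hypothetical helper introduced only for
this example, and anchoring with bos/eos is a choice made for the
sketch, not something the definitions above require:

```scheme
;; Sketch for scsh (run in the scsh REPL or a scsh script).
;; atom-rx is the definition given earlier in this message.
(define atom-rx (rx (+ (~ control ("(){ %*") #\" #\\))))

;; flag-name is a hypothetical helper: it returns the atom part of a
;; flag (without the optional leading backslash), or #f if STR is not
;; a flag.  bos/eos anchor the match to the whole string.
(define (flag-name str)
  (let ((m (regexp-search (rx bos (? #\\) (submatch ,atom-rx) eos) str)))
    (and m (match:substring m 1))))

(flag-name "\\Seen")   ; => "Seen"
(flag-name "(bad)")    ; => #f
```

The submatch is numbered by its position in the SRE, so wrapping the
whole thing in a larger expression does not require counting
parentheses in a string by hand.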

Another nice feature that I use in my IMAP client is the ability to
specify case sensitivity. IMAP has a "nil" constant which is case
insensitive. The SRE to match it is defined simply as:

(rx (w/nocase "NIL"))

equivalent to the following much more obscure Posix string:

"[Nn][Ii][Ll]"
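
As a quick check of the behaviour, a small sketch using scsh's
regexp-search? predicate. The name nil-rx and the bos/eos anchors are
assumptions added for the example:

```scheme
;; Sketch for scsh: match the IMAP "nil" constant in any letter case.
;; nil-rx is a name chosen for this example; bos/eos restrict the match
;; to the whole string.
(define nil-rx (rx bos (w/nocase "NIL") eos))

(regexp-search? nil-rx "nil")    ; => #t
(regexp-search? nil-rx "NiL")    ; => #t
(regexp-search? nil-rx "null")   ; => #f
```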

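Finally, I also have a function which takes a list of keywords and
returns a regexp which matches any of the keywords, in a
case-insensitive way. Since an SRE is just an s-expression, it can be
built with ordinary list operations and then converted. It is as
simple as this: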

(define (keyword-parser . keywords)
  (sre->regexp (list 'w/nocase (cons '| keywords))))
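
A possible use of this function, as a sketch: the keywords OK, NO and
BAD and the name status-rx are picked only for illustration, and the
definition is repeated so the example stands alone:

```scheme
;; Sketch for scsh.  keyword-parser is repeated from above.
(define (keyword-parser . keywords)
  (sre->regexp (list 'w/nocase (cons '| keywords))))

;; status-rx is a hypothetical name; the keywords are example values.
;; The SRE built here is (w/nocase (| "OK" "NO" "BAD")).
(define status-rx (keyword-parser "OK" "NO" "BAD"))

(regexp-search? status-rx "bad")     ; => #t
(regexp-search? status-rx "maybe")   ; => #f
```

Note that the resulting regexp is unanchored, so it matches a keyword
anywhere in the string.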

Michel.
