REGULAR EXPRESSIONS

8/25/2001

Regular Expressions: A method of specifying patterns of character strings rather than simple literal keyboard text. Regular expressions are founded on the work of mathematician Stephen Cole Kleene and were incorporated into various parts of Unix when Unix was created at Bell Telephone Labs. REs are widely used in Unix and its descendants as well as in scripting languages, e.g. Perl, JavaScript, REXX, VBScript. There are said to be subtle differences between implementations. Some, including VBScript, may not be complete implementations.

Regular expressions assign special meanings to a few standard characters including \ . ? ( [ ^ + - * {. These characters are then used to define characters that cannot be typed or to define search conditions. For example, in implementations that support \cX, \cG is RE shorthand for ASCII character 7 (Bell, Control-G), and . is shorthand for "match any single character".

It is sometimes necessary to use a metacharacter as an ordinary character, as in "3.14". This is done by preceding it with \. 3.14 specifies any string of the form "3x14" where x is any character. 3\.14 specifies exactly the string "3.14". \\ is the character \. Conversely, \ followed by an ordinary character often gives that character a special meaning. d is just the character d. \d is any single digit. \D is any single non-digit.
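
The escaping rules above can be tried out directly. The following is a minimal sketch that assumes Python's re module (these notes do not name Python, and other implementations may differ in detail). The helper name matches is made up for this example.

```python
import re

# An unescaped . matches any single character, so 3.14 also matches "3x14".
loose = re.compile(r"3.14")
# An escaped \. matches only a literal period.
strict = re.compile(r"3\.14")

def matches(pattern, text):
    """Hypothetical helper: True if the whole of text matches pattern."""
    return pattern.fullmatch(text) is not None

print(matches(loose, "3x14"))    # True
print(matches(loose, "3.14"))    # True
print(matches(strict, "3x14"))   # False
print(matches(strict, "3.14"))   # True

# \\ in the pattern stands for one literal backslash in the text.
print(matches(re.compile(r"a\\b"), "a\\b"))  # True

# d is the letter d; \d is any single digit; \D is any single non-digit.
print(re.findall(r"d", "d7x"))   # ['d']
print(re.findall(r"\d", "d7x"))  # ['7']
print(re.findall(r"\D", "d7x"))  # ['d', 'x']
```

The patterns are written as raw strings (r"...") so that Python itself leaves the backslashes alone and passes them through to the RE engine.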

Regular expressions are very flexible, but they are also quite confusing. Most people confine their use to scripts, as creating properly formed REs on the fly is difficult.

RE metacharacters.

\ Turns the next character into a normal character if it is normally a metacharacter.
\ Turns the next character into a metacharacter if it is legal as a metacharacter.
^ Beginning of line
$ End of line
* + ? {n} {n,} {n,m} Repetition counts for the preceding character or group
. Any single character (in most implementations, except a line break)
(x) Match x and remember the match
| or
[] List of possible matches. - indicates a range
[^] List of non-matches. - indicates a range
[\b] Backspace
\b Word boundary
\B Non-word boundary
\cX Control character, where X is a letter (e.g. \cM is Control-M)
\d Digit [decimal]
\D Non-digit [decimal]
\f Form feed
\n Line Feed
\r Carriage Return
\s Any white space character
\S Any non-white space character
\t Tab
\v Vertical Tab
\w Alphanumeric character or underscore = [A-Za-z0-9_]
\W Non-Alphanumeric character = [^A-Za-z0-9_]
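\n nth remembered substring, where n is a digit (e.g. \1). Not to be confused with \n for Line Feed
\ooo Character with octal value ooo, where each o is an octal digit (e.g. \101 is A)
\xhh Character with hexadecimal value hh (e.g. \x41 is A)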
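
A few of the metacharacters listed above can be seen in action in the sketch below. It again assumes Python's re module, which is not named in these notes. Python does not support every entry in the list (\cX, for example, is rejected), so only a sample is shown.

```python
import re

# ^ and $ anchor the match to the beginning and end of the line.
print(bool(re.search(r"^cat$", "cat")))         # True
print(bool(re.search(r"^cat$", "concat")))      # False

# * and {n} say how many times the preceding item may repeat.
print(re.findall(r"ab*", "a ab abbb"))          # ['a', 'ab', 'abbb']
print(re.findall(r"\d{4}", "8/25/2001"))        # ['2001']

# [] lists allowed characters, [^] lists excluded ones; - gives a range.
print(re.findall(r"[a-c]", "abcdef"))           # ['a', 'b', 'c']
print(re.findall(r"[^a-c]", "abcdef"))          # ['d', 'e', 'f']

# | separates alternatives.
print(re.findall(r"cat|dog", "cat, dog, cow"))  # ['cat', 'dog']

# \b is a word boundary, so only a word that starts with "re" is found.
print(re.findall(r"\bre\w*", "regular expressions are great"))  # ['regular']

# (x) remembers what it matched; \1 refers back to that remembered text.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")
print(doubled.group(1))                         # is

# \s is any white space character, so \s+ splits on spaces and tabs.
print(re.split(r"\s+", "a b\tc"))               # ['a', 'b', 'c']

# \x41 is the character with hexadecimal value 41, the letter A.
print(re.findall(r"\x41", "ABA"))               # ['A', 'A']
```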
