Diffstat (limited to 'TWiki/RegularExpression.mdwn')
-rw-r--r-- TWiki/RegularExpression.mdwn | 143
1 files changed, 0 insertions, 143 deletions
diff --git a/TWiki/RegularExpression.mdwn b/TWiki/RegularExpression.mdwn
deleted file mode 100644
index f450744c..00000000
--- a/TWiki/RegularExpression.mdwn
+++ /dev/null
@@ -1,143 +0,0 @@
-Regular expressions (REs), unlike simple queries, allow you to search for text which matches a particular pattern.
-
-REs are similar to (but more powerful than) the "wildcards" used in the command-line interfaces found in operating systems such as Unix and MS-DOS. REs are used by sophisticated search engines, as well as by many Unix-based languages and tools (e.g., `awk`, `grep`, `lex`, `perl`, and `sed`).
-
-**Examples**
-
-<table>
- <tr>
- <td> compan(y|ies) </td>
- <td> Search for <em>company</em> , <em>companies</em></td>
- </tr>
- <tr>
- <td> (peter|paul) </td>
- <td> Search for <em>peter</em> , <em>paul</em></td>
- </tr>
- <tr>
- <td> bug* </td>
- <td> Search for <em>bug</em> , <em>bugs</em> , <em>bugfix</em> (strictly, <em>bu</em> followed by zero or more <em>g</em>, so <em>bus</em> also matches) </td>
- </tr>
- <tr>
- <td> [Bb]ag </td>
- <td> Search for <em>Bag</em> , <em>bag</em></td>
- </tr>
- <tr>
- <td> b[aiueo]g </td>
- <td> Second letter is a vowel. Matches <em>bag</em> , <em>bug</em> , <em>big</em></td>
- </tr>
- <tr>
- <td> b.g </td>
- <td> Second character is any single character. Also matches <em>b&amp;g</em></td>
- </tr>
- <tr>
- <td> [a-zA-Z] </td>
- <td> Matches any one letter (not a digit or a symbol) </td>
- </tr>
- <tr>
- <td> [^0-9a-zA-Z] </td>
- <td> Matches any symbol (not a number or a letter) </td>
- </tr>
- <tr>
- <td> [A-Z][A-Z]* </td>
- <td> Matches one or more uppercase letters </td>
- </tr>
- <tr>
- <td> [0-9][0-9][0-9]-[0-9][0-9]- <br /> [0-9][0-9][0-9][0-9] </td>
- <td valign="top"> US social security number, e.g. 123-45-6789 </td>
- </tr>
-</table>
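The table rows above can be checked mechanically. The following is a minimal sketch that uses Python's `re` module as a stand-in for the search engine; the helper `found`, the `EXAMPLES` list, and the non-matching sample words are assumptions made for illustration and are not part of the original page.

```python
import re

# (pattern, words the table says should be found, words that should not be)
# The "should not" words are illustrative assumptions.
EXAMPLES = [
    (r"compan(y|ies)", ["company", "companies"], ["compan"]),
    (r"(peter|paul)", ["peter", "paul"], ["mary"]),
    (r"[Bb]ag", ["Bag", "bag"], ["BAG"]),
    (r"b[aiueo]g", ["bag", "bug", "big"], ["b&g"]),
    (r"b.g", ["bag", "b&g"], ["bg"]),
    (r"[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]",
     ["123-45-6789"], ["123-456-789"]),
]


def found(pattern, text):
    """Return True if pattern matches anywhere in text, like a search box."""
    return re.search(pattern, text) is not None


def check_examples():
    """Return True if every row behaves as the table describes."""
    return all(
        all(found(p, w) for w in hits) and not any(found(p, w) for w in misses)
        for p, hits, misses in EXAMPLES
    )
```

Because the search looks for the pattern anywhere in the text, `found(r"bug*", "bugfix")` is true, but so is `found(r"bug*", "bus")`: the `*` applies only to the preceding `g`.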
-
-Here is stuff for our UNIX freaks: <br /> (copied from 'man grep')
-
- \c A backslash (\) followed by any special character is a
- one-character regular expression that matches the spe-
- cial character itself. The special characters are:
-
- + `.', `*', `[', and `\' (period, asterisk,
- left square bracket, and backslash, respec-
- tively), which are always special, except
- when they appear within square brackets ([]).
-
- + `^' (caret or circumflex), which is special
- at the beginning of an entire regular expres-
- sion, or when it immediately follows the left
- of a pair of square brackets ([]).
-
- + $ (currency symbol), which is special at the
- end of an entire regular expression.
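To illustrate the backslash rule, here is a small sketch in Python's `re` module, assumed here only as a convenient stand-in for `grep`. Python treats a somewhat larger set of characters as special than the list above, and the helper name `literal_search` is hypothetical.

```python
import re


def literal_search(needle, text):
    """Search for needle as plain text by backslash-escaping special characters."""
    return re.search(re.escape(needle), text) is not None


# An unescaped period matches any character, so "a.c" finds "abc".
unescaped = re.search(r"a.c", "abc") is not None

# A backslash-escaped period matches only a literal period.
escaped = re.search(r"a\.c", "abc") is not None
```

With these definitions `unescaped` is true and `escaped` is false, while `literal_search("2*3", "2*3=6")` finds the text because the asterisk is escaped before searching.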
-
- . A `.' (period) is a one-character regular expression
- that matches any character except NEWLINE.
-
- [string]
- A non-empty string of characters enclosed in square
- brackets is a one-character regular expression that
- matches any one character in that string. If, however,
- the first character of the string is a `^' (a circum-
- flex or caret), the one-character regular expression
- matches any character except NEWLINE and the remaining
- characters in the string. The `^' has this special
- meaning only if it occurs first in the string. The `-'
- (minus) may be used to indicate a range of consecutive
- ASCII characters; for example, [0-9] is equivalent to
- [0123456789]. The `-' loses this special meaning if it
- occurs first (after an initial `^', if any) or last in
- the string. The `]' (right square bracket) does not
- terminate such a string when it is the first character
- within it (after an initial `^', if any); that is,
- []a-f] matches either `]' (a right square bracket) or
- one of the letters a through f inclusive. The four
- characters `.', `*', `[', and `\' stand for themselves
- within such a string of characters.
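The bracket rules can be tried out with a short sketch. It assumes Python's `re` module, which agrees with the description above for these particular cases but differs in two respects: a negated class such as `[^0-9]` also matches NEWLINE, and a backslash remains an escape character inside brackets. The helper `matching_chars` and the sample string are made up for illustration.

```python
import re


def matching_chars(char_class, candidates):
    """Return the characters of candidates matched by the bracket expression."""
    return [c for c in candidates if re.fullmatch(char_class, c)]


SAMPLE = "a5]-^"  # illustrative input: letter, digit, bracket, minus, caret

digits = matching_chars(r"[0-9]", SAMPLE)          # range of consecutive characters
negated = matching_chars(r"[^0-9]", SAMPLE)        # leading ^ negates the class
bracket_first = matching_chars(r"[]a-f]", SAMPLE)  # ']' placed first is literal
dash_last = matching_chars(r"[a-f-]", SAMPLE)      # '-' placed last is literal
```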
-
- The following rules may be used to construct regular expres-
- sions:
-
- * A one-character regular expression followed by `*' (an
- asterisk) is a regular expression that matches zero or
- more occurrences of the one-character regular expres-
- sion. If there is any choice, the longest leftmost
- string that permits a match is chosen.
-
- ^ A circumflex or caret (^) at the beginning of an entire
- regular expression constrains that regular expression
- to match an initial segment of a line.
-
- $ A currency symbol ($) at the end of an entire regular
- expression constrains that regular expression to match
- a final segment of a line.
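The closure and anchoring rules above can be sketched as follows, again assuming Python's `re` module as a stand-in for `grep`. The helper `first_match` and the sample lines are hypothetical.

```python
import re


def first_match(pattern, line):
    """Return the text of the leftmost match in line, or None if there is none."""
    m = re.search(pattern, line)
    return m.group(0) if m else None


# '*' takes as many repetitions as it can at the leftmost position.
star = first_match(r"ab*", "xabbbc abb")
# Zero occurrences of 'b' are also acceptable.
zero = first_match(r"ab*", "xac")

# '^' ties the match to the start of the line, '$' to the end.
at_start = first_match(r"^bug", "bug report")
not_at_start = first_match(r"^bug", "a bug")
at_end = first_match(r"fix$", "bugfix")
not_at_end = first_match(r"fix$", "fixed")
```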
-
- * A regular expression (not just a one-character
- regular expression) followed by `*' (an asterisk)
- is a regular expression that matches zero or more
- occurrences of that regular expression. If there is
- any choice, the longest leftmost string that
- permits a match is chosen.
-
- + A regular expression followed by `+' (a plus
- sign) is a regular expression that matches
- one or more occurrences of that regular
- expression. If there is any choice,
- the longest leftmost string that permits a
- match is chosen.
-
- ? A regular expression followed by `?' (a question
- mark) is a regular expression that matches zero
- or one occurrence of that regular expression.
- If there is any choice, the longest leftmost
- string that permits a match is chosen.
-
- | Alternation: two regular expressions
- separated by `|' or NEWLINE match either a
- match for the first or a match for the
- second.
-
- () A regular expression enclosed in parentheses
- matches a match for the regular expression.
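A brief sketch of the `+`, `?`, `|`, and `()` operators, assuming Python's `re` module, which uses the same syntax for these cases. Alternation by NEWLINE is specific to `grep` and is not shown. The helper `full` and the sample words are illustrative only.

```python
import re


def full(pattern, text):
    """Return True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


# '+' means one or more, so [A-Z]+ is shorthand for [A-Z][A-Z]*.
plus_ok = full(r"[A-Z]+", "ABC") and full(r"[A-Z][A-Z]*", "ABC")
plus_needs_one = full(r"[A-Z]+", "")

# '?' makes the preceding expression optional.
optional = full(r"colou?r", "color") and full(r"colou?r", "colour")
optional_once_only = full(r"colou?r", "colouur")

# Parentheses group, so a closure can apply to more than one character.
group_repeat = full(r"(ab)+", "ababab")
group_partial = full(r"(ab)+", "aba")

# '|' matches either alternative.
either = full(r"peter|paul", "peter") and full(r"peter|paul", "paul")
```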
-
- The order of precedence of operators at the same parenthesis
- level is `[ ]' (character classes), then `*' `+' `?'
- (closures), then concatenation, then `|' (alternation) and
- NEWLINE.
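
The precedence order can be seen by comparing a pattern with its explicitly parenthesized variants. This is a sketch under the assumption that Python's `re` module follows the same precedence for these operators; the helper `full` and the test strings are hypothetical.

```python
import re


def full(pattern, text):
    """Return True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


# Closure binds tighter than concatenation: ab* means a(b*), not (ab)*.
closure_on_b = full(r"ab*", "abbb")
closure_not_on_ab = full(r"ab*", "abab")
explicit_group = full(r"(ab)*", "abab")

# Concatenation binds tighter than alternation: ab|cd means (ab)|(cd),
# not a(b|c)d.
alt_whole = full(r"ab|cd", "ab") and full(r"ab|cd", "cd")
alt_not_inner = full(r"ab|cd", "acd")
explicit_inner = full(r"a(b|c)d", "acd")
```

Parentheses are the way to override the default order, as the `(ab)*` and `a(b|c)d` variants show.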