Diffstat (limited to 'TWiki/RegularExpression.mdwn')
-rw-r--r-- | TWiki/RegularExpression.mdwn | 143 |
1 files changed, 0 insertions, 143 deletions
diff --git a/TWiki/RegularExpression.mdwn b/TWiki/RegularExpression.mdwn
deleted file mode 100644
index f450744c..00000000
--- a/TWiki/RegularExpression.mdwn
+++ /dev/null
@@ -1,143 +0,0 @@
-Regular expressions (REs), unlike simple queries, allow you to search for text which matches a particular pattern.
-
-REs are similar to (but more poweful than) the "wildcards" used in the command-line interfaces found in operating systems such as Unix and MS-DOS. REs are used by sophisticated search engines, as well as by many Unix-based languages and tools ( e.g., `awk`, `grep`, `lex`, `perl`, and `sed` ).
-
-**Examples**
-
-<table>
-  <tr>
-    <td> compan(y|ies) </td>
-    <td> Search for <em>company</em> , <em>companies</em></td>
-  </tr>
-  <tr>
-    <td> (peter|paul) </td>
-    <td> Search for <em>peter</em> , <em>paul</em></td>
-  </tr>
-  <tr>
-    <td> bug* </td>
-    <td> Search for <em>bug</em> , <em>bugs</em> , <em>bugfix</em></td>
-  </tr>
-  <tr>
-    <td> [Bb]ag </td>
-    <td> Search for <em>Bag</em> , <em>bag</em></td>
-  </tr>
-  <tr>
-    <td> b[aiueo]g </td>
-    <td> Second letter is a vowel. Matches <em>bag</em> , <em>bug</em> , <em>big</em></td>
-  </tr>
-  <tr>
-    <td> b.g </td>
-    <td> Second letter is any letter. Matches also <em>b&amp;g</em></td>
-  </tr>
-  <tr>
-    <td> [a-zA-Z] </td>
-    <td> Matches any one letter (not a number and a symbol) </td>
-  </tr>
-  <tr>
-    <td> [^0-9a-zA-Z] </td>
-    <td> Matches any symbol (not a number or a letter) </td>
-  </tr>
-  <tr>
-    <td> [A-Z][A-Z]* </td>
-    <td> Matches one or more uppercase letters </td>
-  </tr>
-  <tr>
-    <td> [0-9][0-9][0-9]-[0-9][0-9]- <br /> [0-9][0-9][0-9][0-9] </td>
-    <td valign="top"> US social security number, e.g. 123-45-6789 </td>
-  </tr>
-</table>
-
-Here is stuff for our UNIX freaks: <br /> (copied from 'man grep')
-
-    \c   A backslash (\) followed by any special character is a
-         one-character regular expression that matches the spe-
-         cial character itself.  The special characters are:
-
-              +    `.', `*', `[', and `\' (period, asterisk,
-                   left square bracket, and backslash, respec-
-                   tively), which are always special, except
-                   when they appear within square brackets ([]).
-
-              +    `^' (caret or circumflex), which is special
-                   at the beginning of an entire regular expres-
-                   sion, or when it immediately follows the left
-                   of a pair of square brackets ([]).
-
-              +    $ (currency symbol), which is special at the
-                   end of an entire regular expression.
-
-    .    A `.' (period) is a one-character regular expression
-         that matches any character except NEWLINE.
-
-    [string]
-         A non-empty string of characters enclosed in square
-         brackets is a one-character regular expression that
-         matches any one character in that string. If, however,
-         the first character of the string is a `^' (a circum-
-         flex or caret), the one-character regular expression
-         matches any character except NEWLINE and the remaining
-         characters in the string. The `^' has this special
-         meaning only if it occurs first in the string. The `-'
-         (minus) may be used to indicate a range of consecutive
-         ASCII characters; for example, [0-9] is equivalent to
-         [0123456789]. The `-' loses this special meaning if it
-         occurs first (after an initial `^', if any) or last in
-         the string. The `]' (right square bracket) does not
-         terminate such a string when it is the first character
-         within it (after an initial `^', if any); that is,
-         []a-f] matches either `]' (a right square bracket ) or
-         one of the letters a through f inclusive. The four
-         characters `.', `*', `[', and `\' stand for themselves
-         within such a string of characters.
-
-    The following rules may be used to construct regular expres-
-    sions:
-
-    *    A one-character regular expression followed by `*' (an
-         asterisk) is a regular expression that matches zero or
-         more occurrences of the one-character regular expres-
-         sion. If there is any choice, the longest leftmost
-         string that permits a match is chosen.
-
-    ^    A circumflex or caret (^) at the beginning of an entire
-         regular expression constrains that regular expression
-         to match an initial segment of a line.
-
-    $    A currency symbol ($) at the end of an entire regular
-         expression constrains that regular expression to match
-         a final segment of a line.
-
-              *    A regular expression (not just a one-
-                   character regular expression) followed by `*'
-                   (an asterisk) is a regular expression that
-                   matches zero or more occurrences of the one-
-                   character regular expression. If there is
-                   any choice, the longest leftmost string that
-                   permits a match is chosen.
-
-              +    A regular expression followed by `+' (a plus
-                   sign) is a regular expression that matches
-                   one or more occurrences of the one-character
-                   regular expression. If there is any choice,
-                   the longest leftmost string that permits a
-                   match is chosen.
-
-              ?    A regular expression followed by `?' (a ques-
-                   tion mark) is a regular expression that
-                   matches zero or one occurrences of the one-
-                   character regular expression. If there is
-                   any choice, the longest leftmost string that
-                   permits a match is chosen.
-
-              |    Alternation: two regular expressions
-                   separated by `|' or NEWLINE match either a
-                   match for the first or a match for the
-                   second.
-
-              ()   A regular expression enclosed in parentheses
-                   matches a match for the regular expression.
-
-    The order of precedence of operators at the same parenthesis
-    level is `[ ]' (character classes), then `*' `+' `?'
-    (closures),then concatenation, then `|' (alternation)and
-    NEWLINE.
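
The patterns in the removed page's Examples table can be checked with a short script. The following is a minimal sketch that assumes Python's `re` module as a stand-in for the grep-style syntax the page describes. The sample words, the `EXAMPLES` mapping, and the `found` helper are illustrative names, not part of the removed file. It uses search semantics (a match anywhere in the text), which appears to be what the table's "Search for" wording implies.

```python
import re

# Patterns copied from the Examples table; the sample words are assumptions
# chosen to mirror the table's right-hand column.
EXAMPLES = {
    r"compan(y|ies)": ["company", "companies"],
    r"(peter|paul)": ["peter", "paul"],
    r"bug*": ["bug", "bugs", "bugfix"],
    r"[Bb]ag": ["Bag", "bag"],
    r"b[aiueo]g": ["bag", "bug", "big"],
    r"b.g": ["bag", "b&g"],
    r"[A-Z][A-Z]*": ["A", "ABC"],
    r"[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]": ["123-45-6789"],
}


def found(pattern, text):
    """Return True if the pattern matches anywhere in text (hypothetical helper)."""
    return re.search(pattern, text) is not None


# Print one row per (pattern, sample) pair with the search result.
for pattern, samples in EXAMPLES.items():
    for sample in samples:
        print(f"{pattern!r:55} {sample!r:15} {found(pattern, sample)}")
```

Under these assumptions, `bug*` means "bu" followed by zero or more "g", so it also finds a match in a word such as "bus". It finds "bugs" and "bugfix" only because the search is unanchored.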