In computing, a regular expression, also referred to as regex or regexp, provides a concise and flexible means for matching stringsof text, such as particular characters, words, or patterns of characters. A regular expression is written in a formal language that can be interpreted by a regular expression processor, a program that either serves as a parser generator or examines text and identifies parts that match the provided specification.
The following examples illustrate a few specifications that could be expressed in a regular expression:
- The sequence of characters "car" appearing consecutively in any context, such as in "car", "cartoon", or "bicarbonate"
- The sequence of characters "car" occurring in that order with other characters between them, such as in "Icelander" or "chandler"
- The word "car" when it appears as an isolated word
- The word "car" when preceded by the word "blue" or "red"
- The word "car" when not preceded by the word "motor"
- A dollar sign immediately followed by one or more digits, and then optionally a period and exactly two more digits (for example, "$100" or "$245.99").
Regular expressions can be much more complex than these examples.
Regular expressions are used by many text editors, utilities, and programming languages to search and manipulate text based onpatterns. Some of these languages, including Perl, Ruby, Awk, and Tcl, have fully integrated regular expressions into the syntax of the core language itself. Others like C, C++, .NET, Java, and Python instead provide access to regular expressions only through libraries. Utilities provided by Unix distributions—including the editor ed and the filter grep—were the first to popularize the concept of regular expressions.
As an example of the syntax, the regular expression
\bex
can be used to search for all instances of the string "ex" that occur after "word boundaries" (signified by the \b
). Thus \bex
will find the matching string "ex" in two possible locations, (1) at the beginning of words, and (2) between two characters in a string, where one is a word character and the other is not a word character. For instance, in the string "Texts for experts", \bex
matches the "ex" in "experts" but not in "Texts" (because the "ex" occurs inside a word and not immediately after a word boundary). Many modern computing systems provide wildcard characters in matching filenames from a file system. This is a core capability of many command-line shells and is also known as globbing. Wildcards differ from regular expressions in generally expressing only limited forms of patterns.
A regular expression, often called a pattern, is an expression that describes a set of strings. They are usually used to give a concise description of a set, without having to list all elements. For example, the set containing the three strings "Handel", "Händel", and "Haendel" can be described by the pattern H(ä|ae?)ndel (or alternatively, it is said that the pattern matches each of the three strings). In most formalisms, if there is any regex that matches a particular set then there is an infinite number of such expressions. Most formalisms provide the following operations to construct regular expressions.
Boolean "or"
Grouping
Parentheses are used to define the scope and precedence of the operators (among other uses). For example, gray|grey andgr(a|e)y are equivalent patterns which both describe the set of "gray" and "grey".
A quantifier after a token (such as a character) or group specifies how often that preceding element is allowed to occur. The most common quantifiers are the question mark ?, the asterisk * (derived from the Kleene star), and the plus sign + (Kleene cross).
? | The question mark indicates there is zero or one of the preceding element. For example, colou?r matches both "color" and "colour". |
* | The asterisk indicates there are zero or more of the preceding element. For example, ab*c matches "ac", "abc", "abbc", "abbbc", and so on. |
+ | The plus sign indicates that there is one or more of the preceding element. For example, ab+c matches "abc", "abbc", "abbbc", and so on, but not "ac". These constructions can be combined to form arbitrarily complex expressions, much like one can construct arithmetical expressions from numbers and the operations +, −, ×, and ÷. For example, H(ae?|ä)ndel and H(a|ae|ä)ndel are both valid patterns which match the same strings as the earlier example, H(ä|ae?)ndel. The precise syntax for regular expressions varies among tools and with context; more detail is given in the Syntax section. |
The Structure of a Regular Expression
There are three important parts to a regular expression. Anchors are used to specify the position of the pattern in relation to a line of text. Character Setsmatch one or more characters in a single position. Modifiers specify how many times the previous character set is repeated. A simple example that demonstrates all three parts is the regular expression "^#*". The up arrow is an anchor that indicates the beginning of the line. The character "#" is a simple character set that matches the single character "#". The asterisk is a modifier. In a regular expression it specifies that the previous character set can appear any number of times, including zero. This is a useless regular expression, as you will see shortly.
There are also two types of regular expressions: the "Basic" regular expression, and the "extended" regular expression. A few utilities like awk and egrep use the extended expression. Most use the "regular" regular expression. From now on, if I talk about a "regular expression," it describes a feature in both types.
Here is a table of the Solaris (around 1991) commands that allow you to specify regular expressions:
There are also two types of regular expressions: the "Basic" regular expression, and the "extended" regular expression. A few utilities like awk and egrep use the extended expression. Most use the "regular" regular expression. From now on, if I talk about a "regular expression," it describes a feature in both types.
Here is a table of the Solaris (around 1991) commands that allow you to specify regular expressions:
Utility | Regular Expression Type |
vi | Basic |
sed | Basic |
grep | Basic |
csplit | Basic |
dbx | Basic |
dbxtool | Basic |
more | Basic |
ed | Basic |
expr | Basic |
lex | Basic |
pg | Basic |
nl | Basic |
rdist | Basic |
awk | Extended |
nawk | Extended |
egrep | Extended |
EMACS | EMACS Regular Expressions |
PERL | PERL Regular Expressions |
Walang komento:
Mag-post ng isang Komento