# Defining Regular Expressions in PHP

At the highest level, a regular expression is one or more branches separated by the vertical bar character (|). This character acts as a logical OR: the expression matches if any one of the branches matches the evaluated string.
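As a minimal sketch of branches, the snippet below tests two strings against a two-branch expression. It assumes the `preg_match` function rather than the older `ereg` family (which was removed in PHP 7), so the pattern is wrapped in slash delimiters. The variable names are illustrative only.

```php
<?php
// Two branches separated by |: either "cat" or "dog" may match.
$pattern = '/cat|dog/';

// preg_match returns 1 when the pattern matches, 0 when it does not.
$first  = preg_match($pattern, 'hot dog'); // the "dog" branch matches
$second = preg_match($pattern, 'bird');    // neither branch matches

var_dump($first);  // int(1)
var_dump($second); // int(0)
```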

Each branch contains one or more atoms. An atom may be followed by a character that modifies the number of times it may be matched in succession. An asterisk (*) means the atom can match any number of times, including zero. A plus sign (+) means the atom must match at least once. A question mark (?) signifies that the atom may match once or not at all.
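The three repetition characters can be compared side by side. This is a sketch that assumes `preg_match` with slash delimiters. Each pattern is anchored with ^ and $ so that the whole string must match, which makes the difference between the modifiers visible.

```php
<?php
// The atom being repeated is the single character "b".
$star     = '/^ab*c$/'; // "b" zero or more times
$plus     = '/^ab+c$/'; // "b" one or more times
$optional = '/^ab?c$/'; // "b" once or not at all

var_dump(preg_match($star, 'ac'));       // int(1)
var_dump(preg_match($star, 'abbbc'));    // int(1)
var_dump(preg_match($plus, 'ac'));       // int(0)
var_dump(preg_match($plus, 'abc'));      // int(1)
var_dump(preg_match($optional, 'abc'));  // int(1)
var_dump(preg_match($optional, 'abbc')); // int(0)
```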

Alternatively, the atom may be bound, which means it is followed by curly braces, { and }, that contain integers. If the curly braces contain a single number, then the atom must be matched exactly that number of times. If the curly braces contain a number followed by a comma, the atom must be matched that number of times or more. If the curly braces contain two numbers separated by a comma, the atom must match at least the first number of times, but not more than the second number.
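The three forms of a bound might look like this in practice. The sketch assumes `preg_match` with slash delimiters and anchors each pattern with ^ and $ so that extra repetitions cause the match to fail.

```php
<?php
$exactly = '/^a{3}$/';   // exactly three "a" characters
$atLeast = '/^a{2,}$/';  // two or more
$between = '/^a{2,4}$/'; // at least two, at most four

var_dump(preg_match($exactly, 'aaa'));   // int(1)
var_dump(preg_match($exactly, 'aa'));    // int(0)
var_dump(preg_match($atLeast, 'a'));     // int(0)
var_dump(preg_match($atLeast, 'aaaaa')); // int(1)
var_dump(preg_match($between, 'aaa'));   // int(1)
var_dump(preg_match($between, 'aaaaa')); // int(0)
```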

An atom is a series of characters, some having special meaning, others simply standing for a character that must be matched. A period (.) matches any single character. A caret (^) matches the beginning of the string. A dollar sign ($) matches the end of the string. If you need to match one of the special characters (^ . [ ] $ ( ) | * + ? { } \), put a backslash in front of it. In fact, any character preceded by a backslash will be treated literally, even if it has no special meaning. Any character with no special meaning will be considered just a character to be matched, backslash or not. You may also group atoms with parentheses so that they are treated as a single atom.
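The following sketch illustrates the period, the two anchors, backslash escaping, and grouping. It assumes `preg_match` with slash delimiters. Patterns are written in single-quoted strings so that PHP passes the backslash and dollar sign through unchanged. The `preg_quote` call at the end is an optional convenience for escaping special characters automatically.

```php
<?php
$anyChar    = '/^c.t$/';    // "c", any single character, "t"
$literalDot = '/^3\.14$/';  // the backslash makes the period literal
$group      = '/^(ab)+$/';  // the group "ab" is repeated as one atom

var_dump(preg_match($anyChar, 'cut'));     // int(1)
var_dump(preg_match($anyChar, 'cart'));    // int(0)
var_dump(preg_match($literalDot, '3.14')); // int(1)
var_dump(preg_match($literalDot, '3x14')); // int(0)
var_dump(preg_match($group, 'ababab'));    // int(1)
var_dump(preg_match($group, 'aba'));       // int(0)

// preg_quote escapes special characters (and the given delimiter).
$escaped = preg_quote('3.14', '/');
var_dump($escaped === '3\.14');            // bool(true)
```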

Square brackets ([]) are used to specify a set of possible characters. This may take the form of a list of legal characters. A range may be specified using the dash character (-). If the list or range is preceded by a caret (^), the meaning is taken to be any character not in the following list or range. Take note of this double meaning for the caret. In addition to lists and ranges, square brackets may contain a character class. These class names are further surrounded by colons, so the class itself is written [:alpha:], and to match any alphabetic character you place it inside square brackets: [[:alpha:]]. The classes are alnum, alpha, blank, cntrl, digit, graph, lower, print, punct, space, upper, and xdigit. You may wish to look at the man page for ctype to get a description of these classes. Finally, two additional square bracket codes specify the beginning and ending of a word. They are [[:<:]] and [[:>:]], respectively. A word in this sense is defined as any sequence of alphanumeric characters and the underscore character.
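A short sketch of lists, ranges, negation, and character classes follows. It assumes `preg_match` with slash delimiters, and each pattern is anchored so that every character in the string must belong to the set. The word-boundary codes are left out of the example because support for them varies between regular expression engines and versions.

```php
<?php
$vowel    = '/^[aeiou]$/';       // a list of legal characters
$digits   = '/^[0-9]+$/';        // a range
$noDigits = '/^[^0-9]+$/';       // a leading caret negates the range
$letters  = '/^[[:alpha:]]+$/';  // a character class inside brackets

var_dump(preg_match($vowel, 'e'));        // int(1)
var_dump(preg_match($vowel, 'x'));        // int(0)
var_dump(preg_match($digits, '2024'));    // int(1)
var_dump(preg_match($noDigits, 'abc'));   // int(1)
var_dump(preg_match($noDigits, 'ab1'));   // int(0)
var_dump(preg_match($letters, 'Hello'));  // int(1)
var_dump(preg_match($letters, 'Hello1')); // int(0)
```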