Using Regular Expressions in PHP Scripts

The basic function for executing regular expressions is ereg. This function evaluates a string against a regular expression, returning TRUE if the pattern described by the regular expression appears in the string. In this minimal form, you can check that a string conforms to a certain form. For example, you can ensure that a U.S. postal zip code is in the proper form of five digits followed by a dash and four more digits.

Checking a ZIP Code

The script offers a form for inputting a zip code. It must have five digits and may be followed by a dash and four more digits. The functionality of the script hinges on the regular expression

^([0-9]{5})(-[0-9]{4})?$
which is compared to user input. It will be instructive to examine this expression in detail. It starts with a caret. This causes the expression to match only from the beginning of the evaluated string. If this were left out, the zip code could be preceded by any number of characters, such as abc12345-1234, and still be a valid match. Likewise, the dollar sign at the end of the expression matches the end of the string. This stops matching of strings like 12345-1234abc. Using a caret and a dollar sign together allows us to match only exact strings. The first subexpression is ([0-9]{5}). The square-bracketed range allows only the digits zero through nine. The curly braces specify that there must be exactly five of these characters.

The second subexpression is (-[0-9]{4})?. Like the first, it uses a bracketed range and curly braces, this time to specify exactly four digits. The dash is a literal character that must precede the digits. The question mark specifies that the entire subexpression may match once or not at all. This makes the four-digit extension optional.
You can easily expand this idea to check phone numbers or dates. Regular expressions provide a neat way of checking variables returned from forms. Consider the alternative of nesting if statements and searching strings with the strpos function.
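The ZIP code check described above can be sketched as follows. Note that the ereg family was removed in PHP 7, so this sketch substitutes preg_match, which takes the same expression wrapped in delimiters. The helper name isValidZip and the sample inputs are assumptions made for illustration, not part of the original listing.

```php
<?php
// Hypothetical helper: returns true when $zip is five digits, optionally
// followed by a dash and four more digits.
function isValidZip(string $zip): bool
{
    // Same expression as in the text, wrapped in PCRE delimiters.
    // The D modifier makes $ match only at the very end of the string,
    // so a value with a trailing newline is rejected as well.
    return preg_match('/^([0-9]{5})(-[0-9]{4})?$/D', $zip) === 1;
}

// Sample inputs, including the two counterexamples from the text.
$samples = ['12345', '12345-1234', 'abc12345-1234', '12345-1234abc', '1234'];
foreach ($samples as $candidate) {
    echo $candidate, ': ', isValidZip($candidate) ? 'valid' : 'invalid', "\n";
}
```

In a form handler, the same function could be applied to the submitted field before the value is stored or used.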

You may also choose to have subexpression matches returned in an array. This is useful in situations where you need to break a string into components. The string a browser uses to identify itself is a good string for this method. Encoded in this string are the browser's name, version, and the type of computer it's running on. Pulling this information out into separate variables will allow you to customize your site based on the capabilities of the browser.

Listing is a script for creating a set of variables that aid in cloaking a site for a particular browser. For the purpose of illustration, we will customize a link based on the browser being used. If the user visits the page with Netscape Navigator, we will provide a link to the download page for Microsoft Internet Explorer. Otherwise, we'll put a link to Netscape's download page. This is an example of customizing content, but the same method can be used to decide whether to use advanced features or not.

Evaluating HTTP_USER_AGENT

In this script the main ereg function is not used in an if statement. It assumes the browser will identify itself minimally as a name, a slash, and the version. The match array gets set with the parts of the evaluated string that match with the parts of the regular expression. There are three subexpressions for name, version, and any extra description. Most browsers follow this form, including Navigator and Internet Explorer. Since Internet Explorer always reports that it is a Mozilla (Netscape) browser, extra steps must be taken to determine if a browser is really a Netscape browser or an imposter. This is done with a call to eregi.
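The original listing is not reproduced here, so the following is only a rough sketch of the approach just described, written with preg_match because ereg and eregi are unavailable in PHP 7 and later. The function name parseUserAgent, the exact pattern, and the "MSIE" marker test are assumptions for illustration.

```php
<?php
// Hypothetical helper: split a browser identification string into
// name, version, and any extra description. Returns null when the
// string does not follow the name/version form.
function parseUserAgent(string $agent): ?array
{
    // Three subexpressions: name, version, and the remaining description.
    // "#" is used as the delimiter so the slash needs no escaping.
    if (preg_match('#^([^/]+)/([0-9.]+)(.*)$#', $agent, $match) !== 1) {
        return null;
    }

    // Element zero (the whole match) is ignored; elements 1 to 3 hold
    // the subexpression matches.
    $name = $match[1];

    // Internet Explorer reports itself as Mozilla, so a case-insensitive
    // search for its marker in the description tells the two apart.
    if (preg_match('/MSIE/i', $match[3]) === 1) {
        $name = 'Internet Explorer';
    }

    return [
        'name'    => $name,
        'version' => $match[2], // the version reported after the slash
        'extra'   => trim($match[3]),
    ];
}

// In a real page the input would come from $_SERVER['HTTP_USER_AGENT'].
print_r(parseUserAgent('Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'));
```

The returned name could then drive the choice of download link, or any other browser-specific content.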

If you are wondering why element zero is ignored, that's because the zero element holds the substring that matches the entire regular expression. In this situation it is not interesting. Usually the zero element is useful when you are searching for a particular string in a larger context. For example, you may be scanning the body of a Web page for URLs. Listing fetches the PHP home page and lists all the links on the page.

The main loop of this script gets lines of text from the file stream and looks for HREF properties. If one is found in a line, it will be placed in the zero element of the match array. The script prints it out and then removes it from the line using the ereg_replace function. This function replaces text matched with a regular expression with a string. In this case the script replaces the HREF property with an empty string. The reason for finding the link and then removing it is that it is possible for two links to be on one line of HTML. The ereg function will match the first substring only. The solution is to find and remove each link until none remain.
Notice that when removing the link a replace variable is prepared. Some links might contain a question mark, a valid character in a URL that separates a filename from form variables. Since this character has special meaning to regular expressions, the script places a backslash before it to let PHP know it's to be taken literally.
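The find-and-remove loop can be sketched as below. This is an illustration under assumptions rather than the original listing: it works on a single hard-coded line instead of fetching a page, the helper name extractHrefs and the pattern are hypothetical, and it uses preg_match, preg_quote, and preg_replace in place of the ereg functions, which no longer exist in PHP 7 and later.

```php
<?php
// Hypothetical helper: collect every HREF property found in one line of HTML.
function extractHrefs(string $line): array
{
    $found = [];

    // preg_match reports only the first match, so find one link, record it,
    // remove it from the line, and repeat until none remain.
    while (preg_match('/HREF="[^"]*"/i', $line, $match) === 1) {
        // Element zero holds the substring that matched the whole expression.
        $found[] = $match[0];

        // Prepare the replace pattern: preg_quote puts a backslash before
        // characters such as ? so the link is matched literally.
        $replace = '/' . preg_quote($match[0], '/') . '/';

        // Remove only the first occurrence so later links are still found.
        $line = preg_replace($replace, '', $line, 1);
    }

    return $found;
}

// Two links on one line, the second containing a question mark.
$line = '<a href="index.php">Home</a> <a HREF="search.php?q=regex">Search</a>';
print_r(extractHrefs($line));
```

With the preg functions, preg_match_all would return every match in one call; the loop is kept here only because it mirrors the approach the text describes.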

I frequently use ereg_replace to convert text for use in a new context. You can use ereg_replace to replace end-of-line characters with break tags. Listing demonstrates this idea. You can also use it to collapse multiple spaces into a single space.
Scanning Text for URLs

Replacing Linefeeds with HTML Line Breaks

By now you most likely understand regular expressions, but one new idea is worth noting. The call to ereg_replace in Listing uses an integer to stand for a linefeed. This works because ASCII 10 is the linefeed character. You might think of using backslash-n here, but that would not give the results you want. Recall that the backslash character in regular expressions causes the character that follows to be treated literally. The ereg_replace function allows you to specify a single character by ASCII value for its first argument.
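Both conversions can be sketched with preg_replace, used here as a stand-in because ereg_replace was removed in PHP 7. The helper names and the choice of the <br> tag are assumptions for illustration. Since preg_replace does not accept an integer pattern, the sketch builds the linefeed from its ASCII value with chr(10) to keep the same idea.

```php
<?php
// Hypothetical helper: replace each linefeed with an HTML break tag.
function linefeedsToBreaks(string $text): string
{
    // chr(10) produces the linefeed character from its ASCII value.
    $pattern = '/' . chr(10) . '/';
    return preg_replace($pattern, '<br>', $text);
}

// Hypothetical helper: collapse runs of spaces into a single space.
function collapseSpaces(string $text): string
{
    // " +" matches one or more consecutive spaces.
    return preg_replace('/ +/', ' ', $text);
}

echo linefeedsToBreaks("first line\nsecond line"), "\n";
echo collapseSpaces('too    many   spaces'), "\n";
```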



All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd
