Using Patterns - Shell Scripting

The gawk program supports several types of matching patterns to filter data records, similar to how the sed editor does. Chapter 16 already showed two special patterns in action. The BEGIN and END keywords are special patterns that execute statements before or after the data stream has been read. Similarly, you can create other patterns to execute statements when matching data appears in the data stream.

This section demonstrates how to use matching patterns in your gawk scripts to limit which records a program script applies to.

Regular expressions

You can use either a Basic Regular Expression (BRE) or an Extended Regular Expression (ERE) to filter which lines in the data stream the program script applies to.

When using a regular expression, it must appear before the left brace of the program script that it controls:

$ gawk 'BEGIN{FS=","} /11/{print $1}' data1
data11
$

The regular expression /11/ matches records that contain the string 11 anywhere in the record. The gawk program matches the defined regular expression against the entire record, not field by field, so the match can include the field separator character:

$ gawk 'BEGIN{FS=","} /,d/{print $1}' data1
data11
data21
data31
$

The regular expression in this example matches the comma used as the field separator. This is not always a good thing. It can lead to problems when you try to match data specific to one data field that may also appear in another data field. If you need to match a regular expression against a specific data field, you should use the matching operator.
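To see the problem concretely, here is a minimal sketch. It assumes data1 holds the three comma-separated records below; that content is inferred from the sample output in this section, since the file itself is not shown. The variable names are only for illustration. The pattern is meant to find a 2 in the first data field, but every record contains a 2 somewhere, so every record matches:

```shell
# Assumed contents of data1 (inferred from the sample output, not shown in the text)
data1=$(mktemp)
cat > "$data1" <<'EOF'
data11,data12,data13,data14,data15
data21,data22,data23,data24,data25
data31,data32,data33,data34,data35
EOF

# Use gawk if it is installed; otherwise fall back to any POSIX awk
AWK=$(command -v gawk || command -v awk)

# /2/ is tested against the whole record, so data12 and data32 also cause a match
whole_record=$("$AWK" 'BEGIN{FS=","} /2/{print $1}' "$data1")
echo "$whole_record"
```

Only the second record has a 2 in its first data field, yet all three first fields are printed.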

The matching operator

The matching operator allows you to restrict a regular expression to a specific data field in the records. The matching operator is the tilde symbol (~). You specify the matching operator, along with the data field variable and the regular expression to match:

$1 ~ /^data/

The $1 variable represents the first data field in the record. This expression filters records where the first data field starts with the text data. Here’s an example of using it in a gawk program script:

$ gawk 'BEGIN{FS=","} $2 ~ /^data2/{print $0}' data1
data21,data22,data23,data24,data25
$

The matching operator compares the second data field with the regular expression /^data2/, which indicates the string starts with the text data2. This is a powerful tool that is commonly used in gawk program scripts to search for specific data elements in a data file:

$ gawk -F: '$1 ~ /rich/{print $1,$NF}' /etc/passwd
rich /bin/bash
$

This example searches the first data field for the text rich. When it finds the pattern in a record, it prints the first and last data field values of the record.

You can also negate the regular expression match by using the ! symbol:

$1 !~ /expression/

If the regular expression isn’t found in the data field, the program script is applied to the record data:

$ gawk 'BEGIN{FS=","} $2 !~ /^data2/{print $1}' data1
data11
data31
$

In this example, the gawk program script prints the first data field of records where the second data field doesn’t start with the text data2.
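The two operators are complements of each other, so every record satisfies exactly one of them. The sketch below puts both rules in a single program to make that visible. As before, the contents of data1 are an assumption inferred from the sample output, and the labels are made up for illustration:

```shell
# Assumed contents of data1 (inferred from the sample output, not shown in the text)
data1=$(mktemp)
cat > "$data1" <<'EOF'
data11,data12,data13,data14,data15
data21,data22,data23,data24,data25
data31,data32,data33,data34,data35
EOF

# Use gawk if it is installed; otherwise fall back to any POSIX awk
AWK=$(command -v gawk || command -v awk)

# One rule per operator; each record triggers exactly one of the two rules
result=$("$AWK" -F, '
    $2 ~  /^data2/ {print "match: " $1}
    $2 !~ /^data2/ {print "no match: " $1}
' "$data1")
echo "$result"
```

Each record is labeled once, in input order, which shows that ~ and !~ split the file between them.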

Mathematical expressions

Besides regular expressions, you can also use mathematical expressions in the matching pattern. This feature comes in handy when matching numerical values in data fields. For example, if you want to display all of the system users who belong to the root user group (group number 0), you could use this script:

$ gawk -F: '$4 == 0{print $1}' /etc/passwd
root
sync
shutdown
halt
operator
$

The script checks for records where the fourth data field contains the value 0. On my Linux system there are five user accounts that belong to the root user group. You can use any of the normal mathematical comparison expressions:
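Because /etc/passwd differs from system to system, your output will not match the listing above. The following sketch runs the same pattern against a small file in passwd format, so the result is predictable. The three entries are made up for illustration only:

```shell
# Hypothetical passwd-style entries (name:password:UID:GID:comment:home:shell)
passwd_sample=$(mktemp)
cat > "$passwd_sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
sync:x:5:0:sync:/sbin:/bin/sync
rich:x:1000:1000:Rich:/home/rich:/bin/bash
EOF

# Use gawk if it is installed; otherwise fall back to any POSIX awk
AWK=$(command -v gawk || command -v awk)

# Print the account name when the fourth field (the group ID) equals 0
root_group=$("$AWK" -F: '$4 == 0{print $1}' "$passwd_sample")
echo "$root_group"
```

Only root and sync are printed, because rich has a group ID of 1000.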

  • x == y: Value x is equal to y.
  • x <= y: Value x is less than or equal to y.
  • x < y: Value x is less than y.
  • x >= y: Value x is greater than or equal to y.
  • x > y: Value x is greater than y.
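As a quick illustration of the ordering operators, the sketch below filters a passwd-style file on the third data field (the user ID). The entries and the 1000 threshold are illustrative assumptions, not values taken from the text. Fields that look like numbers are compared numerically, so a user ID of 2 counts as less than 1000:

```shell
# Hypothetical passwd-style entries (name:password:UID:GID:comment:home:shell)
passwd_sample=$(mktemp)
cat > "$passwd_sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:2:2:daemon:/sbin:/sbin/nologin
rich:x:1000:1000:Rich:/home/rich:/bin/bash
test:x:1001:1001:Test:/home/test:/bin/bash
EOF

# Use gawk if it is installed; otherwise fall back to any POSIX awk
AWK=$(command -v gawk || command -v awk)

# Accounts whose user ID is at least 1000
high_ids=$("$AWK" -F: '$3 >= 1000{print $1, $3}' "$passwd_sample")
# Accounts whose user ID is below 1000
low_ids=$("$AWK" -F: '$3 < 1000{print $1, $3}' "$passwd_sample")

echo "$high_ids"
echo "$low_ids"
```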

You can also use expressions with text data, but you must be careful. Unlike a regular expression, a comparison expression requires an exact match. The data must match the pattern exactly:

$ gawk -F, '$1 == "data"{print $1}' data1
$
$ gawk -F, '$1 == "data11"{print $1}' data1
data11
$

The first test doesn’t match any records because the first data field value isn’t exactly data in any of the records. The second test matches one record with the value data11.
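To contrast the two kinds of pattern side by side, this sketch counts how many records each one selects. The contents of data1 are an assumption inferred from the sample output, and the counter variable n is just for illustration:

```shell
# Assumed contents of data1 (inferred from the sample output, not shown in the text)
data1=$(mktemp)
cat > "$data1" <<'EOF'
data11,data12,data13,data14,data15
data21,data22,data23,data24,data25
data31,data32,data33,data34,data35
EOF

# Use gawk if it is installed; otherwise fall back to any POSIX awk
AWK=$(command -v gawk || command -v awk)

# == needs the whole field to equal "data"; n+0 prints 0 when nothing matched
exact_count=$("$AWK" -F, '$1 == "data"{n++} END{print n+0}' "$data1")
# ~ only needs the field to start with the text data
regex_count=$("$AWK" -F, '$1 ~ /^data/{n++} END{print n+0}' "$data1")

echo "exact matches: $exact_count"
echo "regex matches: $regex_count"
```

The exact comparison selects no records, while the regular expression selects all three.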
