UNIX / Linux Regular Expressions with SED

What are UNIX / Linux Regular Expressions with SED?

In this section, we will discuss in detail regular expressions with sed in UNIX.

A regular expression is a string that can be used to describe several sequences of characters. Regular expressions are used by several different UNIX commands, including ed, sed, awk, grep, and to a more limited extent, vi.

Here sed stands for stream editor. This stream-oriented editor was created exclusively for executing scripts. Thus, all the input you feed into it passes through and goes to STDOUT, and it does not modify the input file.

Invoking sed

Before we start, let us ensure we have a local copy of the /etc/passwd text file to work with sed.

As mentioned previously, sed can be invoked by sending data to it through a pipe.

The cat command dumps the contents of /etc/passwd to sed through the pipe into sed's pattern space. The pattern space is the internal work buffer that sed uses for its operations.
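The pipe invocation described above can be sketched as follows. This is only an illustration: the three passwd-style entries and the variable names sample and piped are assumptions made here so that the output is predictable, and the empty script '' is used so that sed applies no editing command.

```shell
# Hypothetical sample in /etc/passwd format (not the real file).
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/sh' \
  'daemon:x:1:1:daemon:/usr/sbin:/bin/sh' \
  'bin:x:2:2:bin:/bin:/bin/sh' > "$sample"

# cat sends the file through the pipe; with the empty script '' every line
# enters the pattern space and is printed unchanged on STDOUT.
piped=$(cat "$sample" | sed '')
printf '%s\n' "$piped"
```

The sample file is left exactly as it was, since sed only writes to STDOUT.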

The sed General Syntax

The general syntax for sed is /pattern/action.

Here, pattern is a regular expression, and action is one of the commands given in the following table. If pattern is omitted, action is performed for every line, as we have seen above.

The slash characters (/) that surround the pattern are required because they are used as delimiters.

S.No. Command & Description
1
p
Prints the line
2
d
Deletes the line
3
s/pattern1/pattern2/
Substitutes the first occurrence of pattern1 with pattern2
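As a rough sketch of the three commands in the table, the following runs each one against a small sample. The sample entries and the variable names (printed, deleted, replaced) are assumptions chosen for illustration.

```shell
# Hypothetical sample in /etc/passwd format.
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/sh' \
  'daemon:x:1:1:daemon:/usr/sbin:/bin/sh' \
  'bin:x:2:2:bin:/bin:/bin/sh' > "$sample"

# /pattern/p : print lines matching "daemon" (-n suppresses the default output).
printed=$(sed -n '/daemon/p' "$sample")
# /pattern/d : delete lines matching "daemon" and print everything else.
deleted=$(sed '/daemon/d' "$sample")
# s/pattern1/pattern2/ : replace the first "bin" on each line with "BIN".
replaced=$(sed 's/bin/BIN/' "$sample")

printf '%s\n' "$printed" '---' "$deleted" '---' "$replaced"
```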

Deleting All Lines with sed

We will now see how to delete all lines with sed. Invoke sed again, but this time tell it to use the editing command delete line, denoted by the single letter d.
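A minimal sketch of the delete command fed through a pipe, assuming a two-line sample file (the entries and variable names are made up for this example):

```shell
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/sh' \
  'daemon:x:1:1:daemon:/usr/sbin:/bin/sh' > "$sample"

# d deletes every line from the pattern space, so nothing reaches STDOUT.
all_deleted=$(cat "$sample" | sed 'd')
printf 'output: [%s]\n' "$all_deleted"

# The input file itself is untouched and still has both lines.
remaining=$(grep -c '' "$sample")
printf 'lines still in file: %s\n' "$remaining"
```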

Instead of invoking sed by sending a file to it through a pipe, you can instruct sed to read the data from a file named on the command line.

This does exactly the same as the preceding pipe-based invocation, without the cat command.
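The file-argument form might look like the sketch below, where the sample file and the variable names from_file and from_pipe are assumptions used to compare the two invocations.

```shell
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/sh' \
  'bin:x:2:2:bin:/bin:/bin/sh' > "$sample"

# sed reads the file named as its last argument; -e introduces the script.
from_file=$(sed -e 'd' "$sample")
# The pipe-based equivalent, for comparison.
from_pipe=$(cat "$sample" | sed 'd')

if [ "$from_file" = "$from_pipe" ]; then
  echo "both invocations produce the same (empty) output"
fi
```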

sed also supports addresses. Addresses are either particular locations in a file or a range where a particular editing command should be applied. When sed encounters no addresses, it performs its operations on every line in the file.

You can add a basic address to the sed command you have been using.

Placing the number 1 before the delete command, as in 1d, instructs sed to execute the editing command on the first line of the file. In this case, sed removes the first line of /etc/passwd and prints the rest of the file.
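A small sketch of a single-line address, run against a hypothetical three-line sample rather than the real /etc/passwd:

```shell
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/sh' \
  'daemon:x:1:1:daemon:/usr/sbin:/bin/sh' \
  'bin:x:2:2:bin:/bin:/bin/sh' > "$sample"

# The address 1 restricts d to the first line; lines 2 and 3 are printed.
rest=$(sed '1d' "$sample")
printf '%s\n' "$rest"
```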

We will now see how to work with sed address ranges. So what if you want to remove more than one line from a file? You can specify an address range by giving the first and last line numbers separated by a comma.

For example, the address range 1,5 applies the command to all the lines from 1 through 5, so 1,5d deletes the first five lines.
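To illustrate the 1,5 range, here is a sketch using a seven-line file whose contents ("line 1" to "line 7") are invented for the example:

```shell
lines=$(mktemp)
# printf reuses the format for each argument, producing seven lines.
printf 'line %s\n' 1 2 3 4 5 6 7 > "$lines"

# Delete lines 1 through 5; only the last two lines survive.
kept=$(sed '1,5d' "$lines")
printf '%s\n' "$kept"
```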

Try out the following address ranges:

S.No. Range & Description
1
'4,10d'
Lines 4 through 10 are deleted
2
'10,4d'
Only the 10th line is deleted, because sed does not work in the reverse direction
3
'4,+5d'
This matches line 4 in the file, deletes that line, continues to delete the next five lines, and then ceases its deletion and prints the rest
4
'2,5!d'
This deletes everything except lines 2 through 5
5
'1~3d'
This deletes the first line, steps over the next two lines, and then deletes the fourth line. Sed continues to apply this pattern (every third line, starting at line 1) until the end of the file.
6
'2~2d'
This tells sed to delete the second line, step over the next line, delete the next line, and repeat until the end of the file is reached
7
'4,10p'
Lines 4 through 10 are printed
8
'4,d'
This generates a syntax error
9
',10d'
This also generates a syntax error
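The ranges in the table can be tried out with a sketch like the one below. It assumes GNU sed, because the addr,+N and first~step forms are GNU extensions; the twelve-line numbered file and the helper function run are hypothetical names introduced only for this example.

```shell
nums=$(mktemp)
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > "$nums"

# Run sed with the given arguments and join the surviving lines with spaces.
run() { sed "$@" "$nums" | paste -s -d ' ' -; }

r_range=$(run '4,10d')      # lines 4 through 10 removed
r_reverse=$(run '10,4d')    # only line 10 removed
r_plus=$(run '4,+5d')       # GNU: line 4 and the next five lines removed
r_not=$(run '2,5!d')        # everything except lines 2 through 5 removed
r_step3=$(run '1~3d')       # GNU: lines 1, 4, 7, 10 removed
r_step2=$(run '2~2d')       # GNU: even-numbered lines removed
r_print=$(run -n '4,10p')   # only lines 4 through 10 printed

# An incomplete range is rejected, so sed exits with a non-zero status.
if sed '4,d' "$nums" >/dev/null 2>&1; then r_bad=accepted; else r_bad=rejected; fi

printf '%s\n' "$r_range" "$r_reverse" "$r_plus" "$r_not" \
  "$r_step3" "$r_step2" "$r_print" "$r_bad"
```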

Note: When using the p action, use the -n option to avoid printing lines twice. Without -n, sed prints every line by default and then prints each line selected by p a second time.
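The effect of -n on the p action can be seen in this sketch, which uses a hypothetical six-line numbered file and joins the output onto one line for easy comparison:

```shell
nums=$(mktemp)
printf '%s\n' 1 2 3 4 5 6 > "$nums"

# With -n, only the lines selected by p are printed.
with_n=$(sed -n '2,3p' "$nums" | paste -s -d ' ' -)
# Without -n, every line is printed once and lines 2 and 3 a second time.
without_n=$(sed '2,3p' "$nums" | paste -s -d ' ' -)

printf 'with -n:    %s\n' "$with_n"
printf 'without -n: %s\n' "$without_n"
```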

The Substitution Command

The substitution command, denoted by s, will substitute any string that you specify with any other string that you specify.

To substitute one string with another, sed needs to know where the first string ends and the substitution string begins.

For this, we proceed by bookending the two strings with the forward slash (/) character.

For example, the command s/root/amrood/ substitutes the first occurrence on a line of the string root with the string amrood.

It is very important to note that sed substitutes only the first occurrence on a line. If the string root occurs more than once on a line, only the first match will be replaced.
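A brief sketch of the first-occurrence behavior, using a single made-up passwd-style line in which root appears three times:

```shell
# Hypothetical input line containing "root" three times.
line='root:x:0:0:root user:/root:/bin/sh'

# Without any flag, s replaces only the first match on the line.
first_only=$(printf '%s\n' "$line" | sed 's/root/amrood/')
printf '%s\n' "$first_only"
```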

For sed to perform a global replacement, add the letter g to the end of the command.
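The global flag can be sketched the same way; the input line is again a hypothetical example:

```shell
line='root:x:0:0:root user:/root:/bin/sh'

# The trailing g replaces every match on the line, not just the first.
all_matches=$(printf '%s\n' "$line" | sed 's/root/amrood/g')
printf '%s\n' "$all_matches"
```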

Substitution Flags

There are a number of other useful flags that can be passed in addition to the g flag, and you can specify more than one at a time.

S.No. Flag & Description
1
g
Replaces all matches, not just the first match
2
NUMBER
Replaces only the NUMBERth match
3
p
If a substitution was made, prints the pattern space
4
w FILENAME
If a substitution was made, writes the result to FILENAME
5
I or i
Matches in a case-insensitive manner
6
M or m
In addition to the normal behavior of the special regular expression characters ^ and $, this flag causes ^ to match the empty string after a newline and $ to match the empty string before a newline
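A few of these flags are sketched below. The sample lines, the temporary file names, and the variable names are assumptions for illustration, and the I flag is a GNU sed extension, so this block assumes GNU sed.

```shell
sample=$(mktemp)
log=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root user:/root:/bin/sh' \
  'daemon:x:1:1:daemon:/usr/sbin:/bin/sh' > "$sample"

# NUMBER: replace only the 2nd match on each line.
second=$(sed 's/root/amrood/2' "$sample" | sed -n '1p')
# g and p combined, with -n: print only lines where a substitution was made.
changed=$(sed -n 's/root/amrood/gp' "$sample")
# w FILENAME: write changed lines to a file (the name runs to the end of the script).
sed -n "s/root/amrood/w $log" "$sample"
logged=$(cat "$log")
# I (GNU extension): match regardless of case.
nocase=$(printf 'Root ROOT root\n' | sed 's/root/X/gI')

printf '%s\n' "$second" "$changed" "$logged" "$nocase"
```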

Using an Alternative String Separator

Suppose you have to do a substitution on a string that includes the forward slash character. In this case, you can specify a different separator by providing the designated character after the s.

For example, using : as the delimiter in place of the slash / lets you search for /root, rather than the plain string root, without escaping the slash.
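A sketch of the alternative delimiter, compared with the escaped-slash form. The input line and the replacement string /amrood are assumptions made for this example.

```shell
line='root:x:0:0:root user:/root:/bin/sh'

# ':' as the delimiter: the slash in /root needs no escaping.
with_colon=$(printf '%s\n' "$line" | sed 's:/root:/amrood:g')
# The same substitution with '/' as the delimiter needs \/ for each slash.
with_slash=$(printf '%s\n' "$line" | sed 's/\/root/\/amrood/g')

printf '%s\n' "$with_colon" "$with_slash"
```

Only the home directory field changes, because it is the only place where root is preceded by a slash.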

Replacing with Empty Space

Use an empty substitution string to delete the root string from the /etc/passwd file entirely.
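An empty replacement might be sketched like this, using one hypothetical passwd-style line instead of the real file:

```shell
line='root:x:0:0:root user:/root:/bin/sh'

# Nothing between the last two slashes: every "root" is replaced by nothing.
stripped=$(printf '%s\n' "$line" | sed 's/root//g')
printf '%s\n' "$stripped"
```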

If you want to substitute the string sh with the string quit only on line 10, you can specify the line number as an address before the s command.

Similarly, to do an address range substitution, put the range before the command.

With the address range 1,5, for instance, the first five lines have the string sh changed to quit, but the rest of the lines are left untouched.
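Both kinds of addressed substitution are sketched below on a twelve-line file. The file contents (user1 to user12) and the variable names are invented for the example.

```shell
users=$(mktemp)
printf 'user%s:/bin/sh\n' 1 2 3 4 5 6 7 8 9 10 11 12 > "$users"

# Single address: substitute only on line 10.
line10=$(sed '10s/sh/quit/' "$users")
# Address range: substitute only on lines 1 through 5.
first5=$(sed '1,5s/sh/quit/' "$users")

printf '%s\n' "$line10" '---' "$first5"
```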

The Matching Command

You would use the p option along with the -n option to print all the matching lines.
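A short sketch of printing only the matching lines, assuming a three-line sample in which just one entry contains root:

```shell
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/sh' \
  'daemon:x:1:1:daemon:/usr/sbin:/bin/sh' \
  'bin:x:2:2:bin:/bin:/bin/sh' > "$sample"

# -n turns off the default output; p prints only lines that match /root/.
matches=$(sed -n '/root/p' "$sample")
printf '%s\n' "$matches"
```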

Using Regular Expression

While matching patterns, you can use regular expressions, which provide more flexibility.

For example, a pattern anchored with ^ can match all the lines starting with daemon and then delete them.

Likewise, a pattern anchored with $ can delete all the lines ending with sh.
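The two anchored deletions might look like the following sketch. The sample entries are hypothetical; the last one ends in /bin/sync so that it survives the second command.

```shell
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/sh' \
  'daemon:x:1:1:daemon:/usr/sbin:/bin/sh' \
  'sync:x:4:65534:sync:/bin:/bin/sync' > "$sample"

# ^ anchors the match to the start of the line.
no_daemon=$(sed '/^daemon/d' "$sample")
# $ anchors the match to the end of the line.
no_sh=$(sed '/sh$/d' "$sample")

printf '%s\n' "$no_daemon" '---' "$no_sh"
```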

The following table lists five special characters that are very useful in regular expressions.

S.No. Character & Description
1
^
Matches the beginning of lines
2
$
Matches the end of lines
3
.
Matches any single character
4
*
Matches zero or more occurrences of the previous character
5
[chars]
Matches any one of the characters given in chars, where chars is a sequence of characters. You can use the - character to indicate a range of characters.

Matching Characters

Look at a few more expressions that demonstrate the use of metacharacters. For example, the following patterns:

S.No. Expression & Description
1
/a.c/
Matches lines that contain strings such as a+c, a-c, abc, match, and a3c
2
/a*c/
Matches the same strings along with strings such as ace, yacc, and arctic
3
/[tT]he/
Matches the strings The and the
4
/^$/
Matches blank lines
5
/^.*$/
Matches an entire line, whatever it is
6
/  */
Matches one or more spaces (the pattern is two spaces followed by *)
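The expressions above can be checked with a sketch such as the following, which counts how many lines of a small word list each pattern selects. The word list and the helper function count are hypothetical and exist only for this example.

```shell
words=$(mktemp)
# Seven lines: one is empty and one contains two consecutive spaces.
printf '%s\n' 'abc' 'a-c' 'ac' 'The end' 'the end' '' 'two  spaces' > "$words"

# Count the lines that a given pattern selects.
count() { sed -n "/$1/p" "$words" | grep -c ''; }

n_dot=$(count 'a.c')       # abc and a-c ("ac" has nothing between a and c)
n_star=$(count 'a*c')      # every line that contains a c
n_the=$(count '[tT]he')    # "The end" and "the end"
n_blank=$(count '^$')      # the empty line
n_all=$(count '^.*$')      # all seven lines
n_space=$(count '  *')     # lines containing at least one space

printf '%s\n' "$n_dot" "$n_star" "$n_the" "$n_blank" "$n_all" "$n_space"
```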

The following table shows some frequently used sets of characters:

S.No. Set & Description
1
[a-z]
Matches a single lowercase letter
2
[A-Z]
Matches a single uppercase letter
3
[a-zA-Z]
Matches a single letter
4
[0-9]
Matches a single digit
5
[a-zA-Z0-9]
Matches a single letter or digit
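As a sketch of these sets in use, the following selects lines by their first character. The item list is made up, and LC_ALL=C is set as a precaution because the meaning of ranges such as a-z can depend on the locale.

```shell
# Use the C locale so that a-z and A-Z mean exactly the ASCII ranges.
export LC_ALL=C

items=$(mktemp)
printf '%s\n' 'alpha' 'Beta' '42' '_under' > "$items"

lower=$(sed -n '/^[a-z]/p' "$items")          # starts with a lowercase letter
upper=$(sed -n '/^[A-Z]/p' "$items")          # starts with an uppercase letter
digit=$(sed -n '/^[0-9]/p' "$items")          # starts with a digit
alnum=$(sed -n '/^[a-zA-Z0-9]/p' "$items" | paste -s -d ' ' -)

printf '%s\n' "$lower" "$upper" "$digit" "$alnum"
```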

Character Class Keywords

Some special keywords are commonly available to regexps, especially in GNU utilities that employ regexps. These are very useful for sed regular expressions as they simplify things and enhance readability.

For example, the characters a through z and the characters A through Z constitute one such class of characters that has the keyword [[:alpha:]].
Using the alphabet character class keyword, a sed command can print only those lines in the /etc/syslog.conf file that start with a letter of the alphabet.
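A sketch of the [[:alpha:]] keyword follows. Because /etc/syslog.conf may not exist on a given system, it uses a hypothetical four-line stand-in file with invented contents.

```shell
conf=$(mktemp)
printf '%s\n' \
  '# log kernel messages' \
  'kern.* /var/log/kern.log' \
  '*.emerg *' \
  'mail.* /var/log/mail.log' > "$conf"

# Print only the lines whose first character is a letter.
starts_alpha=$(sed -n '/^[[:alpha:]]/p' "$conf")
printf '%s\n' "$starts_alpha"
```

The comment line and the line starting with * are skipped, since # and * are not alphabetic.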

The following table is a complete list of the available character class keywords in GNU sed.

S.No. Character Class & Description
1
[[:alnum:]]
Alphanumeric [a-z A-Z 0-9]
2
[[:alpha:]]
Alphabetic [a-z A-Z]
3
[[:blank:]]
Blank characters (spaces or tabs)
4
[[:cntrl:]]
Control characters
5
[[:digit:]]
Numbers [0-9]
6
[[:graph:]]
Any visible characters (excludes whitespace)
7
[[:lower:]]
Lowercase letters [a-z]
8
[[:print:]]
Printable characters (non-control characters)
9
[[:punct:]]
Punctuation characters
10
[[:space:]]
Whitespace
11
[[:upper:]]
Uppercase letters [A-Z]
12
[[:xdigit:]]
Hex digits [0-9 a-f A-F]

Ampersand Referencing

The sed metacharacter & represents the contents of the pattern that was matched. For example, say you have a file called phone.txt full of phone numbers.

You want to make the area code (the first three digits) surrounded by parentheses for easier reading. To do this, you can use the ampersand replacement character.

The pattern part matches the first three digits, and & in the replacement part re-inserts those three digits surrounded by parentheses.
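The ampersand technique might be sketched as follows. The two phone numbers are made-up sample data standing in for phone.txt, and the variable names are assumptions.

```shell
phones=$(mktemp)
# Hypothetical phone numbers written as plain digit strings.
printf '%s\n' '5555551212' '6665551216' > "$phones"

# Match the first three digits; & stands for whatever was matched.
formatted=$(sed 's/^[[:digit:]][[:digit:]][[:digit:]]/(&)/' "$phones")
printf '%s\n' "$formatted"
```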

Using Multiple sed Commands

You can use multiple sed commands in a single sed invocation by giving each one its own -e option, in the form sed -e 'command1' -e 'command2' ... -e 'commandN' files.

Here command1 through commandN are sed commands of the type discussed previously. These commands are applied to each of the lines in the list of files specified by files.

Using the same mechanism, the phone number example above can be written as a sequence of simpler commands.

Note: Instead of repeating the character class keyword [[:digit:]] three times, you can follow it with \{3\}, which means the preceding regular expression is matched three times. If a \ is used only to break a long command across lines for display, remove it and rejoin the line before running the command.
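A sketch of two chained commands using \{3\}. The phone numbers are invented sample data, and the second command (inserting a hyphen after the next three digits) is an assumption about how the example could be extended.

```shell
phones=$(mktemp)
printf '%s\n' '5555551212' '6665551216' > "$phones"

# First -e: wrap the leading three digits in parentheses.
# Second -e: after the closing parenthesis, append a hyphen to the next three digits.
formatted=$(sed -e 's/^[[:digit:]]\{3\}/(&)/' \
                -e 's/)[[:digit:]]\{3\}/&-/' "$phones")
printf '%s\n' "$formatted"
```

The commands run in order on each line, so the second one sees the parentheses added by the first.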

Back References

The ampersand metacharacter is useful, but even more useful is the ability to define specific regions in regular expressions. These special regions can be used as references in your replacement strings. By defining specific parts of a regular expression, you can then refer back to those parts with a special reference character.

To do back references, you first define a region and then refer back to that region. To define a region, you insert backslashed parentheses around each region of interest. The first region that you surround with backslashed parentheses is then referenced by \1, the second region by \2, and so on.

Back references are handy for splitting each phone number in phone.txt into its parts.

Note: Each regular expression inside the backslashed parentheses is back referenced by \1, \2, and so on. If a \ is used only to break the command across lines for display, remove it and rejoin the line before running the command.
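Back references could be sketched as below. The input format (area code in parentheses, hyphen before the last four digits), the sample numbers, and the output labels are all assumptions made for this example.

```shell
phones=$(mktemp)
# Hypothetical formatted phone numbers.
printf '%s\n' '(555)555-1212' '(666)555-1216' > "$phones"

# Three regions are captured with \( \): the digits inside the literal
# parentheses, the digits before the hyphen, and the digits after it.
labelled=$(sed 's/^(\([[:digit:]]*\))\([[:digit:]]*\)-\([[:digit:]]*\)$/Area code: \1 Second: \2 Third: \3/' "$phones")
printf '%s\n' "$labelled"
```

In a basic regular expression, a plain ( or ) matches the literal character, while \( and \) mark a region.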