Search and Sort Tools: grep, egrep and fgrep
Module 11: Search and Sort Tools
The Unix philosophy is to provide a rich set of generic tools, each with a variety of options. These primitive tools can be combined in imaginative ways, using pipes, to enhance user productivity. The tool suite also makes it easier to build either a customized user application or a more sophisticated and specialised tool.
We shall discuss several primitive tools that are useful when working with text files. These tools are often called “filters” because they help in searching for the presence or absence of specified patterns of text in text files. Tools in this category include ls, grep, and find. To view the output of these tools one uses more, less, head, and tail. The sort tool sorts its input, and tools like wc report statistics about files. In this chapter we shall look at each of these tools briefly and illustrate some typical contexts in which they are used.
11.1 grep, egrep and fgrep
The name grep comes from the ed editor command g/re/p, which globally searches for a regular expression and prints the matching lines. egrep is an extended version of grep that allows a greater range of regular expressions in pattern matching. fgrep performs fast matching of fixed strings. The basic usage of grep follows the syntax grep options pattern files: it searches the file(s) for lines containing the pattern, with the options acting as command modifiers. The following example shows how to list the lines with int declarations in a program called add.c. Note that we could use this trick to collate all the declarations from a set of files into a common include file of definitions.
bhatt@SE-0 [~/UPE] >>grep int ./C/add.c
extern int a1; /* First operand */
extern int a2; /* Second operand */
extern int add();
printf("The addition gives %d \n", add());
Table 11.1: Regular expression options.
Table 11.2: Regular expression combinations.
Note: printf has int in it!
In other words, grep matches the pattern as a string anywhere in a line, not as a whole word. A little later we will see how options can be used to make such searches more selective. We could have used *.c to list the matching lines in all the C programs in that directory. For such usage it is better to pipe the output through more, as shown in the example below.
grep int ./C/*.c | more
This shows the use of a pipe with another tool, more, which is a good screen viewing tool: it offers a one-screen-at-a-time view of a long file. As stated in the last chapter, there is also a program called less, which additionally permits scrolling backwards.
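When grep is given more than one file, it prefixes every matching line with the name of the file it came from, which is what makes the *.c form practical. The following is a minimal sketch of this behaviour, assuming two small files, one.c and two.c, created in a scratch directory; the file names and their contents are made up for illustration and are not part of the original example. The -n option additionally prints the line number of each match.

```shell
# Scratch directory with two hypothetical C files (not from the original text).
workdir=$(mktemp -d)
printf 'extern int a1;\nchar c;\n' > "$workdir/one.c"
printf 'float f;\nextern int add();\n' > "$workdir/two.c"

# With several files, each match is prefixed by its file name;
# -n adds the line number within that file.
matches=$(cd "$workdir" && grep -n int ./*.c)
printf '%s\n' "$matches"

# Remove the scratch directory again.
rm -rf "$workdir"
```

Each output line has the form file:line:text, so the matches can be traced back to their source even when many files are searched at once.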
Regular Expression Conventions: Table 11.1 shows many of the regular expression conventions used by grep. In Table 11.1, RE, RE1, and RE2 denote regular expressions. In practice we may combine regular expressions in arbitrary ways, as shown in Table 11.2; these additional combinations are what egrep, the enhanced grep, supports. Note that an RE may be enclosed in parentheses. To practice the above, we make a file called testfile with the entries shown below. We then try matching patterns using various options in a session on this file.
aaa
a1a1a1
456
10000001
This is a test file.
bhatt@SE-0 [F] >>grep '[0-9]' testfile
a1a1a1
456
10000001
bhatt@SE-0 [F] >>grep '^4' testfile
456
bhatt@SE-0 [F] >>grep '1$' testfile
a1a1a1
10000001
bhatt@SE-0 [F] >>grep '[A-Z]' testfile
This is a test file.
bhatt@SE-0 [F] >>grep '[0-4]' testfile
a1a1a1
456
10000001
bhatt@SE-0 [F] >>fgrep '000' testfile
10000001
bhatt@SE-0 [F] >>egrep '0..' testfile
10000001
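The combinations that egrep adds, such as alternation and repeated groups, can be tried on the same testfile. The sketch below recreates the file in a scratch directory and uses grep -E, which POSIX defines as the equivalent of egrep; the three patterns are our own illustrations and are not taken from Table 11.2.

```shell
# Recreate the testfile from the text in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'aaa' 'a1a1a1' '456' '10000001' 'This is a test file.' > "$workdir/testfile"

# Alternation: lines containing either aaa or 456.
either=$(grep -E 'aaa|456' "$workdir/testfile")

# One or more digits and nothing else on the line.
digits_only=$(grep -E '^[0-9]+$' "$workdir/testfile")

# A parenthesised group repeated exactly three times.
repeated=$(grep -E '(a1){3}' "$workdir/testfile")

printf '%s\n' "$either" "$digits_only" "$repeated"

# Remove the scratch directory again.
rm -rf "$workdir"
```

The first pattern selects aaa and 456, the second selects 456 and 10000001, and the third selects only a1a1a1.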
\ The backslash makes grep treat a special character literally. This is required when the character would otherwise be interpreted specially, as with . and * in a regular expression.
See the example below, where we match a period symbol.
bhatt@SE-0 [F] >>grep '\.' testfile
This is a test file.
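To see why the backslash matters, we can compare the escaped and unescaped period on the same testfile. The following is a small sketch, assuming the file is recreated in a scratch directory; it also shows grep -F (the POSIX equivalent of fgrep), which treats the pattern as a fixed string and so needs no escaping.

```shell
# Recreate the testfile from the text in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'aaa' 'a1a1a1' '456' '10000001' 'This is a test file.' > "$workdir/testfile"

# Unescaped, . matches any single character, so every non-empty line matches.
any_char=$(grep -c '.' "$workdir/testfile")

# Escaped, \. matches only a literal period.
literal_dot=$(grep '\.' "$workdir/testfile")

# -F treats the whole pattern as a fixed string, so no escaping is needed.
fixed_dot=$(grep -F '.' "$workdir/testfile")

echo "lines matched by unescaped dot: $any_char"
echo "escaped dot: $literal_dot"
echo "fixed string: $fixed_dot"

# Remove the scratch directory again.
rm -rf "$workdir"
```

The unescaped pattern matches all five lines, while both the escaped pattern and the fixed-string search select only the line that ends in a period.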
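grep also accepts command-line options that change how lines are matched and what is reported. The options available are shown in Table 11.3. The session below uses -v (print the lines that do not match), -w (match whole words only), and -l (print only the names of the files that contain a match).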
bhatt@SE-0 [F] >>grep -v 'a1' testfile
aaa
456
10000001
This is a test file.
bhatt@SE-0 [F] >>grep 'aa' testfile
aaa
bhatt@SE-0 [F] >>grep -w 'aa' testfile
bhatt@SE-0 [F] >>grep -w 'aaa' testfile
aaa
bhatt@SE-0 [F] >>grep -l 'aa' testfile
testfile
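A few other commonly used options can be tried in the same way. The sketch below recreates testfile in a scratch directory and uses -c, -n, and -i, which are standard grep options; whether they appear in Table 11.3 is an assumption on our part, since they are not shown in the session above.

```shell
# Recreate the testfile from the text in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'aaa' 'a1a1a1' '456' '10000001' 'This is a test file.' > "$workdir/testfile"

# -c prints only the number of matching lines.
count=$(grep -c '[0-9]' "$workdir/testfile")

# -n prefixes each matching line with its line number.
numbered=$(grep -n '^4' "$workdir/testfile")

# -i ignores case, so the pattern this also matches This.
nocase=$(grep -i 'this' "$workdir/testfile")

echo "$count"
echo "$numbered"
echo "$nocase"

# Remove the scratch directory again.
rm -rf "$workdir"
```

Three lines of testfile contain a digit, the line 456 is the third line of the file, and the case-insensitive search finds the sentence that begins with a capital T.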
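Context of use: Suppose we wish to list all the sub-directories in a certain directory. In the long listing produced by ls -l, directory entries begin with the letter d, so we can select them as follows.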
ls -l | grep ^d
bhatt@SE-0 [M] >>ls -l | grep ^d
drwxr-xr-x 2 bhatt bhatt 512 Oct 15 13:15 M1
drwxr-xr-x 2 bhatt bhatt 512 Oct 15 12:37 M2
drwxr-xr-x 2 bhatt bhatt 512 Oct 15 12:37 M3
drwxr-xr-x 2 bhatt bhatt 512 Oct 16 09:53 RAND
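The same selection can be reproduced in a scratch directory. The following is a minimal sketch, assuming two sub-directories and two ordinary files whose names are made up for illustration; one file name deliberately starts with d to show that only the first character of the listing line matters, not the file name. The pattern is quoted here because some shells give the caret a special meaning.

```shell
# Scratch directory with two sub-directories and two ordinary files
# (all names are hypothetical).
workdir=$(mktemp -d)
mkdir "$workdir/M1" "$workdir/M2"
: > "$workdir/notes.txt"
: > "$workdir/data.txt"   # the name starts with d, but this is not a directory

# Keep only the listing lines whose first character is d.
subdirs=$(ls -l "$workdir" | grep '^d')

# -c counts the matching lines instead of printing them.
dir_count=$(ls -l "$workdir" | grep -c '^d')

printf '%s\n' "$subdirs"
echo "sub-directories: $dir_count"

# Remove the scratch directory again.
rm -rf "$workdir"
```

Only M1 and M2 are reported; data.txt is skipped because its listing line begins with a dash, the marker for an ordinary file.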
Suppose we wish to select a certain font and want to find out whether it is available in bold at size 18. We may list the matching fonts with the command shown below.
xlsfonts | grep bold\-18 | more
bhatt@SE-0 [M] >>xlsfonts | grep bold\-18 | more
lucidasans-bold-18
lucidasans-bold-18
lucidasanstypewriter-bold-18
lucidasanstypewriter-bold-18
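The listing above contains each font name twice. Since xlsfonts may not be installed everywhere, the sketch below works on a made-up sample of font names held in a shell variable (the sample is an assumption, not real xlsfonts output) and pipes the grep result through sort -u to drop the repeated lines. Note that in the unquoted form bold\-18 the shell removes the backslash before grep sees the pattern, so quoting it as 'bold-18' is equivalent.

```shell
# Made-up sample standing in for xlsfonts output; on a system with X11
# installed, replace the printf below with: xlsfonts
sample_fonts='lucidasans-bold-14
lucidasans-bold-18
lucidasans-bold-18
lucidasans-18
lucidasanstypewriter-bold-18
lucidasanstypewriter-bold-18'

# Select the bold size-18 fonts, then sort them and drop duplicate lines.
bold18=$(printf '%s\n' "$sample_fonts" | grep 'bold-18' | sort -u)

printf '%s\n' "$bold18"
```

With this sample, two distinct font names remain after the duplicates are removed.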
Suppose we wish to find out at how many terminals a certain user is logged in at the moment. The following command will give us the required information:
who | grep username | wc -l > count
wc with the -l option gives the count of lines, and who | grep outputs one line for every line of who output that contains the given pattern (username). The resulting number is redirected into a file called count.
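One caveat with this command is that grep matches the user name anywhere in the line, so a login name that is a prefix of another is counted too. The sketch below illustrates this on a made-up sample of who output (the users and terminals are assumptions for illustration); anchoring the pattern at the start of the line and requiring a following space restricts the count to the exact name. It uses grep -c in place of wc -l, which gives the same count.

```shell
# Made-up sample standing in for who output; on a real system,
# replace the printf below with: who
sample_who='bhatt        pts/0   Oct 15 13:15
bhattacharya pts/1   Oct 15 13:20
bhatt        pts/2   Oct 15 13:41
root         console Oct 15 09:02'

user=bhatt

# Unanchored: also counts bhattacharya, because bhatt is a substring of it.
loose=$(printf '%s\n' "$sample_who" | grep -c "$user")

# Anchored at the line start and followed by a space: exact login name only.
exact=$(printf '%s\n' "$sample_who" | grep -c "^$user ")

echo "loose=$loose exact=$exact"
```

With this sample the unanchored pattern reports three sessions, while the anchored one reports the two that actually belong to bhatt.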