AWK Tool in Unix: The Data to Process
Module 12: AWK Tool in Unix
AWK was developed in 1977 at the famous Bell Laboratories by Aho, Weinberger and Kernighan [3] to process structured data files. In programming languages it is very common to have a definition of a record which may have one or more data fields. In this context, it is common to define a file as a collection of records. Records are structured data items arranged in accordance with some specification, basically as a pre-assigned sequence of fields. The data fields may be separated by a space or a tab.
In a data processing environment it is very common to have such record-based files. For instance, an organisation may maintain a personnel file in which each record contains fields like employee name, gender, date of joining the organisation, designation, etc. Similarly, files created to maintain pay accounts, student files in universities, etc. all have structured records with a set of fields. AWK is ideal for processing such structured sets of records.
AWK comes in many flavors [14]; gawk, for instance, is GNU AWK. Here we shall assume the availability of the standard AWK program, which comes bundled with every flavor of the Unix OS. AWK is also available in the MS environment.
12.1 The Data to Process
As AWK is used to process a structured set of records, we shall use a small file called awk.test, given below. Each record in this file lists an employee's name, the employee's hourly wage, and the number of hours the employee has worked.
(File awk.test)
bhatt 4.00 0
ulhas 3.75 2
ritu 5.0 4
vivek 2.0 3
We will use this sample data file for a variety of processing requirements. Suppose we need to compute the amount due to each employee and print it as a report. One could write a C language program to do the task. However, a tool like AWK makes the task simpler: it takes less time to get the results, and the process is usually less error prone. Let us use the awk command with input file awk.test as shown below:
bhatt@falerno [CRUD] =>awk '$3 > 0 { print $1, $2 * $3 }' awk.test
ulhas 7.5
ritu 20
vivek 6
Note some features of the syntax above: the awk command, the quoted string following it, and the data file name. We shall first discuss a few simple syntax rules. More advanced features are explained through examples that are discussed in Section 12.2.
12.1.1 AWK Syntax
To run an AWK program we simply give an “awk” command with the following syntax:
awk [options] <awk_program> [input_file]
where an option such as -f may be used to read the AWK program from a file instead of a quoted string. The following should be noted:
- Note that in the syntax awk 'awk_program' [input_files], the list of input files may be empty. In that case awk reads its input from the standard input, that is, whatever is typed at the terminal after the command is given.
- Also, note that fields in the data file are identified with a $ symbol prefix as in $1.
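The two notes above can be tried out with a minimal sketch like the one below. It assumes a POSIX shell, recreates awk.test in the current directory (overwriting any file of that name), and uses a pipe in place of typing records at the terminal; the variable name result is only illustrative.

```shell
# Recreate the sample data file from Section 12.1 in the current directory.
cat > awk.test <<'EOF'
bhatt 4.00 0
ulhas 3.75 2
ritu 5.0 4
vivek 2.0 3
EOF

# No input file is named here, so awk reads its records from standard input.
# $1, $2 and $3 refer to the first, second and third field of each record.
result=$(cat awk.test | awk '$3 > 0 { print $1, $2 * $3 }')
echo "$result"
```

The output should match the report shown in Section 12.1, since the same records reach awk, only through standard input rather than a named file.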
In the opening example of Section 12.1 we have a very small AWK program. It is the quoted string reproduced below:
'$3 > 0 {print $1, $2 * $3}'
The interpretation is as follows: for every line in which the number of hours, $3, is greater than zero, print the name, $1, and the wages due, obtained as the product of the rate, $2, and the number of hours, $3. In this string the $ prefixed integers identify the fields we wish to use.
- In preparing the output: {print} or {print $0} prints the whole input line, while {print $1, $3} prints only the selected fields.
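As a hedged illustration of the print forms just described, the sketch below runs them against the sample file. It assumes a POSIX shell and recreates awk.test in the current directory; the variable names whole and selected are only illustrative.

```shell
# Recreate the sample data file from Section 12.1 in the current directory.
cat > awk.test <<'EOF'
bhatt 4.00 0
ulhas 3.75 2
ritu 5.0 4
vivek 2.0 3
EOF

# With no pattern, the action applies to every line.
# { print } is shorthand for { print $0 }: the whole input line.
whole=$(awk '{ print }' awk.test)

# Only the name (field 1) and the hours worked (field 3).
selected=$(awk '{ print $1, $3 }' awk.test)

echo "$whole"
echo "$selected"
```

The first command should reproduce the file unchanged; the second should print four lines of the form "bhatt 0", "ulhas 2", and so on, with the fields separated by a single space.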
In the initial example we had a one line awk program. Basically, we tried to match a pattern and check if that qualified the line for some processing or action. In general, we may have many patterns to match and actions to take on finding a matching pattern. In that case the awk program may have several lines of code. Typically such a program shall have the following structure:
pattern {action}
pattern {action}
pattern {action}
.
.
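A program with this structure might look like the sketch below, which pairs two patterns with two actions. The second rule is the one from Section 12.1; the first rule and its message text are assumptions added purely for illustration. A POSIX shell is assumed, and awk.test is recreated in the current directory.

```shell
# Recreate the sample data file from Section 12.1 in the current directory.
cat > awk.test <<'EOF'
bhatt 4.00 0
ulhas 3.75 2
ritu 5.0 4
vivek 2.0 3
EOF

# Each input line is tested against every pattern, in the order written,
# and the action of each matching pattern is carried out.
report=$(awk '
$3 == 0 { print $1, "did not work" }
$3 > 0  { print $1, $2 * $3 }
' awk.test)
echo "$report"
```

Here every record matches exactly one of the two patterns, so the report has one line per employee: "bhatt did not work" followed by the three pay lines seen earlier.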
If we have many operations to perform we shall have many lines in the AWK program. It is then convenient to put the program in a file and run it using the -f option as shown below:
awk -f 'awk_program_file_name' [input_files]
where awk_program_file_name is the name of the file that contains the AWK program.
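The -f form can be sketched as follows. The program file name pay.awk is hypothetical, chosen only for this illustration; a POSIX shell is assumed, and both files are created in the current directory.

```shell
# Recreate the sample data file from Section 12.1 in the current directory.
cat > awk.test <<'EOF'
bhatt 4.00 0
ulhas 3.75 2
ritu 5.0 4
vivek 2.0 3
EOF

# Store the AWK program in its own file (pay.awk is a hypothetical name).
# Inside a program file no shell quoting is needed, and # starts a comment.
cat > pay.awk <<'EOF'
# Print name and pay due for employees who have worked some hours.
$3 > 0 { print $1, $2 * $3 }
EOF

# Run the stored program against the data file.
from_file=$(awk -f pay.awk awk.test)
echo "$from_file"
```

The result should be the same as running the quoted one-line program directly, which makes a program file a convenient home for longer sets of pattern and action pairs.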