AWK Tool in Unix: Programming Examples
Programming Examples
We shall now give a few illustrative examples. Along with the examples we shall also discuss many other features that make the task of processing easier.
- Example 1
Suppose we now need to find out whether any employee did no work. For such an employee the hours-worked field (the third field) must be 0. The following AWK program prints the names of such employees:
bhatt@falerno [CRUD] =>awk '$3 == 0 {print $1}' awk.test
bhatt
The basic operation here is to scan the input lines one by one, searching for lines that match any of the patterns in the program; for each matching line, the corresponding action is executed. A pattern such as $3 > 0 matches every line whose third field has a value greater than 0.
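To try the pattern-action idea without the original awk.test file, here is a minimal sketch that pipes a small, made-up data set into awk. The data (fields: name, hourly rate, hours worked) and the helper function names sample_data, idle and busy are hypothetical, chosen only for illustration.

```sh
# Hypothetical sample data, NOT the original awk.test:
# name, hourly rate, hours worked.
sample_data() {
  printf '%s\n' 'ann 4.00 0' 'bob 3.75 2' 'cat 5.00 4' 'dan 2.00 3'
}

# Pattern $3 == 0 selects lines whose third field is zero;
# the action {print $1} runs only for those lines.
idle() {
  sample_data | awk '$3 == 0 { print $1 }'
}

# Pattern $3 > 0 selects everyone who worked some hours.
busy() {
  sample_data | awk '$3 > 0 { print $1 }'
}

idle    # prints: ann
busy    # prints bob, cat and dan, one per line
```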
An Aside: Try introducing a few errors into these one-line awk programs and see how awk detects and reports them.
- Example 2
In this example we shall show the use of some of the built-in variables that help in organizing our data processing. Their values are set by awk as it reads the data file. NF is a built-in variable that holds the number of fields in the current line; it can be used, for instance, in {print NF, $1, $NF}, which prints the number of fields, the first field and the last field. Another built-in variable is NR, which holds the number of lines read so far; it too can be used in a print statement.
bhatt@falerno [CRUD] =>awk '$3 > 0 {print NR, NF, $1, $NF }' awk.test
3 3 ulhas 2
4 3 ritu 4
5 3 vivek 3
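The same built-in variables can be checked on hypothetical sample data (re-declared here so that the sketch is self-contained; the names and values are made up). Because ann has 0 hours she is skipped, so the first line printed carries NR = 2.

```sh
# Hypothetical sample data: name, hourly rate, hours worked.
sample_data() {
  printf '%s\n' 'ann 4.00 0' 'bob 3.75 2' 'cat 5.00 4' 'dan 2.00 3'
}

# NR: records read so far; NF: fields in the current record;
# $NF: the last field of the current record.
show_counts() {
  sample_data | awk '$3 > 0 { print NR, NF, $1, $NF }'
}

show_counts
# 2 3 bob 2
# 3 3 cat 4
# 4 3 dan 3
```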
- Example 3
Data in files are usually stored in a terse format, without any redundancy. However, one often needs to generate verbose output. This requires extracting the field values and interspersing them with the desired strings to produce verbose, meaningful output. In this example we demonstrate such a usage.
bhatt@falerno [CRUD] =>awk '$3 > 0 {print "person", NR, $1, "be paid", $2*$3, "dollars"}' awk.test
person 3 ulhas be paid 7.5 dollars
person 4 ritu be paid 20 dollars
person 5 vivek be paid 6 dollars
One can use printf to format the output, as in C programs.
bhatt@falerno [CRUD] =>awk '$3 > 0 {printf("%-8s be paid $%6.2f dollars\n", $1, $2*$3)}' awk.test
ulhas    be paid $  7.50 dollars
ritu     be paid $ 20.00 dollars
vivek    be paid $  6.00 dollars
An Aside: The output can be sorted by piping it to sort, i.e. <awk_program> | sort.
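As a sketch of both points, hypothetical sample data can be formatted with printf and the report piped to sort. Here sort -k5,5nr orders the lines by the fifth blank-separated field (the amount), largest first; the choice of sort key and the LC_ALL=C setting (so that "." is the decimal point) are assumptions for illustration.

```sh
# Hypothetical sample data: name, hourly rate, hours worked.
sample_data() {
  printf '%s\n' 'ann 4.00 0' 'bob 3.75 2' 'cat 5.00 4' 'dan 2.00 3'
}

# %-8s left-justifies the name in 8 columns; %6.2f prints the
# amount right-justified in 6 columns with 2 decimals.
pay_report() {
  sample_data |
    awk '$3 > 0 { printf("%-8s be paid $%6.2f dollars\n", $1, $2 * $3) }'
}

# Field 5 of each output line is the amount; sort on it numerically, descending.
pay_report | LC_ALL=C sort -k5,5nr
```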
- Example 4
In this example we explore several ways of selecting lines. In general, lines may be selected by a comparison that involves a computation. For example, $2 > 3.0 selects lines in which the rate of payment exceeds 3.0, and $2*$3 > 5.0 selects lines in which the total due exceeds 5; the latter is an example of comparison by computation.
One may also select lines by text content (essentially a comparison, in my opinion). A regular expression enclosed in slashes, such as /bhatt/, matches every line that contains the string "bhatt"; to test only the first field, write $1 ~ /bhatt/ (or $1 == "bhatt" for an exact match).
Patterns may combine the relational operators <, <=, ==, !=, >= and > with the logical operators && (and), || (or) and ! (not).
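A minimal sketch of these selection styles on hypothetical sample data: a plain comparison, a comparison by computation, a regular-expression match on one field with ~, and a compound test with && and ||. A pattern with no action prints each matching line in full. The function names are illustrative only.

```sh
# Hypothetical sample data: name, hourly rate, hours worked.
sample_data() {
  printf '%s\n' 'ann 4.00 0' 'bob 3.75 2' 'cat 5.00 4' 'dan 2.00 3'
}

rate_above_3()  { sample_data | awk '$2 > 3.0'; }     # comparison
owed_above_5()  { sample_data | awk '$2 * $3 > 5'; }  # comparison by computation
name_starts_b() { sample_data | awk '$1 ~ /^b/'; }    # text match on field 1
# && binds tighter than ||: ($2 >= 4 && $3 > 0) || $1 == "dan"
combined()      { sample_data | awk '$2 >= 4 && $3 > 0 || $1 == "dan"'; }

combined   # prints the lines for cat and dan
```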
AWK is excellent for data validation. Checks like the following may be useful:
- NF != 3     number of fields not equal to 3
- $2 < 2.0    wage rate below the stipulated minimum
- $2 > 10.0   wage rate above the stipulated maximum
- $3 < 0      negative number of hours worked, etc.
It should be remarked that data validation checks are a very important part of data processing. An organization may prepare its data in-house or outsource data preparation, and online data processing can result in disasters if the data are not validated. For instance, a wrong hourly wage field may produce an incorrect pay cheque. One needs to ensure that the data are in the expected range, lest an organization end up paying a rate below the legal minimum wage or paying extraordinarily high amounts to a low-paid worker!
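The checks listed above can be combined into one validation program. The sketch below runs them on a deliberately flawed, made-up data set; the helper name validate and the message texts are illustrative assumptions. Each pattern is an independent check, so one line can trigger several messages.

```sh
# Each pattern is one check; NR identifies the offending input line.
validate() {
  awk '
    NF != 3   { print NR ": wrong number of fields (" NF "): " $0 }
    $2 < 2.0  { print NR ": rate below minimum: " $0 }
    $2 > 10.0 { print NR ": rate above maximum: " $0 }
    $3 < 0    { print NR ": negative hours worked: " $0 }
  '
}

# Hypothetical input: lines 2, 3 and 4 each break one rule.
printf '%s\n' 'ann 4.00 0' 'bob 1.50 2' 'cat 5.00 -4' 'dan 2.00 3 extra' | validate
# 2: rate below minimum: bob 1.50 2
# 3: negative hours worked: cat 5.00 -4
# 4: wrong number of fields (4): dan 2.00 3 extra
```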
- Example 5
In this example we show how to add text around the formatted data so that it looks like a report. For instance, our tabulated output so far has no headings. One can generate meaningful headers and trailers for tabulated output. An AWK program may have a BEGIN pattern, whose action is executed before the data file is read, to prepare headers; similarly, an END pattern, whose action is executed after the last input line, can generate a trailer. For our example the header can be generated by putting BEGIN {print "Name Rate Hours"} as a preamble to the AWK program, as shown below.
bhatt@falerno [CRUD] =>awk 'BEGIN {print "name rate hours"; print ""} \
{print}' awk.test
Note that print "" prints a blank line, and the next action, {print}, reproduces each input line unchanged. In general, BEGIN matches before the first line of input and END after the last line of input. The ; separates statements within an action. Let us now look at a similar program stored in a file and run with the -f option.
The file awk.prg contains:
BEGIN {print "NAME RATE HOURS"; print ""} { print $1," ",$2," ",$3,"..."}
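Here is a sketch of the same header idea with a trailer added, written to a temporary program file and run with awk -f. The temporary file, the helper names and the trailer text are assumptions for illustration, and the data are made up.

```sh
# Hypothetical sample data: name, hourly rate, hours worked.
sample_data() {
  printf '%s\n' 'ann 4.00 0' 'bob 3.75 2' 'cat 5.00 4' 'dan 2.00 3'
}

report() {
  prog=$(mktemp) || return 1
  cat > "$prog" <<'EOF'
BEGIN { print "NAME RATE HOURS"; print "" }    # before the first input line
      { print $1, $2, $3 }                      # for every input line
END   { print ""; print NR " records listed" }  # after the last input line
EOF
  sample_data | awk -f "$prog"
  rm -f "$prog"
}

report
```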
- Example 6
Now we shall do some computing within awk. Computations often require user-defined variables; in this example "pay" is such a variable, and the program accumulates in it the total amount to be paid (awk initializes a variable to 0 on first use, so no declaration is needed). The results are therefore printed only after the last line of the data file has been processed, i.e. in the END part of the awk program. At that point NR holds the number of records processed, which gives the number of employees. So we can compute the total "pay" and, as the last step, the average pay.
BEGIN {print "NAME RATE HOURS"; print ""}
{ pay = pay + $2*$3 }
END {print NR " employees"
print "total amount paid is : ", pay
print "with the average being :", pay/NR}
bhatt@falerno [CRUD] =>!a
awk -f prg2.awk awk.test
4 employees
total amount paid is : 33.5
with the average being : 8.375
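A self-contained sketch of the same accumulation, run on made-up data. Unlike prg2.awk it also guards the average against empty input (NR == 0); the function name payroll is hypothetical.

```sh
# Hypothetical sample data: name, hourly rate, hours worked.
sample_data() {
  printf '%s\n' 'ann 4.00 0' 'bob 3.75 2' 'cat 5.00 4' 'dan 2.00 3'
}

payroll() {
  sample_data | awk '
        { pay = pay + $2 * $3 }   # pay starts out as 0 (uninitialized)
    END { print NR " employees"
          print "total amount paid is :", pay
          if (NR > 0) print "with the average being :", pay / NR }'
}

payroll
# 4 employees
# total amount paid is : 33.5
# with the average being : 8.375
```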
A better-looking output could be produced by using the printf statement, as in C. Here is another program with its output. In this program, note the computation of the "maximum" values and the concatenation of names in "emplist"; these are user-defined variables. Note also the use of "last" to store the last record processed: $0 holds the entire current record, and we keep storing it in last as we go along, so that in the END part it holds the final record.
BEGIN {print "NAME RATE HOURS"; print ""}
{pay = pay + $2*$3}
$2 > maxrate {maxrate = $2; maxemp = $1}
{emplist = emplist $1 " "}
{last = $0}
END {print NR " employees"
print "total amount paid is : ", pay
print "with the average being :", pay/NR
print "highest paid rate is for " maxemp, " @ of : ", maxrate
print emplist
print ""
print "the last employee record is : ", last}
The output is:
bhatt@falerno [CRUD] =>!a
awk -f prg3.awk test.data
4 employees
total amount paid is : 33.5
with the average being : 8.375
highest paid rate is for ritu @ of : 5.0
bhatt ulhas ritu vivek
the last employee record is : vivek 2.0 3
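A sketch of the same summary formatted with printf, as the paragraph before the program suggests, run on made-up data. The output labels and the function name summary are illustrative assumptions.

```sh
# Hypothetical sample data: name, hourly rate, hours worked.
sample_data() {
  printf '%s\n' 'ann 4.00 0' 'bob 3.75 2' 'cat 5.00 4' 'dan 2.00 3'
}

summary() {
  sample_data | awk '
    { pay = pay + $2 * $3 }
    $2 > maxrate { maxrate = $2; maxemp = $1 }   # maxrate starts uninitialized (0)
    { emplist = emplist $1 " "; last = $0 }      # string concatenation
    END {
      if (NR == 0) { print "no input"; exit }
      printf("%d employees, total pay %.2f, average %.2f\n", NR, pay, pay / NR)
      printf("highest rate: %s at %.2f\n", maxemp, maxrate)
      print emplist
      print "last record: " last
    }'
}

summary
```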
- Example 7
There are some built-in functions that can be useful. For instance, the function "length" returns the length of its argument as the number of characters in it. See the program and the corresponding output below:
{ nc = nc + length($1) + length($2) + length($3) + 4 }
{ nw = nw + NF }
END {print nc " characters and "; print ""
print nw " words and "; print ""
print NR, " lines in this file "}
bhatt@falerno [CRUD] =>!a
awk -f prg4.awk test.data
53 characters and
12 words and
4 lines in this file
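A variant sketch that counts characters the way wc does: length($0) is the length of the whole line, and adding 1 accounts for the newline that awk strips from each record. On the made-up data below the counts agree with wc; the function name count is hypothetical.

```sh
# Hypothetical sample data: name, hourly rate, hours worked.
sample_data() {
  printf '%s\n' 'ann 4.00 0' 'bob 3.75 2' 'cat 5.00 4' 'dan 2.00 3'
}

count() {
  sample_data | awk '
    { nc = nc + length($0) + 1   # characters in the line plus its newline
      nw = nw + NF }             # words: fields split on blanks
    END { print NR " lines, " nw " words, " nc " characters" }'
}

count                 # 4 lines, 12 words, 44 characters
sample_data | wc      # the same counts, in the order lines, words, characters
```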
- Example 8
AWK supports many control-flow statements to facilitate programming. We will first use the if-else construct. Note that there is no "then" keyword, and that the statements to be executed when the if condition is true are grouped in braces. Also note, in the program, the protection against division by 0.
BEGIN {print "NAME RATE HOURS"; print ""}
$2 > 6 {n = n+1; pay = pay + $2*$3}
$2 > maxrate {maxrate = $2; maxemp = $1}
{emplist = emplist $1 " "}
{last = $0}
END {print NR " employees in the company "
if ( n > 0 ) {print n, "employees in this bracket of salary. "
print "with an average salary of ", pay/n, "dollars"
} else print " no employee in this bracket of salary. "
print "highest paid rate is for " maxemp, " @ of : ", maxrate
print emplist
print ""}
This gives the result shown below:
bhatt@falerno [CRUD] =>!a
awk -f prg5.awk data.awk
4 employees in the company
no employee in this bracket of salary.
highest paid rate is for ritu @ of : 5.0
bhatt ulhas ritu vivek
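A parameterized sketch of the same if-else logic: the salary-bracket threshold is passed in with awk's -v option instead of being hard-coded as 6. The function name bracket and the data are hypothetical; adding 0 to threshold forces a numeric comparison.

```sh
# Hypothetical sample data: name, hourly rate, hours worked.
sample_data() {
  printf '%s\n' 'ann 4.00 0' 'bob 3.75 2' 'cat 5.00 4' 'dan 2.00 3'
}

# $1: the bracket holds rates strictly above this value.
bracket() {
  sample_data | awk -v threshold="$1" '
    $2 > threshold + 0 { n = n + 1; pay = pay + $2 * $3 }
    END {
      if (n > 0) {                 # guards the division below
        print n, "employees in this bracket, average pay", pay / n
      } else
        print "no employee in this bracket"
    }'
}

bracket 6   # no employee in this bracket
bracket 3   # 3 employees in this bracket, average pay 9.16667
```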
Next we shall use a "while" loop. In this example, we compute the compounded value of an amount at the end of each year over a five-year period.
#compound interest computation
#input : amount rate years
#output: compounded value at the end of each year
{ i = 1; x = $1;
while (i <= $3)
{ x = x + (x*$2)
printf("\t%d\t%8.2f\n", i, x)
i = i + 1
}
}
The result is shown below:
bhatt@falerno [CRUD] =>!a
awk -f prg6.awk
1000 0.06 5
1 1060.00
2 1123.60
3 1191.02
4 1262.48
5 1338.23
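The compound-interest program can also be run non-interactively. This sketch (the function name compound is hypothetical) writes the loop with awk's C-style for statement, which behaves like the while loop, and as a check also prints the closed-form value amount * (1 + rate)^years, computed with awk's ^ operator.

```sh
# Arguments: amount, yearly rate, number of years.
compound() {
  printf '%s\n' "$1 $2 $3" | awk '
    {
      x = $1
      for (i = 1; i <= $3; i = i + 1) {   # same effect as the while loop
        x = x + x * $2
        printf("\t%d\t%8.2f\n", i, x)
      }
      # closed form: amount * (1 + rate) ^ years
      printf("closed form:\t%8.2f\n", $1 * (1 + $2) ^ $3)
    }'
}

compound 1000 0.06 5
```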
AWK also supports a "for" statement, as in
for (i = 1; i <= $3; i = i + 1)
which, used in place of the while loop, has the same effect. AWK supports arrays too, as the program below demonstrates.
# reverse - print the input in reverse order ...
BEGIN {print "NAME RATE HOURS"; print ""}
{line_ar[NR] = $0} # remembers the input line in array line_ar
END {# prepare to print in reverse order as input is over now
for (i = NR; i >= 1; i = i-1)
print line_ar[i]
}
It is run as shown below; it prints the header and then the input lines in reverse order.
bhatt@falerno [CRUD] =>awk -f prg7.awk data.awk
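A self-contained sketch of the array technique (without the header), run on made-up data. Array elements are created on first assignment, so no declaration or size is needed; the function name reverse_lines is hypothetical.

```sh
# Hypothetical sample data: name, hourly rate, hours worked.
sample_data() {
  printf '%s\n' 'ann 4.00 0' 'bob 3.75 2' 'cat 5.00 4' 'dan 2.00 3'
}

# Store each line in an array indexed by its line number,
# then walk the indices backwards once all input is read.
reverse_lines() {
  awk '
        { line_ar[NR] = $0 }
    END { for (i = NR; i >= 1; i = i - 1) print line_ar[i] }'
}

sample_data | reverse_lines
# dan 2.00 3
# cat 5.00 4
# bob 3.75 2
# ann 4.00 0
```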
Some One-liners
Next we mention a few one-liners that are now folklore in the AWK programming community. It helps to remember some of these when writing AWK programs.
1. Print the total number of input lines: END {print NR}.
2. Print the 10th input line: NR == 10.
3. Print the last field of each line: {print $NF}.
4. Print the last field of the last input line: {field = $NF} END {print field}.
5. Print every input line with more than 4 fields: NF > 4.
6. Print every input line in which the last field is more than 4: $NF > 4.
7. Print the total number of fields in all input lines.
{nf = nf + NF}
END {print nf}
8. Print the total number of lines containing the string "bhatt".
/bhatt/ {nlines = nlines + 1}
END {print nlines}
9. Print the largest first field and the line that contains it.
$1 > max {max = $1; maxline = $0}
END {print max, maxline}
10. Print every line that has at least one field: NF > 0.
11. Print every line with > 80 characters: length($0) > 80.
12. Print the number of fields followed by the line itself.
{print NF, $0}
13. Print the first two fields in opposite order: {print $2, $1}.
14. Exchange the first two fields of every line and then print the line:
{temp = $1; $1 = $2; $2 = temp; print}
15. Print every line with the first field replaced by the line number:
{$1 = NR; print}
16. Print every line after erasing the second field:
{$2 = ""; print}
17. Print in reverse order the fields of every line:
{for (i = NF; i > 0; i = i-1) printf("%s ", $i); printf("\n")}
18. Print the sum of the fields of every line:
{sum = 0
for (i = 1; i <= NF; i = i+1) sum = sum + $i
print sum}
19. Add up all the fields in all the lines and print the sum:
{for (i = 1; i <= NF; i = i+1) sum = sum + $i} END {print sum}
20. Print every line after replacing each field by its absolute value:
{for (i = 1; i <= NF; i = i+1) if ($i < 0) $i = -$i; print}
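Statements that share a line must be separated by semicolons, as in several of the one-liners above. The sketch below runs items 14, 17, 18, 19 and 20 on a small made-up input of numbers; the helper names are hypothetical, and item 17 is varied slightly so that no trailing blank is printed before the newline.

```sh
# Hypothetical input: two lines of numbers.
nums() { printf '%s\n' '3 -1 2' '-4 5'; }

swap_first_two() { nums | awk '{ temp = $1; $1 = $2; $2 = temp; print }'; }   # item 14
reverse_fields() {                                                            # item 17
  nums | awk '{ for (i = NF; i > 0; i = i - 1) printf("%s%s", $i, (i > 1 ? " " : "\n")) }'
}
line_sums() {                                                                 # item 18
  nums | awk '{ sum = 0; for (i = 1; i <= NF; i = i + 1) sum = sum + $i; print sum }'
}
total_sum() {                                                                 # item 19
  nums | awk '{ for (i = 1; i <= NF; i = i + 1) sum = sum + $i } END { print sum }'
}
absolute() {                                                                  # item 20
  nums | awk '{ for (i = 1; i <= NF; i = i + 1) if ($i < 0) $i = -$i; print }'
}

swap_first_two   # -1 3 2   then   5 -4
reverse_fields   # 2 -1 3   then   5 -4
line_sums        # 4        then   1
total_sum        # 5
absolute         # 3 1 2    then   4 5
```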