awk
Kaushal Modi
Collection of awk examples.
Simple example #
BEGIN { print 42 }
42
The AWK Programming Language #
This section contains awk examples and my notes from The AWK Programming Language book by Alfred V. Aho, Brian W. Kernighan and Peter J. Weinberger.
An AWK Tutorial #
Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18
Getting Started #
$3 > 0 { print $1, $2 * $3 }
Kathy 40
Mark 100
Mary 121
Susie 76.5
The comma (,) between $1 and $2 * $3 in the print statement renders as a single space in the output by default. That separator is the output field separator, OFS, and it can be changed.
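A minimal sketch of changing that separator, assuming a POSIX shell with awk on the PATH. The two inline records and the out variable are only for illustration; OFS is the built-in variable that print uses for each comma.

```shell
# Two sample records in the same layout as the tutorial data (name, rate, hours).
out=$(printf 'Kathy 4.00 10\nMark 5.00 20\n' |
  awk 'BEGIN { OFS = " -> " }          # OFS: output field separator used by print
       $3 > 0 { print $1, $2 * $3 }')  # the comma now renders as " -> "
echo "$out"
```

Setting OFS in a BEGIN action makes the change take effect before the first line is read.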
$3 == 0 { print $1 }
Beth
Dan
The Structure of an AWK Program #
Each awk program in this chapter is a sequence of one or more pattern-action statements:
pattern { action }
pattern { action }
...
Either the pattern or the action (but not both) in a pattern-action statement may be omitted. If a pattern has no action as in Code Snippet 4, then each line that the pattern matches is printed.
$3 == 0
Beth 4.00 0
Dan 3.75 0
And if there is an action with no pattern, then the action is performed for every input line.
{ print $1 }
Beth
Dan
Kathy
Mark
Mary
Susie
Simple Output #
Printing Every Line #
{ print }
Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18
{ print $0 }
Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18
Printing Certain Fields #
{ print $1, $3 }
Beth 0
Dan 0
Kathy 10
Mark 20
Mary 22
Susie 18
NF, the Number of Fields #
{ print NF, $1, $NF }
3 Beth 0
3 Dan 0
3 Kathy 10
3 Mark 20
3 Mary 22
3 Susie 18
Computing and Printing #
{ print $1, $2 * $3 }
Beth 0
Dan 0
Kathy 40
Mark 100
Mary 121
Susie 76.5
Printing Line Numbers #
{ print NR, $0 }
1 Beth 4.00 0
2 Dan 3.75 0
3 Kathy 4.00 10
4 Mark 5.00 20
5 Mary 5.50 22
6 Susie 4.25 18
Putting Text in the Output #
{ print "total pay for", $1, "is", $2 * $3 }
total pay for Beth is 0
total pay for Dan is 0
total pay for Kathy is 40
total pay for Mark is 100
total pay for Mary is 121
total pay for Susie is 76.5
Fancier Output #
- The print statement is for quick and easy output.
- The printf statement is used if you need to format the output exactly the way you want.
Lining Up Fields #
With printf, no blanks or newlines are produced automatically; you need to create them yourself. Note the \n in the printf statement in Code Snippet 13.
{ printf("total pay for %s is $%.2f\n", $1, $2 * $3) }
total pay for Beth is $0.00
total pay for Dan is $0.00
total pay for Kathy is $40.00
total pay for Mark is $100.00
total pay for Mary is $121.00
total pay for Susie is $76.50
{ printf("%-8s $%6.2f\n", $1, $2 * $3) }
Beth     $  0.00
Dan      $  0.00
Kathy    $ 40.00
Mark     $100.00
Mary     $121.00
Susie    $ 76.50
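To make the two printf points above concrete, here is a small sketch, assuming a POSIX shell with awk available. The sample records, the ";" separator, the brackets and the variable names joined and padded are illustrative choices, not part of the book's examples.

```shell
# Without "\n", successive printf calls run together on one line.
joined=$(printf 'Beth 4.00 0\nDan 3.75 0\n' | awk '{ printf("%s;", $1) }')
echo "$joined"

# %-8s left-justifies the name in 8 columns; %6.2f right-justifies the
# amount in 6 columns with 2 decimals. Brackets make the padding visible.
padded=$(printf 'Kathy 4.00 10\n' | awk '{ printf("[%-8s][%6.2f]\n", $1, $2 * $3) }')
echo "$padded"
```

The first command prints Beth;Dan; on a single line, and the second prints [Kathy   ][ 40.00].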
Selection #
Selection by Comparison #
$2 >= 5
Mark 5.00 20
Mary 5.50 22
Selection by Computation #
$2 * $3 > 50 { printf("$%.2f for %s\n", $2 * $3, $1) }
$100.00 for Mark
$121.00 for Mary
$76.50 for Susie
Selection by Text Content #
$1 == "Susie"
Susie 4.25 18
/y/
Kathy 4.00 10
Mary 5.50 22
/Mary|Beth/
Beth 4.00 0
Mary 5.50 22
A bare /regex/ pattern is matched against the entire line ($0), not against an individual field. For instance, the /y.*4/ in Code Snippet 20 matches across fields. A regular expression can still be tied to a single field with the match operators ~ and !~, as in $1 ~ /y/.
/y.*4/
Kathy 4.00 10
!/y/
Beth 4.00 0
Dan 3.75 0
Mark 5.00 20
Susie 4.25 18
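The regex patterns in this section all match against the whole line. As a hedged sketch of restricting a regular expression to one field, the standard awk match operators ~ and !~ can be applied to $1, $2, etc. The three sample records and the variable names first and second are only for illustration.

```shell
data='Beth 4.00 0
Kathy 4.00 10
Mary 5.50 22'

# $1 ~ /y$/ matches only records whose first field ends in "y".
first=$(printf '%s\n' "$data" | awk '$1 ~ /y$/')
# $2 !~ /^4/ keeps records whose second field does not start with "4".
second=$(printf '%s\n' "$data" | awk '$2 !~ /^4/ { print $1 }')
echo "$first"
echo "$second"
```

Here first holds the Kathy and Mary lines, and second holds just Mary.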
Combinations of Patterns #
Patterns can be combined with parentheses and the logical operators && (and), || (or) and ! (not).
$2 >= 4 || $3 >= 20
Beth 4.00 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18
Above, the lines that match both the $2 >= 4 and $3 >= 20 conditions are printed just once. But in the case of Code Snippet 23, where multiple patterns are specified, the program prints a line twice if that line matches both the conditions.
$2 >= 4
$3 >= 20
Beth 4.00 0
Kathy 4.00 10
Mark 5.00 20
Mark 5.00 20
Mary 5.50 22
Mary 5.50 22
Susie 4.25 18
Code Snippet 24 is a De Morgan’s law variant of Code Snippet 22. Note that the results are exactly the same.
!($2 < 4 && $3 < 20)
Beth 4.00 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18
Data Validation #
When doing data validation, a line is printed only when it violates one of the desired properties. Think of this use as that of assertions in SystemVerilog. Below, as none of the lines in the example input match the failure conditions, there is no output.
NF != 3 { print $0, "number of fields is not equal to 3" }
$2 < 3.35 { print $0, "rate is below minimum wage" }
$2 > 10 { print $0, "rate exceeds $10 per hour" }
$3 < 0 { print $0, "negative hours worked" }
$3 > 60 { print $0, "too many hours worked" }
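Since the tutorial data passes every check, here is a sketch of the same checks firing, assuming a POSIX shell with awk available. The records for Bob, Cal, Dee and Eve are made up for illustration; only the awk program comes from the snippet above.

```shell
# Hypothetical records: three deliberately bad ones and one valid one (Eve).
data='Bob 2.50 10
Cal 4.00 -3
Dee 12.00 70
Eve 4.00 10'

out=$(printf '%s\n' "$data" | awk '
NF != 3   { print $0, "number of fields is not equal to 3" }
$2 < 3.35 { print $0, "rate is below minimum wage" }
$2 > 10   { print $0, "rate exceeds $10 per hour" }
$3 < 0    { print $0, "negative hours worked" }
$3 > 60   { print $0, "too many hours worked" }')
echo "$out"
```

The Dee record trips two checks, so it is reported twice; the Eve record produces no output.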
BEGIN and END #
- The special pattern BEGIN matches before the first line of the first input file is read. END matches after the last line of the last file has been processed.
BEGIN { print "NAME RATE HOURS"; print "" }
{ print }
NAME RATE HOURS

Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18
As seen in Code Snippet 26,
- You can put several statements on a single line if you separate them by semicolons.
- The print "" prints a blank line.
- Plain print prints the whole line.
Computing with AWK #
Counting #
- The user-created variables are not declared; you just use them.
- The default initial value of variables used as numbers (awk auto-detects that) is 0.
$3 > 15 { emp = emp + 1 }
END { print emp, "employees worked more than 15 hours" }
3 employees worked more than 15 hours
Computing Sums and Averages #
END { print NR, "employees" }
6 employees
{ pay = pay + $2 * $3 } # Nothing is printed by this line; only calculation happens
END { print NR, "employees"
      print "total pay is", pay
      print "average pay is", pay/NR
    }
6 employees
total pay is 337.5
average pay is 56.25
Handling Text #
$2 > maxrate { maxrate = $2; maxemp = $1 } # Here maxrate and maxemp variables are updated conditionally; nothing is printed
END { print "highest hourly rate:", maxrate, "for", maxemp }
highest hourly rate: 5.50 for Mary
String Concatenation #
{ names = names $1 " " }
END { print names }
Beth Dan Kathy Mark Mary Susie
Awk automagically figures out that the names variable is used to hold a string here. An uninitialized variable used as a string starts out as the null (empty) string.
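A quick sketch of both default initial values, assuming a POSIX shell with awk available. The variable x is a throwaway name that is never assigned anywhere in the program.

```shell
# x is never assigned: used in arithmetic it acts as 0, used in
# concatenation it acts as the empty string.
out=$(awk 'BEGIN { print x + 1; print "[" x "]"; print length(x) }')
echo "$out"
```

This prints 1, then [], then 0, one per line.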
Printing the Last Input Line #
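- Although NR retains its value in an END action, $0 does not, at least in the awk that the book describes. gawk, mawk and BWK awk do preserve $0 in END, but some older implementations do not.
So in the below code snippet, we use a user-defined variable last to store the $0 value of the last line read, which works everywhere.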
{ last = $0 }
END { print last }
Susie 4.25 18
Built-in Functions #
{ print $1, length($1) }
Beth 4
Dan 3
Kathy 5
Mark 4
Mary 4
Susie 5
Counting Lines, Words and Characters #
{ nc = nc + length($0) + 1 # the trailing "+ 1" is to count the newline character for each line
                           # $0 does not include the newline
  nw = nw + NF
}
END { print NR, "lines", nw, "words", nc, "characters" }
6 lines 18 words 106 characters
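As a sanity check on the "+ 1" for the newline, the same counts can be compared against wc. This is a sketch assuming a POSIX shell with awk and wc available; the two-record sample and the variable names awk_counts and wc_counts are illustrative, and the data is plain ASCII so that bytes and characters coincide.

```shell
data='Beth 4.00 0
Dan 3.75 0'

# Same counting logic as above, printed as "lines words characters".
awk_counts=$(printf '%s\n' "$data" |
  awk '{ nc = nc + length($0) + 1; nw = nw + NF } END { print NR, nw, nc }')

# wc reports the same three counts; the trailing awk only normalizes wc padding.
wc_counts=$(printf '%s\n' "$data" | wc | awk '{ print $1, $2, $3 }')
echo "$awk_counts"
echo "$wc_counts"
```

Both commands report 2 6 23 for this input.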
Control-Flow Statements #
The control flow statements can be used only in actions.
If-Else Statement #
Code Snippet 35 is similar to Code Snippet 29, but with an if to protect against division by zero when computing the average.
$2 > 6 { n = n + 1; pay = pay + $2 * $3 }
END { if (n > 0)
          printf("%d employees, total pay is %.2f, average pay is %.2f\n",
                 n, pay, pay/n) # Note that we can continue a long statement over several lines
                                # by breaking it after a comma.
      else
          print "no employees are paid more than $6/hour"
    }
no employees are paid more than $6/hour
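With the tutorial data only the else branch runs. The sketch below exercises both branches by making the threshold a parameter, assuming a POSIX shell with awk available. The RATE variable (passed with awk's -v option), the prog, hit and miss names, and the three-record sample are illustrative additions, not part of the book's program.

```shell
data='Mark 5.00 20
Mary 5.50 22
Susie 4.25 18'

# RATE is a hypothetical threshold supplied from the command line with -v.
prog='$2 > RATE { n = n + 1; pay = pay + $2 * $3 }
END { if (n > 0)
          printf("%d employees, total pay is %.2f, average pay is %.2f\n", n, pay, pay/n)
      else
          print "no employees are paid more than", RATE
    }'

hit=$(printf '%s\n' "$data" | awk -v RATE=5 "$prog")   # Mary qualifies, so n is 1
miss=$(printf '%s\n' "$data" | awk -v RATE=6 "$prog")  # nobody qualifies, so n stays 0
echo "$hit"
echo "$miss"
```

With RATE=6 the if guard skips the pay/n division, which would otherwise be a division by zero.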
While Statement #
1000 .06 5
1000 .12 7
# interest1 - compute compound interest
# input: amount rate years
# output: compounded value at the end of each year
{ i = 1
  printf("Amount = %.2f, Rate = %.2f, Years = %.2f\n", $1, $2, $3)
  while (i <= $3) {
      printf("\tYear %d: %.2f\n", i, $1 * (1 + $2) ^ i)
      i = i + 1
  }
  print ""
}
Amount = 1000.00, Rate = 0.06, Years = 5.00
	Year 1: 1060.00
	Year 2: 1123.60
	Year 3: 1191.02
	Year 4: 1262.48
	Year 5: 1338.23

Amount = 1000.00, Rate = 0.12, Years = 7.00
	Year 1: 1120.00
	Year 2: 1254.40
	Year 3: 1404.93
	Year 4: 1573.52
	Year 5: 1762.34
	Year 6: 1973.82
	Year 7: 2210.68