awk

Collection of awk examples.
Kaushal Modi

Simple example #

BEGIN { print 42 }
42

The AWK Programming Language #

This section contains awk examples and my notes from The AWK Programming Language book by Alfred V. Aho, Brian W. Kernighan and Peter J. Weinberger.

An AWK Tutorial #

Beth    4.00   0
Dan     3.75   0
Kathy   4.00   10
Mark    5.00   20
Mary    5.50   22
Susie   4.25   18
Code Snippet 1: Example Input used for examples

Getting Started #

$3 > 0 { print $1, $2 * $3 }
Code Snippet 2: Print total salary only for the employees who have worked for non-zero hours
Kathy 40
Mark 100
Mary 121
Susie 76.5

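The comma between the print arguments ($1 and $2 * $3) is rendered as the output field separator, OFS, which is a single space by default. That can be changed by assigning a different value to OFS.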
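
As a minimal sketch of changing that separator: the shell wrapper below feeds a few lines of the example input to awk and sets OFS to ":" in a BEGIN action. The shell variables input and out, and the choice of ":" as separator, are my own for illustration.

```shell
# Hypothetical sample: three lines taken from the example input (Code Snippet 1).
input='Beth    4.00   0
Kathy   4.00   10
Mark    5.00   20'

# OFS is what the comma in a print statement emits; set it before any line is read.
out=$(printf '%s\n' "$input" | awk 'BEGIN { OFS = ":" } $3 > 0 { print $1, $2 * $3 }')
printf '%s\n' "$out"
```

With these three lines, this prints Kathy:40 and Mark:100, one per line.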

$3 == 0 { print $1 }
Code Snippet 3: Print employees who did not work
Beth
Dan
The Structure of an AWK Program #

Each awk program in this chapter is a sequence of one or more pattern-action statements:

pattern { action }
pattern { action }
...

Either the pattern or the action (but not both) in a pattern-action statement may be omitted. If a pattern has no action as in Code Snippet 4, then each line that the pattern matches is printed.

$3 == 0
Code Snippet 4: Pattern without action
Beth    4.00   0
Dan     3.75   0

And if there is an action with no pattern, then the action is performed for every input line.

{ print $1 }
Code Snippet 5: Action without pattern
Beth
Dan
Kathy
Mark
Mary
Susie

Simple Output #

Printing Every Line #

{ print }
Code Snippet 6: Printing every line – 1
Beth    4.00   0
Dan     3.75   0
Kathy   4.00   10
Mark    5.00   20
Mary    5.50   22
Susie   4.25   18

{ print $0 }
Code Snippet 7: Printing every line – 2
Beth    4.00   0
Dan     3.75   0
Kathy   4.00   10
Mark    5.00   20
Mary    5.50   22
Susie   4.25   18
Printing Certain Fields #

{ print $1, $3 }
Code Snippet 8: Printing certain fields
Beth 0
Dan 0
Kathy 10
Mark 20
Mary 22
Susie 18
NF, the Number of Fields #

{ print NF, $1, $NF }
Code Snippet 9: Print number of fields, first field and last field for each input line
3 Beth 0
3 Dan 0
3 Kathy 10
3 Mark 20
3 Mary 22
3 Susie 18
Computing and Printing #

{ print $1, $2 * $3 }
Code Snippet 10: Do computations on field values
Beth 0
Dan 0
Kathy 40
Mark 100
Mary 121
Susie 76.5
Printing Line Numbers #

{ print NR, $0 }
Code Snippet 11: NR, the number of records (lines) read so far
1 Beth    4.00   0
2 Dan     3.75   0
3 Kathy   4.00   10
4 Mark    5.00   20
5 Mary    5.50   22
6 Susie   4.25   18
Putting Text in the Output #

{ print "total pay for", $1, "is", $2 * $3 }
Code Snippet 12: Concatenating text in the output
total pay for Beth is 0
total pay for Dan is 0
total pay for Kathy is 40
total pay for Mark is 100
total pay for Mary is 121
total pay for Susie is 76.5

Fancier Output #

  • The print statement is for quick and easy output.
  • The printf statement is used if you need to format the output exactly the way you want.
Lining Up Fields #

With printf, no blanks or newlines are produced automatically; you need to create them yourself. Note the \n in the printf statement in code snippet 13.

{ printf("total pay for %s is $%.2f\n", $1, $2 * $3) }
Code Snippet 13: printf example
total pay for Beth is $0.00
total pay for Dan is $0.00
total pay for Kathy is $40.00
total pay for Mark is $100.00
total pay for Mary is $121.00
total pay for Susie is $76.50

{ printf("%-8s $%6.2f\n", $1, $2 * $3) }
Code Snippet 14: Justification using printf
Beth     $  0.00
Dan      $  0.00
Kathy    $ 40.00
Mark     $100.00
Mary     $121.00
Susie    $ 76.50

Selection #

Selection by Comparison #

$2 >= 5
Code Snippet 15: awk program with just a comparison pattern
Mark    5.00   20
Mary    5.50   22
Selection by Computation #

$2 * $3 > 50 { printf("$%.2f for %s\n", $2 * $3, $1) }
Code Snippet 16: Print details only for employees making more than $50
$100.00 for Mark
$121.00 for Mary
$76.50 for Susie
Selection by Text Content #

$1 == "Susie"
Code Snippet 17: Literal string match
Susie   4.25   18

/y/
Code Snippet 18: Regular expression match – 1
Kathy   4.00   10
Mary    5.50   22

A bare regular expression pattern like /y/ is matched against the whole input line ($0), which is why the /y.*4/ in code snippet 19 matches across fields. To test a single field instead, use the match operator ~ (or !~ for non-match), as in $1 ~ /y/.

/y.*4/
Code Snippet 19: Regular expression match – 2
Kathy   4.00   10
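
Matching a regular expression against a single field is done with the ~ operator. The sketch below runs two such patterns over the example input from a shell wrapper; the shell variable names (input, by_name, by_rate) and the two patterns are assumptions of mine, not from the book.

```shell
# The example input from Code Snippet 1.
input='Beth    4.00   0
Dan     3.75   0
Kathy   4.00   10
Mark    5.00   20
Mary    5.50   22
Susie   4.25   18'

# Names ending in "y": the regular expression is tested against $1 only.
by_name=$(printf '%s\n' "$input" | awk '$1 ~ /y$/ { print $1 }')

# Rates starting with "4": the regular expression is tested against $2 only.
by_rate=$(printf '%s\n' "$input" | awk '$2 ~ /^4/ { print $1 }')

printf '%s\n' "$by_name"
printf '%s\n' "$by_rate"
```

The first pattern selects Kathy and Mary; the second selects Beth, Kathy and Susie.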
Combinations of Patterns #

Patterns can be combined with parentheses and the logic operators && (and), || (or) and ! (not).

$2 >= 4 || $3 >= 20
Code Snippet 20: Logical operators in patterns
Beth    4.00   0
Kathy   4.00   10
Mark    5.00   20
Mary    5.50   22
Susie   4.25   18

Above, a line that matches both the $2 >= 4 and $3 >= 20 conditions is printed just once. But when the two conditions are given as separate patterns, as in code snippet 21, a line that matches both is printed twice.

$2 >= 4
$3 >= 20
Code Snippet 21: Multiple patterns
Beth    4.00   0
Kathy   4.00   10
Mark    5.00   20
Mark    5.00   20
Mary    5.50   22
Mary    5.50   22
Susie   4.25   18

Code snippet 22 rewrites the pattern of code snippet 20 using De Morgan's law. Note that the results are exactly the same.

!($2 < 4 && $3 < 20)
Code Snippet 22: Logical operators in patterns
Beth    4.00   0
Kathy   4.00   10
Mark    5.00   20
Mary    5.50   22
Susie   4.25   18
Data Validation #

When doing data validation, a line is printed only when it violates a desired property. Think of this use as that of assertions in SystemVerilog. Below, since none of the lines in the example input match the failure conditions, there is no output.

NF != 3   { print $0, "number of fields is not equal to 3" }
$2 < 3.35 { print $0, "rate is below minimum wage" }
$2 > 10   { print $0, "rate exceeds $10 per hour" }
$3 < 0    { print $0, "negative hours worked" }
$3 > 60   { print $0, "too many hours worked" }
Code Snippet 23: Data validation or Assertions
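
To see the validation rules actually fire, here is a rough sketch that runs the same five rules on a made-up input. The rate for Dan and the hours for Kathy are invented for illustration and are not part of the example input; the shell variables bad_input and out are also mine.

```shell
# Hypothetical input: Dan's rate and Kathy's hours are deliberately invalid.
bad_input='Beth    4.00   0
Dan     2.50   30
Kathy   4.00   75'

# Same rules as in Code Snippet 23; only the offending lines produce output.
out=$(printf '%s\n' "$bad_input" | awk '
NF != 3   { print $0, "number of fields is not equal to 3" }
$2 < 3.35 { print $0, "rate is below minimum wage" }
$2 > 10   { print $0, "rate exceeds $10 per hour" }
$3 < 0    { print $0, "negative hours worked" }
$3 > 60   { print $0, "too many hours worked" }')
printf '%s\n' "$out"
```

Here the Dan line is reported as below minimum wage and the Kathy line as too many hours worked; the Beth line passes silently.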
BEGIN and END #
  • The special pattern BEGIN matches before the first line of the first input file is read.
  • END matches after the last line of the last file has been processed.

BEGIN { print "NAME    RATE   HOURS"; print "" }
      { print }
Code Snippet 24: Using BEGIN to print heading
NAME    RATE   HOURS

Beth    4.00   0
Dan     3.75   0
Kathy   4.00   10
Mark    5.00   20
Mary    5.50   22
Susie   4.25   18

As noted from code snippet 24,

  • You can put several statements on a single line if you separate them by semi-colons.
  • The print "" prints a blank line.
  • Plain print prints the whole line.

Computing with AWK #

Counting #
  • The user-created variables are not declared; you just use them.
  • The default initial value of variables used as numbers (awk auto-detects that) is 0.

$3 > 15 { emp = emp + 1 }
END     { print emp, "employees worked more than 15 hours" }
Code Snippet 25: User-created variables
3 employees worked more than 15 hours
Computing Sums and Averages #

END { print NR, "employees" }
Code Snippet 26: Print the number of lines
6 employees

    { pay = pay + $2 * $3 } # Nothing is printed by this line; only calculation happens
END { print NR, "employees"
      print "total pay is", pay
      print "average pay is", pay/NR
    }
Code Snippet 27: Using NR to compute the average pay
6 employees
total pay is 337.5
average pay is 56.25
Handling Text #

$2 > maxrate { maxrate = $2; maxemp = $1 } # Here maxrate and maxemp variables are updated conditionally; nothing is printed
END          { print "highest hourly rate:", maxrate, "for", maxemp }
Code Snippet 28: Find the employee who is paid the most per hour
highest hourly rate: 5.50 for Mary
String Concatenation #

    { names = names $1 " " }
END { print names }
Code Snippet 29: Concatenate strings with spaces in-between
Beth Dan Kathy Mark Mary Susie

Awk automagically figures out that the names variable is used here to hold a string and sets its initial value to the null (empty) string.
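
A quick way to check this behavior is to use a variable that was never assigned in both a numeric and a string context. This is only a sketch; the awk variable x and the shell variable out are hypothetical names.

```shell
# x is never assigned: it acts as 0 in arithmetic and as "" in concatenation.
out=$(awk 'BEGIN {
  print x + 0        # numeric context
  print "[" x "]"    # string context: nothing appears between the brackets
  print length(x)    # the empty string has length 0
}')
printf '%s\n' "$out"
```

This prints 0, then [], then 0.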

Printing the Last Input Line #
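  • Although NR retains its value in an END action, $0 is not guaranteed to. The book's awk discards it; gawk, mawk and the current one-true-awk keep the last record, but older implementations may not.

So in the code snippet below, a user-defined variable last stores the $0 value of each line as it is read, leaving the last line available in END.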

    { last = $0 }
END { print last }
Code Snippet 30: Print the last line
Susie   4.25   18
Built-in Functions #

{ print $1, length($1) }
Code Snippet 31: In-built function length
Beth 4
Dan 3
Kathy 5
Mark 4
Mary 4
Susie 5
Counting Lines, Words and Characters #

    { nc = nc + length($0) + 1 # the trailing "+ 1" is to count the newline character for each line
                               # $0 does not include the newline
      nw = nw + NF
    }
END { print NR, "lines", nw, "words", nc, "characters" }
Code Snippet 32: Count lines, words, chars
6 lines 18 words 106 characters

Control-Flow Statements #

The control flow statements can be used only in actions.

If-Else Statement #

Below code snippet 33 is similar to code snippet 27, but with an if to protect against division by zero when computing average.

$2 > 6 { n = n + 1; pay = pay + $2 * $3 }
END    { if (n > 0)
           printf("%d employees, total pay is %.2f, average pay is %.2f\n",
                  n, pay, pay/n) # Note that we can continue a long statement over several lines
                                 # by breaking it after a comma.
         else
           print "no employees are paid more than $6/hour"
       }
Code Snippet 33: Sum and average pay of employees making more than $6/hr
no employees are paid more than $6/hour
While Statement #

Table 1: Input for interest1 program
1000   .06   5
1000   .12   7

# interest1 - compute compound interest
#  input: amount rate years
#  output: compounded value at the end of each year
{ i = 1
  printf("Amount = %.2f, Rate = %.2f, Years = %.2f\n", $1, $2, $3)
  while (i <= $3) {
    printf("\tYear %d: %.2f\n", i, $1 * (1 + $2) ^ i)
    i = i + 1
  }
  print ""
}
Code Snippet 34: Calculate compound interest
Amount = 1000.00, Rate = 0.06, Years = 5.00
	Year 1: 1060.00
	Year 2: 1123.60
	Year 3: 1191.02
	Year 4: 1262.48
	Year 5: 1338.23

Amount = 1000.00, Rate = 0.12, Years = 7.00
	Year 1: 1120.00
	Year 2: 1254.40
	Year 3: 1404.93
	Year 4: 1573.52
	Year 5: 1762.34
	Year 6: 1973.82
	Year 7: 2210.68