awk

Kaushal Modi

Collection of awk examples.

Simple example #

BEGIN { print 42 }
42

The AWK Programming Language #

This section contains awk examples and my notes from the book The AWK Programming Language by Alfred V. Aho, Brian W. Kernighan and Peter J. Weinberger.

An AWK Tutorial #

Beth    4.00   0
Dan     3.75   0
Kathy   4.00   10
Mark    5.00   20
Mary    5.50   22
Susie   4.25   18
Code Snippet 1: Example Input used for examples
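
The snippets in these notes show only the awk program and its output, not how the program is run. A minimal sketch of one way to run them from a shell, assuming the example input is saved in a file named emp.data and one of the programs in a file named noshow.awk (both file names are my own choice for illustration):

```shell
# Save the example input to a file (the name emp.data is an assumption).
cat > emp.data <<'EOF'
Beth    4.00   0
Dan     3.75   0
Kathy   4.00   10
Mark    5.00   20
Mary    5.50   22
Susie   4.25   18
EOF

# Run a short awk program given directly on the command line.
# The single quotes keep the shell from expanding $1, $2 and $3.
awk '$3 > 0 { print $1, $2 * $3 }' emp.data

# Alternatively, keep the program in its own file and load it with -f.
printf '%s\n' '$3 == 0 { print $1 }' > noshow.awk
awk -f noshow.awk emp.data
```

The first form is handy for one-liners; the -f form is easier once a program spans several lines.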

Getting Started #

$3 > 0 { print $1, $2 * $3 }
Code Snippet 2: Print total salary only for the employees who have worked for non-zero hours
Kathy 40
Mark 100
Mary 121
Susie 76.5

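The comma between $1 and $2 * $3 in the print statement is rendered as the output field separator, which is a single space by default. That can be changed by setting the built-in variable OFS.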
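
As a small sketch of changing that separator, the program below sets the built-in variable OFS in a BEGIN action so that the comma in print produces a colon instead of a space. The function name pay_report and the two input lines are only for illustration.

```shell
# pay_report is a hypothetical helper; it reads employee lines on stdin.
pay_report() {
  awk 'BEGIN { OFS = ":" }              # separator that print emits for each comma
       $3 > 0 { print $1, $2 * $3 }'
}

printf 'Kathy 4.00 10\nMark 5.00 20\n' | pay_report
# Prints:
# Kathy:40
# Mark:100
```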

$3 == 0 { print $1 }
Code Snippet 3: Print employees who did not work
Beth
Dan
The Structure of an AWK Program #

Each awk program in this chapter is a sequence of one or more pattern-action statements:

pattern { action }
pattern { action }
...

Either the pattern or the action (but not both) in a pattern-action statement may be omitted. If a pattern has no action as in Code Snippet 4, then each line that the pattern matches is printed.

$3 == 0
Code Snippet 4: Pattern without action
Beth    4.00   0
Dan     3.75   0

And if there is an action with no pattern, then the action is performed for every input line.

{ print $1 }
Code Snippet 5: Action without pattern
Beth
Dan
Kathy
Mark
Mary
Susie

Simple Output #

Printing Every Line #

{ print }
Code Snippet 6: Printing every line – 1
Beth    4.00   0
Dan     3.75   0
Kathy   4.00   10
Mark    5.00   20
Mary    5.50   22
Susie   4.25   18

{ print $0 }
Code Snippet 7: Printing every line – 2
Beth    4.00   0
Dan     3.75   0
Kathy   4.00   10
Mark    5.00   20
Mary    5.50   22
Susie   4.25   18
Printing Certain Fields #

{ print $1, $3 }
Code Snippet 8: Printing certain fields
Beth 0
Dan 0
Kathy 10
Mark 20
Mary 22
Susie 18
NF, the Number of Fields #

{ print NF, $1, $NF }
Code Snippet 9: Print number of fields, first field and last field for each input line
3 Beth 0
3 Dan 0
3 Kathy 10
3 Mark 20
3 Mary 22
3 Susie 18
Computing and Printing #

{ print $1, $2 * $3 }
Code Snippet 10: Do computations on field values
Beth 0
Dan 0
Kathy 40
Mark 100
Mary 121
Susie 76.5
Printing Line Numbers #

{ print NR, $0 }
Code Snippet 11: NR, the number of records (lines) read so far
1 Beth    4.00   0
2 Dan     3.75   0
3 Kathy   4.00   10
4 Mark    5.00   20
5 Mary    5.50   22
6 Susie   4.25   18
Putting Text in the Output #

{ print "total pay for", $1, "is", $2 * $3 }
Code Snippet 12: Concatenating text in the output
total pay for Beth is 0
total pay for Dan is 0
total pay for Kathy is 40
total pay for Mark is 100
total pay for Mary is 121
total pay for Susie is 76.5

Fancier Output #

  • The print statement is for quick and easy output.
  • The printf statement is used if you need to format the output exactly the way you want.
Lining Up Fields #

With printf, no blanks or newlines are produced automatically; you need to create them yourself. Note the \n in the printf statement in Code Snippet 13.

{ printf("total pay for %s is $%.2f\n", $1, $2 * $3) }
Code Snippet 13: printf example
total pay for Beth is $0.00
total pay for Dan is $0.00
total pay for Kathy is $40.00
total pay for Mark is $100.00
total pay for Mary is $121.00
total pay for Susie is $76.50

{ printf("%-8s $%6.2f\n", $1, $2 * $3) }
Code Snippet 14: Justification using printf
Beth     $  0.00
Dan      $  0.00
Kathy    $ 40.00
Mark     $100.00
Mary     $121.00
Susie    $ 76.50

Selection #

Selection by Comparison #

$2 >= 5
Code Snippet 15: awk program with just a comparison pattern
Mark    5.00   20
Mary    5.50   22
Selection by Computation #

$2 * $3 > 50 { printf("$%.2f for %s\n", $2 * $3, $1) }
Code Snippet 16: Print details only for employees making more than $50
$100.00 for Mark
$121.00 for Mary
$76.50 for Susie
Selection by Text Content #

$1 == "Susie"
Code Snippet 17: Literal string match
Susie   4.25   18

/y/
Code Snippet 18: Regular expression match – 1
Kathy   4.00   10
Mary    5.50   22

/Mary|Beth/
Code Snippet 19: Regex OR expression
Beth    4.00   0
Mary    5.50   22

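At first it looked to me like regular expressions could not be applied to individual field variables like $1, $2, etc. That is wrong: the match operators ~ and !~ test a regular expression against any expression, for example $1 ~ /a.*y/. A bare /regex/ pattern is shorthand for $0 ~ /regex/, so it is matched against the whole line. That is why the /y.*4/ in Code Snippet 20 matches across fields.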

/y.*4/
Code Snippet 20: Regular expression match – 2
Kathy   4.00   10
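
Regular expressions can also be tested against a single field using the match operator ~ (and its negation !~). A minimal sketch, where the function name name_match and the shortened input lines are just for illustration:

```shell
# name_match is a hypothetical helper; it reads employee lines on stdin.
name_match() {
  # Print the first field only when that field (not the whole line)
  # contains an "a" followed later by a "y".
  awk '$1 ~ /a.*y/ { print $1 }'
}

printf 'Beth 4.00 0\nKathy 4.00 10\nMary 5.50 22\nSusie 4.25 18\n' | name_match
# Prints:
# Kathy
# Mary
```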

!/y/
Code Snippet 21: Regular expression not matching
Beth    4.00   0
Dan     3.75   0
Mark    5.00   20
Susie   4.25   18
Combinations of Patterns #

Patterns can be combined with parentheses and the logical operators && (and), || (or) and ! (not).

$2 >= 4 || $3 >= 20
Code Snippet 22: Logical operators in patterns
Beth    4.00   0
Kathy   4.00   10
Mark    5.00   20
Mary    5.50   22
Susie   4.25   18

Above, the lines that match both $2>=4 and $3>=20 conditions are printed just once. But in the case of Code Snippet 23, where multiple patterns are specified, the program prints a line twice if that line matches both the conditions.

$2 >= 4
$3 >= 20
Code Snippet 23: Multiple patterns
Beth    4.00   0
Kathy   4.00   10
Mark    5.00   20
Mark    5.00   20
Mary    5.50   22
Mary    5.50   22
Susie   4.25   18

Code Snippet 24 is a De Morgan’s law variant of Code Snippet 22. Note that the results are exactly the same.

!($2 < 4 && $3 < 20)
Code Snippet 24: Logical operators in patterns
Beth    4.00   0
Kathy   4.00   10
Mark    5.00   20
Mary    5.50   22
Susie   4.25   18
Data Validation #

When doing data validation, lines are printed only when they violate the expected properties. Think of this as similar to assertions in SystemVerilog. Below, since none of the lines in the example input match any of the failure conditions, there is no output.

NF != 3   { print $0, "number of fields is not equal to 3" }
$2 < 3.35 { print $0, "rate is below minimum wage" }
$2 > 10   { print $0, "rate exceeds $10 per hour" }
$3 < 0    { print $0, "negative hours worked" }
$3 > 60   { print $0, "too many hours worked" }
Code Snippet 25: Data validation or Assertions
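
To see the checks fire, here is a sketch that feeds the same validation program a few lines that break the rules. The function name validate and the employees Joe and Ann are made up for illustration; they are not part of the example input.

```shell
# validate is a hypothetical helper wrapping the validation program above.
validate() {
  awk '
    NF != 3   { print $0, "number of fields is not equal to 3" }
    $2 < 3.35 { print $0, "rate is below minimum wage" }
    $2 > 10   { print $0, "rate exceeds $10 per hour" }
    $3 < 0    { print $0, "negative hours worked" }
    $3 > 60   { print $0, "too many hours worked" }
  '
}

# Beth passes every check; Joe and Ann (invented lines) do not.
printf 'Beth 4.00 0\nJoe 2.50 10\nAnn 12.00 70\n' | validate
# Prints:
# Joe 2.50 10 rate is below minimum wage
# Ann 12.00 70 rate exceeds $10 per hour
# Ann 12.00 70 too many hours worked
```

A line that violates several conditions is reported once per violated condition, because each pattern-action statement is evaluated independently.
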
BEGIN and END #
  • The special pattern BEGIN matches before the first line of the first input file is read.
  • END matches after the last line of the last file has been processed.

BEGIN { print "NAME    RATE   HOURS"; print "" }
      { print }
Code Snippet 26: Using BEGIN to print heading
NAME    RATE   HOURS

Beth    4.00   0
Dan     3.75   0
Kathy   4.00   10
Mark    5.00   20
Mary    5.50   22
Susie   4.25   18

As noted from Code Snippet 26,

  • You can put several statements on a single line if you separate them with semicolons.
  • The print "" prints a blank line.
  • Plain print prints the whole line.

Computing with AWK #

Counting #
  • The user-created variables are not declared; you just use them.
  • An uninitialized variable has the value 0 when used as a number and the empty string when used as a string; awk picks the interpretation from the context.

$3 > 15 { emp = emp + 1 }
END     { print emp, "employees worked more than 15 hours" }
Code Snippet 27: User-created variables
3 employees worked more than 15 hours
Computing Sums and Averages #

END { print NR, "employees" }
Code Snippet 28: Print the number of lines
6 employees

    { pay = pay + $2 * $3 } # Nothing is printed by this line; only calculation happens
END { print NR, "employees"
      print "total pay is", pay
      print "average pay is", pay/NR
    }
Code Snippet 29: Using NR to compute the average pay
6 employees
total pay is 337.5
average pay is 56.25
Handling Text #

$2 > maxrate { maxrate = $2; maxemp = $1 } # Here maxrate and maxemp variables are updated conditionally; nothing is printed
END          { print "highest hourly rate:", maxrate, "for", maxemp }
Code Snippet 30: Find the employee who is paid the most per hour
highest hourly rate: 5.50 for Mary
String Concatenation #

    { names = names $1 " " }
END { print names }
Code Snippet 31: Concatenate strings with spaces in-between
Beth Dan Kathy Mark Mary Susie

Awk needs no hint here: an uninitialized variable used as a string has the empty string as its value, so names starts out empty before the first concatenation.
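
A quick way to check how awk treats a variable that was never assigned is to use one in both a numeric and a string context. This is only a sketch; the names uninit_demo, count and label are made up.

```shell
# uninit_demo is a hypothetical helper; count and label are never assigned.
uninit_demo() {
  awk 'BEGIN {
    print "as number:", count + 0        # numeric context gives 0
    print "as string: [" label "]"       # string context gives ""
    print "length:", length(label)       # the empty string has length 0
  }'
}

uninit_demo
# Prints:
# as number: 0
# as string: []
# length: 0
```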

Printing the Last Input Line #
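  • Although NR retains its value in an END action, the book notes that $0 does not. Many current implementations (gawk, for example) do preserve $0 in END, but older ones may not, so saving it explicitly is the portable approach.

So in the code snippet below, we use a user-defined variable last to store the $0 value of the last line read.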

    { last = $0 }
END { print last }
Code Snippet 32: Print the last line
Susie   4.25   18
Built-in Functions #

{ print $1, length($1) }
Code Snippet 33: Built-in function length
Beth 4
Dan 3
Kathy 5
Mark 4
Mary 4
Susie 5
Counting Lines, Words and Characters #

    { nc = nc + length($0) + 1 # the trailing "+ 1" is to count the newline character for each line
                               # $0 does not include the newline
      nw = nw + NF
    }
END { print NR, "lines", nw, "words", nc, "characters" }
Code Snippet 34: Count lines, words, chars
6 lines 18 words 106 characters

Control-Flow Statements #

The control flow statements can be used only in actions.

If-Else Statement #

Code Snippet 35 is similar to Code Snippet 29, but with an if to protect against division by zero when computing average.

$2 > 6 { n = n + 1; pay = pay + $2 * $3 }
END    { if (n > 0)
           printf("%d employees, total pay is %.2f, average pay is %.2f\n",
                  n, pay, pay/n) # Note that we can continue a long statement over several lines
                                 # by breaking it after a comma.
         else
           print "no employees are paid more than $6/hour"
       }
Code Snippet 35: Sum and average pay of employees making more than $6/hr
no employees are paid more than $6/hour
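
With the example input, only the else branch ever runs. As a sketch of the other branch, the version below takes the rate threshold as a parameter (passed with awk's -v option) so that both outcomes can be seen. The function name high_pay_summary and the variable names min and sample are my own additions.

```shell
# high_pay_summary is a hypothetical helper; $1 is the hourly-rate threshold.
high_pay_summary() {
  awk -v min="$1" '
    $2 > min { n = n + 1; pay = pay + $2 * $3 }
    END {
      if (n > 0)   # guard against dividing by zero when nobody qualifies
        printf("%d employees, total pay is %.2f, average pay is %.2f\n",
               n, pay, pay/n)
      else
        print "no employees are paid more than $" min "/hour"
    }
  '
}

sample='Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18'

printf '%s\n' "$sample" | high_pay_summary 5
# Prints: 1 employees, total pay is 121.00, average pay is 121.00
printf '%s\n' "$sample" | high_pay_summary 6
# Prints: no employees are paid more than $6/hour
```
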
While Statement #

Table 1: Input for interest1 program
1000   .06   5
1000   .12   7

# interest1 - compute compound interest
#  input: amount rate years
#  output: compounded value at the end of each year
{ i = 1
  printf("Amount = %.2f, Rate = %.2f, Years = %.2f\n", $1, $2, $3)
  while (i <= $3) {
    printf("\tYear %d: %.2f\n", i, $1 * (1 + $2) ^ i)
    i = i + 1
  }
  print ""
}
Code Snippet 36: Calculate compound interest
Amount = 1000.00, Rate = 0.06, Years = 5.00
	Year 1: 1060.00
	Year 2: 1123.60
	Year 3: 1191.02
	Year 4: 1262.48
	Year 5: 1338.23

Amount = 1000.00, Rate = 0.12, Years = 7.00
	Year 1: 1120.00
	Year 2: 1254.40
	Year 3: 1404.93
	Year 4: 1573.52
	Year 5: 1762.34
	Year 6: 1973.82
	Year 7: 2210.68
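
The same loop could also be written with a for statement, which keeps the initialization, the test and the increment on one line. This is only a sketch of an equivalent form, not a snippet from my notes above; the function name interest_for is made up.

```shell
# interest_for is a hypothetical helper; input lines are: amount rate years
interest_for() {
  awk '{
    # i runs from 1 to the number of years in field 3
    for (i = 1; i <= $3; i = i + 1)
      printf("\tYear %d: %.2f\n", i, $1 * (1 + $2) ^ i)
  }'
}

printf '1000 .06 2\n' | interest_for
# Prints (each line starts with a tab):
#	Year 1: 1060.00
#	Year 2: 1123.60
```
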
Webmentions
Comment by Anonymous on Mon Jan 8, 2024 07:41 EST

“It looks like regular expressions cannot be specified in relation to field variables like $1, $2, etc. But I am most likely wrong.”

You are right that you’re wrong. ;) Try for example:

$1 ~ "a.*y"