Emacs, scripting and anything text oriented.

How to remove duplicate lines using awk?

If you type echo "Hi\nHow\nHi\nAre\nHi\nYou?\nAre" in a shell whose echo expands \n escapes (in bash you need echo -e, or printf), you will get this in your terminal:

Hi
How
Hi
Are
Hi
You?
Are

Here’s how to remove the duplicate lines using awk:

echo "Hi\nHow\nHi\nAre\nHi\nYou?\nAre" | awk '\!x[$0]++'

The above will give this output:

Hi
How
Are
You?

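The backslash before ! is needed in tcsh, which performs history substitution on ! even inside single quotes. In bash, single quotes already protect the !, so write awk '!x[$0]++' there.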
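For bash or another POSIX sh, a sketch of the same pipeline drops the backslash and uses printf instead of echo, so the \n escapes are expanded no matter how the shell's echo behaves:

```shell
#!/bin/sh
# printf always expands \n in its format string; echo may not (e.g. bash without -e).
# Single quotes stop sh/bash from treating ! specially, so no backslash is needed.
printf 'Hi\nHow\nHi\nAre\nHi\nYou?\nAre\n' | awk '!x[$0]++'
```

The program also accepts file arguments, e.g. awk '!x[$0]++' input.txt > deduped.txt (hypothetical file names). It keeps the first occurrence of each line in its original order, unlike sort -u, which reorders the output.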

This is how that awk snippet works:

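  • Initially the x array is empty.
  • When $0 is Hi for the first time, x["Hi"] has no value yet, so x[$0]++ evaluates to 0. !0 is true, and because the pattern has no action, awk's default action prints the line.
  • Since ++ is a post-increment, it returns the old value (0) and then sets x["Hi"] to 1.
  • The next time $0 is Hi, x[$0]++ evaluates to 1 (and bumps the count to 2), so !x[$0]++ is false and the line is not printed. Every later repeat of any line is skipped the same way.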
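Written out longhand, the one-liner behaves like the sketch below. It is an equivalent rewrite for illustration only; the array is renamed seen purely for readability, and the explicit if/print spells out what the bare pattern and awk's default action do implicitly:

```shell
#!/bin/sh
printf 'Hi\nHow\nHi\nAre\nHi\nYou?\nAre\n' | awk '
{
    # seen[$0]++ yields the count from before the increment:
    # 0 (unset element) the first time a line appears, 1 or more afterwards
    if (seen[$0]++ == 0)
        print    # print with no arguments prints the whole record, $0
}'
```

Because ++ binds tighter than !, the original !x[$0]++ is parsed as !(x[$0]++), which is true exactly when the pre-increment count is 0, the same test as seen[$0]++ == 0 above.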
