awk to count unique entries in a column of a file


I have a large file and want to use awk to look at a specific column, $5, take only the part before the ":", and count the unique entries (like uniq). I seem to be having trouble getting the syntax correct. Thanks :).

Sample input:

    chr1    955542  955763  +   agrn:exon.1 1   0
    chr1    955542  955763  +   agrn:exon.1 2   0
    chr1    955542  955763  +   agrn:exon.1 3   0
    chr1    955542  955763  +   agrn:exon.1 4   1
    chr1    955542  955763  +   agrn:exon.1 5   1

What I tried:

    awk -F: 'NR > 1 { count += $5 } -uniq' input

Desired output:

    1

    $ awk -F'[ \t:]+' '{a[$5]=1;} END{for (k in a)n++; print n;}' input
    1

  • -F'[ \t:]+'

    This tells awk to treat any run of spaces, tabs, or colons as the field separator, so "agrn:exon.1" is split into $5 (agrn) and $6 (exon.1).

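  • a[$5]=1

    As awk reads each line, this adds an entry to the associative array a for the value of $5 on that line. A repeated value just overwrites the same key, so each distinct value is stored only once.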

  • END{for (k in a)n++; print n;}

    After awk has finished reading the file, this counts the number of keys in the associative array a and prints the total.
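A minimal sketch of the same idea, wrapped in a hypothetical shell function named count_unique_col5 (the name is only for illustration) and assuming tab-separated columns like the sample rows. It swaps the END loop for the common awk idiom !seen[$5]++, which counts a value only the first time it appears, and cross-checks the result with cut, sort -u and wc -l, which is closer to the "uniq" idea in the question.

```shell
#!/bin/sh
# Sketch only: count distinct values of column 5, keeping just the part before ":".
# count_unique_col5 is a hypothetical name; assumes tab/space-separated columns.

count_unique_col5() {
    # -F'[ \t:]+' splits on runs of spaces, tabs, or colons, so "agrn:exon.1"
    # becomes $5="agrn" and $6="exon.1".
    # !seen[$5]++ is true only the first time a value appears, so n counts distinct values.
    # "n + 0" prints 0 instead of an empty line when the input is empty.
    awk -F'[ \t:]+' '!seen[$5]++ { n++ } END { print n + 0 }' "$@"
}

# Recreate the five sample rows from the question as a tab-separated temp file.
# printf reuses its format string for the remaining arguments, giving 5 lines.
sample=$(mktemp)
printf 'chr1\t955542\t955763\t+\tagrn:exon.1\t%s\t%s\n' 1 0 2 0 3 0 4 1 5 1 > "$sample"

count_unique_col5 "$sample"   # prints 1, matching the desired output

# Cross-check: take field 5, drop everything from the first ":", count unique lines.
cut -f5 "$sample" | cut -d: -f1 | sort -u | wc -l

rm -f "$sample"
```

Both approaches read the file once; the awk version only keeps one array entry per distinct value, while the cut/sort pipeline sorts the extracted column before counting.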

