awk to count lines in column of file
I have a large file and want to use awk to count the lines in a specific column, $5, using only the part before the ":", and count only the unique entries. I seem to be having trouble getting the syntax correct. Thank you :).
sample input
chr1 955542 955763 + agrn:exon.1 1 0
chr1 955542 955763 + agrn:exon.1 2 0
chr1 955542 955763 + agrn:exon.1 3 0
chr1 955542 955763 + agrn:exon.1 4 1
chr1 955542 955763 + agrn:exon.1 5 1
attempted command
awk -F: ' NR > 1 { count += $5 } -uniq' input
desired output
1
$ awk -F'[ \t:]+' '{a[$5]=1;} END{for (k in a)n++; print n;}' input
1
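-F'[ \t:]+'
This tells awk to use any run of spaces, tabs, or colons as the field separator, so agrn:exon.1 is split into $5 (agrn) and $6 (exon.1).

a[$5]=1
As awk loops through each line, this adds an entry to the associative array a for each value of $5 encountered. A repeated value overwrites the same key, so each distinct value is stored once.

END{for (k in a)n++; print n;}
After awk has finished reading the file, this counts the number of keys in the associative array a and prints the total.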
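To check the behaviour on more than one gene, here is a minimal sketch. It uses two rows from the sample input plus one hypothetical row (the `abc:exon.1` line is made up for illustration and is not from the question). The variable names `data`, `seen`, and `count` are also my own. It first prints what ends up in $5 and $6 after splitting, then counts the distinct $5 values.

```shell
# Two rows from the question plus one hypothetical row (abc) added for illustration.
data='chr1 955542 955763 + agrn:exon.1 1 0
chr1 955542 955763 + agrn:exon.1 2 0
chr1 957000 957100 + abc:exon.1 1 0'

# Show how the separator splits the fifth column: $5 is the text before
# the colon, $6 is the text after it.
printf '%s\n' "$data" | awk -F'[ \t:]+' '{ print $5, $6 }'

# Count distinct values of $5. "n + 0" forces a numeric result, so an
# empty input prints 0 instead of a blank line.
count=$(printf '%s\n' "$data" |
  awk -F'[ \t:]+' '{ seen[$5] = 1 } END { for (k in seen) n++; print n + 0 }')
echo "$count"   # prints 2 (agrn and abc)
```

If the file has a header line that should be skipped, adding `NR > 1` in front of the first block (as in the attempted command) should do it, assuming the header is exactly one line.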