awk to output the percentage of a field compared to length
The awk below, run on the sample input, gives the output that follows. Basically, for each distinct text in $5 it averages the values that match $7 < 30.
awk '{if(len==0){last=$5;total=$7;len=1;getline}if($5!=last){printf("%s\t%f\n", last, total/len);last=$5;total=$7;len=1}else{total+=$7;len+=1}}END{printf("%s\t%f\n", last, total/len)}' input.txt > output.txt
sample input
chr 1 955542 955763 + agrn:exon.1 1 0
chr 1 955542 955763 + agrn:exon.1 2 0
chr 1 955542 955763 + agrn:exon.1 3 0
chr 1 955542 955763 + agrn:exon.1 4 1
chr 1 955542 955763 + agrn:exon.1 5 1
chr 1 955542 955763 + agrn:exon.1 6 1
....
....
chr 1 955542 955763 + agrn:exon.1 218 32
chr 1 955542 955763 + agrn:exon.1 219 32
chr 1 955542 955763 + agrn:exon.1 220 32
chr 1 955542 955763 + agrn:exon.1 221 29
output
agrn:exon.1 4.5714285
My question is that I cannot seem to add the correct syntax to also output the total # of lines in $6 that represent each $5, and the % of those where $7 < 30. I know words may not be that helpful, so hopefully the desired output will help. Thank you :).
desired output
agrn:exon.1 4.5714285 3.16742% (221 (# of lines in $6) / 7 (# of lines < 30))
I don't think your program does what you say. Regardless, this might be what you're looking for:
$ awk '$8<30{a[$6]+=$7;c[$6]++} {t[$6]++} END{for(i in a) print i,a[i]/c[i],(100*c[i]/t[i])"% ("t[i]" lines)"}' file
This will give (after removing the ... rows from the input file):
agrn:exon.1 34.5714 70% (10 lines)
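Note that the one-liner above averages $7 (the running position) over the lines where $8 < 30, which is where 34.5714 comes from. If the number wanted is the average of the thresholded value itself, a variant along these lines might be closer. This is only a sketch: it assumes the eight-field layout of the sample as shown (name in $6, value in $8), and sample.txt and the array names sum, below and total are made up for illustration.

```shell
# Recreate the ten sample rows left after dropping the "...." lines.
cat > sample.txt <<'EOF'
chr 1 955542 955763 + agrn:exon.1 1 0
chr 1 955542 955763 + agrn:exon.1 2 0
chr 1 955542 955763 + agrn:exon.1 3 0
chr 1 955542 955763 + agrn:exon.1 4 1
chr 1 955542 955763 + agrn:exon.1 5 1
chr 1 955542 955763 + agrn:exon.1 6 1
chr 1 955542 955763 + agrn:exon.1 218 32
chr 1 955542 955763 + agrn:exon.1 219 32
chr 1 955542 955763 + agrn:exon.1 220 32
chr 1 955542 955763 + agrn:exon.1 221 29
EOF

# Assumption: $6 is the exon name and $8 is the value compared against 30.
result=$(awk '
    $8 < 30 { sum[$6] += $8; below[$6]++ }   # sum and count only values under 30
            { total[$6]++ }                  # count every line per name
    END {
        for (id in total)
            if (below[id] > 0)               # skip names with no value under 30
                printf "%s\t%.4f\t%.2f%% (%d/%d lines)\n", id, sum[id] / below[id], 100 * below[id] / total[id], below[id], total[id]
    }
' sample.txt)
echo "$result"
```

On the ten rows above this prints an average of 4.5714 and 70.00% (7/10 lines). The average agrees with the 4.5714285 in the question, and with all 221 rows present the same seven matching lines would give 7/221, i.e. the 3.16742% in the desired output.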