awk to output the percentage of a field compared to length


The awk below, run on the sample input, produces the output that follows. Basically, it averages $7 over the lines whose text in $5 matches, if $7 < 30.

awk '{if(len==0){last=$5;total=$7;len=1;getline}if($5!=last){printf("%s\t%f\n", last, total/len);last=$5;total=$7;len=1}else{total+=$7;len+=1}}END{printf("%s\t%f\n", last, total/len)}' input.txt > output.txt

sample input

chr 1   955542  955763  +   agrn:exon.1 1   0
chr 1   955542  955763  +   agrn:exon.1 2   0
chr 1   955542  955763  +   agrn:exon.1 3   0
chr 1   955542  955763  +   agrn:exon.1 4   1
chr 1   955542  955763  +   agrn:exon.1 5   1
chr 1   955542  955763  +   agrn:exon.1 6   1
....
....
chr 1   955542  955763  +   agrn:exon.1 218 32
chr 1   955542  955763  +   agrn:exon.1 219 32
chr 1   955542  955763  +   agrn:exon.1 220 32
chr 1   955542  955763  +   agrn:exon.1 221 29

output

agrn:exon.1 4.5714285 

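My question is that I cannot seem to add the correct syntax to also output the total number of lines in $6 that represent each $5, and the percentage of those lines where $7 < 30. I know the words may not be that helpful, so hopefully the desired output will help. Thank you :).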

desired output

agrn:exon.1 4.5714285 3.16742% (221 (# of lines in $6) / 7 (# of lines < 30))
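As a quick check of the numbers in the desired output, the arithmetic can be reproduced with awk itself. This is only a sketch: it assumes the seven rows below 30 are the ones visible in the sample, whose last-column values sum to 0+0+0+1+1+1+29 = 32, and the function and variable names are purely illustrative.

```shell
# Illustrative check; the counts 7 and 221 and the sum 32 are read off the question.
check_numbers() {
    awk -v below=7 -v total=221 -v sum_below=32 'BEGIN {
        # Mean of the values that are below 30.
        printf "mean of values below 30: %.4f\n", sum_below / below
        # Share of all lines whose value is below 30, as a percentage.
        printf "share of lines below 30: %.5f%%\n", 100 * below / total
    }'
}
check_numbers
```

This prints 4.5714 and 3.16742%, so the percentage in the desired output is 7 divided by 221, not the other way round.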

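I don't think your program does what you say it does. Regardless, this might be what you're looking for. Note that it treats chr 1 as two fields, so the name is $6, the position is $7, and the last column is $8: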

$ awk '$8<30{a[$6]+=$7;c[$6]++}
            {t[$6]++}
       END{for(i in a) print i,a[i]/c[i],(100*c[i]/t[i])"% ("t[i]" lines)"}' file

which will give (after removing the .... rows from the input file):

agrn:exon.1 34.5714 70% (10 lines) 
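If the goal is the mean of the last column itself (rather than of the position column) together with the share of lines below 30, a variant along the following lines may be closer to the desired output. It is a sketch under stated assumptions, not the answer's original code: the name is taken as the third field from the end and the value as the last field, so it does not matter whether chr 1 is read as one field or two. The function name summarize and the output layout are made up for illustration.

```shell
# Hypothetical helper: per-name mean of values below 30, plus the share of such lines.
# Assumes the name is the third-from-last field and the value is the last field.
summarize() {
    awk '
    {
        name = $(NF-2); value = $NF
        if (!(name in total)) order[++n] = name    # remember first-seen order
        total[name]++                              # every line counts toward the total
        if (value < 30) { low[name]++; sum[name] += value }
    }
    END {
        for (i = 1; i <= n; i++) {
            k = order[i]
            mean = low[k] ? sum[k] / low[k] : 0    # report 0 when no line is below 30
            printf "%s\t%.4f\t%.2f%% (%d of %d lines < 30)\n",
                   k, mean, 100 * low[k] / total[k], low[k], total[k]
        }
    }' "$@"
}

# Usage on the ten rows shown in the question (the .... rows left out).
summarize <<'EOF'
chr 1   955542  955763  +   agrn:exon.1 1   0
chr 1   955542  955763  +   agrn:exon.1 2   0
chr 1   955542  955763  +   agrn:exon.1 3   0
chr 1   955542  955763  +   agrn:exon.1 4   1
chr 1   955542  955763  +   agrn:exon.1 5   1
chr 1   955542  955763  +   agrn:exon.1 6   1
chr 1   955542  955763  +   agrn:exon.1 218 32
chr 1   955542  955763  +   agrn:exon.1 219 32
chr 1   955542  955763  +   agrn:exon.1 220 32
chr 1   955542  955763  +   agrn:exon.1 221 29
EOF
```

On those ten rows this prints agrn:exon.1, 4.5714 and 70.00% (7 of 10 lines < 30), tab-separated. On the full 221-row input the share would instead be computed against all 221 lines.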
