sorting - Spark rank in Scala based on the second and third elements of a tuple in an RDD


Hi, I want to assign a rank to each row based on the second and third elements of a tuple. Here is some sample data. Add "1" as a fourth element if the third element of the tuple has the maximum value for its id. If the third elements are equal, decide on the second element instead, i.e. the tuple with the maximum second element should get "1" as its fourth element. The fourth element of every other tuple should be 0. I hope the requirement is clear:

    (id,second,third)->tuple
    (32609,878,199)
    (32609,832,199)
    (45470,231,199)
    (42482,1001,299)
    (42482,16,291)

Code:

    val rank = matching.map { case (x1, x2, x3) => (x1, x2, x3, x3.toInt * 100000 + x2.toInt) }.sortBy(-_._4).groupBy(_._1)

Result of rank.take(10).foreach(println):

    (32609,CompactBuffer((32609,878,199,19900878), (32609,832,199,19900832)))
    (45470,CompactBuffer((45470,231,199,19900231)))
    (42482,CompactBuffer((42482,1001,299,29901001), (42482,16,291,29100016)))
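The fourth element in that output is a composite sort key: the third element is shifted left by five decimal digits and the second element is added. A minimal sketch of that arithmetic in plain Scala follows. The object and function names are hypothetical, and it assumes the second element is non-negative and below 100000 and that the third element stays small enough (at most 21474) to avoid Int overflow:

```scala
object CompositeKeySketch {
  // Hypothetical helper mirroring the expression in the attempt above:
  // third * 100000 + second. Assumes 0 <= second < 100000, otherwise the
  // digits of the two fields overlap and the ordering is no longer reliable.
  def compositeKey(second: Int, third: Int): Int = third * 100000 + second

  // Sample rows from the question, already parsed to Int for illustration.
  val rows: List[(Int, Int, Int)] =
    List((32609, 878, 199), (32609, 832, 199), (42482, 1001, 299), (42482, 16, 291))

  // Each row extended with its composite key as a fourth element.
  val keyed: List[(Int, Int, Int, Int)] =
    rows.map { case (id, second, third) => (id, second, third, compositeKey(second, third)) }
}
```

Sorting by this key in descending order puts the row with the largest third element first and breaks ties on the second element, but it does not by itself produce the 1/0 flag asked for.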

Desired output:

    (32609,878,199,1)
    (32609,832,199,0)
    (45470,231,199,1)
    (42482,1001,299,1)
    (42482,16,291,0)

It seems you can try the following:

    import scala.util.Try

    val rank = matching.flatMap { case (x: String, y: String, z: String) =>
      // Parse the second and third elements; drop rows that are not numeric.
      val yInt = Try(y.toInt)
      val zInt = Try(z.toInt)
      if (yInt.isSuccess && zInt.isSuccess) Option((x, (yInt.get, zInt.get)))
      else None
    }.groupByKey().flatMap { case (key: String, tuples: Iterable[(Int, Int)]) =>
      // Sort descending by the third element, then by the second element.
      val sorted = tuples.toList.sortBy(x => (-x._2, -x._1))
      val topRank = (key, sorted.head._1, sorted.head._2, 1)
      val restRank = for (tup <- sorted.tail) yield (key, tup._1, tup._2, 0)
      List(topRank) ++ restRank
    }

The initial flatMap performs type checking and reorders the tuples into (key, value) pairs. The second flatMap (after groupByKey) sorts each group's list by the 3rd and then the 2nd element, both in descending order, and recreates the tuples with the rank. Note that you need to import scala.util.Try to use this.
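To check the ranking logic without a Spark cluster, the same group-sort-flag steps can be sketched on a plain Scala collection. This is only an illustration under assumptions: the object and function names are hypothetical, the rows are assumed to be already parsed to Int, and the order of ids in the result is not guaranteed:

```scala
object RankSketch {
  // Hypothetical helper: for each id, flag the row with the largest third
  // element (ties broken by the largest second element) with 1, others with 0.
  def rank(rows: Seq[(String, Int, Int)]): Seq[(String, Int, Int, Int)] =
    rows.groupBy(_._1).toSeq.flatMap { case (id, group) =>
      // Sort descending by third element, then descending by second element.
      val sorted = group.sortBy { case (_, second, third) => (-third, -second) }
      // The first row of each sorted group gets rank 1, the rest get 0.
      sorted.zipWithIndex.map { case ((_, second, third), i) =>
        (id, second, third, if (i == 0) 1 else 0)
      }
    }
}
```

Calling RankSketch.rank on the five sample rows from the question yields the five tuples listed under the desired output, possibly with the ids in a different order.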

Edit: modified the ranking order per the comment below.

