Hadoop YARN clusters - Adding a node at runtime


I am working on a solution for adding resources to a Hadoop YARN cluster at run time. The purpose is to handle heavy peaks in our application's load.

I am not an expert on this, and I need help in order to confirm or contest what I understand.

Hadoop YARN

Our application runs in cluster mode. YARN provides the resource management (CPU & RAM). A Spark application, for example, asks for a job to be done; YARN handles the request and provides executors that do the computing on the YARN cluster.
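As a hypothetical sketch of that request (the jar name and resource sizes are placeholders, not from our setup), this is how a Spark application asks YARN for executors:

```shell
# Submit a Spark application to YARN in cluster mode.
# YARN allocates the requested executors (CPU & RAM) on its NodeManagers.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-cores 2 \
  --executor-memory 4g \
  my-streaming-app.jar   # placeholder application jar
```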

HDFS - data & executors

The data is not shared between executors, so it has to be stored in a file system. In our case: HDFS. That means I will have to run a copy of the Spark Streaming application on the new server (the new Hadoop node).

I am not sure of the following:

The YARN cluster and HDFS are different things, so writing to HDFS won't write to the new Hadoop node's local disk (because it is not an HDFS DataNode).

Since the Spark Streaming application only writes new data to HDFS, creating the new application instance should not be a problem.
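One way to check that assumption, using the standard Hadoop and YARN CLIs, is to list the two memberships separately and confirm the new server appears only in the second:

```shell
# DataNodes: the machines whose local disks actually store HDFS blocks.
hdfs dfsadmin -report

# NodeManagers: the machines offering CPU & RAM to YARN.
# A node can appear in this list without being an HDFS DataNode.
yarn node -list -all
```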

  1. Submit the job to YARN
    --- during a peak, more resources are needed
  2. Provision a new server
  3. Install / configure Hadoop & YARN on it, making it a slave

    • Modify hadoop/conf/slaves, adding its IP address (or a DNS name from the hosts file)
    • Modify dfs.include and mapred.include

      Then, on the master machine:

    • yarn rmadmin -refreshNodes
    • bin/hadoop dfsadmin -refreshNodes
    • bin/hadoop mradmin -refreshNodes
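The steps above can be sketched as follows. The hostname and file paths are placeholders and depend on the installation (on older Hadoop 1.x-style layouts the slaves file lives in hadoop/conf/); the include files only matter if dfs.hosts / mapred.hosts point at them:

```shell
# On the master: register the new slave in the worker and include files.
NEW_NODE="new-node.example.com"   # placeholder hostname for the new server

echo "$NEW_NODE" >> "$HADOOP_HOME/etc/hadoop/slaves"
echo "$NEW_NODE" >> /etc/hadoop/dfs.include      # only if dfs.hosts points here
echo "$NEW_NODE" >> /etc/hadoop/mapred.include   # only if mapred.hosts points here

# Tell the running daemons to re-read the host lists without a restart.
yarn rmadmin -refreshNodes
hdfs dfsadmin -refreshNodes
```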

Should this work? -refreshQueues does not sound useful here, since it seems to take care of the scheduler queues rather than the nodes.

I am also not sure whether an already running job can increase its capacity. Another idea is to wait for the new resources to be available and then submit a new job.
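The wait-then-submit idea could look like this hypothetical loop; EXPECTED_NODES and the jar name are placeholders for the actual setup:

```shell
# Poll YARN until the new NodeManager has registered, then submit the job.
EXPECTED_NODES=5   # placeholder: node count after scaling out
until [ "$(yarn node -list 2>/dev/null | grep -c RUNNING)" -ge "$EXPECTED_NODES" ]; do
  sleep 10
done
spark-submit --master yarn --deploy-mode cluster my-streaming-app.jar
```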

Thanks for your help

