Hadoop YARN clusters - Adding a node at runtime
I am working on a solution for adding resources to a Hadoop YARN cluster at runtime. The purpose is to handle heavy load peaks on our application.
I am not an expert, so I need help in order to confirm or contest what I think I understand.
Hadoop YARN
Our application runs in cluster mode. YARN provides the resource management (CPU & RAM). A Spark application, for example, asks for a job to be done; YARN handles the request and provides executors that do the computing on the YARN cluster.
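As I understand it, handing a job to YARN looks roughly like this (a sketch: the class name, jar path, and resource sizes below are placeholders, not our actual application):

```shell
# Hypothetical Spark job submission to a YARN cluster.
# YARN allocates the requested executors on whatever nodes have capacity.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-cores 2 \
  --executor-memory 4g \
  --class com.example.MyStreamingApp \
  /path/to/my-streaming-app.jar
```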
HDFS - data & executors
The data is not shared between executors; it has to be stored in a file system, in our case HDFS. That means we would have to run a copy of our Spark Streaming application on the new server (the new Hadoop node).
I am not sure about this:
- The YARN cluster and HDFS are distinct, so writing to HDFS won't write to the new Hadoop node's local disk (because it is not an HDFS node).
- Since the Spark Streaming application writes its new data to HDFS, creating a new application instance should not be a problem.
- Submit the job to YARN.
- On a peak, when resources are needed: instantiate a new server.
- Install and configure Hadoop & YARN on it, making it a slave:
- modify hadoop/conf/slaves, adding its IP address (or its DNS name in the hosts file)
- modify dfs.include and mapred.include
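For illustration, the file edits above could look roughly like this (the hostname and paths are placeholders; this assumes hdfs-site.xml already points `dfs.hosts` at the include file):

```shell
# Hypothetical hostname and paths; adjust to your Hadoop installation.
NEW_NODE="new-node.example.com"

# Register the new node as a slave/worker.
echo "$NEW_NODE" >> $HADOOP_HOME/conf/slaves

# Allow it to connect to the NameNode and JobTracker.
echo "$NEW_NODE" >> $HADOOP_HOME/conf/dfs.include
echo "$NEW_NODE" >> $HADOOP_HOME/conf/mapred.include
```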
On the master machine:
- yarn rmadmin -refreshNodes
- bin/hadoop dfsadmin -refreshNodes
- bin/hadoop mradmin -refreshNodes
Should that work? refreshQueues does not sound useful here, since it seems to deal with the scheduler queues.
I am not sure that an already-running job would increase its capacity. The idea is to wait until the new resources are available and then submit a new job.
Thanks for your help.
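The "wait, then submit" idea could be scripted roughly as follows (a sketch: the hostname, class name, and jar path are placeholders; `yarn node -list` lists the NodeManagers registered with the ResourceManager):

```shell
# Hypothetical hostname of the newly added node.
NEW_NODE="new-node.example.com"

# Poll until the new NodeManager has registered with the ResourceManager.
until yarn node -list 2>/dev/null | grep -q "$NEW_NODE"; do
  echo "Waiting for $NEW_NODE to join the cluster..."
  sleep 10
done

# The new resources are now visible to YARN; submit the additional job.
spark-submit --master yarn --deploy-mode cluster \
  --class com.example.MyStreamingApp /path/to/my-streaming-app.jar
```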