Editor's recommendation:
This article comes from cnblogs. It introduces the definition of Spark, its basic workflow, how jobs run in different cluster modes with demonstrations, and an analysis of the results.

1. Spark Runtime Architecture
1.1 Terminology
- Application: the concept of a Spark Application is similar to that in Hadoop MapReduce. It refers to a user-written Spark program, consisting of the code for a Driver and the Executor code that runs on multiple nodes across the cluster;
- Driver: the Driver in Spark runs the Application's main() function and creates the SparkContext. The SparkContext is created to prepare the runtime environment for the Spark application; it is responsible for communicating with the ClusterManager to request resources, assign tasks, and monitor execution. When the Executors have finished running, the Driver shuts the SparkContext down. The SparkContext is commonly used to stand for the Driver;
- Executor: a process belonging to an Application that runs on a Worker node. It runs Tasks and stores data in memory or on disk; each Application has its own independent set of Executors. In Spark on YARN mode the process is named CoarseGrainedExecutorBackend, similar to YarnChild in Hadoop MapReduce. A CoarseGrainedExecutorBackend process holds exactly one Executor object, which wraps each Task in a TaskRunner and takes an idle thread from a thread pool to run it. The number of Tasks a CoarseGrainedExecutorBackend can run in parallel therefore depends on the number of CPU cores allocated to it;
- Cluster Manager: the external service that acquires resources on the cluster. Currently there are:
  Standalone: Spark's native resource management, where the Master is responsible for allocating resources; Hadoop YARN: where the ResourceManager in YARN is responsible for allocating resources;
- Worker: any node in the cluster that can run Application code, similar to a NodeManager node in YARN. In Standalone mode this means the Worker nodes configured in the slaves file; in Spark on YARN mode it means the NodeManager nodes;
- Job: a parallel computation consisting of multiple Tasks, usually triggered by a Spark action. A Job contains multiple RDDs and the various operations applied to them;
- Stage: each Job is split into groups of Tasks; each group is called a Stage, also known as a TaskSet. A Job is divided into multiple Stages;
- Task: a unit of work sent to a particular Executor;
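The hierarchy these terms form (one Application holds Jobs, a Job holds Stages, a Stage holds Tasks) can be sketched with a few plain data classes. This is an illustrative model only, not Spark's actual classes:

```python
# Illustrative model of the Application > Job > Stage > Task nesting
# described above; the field names are made up, not Spark internals.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Task:
    task_id: int
    partition: int          # each Task processes one data partition

@dataclass
class Stage:                # a Stage is also called a TaskSet
    stage_id: int
    tasks: List[Task] = field(default_factory=list)

@dataclass
class Job:                  # triggered by an action such as count()/collect()
    job_id: int
    stages: List[Stage] = field(default_factory=list)

@dataclass
class Application:          # user program: Driver code + Executor code
    name: str
    jobs: List[Job] = field(default_factory=list)

app = Application("demo")
job = Job(0, stages=[Stage(0, [Task(0, 0), Task(1, 1)]), Stage(1, [Task(2, 0)])])
app.jobs.append(job)
print(sum(len(s.tasks) for j in app.jobs for s in j.stages))  # → 3
```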

1.2 Basic Spark Execution Flow
The basic Spark execution flow is shown in the diagram below:
1. Build the runtime environment for the Spark Application (start the SparkContext). The SparkContext registers with the resource manager (which can be Standalone, Mesos, or YARN) and requests resources to run Executors;
2. The resource manager allocates Executor resources and starts StandaloneExecutorBackend processes; each Executor reports its status to the resource manager with its heartbeats;
3. The SparkContext builds the DAG, decomposes it into Stages, and sends the TaskSets to the TaskScheduler. Executors request Tasks from the SparkContext, the TaskScheduler dispatches Tasks to the Executors, and at the same time the SparkContext ships the application code to the Executors.
4. Tasks run on the Executors, and all resources are released when they finish.

SparkÔËÐмܹ¹Ìص㣺
lÿ¸öApplication»ñȡרÊôµÄexecutor½ø³Ì£¬¸Ã½ø³ÌÔÚApplicationÆÚ¼äһֱפÁô£¬²¢ÒÔ¶àÏ̷߳½Ê½ÔËÐÐtasks¡£ÕâÖÖApplication¸ôÀë»úÖÆÓÐÆäÓÅÊÆµÄ£¬ÎÞÂÛÊÇ´Óµ÷¶È½Ç¶È¿´£¨Ã¿¸öDriverµ÷¶ÈËü×Ô¼ºµÄÈÎÎñ£©£¬»¹ÊÇ´ÓÔËÐнǶȿ´£¨À´×Ô²»Í¬ApplicationµÄTaskÔËÐÐÔÚ²»Í¬µÄJVMÖУ©¡£µ±È»£¬ÕâÒ²Òâζ×ÅSpark
Application²»ÄÜ¿çÓ¦ÓóÌÐò¹²ÏíÊý¾Ý£¬³ý·Ç½«Êý¾ÝдÈëµ½Íⲿ´æ´¢ÏµÍ³¡£
lSparkÓë×ÊÔ´¹ÜÀíÆ÷Î޹أ¬Ö»ÒªÄܹ»»ñÈ¡executor½ø³Ì£¬²¢Äܱ£³ÖÏ໥ͨОͿÉÒÔÁË¡£
lÌá½»SparkContextµÄClientÓ¦¸Ã¿¿½üWorker½Úµã£¨ÔËÐÐExecutorµÄ½Úµã)£¬×îºÃÊÇÔÚͬһ¸öRackÀÒòΪSpark
ApplicationÔËÐйý³ÌÖÐSparkContextºÍExecutorÖ®¼äÓдóÁ¿µÄÐÅÏ¢½»»»£»Èç¹ûÏëÔÚÔ¶³Ì¼¯ÈºÖÐÔËÐУ¬×îºÃʹÓÃRPC½«SparkContextÌá½»¸ø¼¯Èº£¬²»ÒªÔ¶ÀëWorkerÔËÐÐSparkContext¡£
lTask²ÉÓÃÁËÊý¾Ý±¾µØÐÔºÍÍÆ²âÖ´ÐеÄÓÅ»¯»úÖÆ¡£
1.2.1 DAGScheduler
The DAGScheduler turns a Spark job into a DAG (Directed Acyclic Graph) of Stages. Based on the relationships between the RDDs and Stages, it finds the cheapest scheduling plan and then submits each Stage to the TaskScheduler in the form of a TaskSet. The figure below shows the role of the DAGScheduler:
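The stage-splitting rule the DAGScheduler applies — pipeline narrow transformations together, cut the lineage where the data must be shuffled — can be sketched as follows. The function and the operation list are illustrative, not Spark internals:

```python
# Hedged sketch of the core DAGScheduler idea: walk the lineage of RDD
# operations in order and start a new stage whenever a shuffle (wide)
# dependency appears; narrow transformations stay pipelined in one stage.
def split_into_stages(ops):
    """ops: list of (name, is_shuffle) pairs in lineage order."""
    stages, current = [], []
    for name, is_shuffle in ops:
        if is_shuffle and current:
            stages.append(current)       # close the stage before the shuffle
            current = []
        current.append(name)
    if current:
        stages.append(current)
    return stages

lineage = [("textFile", False), ("map", False),
           ("groupByKey", True), ("mapValues", False)]
print(split_into_stages(lineage))
# → [['textFile', 'map'], ['groupByKey', 'mapValues']]
```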

1.2.2 TaskScheduler
The DAGScheduler decides the ideal locations for running Tasks and passes this information down to the TaskScheduler. The DAGScheduler also handles failures caused by lost shuffle data, which may require resubmitting previously run Stages (Task failures not caused by lost shuffle data are handled by the TaskScheduler).
The TaskScheduler maintains all TaskSets. When an Executor sends a heartbeat to the Driver, the TaskScheduler assigns Tasks to it according to its remaining resources. The TaskScheduler also tracks the state of all running Tasks and retries the ones that fail. The figure below shows the role of the TaskScheduler:
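The heartbeat-driven assignment described above can be sketched with a toy scheduler. The class and method names are invented for illustration, and real Spark additionally weighs data locality when choosing which task to hand out:

```python
# Toy model of the TaskScheduler behaviour described above: on each
# executor heartbeat, hand out pending tasks up to the executor's free
# CPU cores. Field names are illustrative, not Spark internals.
from collections import deque

class TinyTaskScheduler:
    def __init__(self, tasks):
        self.pending = deque(tasks)        # task ids waiting to run
        self.running = {}                  # task id -> executor id

    def on_heartbeat(self, executor_id, free_cores):
        """Called when an executor reports in; assigns up to free_cores tasks."""
        assigned = []
        while self.pending and len(assigned) < free_cores:
            task = self.pending.popleft()
            self.running[task] = executor_id
            assigned.append(task)
        return assigned

sched = TinyTaskScheduler(tasks=[0, 1, 2, 3, 4])
print(sched.on_heartbeat("executor-1", free_cores=2))  # → [0, 1]
print(sched.on_heartbeat("executor-2", free_cores=2))  # → [2, 3]
```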

The concrete task scheduler differs by run mode:
Spark on Standalone mode: TaskScheduler;
YARN-Client mode: YarnClientClusterScheduler;
YARN-Cluster mode: YarnClusterScheduler.
1.3 How RDDs Run
So how does an RDD actually run within the Spark architecture? At a high level, there are three steps:
1. Create the RDD objects.
2. The DAGScheduler steps in and computes the dependencies between the RDDs; these dependencies form the DAG.
3. Each Job is divided into multiple Stages. A key criterion for drawing Stage boundaries is whether the input of the current operator is deterministic: if it is, the operator is placed in the same Stage, avoiding the message-passing overhead between Stages.

Let us look at how an RDD runs through an example that groups names by their first letter (A-Z) and counts the number of distinct names under each letter.
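The logic of this example can be simulated in plain Python. The article's actual code is a Spark program (shown in the figure); this sketch, with invented sample names, only mirrors the map / groupBy / distinct-count steps:

```python
# Plain-Python rendering of the example's pipeline: group names by their
# first letter, then count the distinct names under each letter. The
# sample names are invented for illustration.
from collections import defaultdict

names = ["Ah", "Andy", "Andy", "Bob", "Beta", "Charlie"]

groups = defaultdict(set)           # like map + groupByKey over (initial, name)
for name in names:
    groups[name[0]].add(name)       # a set keeps only the distinct names

result = {initial: len(distinct) for initial, distinct in sorted(groups.items())}
print(result)  # → {'A': 2, 'B': 2, 'C': 1}
```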

Step 1: Create the RDDs. In the example above, apart from the final collect, which is an action and does not create an RDD, each of the four preceding transformations creates a new RDD. So the first step is to create all the RDDs (each with its five internal pieces of information).
Step 2: Create the execution plan. Spark pipelines operations as much as possible and splits the plan into stages based on whether the data needs to be reorganized. For example, the groupBy() transformation in this example splits the execution plan into two stages. The result is a DAG (directed acyclic graph) that serves as the logical execution plan.

Step 3: Schedule the tasks. Each stage is divided into tasks, and each task is a unit of data plus computation. Before the next stage can start, all tasks of the current stage must finish, because the first transformation of the next stage necessarily reorganizes the data, so all result data of the current stage must have been computed first.
Suppose hdfs://names in this example holds four file blocks. The HadoopRDD's partitions will then contain four partitions corresponding to the four blocks, and preferredLocations will point to the best locations for them. Four tasks can now be created and scheduled onto suitable cluster nodes.
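The block-to-task mapping in this step can be sketched like this (the hosts and block ids are illustrative):

```python
# Sketch of the step described above: each HDFS block becomes one
# partition, its replica hosts become the task's preferred locations,
# and one task is created per partition. Host names are made up.
block_locations = {                      # block id -> replica hosts
    0: ["hadoop1", "hadoop2"],
    1: ["hadoop2", "hadoop3"],
    2: ["hadoop1", "hadoop3"],
    3: ["hadoop3", "hadoop1"],
}

tasks = [{"task_id": block, "partition": block, "preferred_locations": hosts}
         for block, hosts in block_locations.items()]

print(len(tasks))                         # → 4 tasks, one per block
print(tasks[0]["preferred_locations"])    # → ['hadoop1', 'hadoop2']
```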

2. Spark Runtime Architecture on Different Clusters
Spark puts emphasis on building a good ecosystem: it not only supports a variety of external storage systems but also provides a range of cluster run modes. Deployed on a single machine, it can run in local mode or in pseudo-distributed mode; deployed on a distributed cluster, you can choose Standalone mode (Spark's built-in mode), YARN-Client mode, or YARN-Cluster mode according to your cluster's actual situation. Although Spark's run modes differ in how they start, where they run, and how they schedule, their goal is essentially the same: to run and manage Tasks safely and reliably in the right place, according to the user's configuration and the Job's needs.
2.1 Spark on Standalone Execution
Standalone mode is the resource-scheduling framework implemented by Spark itself; its main node roles are the Client node, the Master node, and the Worker nodes. The Driver can run either on the Master node or on the local Client. When a Spark Job is submitted from the spark-shell interactive tool, the Driver runs on the Master node; when a Job is submitted with spark-submit, or run from a development environment such as Eclipse or IDEA using "new SparkConf().setMaster("spark://master:7077")", the Driver runs on the local Client.
ÆäÔËÐйý³ÌÈçÏ£º
1.SparkContextÁ¬½Óµ½Master£¬ÏòMaster×¢²á²¢ÉêÇë×ÊÔ´£¨CPU Core ºÍMemory£©£»
2.Master¸ù¾ÝSparkContextµÄ×ÊÔ´ÉêÇëÒªÇóºÍWorkerÐÄÌøÖÜÆÚÄÚ±¨¸æµÄÐÅÏ¢¾ö¶¨ÔÚÄĸöWorkerÉÏ·ÖÅä×ÊÔ´£¬È»ºóÔÚ¸ÃWorkerÉÏ»ñÈ¡×ÊÔ´£¬È»ºóÆô¶¯StandaloneExecutorBackend£»
3.StandaloneExecutorBackendÏòSparkContext×¢²á£»
4.SparkContext½«Applicaiton´úÂë·¢Ë͸øStandaloneExecutorBackend£»²¢ÇÒSparkContext½âÎöApplicaiton´úÂ룬¹¹½¨DAGͼ£¬²¢Ìá½»¸øDAG
Scheduler·Ö½â³ÉStage£¨µ±Åöµ½Action²Ù×÷ʱ£¬¾Í»á´ßÉúJob£»Ã¿¸öJobÖк¬ÓÐ1¸ö»ò¶à¸öStage£¬StageÒ»°ãÔÚ»ñÈ¡ÍⲿÊý¾ÝºÍshuffle֮ǰ²úÉú£©£¬È»ºóÒÔStage£¨»òÕß³ÆÎªTaskSet£©Ìá½»¸øTask
Scheduler£¬Task Scheduler¸ºÔð½«Task·ÖÅäµ½ÏàÓ¦µÄWorker£¬×îºóÌá½»¸øStandaloneExecutorBackendÖ´ÐУ»
5.StandaloneExecutorBackend»á½¨Á¢ExecutorÏ̳߳أ¬¿ªÊ¼Ö´ÐÐTask£¬²¢ÏòSparkContext±¨¸æ£¬Ö±ÖÁTaskÍê³É¡£
6.ËùÓÐTaskÍê³Éºó£¬SparkContextÏòMaster×¢Ïú£¬ÊÍ·Å×ÊÔ´¡£

2.2 Spark on YARN Execution
YARN is a unified resource-management mechanism on which multiple computing frameworks can run. In today's big-data world, most companies use other computing frameworks alongside Spark, such as MapReduce or Storm, whether for historical reasons or for the performance of particular workloads. Spark developed the Spark on YARN run mode for this situation. Thanks to YARN's good elastic resource management, deploying an Application is more convenient, and the resources of the services and Applications a user runs in the YARN cluster are fully isolated from each other; of even greater practical value, YARN can use queues to manage multiple services running in the cluster at the same time.
Spark on YARN comes in two modes, depending on where the Driver sits in the cluster: YARN-Client mode and YARN-Cluster mode (also called YARN-Standalone mode).
2.2.1 YARN Framework Flow
Any framework that integrates with YARN must follow YARN's development model. Before analyzing the implementation details of Spark on YARN, it is worth reviewing some basics of the YARN framework.
The basic execution flow of the YARN framework is:

ÆäÖУ¬ResourceManager¸ºÔ𽫼¯ÈºµÄ×ÊÔ´·ÖÅ䏸¸÷¸öÓ¦ÓÃʹÓ㬶ø×ÊÔ´·ÖÅäºÍµ÷¶ÈµÄ»ù±¾µ¥Î»ÊÇContainer£¬ÆäÖзâ×°ÁË»úÆ÷×ÊÔ´£¬ÈçÄÚ´æ¡¢CPU¡¢´ÅÅ̺ÍÍøÂçµÈ£¬Ã¿¸öÈÎÎñ»á±»·ÖÅäÒ»¸öContainer£¬¸ÃÈÎÎñÖ»ÄÜÔÚ¸ÃContainerÖÐÖ´ÐУ¬²¢Ê¹ÓøÃContainer·â×°µÄ×ÊÔ´¡£NodeManagerÊÇÒ»¸ö¸öµÄ¼ÆËã½Úµã£¬Ö÷Òª¸ºÔðÆô¶¯ApplicationËùÐèµÄContainer£¬¼à¿Ø×ÊÔ´£¨ÄÚ´æ¡¢CPU¡¢´ÅÅ̺ÍÍøÂçµÈ£©µÄʹÓÃÇé¿ö²¢½«Ö®»ã±¨¸øResourceManager¡£ResourceManagerÓëNodeManagers¹²Í¬×é³ÉÕû¸öÊý¾Ý¼ÆËã¿ò¼Ü£¬ApplicationMasterÓë¾ßÌåµÄApplicationÏà¹Ø£¬Ö÷Òª¸ºÔðͬResourceManagerÐÉÌÒÔ»ñÈ¡ºÏÊʵÄContainer£¬²¢¸ú×ÙÕâЩContainerµÄ״̬ºÍ¼à¿ØÆä½ø¶È¡£
2.2.2 YARN-Client
In YARN-Client mode the Driver runs locally on the client, which allows the Spark Application to interact with the client. Because the Driver is on the client, its state can be viewed through the web UI, by default at http://hadoop1:4040, while YARN is accessed at http://hadoop1:8088.
The YARN-Client workflow consists of the following steps:

1. The Spark YARN Client asks YARN's ResourceManager to start an ApplicationMaster. At the same time, the SparkContext initialization creates the DAGScheduler, the TaskScheduler, and so on; since YARN-Client mode is selected, the program chooses YarnClientClusterScheduler and YarnClientSchedulerBackend;
2. On receiving the request, the ResourceManager selects a NodeManager in the cluster, allocates the first Container for the application, and asks the NodeManager to start the application's ApplicationMaster in that Container. Unlike YARN-Cluster, this ApplicationMaster does not run the SparkContext; it only liaises with the SparkContext for resource allocation;
3. Once the SparkContext in the Client has finished initializing, it establishes communication with the ApplicationMaster, registers with the ResourceManager, and requests resources (Containers) according to the task information;
4. Once the ApplicationMaster has obtained resources (that is, Containers), it communicates with the corresponding NodeManagers, asking them to start CoarseGrainedExecutorBackend processes in the obtained Containers. After starting, each CoarseGrainedExecutorBackend registers with the SparkContext in the Client and requests Tasks;
5. The SparkContext in the Client assigns Tasks to the CoarseGrainedExecutorBackends for execution; each CoarseGrainedExecutorBackend runs its Tasks and reports status and progress to the Driver, so the Client can keep track of each task's state and restart tasks that fail;
6. When the application has finished running, the Client's SparkContext asks the ResourceManager to deregister it and shuts itself down.
2.2.3 YARN-Cluster
In YARN-Cluster mode, after a user submits an application to YARN, YARN runs it in two phases: in the first phase, Spark's Driver is started in the YARN cluster as an ApplicationMaster; in the second phase, the ApplicationMaster creates the application, requests resources from the ResourceManager on its behalf, starts Executors to run the Tasks, and monitors the whole execution until it completes.
The YARN-Cluster workflow consists of the following steps:

1. The Spark YARN Client submits the application to YARN, including the ApplicationMaster program, the command to start it, the program to run in the Executors, and so on;
2. On receiving the request, the ResourceManager selects a NodeManager in the cluster, allocates the first Container for the application, and asks the NodeManager to start the application's ApplicationMaster in that Container; this ApplicationMaster performs the initialization of the SparkContext and related components;
3. The ApplicationMaster registers with the ResourceManager, so the user can check the application's running state directly through the ResourceManager; it then requests resources for the individual tasks by polling over an RPC protocol and monitors their state until execution finishes;
4. Once the ApplicationMaster has obtained resources (that is, Containers), it communicates with the corresponding NodeManagers, asking them to start CoarseGrainedExecutorBackend processes in the obtained Containers. After starting, each CoarseGrainedExecutorBackend registers with the SparkContext in the ApplicationMaster and requests Tasks. This is the same as in Standalone mode, except that when the SparkContext is initialized in the Spark Application, task scheduling uses CoarseGrainedSchedulerBackend together with YarnClusterScheduler, where YarnClusterScheduler is just a thin wrapper around TaskSchedulerImpl that adds logic such as waiting for Executors;
5. The SparkContext in the ApplicationMaster assigns Tasks to the CoarseGrainedExecutorBackends for execution; each CoarseGrainedExecutorBackend runs its Tasks and reports status and progress to the ApplicationMaster, so the ApplicationMaster can keep track of each task's state and restart tasks that fail;
6. When the application has finished running, the ApplicationMaster asks the ResourceManager to deregister it and shuts itself down.
2.2.4 Differences Between YARN-Client and YARN-Cluster
Before getting to the deeper difference between YARN-Client and YARN-Cluster, one concept needs to be clear: the ApplicationMaster. In YARN, every Application instance has an ApplicationMaster process, which is the first container started for the Application. It is responsible for dealing with the ResourceManager to request resources and, after obtaining them, telling the NodeManagers to start Containers for it. At a deeper level, the difference between YARN-Cluster and YARN-Client mode really comes down to the difference in the ApplicationMaster process.
In YARN-Cluster mode, the Driver runs inside the AM (ApplicationMaster), which requests resources from YARN and supervises the job's execution. Once the user has submitted the job, the Client can be shut down and the job keeps running on YARN, which makes YARN-Cluster mode unsuitable for interactive jobs;
In YARN-Client mode, the ApplicationMaster only requests Executors from YARN; the Client communicates with the requested Containers to schedule their work, which means the Client cannot go away.


3. Running Spark on Different Clusters: Demonstrations
The following demonstrations require the Hadoop and Spark clusters to be started, with Hadoop running both HDFS and YARN. See Part 3, "Spark Programming Model (Part 1): Concepts and Shell Experiments", for the startup procedure.
3.1 Standalone Execution Demonstration
On the Spark cluster nodes, 40% of the memory is used for computation and 60% for caching results. To get a direct feel for the speed difference between in-memory and non-in-memory data, this demonstration uses the 1 GB SogouQ3.txt data file (see section 3.2, "Uploading the test data file", in Part 3, "Spark Programming Model (Part 1): Concepts and Shell Experiments") and compares the two.
3.1.1 Locating the Test File
Use the HDFS commands below to see where the blocks of the SogouQ3.txt data file are stored:
$ cd /app/hadoop/hadoop-2.2.0/bin
$ hdfs fsck /sogou/SogouQ3.txt -files -blocks -locations
The output shows that the file is split into 9 blocks distributed across the cluster.

3.1.2 Starting Spark-Shell
Start Spark-Shell with the following command; in this demonstration each Executor is allocated 1 GB of memory:
$ cd /app/hadoop/spark-1.1.0/bin
$ ./spark-shell --master spark://hadoop1:7077 --executor-memory 1g
Checking the Executors page of Spark's monitoring UI shows 1 Driver and 3 Executors: hadoop2 and hadoop3 each start one Executor, while hadoop1 starts one Executor plus the Driver. In this mode the Driver runs the SparkContext, which means the DAGScheduler, TaskScheduler, and related components run on that node and handle the assignment and management of Stages and Tasks.

3.1.3 Execution and Result Analysis
Step 1: read the file and count the rows of the dataset, calling cache() during the computation to cache the dataset:
val sogou=sc.textFile("hdfs://hadoop1:9000/sogou/SogouQ3.txt")
sogou.cache()
sogou.count()
The monitoring page shows that the job is split into 8 tasks. One task reads from two data splits and each of the others reads from one split: 7 tasks fetch their data at NODE_LOCAL level and 1 task at ANY level.

The storage monitoring page shows that 3 partitions are cached, 907.1 MB in total, for a cache fraction of 38%.

The run returns a dataset size of 10 million rows and takes 352.17 seconds in total.

Step 2: read the file and count the rows again, this time using the cached data, and compare with the first run.
The monitoring page shows the job is again split into 8 tasks: 3 tasks read their data from memory (PROCESS_LOCAL), 3 from the local node (NODE_LOCAL), and the other 2 from anywhere (ANY). Ordered by time taken, ANY > NODE_LOCAL > PROCESS_LOCAL; the comparison shows that data in memory is at least two orders of magnitude faster than data read from the local node or from anywhere.

The whole job runs in 34.14 seconds, an order of magnitude faster than without caching. Since only part of the data was cached in this example (a cache fraction of 38%), fully caching it would improve the speed further. This shows that Spark is very memory-hungry, but also fast and sharp!
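The cached-versus-uncached difference measured above can be mimicked in miniature: an artificial read cost stands in for HDFS I/O and a memo dict stands in for Spark's block cache (all numbers are invented):

```python
# Toy illustration of the effect measured above: the first count() pays
# the full read cost, while a cached second count() only touches memory.
import time

_cache = {}  # stands in for Spark's block cache

def count_lines(dataset_id, simulated_read_cost=0.2):
    """First call pays a fake I/O cost; later calls are served from memory."""
    if dataset_id in _cache:              # like a PROCESS_LOCAL task
        return _cache[dataset_id]
    time.sleep(simulated_read_cost)       # like reading the file from HDFS
    result = 10_000_000                   # pretend the count is 10 million rows
    _cache[dataset_id] = result
    return result

t0 = time.perf_counter(); count_lines("sogou"); cold = time.perf_counter() - t0
t0 = time.perf_counter(); count_lines("sogou"); warm = time.perf_counter() - t0
print(cold > warm)  # → True: the cached second count is far faster
```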

3.2 YARN-Client Execution Demonstration
3.2.1 Starting Spark-Shell
Start Spark-Shell with the following command; the demonstration allocates 3 Executors with 1 GB of memory each:
$ cd /app/hadoop/spark-1.1.0/bin
$ ./spark-shell --master yarn-client --num-executors 3 --executor-memory 1g
Step 1: the relevant runtime JARs are uploaded to HDFS.

The HDFS browse UI shows these files under /user/hadoop/.sparkStaging/<application ID>:

Step 2: start the ApplicationMaster and register the Executors.
The application asks the ResourceManager to start the ApplicationMaster. Once that is done, Containers are allocated and the information is fed back to the SparkContext; the SparkContext communicates with the relevant NodeManagers and starts Executors in the obtained Containers. The figure below shows Executors started on hadoop1, hadoop2, and hadoop3.

Step 3: check the startup result.
As described earlier, in YARN-Client mode the Driver runs locally on the client, so the Spark Application can interact with the client, and the Driver's state can be viewed through the web UI, by default at http://hadoop1:4040, while YARN is accessed at http://hadoop1:8088.


3.2.2 Execution and Result Analysis
Step 1: read the file and count the rows of the dataset, calling cache() during the computation to cache the dataset:
val sogou=sc.textFile("hdfs://hadoop1:9000/sogou/SogouQ3.txt")
sogou.cache()
sogou.count()
The monitoring page shows the job is split into 8 tasks. One task reads from two data splits and each of the others reads from one split: 7 tasks fetch their data at NODE_LOCAL level and 1 task at RACK_LOCAL (rack-local) level.

The run logs show that when all tasks finish, the YarnClientClusterScheduler notifies the YARN cluster that the job is complete, the resources are reclaimed, and the SparkContext is finally closed. The whole process takes 108.6 seconds.

Step 2: check the data caching situation.
The monitoring page shows that, as in Standalone mode, 38% of the data is cached in memory.

Step 3: read the file and count the rows again, this time using the cached data, and compare with the first run:
sogou.count()
The monitoring page shows the job is again split into 8 tasks: 3 tasks read their data from memory (PROCESS_LOCAL), 4 from the local node (NODE_LOCAL), and 1 from the same rack (RACK_LOCAL). The in-memory tasks run fastest, at least an order of magnitude faster than those reading from the local node.

The YarnClientClusterScheduler takes the place of the Standalone-mode TaskScheduler for task management; after the tasks finish it notifies the YARN cluster to reclaim the resources and finally closes the SparkContext. With the data partially cached, the run takes 29.77 seconds, a considerable speedup over the uncached run.

3.3 YARN-Cluster Execution Demonstration
3.3.1 Running the Program
Submit the application with spark-submit as follows; the demonstration allocates 3 Executors with 512 MB of memory each:
$ cd /app/hadoop/spark-1.1.0
$ ./bin/spark-submit --master yarn-cluster --class class3.SogouResult --executor-memory 512m LearnSpark.jar hdfs://hadoop1:9000/sogou/SogouQ3.txt hdfs://hadoop1:9000/class3/output2
Step 1: the relevant resources are uploaded to HDFS; compared with YARN-Client there is an additional LearnSpark.jar file.

ÕâЩÎļþ¿ÉÒÔÔÚHDFSÖÐÕÒµ½£¬¾ßÌå·¾¶Îª http://hadoop1:9000
/user/hadoop/.sparkStaging/Ó¦ÓñàºÅ £º

Step 2: the YARN cluster takes over execution.
First, the ResourceManager in the YARN cluster allocates a Container to start the SparkContext and assigns the nodes to run on. The SparkContext communicates with the NodeManagers to obtain Containers and start Executors; the SparkContext's YarnClusterScheduler then distributes and monitors the tasks; finally, when the tasks are finished, the YarnClusterScheduler notifies the ResourceManager to reclaim the resources.

3.3.2 Execution Result
In YARN-Cluster mode the command line is only responsible for submitting the application; the SparkContext and the job both run inside the YARN cluster. The execution can be followed at http://hadoop1:8088, and the result is written to HDFS, as shown below:

4. Troubleshooting
4.1 YARN-Client Startup Error
When launching Spark-Shell in YARN-Client mode on the 64-bit virtual machines used for the Hadoop 2.X 64-bit build, the following error appeared:
[hadoop@hadoop1 spark-1.1.0]$ bin/spark-shell --master YARN-client --executor-memory 1g --num-executors 3
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Exception in thread "main" java.lang.Exception: When running with master 'YARN-client' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
at org.apache.spark.deploy.SparkSubmitArguments.checkRequiredArguments(SparkSubmitArguments.scala:182)
at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:62)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:70)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
As the exception message says, either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment before Spark can run against YARN.
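A likely fix, assuming the Hadoop layout used elsewhere in this article (adjust the path to your own installation):

```shell
# Point Spark at the Hadoop client configuration before launching in
# YARN mode. The path below assumes the /app/hadoop/hadoop-2.2.0 layout
# used in the commands earlier in this article.
export HADOOP_CONF_DIR=/app/hadoop/hadoop-2.2.0/etc/hadoop
```

With this set (for example in the shell profile or in spark-env.sh), the spark-shell command above should get past the check.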
