SparkÔËÐмܹ¹
 
 2017-12-27  
 
±à¼­ÍƼö:

±¾ÎÄÀ´×ÔÓÚcnblogs£¬Ö÷Òª½éÉÜÁËSparkµÄ¶¨Ò壬¼òµ¥µÄÁ÷³Ì£¬ÔËÐÐÁ÷³Ì¼°Õ¹Ê¾£¬¶ÔÓÚ½á¹ûµÄ·ÖÎöµÈµÈ¡£

1. Spark Runtime Architecture

1.1 Terminology

Application: The concept of a Spark Application is similar to that in Hadoop MapReduce. It refers to a user-written Spark program, consisting of the code for a Driver and the Executor code that runs on multiple nodes across the cluster.

Driver: The Driver in Spark runs the Application's main() function and creates the SparkContext; the SparkContext is created to prepare the runtime environment for the Spark application. In Spark, the SparkContext is responsible for communicating with the ClusterManager, requesting resources, and assigning and monitoring tasks; once the Executor side has finished running, the Driver is responsible for closing the SparkContext. The SparkContext is usually used to stand for the Driver.

Executor: A process belonging to an Application that runs on a Worker node. It runs Tasks and is responsible for storing data in memory or on disk; each Application has its own independent set of Executors. In Spark on YARN mode, the process is named CoarseGrainedExecutorBackend, analogous to YarnChild in Hadoop MapReduce. A CoarseGrainedExecutorBackend process has exactly one Executor object, which wraps each Task in a TaskRunner and takes an idle thread from a thread pool to run it. The number of Tasks each CoarseGrainedExecutorBackend can run in parallel depends on the number of CPU cores allocated to it.

Cluster Manager: The external service that acquires resources on the cluster. Currently there are:

Standalone: Spark's native resource management, in which the Master allocates resources;

Hadoop YARN: the ResourceManager in YARN allocates resources;

Worker: Any node in the cluster that can run Application code, similar to a NodeManager node in YARN. In Standalone mode it means a Worker node configured through the slaves file; in Spark on YARN mode it means a NodeManager node;

Job: A parallel computation made up of multiple Tasks, typically spawned by a Spark Action; a Job contains multiple RDDs and the various operations applied to them;

Stage: Each Job is split into multiple groups of Tasks; each group of tasks is called a Stage, also known as a TaskSet. A job is divided into multiple stages;

Task: A unit of work sent to a particular Executor.
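To make the relationship between these terms concrete, here is a minimal sketch (the application name and data are made up) in which a single action triggers one Job, a shuffle splits that Job into two Stages, and each Stage runs one Task per partition:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative sketch only: the app name and data are made up.
val sc = new SparkContext(
  new SparkConf().setAppName("terms-demo").setMaster("local[2]"))

val words  = sc.parallelize(Seq("a", "b", "a", "c"), 2) // RDD with 2 partitions
val counts = words.map(w => (w, 1)).reduceByKey(_ + _)  // reduceByKey introduces a shuffle

// count() is an Action, so it spawns one Job; the shuffle divides that Job
// into two Stages, and each Stage consists of one Task per partition.
counts.count()
sc.stop()
```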

1.2 Basic Spark Execution Flow

The basic execution flow of Spark is shown in the schematic below:

1. Build the Spark Application's runtime environment (start the SparkContext). The SparkContext registers with the resource manager (which can be Standalone, Mesos, or YARN) and requests Executor resources;

2. The resource manager allocates Executor resources and starts StandaloneExecutorBackend; the Executors report their running status to the resource manager through heartbeats;

3. The SparkContext builds the DAG, decomposes the DAG into Stages, and sends the TaskSets to the Task Scheduler. Executors request Tasks from the SparkContext, the Task Scheduler dispatches Tasks to the Executors, and at the same time the SparkContext ships the application code to the Executors.

4. Tasks run on the Executors; when they finish, all resources are released.

Characteristics of the Spark runtime architecture:

Each Application gets its own dedicated Executor processes, which stay resident for the lifetime of the Application and run Tasks in a multithreaded fashion. This per-Application isolation is advantageous both from a scheduling perspective (each Driver schedules its own tasks) and from an execution perspective (Tasks from different Applications run in different JVMs). Of course, it also means data cannot be shared across Spark Applications except by writing it to an external storage system.

Spark is agnostic to the resource manager: as long as it can obtain Executor processes and they can keep communicating with each other, that is sufficient.

The Client that submits the SparkContext should be close to the Worker nodes (the nodes running the Executors), preferably in the same rack, because there is a great deal of communication between the SparkContext and the Executors while a Spark Application runs. If you want to run against a remote cluster, it is better to use RPC to submit the SparkContext to the cluster rather than run the SparkContext far away from the Workers.

Tasks are optimized through data locality and speculative execution.
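Both mechanisms can be influenced through configuration. As a sketch (the keys below are standard Spark configuration properties, but the values are purely illustrative, not recommendations):

```scala
import org.apache.spark.SparkConf

// Illustrative values only.
val conf = new SparkConf()
  .set("spark.locality.wait", "3000")         // ms to wait for a data-local slot before relaxing locality
  .set("spark.speculation", "true")           // re-launch suspiciously slow tasks on other Executors
  .set("spark.speculation.multiplier", "1.5") // a task this many times slower than the median counts as slow
```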

1.2.1 DAGScheduler

The DAGScheduler turns a Spark job into a DAG of Stages (Directed Acyclic Graph), finds the lowest-cost schedule based on the relationships between RDDs and Stages, and then submits each Stage to the TaskScheduler in the form of a TaskSet. The figure below illustrates the role of the DAGScheduler:

1.2.2 TaskScheduler

The DAGScheduler decides the ideal locations for running each Task and passes this information down to the TaskScheduler. In addition, the DAGScheduler handles failures caused by lost shuffle output, which may require resubmitting previously completed Stages (Task failures not caused by lost shuffle data are handled by the TaskScheduler).

The TaskScheduler maintains all TaskSets. When an Executor heartbeats to the Driver, the TaskScheduler assigns Tasks according to the Executor's remaining resources. The TaskScheduler also tracks the running state of every Task and retries failed Tasks. The figure below illustrates the role of the TaskScheduler:

The concrete task scheduler in each deployment mode is:

Spark on Standalone mode: TaskScheduler;

YARN-Client mode: YarnClientClusterScheduler;

YARN-Cluster mode: YarnClusterScheduler.

1.3 How RDDs Run

So how does an RDD actually run within the Spark architecture? At a high level, it comes down to three steps:

1. Create the RDD objects.

2. The DAGScheduler steps in and computes the dependencies between the RDDs; these dependencies form the DAG.

3. Each Job is divided into multiple Stages. One key criterion for dividing Stages is whether the input of the current operator is deterministic: if it is, the operator is placed in the same Stage, avoiding the message-passing overhead between Stages.

Let's see how an RDD runs through an example that groups names by their first letter (A-Z) and counts the number of distinct names under each first letter.
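In the original, the example's code appears only as a figure; a plausible reconstruction, with an illustrative input path, is:

```scala
// Sketch of the A-Z example; hdfs://names is assumed to hold one name per line.
val names  = sc.textFile("hdfs://names")               // transformation 1: load the name list
val pairs  = names.map(name => (name.charAt(0), name)) // transformation 2: key each name by its first letter
val groups = pairs.groupByKey()                        // transformation 3: group by first letter (the shuffle)
val counts = groups.mapValues(ns => ns.toSet.size)     // transformation 4: count distinct names per letter
counts.collect()                                       // the action that triggers the job
```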

Step 1: Create the RDDs. In the example above, apart from the final collect, which is an action and does not create an RDD, the first four transformations each create a new RDD. So the first step is to create all the RDDs (with their five internal pieces of information).

Step 2: Create the execution plan. Spark pipelines computations as much as possible and divides stages based on whether the data needs to be reorganized; for example, the groupBy() transformation in this example splits the whole execution plan into two stages. The final product is a DAG (directed acyclic graph) serving as the logical execution plan.

Step 3: Schedule tasks. Each stage is divided into tasks, and each task is a unit of data plus computation. All tasks of the current stage must complete before the next stage starts, because the first transformation of the next stage necessarily reorganizes data, so all result data of the current stage must be computed before work can continue.

Suppose hdfs://names in this example consists of four file blocks. Then the HadoopRDD's partitions will have four partitions corresponding to those four blocks, and preferredLocations will indicate the preferred locations of the four blocks. Four tasks can then be created and scheduled onto suitable cluster nodes.

2. Spark Runtime Architecture on Different Clusters

Spark puts a strong emphasis on building a good ecosystem: it not only supports a variety of external storage systems but also provides a wide range of cluster deployment modes. Deployed on a single machine, it can run in local mode or in pseudo-distributed mode; deployed as a distributed cluster, you can choose Standalone mode (Spark's built-in mode), YARN-Client mode, or YARN-Cluster mode according to your cluster's situation. Although Spark's deployment modes differ in how they start, where components run, and how they schedule, their goal is essentially the same: to run and manage Tasks safely and reliably, in the right place, according to the user's configuration and the Job's needs.

2.1 Spark on Standalone Execution Flow

Standalone mode is the resource scheduling framework implemented by Spark itself; its main nodes are the Client node, the Master node, and the Worker nodes. The Driver can run either on the Master node or on the local Client. When a Spark Job is submitted with the spark-shell interactive tool, the Driver runs on the Master node; when a Job is submitted with the spark-submit tool, or a Spark job is run from a development environment such as Eclipse or IDEA using "new SparkConf().setMaster("spark://master:7077")", the Driver runs on the local Client.
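A minimal sketch of such an IDE-launched application (the master URL, app name, and workload are assumptions) might look like:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: running against a standalone Master from an IDE keeps the
// Driver on the local Client.
object StandaloneDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("StandaloneDemo")
      .setMaster("spark://master:7077") // Master URL is an assumption
    val sc = new SparkContext(conf)
    val evens = sc.parallelize(1 to 100).filter(_ % 2 == 0).count()
    println(s"even numbers: $evens")
    sc.stop()
  }
}
```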

ÆäÔËÐйý³ÌÈçÏ£º

1.SparkContextÁ¬½Óµ½Master£¬ÏòMaster×¢²á²¢ÉêÇë×ÊÔ´£¨CPU Core ºÍMemory£©£»

2.Master¸ù¾ÝSparkContextµÄ×ÊÔ´ÉêÇëÒªÇóºÍWorkerÐÄÌøÖÜÆÚÄÚ±¨¸æµÄÐÅÏ¢¾ö¶¨ÔÚÄĸöWorkerÉÏ·ÖÅä×ÊÔ´£¬È»ºóÔÚ¸ÃWorkerÉÏ»ñÈ¡×ÊÔ´£¬È»ºóÆô¶¯StandaloneExecutorBackend£»

3.StandaloneExecutorBackendÏòSparkContext×¢²á£»

4.SparkContext½«Applicaiton´úÂë·¢Ë͸øStandaloneExecutorBackend£»²¢ÇÒSparkContext½âÎöApplicaiton´úÂ룬¹¹½¨DAGͼ£¬²¢Ìá½»¸øDAG Scheduler·Ö½â³ÉStage£¨µ±Åöµ½Action²Ù×÷ʱ£¬¾Í»á´ßÉúJob£»Ã¿¸öJobÖк¬ÓÐ1¸ö»ò¶à¸öStage£¬StageÒ»°ãÔÚ»ñÈ¡ÍⲿÊý¾ÝºÍshuffle֮ǰ²úÉú£©£¬È»ºóÒÔStage£¨»òÕß³ÆÎªTaskSet£©Ìá½»¸øTask Scheduler£¬Task Scheduler¸ºÔð½«Task·ÖÅäµ½ÏàÓ¦µÄWorker£¬×îºóÌá½»¸øStandaloneExecutorBackendÖ´ÐУ»

5.StandaloneExecutorBackend»á½¨Á¢ExecutorÏ̳߳أ¬¿ªÊ¼Ö´ÐÐTask£¬²¢ÏòSparkContext±¨¸æ£¬Ö±ÖÁTaskÍê³É¡£

6.ËùÓÐTaskÍê³Éºó£¬SparkContextÏòMaster×¢Ïú£¬ÊÍ·Å×ÊÔ´¡£

2.2 Spark on YARN Execution Flow

YARN is a unified resource management mechanism on which multiple computing frameworks can run. In today's big-data landscape, most companies use other computing frameworks besides Spark, such as MapReduce or Storm, for historical reasons or because of the performance characteristics of specific workloads. Spark developed the Spark on YARN mode for exactly this situation. Thanks to YARN's elastic resource management, not only is deploying an Application more convenient, but the resources of Spark Applications are also fully isolated from the other services running in the YARN cluster; even more valuable in practice, YARN can use queues to manage multiple services running in the cluster at the same time.

Spark on YARN comes in two modes, depending on where the Driver is located: YARN-Client mode and YARN-Cluster mode (also called YARN-Standalone mode).

2.2.1 YARN Framework Flow

Any framework that integrates with YARN must follow YARN's development model. Before analyzing the implementation details of Spark on YARN, it is worth reviewing some basic principles of the YARN framework.

The basic execution flow of the YARN framework is shown below:

The ResourceManager is responsible for allocating the cluster's resources to the applications. The basic unit of resource allocation and scheduling is the Container, which encapsulates machine resources such as memory, CPU, disk, and network. Each task is assigned a Container and can only run inside it, using the resources that Container encapsulates. The NodeManagers are the individual compute nodes; they are mainly responsible for starting the Containers an Application needs, monitoring resource usage (memory, CPU, disk, network, and so on), and reporting it to the ResourceManager. The ResourceManager and the NodeManagers together form the data computing framework. The ApplicationMaster is specific to each Application; it is mainly responsible for negotiating with the ResourceManager to obtain suitable Containers, tracking the state of those Containers, and monitoring their progress.

2.2.2 YARN-Client

In YARN-Client mode, the Driver runs locally on the client, which allows the Spark Application to interact with the client. Because the Driver is on the client, its state can be viewed through the web UI, by default at http://hadoop1:4040, while YARN's UI is at http://hadoop1:8088.

The YARN-Client workflow consists of the following steps:

1. The Spark YARN Client asks YARN's ResourceManager to start an ApplicationMaster. At the same time, the DAGScheduler, TaskScheduler, and so on are created during SparkContext initialization; since YARN-Client mode was chosen, the program selects YarnClientClusterScheduler and YarnClientSchedulerBackend;

2. Upon receiving the request, the ResourceManager selects a NodeManager in the cluster, allocates the first Container for the application, and asks that NodeManager to start the application's ApplicationMaster in the Container. The difference from YARN-Cluster is that this ApplicationMaster does not run the SparkContext; it only liaises with the SparkContext to distribute resources;

3. Once the SparkContext in the Client has finished initializing, it establishes communication with the ApplicationMaster, registers with the ResourceManager, and requests resources (Containers) according to the task requirements;

4. Once the ApplicationMaster has obtained the resources (i.e., Containers), it communicates with the corresponding NodeManagers, asking them to start CoarseGrainedExecutorBackend in the obtained Containers. After starting, each CoarseGrainedExecutorBackend registers with the SparkContext in the Client and requests Tasks;

5. The SparkContext in the Client assigns Tasks to the CoarseGrainedExecutorBackends for execution; they run the Tasks and report status and progress to the Driver, so the Client can track each task's state at any time and restart a task if it fails;

6. After the application finishes, the Client's SparkContext asks the ResourceManager to deregister it and shuts itself down.

2.2.3 YARN-Cluster

In YARN-Cluster mode, when a user submits an application to YARN, YARN runs it in two phases: in the first phase, the Spark Driver is started in the YARN cluster as an ApplicationMaster; in the second phase, the ApplicationMaster creates the application, requests resources from the ResourceManager on its behalf, starts Executors to run the Tasks, and monitors the whole run until it completes.

The YARN-Cluster workflow consists of the following steps:

1. The Spark YARN Client submits the application to YARN, including the ApplicationMaster program, the command to start the ApplicationMaster, the program to run in the Executors, and so on;

2. Upon receiving the request, the ResourceManager selects a NodeManager in the cluster, allocates the first Container for the application, and asks that NodeManager to start the application's ApplicationMaster in the Container; this ApplicationMaster performs the SparkContext initialization, among other things;

3. The ApplicationMaster registers with the ResourceManager, so the user can view the application's running state directly through the ResourceManager; it then requests resources for the tasks by polling over RPC and monitors their state until the run ends;

4. Once the ApplicationMaster has obtained the resources (i.e., Containers), it communicates with the corresponding NodeManagers, asking them to start CoarseGrainedExecutorBackend in the obtained Containers. After starting, each CoarseGrainedExecutorBackend registers with the SparkContext in the ApplicationMaster and requests Tasks. This is the same as in Standalone mode, except that when the SparkContext is initialized in the Spark Application, CoarseGrainedSchedulerBackend works together with YarnClusterScheduler to schedule tasks; YarnClusterScheduler is just a thin wrapper around TaskSchedulerImpl that adds logic such as waiting for Executors;

5. The SparkContext in the ApplicationMaster assigns Tasks to the CoarseGrainedExecutorBackends for execution; they run the Tasks and report status and progress to the ApplicationMaster, so it can track each task's state at any time and restart a task if it fails;

6. After the application finishes, the ApplicationMaster asks the ResourceManager to deregister it and shuts itself down.

2.2.4 Differences Between YARN-Client and YARN-Cluster

Before digging into the deeper difference between YARN-Client and YARN-Cluster, one concept needs to be clear: the ApplicationMaster. In YARN, every Application instance has an ApplicationMaster process, which is the first container started for the Application. It is responsible for dealing with the ResourceManager to request resources and, once it has them, for telling the NodeManagers to start Containers for it. At a deeper level, the difference between YARN-Cluster and YARN-Client modes is really a difference in the ApplicationMaster process.

In YARN-Cluster mode, the Driver runs inside the AM (ApplicationMaster), which requests resources from YARN and supervises the job's execution. Once the user has submitted the job, the Client can be shut down and the job keeps running on YARN, so YARN-Cluster mode is not suitable for interactive jobs;

In YARN-Client mode, the ApplicationMaster merely requests Executors from YARN; the Client communicates with the requested Containers to schedule their work, which means the Client cannot go away.

3. Spark Execution Demos on Different Clusters

The following demos require the Hadoop and Spark clusters to be started; for Hadoop, HDFS and YARN must be started. The startup procedure is described in Section 3, "Spark Programming Model (Part 1): Concepts and Shell Experiments".

3.1 Standalone Execution Demo

On the Spark cluster nodes, 40% of the memory is used for computation and 60% of the memory for caching results. To get an intuitive feel for the speed difference between cached and uncached data, this demo uses the 1 GB SogouQ3.txt data file (see Section 3.2 of "Spark Programming Model (Part 1): Concepts and Shell Experiments" for uploading the test data file) and compares the two runs.

3.1.1 Check Where the Test File Is Stored

Use HDFS commands to see which nodes the SogouQ3.txt data is stored on:

$cd /app/hadoop/hadoop-2.2.0/bin

$hdfs fsck /sogou/SogouQ3.txt -files -blocks -locations

The output shows that the file is split into 9 blocks spread across the cluster.

3.1.2 Start Spark-Shell

Start Spark-Shell with the following command; in this demo each Executor is allocated 1 GB of memory:

$cd /app/hadoop/spark-1.1.0/bin

$./spark-shell --master spark://hadoop1:7077 --executor-memory 1g

Spark's monitoring UI shows the Executors: there is 1 Driver and 3 Executors, with hadoop2 and hadoop3 each starting one Executor, while hadoop1 starts one Executor plus the Driver. In this mode, the SparkContext runs in the Driver, i.e., processes such as the DAGScheduler and TaskScheduler run on that node, allocating and managing Stages and Tasks.

3.1.3 Execution and Result Analysis

Step 1: Read the file and count the rows of the dataset, caching the dataset with the cache() method along the way:

val sogou=sc.textFile("hdfs://hadoop1:9000/sogou/SogouQ3.txt")

sogou.cache()

sogou.count()

The monitoring page shows that the job is divided into 8 tasks, one of which reads from two data splits while each of the others corresponds to one split; 7 tasks fetch data with locality NODE_LOCAL, and 1 task fetches data from any location (ANY).

In the storage monitoring page, we can see that 3 partitions are cached, with a total size of 907.1 MB and a cache ratio of 38%.

The run shows the dataset contains 10 million rows and took 352.17 seconds in total.

Step 2: Read the file and count the rows again, this time using the cached data, and compare the two runs:

The monitoring page shows that the job is again divided into 8 tasks: 3 tasks read data from memory (PROCESS_LOCAL), 3 from the local node (NODE_LOCAL), and the other 2 from any location (ANY). Ordering the tasks by elapsed time gives ANY > NODE_LOCAL > PROCESS_LOCAL; the comparison shows that data in memory is at least about two orders of magnitude faster than data read from the local node or an arbitrary location.

The whole job ran in 34.14 seconds, an order of magnitude faster than without caching. Since only part of the data was cached in this example (a 38% cache ratio), the speed could improve further with full caching. This shows how memory-hungry Spark is, but also how fast and sharp it is!

3.2 YARN-Client Execution Demo

3.2.1 Start Spark-Shell

Start Spark-Shell with the following command; in this demo 3 Executors are allocated, each with 1 GB of memory:

$cd /app/hadoop/spark-1.1.0/bin

$./spark-shell --master yarn-client --num-executors 3 --executor-memory 1g

Step 1: Upload the relevant runtime JARs to HDFS.

The HDFS browse UI shows these files under /user/hadoop/.sparkStaging/<application id>:

Step 2: Start the ApplicationMaster and register the Executors.

The application asks the ResourceManager to start the ApplicationMaster; once started, it allocates Containers and feeds the information back to the SparkContext, which communicates with the relevant NodeManagers and starts Executors on the obtained Containers. The figure below shows Executors started on hadoop1, hadoop2, and hadoop3.

Step 3: Check the startup result.

In YARN-Client mode the Driver runs locally on the client, allowing the Spark Application to interact with the client; because the Driver is on the client, its state can be viewed through the web UI, by default at http://hadoop1:4040, while YARN's UI is at http://hadoop1:8088.

3.2.2 Execution and Result Analysis

Step 1: Read the file and count the rows of the dataset, caching the dataset with the cache() method along the way:

val sogou=sc.textFile("hdfs://hadoop1:9000/sogou/SogouQ3.txt")

sogou.cache()

sogou.count()

The monitoring page shows the job is divided into 8 tasks, one of which reads from two data splits while each of the others corresponds to one split; 7 tasks fetch data with locality NODE_LOCAL, and 1 task fetches data at rack level (RACK_LOCAL).

The run log shows that when all tasks have finished, YarnClientClusterScheduler notifies the YARN cluster that the job is complete, the resources are reclaimed, and finally the SparkContext is closed; the whole process took 108.6 seconds.

Step 2: Check the data caching status.

The monitoring page shows that, just as in Standalone mode, 38% of the data is cached in memory.

Step 3: Read the file and count the rows again, this time using the cached data, and compare the two runs:

sogou.count()

The monitoring page shows the job is again divided into 8 tasks: 3 tasks read data from memory (PROCESS_LOCAL), 4 from the local node (NODE_LOCAL), and 1 at rack level (RACK_LOCAL). The in-memory tasks run fastest, at least an order of magnitude faster than the node-local ones.

YarnClientClusterScheduler takes over task management from Standalone mode's TaskScheduler; after the tasks finish, it notifies the YARN cluster to reclaim the resources and finally closes the SparkContext. The partially cached run took 29.77 seconds, a substantial speedup over the uncached run.

3.3 YARN-Cluster Execution Demo

3.3.1 Run the Program

Submit the application with spark-submit as follows; in this demo 3 Executors are allocated, each with 512 MB of memory:

$cd /app/hadoop/spark-1.1.0

$./bin/spark-submit --master yarn-cluster --class class3.SogouResult --executor-memory 512m LearnSpark.jar hdfs://hadoop1:9000/sogou/SogouQ3.txt hdfs://hadoop1:9000/class3/output2

Step 1: Upload the relevant resources to HDFS; compared with YARN-Client there is one extra file, LearnSpark.jar.

These files can be found in HDFS under hdfs://hadoop1:9000/user/hadoop/.sparkStaging/<application id>:

Step 2: The YARN cluster takes over the run.

First, the ResourceManager in the YARN cluster allocates a Container to start the SparkContext and assigns the nodes to run on; the SparkContext communicates with the NodeManagers and obtains Containers to start the Executors; then the SparkContext's YarnClusterScheduler distributes and monitors the tasks; finally, when the tasks have finished, YarnClusterScheduler notifies the ResourceManager to reclaim the resources.

3.3.2 Run Results

In YARN-Cluster mode the command line is only responsible for submitting the application; the SparkContext and the job both run inside the YARN cluster. The run can be followed at http://hadoop1:8088, and the result is written to HDFS, as shown below:

4. Troubleshooting

4.1 YARN-Client Startup Error

On the 64-bit virtual machine used for the Hadoop 2.X 64-bit build, the following error appeared when starting Spark-Shell in YARN-Client mode:

[hadoop@hadoop1 spark-1.1.0]$ bin/spark-shell --master YARN-client --executor-memory 1g --num-executors 3

Spark assembly has been built with Hive, including Datanucleus jars on classpath

Exception in thread "main" java.lang.Exception: When running with master 'YARN-client' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.

at org.apache.spark.deploy.SparkSubmitArguments.checkRequiredArguments(SparkSubmitArguments.scala:182)

at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:62)

at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:70)

at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
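The message indicates that Spark cannot locate the Hadoop client configuration. A plausible fix (the configuration directory below is an assumption based on the install prefix used earlier in this article) is to export HADOOP_CONF_DIR before launching:

```shell
# Point Spark at the Hadoop client configuration; the path assumes the
# Hadoop 2.2.0 install location used elsewhere in this article.
export HADOOP_CONF_DIR=/app/hadoop/hadoop-2.2.0/etc/hadoop
echo "HADOOP_CONF_DIR=$HADOOP_CONF_DIR"
```

With the variable set (in the current shell or in spark-env.sh), the same spark-shell command should start normally.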

   