A Detailed Look at the Basic Architecture of Spark
 
 2019-6-17 
 
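Editor's recommendation:

This article comes from Jianshu. Apache Spark is a fast, general-purpose compute engine designed for large-scale data processing. The article introduces Spark's architecture from eight angles; see the full text for details.

Apache Spark is a big data processing framework built around speed, ease of use, and sophisticated analytics. It was originally developed in 2009 at the AMPLab of the University of California, Berkeley, was open-sourced in 2010, and later became an Apache project. Compared with other big data and MapReduce technologies such as Hadoop and Storm, Spark has the following advantage: it provides a comprehensive, unified framework for big data processing across data sets of different natures (text data, graph data, and so on) and data sources (batch data or real-time streaming data).

According to the official materials, Spark can run applications in a Hadoop cluster up to 100 times faster in memory, and up to 10 times faster even when they run on disk.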

Goals:

Architecture and ecosystem

Spark and Hadoop

Run flow and characteristics

Common terminology

Standalone mode

YARN cluster

RDD run flow

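Architecture and ecosystem:

When the amount of data to process exceeds what a single machine can handle (for example, the computer has 4 GB of memory but more than 100 GB of data must be processed), a Spark cluster can be used for the computation. Sometimes the data volume is modest but the computation is complex and time-consuming; in that case the computing resources of a Spark cluster can also be used to run the computation in parallel.

(Figure: Spark ecosystem architecture)

Spark Core: contains Spark's basic functionality, in particular the API that defines RDDs, the operations on them, and the actions on both. All other Spark libraries are built on top of RDDs and Spark Core.

Spark SQL: provides an API for interacting with Spark through HiveQL (the Hive Query Language), Apache Hive's SQL variant. Each database table is treated as an RDD, and Spark SQL queries are translated into Spark operations.

Spark Streaming: processes and controls real-time data streams. It lets programs handle real-time data in the same way as ordinary RDDs.

MLlib: a library of common machine learning algorithms, implemented as Spark operations on RDDs. It contains scalable learning algorithms, such as classification and regression, that need to iterate over large data sets.

GraphX: a collection of algorithms and tools for manipulating graphs and for parallel graph operations and computation. GraphX extends the RDD API with operations for manipulating graphs, creating subgraphs, and accessing all vertices on a path.

(Figure: components of the Spark architecture)

Cluster Manager: in standalone mode this is the Master node, which controls the whole cluster and monitors the Workers. In YARN mode it is the ResourceManager.

Worker node: a slave node that controls a compute node and starts Executors or the Driver.

Driver: runs the main() function of the Application.

Executor: a process that runs on a worker node on behalf of a particular Application.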

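Spark and Hadoop:

Hadoop has two core modules: HDFS for distributed storage and MapReduce for distributed computation.

Spark does not provide a distributed file system of its own, so Spark analyses mostly rely on Hadoop's distributed file system, HDFS.

Both Hadoop MapReduce and Spark can perform data computation. Compared with MapReduce, Spark is faster and offers a richer set of features.

(Figure: relationship between Spark and Hadoop)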

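Run flow and characteristics:

(Figure: Spark run flow)

1. Build the runtime environment of the Spark Application and start the SparkContext.

2. The SparkContext asks the resource manager (Standalone, Mesos, or YARN) for resources to run Executors, and StandaloneExecutorBackend is started.

3. The Executors request Tasks from the SparkContext.

4. The SparkContext distributes the application code to the Executors.

5. The SparkContext builds the DAG, splits it into Stages, and sends each TaskSet to the Task Scheduler, which finally sends the Tasks to the Executors to run.

6. The Tasks run on the Executors, and all resources are released when they finish.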
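The six steps above can be traced with a small pure-Python simulation. This is only a sketch: `ToyDriver` and `ToyExecutor` are hypothetical stand-ins rather than Spark classes, resource negotiation is reduced to creating objects, and tasks are handed out round-robin, which is an assumption made for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class ToyExecutor:
    """Hypothetical stand-in for an executor process; not a Spark class."""
    name: str
    results: list = field(default_factory=list)

    def run(self, task, partition):
        # Step 6: the task runs on the executor.
        self.results.append(task(partition))


class ToyDriver:
    """Hypothetical stand-in for the driver that owns the SparkContext."""

    def __init__(self, num_executors):
        # Steps 1-3: start the context, obtain executors, let them register.
        self.executors = [ToyExecutor(f"executor-{i}") for i in range(num_executors)]

    def run_job(self, task, partitions):
        # Steps 4-5: one task per partition, handed out round-robin.
        for i, partition in enumerate(partitions):
            self.executors[i % len(self.executors)].run(task, partition)
        collected = [r for e in self.executors for r in e.results]
        # Step 6: release all resources once the tasks have finished.
        self.executors.clear()
        return collected


driver = ToyDriver(num_executors=2)
partial_sums = driver.run_job(sum, [[1, 2], [3, 4], [5, 6]])
print(sorted(partial_sums))
```

Each partition becomes one task, which mirrors how the number of Tasks in a Stage follows the number of partitions.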

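Characteristics of a Spark run:

1. Each Application gets its own dedicated Executor processes, which stay alive for the lifetime of the Application and run Tasks in multiple threads. This isolation between Applications is an advantage both for scheduling (each Driver schedules its own tasks) and for execution (Tasks from different Applications run in different JVMs). It also means that Spark Applications cannot share data with one another unless the data is written to an external storage system.

2. Spark is independent of the resource manager: it only needs to be able to obtain Executor processes and to keep communicating with them.

3. The Client that submits the SparkContext should be close to the Worker nodes (the nodes that run the Executors), preferably in the same rack, because the SparkContext and the Executors exchange a large amount of information while the Application runs.

4. Tasks use data locality and speculative execution as optimizations.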
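Data locality in point 4 can be illustrated with a toy placement function. This is a sketch under simplifying assumptions: `pick_executor` is a hypothetical helper, only two of Spark's locality levels (`NODE_LOCAL` and `ANY`) are modeled, and speculative execution is not covered.

```python
def pick_executor(preferred_hosts, free_executors):
    """Return (executor_id, locality) for a task.

    free_executors maps executor id -> host name. An executor on a host
    that already holds the task's data is preferred; otherwise any free
    executor is used.
    """
    for executor_id, host in free_executors.items():
        if host in preferred_hosts:
            return executor_id, "NODE_LOCAL"
    if free_executors:
        return next(iter(free_executors)), "ANY"
    return None, None


free = {"e1": "hostA", "e2": "hostB"}
print(pick_executor({"hostB"}, free))
print(pick_executor({"hostC"}, free))
```

Running the task where the data already lives avoids moving the data over the network, which is why the scheduler tries the local option first.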

Common terminology:

Application: a Spark application written by the user. It includes the code for a Driver and the Executor code that runs on multiple nodes in the cluster.

Driver: the Driver runs the main function of the Application and creates the SparkContext. The SparkContext is created to prepare the runtime environment of the Spark application; it is responsible for communicating with the Cluster Manager, requesting resources, and assigning and monitoring tasks. When the Executors have finished, the Driver also closes the SparkContext. The SparkContext is often used to stand for the Driver.

Executor: a process that an Application runs on a Worker node. It runs Tasks and stores data in memory or on disk. Each Application has its own independent set of Executors. In Spark on YARN mode the process is named CoarseGrainedExecutorBackend. A CoarseGrainedExecutorBackend holds exactly one Executor object, which wraps each Task in a TaskRunner and takes an idle thread from a thread pool to run it. The number of Tasks that each CoarseGrainedExecutorBackend can run in parallel depends on the number of CPU cores assigned to it.
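The thread-pool behaviour described for the Executor can be sketched with Python's standard library. This is an analogy, not Spark code: `run_tasks` is a hypothetical helper, and the pool size plays the role of the CPU cores assigned to the Executor.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tasks(tasks, cores):
    """Run callables on a pool whose size equals the assigned core count."""
    with ThreadPoolExecutor(max_workers=cores) as pool:
        futures = [pool.submit(task) for task in tasks]
        # Results are gathered in submission order.
        return [f.result() for f in futures]


tasks = [lambda n=n: n * n for n in range(5)]
print(run_tasks(tasks, cores=2))  # at most 2 tasks run at the same time
```

With five tasks and two cores, the remaining tasks wait in the queue until a thread becomes idle, which is the sense in which parallelism depends on the assigned cores.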

Cluster Manager: an external service that acquires resources on the cluster. There are currently three types:

Standalone: Spark's native resource management, in which the Master allocates resources.

Apache Mesos: a resource scheduling framework that works well with Hadoop MapReduce.

Hadoop YARN: mainly refers to the ResourceManager in YARN.

Worker: any node in the cluster that can run Application code. In Standalone mode it is a Worker node configured through the slaves file; in Spark on YARN mode it is a NodeManager node.

Task: a unit of work sent to an Executor. Like MapTask and ReduceTask in Hadoop MapReduce, it is the basic unit in which an Application runs. Multiple Tasks form a Stage, and Tasks are scheduled and managed by the TaskScheduler.

Job: a parallel computation made up of multiple Tasks, usually triggered by a Spark action. An Application often produces multiple Jobs.

Stage: each Job is split into groups of Tasks; each group is a TaskSet and is called a Stage. Stages are divided and scheduled by the DAGScheduler. There are two kinds: non-final Stages (Shuffle Map Stage) and the final Stage (Result Stage). Stage boundaries are the places where a shuffle occurs.
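The rule that Stage boundaries fall where a shuffle occurs can be sketched for a linear chain of operations. This is a simplified illustration: `split_into_stages` is a hypothetical helper, the set of shuffle operations is a small hand-picked sample, and a real DAG can branch rather than form a single chain.

```python
# Operations treated as requiring a shuffle in this sketch (an assumption;
# for example, a join of co-partitioned data may not shuffle at all).
SHUFFLE_OPS = {"groupByKey", "reduceByKey", "join", "repartition"}


def split_into_stages(ops):
    """Cut a linear chain of operations into stages at each shuffle."""
    stages, current = [], []
    for op in ops:
        if op in SHUFFLE_OPS and current:
            stages.append(current)  # close the stage before the shuffle
            current = []
        current.append(op)
    if current:
        stages.append(current)
    return stages


stages = split_into_stages(["textFile", "map", "groupByKey", "mapValues", "collect"])
for number, stage in enumerate(stages, start=1):
    print(number, stage)
```

Operations that stay within one stage can be pipelined on each partition without exchanging data between nodes.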

DAGScheduler: builds a Stage-based DAG (Directed Acyclic Graph) for each Job and submits the Stages to the TaskScheduler. It divides Stages according to the dependencies between RDDs, looking for the scheduling with the lowest cost.

(Figure: DAGScheduler)

TaskScheduler: submits TaskSets to the Workers to run; this is where it is decided which Task each Executor runs. The TaskScheduler maintains all TaskSets. When an Executor sends a heartbeat to the Driver, the TaskScheduler assigns Tasks according to the remaining resources. It also keeps track of the run status of all Tasks and retries failed Tasks.

(Figure: role of the TaskScheduler)
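The retry behaviour attributed to the TaskScheduler can be sketched as a bounded retry loop. This is a minimal illustration, not Spark's implementation: `run_with_retries` and `flaky` are hypothetical names, and the default of 4 attempts is chosen to mirror the default of Spark's `spark.task.maxFailures` setting.

```python
def run_with_retries(task, max_failures=4):
    """Run task(attempt) until it succeeds or has failed max_failures times."""
    last_error = None
    for attempt in range(max_failures):
        try:
            return task(attempt)
        except Exception as error:  # a failed attempt is retried
            last_error = error
    raise RuntimeError(f"task failed {max_failures} times") from last_error


def flaky(attempt):
    # Hypothetical task that fails on its first two attempts.
    if attempt < 2:
        raise IOError("lost executor")
    return f"ok on attempt {attempt}"


print(run_with_retries(flaky))
```

Once the limit is reached the error is raised to the caller, which corresponds to the whole job being aborted rather than retried forever.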

The concrete task scheduler in each run mode is:

Spark on Standalone mode: TaskScheduler

YARN-Client mode: YarnClientClusterScheduler

YARN-Cluster mode: YarnClusterScheduler

(Figure: run hierarchy that ties these terms together)

A Job consists of multiple Stages, and a Stage consists of multiple Tasks of the same kind. Tasks are either ShuffleMapTasks or ResultTasks, and Dependencies are either ShuffleDependency or NarrowDependency.
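The Job, Stage, and Task hierarchy can be written down as plain data classes. This is a sketch for orientation only: the classes below are hypothetical and merely borrow the names used in the text, and `build_job` assumes every stage has the same number of partitions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    partition: int
    kind: str  # "ShuffleMapTask" or "ResultTask"


@dataclass
class Stage:
    tasks: List[Task] = field(default_factory=list)


@dataclass
class Job:
    stages: List[Stage] = field(default_factory=list)


def build_job(num_stages, partitions_per_stage):
    """Only the final stage holds ResultTasks; earlier ones hold ShuffleMapTasks."""
    job = Job()
    for s in range(num_stages):
        kind = "ResultTask" if s == num_stages - 1 else "ShuffleMapTask"
        job.stages.append(Stage([Task(p, kind) for p in range(partitions_per_stage)]))
    return job


job = build_job(num_stages=2, partitions_per_stage=3)
for stage in job.stages:
    print([task.kind for task in stage.tasks])
```

All Tasks inside one Stage share a kind, which matches the statement that a Stage is made up of Tasks of the same type.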

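Spark run modes:

Spark can run in many different modes. Deployed on a single machine, it can run in local mode or in pseudo-distributed mode. Deployed on a distributed cluster, there are again several modes to choose from, depending on the cluster's actual situation: the underlying resource scheduling can either rely on an external resource scheduling framework or use Spark's built-in Standalone mode.

For external resource scheduling frameworks, current implementations include the relatively stable Mesos mode and the Hadoop YARN mode.

Local mode: commonly used for local development and testing. It is further divided into local and local cluster.

standalone: standalone cluster run mode

Standalone mode uses Spark's own resource scheduling framework.

It uses a typical Master/Slaves architecture, with ZooKeeper used to provide Master HA.

(Figure: Standalone framework structure)

The main nodes in this mode are the Client node, the Master node, and the Worker nodes. The Driver can run either on the Master node or on the local Client. When a job is submitted through the interactive spark-shell started on the Master node, the Driver runs on the Master. When a job is submitted with spark-submit in its default client deploy mode, or run from a development environment such as Eclipse or IDEA using new SparkConf().setMaster("spark://master:7077"), the Driver runs on the local Client.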
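The master URL passed to `setMaster` is what selects the run mode. The mapping can be sketched as a small classifier; `describe_master` is a hypothetical helper written for illustration, and it covers only the common URL forms (`local`, `local[K]`, `local[*]`, `local-cluster[N,cores,memory]`, `spark://`, `mesos://`, `yarn`).

```python
import re


def describe_master(master):
    """Map a Spark master URL to the run mode it selects."""
    if master == "local" or re.fullmatch(r"local\[(\d+|\*)\]", master):
        return "local"
    if re.fullmatch(r"local-cluster\[\d+,\s*\d+,\s*\d+\]", master):
        return "local-cluster"
    if master.startswith("spark://"):
        return "standalone"
    if master.startswith("mesos://"):
        return "mesos"
    # "yarn-client" and "yarn-cluster" were accepted by older Spark releases.
    if master in ("yarn", "yarn-client", "yarn-cluster"):
        return "yarn"
    raise ValueError(f"unrecognized master URL: {master}")


for url in ["local[4]", "spark://master:7077", "yarn"]:
    print(url, "->", describe_master(url))
```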

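(Figure: Standalone run process)

1. The SparkContext connects to the Master, registers with it, and requests resources (CPU cores and memory).

2. Based on the SparkContext's resource request and the information the Workers report in their heartbeats, the Master decides on which Workers to allocate resources, acquires the resources on those Workers, and starts StandaloneExecutorBackend.

3. StandaloneExecutorBackend registers with the SparkContext.

4. The SparkContext sends the Application code to StandaloneExecutorBackend. The SparkContext also parses the Application code, builds the DAG, and submits it to the DAG Scheduler to be split into Stages (an action triggers a Job; each Job contains one or more Stages; a Stage boundary generally arises before external data is read and before a shuffle). The Stages (also called TaskSets) are then submitted to the Task Scheduler, which assigns the Tasks to the appropriate Workers and finally hands them to StandaloneExecutorBackend for execution.

5. StandaloneExecutorBackend creates an Executor thread pool, starts executing Tasks, and reports to the SparkContext until the Tasks finish.

6. When all Tasks have finished, the SparkContext deregisters from the Master and releases the resources.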
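Step 2, where the Master decides which Workers supply resources, can be sketched as follows. The text does not say which policy the Master uses, so the round-robin spreading below is an assumption chosen for illustration, and `allocate_cores` is a hypothetical helper that considers CPU cores only.

```python
def allocate_cores(requested, free_cores):
    """Spread `requested` cores over workers, one core at a time.

    free_cores maps worker name -> free cores reported in heartbeats.
    Returns worker name -> cores granted.
    """
    granted = {worker: 0 for worker in free_cores}
    remaining = dict(free_cores)
    while requested > 0 and any(remaining.values()):
        for worker in remaining:
            if requested == 0:
                break
            if remaining[worker] > 0:
                remaining[worker] -= 1
                granted[worker] += 1
                requested -= 1
    return {worker: cores for worker, cores in granted.items() if cores > 0}


print(allocate_cores(5, {"w1": 4, "w2": 2}))
```

If the cluster has fewer free cores than requested, the function grants what is available and stops, so the application simply starts with fewer resources.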

yarn:

Spark on YARN has two modes, depending on where the Driver sits in the cluster: YARN-Client mode and YARN-Cluster mode (also called YARN-Standalone mode).

In YARN-Client mode the Driver runs locally on the client, which allows the Spark Application to interact with the client. Because the Driver is on the client, its status can be viewed through the web UI, by default at http://hadoop1:4040, while YARN is accessed at http://hadoop1:8088.

The YARN-Client workflow consists of the following steps:

1. The Spark YARN Client asks YARN's ResourceManager to start an Application Master. Meanwhile, SparkContext initialization creates the DAGScheduler, the TaskScheduler, and so on. Because YARN-Client mode was chosen, the program selects YarnClientClusterScheduler and YarnClientSchedulerBackend.

2. After receiving the request, the ResourceManager picks a NodeManager in the cluster, allocates the application's first Container, and asks the NodeManager to start the application's ApplicationMaster in that Container. Unlike in YARN-Cluster mode, this ApplicationMaster does not run the SparkContext; it only communicates with the SparkContext to allocate resources.

3. Once the SparkContext in the Client has been initialized, it establishes communication with the ApplicationMaster, which registers with the ResourceManager and requests resources (Containers) from it according to the task information.

4. Once the ApplicationMaster has obtained resources (Containers), it communicates with the corresponding NodeManagers and asks them to start CoarseGrainedExecutorBackend in the Containers. After starting, CoarseGrainedExecutorBackend registers with the SparkContext in the Client and requests Tasks.

5. The SparkContext in the Client assigns Tasks to CoarseGrainedExecutorBackend to execute. CoarseGrainedExecutorBackend runs the Tasks and reports their status and progress to the Driver, so that the Client always knows the state of each task and can restart a task when it fails.

6. When the application finishes, the Client's SparkContext asks the ResourceManager to deregister it and then shuts itself down.

Spark Cluster mode:

In YARN-Cluster mode, after a user submits an application to YARN, YARN runs it in two phases:

In the first phase, the Spark Driver is started in the YARN cluster as an ApplicationMaster.

In the second phase, the ApplicationMaster creates the application, requests resources for it from the ResourceManager, starts Executors to run the Tasks, and monitors the whole run until it completes.

The YARN-Cluster workflow consists of the following steps:

1. The Spark YARN Client submits the application to YARN, including the ApplicationMaster program, the command to start the ApplicationMaster, and the program to run in the Executors.

2. After receiving the request, the ResourceManager picks a NodeManager in the cluster, allocates the application's first Container, and asks the NodeManager to start the application's ApplicationMaster in that Container. The ApplicationMaster then initializes the SparkContext and related objects.

3. The ApplicationMaster registers with the ResourceManager, so that users can view the application's status directly through the ResourceManager. It then polls the ResourceManager over RPC to request resources for the tasks and monitors their status until the run ends.

4. Once the ApplicationMaster has obtained resources (Containers), it communicates with the corresponding NodeManagers and asks them to start CoarseGrainedExecutorBackend in the Containers. After starting, CoarseGrainedExecutorBackend registers with the SparkContext in the ApplicationMaster and requests Tasks. This is the same as in Standalone mode, except that when the SparkContext is initialized in the Spark Application it uses CoarseGrainedSchedulerBackend together with YarnClusterScheduler to schedule tasks. YarnClusterScheduler is only a thin wrapper around TaskSchedulerImpl that adds, among other things, logic for waiting for Executors.

5. The SparkContext in the ApplicationMaster assigns Tasks to CoarseGrainedExecutorBackend to execute. CoarseGrainedExecutorBackend runs the Tasks and reports their status and progress to the ApplicationMaster, so that the ApplicationMaster always knows the state of each task and can restart a task when it fails.

6. When the application finishes, the ApplicationMaster asks the ResourceManager to deregister it and then shuts itself down.

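Differences between Spark Client and Spark Cluster:

Before looking at the deeper difference between YARN-Client and YARN-Cluster, one concept needs to be clear: the Application Master. In YARN, every Application instance has an ApplicationMaster process, which is the first container the Application starts. It deals with the ResourceManager and requests resources; once it has them, it tells the NodeManagers to start Containers for it. At a deeper level, the difference between YARN-Cluster and YARN-Client mode is really a difference in the ApplicationMaster process.

In YARN-Cluster mode, the Driver runs inside the AM (Application Master), which requests resources from YARN and supervises the job. After submitting the job, the user can shut down the Client and the job keeps running on YARN. For this reason YARN-Cluster mode is not suitable for interactive jobs.

In YARN-Client mode, the Application Master only requests Executors from YARN. The Client communicates with the requested Containers to schedule their work, which means the Client cannot go away.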
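The contrast between the two modes can be condensed into a small lookup. This is a summary sketch of the statements above, not an API: `yarn_mode_properties` and its dictionary keys are hypothetical names.

```python
def yarn_mode_properties(deploy_mode):
    """Summarize where the Driver runs for a YARN deploy mode."""
    if deploy_mode == "client":
        return {
            "driver_runs_in": "client process",
            "application_master_role": "requests executors only",
            "client_may_exit": False,
            "suits_interactive_jobs": True,
        }
    if deploy_mode == "cluster":
        return {
            "driver_runs_in": "ApplicationMaster",
            "application_master_role": "runs the Driver and requests resources",
            "client_may_exit": True,
            "suits_interactive_jobs": False,
        }
    raise ValueError(f"unknown deploy mode: {deploy_mode}")


for mode in ("client", "cluster"):
    print(mode, yarn_mode_properties(mode))
```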

Question to consider: which mode do we use when we submit a job with Spark?

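RDD run flow:

Running RDDs in Spark roughly consists of three steps:

1. Create the RDD objects.

2. The DAGScheduler steps in and computes the dependencies between the RDDs; these dependencies form the DAG.

3. Each Job is divided into multiple Stages. A main criterion for dividing Stages is whether the input of the current operator is already determined; if it is, the operator is placed in the same Stage, which avoids the message-passing overhead between Stages.

(Figure: example of Stage division)

The following example groups names by their first letter (A-Z) and counts the number of distinct names under each letter. It shows how RDDs actually run.

Create the RDDs: in the example, the final collect is an action and does not create an RDD, while each of the four preceding transformations creates a new RDD. The first step is therefore to create all the RDDs (each with its five pieces of internal information).

Create the execution plan: Spark pipelines as much as possible and divides the plan into stages based on whether the data has to be reorganized. In this example the groupBy() transformation splits the execution plan into two stages. The end result is a DAG (directed acyclic graph) that serves as the logical execution plan.

Schedule the tasks: each stage is divided into tasks, and each task combines data with computation. All tasks of the current stage must finish before the next stage begins. Because the first transformation of the next stage always reorganizes the data, it has to wait until all result data of the current stage has been computed.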
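The names example can be mimicked in plain Python to show the two stages. This is a sketch, not the article's original code, which is not reproduced in the text: `count_distinct_names_by_letter` is a hypothetical function, the input list is made up, and a dictionary stands in for the shuffle.

```python
from collections import defaultdict


def count_distinct_names_by_letter(names, num_partitions=2):
    # Stage 1: narrow, pipelined work on each input partition.
    partitions = [names[i::num_partitions] for i in range(num_partitions)]
    mapped = [[(name[0], name) for name in part] for part in partitions]

    # Shuffle: regroup records by key; this is the stage boundary.
    shuffled = defaultdict(list)
    for part in mapped:
        for letter, name in part:
            shuffled[letter].append(name)

    # Stage 2: per-key work on the regrouped data, then "collect".
    return sorted((letter, len(set(group))) for letter, group in shuffled.items())


print(count_distinct_names_by_letter(["Ahir", "Pat", "Andy", "Pat", "Ahir"]))
```

Stage 2 cannot start for a letter until every partition has contributed its records for that letter, which is why all tasks of the first stage must finish first.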

   