Äú¿ÉÒÔ¾èÖú£¬Ö§³ÖÎÒÃǵĹ«ÒæÊÂÒµ¡£

1Ôª 10Ôª 50Ôª





ÈÏÖ¤Â룺  ÑéÖ¤Âë,¿´²»Çå³þ?Çëµã»÷Ë¢ÐÂÑéÖ¤Âë ±ØÌî



  ÇóÖª ÎÄÕ ÎÄ¿â Lib ÊÓÆµ iPerson ¿Î³Ì ÈÏÖ¤ ×Éѯ ¹¤¾ß ½²×ù Modeler   Code  
»áÔ±   
 
   
 
 
     
   
 ¶©ÔÄ
  ¾èÖú
Spark¼ÆËã¹ý³Ì·ÖÎö
 
×÷ÕߣºHuFeiHu-Blog À´Ô´£ºblog.csdn.net ·¢²¼ÓÚ£º 2016-12-27
  2485  次浏览      27
 

»ù±¾¸ÅÄî

SparkÊÇÒ»¸ö·Ö²¼Ê½µÄÄÚ´æ¼ÆËã¿ò¼Ü£¬ÆäÌØµãÊÇÄÜ´¦Àí´ó¹æÄ£Êý¾Ý£¬¼ÆËãËٶȿ졣SparkÑÓÐøÁËHadoopµÄMapReduce¼ÆËãÄ£ÐÍ£¬Ïà±ÈÖ®ÏÂSparkµÄ¼ÆËã¹ý³Ì±£³ÖÔÚÄÚ´æÖУ¬¼õÉÙÁËÓ²Å̶Áд£¬Äܹ»½«¶à¸ö²Ù×÷½øÐкϲ¢ºó¼ÆË㣬Òò´ËÌáÉýÁ˼ÆËãËÙ¶È¡£Í¬Ê±SparkÒ²ÌṩÁ˸ü·á¸»µÄ¼ÆËãAPI¡£

MapReduceÊÇHadoopºÍSparkµÄ¼ÆËãÄ£ÐÍ£¬ÆäÌØµãÊÇMapºÍReduce¹ý³Ì¸ß¶È¿É²¢Ðл¯£»¹ý³Ì¼äñîºÏ¶ÈµÍ£¬µ¥¸ö¹ý³ÌµÄʧ°Üºó¿ÉÒÔÖØÐ¼ÆË㣬¶ø²»»áµ¼ÖÂÕûÌåʧ°Ü£»×îÖØÒªµÄÊÇÊý¾Ý´¦ÀíÖеļÆËãÂß¼­¿ÉÒԺܺõÄת»»ÎªMapºÍReduce²Ù×÷¡£¶ÔÓÚÒ»¸öÊý¾Ý¼¯À´Ëµ£¬Map¶ÔÿÌõÊý¾Ý×öÏàͬµÄת»»²Ù×÷£¬Reduce¿ÉÒÔ°´Ìõ¼þ¶ÔÊý¾Ý·Ö×飬ȻºóÔÚ·Ö×éÉÏ×ö²Ù×÷¡£³ýÁËMapºÍReduce²Ù×÷Ö®Í⣬Spark»¹ÑÓÉì³öÁËÈçfilter£¬flatMap£¬count£¬distinctµÈ¸ü·á¸»µÄ²Ù×÷¡£

RDDµÄÊÇSparkÖÐ×îÖ÷ÒªµÄÊý¾Ý½á¹¹£¬¿ÉÒÔÖ±¹ÛµÄÈÏΪRDD¾ÍÊÇÒª´¦ÀíµÄÊý¾Ý¼¯¡£RDDÊÇ·Ö²¼Ê½µÄÊý¾Ý¼¯£¬Ã¿¸öRDD¶¼Ö§³ÖMapReduceÀà²Ù×÷£¬¾­¹ýMapReduce²Ù×÷ºó»á²úÉúеÄRDD£¬¶ø²»»áÐÞ¸ÄÔ­ÓÐRDD¡£RDDµÄÊý¾Ý¼¯ÊÇ·ÖÇøµÄ£¬Òò´Ë¿ÉÒÔ°Ñÿ¸öÊý¾Ý·ÖÇø·Åµ½²»Í¬µÄ·ÖÇøÉϽøÐмÆË㣬¶øÊµ¼ÊÉÏ´ó¶àÊýMapReduce²Ù×÷¶¼ÊÇÔÚ·ÖÇøÉϽøÐмÆËãµÄ¡£Spark²»»á°Ñÿһ¸öMapReduce²Ù×÷¶¼·¢ÆðÔËË㣬¶øÊǾ¡Á¿µÄ°Ñ²Ù×÷ÀÛ¼ÆÆðÀ´Ò»Æð¼ÆËã¡£Spark°Ñ²Ù×÷»®·ÖΪת»»£¨transformation£©ºÍ¶¯×÷£¨action£©£¬¶ÔRDD½øÐеÄת»»²Ù×÷»áµþ¼ÓÆðÀ´£¬Ö±µ½¶ÔRDD½øÐж¯×÷²Ù×÷ʱ²Å»á·¢Æð¼ÆËã¡£ÕâÖÖÌØÐÔҲʹSpark¿ÉÒÔ¼õÉÙÖмä½á¹ûµÄÍÌÍ£¬¿ÉÒÔ¿ìËٵĽøÐжà´Îµü´ú¼ÆËã¡£

ϵͳ½á¹¹

Spark×ÔÉíÖ»¶Ô¼ÆË㸺Ôð£¬Æä¼ÆËã×ÊÔ´µÄ¹ÜÀíºÍµ÷¶ÈÓɵÚÈý·½¿ò¼ÜÀ´ÊµÏÖ¡£³£ÓõĿò¼ÜÓÐYARNºÍMesos¡£±¾ÎÄÒÔYARNΪÀý½øÐнéÉÜ¡£ÏÈ¿´Ò»ÏÂSpark on YARNµÄϵͳ½á¹¹Í¼£º

Spark on YARNϵͳ½á¹¹Í¼

ͼÖй²·ÖΪÈý´ó²¿·Ö£ºSpark Driver£¬ Worker£¬ Cluster manager¡£ÆäÖÐDriver program¸ºÔð½«RDDת»»ÎªÈÎÎñ£¬²¢½øÐÐÈÎÎñµ÷¶È¡£Worker¸ºÔðÈÎÎñµÄÖ´ÐС£YARN¸ºÔð¼ÆËã×ÊÔ´µÄά»¤ºÍ·ÖÅä¡£

Driver¿ÉÒÔÔËÐÐÔÚÓû§³ÌÐòÖУ¬»òÕßÔËÐÐÔÚÆäÖÐÒ»¸öWorkerÉÏ¡£SparkÖеÄÿһ¸öÓ¦Óã¨Application£©¶ÔÓ¦×ÅÒ»¸öDriver¡£Õâ¸öDriver¿ÉÒÔ½ÓÊÕRDDÉϵļÆËãÇëÇó£¬Ã¿¸ö¶¯×÷£¨Action£©ÀàÐ͵IJÙ×÷½«±»×÷Ϊһ¸öJob½øÐмÆËã¡£Spark»á¸ù¾ÝRDDµÄÒÀÀµ¹ØÏµ¹¹½¨¼ÆËã½×¶Î£¨Stage£©µÄÓÐÏòÎÞ»·Í¼£¬Ã¿¸ö½×¶ÎÓÐÓë·ÖÇøÊýÏàͬµÄÈÎÎñ£¨Task£©¡£ÕâЩÈÎÎñ½«ÔÚÿ¸ö·ÖÇø£¨Partition£©ÉϽøÐмÆË㣬ÈÎÎñ»®·ÖÍê³ÉºóDriver½«ÈÎÎñÌá½»µ½ÔËÐÐÓÚWorkerÉϵÄExecutorÖнøÐмÆË㣬²¢¶ÔÈÎÎñµÄ³É¹¦¡¢Ê§°Ü½øÐмǼºÍÖØÆôµÈ´¦Àí¡£

WorkerÒ»°ã¶ÔӦһ̨ÎïÀí»ú£¬Ã¿¸öWorkerÉÏ¿ÉÒÔÔËÐжà¸öExecutor£¬Ã¿¸öExecutor¶¼ÊǶÀÁ¢µÄJVM½ø³Ì£¬DriverÌá½»µÄÈÎÎñ¾ÍÊÇÒÔÏ̵߳ÄÐÎʽÔËÐÐÔÚExecutorÖеġ£Èç¹ûʹÓÃYARN×÷Ϊ×ÊÔ´µ÷¶È¿ò¼ÜµÄ»°£¬ÆäÖÐÒ»¸öWorkerÉÏ»¹»áÓÐExecutor launcher×÷ΪYARNµÄApplicationMaster£¬ÓÃÓÚÏòYARNÉêÇë¼ÆËã×ÊÔ´£¬²¢Æô¶¯¡¢¼à²â¡¢ÖØÆôExecutor¡£

¼ÆËã¹ý³Ì

ÕâÀïÎÒÃÇ´ÓRDDµ½Êä³ö½á¹ûµÄÕû¸ö¼ÆËã¹ý³ÌΪÖ÷Ïߣ¬Ì½¾¿SparkµÄ¼ÆËã¹ý³Ì¡£Õâ¸ö¼ÆËã¹ý³Ì¿ÉÒÔ·ÖΪ£º

RDD¹¹½¨£º¹¹½¨RDDÖ®¼äµÄÒÀÀµ¹ØÏµ£¬½«RDDת»»Îª½×¶ÎµÄÓÐÏòÎÞ»·Í¼¡£

ÈÎÎñµ÷¶È£º¸ù¾Ý¿ÕÏмÆËã×ÊÔ´Çé¿ö½øÐÐÈÎÎñÌá½»£¬²¢¶ÔÈÎÎñµÄÔËÐÐ״̬½øÐмà²âºÍ´¦Àí¡£

ÈÎÎñ¼ÆË㣺´î½¨ÈÎÎñÔËÐл·¾³£¬Ö´ÐÐÈÎÎñ²¢·µ»ØÈÎÎñ½á¹û¡£

Shuffle¹ý³Ì£ºÁ½¸ö½×¶ÎÖ®¼äÓпíÒÀÀµÊ±£¬ÐèÒª½øÐÐShuffle²Ù×÷¡£

¼ÆËã½á¹ûÊÕ¼¯£º´Óÿ¸öÈÎÎñÊÕ¼¯²¢»ã×ܽá¹û¡£

ÔÚÕâÀïÎÒÃÇÓÃÒ»¸ö¼ò½àµÄCharCount³ÌÐòΪÀý£¬Õâ¸ö³ÌÐò°Ñº¬ÓÐa-z×Ö·ûµÄÁбíת»¯ÎªRDD£¬¶Ô´ËRDD½øÐÐÁËMapºÍReduce²Ù×÷¼ÆËãÿ¸ö×ÖĸµÄƵÊý£¬×îºó½«½á¹ûÊÕ¼¯¡£Æä´úÂëÈçÏ£º

CharCountÀý×Ó³ÌÐò

RDD¹¹½¨ºÍת»»

RDD°´ÕÕÆä×÷ÓÿÉÒÔ·ÖΪÁ½ÖÖÀàÐÍ£¬Ò»ÖÖÊǶÔÊý¾ÝÔ´µÄ·â×°£¬¿ÉÒÔ°ÑÊý¾ÝԴת»»ÎªRDD£¬ÕâÖÖÀàÐ͵ÄRDD°üÀ¨NewHadoopRDD£¬ParallelCollectionRDD£¬JdbcRDDµÈ¡£ÁíÒ»ÖÖÊǶÔRDDµÄת»»£¬´Ó¶øÊµÏÖÒ»ÖÖ¼ÆËã·½·¨£¬ÕâÖÖÀàÐ͵ÄRDD°üÀ¨MappedRDD£¬ShuffledRDD£¬FilteredRDDµÈ¡£Êý¾ÝÔ´ÀàÐ͵ÄRDD²»ÒÀÀµÓÚÆäËûRDD£¬¼ÆËãÀàµÄRDDÓµÓÐ×Ô¼ºµÄRDDÒÀÀµ¡£

RDDÓÐÈý¸öÒªËØ£º·ÖÇø£¬ÒÀÀµ¹ØÏµ£¬¼ÆËãÂß¼­¡£·ÖÇøÊDZ£Ö¤RDD·Ö²¼Ê½µÄÌØÐÔ£¬·ÖÇø¿ÉÒÔ¶ÔRDDµÄÊý¾Ý½øÐл®·Ö£¬»®·ÖºóµÄ·ÖÇø¿ÉÒÔ·Ö²¼µ½²»Í¬µÄExecutorÖУ¬´ó²¿·Ö¶ÔRDDµÄ¼ÆËã¶¼ÊÇÔÚ·ÖÇøÉϽøÐеġ£ÒÀÀµ¹ØÏµÎ¬»¤×ÅRDDµÄ¼ÆËã¹ý³Ì£¬Ã¿¸ö¼ÆËãÀàÐ͵ÄRDDÔÚ¼ÆËãʱ£¬»á½«ËùÒÀÀµµÄRDD×÷ΪÊý¾ÝÔ´½øÐмÆËã¡£¸ù¾ÝÒ»¸ö·ÖÇøµÄÊä³öÊÇ·ñ±»¶à·ÖÇøÊ¹Óã¬Spark»¹½«ÒÀÀµ·ÖΪխÒÀÀµºÍ¿íÒÀÀµ¡£RDDµÄ¼ÆËãÂß¼­ÊÇÆä¹¦ÄܵÄÌåÏÖ£¬Æä¼ÆËã¹ý³ÌÊÇÒÔËùÒÀÀµµÄRDDΪÊý¾ÝÔ´½øÐеġ£

Àý×ÓÖй²²úÉúÁËÈý¸öRDD£¬³ýÁ˵ÚÒ»¸öRDDÖ®Í⣬ÿ¸öRDDÓëÉϼ¶RDDÓÐÒÀÀµ¹ØÏµ¡£

1.spark.parallelize(data, partitionSize)·½·¨½«²úÉúÒ»¸öÊý¾ÝÔ´Ð͵ÄParallelCollectionRDD£¬Õâ¸öRDDµÄ·ÖÇøÊǶÔÁбíÊý¾ÝµÄÇз֣¬Ã»ÓÐÉϼ¶ÒÀÀµ£¬¼ÆËãÂß¼­ÊÇÖ±½Ó·µ»Ø·ÖÇøÊý¾Ý¡£

2.mapº¯Êý½«»á´´½¨Ò»¸öMappedRDD£¬Æä·ÖÇøÓëÉϼ¶ÒÀÀµÏàͬ£¬»áÓÐÒ»¸öÒÀÀµÓÚParallelCollectionRDDµÄÕ­ÒÀÀµ£¬¼ÆËãÂß¼­ÊǶÔParallelCollectionRDDµÄÊý¾Ý×ömap²Ù×÷¡£

3.reduceByKeyº¯Êý½«»á²úÉúÒ»¸öShuffledRDD£¬·ÖÇøÊýÁ¿ÓëÉÏÃæµÄMappedRDDÏàͬ£¬»áÓÐÒ»¸öÒÀÀµÓÚMappedRDDµÄ¿íÒÀÀµ£¬¼ÆËãÂß¼­ÊÇShuffleºóÔÚ·ÖÇøÉϵľۺϲÙ×÷¡£

RDDµÄÒÀÀµ¹ØÏµ

SparkÔÚÓöµ½¶¯×÷Àà²Ù×÷ʱ£¬¾Í»á·¢Æð¼ÆËãJob£¬°ÑRDDת»»ÎªÈÎÎñ£¬²¢·¢ËÍÈÎÎñµ½ExecutorÉÏÖ´ÐС£´ÓRDDµ½ÈÎÎñµÄת»»¹ý³ÌÊÇÔÚDAGSchedulerÖнøÐеġ£Æä×ÜÌå˼·ÊǸù¾ÝRDDµÄÒÀÀµ¹ØÏµ£¬°ÑÕ­ÒÀÀµºÏ²¢µ½Ò»¸ö½×¶ÎÖУ¬Óöµ½¿íÒÀÀµÔò»®·Ö³öеĽ׶Σ¬×îÖÕÐγÉÒ»¸ö½×¶ÎµÄÓÐÏòÎÞ»·Í¼£¬²¢¸ù¾ÝͼµÄÒÀÀµ¹ØÏµÏȺóÌá½»½×¶Î¡£Ã¿¸ö½×¶Î°´ÕÕ·ÖÇøÊýÁ¿»®·ÖΪ¶à¸öÈÎÎñ£¬×îÖÕÈÎÎñ±»ÐòÁл¯²¢Ìá½»µ½ExecutorÉÏÖ´ÐС£

RDDµ½TaskµÄ¹¹½¨¹ý³Ì

µ±RDDµÄ¶¯×÷Àà²Ù×÷±»µ÷ÓÃʱ£¬RDD½«µ÷ÓÃSparkContext¿ªÊ¼Ìá½»Job£¬SparkContext½«µ÷ÓÃDAGScheduler°ÑRDDת»¯Îª½×¶ÎµÄÓÐÏòÎÞ»·Í¼£¬È»ºóÊ×ÏȽ«ÓÐÏòÎÞ»·Í¼ÖÐûÓÐδÍê³ÉµÄÒÀÀµµÄ½×¶Î½øÐÐÌá½»¡£Ôڽ׶α»Ìύʱ£¬Ã¿¸ö½×¶Î½«²úÉúÓë·ÖÇøÊýÁ¿ÏàͬµÄÈÎÎñ£¬ÕâЩÈÎÎñ³ÆÖ®ÎªÒ»¸öTaskSet¡£ÈÎÎñµÄÀàÐÍ·ÖΪ ShuffleMapTaskºÍResultTask£¬Èç¹û½×¶ÎµÄÊä³ö½«ÓÃÓÚϸö½×¶ÎµÄÊäÈ룬Ҳ¾ÍÊÇÐèÒª½øÐÐShuffle²Ù×÷£¬ÔòÈÎÎñÀàÐÍΪShuffleMapTask¡£Èç¹û½×¶ÎµÄÊäÈ뼴ΪJob½á¹û£¬ÔòÈÎÎñÀàÐÍΪResultTask¡£ÈÎÎñ´´½¨Íê³Éºó»á½»¸øTaskSchedulerImpl½øÐÐTaskSet¼¶±ðµÄµ÷¶ÈÖ´ÐС£

ÈÎÎñµ÷¶È

ÔÚÈÎÎñµ÷¶ÈµÄ·Ö¹¤ÉÏ£¬DAGScheduler¸ºÔð×ÜÌåµÄÈÎÎñµ÷¶È£¬SchedulerBackend¸ºÔðÓëExecutorsͨÐÅ£¬Î¬»¤¼ÆËã×ÊÔ´ÐÅÏ¢£¬²¢¸ºÔð½«ÈÎÎñÐòÁл¯²¢Ìá½»µ½Executor¡£TaskSetManager¸ºÔð¶ÔÒ»¸ö½×¶ÎµÄÈÎÎñ½øÐйÜÀí£¬ÆäÖлá¸ù¾ÝÈÎÎñµÄÊý¾Ý±¾µØÐÔÑ¡ÔñÓÅÏÈÌá½»µÄÈÎÎñ¡£TaskSchedulerImpl¸ºÔð¶ÔTaskSet½øÐе÷¶È£¬Í¨¹ýµ÷¶È²ßÂÔÈ·¶¨TaskSetÓÅÏȼ¶¡£Í¬Ê±ÊÇÒ»¸öÖнéÕߣ¬Æä½«DAGScheduler£¬SchedulerBackendºÍTaskSetManagerÁª½áÆðÀ´£¬¶ÔExecutorºÍTaskµÄÏà¹ØÊ¼þ½øÐÐת·¢¡£

ÔÚÈÎÎñÌá½»Á÷³ÌÉÏ£¬DAGSchedulerÌá½»TaskSetµ½TaskSchedulerImpl£¬Ê¹TaskSetÔÚ´Ë×¢²á¡£TaskSchedulerImpl֪ͨSchedulerBackendÓÐеÄÈÎÎñ½øÈ룬SchedulerBackendµ÷ÓÃmakeOffers¸ù¾Ý×¢²áµ½×Ô¼ºµÄExecutorsÐÅÏ¢£¬È·¶¨ÊÇ·ñÓмÆËã×ÊÔ´Ö´ÐÐÈÎÎñ£¬ÈçÓÐ×ÊÔ´Ôò֪ͨTaskSchedulerImplÈ¥·ÖÅäÕâЩ×ÊÔ´¡£ TaskSchedulerImpl¸ù¾ÝTaskSetµ÷¶È²ßÂÔÓÅÏÈ·ÖÅäTaskSet½ÓÊÕ´Ë×ÊÔ´¡£TaskSetManagerÔÙ¸ù¾ÝÈÎÎñµÄÊý¾Ý±¾µØÐÔ£¬È·¶¨Ìá½»ÄÄЩÈÎÎñ¡£×îÖÕÈÎÎñµÄ±Õ°ü±»SchedulerBackendÐòÁл¯£¬²¢´«Ê䏸Executor½øÐÐÖ´ÐС£

SparkµÄÈÎÎñµ÷¶È

¸ù¾ÝÒÔÉϹý³Ì£¬SparkÖеÄÈÎÎñµ÷¶Èʵ¼ÊÉÏ·ÖÁËÈý¸ö²ã´Î¡£µÚÒ»²ã´ÎÊÇ»ùÓڽ׶εÄÓÐÏòÎÞ»·Í¼½øÐÐStageµÄµ÷¶È£¬µÚ¶þ²ã´ÎÊǸù¾Ýµ÷¶È²ßÂÔ£¨FIFO£¬FAIR£©½øÐÐTaskSetµ÷¶È£¬µÚÈý²ã´ÎÊǸù¾ÝÊý¾Ý±¾µØÐÔ£¨Process£¬Node£¬Rack£©ÔÚTaskSetÄÚ½øÐе÷¶È¡£

ÈÎÎñ¼ÆËã

ÈÎÎñµÄ¼ÆËã¹ý³ÌÊÇÔÚExecutorÉÏÍê³ÉµÄ£¬Executor¼àÌýÀ´×ÔSchedulerBackendµÄÖ¸Á½ÓÊÕµ½ÈÎÎñʱ»áÆô¶¯TaskRunnerÏ߳̽øÐÐÈÎÎñÖ´ÐС£ÔÚTaskRunnerÖÐÊ×ÏȽ«ÈÎÎñºÍÏà¹ØÐÅÏ¢·´ÐòÁл¯£¬È»ºó¸ù¾ÝÏà¹ØÐÅÏ¢»ñÈ¡ÈÎÎñËùÒÀÀµµÄJar°üºÍËùÐèÎļþ£¬Íê³É×¼±¸¹¤×÷ºóÖ´ÐÐÈÎÎñµÄrun·½·¨£¬Êµ¼ÊÉϾÍÊÇÖ´ÐÐShuffleMapTask»òResultTaskµÄrun·½·¨¡£ÈÎÎñÖ´ÐÐÍê±Ïºó½«½á¹û·¢Ë͸øDriver½øÐд¦Àí¡£

ÔÚTask.run·½·¨ÖпÉÒÔ¿´µ½ShuffleMapTaskºÍResultTaskÓÐ×Ų»Í¬µÄ¼ÆËãÂß¼­¡£ShuffleMapTaskÊǽ«ËùÒÀÀµRDDµÄÊä³öдÈëµ½ShuffleWriterÖУ¬ÎªºóÃæµÄShuffle¹ý³Ì×ö×¼±¸¡£ResultTaskÊÇÔÚËùÒÀÀµRDDÉÏÓ¦ÓÃÒ»¸öº¯Êý£¬²¢·µ»Øº¯ÊýµÄ¼ÆËã½á¹û¡£ÔÚÕâÁ½¸öTaskÖÐÖ»ÄÜ¿´µ½Êý¾ÝµÄÊä³ö·½Ê½£¬¶ø¿´²»µ½Ó¦ÓеļÆËãÂß¼­¡£Êµ¼ÊÉϼÆËã¹ý³ÌÊǰüº¬ÔÚRDDÖе쬵÷ÓÃRDD. Iterator·½·¨»ñÈ¡RDDµÄÊý¾Ý½«´¥·¢Õâ¸öRDDµÄ¼ÆË㶯×÷£¨RDD. Iterator£©£¬ÓÉÓÚ´ËRDDµÄ¼ÆËã¹ý³ÌÖÐÒ²»áʹÓÃËùÒÀÀµRDDµÄÊý¾Ý¡£´Ó¶øRDDµÄ¼ÆËã¹ý³Ì½«µÝ¹éÏòÉÏÖ±µ½Ò»¸öÊý¾ÝÔ´ÀàÐ͵ÄRDD£¬ÔٵݹéÏòϼÆËãÿ¸öRDDµÄÖµ¡£ÐèҪעÒâµÄÊÇ£¬ÒÔÉϵļÆËã¹ý³Ì¶¼ÊÇÔÚ·ÖÇøÉϽøÐе쬶ø²»ÊÇÕû¸öÊý¾Ý¼¯£¬¼ÆËãÍê³ÉµÃµ½µÄÊÇ´Ë·ÖÇøÉϵĽá¹û£¬¶ø²»ÊÇ×îÖÕ½á¹û¡£

´ÓRDDµÄ¼ÆËã¹ý³Ì¿ÉÒÔ¿´³ö£¬RDDµÄ¼ÆËã¹ý³ÌÊǰüº¬ÔÚRDDµÄÒÀÀµ¹ØÏµÖеģ¬Ö»ÒªRDDÖ®¼äÊÇÁ¬ÐøÕ­ÒÀÀµ£¬ÄÇô¶à¸ö¼ÆËã¹ý³Ì¾Í¿ÉÒÔÔÚͬһ¸öTaskÖнøÐмÆË㣬Öмä½á¹û¿ÉÒÔÁ¢¼´±»Ï¸ö²Ù×÷ʹÓ㬶øÎÞÐèÔÚ½ø³Ì¼ä¡¢½Úµã¼ä¡¢´ÅÅÌÉϽøÐн»»»¡£

RDD¼ÆËã¹ý³Ì

Shuffle¹ý³Ì

ShuffleÊÇÒ»¸ö¶ÔÊý¾Ý½øÐзÖ×é¾ÛºÏµÄ²Ù×÷¹ý³Ì£¬Ô­Êý¾Ý½«°´ÕÕ¹æÔò½øÐзÖ×飬ȻºóʹÓÃÒ»¸ö¾ÛºÏº¯ÊýÓ¦ÓÃÓÚ·Ö×éÉÏ£¬´Ó¶ø²úÉúÐÂÊý¾Ý¡£Shuffle²Ù×÷µÄÄ¿µÄÊǰÑͬ×éÊý¾Ý·ÖÅäµ½Ïàͬ·ÖÇøÉÏ£¬´Ó¶øÄܹ»ÔÚ·ÖÇøÉϽøÐоۺϼÆË㡣ΪÁËÌá¸ßShuffleÐÔÄÜ£¬»¹¿ÉÒÔÏÈÔÚÔ­·ÖÇø¶ÔÊý¾Ý½øÐоۺϣ¨mapSideCombine£©£¬È»ºóÔÙ·ÖÅ䲿·Ö¾ÛºÏµÄÊý¾Ýµ½Ð·ÖÇø£¬µÚÈý²½ÔÚзÖÇøÉÏÔٴνøÐоۺϡ£

ÔÚ»®·Ö½×¶Îʱ£¬Ö»ÓÐÓöµ½¿íÒÀÀµ²Å»á²úÉúн׶Σ¬²ÅÐèÒªShuffle²Ù×÷¡£¿íÒÀÀµÓëÕ­ÒÀÀµÈ¡¾öÓÚÔ­·ÖÇø±»Ð·ÖÇøµÄʹÓùØÏµ£¬Ö»ÒªÒ»¸öÔ­·ÖÇø»á±»¶à¸öзÖÇøÊ¹Óã¬ÔòΪ¿íÒÀÀµ£¬ÐèÒªShuffle¡£·ñÔòΪխÒÀÀµ£¬²»ÐèÒªShuffle¡£

ÒÔÉÏÒ²¾ÍÊÇ˵ֻÓн׶ÎÓë½×¶ÎÖ®¼äÐèÒªShuffle£¬×îºóÒ»¸ö½×¶Î»áÊä³ö½á¹û£¬Òò´Ë²»ÐèÒªShuffle¡£Àý×ÓÖеijÌÐò»á²úÉúÁ½¸ö½×¶Î£¬µÚÒ»¸öÎÒÃǼò³ÆMap½×¶Î£¬µÚ¶þ¸öÎÒÃǼò³ÆReduce½×¶Î¡£ShuffleÊÇͨ¹ýMap½×¶ÎµÄShuffleMapTaskÓëReduce½×¶ÎµÄShuffledRDDÅäºÏÍê³ÉµÄ¡£ÆäÖÐShuffleMapTask»á°ÑÈÎÎñµÄ¼ÆËã½á¹ûдÈëShuffleWriter£¬ShuffledRDD´ÓShuffleReaderÖжÁÈ¡Êý¾Ý£¬Shuffle¹ý³Ì»áÔÚдÈëºÍ¶ÁÈ¡¹ý³ÌÖÐÍê³É¡£ÒÔHashShuffleΪÀý£¬HashShuffleWriterÔÚдÈëÊý¾Ýʱ£¬»á¾ö¶¨ÊÇ·ñÔÚÔ­·ÖÇø×ö¾ÛºÏ£¬È»ºó¸ù¾ÝÊý¾ÝµÄHashֵдÈëÏàӦзÖÇø¡£HashShuffleReaderÔÙ¸ù¾Ý·ÖÇøºÅÈ¡³öÏàÓ¦Êý¾Ý£¬È»ºó¶ÔÊý¾Ý½øÐоۺϡ£

SparkµÄShuffle¹ý³Ì

¼ÆËã½á¹ûÊÕ¼¯

ResultTaskÈÎÎñ¼ÆËãÍê³Éºó¿ÉÒԵõ½Ã¿¸ö·ÖÇøµÄ¼ÆËã½á¹û£¬´ËʱÐèÒªÔÚDriverÉ϶Խá¹û½øÐлã×Ü´Ó¶øµÃµ½×îÖÕ½á¹û¡£

RDDÔÚÖ´ÐÐcollect£¬countµÈ¶¯×÷ʱ£¬»á¸ø³öÁ½¸öº¯Êý£¬Ò»¸öº¯ÊýÔÚ·ÖÇøÉÏÖ´ÐУ¬Ò»¸öº¯ÊýÔÚ·ÖÇø½á¹û¼¯ÉÏÖ´ÐС£ÀýÈçcollect¶¯×÷ÔÚ·ÖÇøÉÏ£¨ExecutorÖУ©Ö´Ðн«Iteratorת»»ÎªArrayµÄº¯Êý£¬²¢½«´Ëº¯Êý½á¹û·µ»Øµ½Driver¡£Driver ´Ó¶à¸ö·ÖÇøÉϵõ½ArrayÀàÐ͵ķÖÇø½á¹û¼¯£¬È»ºóÔÚ½á¹û¼¯ÉÏ£¨DriverÖУ©Ö´Ðкϲ¢ArrayµÄ²Ù×÷£¬´Ó¶øµÃµ½×îÖÕ½á¹û¡£

×ܽá

Spark¶ÔÓÚRDDµÄÉè¼ÆÊÇÆä¾«ËèËùÔÚ¡£ÓÃRDD²Ù×÷Êý¾ÝµÄ¸Ð¾õ¾ÍÒ»¸ö×Ö£ºË¬£¡¡£Ïëµ½RDD±³ºóÊǼ¸¶ÖÖØµÄ´óÊý¾Ý¼¯£¬¶øÎÒÃÇËæÊÖµ÷ÓÃÏÂmap(), reduce()¾Í¿ÉÒÔ°ÑËüת»»À´×ª»»È¥£¬Ò»ÖÖ°ëÁ½²¦Ç§½ïµÄ¸Ð¾õ¾Í»áÓÍÈ»¶øÉú¡£ÎÒÏëÊÇÒÔÏÂÌØÐÔ¸øÎÒÃÇ´øÀ´ÁËÕâЩ£º

RDD°Ñ²»Í¬À´Ô´£¬²»Í¬ÀàÐ͵ÄÊý¾Ý½øÐÐÁËͳһ£¬Ê¹ÎÒÃÇÃæ¶ÔRDDµÄʱºò¾Í»á²úÉúÒ»ÖÖÐÅÐÄ£¬¾Í»áÈÏΪÕâÊÇijÖÖÀàÐ͵ÄRDD£¬´Ó¶ø¿ÉÒÔ½øÐÐRDDµÄËùÓвÙ×÷¡£

¶ÔRDDµÄ²Ù×÷¿ÉÒÔµþ¼Óµ½Ò»Æð¼ÆË㣬ÎÒÃDz»±Øµ£ÐÄÖмä½á¹ûÍÌͶÔÐÔÄܵÄÓ°Ïì¡£

RDDÌṩÁ˸ü·á¸»µÄÊý¾Ý¼¯²Ù×÷º¯Êý£¬ÕâЩº¯Êý´ó¶¼ÊÇÔÚMapReduce»ù´¡ÉÏÀ©³äµÄ£¬Ê¹ÓÃÆðÀ´ºÜ·½±ã¡£

RDDΪÌṩÁËÒ»¸ö¼ò½àµÄ±à³Ì½çÃæ£¬±³ºó¸´Ôӵķֲ¼Ê½¼ÆËã¹ý³Ì¶Ô¿ª·¢ÕßÊÇ͸Ã÷µÄ¡£´Ó¶øÄܹ»ÈÃÎÒÃǰѹØ×¢µã¸ü¶àµÄ·ÅÔÚÒµÎñÉÏ¡£

   
2485 ´Îä¯ÀÀ       27
Ïà¹ØÎÄÕÂ

»ùÓÚEAµÄÊý¾Ý¿â½¨Ä£
Êý¾ÝÁ÷½¨Ä££¨EAÖ¸ÄÏ£©
¡°Êý¾Ýºþ¡±£º¸ÅÄî¡¢ÌØÕ÷¡¢¼Ü¹¹Óë°¸Àý
ÔÚÏßÉ̳ÇÊý¾Ý¿âϵͳÉè¼Æ ˼·+Ч¹û
 
Ïà¹ØÎĵµ

GreenplumÊý¾Ý¿â»ù´¡Åàѵ
MySQL5.1ÐÔÄÜÓÅ»¯·½°¸
ijµçÉÌÊý¾ÝÖÐ̨¼Ü¹¹Êµ¼ù
MySQL¸ßÀ©Õ¹¼Ü¹¹Éè¼Æ
Ïà¹Ø¿Î³Ì

Êý¾ÝÖÎÀí¡¢Êý¾Ý¼Ü¹¹¼°Êý¾Ý±ê×¼
MongoDBʵս¿Î³Ì
²¢·¢¡¢´óÈÝÁ¿¡¢¸ßÐÔÄÜÊý¾Ý¿âÉè¼ÆÓëÓÅ»¯
PostgreSQLÊý¾Ý¿âʵսÅàѵ
×îл¼Æ»®
DeepSeekÔÚÈí¼þ²âÊÔÓ¦ÓÃʵ¼ù 4-12[ÔÚÏß]
DeepSeek´óÄ£ÐÍÓ¦Óÿª·¢Êµ¼ù 4-19[ÔÚÏß]
UAF¼Ü¹¹ÌåϵÓëʵ¼ù 4-11[±±¾©]
AIÖÇÄÜ»¯Èí¼þ²âÊÔ·½·¨Óëʵ¼ù 5-23[ÉϺ£]
»ùÓÚ UML ºÍEA½øÐзÖÎöÉè¼Æ 4-26[±±¾©]
ÒµÎñ¼Ü¹¹Éè¼ÆÓ뽨ģ 4-18[±±¾©]

APPÍÆ¹ãÖ®ÇÉÓù¤¾ß½øÐÐÊý¾Ý·ÖÎö
Hadoop Hive»ù´¡sqlÓï·¨
Ó¦Óö༶»º´æÄ£Ê½Ö§³Åº£Á¿¶Á·þÎñ
HBase ³¬Ïêϸ½éÉÜ
HBase¼¼ÊõÏêϸ½éÉÜ
Spark¶¯Ì¬×ÊÔ´·ÖÅä

HadoopÓëSpark´óÊý¾Ý¼Ü¹¹
HadoopÔ­ÀíÓë¸ß¼¶Êµ¼ù
HadoopÔ­Àí¡¢Ó¦ÓÃÓëÓÅ»¯
´óÊý¾ÝÌåϵ¿ò¼ÜÓëÓ¦ÓÃ
´óÊý¾ÝµÄ¼¼ÊõÓëʵ¼ù
Spark´óÊý¾Ý´¦Àí¼¼Êõ

GE Çø¿éÁ´¼¼ÊõÓëʵÏÖÅàѵ
º½Ìì¿Æ¹¤Ä³×Ó¹«Ë¾ Nodejs¸ß¼¶Ó¦Óÿª·¢
ÖÐÊ¢Òæ»ª ׿Խ¹ÜÀíÕß±ØÐë¾ß±¸µÄÎåÏîÄÜÁ¦
ijÐÅÏ¢¼¼Êõ¹«Ë¾ PythonÅàѵ
ij²©²ÊITϵͳ³§ÉÌ Ò×ÓÃÐÔ²âÊÔÓëÆÀ¹À
ÖйúÓÊ´¢ÒøÐÐ ²âÊÔ³ÉÊì¶ÈÄ£Ðͼ¯³É(TMMI)
ÖÐÎïÔº ²úÆ·¾­ÀíÓë²úÆ·¹ÜÀí