±à¼ÍƼö: |
±¾ÎÄÀ´×ÔÓÚcsdn£¬ÎÄÕ½²½âRDDµÄÌØµã£¬RDD²Ù×÷º¯ÊýÏà¹Ø£¬´©²å°¸ÀýÀ±½´µÃ¶Î×Ó£¬´ø´ó¼ÒÀí½âMapReduce£¬Í¨¹ý¹þÄ·À×ÌØµ¥´Ê·ÖÎö°¸Àý½øÐÐÉî¶ÈÆÊÎö¡£
|
|
RDD£¨Resilient Distributed Datasetsµ¯ÐÔ·Ö²¼Ê½Êý¾Ý¼¯£©£¬ÊÇsparkÖÐ×îÖØÒªµÄ¸ÅÄ¿ÉÒÔ¼òµ¥µÄ°ÑRDDÀí½â³ÉÒ»¸öÌṩÁËÐí¶à²Ù×÷½Ó¿ÚµÄÊý¾Ý¼¯ºÏ£¬ºÍÒ»°ãÊý¾Ý¼¯²»Í¬µÄÊÇ£¬Æäʵ¼ÊÊý¾Ý·Ö²¼´æ´¢ÓÚÒ»Åú»úÆ÷ÖУ¨ÄÚ´æ»ò´ÅÅÌÖУ©£¬RDD»ìºÏÁ˸÷ÖÖ¼ÆËãÄ£ÐÍ£¬Ê¹µÃSpark¿ÉÒÔÓ¦ÓÃÓÚ¸÷ÖÖ´óÊý¾Ý´¦Àí³¡¾°µ±È»£¬RDD¿Ï¶¨²»»áÕâô¼òµ¥£¬ËüµÄ¹¦ÄÜ»¹°üÀ¨ÈÝ´í¡¢¼¯ºÏÄÚµÄÊý¾Ý¿ÉÒÔ²¢Ðд¦ÀíµÈ¡£RDD¿ÉÒÔcacheµ½ÄÚ´æÖУ¬Ã¿´Î¶ÔRDDÊý¾Ý¼¯µÄ²Ù×÷Ö®ºóµÄ½á¹û£¬¶¼¿ÉÒÔ´æ·Åµ½ÄÚ´æÖУ¬ÏÂÒ»¸ö²Ù×÷¿ÉÒÔÖ±½Ó´ÓÄÚ´æÖÐÊäÈ룬ʡȥÁËMapReduce´óÁ¿µÄ´ÅÅÌIO²Ù×÷¡£
RDDµÄÌØµã
´´½¨£ºÖ»ÄÜͨ¹ýת»» ( transformation £¬Èçmap/filter/groupBy/join
µÈ£¬Çø±ðÓÚ¶¯×÷ action) ´ÓÁ½ÖÖÊý¾ÝÔ´Öд´½¨ RDD
Ö»¶Á£º×´Ì¬²»¿É±ä£¬²»ÄÜÐ޸ġ£
·ÖÇø£ºÖ§³Öʹ RDD ÖеÄÔªËØ¸ù¾ÝÄǸö key À´·ÖÇø ( partitioning ) £¬±£´æµ½¶à¸ö½áµãÉÏ¡£»¹Ôʱֻ»áÖØÐ¼ÆË㶪ʧ·ÖÇøµÄÊý¾Ý£¬¶ø²»»áÓ°ÏìÕû¸öϵͳ¡£
·¾¶£º¼´ RDD Óгä×ãµÄÐÅÏ¢¹ØÓÚËüÊÇÈçºÎ´ÓÆäËû RDD ²úÉú¶øÀ´µÄ¡£
³Ö¾Ã»¯£ºÖ§³Ö½«»á±»ÖØÓÃµÄ RDD »º´æ ( Èç in-memory »òÒç³öµ½´ÅÅÌ )¡£
ÑÓ³Ù¼ÆË㣺 Spark Ò²»áÑÓ³Ù¼ÆËã RDD £¬Ê¹ÆäÄܹ»½«×ª»»¹ÜµÀ»¯ (pipeline transformation)¡£
²Ù×÷£º·á¸»µÄת»»£¨transformation£©ºÍ¶¯×÷ ( action ) £¬ count/reduce/collect/save
µÈ¡£Ö´ÐÐÁ˶àÉÙ´Îtransformation²Ù×÷£¬RDD¶¼²»»áÕæÕýÖ´ÐÐÔËË㣨¼Ç¼lineage£©£¬Ö»Óе±action²Ù×÷±»Ö´ÐÐʱ£¬ÔËËã²Å»á´¥·¢¡£
RDD ·ÖΪ¶þÀࣺtransformation ºÍ action¡£
transformation ÊÇ´ÓÒ»¸ö RDD ת»»ÎªÒ»¸öÐ嵀 RDD »òÕß´ÓÊý¾ÝÔ´Éú³ÉÒ»¸öÐ嵀 RDD£»
action ÊÇ´¥·¢ job µÄÖ´ÐУ¬Ö»ÓÐÔÚ action ±»Ìá½»µÄʱºò²Å´¥·¢Ç°ÃæÕû¸öRDDµÄÖ´ÐÐͼ
RDDÔËÐÐÂß¼

2.1Ò»¸ö¶Î×ÓÀí½âMapReduce?

Àî¶úÎÒÓÐÈý¼þ±¦±´£¬³ÖÓжøÕäÖØËü¡£µÚÒ»¼þ½Ð´È°®£¬µÚ¶þ¼þ½Ð½Ú¼ó£¬µÚÈý¼þ½Ð²»¸Ò´¦ÔÚÖÚÈËÖ®ÏÈ
½ºþ´«ËµÓÀÁ÷´«£º¹È¸è¼¼ÊõÓÐ"Èý±¦"£¬GFS¡¢MapReduceºÍ´ó±í£¨BigTable£©£¡
À±½´¶Î×Ó

×òÌ죬ÎÒÔÚXebiaÓ¡¶È°ì¹«ÊÒ·¢±íÁËÒ»¸ö¹ØÓÚMapReduceµÄÑÝ˵¡£ÑÝ˵½øÐеúÜ˳Àû£¬ÌýÖÚÃǶ¼Äܹ»Àí½âMapReduceµÄ¸ÅÄ¸ù¾ÝËûÃǵķ´À¡£©¡£Îҳɹ¦µØÏò¼¼ÊõÌýÖÚÃÇ£¨½âÊÍÁËMapReduceµÄ¸ÅÄÕâÈÃÎҸе½ÐË·Ü¡£ÔÚËùÓÐÐÁÇڵŤ×÷Ö®ºó£¬ÎÒÃÇÔÚXebiaÓ¡¶È°ì¹«ÊÒÏíÓÃÁË·áÊ¢µÄÍí²Í£¬È»ºóÎÒ¾¶Ö±»ØÁ˼ҡ£
¡¡»Ø¼Òºó£¬Î񵀮Þ×Ó£¨Supriya£©ÎʵÀ£º¡°ÄãµÄ»á¿ªµÃÔõôÑù£¿¡±ÎÒ˵»¹²»´í¡£ ½Ó×ÅËýÓÖÎÊÎÒ»áÒéÊǵÄÄÚÈÝÊÇʲô(Ëý²»ÊÇ´ÓÊÂÈí¼þ»ò±à³ÌÁìÓòµÄ¹¤×÷µÄ)¡£ÎÒ¸æËßËý˵MapReduce¡£¡°Mapduce£¬ÄÇÊÇÊ²Ã´ÍæÒâ¶ù£¿¡±ËýÎʵÀ£º
¡°¸úµØÐÎͼÓйØÂ𣿡±ÎÒ˵²»£¬²»Êǵģ¬ËüºÍµØÐÎͼһµã¹ØÏµÒ²Ã»ÓС£¡°ÄÇô£¬Ëüµ½µ×ÊÇÊ²Ã´ÍæÒâ¶ù£¿¡±ÆÞ×ÓÎʵÀ¡£
¡°ßí¡ÈÃÎÒÃÇÈ¥Dominos(ÅûÈøÁ¬Ëø)°É£¬ÎÒ»áÔÚ²Í×ÀÉϸúÄãºÃºÃ½âÊÍ¡£¡± ÆÞ×Ó˵£º¡°ºÃµÄ¡£¡± È»ºóÎÒÃǾÍÈ¥ÁËÅûÈøµê¡£
ÎÒÃÇÔÚDomionsµã²ÍÖ®ºó£¬¹ñ̨µÄС»ï×Ó¸æËßÎÒÃÇ˵ÅûÈøÐèÒª15·ÖÖÓ²ÅÄÜ×¼±¸ºÃ¡£ÓÚÊÇ£¬ÎÒÎÊÆÞ×Ó£º¡°ÄãÕæµÄÏëҪŪ¶®Ê²Ã´ÊÇMapReduce£¿¡±
ËýºÜ¼á¶¨µÄ»Ø´ð˵¡°Êǵġ±¡£ Òò´ËÎÒÎʵÀ£º
¡¡¡¡ÎÒ£º ÄãÊÇÈçºÎ×¼±¸Ñó´ÐÀ±½·½´µÄ£¿£¨ÒÔϲ¢·Ç׼ȷʳÆ×£¬ÇëÎðÔÚ¼Ò³¢ÊÔ£©
¡¡¡¡ÆÞ×Ó£º ÎÒ»áȡһ¸öÑó´Ð£¬°ÑËüÇÐË飬Ȼºó°èÈëÑκÍË®£¬×îºó·Å½ø»ìºÏÑÐÄ¥»úÀïÑÐÄ¥¡£ÕâÑù¾ÍÄܵõ½Ñó´ÐÀ±½·½´ÁË¡£
¡¡¡¡ÆÞ×Ó£º µ«ÕâºÍMapReduceÓÐʲô¹ØÏµ£¿
¡¡¡¡ÎÒ£º ÄãµÈһϡ£ÈÃÎÒÀ´±àÒ»¸öÍêÕûµÄÇé½Ú£¬ÕâÑùÄã¿Ï¶¨¿ÉÒÔÔÚ15·ÖÖÓÄÚŪ¶®MapReduce.
¡¡¡¡ÆÞ×Ó£º ºÃ°É¡£
¡¡¡¡ÎÒ£ºÏÖÔÚ£¬¼ÙÉèÄãÏëÓñ¡ºÉ¡¢Ñó´Ð¡¢·¬ÇÑ¡¢À±½·¡¢´óËâŪһƿ»ìºÏÀ±½·½´¡£Äã»áÔõô×öÄØ£¿
¡¡¡¡ÆÞ×Ó£º ÎÒ»áÈ¡±¡ºÉÒ¶Ò»´é£¬Ñó´ÐÒ»¸ö£¬·¬ÇÑÒ»¸ö£¬À±½·Ò»¸ù£¬´óËâÒ»¸ù£¬ÇÐËéºó¼ÓÈëÊÊÁ¿µÄÑκÍË®£¬ÔÙ·ÅÈë»ìºÏÑÐÄ¥»úÀïÑÐÄ¥£¬ÕâÑùÄã¾Í¿ÉÒԵõ½Ò»Æ¿»ìºÏÀ±½·½´ÁË¡£
¡¡¡¡ÎÒ£º û´í£¬ÈÃÎÒÃǰÑMapReduceµÄ¸ÅÄîÓ¦Óõ½Ê³Æ×ÉÏ¡£MapºÍReduceÆäʵÊÇÁ½ÖÖ²Ù×÷£¬ÎÒÀ´¸øÄãÏêϸ½²½âÏ¡£Map£¨Ó³É䣩:
°ÑÑó´Ð¡¢·¬ÇÑ¡¢À±½·ºÍ´óËâÇÐË飬ÊǸ÷×Ô×÷ÓÃÔÚÕâЩÎïÌåÉϵÄÒ»¸öMap²Ù×÷¡£ËùÒÔÄã¸øMapÒ»¸öÑó´Ð£¬Map¾Í»á°ÑÑó´ÐÇÐËé¡£
ͬÑùµÄ£¬Äã°ÑÀ±½·£¬´óËâºÍ·¬ÇÑÒ»Ò»µØÄøøMap£¬ÄãÒ²»áµÃµ½¸÷ÖÖËé¿é¡£ ËùÒÔ£¬µ±ÄãÔÚÇÐÏñÑó´ÐÕâÑùµÄÊß²Ëʱ£¬ÄãÖ´ÐоÍÊÇÒ»¸öMap²Ù×÷¡£
Map²Ù×÷ÊÊÓÃÓÚÿһÖÖÊ߲ˣ¬Ëü»áÏàÓ¦µØÉú²ú³öÒ»ÖÖ»ò¶àÖÖËé¿é£¬ÔÚÎÒÃǵÄÀý×ÓÖÐÉú²úµÄÊÇÊ߲˿顣ÔÚMap²Ù×÷ÖпÉÄÜ»á³öÏÖÓиöÑó´Ð»µµôÁ˵ÄÇé¿ö£¬ÄãÖ»Òª°Ñ»µÑó´Ð¶ªÁ˾ÍÐÐÁË¡£ËùÒÔ£¬Èç¹û³öÏÖ»µÑó´ÐÁË£¬Map²Ù×÷¾Í»á¹ýÂ˵ô»µÑó´Ð¶ø²»»áÉú²ú³öÈκεϵÑó´Ð¿é¡£
¡¡¡¡Reduce£¨»¯¼ò£©:ÔÚÕâÒ»½×¶Î£¬Ä㽫¸÷ÖÖÊß²ËËé¶¼·ÅÈëÑÐÄ¥»úÀï½øÐÐÑÐÄ¥£¬Äã¾Í¿ÉÒԵõ½Ò»Æ¿À±½·½´ÁË¡£ÕâÒâÎ¶ÒªÖÆ³ÉһƿÀ±½·½´£¬ÄãµÃÑÐÄ¥ËùÓеÄÔÁÏ¡£Òò´Ë£¬ÑÐÄ¥»úͨ³£½«map²Ù×÷µÄÊß²ËËé¾Û¼¯ÔÚÁËÒ»Æð¡£
¡¡¡¡ÆÞ×Ó£º ËùÒÔ£¬Õâ¾ÍÊÇMapReduce?
¡¡¡¡ÎÒ£º Äã¿ÉÒÔ˵ÊÇ£¬Ò²¿ÉÒÔ˵²»ÊÇ¡£ ÆäʵÕâÖ»ÊÇMapReduceµÄÒ»²¿·Ö£¬MapReduceµÄÇ¿´óÔÚÓÚ·Ö²¼Ê½¼ÆËã¡£
¡¡¡¡ÆÞ×Ó£º ·Ö²¼Ê½¼ÆË㣿 ÄÇÊÇʲô£¿Çë¸øÎÒ½âÊÍϰɡ£
¡¡¡¡ÎÒ£º ûÎÊÌâ¡£
¡¡¡¡ÎÒ£º ¼ÙÉèÄã²Î¼ÓÁËÒ»¸öÀ±½·½´±ÈÈü²¢ÇÒÄãµÄʳÆ×Ó®µÃÁË×î¼ÑÀ±½·½´½±¡£µÃ½±Ö®ºó£¬À±½·½´Ê³Æ×´óÊÜ»¶Ó£¬ÓÚÊÇÄãÏëÒª¿ªÊ¼³öÊÛ×ÔÖÆÆ·ÅÆµÄÀ±½·½´¡£¼ÙÉèÄãÿÌìÐèÒªÉú²ú10000Æ¿À±½·½´£¬Äã»áÔõô°ìÄØ£¿
¡¡¡¡ÆÞ×Ó£º ÎÒ»áÕÒÒ»¸öÄÜΪÎÒ´óÁ¿ÌṩÔÁϵũӦÉÌ¡£
¡¡¡¡ÎÒ£ºÊǵÄ..¾ÍÊÇÄÇÑùµÄ¡£ÄÇÄãÄÜ·ñ¶À×ÔÍê³ÉÖÆ×÷ÄØ£¿Ò²¾ÍÊÇ˵£¬¶À×Ô½«ÔÁ϶¼ÇÐË飿 ½ö½öÒ»²¿ÑÐÄ¥»úÓÖÊÇ·ñÄÜÂú×ãÐèÒª£¿¶øÇÒÏÖÔÚ£¬ÎÒÃÇ»¹ÐèÒª¹©Ó¦²»Í¬ÖÖÀàµÄÀ±½·½´£¬ÏñÑó´ÐÀ±½·½´¡¢ÇཷÀ±½·½´¡¢·¬ÇÑÀ±½·½´µÈµÈ¡£
¡¡¡¡ÆÞ×Ó£º µ±È»²»ÄÜÁË£¬ÎÒ»á¹ÍÓ¶¸ü¶àµÄ¹¤ÈËÀ´ÇÐÊ߲ˡ£ÎÒ»¹ÐèÒª¸ü¶àµÄÑÐÄ¥»ú£¬ÕâÑùÎҾͿÉÒÔ¸ü¿ìµØÉú²úÀ±½·½´ÁË¡£
¡¡¡¡ÎÒ£ºÃ»´í£¬ËùÒÔÏÖÔÚÄã¾Í²»µÃ²»·ÖÅ乤×÷ÁË£¬Ä㽫ÐèÒª¼¸¸öÈËÒ»ÆðÇÐÊ߲ˡ£Ã¿¸öÈ˶¼Òª´¦ÀíÂúÂúÒ»´üµÄÊ߲ˣ¬¶øÃ¿Ò»¸öÈ˶¼Ï൱ÓÚÔÚÖ´ÐÐÒ»¸ö¼òµ¥µÄMap²Ù×÷¡£Ã¿Ò»¸öÈ˶¼½«²»¶ÏµÄ´Ó´ü×ÓÀïÄóöÊß²ËÀ´£¬²¢ÇÒÿ´ÎÖ»¶ÔÒ»ÖÖÊ߲˽øÐд¦Àí£¬Ò²¾ÍÊǽ«ËüÃÇÇÐË飬ֱµ½´ü×Ó¿ÕÁËΪֹ¡£
¡¡¡¡ÕâÑù£¬µ±ËùÓеŤÈ˶¼ÇÐÍêÒԺ󣬹¤×÷̨£¨Ã¿¸öÈ˹¤×÷µÄµØ·½£©ÉϾÍÓÐÁËÑó´Ð¿é¡¢·¬Çѿ顢ºÍËâÈØµÈµÈ¡£ ¡¡¡¡ÆÞ×Ó£ºµ«ÊÇÎÒÔõô»áÖÆÔì³ö²»Í¬ÖÖÀàµÄ·¬Çѽ´ÄØ£¿
¡¡¡¡ÎÒ£ºÏÖÔÚÄã»á¿´µ½MapReduceÒÅ©µÄ½×¶Î¡ª½Á°è½×¶Î¡£MapReduce½«ËùÓÐÊä³öµÄÊß²ËËé¶¼½Á°èÔÚÁËÒ»Æð£¬ÕâЩÊß²ËËé¶¼ÊÇÔÚÒÔkeyΪ»ù´¡µÄmap²Ù×÷ϲúÉúµÄ¡£½Á°è½«×Ô¶¯Íê³É£¬Äã¿ÉÒÔ¼ÙÉèkeyÊÇÒ»ÖÖÔÁϵÄÃû×Ö£¬¾ÍÏñÑó´ÐÒ»Ñù¡£
ËùÒÔÈ«²¿µÄÑó´Ðkeys¶¼»á½Á°èÔÚÒ»Æð£¬²¢×ªÒƵ½ÑÐÄ¥Ñó´ÐµÄÑÐÄ¥Æ÷Àï¡£ÕâÑù£¬Äã¾ÍÄܵõ½Ñó´ÐÀ±½·½´ÁË¡£Í¬ÑùµØ£¬ËùÓеķ¬ÇÑÒ²»á±»×ªÒƵ½±ê¼Ç×Å·¬ÇѵÄÑÐÄ¥Æ÷À²¢ÖÆÔì³ö·¬ÇÑÀ±½·½´¡£
¡¡¡¡ÅûÈøÖÕÓÚ×öºÃÁË£¬Ëýµãµãͷ˵ËýÒѾŪ¶®Ê²Ã´ÊÇMapReduceÁË¡£ÎÒֻϣÍûÏ´ÎËýÌýµ½MapReduceʱ£¬ÄܸüºÃµÄÀí½âÎÒµ½µ×ÔÚ×öЩʲô¡£
ÍøÉÏÆäËûÈËÓÃ×î¼ò¶ÌµÄÓïÑÔ½âÊÍMapReduce£º
We want to count all the books in the library. You
count up shelf #1, I count up shelf #2. That¡¯s map.
The more people we get, the faster it goes.
¡¡¡¡ÎÒÃÇÒªÊýͼÊé¹ÝÖеÄËùÓÐÊé¡£ÄãÊý1ºÅÊé¼Ü£¬ÎÒÊý2ºÅÊé¼Ü¡£Õâ¾ÍÊÇ¡°Map¡±¡£ÎÒÃÇÈËÔ½¶à£¬ÊýÊé¾Í¸ü¿ì¡£
¡¡¡¡Now we get together and add our individual counts.
That¡¯s reduce.
¡¡¡¡ÏÖÔÚÎÒÃǵ½Ò»Æð£¬°ÑËùÓÐÈ˵Äͳ¼ÆÊý¼ÓÔÚÒ»Æð¡£Õâ¾ÍÊÇ¡°Reduce¡±
2.2RDD²Ù×÷º¯Êý
transformation

action

º¯Êý¾ßÌ幦ÄܽéÉÜ
<1>½«Ò»¸öRDDÖеÄÿ¸öÊý¾ÝÏͨ¹ýmapÖеĺ¯ÊýÓ³Éä±äΪһ¸öеÄÔªËØ,
ÊäÈë·ÖÇøÓëÊä³ö·ÖÇøÒ»¶ÔÒ»£¬¼´£ºÓжàÉÙ¸öÊäÈë·ÖÇø£¬¾ÍÓжàÉÙ¸öÊä³ö·ÖÇø.
data = sc.textFile("/tmp/hive/tmp/1.txt")
data: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[1]
at textFile at :21
//ʹÓÃmapËã×Ó
mapresult = data.map()
mapresult: org.apache.spark.rdd.RDD[Array[String]]
= MapPartitionsRDD[2] at map at :23
|
<2>flatMap,ÊôÓÚTransformationËã×Ó£¬µÚÒ»²½ºÍmapÒ»Ñù£¬×îºó½«ËùÓеÄÊä³ö·ÖÇøºÏ²¢³ÉÒ»¸ö
<3>count·µ»ØRDDÖеÄÔªËØÊýÁ¿
<4>reduce,¸ù¾ÝÓ³É亯Êý¶ÔRDDÀïµÄÔªËØ½øÐжþÔª¼ÆËã½á¹û£¬·µ»Ø¼ÆËã½á¹û
rdd.reduce(lambda
x,y:x.a+y.a,x.b+y.b) |
<5>collectÓÃÓÚ½«Ò»¸öRDDת»»³ÉÊý×é
<6>countByKey(),countByKeyÓÃÓÚͳ¼ÆRDD[K,V]ÖÐÿ¸öKµÄÊýÁ¿
<7>foreach(),foreachÓÃÓÚ±éÀúRDD,½«º¯ÊýfÓ¦ÓÃÓÚÿһ¸öÔªËØ?<8>saveAsTextFileÓÃÓÚ½«RDDÒÔÎı¾ÎļþµÄ¸ñʽ´æ´¢µ½ÎļþϵͳÖÐ
rdd.saveAsTextFile("/tmp/hive/itcast/python-bigdata.txt") |
2.3¹þÄ·À×ÌØµ¥´Ê·ÖÎö°¸Àý
hdfsÎļþ²Ù×÷
<1>½«±¾µØµÄHamlet.txtÉÏ´«µ½hdfsÉÏ
hadoop fs -put
Hamlet.txt /tmp/hive/itcast/ |
ÆäËû²Ù×÷
»ñÈ¡hdfsÎļþµ½±¾µØ
hadoop fs -get
/tmp/hive/itcast/python.txt ./ |
ÁгöhdfsÎļþϵͳ¸ùĿ¼ÏµÄĿ¼ºÍÎļþ
rm²Ù×÷
hadoop fs -rm
< hdfs file > ...
hadoop fs -rm -r < hdfs dir>...
ÿ´Î¿ÉÒÔɾ³ý¶à¸öÎļþ»òĿ¼ |
sparkÔËÐÐÔÀí³ÌÐò·ÖÎö
SparkÓ¦ÓÃ×÷Ϊ¶ÀÁ¢µÄ½ø³ÌÔËÐУ¬ÓÉÇý¶¯³ÌÐòÖеÄSparkContextе÷¡£Õâ¸öcontext½«»áÁ¬½Óµ½Ò»Ð©¼¯Èº¹ÜÀíÕߣ¨ÈçYARN£©£¬ÕâЩ¹ÜÀíÕß·ÖÅäϵͳ×ÊÔ´¡£¼¯ÈºÉϵÄÿ¸öworkerÓÉÖ´ÐÐÕߣ¨executor£©¹ÜÀí£¬Ö´ÐÐÕß·´¹ýÀ´ÓÉSparkContext¹ÜÀí¡£Ö´ÐÐÕß¹ÜÀí¼ÆËã¡¢´æ´¢£¬»¹ÓÐÿ̨»úÆ÷ÉϵĻº´æ¡£
sc
¿ªÆôsparkºó£¬SparkContext¼òÂÔд·¨sc,¶ÔÓÚsc¿ÉÒÔ½øÐÐÎļþµÄ¶ÁÈ¡ÒÔ¼°½ÚµãÖ®¼äµÄе÷
<1>¶ÁÈ¡Îļþ²¢×ª»»³ÉÒ»¸öRDDÊý¾Ý¼¯
sc.textFile("itcast-python.txt") |
<2>½«Êý¾Ý´ÓÒ»¸ö½Úµã·¢Ë͵½ÆäËü½ÚµãÉÏ
#¶ÁÈ¡RDDÀàÐ͵ÄÊý¾Ý¼¯£¬²¢ÇÒ·µ»ØÒ»¸öbroadcastÀàÐ͵ÄÊý¾Ý
#¶ÁÈ¡RDDÀàÐ͵ÄÊý¾Ý¼¯£¬²¢ÇÒ·µ»ØÒ»¸öbroadcastÀàÐ͵ÄÊý¾Ý
bdata = sc.broadcast(data) |
from operator
import add
text = sc.textFile("/tmp/hive/itcast/Hamlet.txt")
def tokenize(text):
return text.split()
words = text.flatMap(tokenize)
wc = words.map(lambda x: (x,1))
counts = wc.reduceByKey(add)
counts.saveAsTextFile("/tmp/hive/itcast/hm")
|
|