Editor's pick:
This article comes from CSDN. It introduces Spark, a memory-based iterative computing framework suited to applications that need to operate on a particular dataset many times.
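Spark can read and write data on HDFS directly, and it also supports Spark on YARN. Spark can run in the same cluster as MapReduce and share storage and compute resources with it. The data warehouse Shark borrows its implementation from Hive and is almost fully compatible with Hive.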
Core Concepts of Spark
1. Resilient Distributed Dataset (RDD)
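The RDD is Spark's most basic abstraction. It abstracts the use of distributed memory and lets you operate on a distributed dataset the same way you operate on a local collection. The RDD is the core of Spark: it represents a partitioned, immutable collection of data that can be operated on in parallel, and different dataset formats correspond to different RDD implementations. An RDD must be serializable. An RDD can be cached in memory: the result of each operation on an RDD can be kept in memory, and the next operation can take its input directly from memory, which avoids the heavy disk I/O of MapReduce. For machine learning algorithms, where iterative computation is common, and for interactive data mining, the efficiency gain is considerable.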
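The effect of keeping a dataset in memory across iterations can be illustrated with a small toy model. This is plain Python, not Spark code: ToyDataset, iterate, and the loads counter are hypothetical names invented for this sketch, and the "slow storage" is simulated by a function call.

```python
class ToyDataset:
    """Toy stand-in for a dataset whose source lives on slow storage."""

    def __init__(self, load_fn):
        self._load_fn = load_fn   # simulates an expensive read from disk
        self._cached = None       # in-memory copy, filled after cache() is requested
        self._use_cache = False
        self.loads = 0            # how many times the slow source was read

    def cache(self):
        """Ask for the data to be kept in memory after the first read."""
        self._use_cache = True
        return self

    def data(self):
        if self._use_cache and self._cached is not None:
            return self._cached   # served from memory, no storage read
        self.loads += 1
        records = self._load_fn()
        if self._use_cache:
            self._cached = records
        return records


def iterate(dataset, rounds):
    """A simple iterative job: every round scans the whole dataset again."""
    total = 0
    for _ in range(rounds):
        total += sum(dataset.data())
    return total


uncached = ToyDataset(lambda: list(range(1000)))
cached = ToyDataset(lambda: list(range(1000))).cache()

iterate(uncached, rounds=10)
iterate(cached, rounds=10)
print("storage reads without cache:", uncached.loads)
print("storage reads with cache:", cached.loads)
```

The uncached dataset goes back to storage once per round, while the cached one reads storage only on the first round. This is the saving the paragraph above attributes to in-memory RDDs.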
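RDD characteristics:
1) It is an immutable, partitioned collection object spread across cluster nodes.
2) It is created through parallel transformations (map, filter, join, etc.).
3) It is rebuilt automatically on failure.
4) Its storage level (memory, disk, etc.) can be controlled for reuse.
5) It must be serializable.
6) It is statically typed.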
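RDD benefits:
1) An RDD can only be produced from persistent storage or through Transformations, so fault tolerance can be implemented more efficiently than with distributed shared memory (DSM): a lost data partition can be recomputed from its lineage alone, with no need for a dedicated checkpoint.
2) Because RDDs are immutable, speculative execution in the style of Hadoop MapReduce is possible.
3) Because RDDs are partitioned, performance can be improved through data locality, just as in Hadoop MapReduce.
4) RDDs are serializable, so when memory is insufficient they degrade automatically to disk storage. Performance then drops considerably, but it is no worse than current MapReduce.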
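RDD storage and partitioning:
1) Users can choose among different storage levels to store an RDD for reuse.
2) RDDs are currently stored in memory by default, but when memory is insufficient an RDD spills to disk.
3) When an RDD needs to be partitioned to distribute its data across the cluster, it is partitioned by the key of each record (for example, hash partitioning), which keeps joins between two datasets efficient.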
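Why key-based partitioning helps joins can be sketched in a few lines of plain Python. This is a toy model rather than Spark's implementation: hash_partition, join_partition, and co_partitioned_join are hypothetical helpers, and the sample records are made up. The point is that when both datasets are partitioned with the same function, partition i of one side only ever needs partition i of the other.

```python
def hash_partition(records, num_partitions):
    """Place each (key, value) record in the partition chosen by hash(key)."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


def join_partition(left_part, right_part):
    """Join two partitions that were built with the same partitioning function."""
    right_index = {}
    for key, value in right_part:
        right_index.setdefault(key, []).append(value)
    return [(key, (left_value, right_value))
            for key, left_value in left_part
            for right_value in right_index.get(key, [])]


def co_partitioned_join(left, right, num_partitions):
    """Join partition i of the left side with partition i of the right side only."""
    left_parts = hash_partition(left, num_partitions)
    right_parts = hash_partition(right, num_partitions)
    joined = []
    for left_part, right_part in zip(left_parts, right_parts):
        joined.extend(join_partition(left_part, right_part))
    return joined


# Made-up sample data: integer user ids as keys.
users = [(1, "alice"), (2, "bob"), (3, "carol")]
orders = [(1, "book"), (3, "pen"), (1, "lamp")]
print(sorted(co_partitioned_join(users, orders, num_partitions=2)))
```

No partition pair (i, j) with i different from j is ever examined, so in a cluster setting each pair of matching partitions could be joined without moving the other partitions around.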
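Internal representation of an RDD:
1) A list of partitions (a list of data blocks)
2) A function for computing each split (computing this RDD from its parent RDDs)
3) A list of dependencies on parent RDDs
4) A Partitioner for key-value RDDs [optional]
5) A list of predefined (preferred) locations for each data split, for example the addresses of data blocks on HDFS [optional]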
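The five items above can be pictured as the fields of a small record. The sketch below is a plain-Python illustration under that assumption, not Spark's actual class: ToyRDD, parallelize, map_rdd, and the node names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ToyRDD:
    partitions: List[int]                                   # 1) list of partition ids
    compute: Callable[[int], list]                          # 2) function that computes one split
    dependencies: List["ToyRDD"] = field(default_factory=list)   # 3) parent RDDs
    partitioner: Optional[Callable[[object], int]] = None   # 4) optional, for key-value data
    preferred_locations: Dict[int, List[str]] = field(default_factory=dict)  # 5) optional

    def collect(self):
        """Compute every split and concatenate the results."""
        out = []
        for split in self.partitions:
            out.extend(self.compute(split))
        return out


def parallelize(blocks, locations=None):
    """Build a source ToyRDD with one partition per input block and no parents."""
    return ToyRDD(partitions=list(range(len(blocks))),
                  compute=lambda split: list(blocks[split]),
                  preferred_locations=locations or {})


def map_rdd(parent, fn):
    """Derive a child ToyRDD whose split i is computed from the parent's split i."""
    return ToyRDD(partitions=list(parent.partitions),
                  compute=lambda split: [fn(x) for x in parent.compute(split)],
                  dependencies=[parent],
                  preferred_locations=parent.preferred_locations)


# Hypothetical node names stand in for block addresses.
base = parallelize([[1, 2], [3, 4]], locations={0: ["node-a"], 1: ["node-b"]})
doubled = map_rdd(base, lambda x: x * 2)
print(doubled.collect())
print(doubled.preferred_locations)
```

Note that the child holds no data of its own. It only knows its parent and how to compute a split from it, which is what makes recomputation possible later.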
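RDD storage levels: combinations of four parameters, useDisk, useMemory, deserialized, and replication, give an RDD 11 storage levels. RDDs define a variety of operations; different kinds of data are represented by different RDD class abstractions, and the different operations are likewise implemented through the RDD abstraction.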
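One way to see how four parameters yield 11 levels is to enumerate them. The sketch below is plain Python, not Spark's StorageLevel class. The level names follow the constants as recalled from Spark's StorageLevel object of that era (NONE, DISK_ONLY, MEMORY_ONLY, and so on, with _2 variants for two replicas); treat the exact names and flag values as an assumption to verify against your Spark version.

```python
from collections import namedtuple

# Field names mirror the four parameters named in the text.
StorageLevel = namedtuple("StorageLevel", "use_disk use_memory deserialized replication")

# (use_disk, use_memory, deserialized) for the five base levels (assumed values).
BASE_LEVELS = {
    "DISK_ONLY": (True, False, False),
    "MEMORY_ONLY": (False, True, True),
    "MEMORY_ONLY_SER": (False, True, False),
    "MEMORY_AND_DISK": (True, True, True),
    "MEMORY_AND_DISK_SER": (True, True, False),
}

# NONE stores nothing; each base level also has a variant kept on two nodes.
LEVELS = {"NONE": StorageLevel(False, False, False, 1)}
for name, flags in BASE_LEVELS.items():
    LEVELS[name] = StorageLevel(*flags, 1)
    LEVELS[name + "_2"] = StorageLevel(*flags, 2)

for name, level in LEVELS.items():
    print(name, level)
print("number of levels:", len(LEVELS))
```

Five base combinations, each with a one-replica and a two-replica variant, plus NONE, account for the 11 levels mentioned above.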
There are two ways to create an RDD:
1) From input in a Hadoop file system (or another Hadoop-compatible storage system), for example HDFS.
2) By transforming a parent RDD into a new RDD.
2. Spark on Mesos
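Spark supports two modes, local invocation and a Mesos cluster. An algorithm developed on Spark can be debugged in local mode and then switched directly to run on a Mesos cluster; apart from where files are saved, the algorithm in theory needs no changes. Spark's local mode supports multithreading and offers some single-machine concurrency, though it is not especially powerful. Local mode can save results locally or to a distributed file system, whereas Mesos mode must save them to a distributed or shared file system.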
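To run on the Mesos framework, Spark follows the Mesos specification and design and implements two classes: a SparkScheduler, whose class name in Spark is MesosScheduler, and a SparkExecutor, whose class name in Spark is Executor. With these two classes, Spark can perform distributed computation through Mesos. Spark converts RDDs and MapReduce functions into a standard Job and a series of Tasks and submits them to the SparkScheduler. The SparkScheduler submits the Tasks to the Mesos Master, which assigns them to different Slaves. Finally, the Spark Executor on each Slave runs its assigned Tasks one by one and returns the results, which form a new RDD or are written directly to the distributed file system.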
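The flow just described (a job becomes tasks, tasks are handed to slaves, executors run them and return results) can be mimicked with a toy simulation. This is plain Python with hypothetical helpers (make_tasks, assign, run_on_slaves). The round-robin assignment is an assumption made for brevity; the real Mesos Master allocates work through resource offers rather than round-robin.

```python
def make_tasks(partitions, fn):
    """Turn a job (a function applied to every partition) into one task per partition."""
    return [(index, fn, part) for index, part in enumerate(partitions)]


def assign(tasks, slaves):
    """Toy master: hand tasks to slaves in round-robin order (assumption)."""
    plan = {slave: [] for slave in slaves}
    for i, task in enumerate(tasks):
        plan[slaves[i % len(slaves)]].append(task)
    return plan


def run_on_slaves(plan):
    """Toy executors: each slave runs its tasks one by one and reports the results."""
    results = {}
    for slave, tasks in plan.items():
        for index, fn, part in tasks:
            results[index] = fn(part)
    # Reassemble the outputs in partition order to form the new dataset.
    return [results[index] for index in sorted(results)]


partitions = [[1, 2], [3, 4], [5, 6]]
tasks = make_tasks(partitions, lambda part: [x * x for x in part])
plan = assign(tasks, ["slave-1", "slave-2"])
new_partitions = run_on_slaves(plan)
print({slave: [index for index, _, _ in ts] for slave, ts in plan.items()})
print(new_partitions)
```

Everything here runs sequentially in one process; the sketch only shows who hands what to whom, not actual distribution.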

3. Transformations & Actions
An RDD supports two kinds of computation: transformations (the return value is still an RDD) and actions (the return value is not an RDD).
Transformations (for example map, filter, groupBy, join) are lazy: an operation that derives one RDD from another is not executed immediately. When Spark encounters a Transformation it only records that the operation is needed and does not run it; the computation actually starts only when an Action is invoked.
Actions (for example count, collect, save) return a result or write the RDD's data to a storage system. Actions are what trigger Spark to start computing.
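The lazy behaviour can be reproduced in miniature. The class below is a plain-Python toy, not Spark's API: LazyRDD and traced_double are hypothetical names, and the transformations only append to a recorded list of operations until an action runs them.

```python
class LazyRDD:
    """Toy dataset that records transformations and runs them only on an action."""

    def __init__(self, source, ops=()):
        self._source = source
        self._ops = tuple(ops)      # recorded, not yet executed

    # Transformations: return a new LazyRDD, compute nothing.
    def map(self, fn):
        return LazyRDD(self._source, self._ops + (("map", fn),))

    def filter(self, predicate):
        return LazyRDD(self._source, self._ops + (("filter", predicate),))

    def _run(self):
        data = iter(self._source)
        for kind, fn in self._ops:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return data

    # Actions: return a plain value and trigger the recorded operations.
    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())


calls = []


def traced_double(x):
    calls.append(x)   # lets us observe when the function really runs
    return x * 2


rdd = LazyRDD([1, 2, 3, 4]).map(traced_double).filter(lambda x: x > 4)
print("calls before any action:", len(calls))
print("collect:", rdd.collect())
print("calls after the action:", len(calls))
```

Building the chain of map and filter leaves the call log empty; only collect (or count) makes traced_double run.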
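The essential difference is this. A Transformation returns an RDD. It uses a chained-call design: computing on one RDD turns it into another RDD, which can in turn be transformed again, and this process is distributed. An Action does not return an RDD. It returns an ordinary Scala collection, a single value, or nothing, and in the end either returns its result to the Driver program or writes the RDD to the file system. Both kinds of operation are described in more detail in the Spark development guide, and they are the core of developing on Spark. The original article illustrates the difference between the two with a figure adapted from the official Spark slides.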

4. Lineage
Using memory to speed up data loading is also done in many other in-memory databases and cache systems. What mainly sets Spark apart is its approach to data fault tolerance (node failure, data loss) in a distributed computing environment. To keep the data in an RDD robust, an RDD dataset remembers, through its so-called lineage, how it was derived from other RDDs. Whereas other systems rely on fine-grained backup or logging at the level of individual in-memory data updates, an RDD's lineage records coarse-grained Transformation operations (filter, map, join, etc.). When some partitions of an RDD are lost, the lineage holds enough information to recompute and recover them. This coarse-grained data model limits where Spark can be applied, but compared with a fine-grained data model it also brings a performance gain.
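Recovery through lineage can be sketched with a toy class that stores, for each dataset, its parent and the function that rebuilds one partition. This is plain Python for illustration only; LineageRDD, lose, and the sample blocks are hypothetical and do not reflect Spark's internals.

```python
class LineageRDD:
    """Toy dataset that remembers how each partition is derived from its parent."""

    def __init__(self, num_partitions, compute, parent=None, name="source"):
        self.num_partitions = num_partitions
        self._compute = compute     # coarse-grained recipe for one partition
        self.parent = parent
        self.name = name
        self._store = {}            # materialized partitions; entries may be lost

    def map(self, fn, name="map"):
        parent = self
        return LineageRDD(self.num_partitions,
                          lambda i: [fn(x) for x in parent.get(i)],
                          parent=self, name=name)

    def get(self, i):
        """Return partition i, recomputing it from the lineage if it is missing."""
        if i not in self._store:
            self._store[i] = self._compute(i)
        return self._store[i]

    def lose(self, i):
        """Simulate a node failure that drops partition i from memory."""
        self._store.pop(i, None)

    def lineage(self):
        chain, node = [], self
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return chain


blocks = [[1, 2, 3], [4, 5, 6]]                  # stands in for stable storage
raw = LineageRDD(2, lambda i: list(blocks[i]))
squares = raw.map(lambda x: x * x, name="square")

before = [squares.get(i) for i in range(2)]
squares.lose(1)                                   # partition 1 is lost ...
raw.lose(1)                                       # ... on both datasets
after = squares.get(1)                            # rebuilt from lineage alone
print(squares.lineage(), after)
```

Only the lost partition is rebuilt, and nothing was logged per record: the recipe "apply square to the parent's partition 1" is enough.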
In terms of lineage, RDD dependencies fall into two kinds, Narrow Dependencies and Wide Dependencies, a distinction that makes fault recovery efficient.
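A Narrow Dependency means that each partition of the parent RDD is used by at most one partition of the child RDD. It shows up as one parent partition mapping to one child partition, or as several parent partitions mapping to one child partition; in other words, a single partition of a parent RDD can never map to multiple partitions of a child RDD.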
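A Wide Dependency means that a partition of the child RDD depends on multiple or all partitions of the parent RDD; in other words, some partition of a parent RDD maps to multiple partitions of a child RDD. With Wide Dependencies the inputs and outputs of the computation sit on different nodes. If the input nodes are intact and an output node goes down, recomputation through lineage recovers the data and fault tolerance works. Otherwise it does not, because the step cannot simply be retried; recovery has to trace back through the ancestors to find a point from which it can be retried (this is what lineage means). Recomputing data under Narrow Dependencies costs far less than recomputing it under Wide Dependencies.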
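The difference in recomputation cost can be made concrete by asking which parent partitions must be read again to rebuild one lost child partition. The functions below are a plain-Python sketch with hypothetical names (is_narrow, narrow_parents, wide_parents); the wide case assumes a hash-based regrouping by key, in the style of a groupBy.

```python
def is_narrow(parent_to_children):
    """Definition from the text: every parent partition feeds at most one child partition."""
    return all(len(children) <= 1 for children in parent_to_children.values())


def narrow_parents(child_partition):
    """map/filter style: child partition i is rebuilt from parent partition i only."""
    return {child_partition}


def wide_parents(child_partition, parent_partitions, num_child_partitions):
    """Regrouping by key (assumed hash-based): every parent partition that holds a key
    hashing to the lost child partition has to be read again."""
    needed = set()
    for index, records in enumerate(parent_partitions):
        if any(hash(key) % num_child_partitions == child_partition for key, _ in records):
            needed.add(index)
    return needed


# Made-up parent data: three partitions of (key, value) records with integer keys.
parent = [[(0, "a"), (1, "b")], [(2, "c"), (3, "d")], [(4, "e"), (5, "f")]]

print(is_narrow({0: {0}, 1: {1}, 2: {2}}))        # one-to-one mapping
print(is_narrow({0: {0}, 1: {0}}))                # several parents into one child
print(is_narrow({0: {0, 1}, 1: {0, 1}}))          # one parent feeds several children
print(narrow_parents(1))
print(wide_parents(1, parent, num_child_partitions=2))
```

In this sample, rebuilding one child partition under the narrow dependency touches a single parent partition, while under the wide dependency it touches all three.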
RDD computation can be made fault tolerant through checkpointing, which can be done in two ways: checkpointing the data, or logging the updates. Users can control which approach is used. The default is logging the updates: all the transformations that produced an RDD are tracked, that is, the lineage of each RDD is recorded, so that lost partition data can be recomputed.
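The trade-off between the two approaches can be shown with a toy recovery function: with only the logged transformations, every step is replayed from the source, while a saved checkpoint lets recovery start from a later point. This is a plain-Python sketch; recover and its checkpoints argument are hypothetical and not part of Spark.

```python
def recover(source, transformations, checkpoints=None):
    """Rebuild the final dataset, replaying as few transformations as possible.

    checkpoints maps a step number k to the saved output of the first k
    transformations. Returns (data, number_of_steps_replayed).
    """
    checkpoints = checkpoints or {}
    usable = [k for k in checkpoints if 0 < k <= len(transformations)]
    start = max(usable, default=0)
    data = list(checkpoints[start]) if start else list(source)
    for fn in transformations[start:]:
        data = [fn(x) for x in data]
    return data, len(transformations) - start


raw_values = [1, 2, 3]
steps = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]

# Logging the updates only: the whole lineage is replayed.
print(recover(raw_values, steps))

# Checkpointed data after step 2: only the last step is replayed.
print(recover(raw_values, steps, checkpoints={2: [4, 6, 8]}))
```

Both calls produce the same data; the difference is the amount of work replayed versus the cost of having written the checkpoint in the first place.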
Resource Management and Job Scheduling in Spark
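For resource management and job scheduling, Spark can use local mode, Standalone mode, Apache Mesos, or Hadoop YARN. Spark on YARN was introduced in Spark 0.6, but it only became truly usable in the current branch-0.8. Spark on YARN follows the official YARN specification, and because Spark was designed from the start to support multiple Schedulers and Executors, supporting YARN was straightforward. The original article shows a rough architecture diagram of Spark on YARN at this point.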

Running Spark on YARN so that it shares cluster resources with Hadoop improves resource utilization.
Programming Interface
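Spark exposes RDD operations through language integration, similar to DryadLINQ and FlumeJava. Each dataset is represented as an RDD object, and operations on the dataset are expressed as operations on that object. Spark's main programming language is Scala, chosen for its conciseness (Scala is convenient to use interactively) and its performance (it is a statically and strongly typed language on the JVM).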
Like Hadoop MapReduce, Spark consists of a Master (similar to MapReduce's JobTracker) and Workers (Spark's slave worker nodes).
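A Spark program written by the user is called the Driver program. The Driver program connects to the master and defines the transformations and actions on each RDD. These are expressed as Scala closures (function literals). Scala represents closures as Java objects, and they are serializable, which is how the closure operations on an RDD are sent to the Worker nodes. Workers hold the data blocks and share the cluster's memory. A Worker is a daemon running on a worker node; when it receives an operation on an RDD, it performs the operation locally according to the data split information, then produces new data splits, returns a result, or writes the RDD to the storage system.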

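Scala: Spark is developed in Scala and uses Scala as its default programming language. Writing a Spark program is much simpler than writing a Hadoop MapReduce program. Spark provides Spark-Shell, in which programs can be tested. The general steps for writing a Spark program are: create or obtain a SparkContext instance, use the SparkContext to create an RDD, and then operate on the RDD.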
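Java: Spark supports Java programming, but Java has no tool as convenient as Spark-Shell. Otherwise it is the same as programming in Scala: both are JVM languages, Scala and Java interoperate, and the Java programming interface is really just a wrapper around the Scala one.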
Python: Spark now also provides a Python programming interface. Spark uses py4j for interoperation between Python and Java, which makes it possible to write Spark programs in Python. Spark also provides pyspark, a Python shell for Spark, so Spark programs can be written in Python interactively.
The Spark Ecosystem
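Shark (Hive on Spark): Shark essentially provides the same HiveQL command interface as Hive on top of the Spark framework. To stay as compatible with Hive as possible, Shark uses Hive's API for query parsing and logical plan generation, and in the final physical plan execution stage uses Spark in place of Hadoop MapReduce. Through Shark configuration parameters, Shark can automatically cache particular RDDs in memory for reuse, which speeds up retrieval of particular datasets. Shark also lets user-defined functions (UDFs) implement specific data analysis and learning algorithms, so that SQL queries and analytical computation can be combined and RDD reuse maximized.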
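Spark Streaming: a framework built on Spark for processing stream data. The basic principle is to divide the stream into small time slices (a few seconds) and process each small piece in a batch-like manner. Spark Streaming is built on Spark for two reasons. First, Spark's low-latency execution engine (100 ms and up) can be used for real-time computation. Second, compared with record-based processing frameworks such as Storm, RDD datasets make efficient fault tolerance easier. In addition, the micro-batch approach makes the same logic and algorithms work for both batch and real-time processing, which is convenient for applications that need to analyze historical and real-time data together.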
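The micro-batch idea can be sketched without any streaming library: cut the stream into fixed time slices and hand each slice to an ordinary batch function. The code below is plain Python, not the Spark Streaming API; split_into_batches, word_count, and the sample events are hypothetical.

```python
from collections import Counter, defaultdict


def split_into_batches(events, batch_seconds):
    """Group (timestamp, record) events into consecutive fixed-length time slices.
    Slices that received no events are simply skipped in this sketch."""
    batches = defaultdict(list)
    for timestamp, record in events:
        batches[int(timestamp // batch_seconds)].append(record)
    return [batches[slot] for slot in sorted(batches)]


def word_count(records):
    """An ordinary batch function; it knows nothing about streaming."""
    return Counter(word for line in records for word in line.split())


# Made-up stream: (arrival time in seconds, text line).
events = [(0.5, "spark streaming"), (1.2, "spark"), (2.1, "storm"), (3.9, "spark storm")]

# "Real-time" path: run the batch function on every 2-second slice.
per_batch = [word_count(batch) for batch in split_into_batches(events, batch_seconds=2)]

# "Historical" path: run the very same function on all the data at once.
historical = word_count(record for _, record in events)

print(per_batch)
print(historical)
```

Because word_count is reused unchanged on both paths, the per-slice results add up to the historical result, which is the compatibility between batch and real-time logic described above.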
Bagel: Pregel on Spark. It allows graph computation with Spark and is a very useful small project. Bagel ships with an example that implements Google's PageRank algorithm.
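For readers unfamiliar with PageRank, the iteration it performs can be written in a few lines. The sketch below is plain Python and does not use Bagel's API or reproduce Bagel's bundled example. The damping factor 0.85, the handling of pages without outgoing links, and the tiny sample graph are assumptions of this illustration.

```python
def pagerank(links, iterations=20, damping=0.85):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {target for targets in links.values() for target in targets}
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        incoming = {page: 0.0 for page in pages}
        dangling = 0.0                      # rank held by pages with no outgoing links
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = ranks[page] / len(targets)
                for target in targets:
                    incoming[target] += share
            else:
                dangling += ranks[page]
        # Every page gets a base share, its incoming rank, and an equal part
        # of the rank of dangling pages, so the total stays at 1.
        ranks = {page: (1 - damping) / n + damping * (incoming[page] + dangling / n)
                 for page in pages}
    return ranks


# Made-up three-page graph.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
for page, rank in sorted(pagerank(graph).items()):
    print(page, round(rank, 4))
```

Each round rereads the whole link structure, which is exactly the kind of repeated pass over one dataset that benefits from keeping the data in memory.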
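Where Spark Fits
Spark is a memory-based iterative computing framework suited to applications that need to operate on a particular dataset many times. The more times the data is operated on and the larger the volume of data to be read, the greater the benefit; where the data volume is small but the computation is intensive, the benefit is comparatively small.
Because of the nature of RDDs, Spark is not suited to applications that make asynchronous, fine-grained updates to state, such as the storage layer of a web service or an incremental web crawler and indexer. In short, it does not fit application models based on incremental modification.
Overall, Spark is broadly applicable and fairly general-purpose.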