Although Hadoop captures most of the attention in distributed data analytics, there are alternatives that offer interesting advantages over the typical Hadoop platform. Spark is a scalable data analytics platform that incorporates primitives for in-memory computing and therefore has a performance advantage over Hadoop's cluster-storage approach. Spark is implemented in, and exploits, the Scala language, which provides a unique environment for data processing. This article introduces the Spark approach to cluster computing and how it differs from Hadoop.
Spark is an open source cluster computing environment similar to Hadoop, but with some useful differences that make it superior for certain workloads: Spark enables in-memory distributed datasets, which optimize iterative workloads in addition to supporting interactive queries.

Spark is implemented in the Scala language, which it also uses as its application framework. Unlike Hadoop, Spark and Scala are tightly integrated, with Scala able to manipulate distributed datasets as easily as local collection objects.

Although Spark was created to support iterative jobs on distributed datasets, it is in fact complementary to Hadoop and can run side by side over the Hadoop file system. This behavior is supported through a third-party cluster framework called Mesos. Spark was developed at the University of California, Berkeley's Algorithms, Machines, and People (AMP) Lab and can be used to build large-scale, low-latency data analytics applications.
The Spark cluster computing architecture
Although Spark has similarities to Hadoop, it represents a new cluster computing framework with useful differences. First, Spark was designed for a specific type of workload in cluster computing, namely workloads that reuse a working set of data across parallel operations (such as machine learning algorithms). To optimize these types of workloads, Spark introduces the concept of in-memory cluster computing, in which datasets are cached in memory to reduce their access latency.

Spark also introduces an abstraction called the resilient distributed dataset (RDD). An RDD is a read-only collection of objects distributed across a set of nodes. These collections are resilient: if part of a dataset is lost, it can be rebuilt. The process of rebuilding a portion of a dataset relies on a fault-tolerance mechanism that maintains "lineage" (that is, information that allows the lost portion to be reconstructed from the process through which the data was derived). An RDD is represented as a Scala object and can be created from a file; as a parallelized slice (spread across nodes); as a transformation of another RDD; or, finally, by changing the persistence of an existing RDD, such as requesting that it be cached in memory.
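The following minimal sketch illustrates these four origins of an RDD. It is illustrative only and makes assumptions not in the original article: an already-constructed SparkContext named sc and a hypothetical input file data.txt, using the early (0.3-era) API covered here.

// A hedged sketch, assuming an existing SparkContext "sc" and a file "data.txt":
val fromFile     = sc.textFile("data.txt")            // 1. created from a file
val parallelized = sc.parallelize(1 to 1000, 10)      // 2. a parallelized slice spread across nodes
val transformed  = fromFile.map(line => line.length)  // 3. a transformation of another RDD
val cached       = transformed.cache()                // 4. changed persistence: keep it in memory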
Applications in Spark are called drivers, and these drivers implement the operations performed either on a single node or in parallel across a set of nodes. Like Hadoop, Spark supports single-node and multi-node clusters. For multi-node operation, Spark relies on the Mesos cluster manager. Mesos provides an efficient platform for resource sharing and isolation among distributed applications (see Figure 1). This setup allows Spark to coexist with Hadoop in a shared pool of nodes.

Figure 1. Spark relies on the Mesos cluster manager for resource sharing and isolation.
The Spark programming model
A driver can perform two types of operations on a dataset: actions and transformations. An action performs a computation on the dataset and returns a value to the driver, while a transformation creates a new dataset from an existing one. Examples of actions include performing a Reduce operation (using a function) and iterating over a dataset (running a function on each element, similar to a Map operation). Examples of transformations include the Map operation and the Cache operation (which requests that the new dataset be stored in memory).
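As a brief, hypothetical sketch (assuming an existing RDD of text lines named lines, which is not defined in the article until later), transformations build new RDDs while actions return values to the driver:

// Illustrative only; "lines" is an assumed RDD of strings.
val lengths = lines.map(line => line.length)  // transformation: builds a new RDD
val cached  = lengths.cache()                 // transformation: marks the RDD to be kept in memory
val total   = cached.reduce(_ + _)            // action: computes a value, returns it to the driver
cached.foreach(len => println(len))           // action: runs a function on every element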
We will look at working examples of these operations shortly, but first, let's get acquainted with the Scala language.
A brief introduction to Scala
Scala may be one of the Internet's better-kept secrets. You can find Scala in production at some of the busiest Internet sites, such as Twitter, LinkedIn, and Foursquare (which uses a web application framework called Lift). There is also evidence that financial institutions have taken an interest in Scala's performance (for example, EDF Trading uses Scala for derivatives pricing).
Scala is a multi-paradigm language that supports, in a fluid and comfortable way, language features associated with imperative, functional, and object-oriented languages. From the object-oriented perspective, every value in Scala is an object. Likewise, from the functional perspective, every function is a value. Scala is also statically typed, with a type system that is both expressive and safe.
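A minimal illustration of these ideas (not specific to Spark, and not taken from the original article): because a function is a value, it can be stored in a variable and passed to other methods.

// Functions are values: store one in a val and pass it to a higher-order method.
val double: Int => Int = x => x * 2
val result = List(1, 2, 3).map(double)   // result: List(2, 4, 6)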
In addition, Scala is a virtual machine (VM) language: byte codes generated by the Scala compiler run directly on a Java™ Virtual Machine (JVM) under the Java Runtime Environment, version 2. This setup allows Scala to run anywhere the JVM runs (with the requirement of an additional Scala runtime library). It also allows Scala to exploit the vast catalog of existing Java libraries, as well as existing Java code.
Finally, Scala is extensible. The language (whose name in fact stands for "scalable language") was defined so that simple extensions integrate cleanly into the language itself.
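For example (a hypothetical Vec class, used here only to illustrate the point), operators in Scala are ordinary methods, so user-defined types can read like built-in ones:

// Hypothetical sketch: operators are just methods, so extensions blend into the language.
class Vec(val x: Double, val y: Double) {
  def +(other: Vec) = new Vec(x + other.x, y + other.y)
}
val v = new Vec(1, 2) + new Vec(3, 4)   // v.x == 4.0, v.y == 6.0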
Scala by example
Let's look at some examples of the Scala language in action. Scala comes with its own interpreter, which allows you to try the language interactively. A thorough treatment of Scala is beyond the scope of this article, but you can find links to more information in Resources.

Listing 1 begins our quick tour of the language through the Scala interpreter. After you start Scala, you are greeted with a prompt at which you can interactively evaluate expressions and programs. We begin by creating two variables: one immutable (a val, called single assignment) and one mutable (a var). Note that when you attempt to change b (your var), the change succeeds, but attempting to change the val returns an error.
Listing 1. Simple variables in Scala
$ scala
Welcome to Scala version 2.8.1.final (OpenJDK Client VM, Java 1.6.0_20).
Type in expressions to have them evaluated.
Type :help for more information.

scala> val a = 1
a: Int = 1

scala> var b = 2
b: Int = 2

scala> b = b + a
b: Int = 3

scala> a = 2
<console>:6: error: reassignment to val
       a = 2
         ^
Next, create a simple method that calculates and returns the square of an Int. Defining a method in Scala begins with def, followed by the method name and its parameter list; you then set it equal to a body made up of some number of statements (in this example, one). No return type needs to be specified, because it can be inferred from the method itself. Note how similar this is to assigning a value to a variable. I demonstrate the process on an object called 3 and on a result variable called res0 (which the Scala interpreter creates for you automatically). This is all shown in Listing 2.
Listing 2. A simple method in Scala
scala> def square(x: Int) = x*x
square: (x: Int)Int

scala> square(3)
res0: Int = 9

scala> square(res0)
res1: Int = 81
Next, let's look at constructing a simple class in Scala (see Listing 3). You define a simple Dog class that accepts a String argument (your name constructor). Note that the class takes the argument directly (there is no need to define the class parameter in the body of the class). There is also a single method that prints a string when called. You create a new instance of the class and then invoke the method. Note that the interpreter inserts the vertical bars: they are not part of the code.
Listing 3. A simple class in Scala
scala> class Dog( name: String ) {
     |   def bark() = println(name + " barked")
     | }
defined class Dog

scala> val stubby = new Dog("Stubby")
stubby: Dog = Dog@1dd5a3d

scala> stubby.bark
Stubby barked

scala>
When you are done, simply type :quit to exit the Scala interpreter.
Installing Scala and Spark
The first step is to download and configure Scala. The commands shown in Listing 4 illustrate downloading and preparing the Scala installation. Use Scala version 2.8, because that is the version Spark is documented to require.
Listing 4. Installing Scala
$ wget http://www.scala-lang.org/downloads/distrib/files/scala-2.8.1.final.tgz
$ sudo tar xvfz scala-2.8.1.final.tgz --directory /opt/
To make Scala visible on your path, add the following lines to your .bashrc (if you are using Bash as your shell):
export SCALA_HOME=/opt/scala-2.8.1.final
export PATH=$SCALA_HOME/bin:$PATH
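To pick up these changes in your current shell without opening a new one, you can source the file (a standard Bash step, not specific to Scala):

$ source ~/.bashrc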
You can then test your installation, as shown in Listing 5. This loads the .bashrc changes and then does a quick test of the Scala interpreter shell.
Listing 5. Configuring and running interactive Scala
$ scala
Welcome to Scala version 2.8.1.final (OpenJDK Client VM, Java 1.6.0_20).
Type in expressions to have them evaluated.
Type :help for more information.

scala> println("Scala is installed!")
Scala is installed!

scala> :quit
$
As shown in the listing, you should now see a Scala prompt. You can exit by typing :quit. Note that Scala executes within the context of the JVM, so you need one available. I am using Ubuntu, which provides OpenJDK by default.
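If you want to confirm that a JVM is present before continuing, a quick check (standard on any system with Java installed) is:

$ java -version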
Next, get the latest copy of the Spark framework using the commands in Listing 6.
Listing 6. Downloading and installing the Spark framework
$ wget https://github.com/mesos/spark/tarball/0.3-scala-2.8/mesos-spark-0.3-scala-2.8-0-gc86af80.tar.gz
$ sudo tar xvfz mesos-spark-0.3-scala-2.8-0-gc86af80.tar.gz
Next, set the Spark configuration in ./conf/spark-env.sh to the root of your Scala installation with the following line:
export SCALA_HOME=/opt/scala-2.8.1.final
The final step in setup is to update your distribution with the simple build tool (sbt). sbt is a build tool for Scala and is used with the Spark distribution. You perform the update and compile step in the mesos-spark-c86af80 subdirectory, as shown below.
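The exact command is not preserved in this copy of the article; with the 0.3-era Spark distribution it is typically the bundled sbt launcher, invoked as:

$ sbt/sbt update compile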
×¢Ò⣬ÔÚÖ´Ðд˲½Öèʱ£¬ÐèÒªÁ¬½ÓÖÁ Internet¡£µ±Íê³É´Ë²Ù×÷ºó£¬ÇëÖ´ÐÐ
Spark ¿ìËÙ¼ì²â£¬Èç Çåµ¥ 7 Ëùʾ¡£ÔڸòâÊÔÖУ¬ÐèÒªÔËÐÐ SparkPi ʾÀý£¬Ëü»á¼ÆËã pi µÄ¹ÀÖµ£¨Í¨¹ýµ¥Î»Æ½·½ÖеÄÈÎÒâµã²ÉÑù£©¡£ËùÏÔʾµÄ¸ñʽÐèÒªÑùÀý³ÌÐò
(spark.examples.SparkPi) ºÍÖ÷»ú²ÎÊý£¬¸Ã²ÎÊý¶¨ÒåÁË Mesos Ö÷»ú£¨ÔÚ´ËÀýÖУ¬ÊÇÄúµÄ±¾µØÖ÷»ú£¬ÒòΪËüÊÇÒ»¸öµ¥½Úµã¼¯Èº£©ºÍҪʹÓõÄÏß³ÌÊýÁ¿¡£×¢Ò⣬ÔÚ
Çåµ¥ 7 ÖУ¬Ö´ÐÐÁËÁ½¸öÈÎÎñ£¬¶øÇÒÕâÁ½¸öÈÎÎñ±»ÐòÁл¯£¨ÈÎÎñ 0 ¿ªÊ¼ºÍ½áÊøÖ®ºó£¬ÈÎÎñ 1 ÔÙ¿ªÊ¼£©¡£
Listing 7. Performing a quick check of Spark
$ ./run spark.examples.SparkPi local[1]
11/08/26 19:52:33 INFO spark.CacheTrackerActor: Registered actor on port 50501
11/08/26 19:52:33 INFO spark.MapOutputTrackerActor: Registered actor on port 50501
11/08/26 19:52:33 INFO spark.SparkContext: Starting job...
11/08/26 19:52:33 INFO spark.CacheTracker: Registering RDD ID 0 with cache
11/08/26 19:52:33 INFO spark.CacheTrackerActor: Registering RDD 0 with 2 partitions
11/08/26 19:52:33 INFO spark.CacheTrackerActor: Asked for current cache locations
11/08/26 19:52:33 INFO spark.LocalScheduler: Final stage: Stage 0
11/08/26 19:52:33 INFO spark.LocalScheduler: Parents of final stage: List()
11/08/26 19:52:33 INFO spark.LocalScheduler: Missing parents: List()
11/08/26 19:52:33 INFO spark.LocalScheduler: Submitting Stage 0, which has no missing ...
11/08/26 19:52:33 INFO spark.LocalScheduler: Running task 0
11/08/26 19:52:33 INFO spark.LocalScheduler: Size of task 0 is 1385 bytes
11/08/26 19:52:33 INFO spark.LocalScheduler: Finished task 0
11/08/26 19:52:33 INFO spark.LocalScheduler: Running task 1
11/08/26 19:52:33 INFO spark.LocalScheduler: Completed ResultTask(0, 0)
11/08/26 19:52:33 INFO spark.LocalScheduler: Size of task 1 is 1385 bytes
11/08/26 19:52:33 INFO spark.LocalScheduler: Finished task 1
11/08/26 19:52:33 INFO spark.LocalScheduler: Completed ResultTask(0, 1)
11/08/26 19:52:33 INFO spark.SparkContext: Job finished in 0.145892763 s
Pi is roughly 3.14952
$
By increasing the number of threads, you not only increase the parallelism of the execution but also complete the job in less time (as shown in Listing 8).
Listing 8. Another quick check of Spark, this time with two threads
$ ./run spark.examples.SparkPi local[2]
11/08/26 20:04:30 INFO spark.MapOutputTrackerActor: Registered actor on port 50501
11/08/26 20:04:30 INFO spark.CacheTrackerActor: Registered actor on port 50501
11/08/26 20:04:30 INFO spark.SparkContext: Starting job...
11/08/26 20:04:30 INFO spark.CacheTracker: Registering RDD ID 0 with cache
11/08/26 20:04:30 INFO spark.CacheTrackerActor: Registering RDD 0 with 2 partitions
11/08/26 20:04:30 INFO spark.CacheTrackerActor: Asked for current cache locations
11/08/26 20:04:30 INFO spark.LocalScheduler: Final stage: Stage 0
11/08/26 20:04:30 INFO spark.LocalScheduler: Parents of final stage: List()
11/08/26 20:04:30 INFO spark.LocalScheduler: Missing parents: List()
11/08/26 20:04:30 INFO spark.LocalScheduler: Submitting Stage 0, which has no missing ...
11/08/26 20:04:30 INFO spark.LocalScheduler: Running task 0
11/08/26 20:04:30 INFO spark.LocalScheduler: Running task 1
11/08/26 20:04:30 INFO spark.LocalScheduler: Size of task 1 is 1385 bytes
11/08/26 20:04:30 INFO spark.LocalScheduler: Size of task 0 is 1385 bytes
11/08/26 20:04:30 INFO spark.LocalScheduler: Finished task 0
11/08/26 20:04:30 INFO spark.LocalScheduler: Finished task 1
11/08/26 20:04:30 INFO spark.LocalScheduler: Completed ResultTask(0, 1)
11/08/26 20:04:30 INFO spark.LocalScheduler: Completed ResultTask(0, 0)
11/08/26 20:04:30 INFO spark.SparkContext: Job finished in 0.101287331 s
Pi is roughly 3.14052
$
Building a simple Spark application with Scala
To build a Spark application, you need Spark and its dependencies in a single Java archive (JAR) file. Create this JAR in Spark's top-level directory using sbt, as shown below.
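The assembly command itself is not preserved in this copy of the article; with the 0.3-era distribution it is typically:

$ sbt/sbt assembly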
The result is the file ./core/target/scala_2.8.1/"Spark Core-assembly-0.3.jar". Add this file to your CLASSPATH so that it is accessible. In this example, you will not actually use the JAR, because you will run the example with the Scala interpreter rather than compiling it.
ÔÚ±¾Ê¾ÀýÖУ¬Ê¹ÓÃÁ˱ê×¼µÄ MapReduce ת»»£¨Èç Çåµ¥ 9 Ëùʾ£©¡£¸ÃʾÀý´ÓÖ´ÐбØÒªµÄ
Spark ÀർÈ뿪ʼ¡£½Ó×Å£¬ÐèÒª¶¨ÒåÄúµÄÀà (SparkTest) ¼°ÆäÖ÷·½·¨£¬ÓÃËü½âÎöÉÔºóʹÓõIJÎÊý¡£ÕâЩ²ÎÊý¶¨ÒåÁËÖ´ÐÐ
Spark µÄ»·¾³£¨ÔÚ±¾ÀýÖУ¬¸Ã»·¾³ÊÇÒ»¸öµ¥½Úµã¼¯Èº£©¡£½ÓÏÂÀ´£¬Òª´´½¨ SparkContext ¶ÔÏó£¬Ëü»á¸æÖª
Spark ÈçºÎ¶ÔÄúµÄ¼¯Èº½øÐзÃÎÊ¡£¸Ã¶ÔÏóÐèÒªÁ½¸ö²ÎÊý£ºMesos Ö÷»úÃû³Æ£¨ÒÑ´«È룩ÒÔ¼°Äú·ÖÅ䏸×÷ÒµµÄÃû³Æ
(SparkTest)¡£½âÎöÃüÁîÐÐÖеÄÇÐÆ¬ÊýÁ¿£¬Ëü»á¸æÖª Spark ÓÃÓÚ×÷ÒµµÄÏß³ÌÊýÁ¿¡£ÒªÉèÖõÄ×îºóÒ»ÏîÊÇÖ¸¶¨ÓÃÓÚ
MapReduce ²Ù×÷µÄÎı¾Îļþ¡£
Finally, you get to the heart of the Spark example, which consists of a set of transformations. Starting with your file, you invoke the flatMap method to return an RDD of tokens (the lines of text split into words by the provided function). This RDD is then passed through the map method (which creates key-value pairs) and finally through the reduceByKey method, which merges the pairs that share a key. The merging is done by passing the values for each key to the anonymous function _ + _, which simply takes two arguments and returns their sum. The result is a set of word-count pairs (a String and an Int), which is then emitted as a text file (to the output directory).
Listing 9. MapReduce in Scala/Spark (SparkTest.scala)
import spark.SparkContext
import SparkContext._

object SparkTest {

  def main( args: Array[String]) {

    if (args.length == 0) {
      System.err.println("Usage: SparkTest <host> [<slices>]")
      System.exit(1)
    }

    val spark = new SparkContext(args(0), "SparkTest")
    val slices = if (args.length > 1) args(1).toInt else 2

    val myFile = spark.textFile("test.txt")
    val counts = myFile.flatMap(line => line.split(" "))
                       .map(word => (word, 1))
                       .reduceByKey(_ + _)

    counts.saveAsTextFile("out.txt")

  }
}

SparkTest.main(args)
To execute your script, simply request:
$ scala SparkTest.scala local[1]
You can find the MapReduce output file in the output directory (for example, output/part-00000).
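Note that Listing 9 writes to a directory named out.txt, so depending on your setup the word counts may instead land in a part file such as out.txt/part-00000 (the exact part-file name can vary). To inspect the counts, you can view the file directly:

$ cat out.txt/part-00000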
Other big data analytics frameworks
Since the development of Hadoop, a number of other big data analytics platforms have appeared that are worth a look. These platforms range from simple script-based offerings to production environments similar to Hadoop.

One of the simplest is bashreduce, which, as its name suggests, allows you to perform MapReduce-type operations across multiple machines from the Bash environment. bashreduce relies on passwordless Secure Shell access to the cluster of machines you plan to use and exists as a script through which you request jobs using UNIX®-style tools (sort, awk, netcat, and so on).
GraphLab is another interesting implementation of the MapReduce abstraction, one that focuses on parallel implementations of machine learning algorithms. In GraphLab, the Map stage defines computations that can be performed independently and in isolation (on separate hosts), and the Reduce stage combines the results.
Finally, a newer entrant to the big data scene is Storm from Twitter (acquired through the purchase of BackType). Storm has been described as the "Hadoop of real-time processing" and focuses on stream processing and continuous computation (delivering results as the stream is processed). Storm is written in Clojure (a dialect of the Lisp language) but supports applications written in any language (such as Ruby and Python). Twitter released Storm as open source in September 2011.
Conclusion
Spark is an interesting addition to the growing family of big data analytics solutions. It provides not only an effective framework for processing distributed datasets but does so efficiently (through simple, concise Scala scripts). Both Spark and Scala are under active development. With their adoption at key Internet properties, however, both appear to have made the transition from interesting open source software into foundational web technologies.