Äú¿ÉÒÔ¾èÖú£¬Ö§³ÖÎÒÃǵĹ«ÒæÊÂÒµ¡£

1Ôª 10Ôª 50Ôª





ÈÏÖ¤Â룺  ÑéÖ¤Âë,¿´²»Çå³þ?Çëµã»÷Ë¢ÐÂÑéÖ¤Âë ±ØÌî



  ÇóÖª ÎÄÕ ÎÄ¿â Lib ÊÓÆµ iPerson ¿Î³Ì ÈÏÖ¤ ×Éѯ ¹¤¾ß ½²×ù Modeler   Code  
»áÔ±   
 
   
 
 
     
   
 ¶©ÔÄ
  ¾èÖú
Spark£¬Ò»ÖÖ¿ìËÙÊý¾Ý·ÖÎöÌæ´ú·½°¸
 
×÷ÕߣºM. Tim Jones À´Ô´£ºIBM ·¢²¼ÓÚ£º2015-7-9
  2060  次浏览      27
 

ËäÈ» Hadoop ÔÚ·Ö²¼Ê½Êý¾Ý·ÖÎö·½Ã汸ÊܹØ×¢£¬µ«ÊÇÈÔÓÐÒ»Ð©Ìæ´ú²úÆ·ÌṩÁËÓÅÓÚµäÐÍ Hadoop ƽ̨µÄÁîÈ˹Ø×¢µÄÓÅÊÆ¡£Spark ÊÇÒ»ÖÖ¿ÉÀ©Õ¹µÄÊý¾Ý·ÖÎöƽ̨£¬ËüÕûºÏÁËÄÚ´æ¼ÆËãµÄ»ùÔª£¬Òò´Ë£¬Ïà¶ÔÓÚ Hadoop µÄ¼¯Èº´æ´¢·½·¨£¬ËüÔÚÐÔÄÜ·½Ãæ¸ü¾ßÓÅÊÆ¡£Spark ÊÇÔÚ Scala ÓïÑÔÖÐʵÏֵ쬲¢ÇÒÀûÓÃÁ˸ÃÓïÑÔ£¬ÎªÊý¾Ý´¦ÀíÌṩÁ˶ÀÒ»ÎÞ¶þµÄ»·¾³¡£Á˽â Spark µÄ¼¯Èº¼ÆËã·½·¨ÒÔ¼°ËüÓë Hadoop µÄ²»Í¬Ö®´¦¡£

Spark ÊÇÒ»ÖÖÓë Hadoop ÏàËÆµÄ¿ªÔ´¼¯Èº¼ÆËã»·¾³£¬µ«ÊÇÁ½ÕßÖ®¼ä»¹´æÔÚһЩ²»Í¬Ö®´¦£¬ÕâЩÓÐÓõIJ»Í¬Ö®´¦Ê¹ Spark ÔÚijЩ¹¤×÷¸ºÔØ·½Ãæ±íÏֵøü¼ÓÓÅÔ½£¬»»¾ä»°Ëµ£¬Spark ÆôÓÃÁËÄÚ´æ·Ö²¼Êý¾Ý¼¯£¬³ýÁËÄܹ»Ìṩ½»»¥Ê½²éѯÍ⣬Ëü»¹¿ÉÒÔÓÅ»¯µü´ú¹¤×÷¸ºÔØ¡£

Spark ÊÇÔÚ Scala ÓïÑÔÖÐʵÏֵģ¬Ëü½« Scala ÓÃ×÷ÆäÓ¦ÓóÌÐò¿ò¼Ü¡£Óë Hadoop ²»Í¬£¬Spark ºÍ Scala Äܹ»½ôÃܼ¯³É£¬ÆäÖÐµÄ Scala ¿ÉÒÔÏñ²Ù×÷±¾µØ¼¯ºÏ¶ÔÏóÒ»ÑùÇáËɵزÙ×÷·Ö²¼Ê½Êý¾Ý¼¯¡£

¾¡¹Ü´´½¨ Spark ÊÇΪÁËÖ§³Ö·Ö²¼Ê½Êý¾Ý¼¯Éϵĵü´ú×÷Òµ£¬µ«ÊÇʵ¼ÊÉÏËüÊÇ¶Ô Hadoop µÄ²¹³ä£¬¿ÉÒÔÔÚ Hadoo ÎļþϵͳÖв¢ÐÐÔËÐС£Í¨¹ýÃûΪ Mesos µÄµÚÈý·½¼¯Èº¿ò¼Ü¿ÉÒÔÖ§³Ö´ËÐÐΪ¡£Spark ÓɼÓÖÝ´óѧ²®¿ËÀû·ÖУ AMP ʵÑéÊÒ (Algorithms, Machines, and People Lab) ¿ª·¢£¬¿ÉÓÃÀ´¹¹½¨´óÐ͵ġ¢µÍÑÓ³ÙµÄÊý¾Ý·ÖÎöÓ¦ÓóÌÐò¡£

Spark ¼¯Èº¼ÆËã¼Ü¹¹

ËäÈ» Spark Óë Hadoop ÓÐÏàËÆÖ®´¦£¬µ«ËüÌṩÁ˾ßÓÐÓÐÓòîÒìµÄÒ»¸öеļ¯Èº¼ÆËã¿ò¼Ü¡£Ê×ÏÈ£¬Spark ÊÇΪ¼¯Èº¼ÆËãÖеÄÌØ¶¨ÀàÐ͵Ť×÷¸ºÔضøÉè¼Æ£¬¼´ÄÇЩÔÚ²¢ÐвÙ×÷Ö®¼äÖØÓù¤×÷Êý¾Ý¼¯£¨±ÈÈç»úÆ÷ѧϰËã·¨£©µÄ¹¤×÷¸ºÔØ¡£ÎªÁËÓÅ»¯ÕâЩÀàÐ͵Ť×÷¸ºÔØ£¬Spark Òý½øÁËÄڴ漯Ⱥ¼ÆËãµÄ¸ÅÄ¿ÉÔÚÄڴ漯Ⱥ¼ÆËãÖн«Êý¾Ý¼¯»º´æÔÚÄÚ´æÖУ¬ÒÔËõ¶Ì·ÃÎÊÑÓ³Ù¡£

Spark »¹Òý½øÁËÃûΪ µ¯ÐÔ·Ö²¼Ê½Êý¾Ý¼¯ (RDD) µÄ³éÏó¡£RDD ÊÇ·Ö²¼ÔÚÒ»×é½ÚµãÖеÄÖ»¶Á¶ÔÏ󼯺ϡ£ÕâЩ¼¯ºÏÊǵ¯ÐԵģ¬Èç¹ûÊý¾Ý¼¯Ò»²¿·Ö¶ªÊ§£¬Ôò¿ÉÒÔ¶ÔËüÃǽøÐÐÖØ½¨¡£Öؽ¨²¿·ÖÊý¾Ý¼¯µÄ¹ý³ÌÒÀÀµÓÚÈÝ´í»úÖÆ£¬¸Ã»úÖÆ¿ÉÒÔά»¤ ¡°ÑªÍ³¡±£¨¼´³äÐí»ùÓÚÊý¾ÝÑÜÉú¹ý³ÌÖØ½¨²¿·ÖÊý¾Ý¼¯µÄÐÅÏ¢£©¡£RDD ±»±íʾΪһ¸ö Scala ¶ÔÏ󣬲¢ÇÒ¿ÉÒÔ´ÓÎļþÖд´½¨Ëü£»Ò»¸ö²¢Ðл¯µÄÇÐÆ¬£¨±é²¼ÓÚ½ÚµãÖ®¼ä£©£»ÁíÒ»¸ö RDD µÄת»»ÐÎʽ£»²¢ÇÒ×îÖջ᳹µ×¸Ä±äÏÖÓÐ RDD µÄ³Ö¾ÃÐÔ£¬±ÈÈçÇëÇ󻺴æÔÚÄÚ´æÖС£

Spark ÖеÄÓ¦ÓóÌÐò³ÆÎªÇý¶¯³ÌÐò£¬ÕâЩÇý¶¯³ÌÐò¿ÉʵÏÖÔÚµ¥Ò»½ÚµãÉÏÖ´ÐеIJÙ×÷»òÔÚÒ»×é½ÚµãÉϲ¢ÐÐÖ´ÐеIJÙ×÷¡£Óë Hadoop ÀàËÆ£¬Spark Ö§³Öµ¥½Úµã¼¯Èº»ò¶à½Úµã¼¯Èº¡£¶ÔÓÚ¶à½Úµã²Ù×÷£¬Spark ÒÀÀµÓÚ Mesos ¼¯Èº¹ÜÀíÆ÷¡£Mesos Ϊ·Ö²¼Ê½Ó¦ÓóÌÐòµÄ×ÊÔ´¹²ÏíºÍ¸ôÀëÌṩÁËÒ»¸öÓÐЧƽ̨£¨²Î¼û ͼ 1£©¡£¸ÃÉèÖóäÐí Spark Óë Hadoop ¹²´æÓÚ½ÚµãµÄÒ»¸ö¹²Ïí³ØÖС£

ͼ 1. Spark ÒÀÀµÓÚ Mesos ¼¯Èº¹ÜÀíÆ÷ʵÏÖ×ÊÔ´¹²ÏíºÍ¸ôÀë¡£

Spark ±à³Ìģʽ

Çý¶¯³ÌÐò¿ÉÒÔÔÚÊý¾Ý¼¯ÉÏÖ´ÐÐÁ½ÖÖÀàÐ͵IJÙ×÷£º¶¯×÷ºÍת»»¡£¶¯×÷ »áÔÚÊý¾Ý¼¯ÉÏÖ´ÐÐÒ»¸ö¼ÆË㣬²¢ÏòÇý¶¯³ÌÐò·µ»ØÒ»¸öÖµ£»¶ø×ª»» »á´ÓÏÖÓÐÊý¾Ý¼¯Öд´½¨Ò»¸öеÄÊý¾Ý¼¯¡£¶¯×÷µÄʾÀý°üÀ¨Ö´ÐÐÒ»¸ö Reduce ²Ù×÷£¨Ê¹Óú¯Êý£©ÒÔ¼°ÔÚÊý¾Ý¼¯ÉϽøÐеü´ú£¨ÔÚÿ¸öÔªËØÉÏÔËÐÐÒ»¸öº¯Êý£¬ÀàËÆÓÚ Map ²Ù×÷£©¡£×ª»»Ê¾Àý°üÀ¨ Map ²Ù×÷ºÍ Cache ²Ù×÷£¨ËüÇëÇóеÄÊý¾Ý¼¯´æ´¢ÔÚÄÚ´æÖУ©¡£

ÎÒÃÇËæºó¾Í»á¿´¿´ÕâÁ½¸ö²Ù×÷µÄʾÀý£¬µ«ÊÇ£¬ÈÃÎÒÃÇÏÈÀ´Á˽âһϠScala ÓïÑÔ¡£

Scala ¼ò½é

Scala ¿ÉÄÜÊÇ Internet Éϲ»ÎªÈËÖªµÄÃØÃÜÖ®Ò»¡£Äú¿ÉÒÔÔÚһЩ×æµÄ Internet ÍøÕ¾£¨Èç Twitter¡¢LinkedIn ºÍ Foursquare£¬Foursquare ʹÓÃÁËÃûΪ Lift µÄ Web Ó¦ÓóÌÐò¿ò¼Ü£©µÄÖÆ×÷¹ý³ÌÖп´µ½ Scala µÄÉíÓ°¡£»¹ÓÐÖ¤¾Ý±íÃ÷£¬Ðí¶à½ðÈÚ»ú¹¹ÒÑ¿ªÊ¼¹Ø×¢ Scala µÄÐÔÄÜ£¨±ÈÈç EDF Trading ¹«Ë¾½« Scala ÓÃÓÚÑÜÉú²úÆ·¶¨¼Û£©¡£

Scala ÊÇÒ»Öֶ෶ʽÓïÑÔ£¬ËüÒÔÒ»ÖÖÁ÷³©µÄ¡¢ÈÃÈ˸е½Êæ·þµÄ·½·¨Ö§³ÖÓëÃüÁîʽ¡¢º¯ÊýʽºÍÃæÏò¶ÔÏóµÄÓïÑÔÏà¹ØµÄÓïÑÔÌØÐÔ¡£´ÓÃæÏò¶ÔÏóµÄ½Ç¶ÈÀ´¿´£¬Scala ÖеÄÿ¸öÖµ¶¼ÊÇÒ»¸ö¶ÔÏó¡£Í¬Ñù£¬´Óº¯Êý¹ÛµãÀ´¿´£¬Ã¿¸öº¯Êý¶¼ÊÇÒ»¸öÖµ¡£Scala Ò²ÊÇÊôÓÚ¾²Ì¬ÀàÐÍ£¬ËüÓÐÒ»¸ö¼ÈÓбíÏÖÁ¦Óֺܰ²È«µÄÀàÐÍϵͳ¡£

´ËÍ⣬Scala ÊÇÒ»ÖÖÐéÄâ»ú (VM) ÓïÑÔ£¬²¢ÇÒ¿ÉÒÔͨ¹ý Scala ±àÒëÆ÷Éú³ÉµÄ×Ö½ÚÂ룬ֱ½ÓÔËÐÐÔÚʹÓà Java Runtime Environment V2 µÄ Java? Virtual Machine (JVM) ÉÏ¡£¸ÃÉèÖóäÐí Scala ÔËÐÐÔÚÔËÐÐ JVM µÄÈκεط½£¨ÒªÇóÒ»¸ö¶îÍâµÄ Scala ÔËÐÐʱ¿â£©¡£Ëü»¹³äÐí Scala ÀûÓôóÁ¿ÏÖ´æµÄ Java ¿âÒÔ¼°ÏÖÓÐµÄ Java ´úÂë¡£

×îºó£¬Scala ¾ßÓпÉÀ©Õ¹ÐÔ¡£¸ÃÓïÑÔ£¨Ëüʵ¼ÊÉÏ´ú±íÁË¿ÉÀ©Õ¹ÓïÑÔ£©±»¶¨ÒåΪ¿ÉÖ±½Ó¼¯³Éµ½ÓïÑÔÖеļòµ¥À©Õ¹¡£

¾ÙÀý˵Ã÷ Scala

ÈÃÎÒÃÇÀ´¿´Ò»Ð©Êµ¼ÊµÄ Scala ÓïÑÔʾÀý¡£Scala Ìṩ×ÔÉíµÄ½âÊÍÆ÷£¬³äÐíÄúÒÔ½»»¥·½Ê½ÊÔÓøÃÓïÑÔ¡£Scala µÄÓÐÓô¦ÀíÒѳ¬³ö±¾ÎÄËùÉæ¼°µÄ·¶Î§£¬µ«ÊÇÄú¿ÉÒÔÔÚ ²Î¿¼×ÊÁÏ ÖÐÕÒµ½¸ü¶àÏà¹ØÐÅÏ¢µÄÁ´½Ó¡£

Çåµ¥ 1 ͨ¹ý Scala ×ÔÉíÌṩµÄ½âÊÍÆ÷¿ªÊ¼ÁË¿ìËÙÁ˽â Scala ÓïÑÔÖ®Âá£ÆôÓà Scala ºó£¬ÏµÍ³»á¸ø³öÌáʾ£¬Í¨¹ý¸ÃÌáʾ£¬Äú¿ÉÒÔÒÔ½»»¥·½Ê½ÆÀ¹À±í´ïʽºÍ³ÌÐò¡£ÎÒÃÇÊ×ÏÈ´´½¨ÁËÁ½¸ö±äÁ¿£¬Ò»¸öÊDz»¿É±ä±äÁ¿£¨¼´ vals£¬³Æ×÷µ¥¸³Öµ£©£¬ÁíÒ»¸ö±äÁ¿Êǿɱä±äÁ¿ (vars)¡£×¢Ò⣬µ±ÄúÊÔͼ¸ü¸Ä b£¨ÄúµÄ var£©Ê±£¬Äú¿ÉÒԳɹ¦µØÖ´Ðд˲Ù×÷£¬µ«ÊÇ£¬µ±ÄúÊÔͼ¸ü¸Ä val ʱ£¬Ôò»á·µ»ØÒ»¸ö´íÎó¡£

Çåµ¥ 1. Scala Öеļòµ¥±äÁ¿

$ scala
Welcome to Scala version 2.8.1.final (OpenJDK Client VM, Java 1.6.0_20).
Type in expressions to have them evaluated.
Type :help for more information.

scala> val a = 1
a: Int = 1

scala> var b = 2
b: Int = 2

scala> b = b + a
b: Int = 3

scala> a = 2
<console>6: error: reassignment to val
a = 2
^

½ÓÏÂÀ´£¬´´½¨Ò»¸ö¼òµ¥µÄ·½·¨À´¼ÆËãºÍ·µ»Ø Int µÄƽ·½Öµ¡£ÔÚ Scala Öж¨ÒåÒ»¸ö·½·¨µÃÏÈ´Ó def ¿ªÊ¼£¬ºó¸ú·½·¨Ãû³ÆºÍ²ÎÊýÁÐ±í£¬È»ºó£¬Òª½«ËüÉèÖÃΪÓï¾äµÄÊýÁ¿£¨ÔÚ±¾Ê¾ÀýÖÐΪ 1£©¡£ÎÞÐèÖ¸¶¨Èκηµ»ØÖµ£¬ÒòΪ¿ÉÒÔ´Ó·½·¨±¾ÉíÍÆ¶Ï³ö¸ÃÖµ¡£×¢Ò⣬ÕâÀàËÆÓÚΪ±äÁ¿¸³Öµ¡£ÔÚÒ»¸öÃûΪ 3 µÄ¶ÔÏóºÍÒ»¸öÃûΪ res0 µÄ½á¹û±äÁ¿£¨Scala ½âÊÍÆ÷»á×Ô¶¯ÎªÄú´´½¨¸Ã±äÁ¿£©ÉÏ£¬ÎÒÑÝʾÁËÕâ¸ö¹ý³Ì¡£ÕâЩ¶¼ÏÔʾÔÚ Çåµ¥ 2 ÖС£

Çåµ¥ 2. Scala ÖеÄÒ»¸ö¼òµ¥·½·¨

scala> def square(x: Int) = x*x
square: (x: Int)Int

scala> square(3)
res0: Int = 9

scala> square(res0)
res1: Int = 81

½ÓÏÂÀ´£¬ÈÃÎÒÃÇ¿´Ò»Ï Scala ÖеÄÒ»¸ö¼òµ¥ÀàµÄ¹¹½¨¹ý³Ì£¨²Î¼û Çåµ¥ 3£©¡£¶¨ÒåÒ»¸ö¼òµ¥µÄ Dog ÀàÀ´½ÓÊÕÒ»¸ö String ²ÎÊý£¨ÄúµÄÃû³Æ¹¹Ô캯Êý£©¡£×¢Ò⣬ÕâÀïµÄÀàÖ±½Ó²ÉÓÃÁ˸òÎÊý£¨ÎÞÐèÔÚÀàµÄÕýÎÄÖж¨ÒåÀà²ÎÊý£©¡£»¹ÓÐÒ»¸ö¶¨Òå¸Ã²ÎÊýµÄ·½·¨£¬¿ÉÔÚµ÷ÓòÎÊýʱ·¢ËÍÒ»¸ö×Ö·û´®¡£ÄúÒª´´½¨Ò»¸öеÄÀàʵÀý£¬È»ºóµ÷ÓÃÄúµÄ·½·¨¡£×¢Ò⣬½âÊÍÆ÷»á²åÈëһЩÊúÏߣºËüÃDz»ÊôÓÚ´úÂë¡£

Çåµ¥ 3. Scala ÖеÄÒ»¸ö¼òµ¥µÄÀà

scala> class Dog( name: String ) {
| def bark() = println(name + " barked")
| }
defined class Dog

scala> val stubby = new Dog("Stubby")
stubby: Dog = Dog@1dd5a3d

scala> stubby.bark
Stubby barked

scala>

Íê³ÉÉÏÊö²Ù×÷ºó£¬Ö»ÐèÊäÈë :quit ¼´¿ÉÍ˳ö Scala ½âÊÍÆ÷¡£

°²×° Scala ºÍ Spark

µÚÒ»²½ÊÇÏÂÔØºÍÅäÖà Scala¡£Çåµ¥ 4 ÖÐÏÔʾµÄÃüÁî²ûÊöÁË Scala °²×°µÄÏÂÔØºÍ×¼±¸¹¤×÷¡£Ê¹Óà Scala v2.8£¬ÒòΪÕâÊǾ­¹ý֤ʵµÄ Spark ËùÐèµÄ°æ±¾¡£

Çåµ¥ 4. °²×° Scala

$ wget http://www.scala-lang.org/downloads/distrib/files/scala-2.8.1.final.tgz
$ sudo tar xvfz scala-2.8.1.final.tgz --directory /opt/

Ҫʹ Scala ¿ÉÊÓ»¯£¬Ç뽫ÏÂÁÐÐÐÌí¼ÓÖÁÄúµÄ .bashrc ÖУ¨Èç¹ûÄúÕýʹÓà Bash ×÷Ϊ shell£©£º

export SCALA_HOME=/opt/scala-2.8.1.final
export PATH=$SCALA_HOME/bin:$PATH

½Ó×Å¿ÉÒÔ¶ÔÄúµÄ°²×°½øÐвâÊÔ£¬Èç Çåµ¥ 5 Ëùʾ¡£Õâ×éÃüÁî»á½«¸ü¸Ä¼ÓÔØÖÁ bashrc ÎļþÖУ¬½Ó×Å¿ìËÙ²âÊÔ Scala ½âÊÍÆ÷ shell¡£

Çåµ¥ 5. ÅäÖúÍÔËÐн»»¥Ê½ Scala

$ scala
Welcome to Scala version 2.8.1.final (OpenJDK Client VM, Java 1.6.0_20).
Type in expressions to have them evaluated.
Type :help for more information.

scala> println("Scala is installed!")
Scala is installed!

scala> :quit
$

ÈçÇåµ¥ÖÐËùʾ£¬ÏÖÔÚÓ¦¸Ã¿´µ½Ò»¸ö Scala Ìáʾ¡£Äú¿ÉÒÔͨ¹ýÊäÈë :quit Ö´ÐÐÍ˳ö¡£×¢Ò⣬Scala ÒªÔÚ JVM µÄÉÏÏÂÎÄÖÐÖ´ÐвÙ×÷£¬ËùÒÔÄú»áÐèÒª JVM¡£ÎÒʹÓõÄÊÇ Ubuntu£¬ËüÔÚĬÈÏÇé¿öÏ»áÌṩ OpenJDK¡£

½ÓÏÂÀ´£¬Çë»ñÈ¡×îÐ嵀 Spark ¿ò¼Ü¸±±¾¡£Îª´Ë£¬ÇëʹÓà Çåµ¥ 6 ÖеĽű¾¡£

Çåµ¥ 6. ÏÂÔØºÍ°²×° Spark ¿ò¼Ü

wget https://github.com/mesos/spark/tarball/0.3-scala-2.8/
mesos-spark-0.3-scala-2.8-0-gc86af80.tar.gz
$ sudo tar xvfz mesos-spark-0.3-scala-2.8-0-gc86af80.tar.gz

½ÓÏÂÀ´£¬Ê¹ÓÃÏÂÁÐÐн« spark ÅäÖÃÉèÖÃÔÚ Scala µÄ¸ùĿ¼ ./conf/spar-env.sh ÖУº

export SCALA_HOME=/opt/scala-2.8.1.final

ÉèÖõÄ×îºóÒ»²½ÊÇʹÓüòµ¥µÄ¹¹½¨¹¤¾ß (sbt) ¸üÐÂÄúµÄ·Ö²¼¡£sbt ÊÇÒ»¿îÕë¶Ô Scala µÄ¹¹½¨¹¤¾ß£¬ÓÃÓÚ Spark ·Ö²¼ÖС£Äú¿ÉÒÔÔÚ mesos-spark-c86af80 ×ÓĿ¼ÖÐÖ´ÐиüкͱäÒì²½Ö裬ÈçÏÂËùʾ£º

$ sbt/sbt update compile

×¢Ò⣬ÔÚÖ´Ðд˲½Öèʱ£¬ÐèÒªÁ¬½ÓÖÁ Internet¡£µ±Íê³É´Ë²Ù×÷ºó£¬ÇëÖ´ÐÐ Spark ¿ìËÙ¼ì²â£¬Èç Çåµ¥ 7 Ëùʾ¡£ÔڸòâÊÔÖУ¬ÐèÒªÔËÐÐ SparkPi ʾÀý£¬Ëü»á¼ÆËã pi µÄ¹ÀÖµ£¨Í¨¹ýµ¥Î»Æ½·½ÖеÄÈÎÒâµã²ÉÑù£©¡£ËùÏÔʾµÄ¸ñʽÐèÒªÑùÀý³ÌÐò (spark.examples.SparkPi) ºÍÖ÷»ú²ÎÊý£¬¸Ã²ÎÊý¶¨ÒåÁË Mesos Ö÷»ú£¨ÔÚ´ËÀýÖУ¬ÊÇÄúµÄ±¾µØÖ÷»ú£¬ÒòΪËüÊÇÒ»¸öµ¥½Úµã¼¯Èº£©ºÍҪʹÓõÄÏß³ÌÊýÁ¿¡£×¢Ò⣬ÔÚ Çåµ¥ 7 ÖУ¬Ö´ÐÐÁËÁ½¸öÈÎÎñ£¬¶øÇÒÕâÁ½¸öÈÎÎñ±»ÐòÁл¯£¨ÈÎÎñ 0 ¿ªÊ¼ºÍ½áÊøÖ®ºó£¬ÈÎÎñ 1 ÔÙ¿ªÊ¼£©¡£

Çåµ¥ 7. ¶Ô Spark Ö´ÐпìËÙ¼ì²â

$ ./run spark.examples.SparkPi local[1]
11/08/26 19:52:33 INFO spark.CacheTrackerActor: Registered actor on port 50501
11/08/26 19:52:33 INFO spark.MapOutputTrackerActor: Registered actor on port 50501
11/08/26 19:52:33 INFO spark.SparkContext: Starting job...
11/08/26 19:52:33 INFO spark.CacheTracker: Registering RDD ID 0 with cache
11/08/26 19:52:33 INFO spark.CacheTrackerActor: Registering RDD 0 with 2 partitions
11/08/26 19:52:33 INFO spark.CacheTrackerActor: Asked for current cache locations
11/08/26 19:52:33 INFO spark.LocalScheduler: Final stage: Stage 0
11/08/26 19:52:33 INFO spark.LocalScheduler: Parents of final stage: List()
11/08/26 19:52:33 INFO spark.LocalScheduler: Missing parents: List()
11/08/26 19:52:33 INFO spark.LocalScheduler: Submitting Stage 0, which has no missing ...
11/08/26 19:52:33 INFO spark.LocalScheduler: Running task 0
11/08/26 19:52:33 INFO spark.LocalScheduler: Size of task 0 is 1385 bytes
11/08/26 19:52:33 INFO spark.LocalScheduler: Finished task 0
11/08/26 19:52:33 INFO spark.LocalScheduler: Running task 1
11/08/26 19:52:33 INFO spark.LocalScheduler: Completed ResultTask(0, 0)
11/08/26 19:52:33 INFO spark.LocalScheduler: Size of task 1 is 1385 bytes
11/08/26 19:52:33 INFO spark.LocalScheduler: Finished task 1
11/08/26 19:52:33 INFO spark.LocalScheduler: Completed ResultTask(0, 1)
11/08/26 19:52:33 INFO spark.SparkContext: Job finished in 0.145892763 s
Pi is roughly 3.14952
$

ͨ¹ýÔö¼ÓÏß³ÌÊýÁ¿£¬Äú²»½ö¿ÉÒÔÔö¼ÓÏß³ÌÖ´ÐеIJ¢Ðл¯£¬»¹¿ÉÒÔÓøüÉÙµÄʱ¼äÖ´ÐÐ×÷Òµ£¨Èç Çåµ¥ 8 Ëùʾ£©¡£

Çåµ¥ 8. ¶Ô°üº¬Á½¸öÏß³ÌµÄ Spark Ö´ÐÐÁíÒ»¸ö¿ìËÙ¼ì²â

$ ./run spark.examples.SparkPi local[2]
11/08/26 20:04:30 INFO spark.MapOutputTrackerActor: Registered actor on port 50501
11/08/26 20:04:30 INFO spark.CacheTrackerActor: Registered actor on port 50501
11/08/26 20:04:30 INFO spark.SparkContext: Starting job...
11/08/26 20:04:30 INFO spark.CacheTracker: Registering RDD ID 0 with cache
11/08/26 20:04:30 INFO spark.CacheTrackerActor: Registering RDD 0 with 2 partitions
11/08/26 20:04:30 INFO spark.CacheTrackerActor: Asked for current cache locations
11/08/26 20:04:30 INFO spark.LocalScheduler: Final stage: Stage 0
11/08/26 20:04:30 INFO spark.LocalScheduler: Parents of final stage: List()
11/08/26 20:04:30 INFO spark.LocalScheduler: Missing parents: List()
11/08/26 20:04:30 INFO spark.LocalScheduler: Submitting Stage 0, which has no missing ...
11/08/26 20:04:30 INFO spark.LocalScheduler: Running task 0
11/08/26 20:04:30 INFO spark.LocalScheduler: Running task 1
11/08/26 20:04:30 INFO spark.LocalScheduler: Size of task 1 is 1385 bytes
11/08/26 20:04:30 INFO spark.LocalScheduler: Size of task 0 is 1385 bytes
11/08/26 20:04:30 INFO spark.LocalScheduler: Finished task 0
11/08/26 20:04:30 INFO spark.LocalScheduler: Finished task 1
11/08/26 20:04:30 INFO spark.LocalScheduler: Completed ResultTask(0, 1)
11/08/26 20:04:30 INFO spark.LocalScheduler: Completed ResultTask(0, 0)
11/08/26 20:04:30 INFO spark.SparkContext: Job finished in 0.101287331 s
Pi is roughly 3.14052
$

ʹÓà Scala ¹¹½¨Ò»¸ö¼òµ¥µÄ Spark Ó¦ÓóÌÐò

Òª¹¹½¨ Spark Ó¦ÓóÌÐò£¬ÄúÐèÒªµ¥Ò» Java ¹éµµ (JAR) ÎļþÐÎʽµÄ Spark ¼°ÆäÒÀÀµ¹ØÏµ¡£Ê¹Óà sbt ÔÚ Spark µÄ¶¥¼¶Ä¿Â¼Öд´½¨¸Ã JAR Îļþ£¬ÈçÏÂËùʾ£º

$ sbt/sbt assembly

½á¹û²úÉúÒ»¸öÎļþ ./core/target/scala_2.8.1/"Spark Core-assembly-0.3.jar"¡£½«¸ÃÎļþÌí¼ÓÖÁÄúµÄ CLASSPATH ÖУ¬ÒÔ±ã¿ÉÒÔ·ÃÎÊËü¡£ÔÚ±¾Ê¾ÀýÖУ¬²»»áÓõ½´Ë JAR Îļþ£¬ÒòΪÄú½«»áʹÓà Scala ½âÊÍÆ÷ÔËÐÐËü£¬¶ø²»ÊÇ¶ÔÆä½øÐбàÒë¡£

ÔÚ±¾Ê¾ÀýÖУ¬Ê¹ÓÃÁ˱ê×¼µÄ MapReduce ת»»£¨Èç Çåµ¥ 9 Ëùʾ£©¡£¸ÃʾÀý´ÓÖ´ÐбØÒªµÄ Spark ÀർÈ뿪ʼ¡£½Ó×Å£¬ÐèÒª¶¨ÒåÄúµÄÀà (SparkTest) ¼°ÆäÖ÷·½·¨£¬ÓÃËü½âÎöÉÔºóʹÓõIJÎÊý¡£ÕâЩ²ÎÊý¶¨ÒåÁËÖ´ÐÐ Spark µÄ»·¾³£¨ÔÚ±¾ÀýÖУ¬¸Ã»·¾³ÊÇÒ»¸öµ¥½Úµã¼¯Èº£©¡£½ÓÏÂÀ´£¬Òª´´½¨ SparkContext ¶ÔÏó£¬Ëü»á¸æÖª Spark ÈçºÎ¶ÔÄúµÄ¼¯Èº½øÐзÃÎÊ¡£¸Ã¶ÔÏóÐèÒªÁ½¸ö²ÎÊý£ºMesos Ö÷»úÃû³Æ£¨ÒÑ´«È룩ÒÔ¼°Äú·ÖÅ䏸×÷ÒµµÄÃû³Æ (SparkTest)¡£½âÎöÃüÁîÐÐÖеÄÇÐÆ¬ÊýÁ¿£¬Ëü»á¸æÖª Spark ÓÃÓÚ×÷ÒµµÄÏß³ÌÊýÁ¿¡£ÒªÉèÖõÄ×îºóÒ»ÏîÊÇÖ¸¶¨ÓÃÓÚ MapReduce ²Ù×÷µÄÎı¾Îļþ¡£

×îºó£¬Äú½«Á˽â Spark ʾÀýµÄʵÖÊ£¬ËüÊÇÓÉÒ»×éת»»×é³É¡£Ê¹ÓÃÄúµÄÎļþʱ£¬¿Éµ÷Óà flatMap ·½·¨·µ»ØÒ»¸ö RDD£¨Í¨¹ýÖ¸¶¨µÄº¯Êý½«Îı¾ÐзֽâΪ±ê¼Ç£©¡£È»ºóͨ¹ý map ·½·¨£¨¸Ã·½·¨´´½¨Á˼üÖµ¶Ô£©´«µÝ´Ë RDD £¬×îÖÕͨ¹ý ReduceByKey ·½·¨ºÏ²¢¼üÖµ¶Ô¡£ºÏ²¢²Ù×÷ÊÇͨ¹ý½«¼üÖµ¶Ô´«µÝ¸ø _ + _ ÄäÃûº¯ÊýÀ´Íê³ÉµÄ¡£¸Ãº¯ÊýÖ»²ÉÓÃÁ½¸ö²ÎÊý£¨ÃÜÔ¿ºÍÖµ£©£¬²¢·µ»Ø½«Á½Õߺϲ¢Ëù²úÉúµÄ½á¹û£¨Ò»¸ö String ºÍÒ»¸ö Int£©¡£½Ó×ÅÒÔÎı¾ÎļþµÄÐÎʽ·¢Ë͸ÃÖµ£¨µ½Êä³öĿ¼£©¡£

Çåµ¥ 9. Scala/Spark ÖÐµÄ MapReduce (SparkTest.scala)

import spark.SparkContext
import SparkContext._

object SparkTest {

def main( args: Array[String]) {

if (args.length == 0) {
System.err.println("Usage: SparkTest <host> [<slices>]")
System.exit(1)
}

val spark = new SparkContext(args(0), "SparkTest")
val slices = if (args.length > 1) args(1).toInt else 2

val myFile = spark.textFile("test.txt")
val counts = myFile.flatMap(line => line.split(" "))
.map(word => (word, 1))
.reduceByKey(_ + _)

counts.saveAsTextFile("out.txt")

}

}

SparkTest.main(args)

ÒªÖ´ÐÐÄúµÄ½Å±¾£¬Ö»ÐèÒªÖ´ÐÐÒÔÏÂÃüÁ

$ scala SparkTest.scala local[1]

Äú¿ÉÒÔÔÚÊä³öĿ¼ÖÐÕÒµ½ MapReduce ²âÊÔÎļþ£¨Èç output/part-00000£©¡£

ÆäËûµÄ´óÊý¾Ý·ÖÎö¿ò¼Ü

×Ô´Ó¿ª·¢ÁË Hadoop ºó£¬Êг¡ÉÏÍÆ³öÁËÐí¶àÖµµÃ¹Ø×¢µÄÆäËû´óÊý¾Ý·ÖÎöƽ̨¡£ÕâЩƽ̨·¶Î§¹ãÀ«£¬´Ó¼òµ¥µÄ»ùÓڽű¾µÄ²úÆ·µ½Óë Hadoop ÀàËÆµÄÉú²ú»·¾³¡£

ÃûΪ bashreduce µÄƽ̨ÊÇÕâЩƽ̨ÖÐ×î¼òµ¥µÄƽ̨֮һ£¬¹ËÃû˼Ò壬Ëü³äÐíÄúÔÚ Bash »·¾³ÖеĶà¸ö»úÆ÷ÉÏÖ´ÐÐ MapReduce ÀàÐ͵IJÙ×÷¡£bashreduce ÒÀÀµÓÚÄú¼Æ»®Ê¹ÓõĻúÆ÷¼¯ÈºµÄ Secure Shell£¨ÎÞÃÜÂ룩£¬²¢ÒԽű¾µÄÐÎʽ´æÔÚ£¬Í¨¹ýËü£¬Äú¿ÉÒÔʹÓà UNIX?-style ¹¤¾ß£¨sort¡¢awk¡¢netcat µÈ£©ÇëÇó×÷Òµ¡£

GraphLab ÊÇÁíÒ»¸öÊÜÈ˹Ø×¢µÄ MapReduce ³éÏóʵÏÖ£¬Ëü²àÖØÓÚ»úÆ÷ѧϰËã·¨µÄ²¢ÐÐʵÏÖ¡£ÔÚ GraphLab ÖУ¬Map ½×¶Î»á¶¨ÒåһЩ¿Éµ¥¶À£¨ÔÚ¶ÀÁ¢Ö÷»úÉÏ£©Ö´ÐеļÆËãÖ¸Á¶ø Reduce ½×¶Î»á¶Ô½á¹û½øÐкϲ¢¡£

×îºó£¬´óÊý¾Ý³¡¾°µÄÒ»¸öгÉÔ±ÊÇÀ´×Ô Twitter µÄ Storm£¨Í¨¹ýÊÕ¹º BackType »ñµÃ£©¡£Storm ±»¶¨ÒåΪ ¡°ÊµÊ±´¦ÀíµÄ Hadoop¡±£¬ËüÖ÷Òª²àÖØÓÚÁ÷´¦ÀíºÍ³ÖÐø¼ÆË㣨Á÷´¦Àí¿ÉÒԵóö¼ÆËãµÄ½á¹û£©¡£Storm ÊÇÓà Clojure ÓïÑÔ£¨Lisp ÓïÑÔµÄÒ»ÖÖ·½ÑÔ£©±àдµÄ£¬µ«ËüÖ§³ÖÓÃÈκÎÓïÑÔ£¨±ÈÈç Ruby ºÍ Python£©±àдµÄÓ¦ÓóÌÐò¡£Twitter ÓÚ 2011 Äê 9 ÔÂÒÔ¿ªÔ´ÐÎʽ·¢²¼ Storm¡£

½áÊøÓï

Spark ÊDz»¶Ï׳´óµÄ´óÊý¾Ý·ÖÎö½â¾ö·½°¸¼Ò×åÖб¸ÊܹØ×¢µÄÐÂÔö³ÉÔ±¡£Ëü²»½öΪ·Ö²¼Êý¾Ý¼¯µÄ´¦ÀíÌṩһ¸öÓÐЧ¿ò¼Ü£¬¶øÇÒÒÔ¸ßЧµÄ·½Ê½£¨Í¨¹ý¼ò½àµÄ Scala ½Å±¾£©´¦Àí·Ö²¼Êý¾Ý¼¯¡£Spark ºÍ Scala ¶¼´¦ÔÚ»ý¼«·¢Õ¹½×¶Î¡£²»¹ý£¬ÓÉÓڹؼü Internet ÊôÐÔÖвÉÓÃÁËËüÃÇ£¬Á½ÕßËÆºõ¶¼ÒÑ´ÓÊÜÈ˹Ø×¢µÄ¿ªÔ´Èí¼þ¹ý¶É³ÉΪ»ù´¡ Web ¼¼Êõ¡£

   
2060 ´Îä¯ÀÀ       27
Ïà¹ØÎÄÕÂ

»ùÓÚEAµÄÊý¾Ý¿â½¨Ä£
Êý¾ÝÁ÷½¨Ä££¨EAÖ¸ÄÏ£©
¡°Êý¾Ýºþ¡±£º¸ÅÄî¡¢ÌØÕ÷¡¢¼Ü¹¹Óë°¸Àý
ÔÚÏßÉ̳ÇÊý¾Ý¿âϵͳÉè¼Æ ˼·+Ч¹û
 
Ïà¹ØÎĵµ

GreenplumÊý¾Ý¿â»ù´¡Åàѵ
MySQL5.1ÐÔÄÜÓÅ»¯·½°¸
ijµçÉÌÊý¾ÝÖÐ̨¼Ü¹¹Êµ¼ù
MySQL¸ßÀ©Õ¹¼Ü¹¹Éè¼Æ
Ïà¹Ø¿Î³Ì

Êý¾ÝÖÎÀí¡¢Êý¾Ý¼Ü¹¹¼°Êý¾Ý±ê×¼
MongoDBʵս¿Î³Ì
²¢·¢¡¢´óÈÝÁ¿¡¢¸ßÐÔÄÜÊý¾Ý¿âÉè¼ÆÓëÓÅ»¯
PostgreSQLÊý¾Ý¿âʵսÅàѵ
×îл¼Æ»®
DeepSeekÔÚÈí¼þ²âÊÔÓ¦ÓÃʵ¼ù 4-12[ÔÚÏß]
DeepSeek´óÄ£ÐÍÓ¦Óÿª·¢Êµ¼ù 4-19[ÔÚÏß]
UAF¼Ü¹¹ÌåϵÓëʵ¼ù 4-11[±±¾©]
AIÖÇÄÜ»¯Èí¼þ²âÊÔ·½·¨Óëʵ¼ù 5-23[ÉϺ£]
»ùÓÚ UML ºÍEA½øÐзÖÎöÉè¼Æ 4-26[±±¾©]
ÒµÎñ¼Ü¹¹Éè¼ÆÓ뽨ģ 4-18[±±¾©]

MySQLË÷Òý±³ºóµÄÊý¾Ý½á¹¹
MySQLÐÔÄܵ÷ÓÅÓë¼Ü¹¹Éè¼Æ
SQL ServerÊý¾Ý¿â±¸·ÝÓë»Ö¸´
ÈÃÊý¾Ý¿â·ÉÆðÀ´ 10´óDB2ÓÅ»¯
oracleµÄÁÙʱ±í¿Õ¼äдÂú´ÅÅÌ
Êý¾Ý¿âµÄ¿çƽ̨Éè¼Æ

²¢·¢¡¢´óÈÝÁ¿¡¢¸ßÐÔÄÜÊý¾Ý¿â
¸ß¼¶Êý¾Ý¿â¼Ü¹¹Éè¼ÆÊ¦
HadoopÔ­ÀíÓëʵ¼ù
Oracle Êý¾Ý²Ö¿â
Êý¾Ý²Ö¿âºÍÊý¾ÝÍÚ¾ò
OracleÊý¾Ý¿â¿ª·¢Óë¹ÜÀí

GE Çø¿éÁ´¼¼ÊõÓëʵÏÖÅàѵ
º½Ìì¿Æ¹¤Ä³×Ó¹«Ë¾ Nodejs¸ß¼¶Ó¦Óÿª·¢
ÖÐÊ¢Òæ»ª ׿Խ¹ÜÀíÕß±ØÐë¾ß±¸µÄÎåÏîÄÜÁ¦
ijÐÅÏ¢¼¼Êõ¹«Ë¾ PythonÅàѵ
ij²©²ÊITϵͳ³§ÉÌ Ò×ÓÃÐÔ²âÊÔÓëÆÀ¹À
ÖйúÓÊ´¢ÒøÐÐ ²âÊÔ³ÉÊì¶ÈÄ£Ðͼ¯³É(TMMI)
ÖÐÎïÔº ²úÆ·¾­ÀíÓë²úÆ·¹ÜÀí