Background
I have been developing with Spark for a few months now. Compared with Python/Hive, Scala/Spark has a noticeably steeper learning curve. I still remember how hard-going and slow progress was at the very beginning. Thankfully, those bitter days are over. Looking back, and to help the other members of the project team avoid the same detours, I decided to summarize and organize my experience with Spark.
Spark Basics
The cornerstone: RDD
The core of Spark is the RDD (Resilient Distributed Dataset), a general-purpose data abstraction that encapsulates basic data operations such as map, filter, and reduce. The RDD provides an abstraction for data sharing, a feature that other big-data frameworks such as MapReduce, Pregel, DryadLINQ, and Hive all lack, which is what makes the RDD more general.
To summarize briefly: an RDD is an immutable, distributed collection of objects. Each RDD consists of multiple partitions, and each partition can be computed simultaneously on different nodes of the cluster. An RDD can contain arbitrary objects from Python, Java, or Scala.
Applications across the Spark ecosystem are all built on top of RDDs (see the figure below), which demonstrates that the RDD abstraction is general enough to describe most application scenarios.
[Figure: the Spark ecosystem stack built on top of RDDs]
RDD operation types: transformations and actions
RDD operations fall into two main categories: transformations and actions. The key difference is that a transformation takes an RDD and returns an RDD, while an action takes an RDD but returns something that is not an RDD. Transformations are lazily evaluated: each RDD records the method by which it was derived from its parent RDD, and this chain of calls is known as the lineage. Actions, in contrast, trigger computation immediately.
Thanks to lazy evaluation, RDD operations connected through lineage can be pipelined, and pipelined operations can complete on a single node, avoiding the waits for data synchronization between successive transformations.
Chaining operations through lineage keeps each individual computation simple, without worrying about producing too much intermediate data, because the lineage operations are all pipelined. It also keeps each step's logic focused, unlike MapReduce, where one tends to cram too much complex logic into a single map-reduce pass in order to minimize the number of passes.
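To make the distinction concrete, here is a minimal sketch (assuming a SparkContext named sc is in scope, with a placeholder input path): every step but the last builds lineage only, and the filter and map are pipelined into a single pass over the data.

```scala
// Assumes an existing SparkContext `sc` (e.g. from spark-shell);
// the HDFS path is a placeholder.
val lines  = sc.textFile("hdfs://…/input")     // transformation: nothing read yet
val errors = lines.filter(_.contains("ERROR")) // transformation: lineage only
val fields = errors.map(_.split("\t")(0))      // transformation: still lazy

// Only this action triggers the computation; the filter and map above
// run pipelined on each partition, with no intermediate materialization.
val n = fields.count()
```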
RDD usage pattern
RDD usage follows a common pattern that can be abstracted into the following steps:
1. Load external data to create an RDD object
2. Use transformations (such as filter) to create new RDD objects
3. Cache the RDDs that need to be reused
4. Use an action (such as count) to launch the parallel computation
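The four steps above can be sketched as follows (a minimal example assuming a SparkContext sc; the file path is a placeholder):

```scala
// 1. Load external data to create an RDD
val logs = sc.textFile("hdfs://…/app.log")

// 2. Use transformations (such as filter) to create new RDDs
val errors = logs.filter(_.contains("error"))

// 3. Cache the RDD that will be reused
errors.cache()

// 4. Use actions (such as count) to launch the parallel computation
println(errors.count())                               // computes and caches
println(errors.filter(_.contains("timeout")).count()) // reuses the cache
```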
Why RDDs are efficient
The official Spark figures claim that in some scenarios RDD computation is 20x as fast as Hadoop. Whether that number is inflated is not the point here; what matters is that the RDD's efficiency is guaranteed by specific mechanisms:
1. RDD data is read-only and cannot be modified. If you need to modify data, you must transform a parent RDD into a child RDD. Consequently, the fault-tolerance strategy keeps no redundant copies of the data; instead, fault tolerance is achieved by recomputation along the parent-child dependency (lineage) chain.
2. RDD data lives in memory, so between successive RDD operations the data never has to land on disk, avoiding unnecessary I/O.
3. An RDD can store plain Java objects, avoiding unnecessary object serialization and deserialization.
In short, RDDs achieve their efficiency mainly by avoiding unnecessary work and by sacrificing fine-grained data operations in exchange for computation speed.
SparkʹÓü¼ÇÉ
RDD»ù±¾º¯ÊýÀ©Õ¹
RDDËäÈ»ÌṩÁ˺ܶຯÊý£¬µ«ÊDZϾ¹»¹ÊÇÓÐÏ޵ģ¬ÓÐʱºòÐèÒªÀ©Õ¹£¬×Ô¶¨ÒåеÄRDDµÄº¯Êý¡£ÔÚsparkÖУ¬¿ÉÒÔͨ¹ýÒþʽת»»£¬ÇáËÉʵÏÖ¶ÔRDDÀ©Õ¹¡£»Ïñ¿ª·¢¹ý³ÌÖУ¬Æ½·²µÄ»áʹÓÃrollup²Ù×÷£¨ÀàËÆHIVEÖеÄrollup£©£¬¼ÆËã¶à¸ö¼¶±ðµÄ¾ÛºÏÊý¾Ý¡£ÏÂÃæÊǾßÌåʵ£¬
/**
 * Extends the Spark RDD with a rollup method
 */
implicit class RollupRDD[T: ClassTag](rdd: RDD[(Array[String], T)]) extends Serializable {

  /**
   * Similar to the rollup operation in SQL
   *
   * @param aggregate    the aggregation function
   * @param keyPlaceHold placeholder for rolled-up key levels, defaults to FaceConf.STAT_SUMMARY
   * @param isCache      whether to cache the input data
   * @return the aggregated data
   */
  def rollup[U: ClassTag](
      aggregate: Iterable[T] => U,
      keyPlaceHold: String = FaceConf.STAT_SUMMARY,
      isCache: Boolean = true): RDD[(Array[String], U)] = {

    if (rdd.take(1).isEmpty) {
      return rdd.map(x => (Array[String](), aggregate(Array[T](x._2))))
    }
    if (isCache) {
      rdd.cache // improves computation efficiency
    }
    val totalKeyCount = rdd.first._1.size
    val result = { 1 to totalKeyCount }.par.map(untilKeyIndex => { // parallel computation
      rdd.map(row => {
        // combine the first untilKeyIndex key levels into a single key
        val combineKey = row._1.slice(0, untilKeyIndex).mkString(FaceConf.KEY_SEP)
        (combineKey, row._2)
      }).groupByKey.map(row => { // aggregate per combined key
        val oldKeyList = row._1.split(FaceConf.KEY_SEP)
        val newKeyList = oldKeyList ++ Array.fill(totalKeyCount - oldKeyList.size) { keyPlaceHold }
        (newKeyList, aggregate(row._2))
      })
    }).reduce(_ ++ _) // union the per-level results
    result
  }
}
The code above declares an implicit class with a single member variable rdd of type RDD[(Array[String], T)]. Whenever such an rdd object appears in application code and the current implicit conversion has been imported, the compiler treats the rdd as an instance of the implicit class, so rollup can be called on it just like the ordinary map and filter methods.
Rules for external variables in RDD closures

RDD operations all take user-defined closure functions. If such a function needs to access external variables, certain rules must be followed, or a runtime exception will be thrown. When a closure function is shipped to a node, it goes through the following steps:
1. The driver program uses reflection at run time to find all variables the closure accesses, wraps them into an object, and serializes that object
2. The serialized object is transferred over the network to the worker nodes
3. Each worker node deserializes the closure object
4. The worker node executes the closure function
Note: modifications made to external variables inside the closure are not propagated back to the driver program.
In short, the function is shipped over the network and then executed. The variables being shipped must therefore be serializable, or the transfer fails. Even when running locally, the same four steps are executed.
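A common way to violate these rules, sketched below with hypothetical class names: referring to a field of a non-serializable enclosing object captures this, so the whole object must be serialized.

```scala
import org.apache.spark.rdd.RDD

class Searcher(val keyword: String) { // note: not Serializable

  // BAD: `keyword` is really `this.keyword`, so the closure captures the
  // whole Searcher instance and step 1 above fails at run time with a
  // NotSerializableException.
  def badFind(rdd: RDD[String]): RDD[String] =
    rdd.filter(line => line.contains(keyword))

  // GOOD: copy the field into a local val first; the closure then captures
  // only the String, which is serializable.
  def goodFind(rdd: RDD[String]): RDD[String] = {
    val kw = keyword
    rdd.filter(line => line.contains(kw))
  }
}
```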
The broadcast mechanism can also get variables into closures, but using broadcasts everywhere makes the code less clean, and broadcasts were designed to cache larger datasets on the nodes to avoid repeated data transfer and improve computation efficiency, not as a way of accessing external variables.
RDD data synchronization
RDDs currently provide two mechanisms for data synchronization: broadcasts and accumulators.
Broadcast
As mentioned above, a broadcast can send a variable into a closure for the closure to use. But broadcasts have another role: synchronizing larger datasets. Suppose you have an IP library, possibly several GB in size, that a map operation depends on. You can broadcast the IP library into the closure so that the parallel tasks can use it. Broadcasts improve data-sharing efficiency in two ways: 1) each node (physical machine) in the cluster holds only one copy, whereas a plain closure is copied once per task; 2) broadcast transfer uses a BitTorrent-style P2P download model, which can dramatically increase transfer throughput on large clusters. Modifications to a broadcast variable are not propagated to other nodes.
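The IP-library case can be sketched like this (loadIpDb and accessLogs are hypothetical stand-ins for the real data sources):

```scala
// Build the large lookup table once in the driver …
val ipDb: Map[String, String] = loadIpDb() // hypothetical loader, e.g. ip -> region

// … and broadcast it: one copy per node instead of one copy per task.
val ipDbBc = sc.broadcast(ipDb)

// accessLogs: RDD[String] of client IPs (assumed to exist)
val regions = accessLogs.map { ip =>
  ipDbBc.value.getOrElse(ip, "unknown") // read-only access via .value
}
```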
Accumulator
An accumulator is a write-only variable (from the tasks' point of view) used to accumulate state across tasks; only the driver program can read its value. Moreover, as of version 1.2, accumulators have a known flaw: in an action, an RDD of n elements is guaranteed to increment the accumulator exactly n times, but in a transformation Spark makes no such guarantee, so the accumulator may end up incremented more than n times (e.g. n + 1).
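A minimal sketch with the Spark 1.x API that matches the 1.2-era description above (logs is an assumed RDD[String]):

```scala
// Spark 1.x accumulator API
val badLines = sc.accumulator(0)

val parsed = logs.map { line =>
  if (!line.contains("\t")) badLines += 1 // tasks may only write
  line
}

parsed.count()          // an action must run before the value is meaningful
println(badLines.value) // only the driver may read the accumulated value
```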
The synchronization mechanisms RDDs currently provide are too coarse-grained; in particular, variable state cannot be synchronized during transformations, so RDDs cannot support complex stateful transactional operations. But the RDD's mission is to provide a general-purpose parallel computation framework, and it will probably never provide fine-grained data synchronization, since that would run counter to its design goals.
RDD optimization tips
RDD caching
Data that is used more than once should be cached, otherwise unnecessary work is repeated. For example:
val data = … // read from tdw
println(data.filter(_.contains("error")).count)
println(data.filter(_.contains("warning")).count)
In the code above, data is loaded twice. A more efficient approach is to persist data in memory immediately after loading it:
val data = … // read from tdw
data.cache
println(data.filter(_.contains("error")).count)
println(data.filter(_.contains("warning")).count)
This way, data is cached in memory after the first load, and both subsequent operations use the in-memory data directly.
ת»»²¢Ðл¯
RDDµÄת»»²Ù×÷ʱ²¢Ðл¯¼ÆËãµÄ£¬µ«ÊǶà¸öRDDµÄת»»Í¬ÑùÊÇ¿ÉÒÔ²¢Ðеģ¬²Î¿¼ÈçÏÂ
val dataList:Array[RDD[Int]] = ¡
val sumList = data.list.map(_.map(_.sum))
In the example above, the outer map traverses the Array variable, computing each row's sum in each RDD serially. Since the computations of the individual RDDs have no logical dependency on one another, they can in principle be parallelized, which is easy to try in Scala:
val dataList: Array[RDD[Seq[Int]]] = …
val sumList = dataList.par.map(_.map(_.sum))
Note the added .par.
Reducing shuffle network transfer
In general, network I/O is expensive, and reducing it can speed up computation significantly. The rough flow of a shuffle operation (join, etc.) between any two RDDs is as follows (diagram omitted).
When the user data userData and the event data events are joined by user id, both sides are transferred across the network to another node, so the process involves two network transfers. By default, Spark performs both. But if you give Spark a bit more information, it can optimize the job down to a single transfer: by using a HashPartitioner to pre-partition userData "locally" first, and then letting events shuffle directly to userData's nodes, part of the network transfer is eliminated.
After this optimization, the dashed portions of the diagram are all completed locally, with no network transfer. If the data is partitioned by key already at load time, the local HashPartition step can be reduced even further. Example code:
val userData = sc.sequenceFile[UserID, UserInfo]("hdfs://…")
  .partitionBy(new HashPartitioner(100)) // create 100 partitions
  .persist()
Note that the persist above is essential; without it the partitioning would be recomputed many times. The 100 specifies the degree of parallelism.
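With userData pre-partitioned and persisted as above, a join no longer shuffles the large side (a sketch; loadEvents is a hypothetical loader returning RDD[(UserID, LinkInfo)]):

```scala
// events is typically small and regenerated for every batch
val events = loadEvents(sc) // hypothetical

// Because userData carries a known HashPartitioner, Spark shuffles only
// events to userData's partitions; the userData records stay local.
val joined = userData.join(events)
```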
Other Spark topics
Spark development workflow
Because Spark applications must be deployed to a cluster to run, local debugging is awkward. Based on the experience accumulated over this period, I put together a development workflow aimed at maximizing development and debugging efficiency while maintaining code quality. It is probably not optimal, and will need continuous improvement.
The workflow itself is fairly clear, so here I mainly want to discuss why unit tests are needed. Most projects inside the company do not encourage unit testing, and under schedule pressure developers strongly resist it, because it costs "extra" effort. But bugs do not disappear because a project is rushed; quite the opposite, rushing probably produces more bugs than average. So if you do not spend time on unit tests, you will spend just as much time, or more, debugging. Very often some tiny bug costs you a long debugging session, and it is exactly the kind of bug a unit test would have caught easily. Unit tests also bring two extra benefits: 1) usage examples for your APIs; 2) regression testing. So do write unit tests: they are an investment, and the ROI is high! That said, moderation matters; adjust the granularity of your unit tests to the project's schedule pressure, and choose deliberately what to test and what to leave untested.
Other Spark features
As mentioned earlier about the Spark ecosystem, beyond the core RDD Spark provides several very useful applications built on top of it:
1. Spark SQL: similar to Hive; implements SQL queries on top of RDDs
2. Spark Streaming: stream processing, providing real-time computation similar to Storm
3. MLlib: a machine-learning library with parallel implementations of common classification, clustering, regression, and cross-validation algorithms
4. GraphX: a graph computation framework implementing basic graph operations, common graph algorithms, and the Pregel graph-programming model
Going forward I need to keep learning and using the features above, especially MLlib, which is closely related to data mining.