An In-Depth Look at the Spark Streaming Execution Model
 
Authors: Tathagata Das et al.   Source: Databricks   Published by 火龙果软件 on 2015-9-22

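Abstract: Spark Streaming is one of the most widely used components in Spark, and more and more users with stream processing needs are adopting Spark. This article describes the architecture of Spark Streaming, explains how it delivers the advantages listed below, and discusses some ongoing follow-up work of interest.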

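With so many stream processing engines available, people often ask us what is unique about Spark Streaming. The first thing to say is that Apache Spark natively supports both batch and streaming workloads. Other systems differ in that their processing engine is either designed only for streaming, or handles only batch and exposes a streaming API that needs a separate, external implementation. Spark's single execution engine and unified programming model for batch and streaming give Spark Streaming some unique advantages over traditional streaming systems, which show up in four major aspects: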

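  • Fast recovery of state after failures and stragglers;
  • Better load balancing and resource usage;
  • Combining streaming data with static datasets and interactive queries;
  • Built-in integration with rich advanced processing libraries (SQL, machine learning, graph processing).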

In this article, we describe the architecture of Spark Streaming and explain how it provides the advantages above. We then discuss some ongoing work of interest.

Stream Processing Architectures: The Old and the New

Today's distributed stream processing pipelines execute as follows:

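  1. Receive streaming data from data sources (for example live logs, system telemetry data, IoT device data, and so on) into a data ingestion system such as Apache Kafka or Amazon Kinesis.
  2. Process the data in parallel on a cluster. This is the key part of designing a stream processing engine, and we discuss it in more detail below.
  3. Output the results to downstream systems (for example HBase, Cassandra or Kafka).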

To process the data, most traditional stream processing systems are designed around a continuous operator model, which works as follows:

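  • There is a set of worker nodes, each running one or more continuous operators;
  • Each continuous operator processes the streaming data one record at a time and forwards the records to other operators in the pipeline;
  • Source operators receive data from the ingestion system, and sink operators output to downstream systems.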


Figure 1: Architecture of traditional stream processing systems

Continuous operators are a simple and natural model. However, as data volumes keep growing and real-time analytics become more complex in today's big data era, this traditional architecture faces serious challenges. We designed Spark Streaming to meet the following requirements:

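  • Fast failure recovery: the larger the scale, the higher the likelihood of node failures and of nodes unexpectedly slowing down (stragglers). A system that is to deliver results in real time must therefore recover from failures automatically. Unfortunately, the static allocation of continuous operators to worker nodes makes it challenging for traditional systems to do this quickly;
  • Load balancing: uneven allocation of the processing load between the workers of a continuous operator system can turn some nodes into performance bottlenecks. This is more likely at large scale and with dynamically changing workloads. The system must therefore be able to adjust the allocation of resources between nodes dynamically, based on the workload;
  • Unification of streaming, batch and interactive workloads: in many use cases it is attractive to query the streaming data interactively (after all, the streaming system already holds it in memory) or to combine it with static datasets (for example a pre-computed model). This is hard in continuous operator systems, which are not designed to add new operators dynamically for ad-hoc queries, and this greatly limits how users can interact with the system. We need a single engine that can combine batch, streaming and interactive queries;
  • Advanced analytics (for example machine learning and SQL queries): more complex workloads require continuously learning and updating data models, or querying the latest view of the streaming data with SQL. These analytic tasks need a common, integrated abstraction that makes the developer's job easier.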

To address these requirements, Spark Streaming uses a new architecture called discretized streams, which directly leverages the rich libraries and the strong fault-tolerance mechanisms of the Spark engine.

Spark Streaming Architecture: Discretized Streams

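Instead of processing the streaming data one record at a time as traditional systems do, Spark Streaming discretizes the stream into tiny, sub-second micro-batches. Spark Streaming's Receivers accept data in parallel and buffer it in the memory of Spark's worker nodes. The latency-optimized Spark engine then runs short tasks (tens of milliseconds) to process the batches and output the results to other systems. Note that, unlike the traditional continuous operator model, where the computation is statically allocated to a node, Spark tasks are assigned dynamically to the workers based on the locality of the data and the available resources. This enables the two properties described next: load balancing and fast fault recovery.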

In addition, each batch of data is a Resilient Distributed Dataset (RDD), which is the basic abstraction of a fault-tolerant dataset in Spark. This is what allows the streaming data to be processed with any Spark code or library.


Figure 2: Spark Streaming architecture
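The discretization step described above can be pictured with a small, Spark-free sketch. It is only a model of the idea, not Spark's implementation: the `Record` type, the `MicroBatching` object and the `discretize` function are hypothetical names introduced here, and arrival times are assumed to be plain millisecond timestamps.

```scala
// Hypothetical record type: a value tagged with its arrival time in milliseconds.
final case class Record(arrivalMs: Long, value: String)

object MicroBatching {
  // Assign each record to the batch whose interval contains its arrival time.
  // Batch k covers the half-open interval [k * batchIntervalMs, (k + 1) * batchIntervalMs).
  // Returns (batch index, records of that batch) pairs ordered by batch index.
  def discretize(records: Seq[Record], batchIntervalMs: Long): Seq[(Long, Seq[Record])] = {
    require(batchIntervalMs > 0, "batch interval must be positive")
    records
      .groupBy(r => r.arrivalMs / batchIntervalMs)
      .toSeq
      .sortBy(_._1)
  }
}
```

With a 500 ms interval, records arriving at 100, 450, 520 and 1700 ms fall into batches 0, 0, 1 and 3. This sketch simply omits intervals in which nothing arrived, which is a simplification of how a real engine would treat empty batches.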

Benefits of Discretized Stream Processing

Let's see how this architecture allows Spark Streaming to achieve the goals we set earlier.

Dynamic load balancing

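Dividing the data into small micro-batches allows Spark to allocate resources at a fine granularity. For example, consider a simple workload in which the input data stream has to be partitioned by a key and processed. In traditional systems that statically assign tasks to nodes, if one of the partitions is more computationally intensive than the others, the node handling it becomes a bottleneck and slows down the whole pipeline. In Spark Streaming, the tasks of a job are load balanced dynamically across the workers: some workers process a few longer tasks, while others process a larger number of shorter tasks.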


Figure 3: Dynamic load balancing
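To make the difference concrete, here is a toy model of the two placement strategies. It is a sketch under simplifying assumptions, not Spark's scheduler: task costs are abstract integers, each worker runs one task at a time, and the names `LoadBalancing`, `staticMakespan` and `dynamicMakespan` are hypothetical.

```scala
object LoadBalancing {
  // Static placement: task i always runs on worker i % numWorkers,
  // regardless of how expensive the task turns out to be.
  def staticMakespan(taskCosts: Seq[Int], numWorkers: Int): Int = {
    require(numWorkers > 0, "need at least one worker")
    val load = Array.fill(numWorkers)(0)
    taskCosts.zipWithIndex.foreach { case (cost, i) => load(i % numWorkers) += cost }
    load.max
  }

  // Dynamic placement: each task goes to the currently least-loaded worker,
  // which mimics an idle worker pulling the next pending task.
  def dynamicMakespan(taskCosts: Seq[Int], numWorkers: Int): Int = {
    require(numWorkers > 0, "need at least one worker")
    val load = Array.fill(numWorkers)(0)
    taskCosts.foreach { cost =>
      val idx = load.indexOf(load.min)
      load(idx) += cost
    }
    load.max
  }
}
```

For task costs `Seq(8, 1, 1, 1, 1, 1, 1, 1)` on two workers, static placement finishes after 11 time units, because the worker holding the expensive task also receives three cheap ones. Dynamic placement finishes after 8: one worker runs the single long task while the other runs the seven short ones.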

Fast failure recovery

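In case of a node failure, traditional systems have to restart the failed continuous operator on another node and replay some part of the data stream to recompute the lost information. Note that only one node handles the recomputation, and the pipeline cannot proceed until the new node has caught up to the state before the failure. In Spark, the computation is already divided into small tasks that can run anywhere without affecting the correctness of the combined result. Failed tasks can therefore be relaunched in parallel on the other nodes of the cluster, spreading the recomputation evenly across many nodes, so recovery from the failure is faster than with the traditional approach.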


Figure 4: How fast failure recovery works
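A back-of-the-envelope model shows why spreading the recomputation helps. The sketch below assumes that all lost tasks cost the same and that each surviving node reruns one task at a time. `FailureRecovery` and its two functions are hypothetical names used only for this illustration.

```scala
object FailureRecovery {
  // Continuous-operator style: a single replacement node replays all lost work serially.
  def serialRecoveryTime(lostTasks: Int, costPerTaskMs: Long): Long =
    lostTasks.toLong * costPerTaskMs

  // Micro-batch style: lost tasks are rerun in parallel "waves" across the surviving nodes.
  def parallelRecoveryTime(lostTasks: Int, costPerTaskMs: Long, survivingNodes: Int): Long = {
    require(survivingNodes > 0, "need at least one surviving node")
    val waves = (lostTasks + survivingNodes - 1) / survivingNodes // ceiling division
    waves.toLong * costPerTaskMs
  }
}
```

With 12 lost tasks of 50 ms each, a single replacement node needs 600 ms, whereas four surviving nodes finish in three waves, that is 150 ms. Real recovery times also depend on scheduling and data transfer, which this model ignores.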

Unification of batch, streaming and interactive analytics

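The discretized stream (DStream) is the key programming abstraction in Spark Streaming. Internally, a DStream is represented by a sequence of RDDs that is continuous in time, each RDD containing the stream data of a specific time interval. This common representation allows batch and streaming workloads to interoperate seamlessly, so users can apply arbitrary Spark operations to each batch of streaming data. For example, a DStream can be joined with a precomputed dataset: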

// Create data set from Hadoop file
val dataset = sparkContext.hadoopFile("file")

// Join each batch in stream with the dataset
kafkaDStream.transform { batchRDD =>
  batchRDD.join(dataset).filter(...)
}

Since each batch of streaming data is stored in the memory of Spark's workers, it can be queried interactively on demand. For example, you can expose the state of all streams through the Spark SQL JDBC server, as we show in the next section. Combining batch, streaming and interactive workloads in this way is simple in Spark because all of them share a common abstraction, but it is hard to achieve in systems without one.

Advanced analytics: machine learning and SQL queries

Because Spark workloads interoperate, users have access to a rich set of libraries such as MLlib (machine learning), SQL, DataFrames and GraphX. Let's explore a few use cases:

  • Streaming + SQL and DataFrames

The RDDs maintained inside a DStream can be converted to DataFrames (the programmatic interface to Spark SQL) and then queried with SQL. For example, using Spark SQL's JDBC server, external programs can query the state of the stream with SQL.

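import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

val hiveContext = new HiveContext(sparkContext)
import hiveContext.implicits._ // brings rdd.toDF(...) into scope
...
wordCountsDStream.foreachRDD { rdd =>
  // Convert RDD to DataFrame and register it as a SQL table
  val wordCountsDataFrame = rdd.toDF("word", "count")
  wordCountsDataFrame.registerTempTable("word_counts")
}
...
// Start the JDBC server
HiveThriftServer2.startWithContext(hiveContext)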

You can then use the beeline client that ships with Spark, or tools such as Tableau, to interactively query the continuously updated "word_counts" table through the JDBC server.

1: jdbc:hive2://localhost:10000> show tables;
+--------------+--------------+
| tableName | isTemporary |
+--------------+--------------+
| word_counts | true |
+--------------+--------------+
1 row selected (0.102 seconds)
1: jdbc:hive2://localhost:10000> select * from word_counts;
+-----------+--------+
| word | count |
+-----------+--------+
| 2015 | 264 |
| PDT | 264 |
| 21:45:41 | 27 |
  • Streaming + MLlib

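Machine learning models generated offline with MLlib can be applied to streaming data. For example, the following code trains a KMeans clustering model on static data and then uses the model to classify events in a Kafka data stream.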

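import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.streaming.kafka.KafkaUtils

// Learn model offline
val model = KMeans.train(dataset, ...)

// Apply model online on stream (featurize is a user-supplied function)
val kafkaStream = KafkaUtils.createStream(...)
kafkaStream.map { event => model.predict(featurize(event)) }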

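We demonstrated this "learn offline, predict online" approach in the Databricks demo at Spark Summit 2014. Since then, we have also added streaming machine learning algorithms to MLlib that can train continuously from a labeled data stream. Other Spark libraries can be called from Spark Streaming just as easily.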
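The online half of "learn offline, predict online" amounts to applying a fixed model to every record of every batch. The plain-Scala sketch below illustrates this for a clustering model, assuming the centroids were computed offline by some other means. It is not MLlib code, and `NearestCentroid`, `predict` and `predictBatch` are hypothetical names.

```scala
object NearestCentroid {
  type Point = Vector[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def squaredDistance(a: Point, b: Point): Double = {
    require(a.length == b.length, "points must have the same dimension")
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum
  }

  // Index of the centroid closest to p: the "prediction" step of a clustering model.
  def predict(centroids: Seq[Point], p: Point): Int = {
    require(centroids.nonEmpty, "need at least one centroid")
    centroids.indices.minBy(i => squaredDistance(centroids(i), p))
  }

  // Apply the fixed, offline-trained model to every record of one micro-batch.
  def predictBatch(centroids: Seq[Point], batch: Seq[Point]): Seq[Int] =
    batch.map(p => predict(centroids, p))
}
```

Because the model is immutable during prediction, each record can be scored independently, which is what lets the per-batch work be split into small parallel tasks.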

Performance

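Given Spark Streaming's unique design, how fast does it run? In practice, Spark Streaming's ability to batch data and leverage the Spark engine yields throughput comparable to or higher than that of other streaming systems. In terms of latency, Spark Streaming can achieve latencies as low as a few hundred milliseconds. Developers sometimes ask whether micro-batching adds too much latency. In practice, batching latency is only a small part of end-to-end pipeline latency. First, in both Spark and continuous operator systems, many applications compute results over a sliding window of the stream, and that window is only updated periodically (for example, a 20-second window with a 2-second slide interval recomputes the result over the last 20 seconds every 2 seconds). Second, many pipelines collect records from multiple sources and wait for a short period to handle delayed or out-of-order data. Finally, any automatic triggering algorithm tends to wait for some time before it fires. Compared with the end-to-end latency, batching therefore rarely adds significant overhead, because the batching latency itself is small. In addition, the throughput gains from DStreams usually mean that fewer machines are needed to handle the same workload, which is a performance gain in its own right.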
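The sliding-window example can be expressed directly over micro-batches. The sketch below assumes the window length and slide interval are whole multiples of the batch interval and that each batch has already been reduced to a record count. The `SlidingWindow` object and `windowedCounts` function are hypothetical names, and only complete windows are reported.

```scala
object SlidingWindow {
  // batchCounts(i) is the number of records in the i-th micro-batch.
  // windowBatches and slideBatches give the window length and the slide
  // interval as multiples of the batch interval.
  def windowedCounts(batchCounts: Seq[Long], windowBatches: Int, slideBatches: Int): Seq[Long] = {
    require(windowBatches > 0 && slideBatches > 0, "window and slide must be positive")
    batchCounts
      .sliding(windowBatches, slideBatches)
      .filter(_.size == windowBatches) // drop a trailing partial window, if any
      .map(_.sum)
      .toList
  }
}
```

With a 2-second batch interval, the 20-second window sliding every 2 seconds from the text corresponds to `windowBatches = 10` and `slideBatches = 1`: a new result is available once per batch, so batching adds no delay beyond what the window definition already implies.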

Future Directions for Spark Streaming

Spark Streaming is one of the most widely used components in Spark, and more and more users with stream processing needs are adopting Spark. Some of the highest-priority items our team is working on are discussed below. You can expect these features in the next few releases of Spark:

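  • Backpressure: streaming workloads often see bursts of data (for example a sudden surge of microblog posts during the Oscars), and the system must handle them gracefully. In Spark 1.5, Spark will add a better backpressure mechanism that lets Spark Streaming dynamically control the ingestion rate during such bursts. This feature is a joint effort between engineers at Databricks and Typesafe;
  • Dynamic scaling: controlling the ingestion rate alone is not enough to handle longer-term variations in data rates (for example a sustained higher posting rate during the day than at night). Such variations can be handled by dynamically scaling the cluster resources according to the processing demand. This is easy to do within the Spark Streaming architecture, because the computation is already divided into a series of small tasks that can be dynamically redistributed to a larger cluster if the cluster manager (for example YARN, Mesos or Amazon EC2) provides more nodes. We plan to add support for automatic dynamic scaling;
  • Event time and out-of-order data: in practice, users sometimes have records that arrive out of order. Spark Streaming will support event-time ordering by letting users supply a custom time-extraction function;
  • UI enhancements: finally, we want to make it easy for developers to debug their streaming applications. To that end, Spark 1.4 added a new visual Spark Streaming UI that lets developers closely monitor the performance of their applications. In Spark 1.5 we improved it further by showing more input information (for example Kafka offsets).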
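As a rough illustration of the backpressure idea in the list above, the sketch below derives an ingestion rate limit from how the previous batch went. It is a deliberately simple proportional rule written for this article, not the mechanism shipped in Spark. The `Backpressure` object, `nextRateLimit` and the `maxRate` cap are hypothetical.

```scala
object Backpressure {
  // Suggest an ingestion rate (records per second) for the next batch.
  // The rate the system actually sustained in the last batch is
  // recordsInBatch / processingTime, and the result never exceeds maxRate.
  def nextRateLimit(recordsInBatch: Long, processingTimeMs: Long, maxRate: Double): Double = {
    require(processingTimeMs > 0, "processing time must be positive")
    require(maxRate > 0, "max rate must be positive")
    val sustainedRate = recordsInBatch * 1000.0 / processingTimeMs
    math.min(maxRate, sustainedRate)
  }
}
```

If a batch of 10,000 records took 4,000 ms to process, the sustained rate is 2,500 records per second, so the receiver would be throttled to that value. If the same batch took only 1,000 ms, the limit stays at the configured cap.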

 

   