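Summary: Spark Streaming is one of the most widely used components of Spark, and a growing number of users with stream processing needs are adopting Spark. This article describes the architecture of Spark Streaming, explains how it delivers its advantages, and discusses some interesting related work currently in progress.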
With so many stream processing engines available, people often ask us what unique advantages Spark Streaming offers. The first answer is that Apache Spark natively supports both batch and stream processing. Other systems differ: their processing engines are either designed only for streaming, or they offer similar batch and streaming APIs that are executed internally by separate engines. Spark's single execution engine and unified programming model for batch and streaming give Spark Streaming unique advantages over traditional stream processing systems, in four areas in particular:
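- Fast recovery from failures and stragglers;
- Better load balancing and resource usage;
- Combining streaming data with static datasets, and interactive queries;
- Native integration with rich advanced processing libraries (SQL, machine learning, graph processing).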
In this article, we describe the architecture of Spark Streaming and explain how it provides the advantages listed above. We then discuss some interesting related work currently in progress.
Stream Processing Architectures: Past and Present
At a high level, modern distributed stream processing pipelines execute as follows:
- Receive streaming data from data sources (e.g. live logs, system telemetry data, IoT device data) into a data ingestion system such as Apache Kafka or Amazon Kinesis.
- Process the data in parallel on a cluster. This is what stream processing engines are designed to do, and we discuss it in detail below.
- Output the results to downstream systems (e.g. HBase, Cassandra, Kafka).
To process this data, most traditional stream processing systems are designed around a continuous operator model, which works as follows:
- ÓÐһϵÁеŤ×÷½Úµã£¬Ã¿×é½ÚµãÔËÐÐÒ»ÖÁ¶à¸öÁ¬ÐøËã×Ó£»
- ¶ÔÓÚÁ÷Êý¾Ý£¬Ã¿¸öÁ¬ÐøËã×ÓÒ»´Î´¦ÀíÒ»Ìõ¼Ç¼£¬²¢ÇÒ½«¼Ç¼´«Ê䏸¹ÜµÀÖбðµÄËã×Ó£»
- Ô´Ëã×Ó´ÓÉãÈëϵͳ½ÓÊÕÊý¾Ý£¬½Ó×ųÁËã×ÓÊä³öµ½ÏÂÓÎϵͳ¡£

Figure 1: Architecture of traditional stream processing systems
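To make the record-at-a-time model concrete, here is a minimal plain-Scala sketch that does not use Spark or any streaming library. The names `ContinuousOperators`, `Operator` and `run` are hypothetical: each input record is pushed through the whole operator chain before the next record is read from the source, which is the essence of the continuous operator model. A real system would also pin each operator to a specific worker node, which this single-process sketch does not model.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical model of the continuous operator pipeline (no Spark involved).
object ContinuousOperators {
  // An operator consumes one record and emits zero or more records downstream.
  type Operator = String => Seq[String]

  // Source -> operators -> sink: every record travels through the whole
  // operator chain before the next record is read from the source.
  def run(source: Iterator[String], operators: Seq[Operator], sink: String => Unit): Unit =
    source.foreach { record =>
      val outputs = operators.foldLeft(Seq(record)) { (records, op) => records.flatMap(op) }
      outputs.foreach(sink)
    }

  def main(args: Array[String]): Unit = {
    val tokenize: Operator = line => line.split(" ").toSeq
    val toUpper: Operator = word => Seq(word.toUpperCase)
    val sinkBuffer = ArrayBuffer.empty[String]
    run(Iterator("a b", "c"), Seq(tokenize, toUpper), record => sinkBuffer += record)
    println(sinkBuffer.mkString(","))
  }
}
```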
Continuous operators are a simple and natural model. However, as data volumes keep growing and real-time analytics become more complex, this traditional architecture faces serious challenges. We therefore designed Spark Streaming to meet the following requirements:
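- Fast failure and straggler recovery: the larger the scale, the more likely it is that a node fails or slows down unpredictably (i.e. becomes a straggler). To deliver results in real time, the system must recover from failures and stragglers automatically. Unfortunately, because traditional systems statically allocate continuous operators to worker nodes, recovering quickly remains a challenge for them;
- Load balancing: in a continuous operator system, an uneven distribution of load across worker nodes turns some nodes into bottlenecks. This is more common with large clusters and dynamically varying workloads, so the system must be able to adjust resource allocation across nodes dynamically based on the workload;
- Unification of streaming, batch and interactive workloads: in many use cases it is valuable to query streaming data interactively (after all, the streaming system already holds it in memory) or to combine it with static datasets (e.g. pre-computed models). Both are hard in continuous operator systems, which are not designed to add new operators dynamically for ad-hoc queries, and this greatly limits how users can interact with the system. What is needed is a single engine that combines batch, streaming and interactive queries;
- Advanced analytics (e.g. machine learning and SQL queries): more complex workloads need to continuously learn and update data models, or query the latest view of the streaming data with SQL. A common abstraction across these analytic tasks makes developers' work much easier.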
To address these requirements, Spark Streaming uses a new architecture called discretized streams, which directly leverages the rich libraries and the fault tolerance of the Spark engine.
Spark Streaming Architecture: Discretized Streams
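Instead of processing streaming data one record at a time, as traditional stream processing does, Spark Streaming discretizes the stream into tiny, sub-second micro-batches. Spark Streaming's Receivers accept data in parallel and buffer it in the memory of Spark's worker nodes. The latency-optimized Spark engine then runs short tasks (tens of milliseconds) to process these batches and can output the results to other systems. Note that, unlike the traditional continuous operator model, where computation is statically allocated to a node, Spark tasks are assigned to worker nodes dynamically, based on data locality and available resources. This enables the two properties we describe next: load balancing and fast failure recovery.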
³ý´ËÖ®Í⣬ÿÅúÊý¾ÝÎÒÃǶ¼³ÆÖ®Îªµ¯ÐÔ·Ö²¼Ê½Êý¾Ý¼¯£¨RDD£©£¬ÕâÊÇSparkÖÐÈÝ´íÊý¾Ý¼¯µÄÒ»¸ö»ù±¾³éÏó¡£ÕýÊÇÈç´Ë£¬ÕâЩÁ÷Êý¾Ý²ÅÄÜ´¦ÀíSparkµÄÈÎÒâÖ¸ÁîÓë¿â¡£

Figure 2: Spark Streaming architecture
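The discretization step itself can be sketched in plain Scala. This is a simplified, hypothetical model rather than Spark's receiver code: `Discretize`, `Record`, `toBatches` and the millisecond arrival times are assumptions for illustration. Records are grouped into fixed, non-overlapping intervals by arrival time, and each resulting group plays the role of one micro-batch (one RDD in Spark) that would then be processed by short tasks.

```scala
// Hypothetical sketch (plain Scala, no Spark): discretize a stream of records into
// fixed-interval micro-batches according to when each record arrived.
object Discretize {
  final case class Record(arrivalMs: Long, value: String)

  // Assign each record to the batch [k * interval, (k + 1) * interval) containing its
  // (non-negative) arrival time; return batches ordered by start time.
  def toBatches(records: Seq[Record], batchIntervalMs: Long): Seq[(Long, Seq[String])] =
    records
      .groupBy(r => (r.arrivalMs / batchIntervalMs) * batchIntervalMs)
      .toSeq
      .sortBy { case (start, _) => start }
      .map { case (start, rs) => (start, rs.sortBy(_.arrivalMs).map(_.value)) }

  def main(args: Array[String]): Unit = {
    val records = Seq(Record(100, "a"), Record(1200, "b"), Record(900, "c"), Record(2500, "d"))
    // In Spark, each batch would become one RDD processed by short tasks.
    toBatches(records, batchIntervalMs = 1000).foreach { case (start, values) =>
      println(s"batch@$start ms -> ${values.mkString(",")}")
    }
  }
}
```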
Benefits of Discretized Stream Processing
Let's look at how this architecture allows Spark Streaming to achieve the goals we set out above.
¶¯Ì¬¸ºÔؾùºâ
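Spark divides the data into small micro-batches, which allows fine-grained allocation of computation to resources. For example, consider a simple workload in which the input data stream must be partitioned by key and processed. In the traditional approach used by other systems, where work is statically assigned to nodes, if one partition is more computationally intensive than the others, the node assigned to it becomes a bottleneck and slows down the whole pipeline. In Spark Streaming, the job's tasks are dynamically balanced across the nodes: some nodes process a few longer tasks, while others process more of the shorter tasks.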

Figure 3: Dynamic load balancing
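The effect described above can be reproduced with a small scheduling simulation. The sketch below is a hypothetical plain-Scala model, not Spark's scheduler (which also takes data locality into account): task costs are abstract time units, `staticMakespan` pins task i to worker `i % numWorkers` as in a static assignment, and `dynamicMakespan` hands each task to the currently least-loaded worker (greedy list scheduling). The makespan is the time at which the last worker finishes.

```scala
// Hypothetical scheduling model (not Spark's scheduler): compares static and
// dynamic assignment of tasks to workers. Costs are abstract time units.
object LoadBalancing {
  // Static: task i is pinned to worker i % numWorkers, whatever its cost.
  def staticMakespan(costs: Seq[Int], numWorkers: Int): Int = {
    val loads = Array.fill(numWorkers)(0)
    costs.zipWithIndex.foreach { case (cost, i) => loads(i % numWorkers) += cost }
    loads.max
  }

  // Dynamic: each task goes to the worker with the least work so far
  // (greedy list scheduling), so cheap tasks flow around an expensive one.
  def dynamicMakespan(costs: Seq[Int], numWorkers: Int): Int = {
    val loads = Array.fill(numWorkers)(0)
    costs.foreach { cost =>
      val worker = loads.indices.minBy(i => loads(i))
      loads(worker) += cost
    }
    loads.max
  }

  def main(args: Array[String]): Unit = {
    // One computationally intensive partition followed by eight cheap ones.
    val costs = Seq(8, 1, 1, 1, 1, 1, 1, 1, 1)
    println(s"static: ${staticMakespan(costs, 2)}, dynamic: ${dynamicMakespan(costs, 2)}")
  }
}
```

With one expensive partition (cost 8) and eight cheap ones (cost 1) on two workers, the static assignment finishes after 12 units and the dynamic one after 8: one worker runs the single long task while the other runs all the short ones, mirroring Figure 3.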
¿ìËÙ¹ÊÕϻָ´»úÖÆ
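When a node fails, a traditional system has to restart the failed continuous operator on another node and replay part of the data stream to recompute the lost information. Note that only one node handles this recomputation, and the pipeline cannot proceed until the new node has caught up to the state it was in before the failure. In Spark, the computation is split into small, deterministic tasks that can run anywhere without affecting the correctness of the combined result. Failed tasks can therefore be relaunched in parallel on the other nodes in the cluster, spreading the recomputation evenly over many nodes, so recovery is faster than with the traditional approach.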

Figure 4: Fast failure recovery
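A back-of-the-envelope model shows why spreading recomputation helps. The sketch below rests on simplifying assumptions of our own, not Spark's: all lost tasks take the same time, the replacement node in the traditional case replays everything alone, and in the discretized case the lost tasks run in parallel waves across otherwise idle surviving nodes. The object and method names are hypothetical.

```scala
// Back-of-the-envelope recovery model with simplifying assumptions: all lost
// tasks take the same time and surviving nodes are otherwise idle.
object Recovery {
  // Traditional: one replacement node replays all of the lost work by itself.
  def singleNodeSec(lostTasks: Int, taskSec: Double): Double =
    lostTasks * taskSec

  // Discretized: lost tasks are relaunched in parallel "waves" across the
  // surviving nodes; ceil(lostTasks / survivingNodes) waves are needed.
  def parallelSec(lostTasks: Int, taskSec: Double, survivingNodes: Int): Double = {
    val waves = (lostTasks + survivingNodes - 1) / survivingNodes
    waves * taskSec
  }

  def main(args: Array[String]): Unit =
    println(s"single node: ${singleNodeSec(20, 0.5)} s, parallel: ${parallelSec(20, 0.5, 10)} s")
}
```

For example, 20 lost tasks of 0.5 s each take 10 s to replay on a single node, but about 1 s when spread across 10 surviving nodes.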
Unifying Batch, Streaming and Interactive Analytics
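The key programming abstraction in Spark Streaming is the discretized stream (DStream). Internally, a DStream is represented as a time-ordered sequence of RDDs, each containing the data from one time interval. This common representation lets batch and streaming workloads interoperate seamlessly: users can apply arbitrary Spark operations to each batch of streaming data. For example, a DStream can be joined with a precomputed static dataset: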
```scala
// Create data set from Hadoop file
val dataset = sparkContext.hadoopFile("file")
// Join each batch in stream with the dataset
kafkaDStream.transform { batchRDD =>
  batchRDD.join(dataset).filter(...)
}
```
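Conceptually, `transform` applies an ordinary batch function to every RDD in the DStream's series. The plain-Scala sketch below models that idea without Spark: `MiniDStream`, `DStreamModel`, `Batch` and `joinWith` are hypothetical stand-ins, a batch is just a `Seq` of key/value pairs standing in for an RDD, and `joinWith` mimics the inner-join semantics of `RDD.join` against a static lookup table.

```scala
// Hypothetical plain-Scala model of a DStream (no Spark): a time-ordered series of
// batches, where each batch stands in for one RDD of key/value pairs.
object MiniDStream {
  type Batch[K, V] = Seq[(K, V)]

  final case class DStreamModel[K, V](batches: Seq[(Long, Batch[K, V])]) {
    // Like DStream.transform: apply an arbitrary batch-level function to every batch.
    def transform[K2, V2](f: Batch[K, V] => Batch[K2, V2]): DStreamModel[K2, V2] =
      DStreamModel(batches.map { case (time, batch) => (time, f(batch)) })
  }

  // Inner join of one batch with a static dataset, mimicking RDD.join semantics
  // (keys missing from the static side are dropped).
  def joinWith[K, V, W](batch: Batch[K, V], dataset: Map[K, W]): Batch[K, (V, W)] =
    batch.flatMap { case (k, v) => dataset.get(k).map(w => (k, (v, w))) }

  def main(args: Array[String]): Unit = {
    val clicks = DStreamModel(Seq(1000L -> Seq("u1" -> 3, "u9" -> 1), 2000L -> Seq("u2" -> 5)))
    val profiles = Map("u1" -> "gold", "u2" -> "silver") // precomputed static dataset
    clicks.transform(batch => joinWith(batch, profiles)).batches.foreach(println)
  }
}
```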
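Because each batch of streaming data is stored in the memory of Spark's worker nodes, it can be queried interactively on demand. For example, you can expose the state of all streams through the Spark SQL JDBC server, as we show in the next section. Because Spark provides a common abstraction for these workloads, combining batch, streaming and interactive work is very easy in Spark, but hard in systems that lack such a common abstraction.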
Advanced Analytics: Machine Learning and SQL Queries
Spark's interoperability extends to rich libraries such as MLlib (machine learning), SQL, DataFrames and GraphX. Let's explore a few use cases:
- Streaming + SQL and DataFrames
The RDDs that a DStream generates can be converted to DataFrames (the programmatic interface to Spark SQL) and then queried with SQL. For example, using Spark SQL's JDBC server, external applications can query the state of the stream with SQL.
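```scala
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

val hiveContext = new HiveContext(sparkContext)
import hiveContext.implicits._ // enables rdd.toDF(...)
...
wordCountsDStream.foreachRDD { rdd =>
  // Convert RDD to DataFrame and register it as a SQL table
  val wordCountsDataFrame = rdd.toDF("word", "count")
  wordCountsDataFrame.registerTempTable("word_counts")
}
...
// Start the JDBC server
HiveThriftServer2.startWithContext(hiveContext)
```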
You can then query the continuously updated "word_counts" table interactively through the JDBC server, using the beeline client that ships with Spark or tools such as Tableau.
```
1: jdbc:hive2://localhost:10000> show tables;
+--------------+--------------+
|  tableName   | isTemporary  |
+--------------+--------------+
| word_counts  | true         |
+--------------+--------------+
1 row selected (0.102 seconds)
1: jdbc:hive2://localhost:10000> select * from word_counts;
+-----------+--------+
|   word    | count  |
+-----------+--------+
| 2015      | 264    |
| PDT       | 264    |
| 21:45:41  | 27     |
```
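- Streaming + MLlib
Machine learning models built offline with MLlib can be applied to streaming data. For example, the following code trains a KMeans clustering model on static data and then uses the model to classify events in a Kafka data stream.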
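```scala
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.streaming.kafka.KafkaUtils

// Learn model offline
val model = KMeans.train(dataset, ...)
// Apply model online on stream
val kafkaStream = KafkaUtils.createDStream(...)
// featurize is an application-defined function (not shown) that turns an event into a feature vector
kafkaStream.map { event => model.predict(featurize(event)) }
```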
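We demonstrated this "offline learning, online prediction" approach in our Databricks demo at Spark Summit 2014. Since then, we have also added streaming machine learning algorithms to MLlib that can continuously train from a labeled data stream. Other Spark libraries can likewise be called easily from Spark Streaming.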
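The offline-learning, online-prediction pattern can be illustrated without Spark. In this hypothetical sketch, `CentroidModel` stands in for a trained k-means model whose centroids are assumed to have been computed offline, `predict` assigns a point to its nearest centroid by squared Euclidean distance, and `featurize` is an illustrative parser for `"x,y"` events; it is not the `featurize` function in the snippet above, which the article does not show.

```scala
// Hypothetical sketch of offline learning, online prediction (no Spark/MLlib).
object OfflineOnline {
  // Stand-in for a k-means model whose centroids were computed offline.
  final case class CentroidModel(centroids: Vector[Vector[Double]]) {
    private def squaredDistance(a: Vector[Double], b: Vector[Double]): Double =
      a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

    // Assign a point to its nearest centroid, as a k-means model does.
    def predict(point: Vector[Double]): Int =
      centroids.indices.minBy(i => squaredDistance(centroids(i), point))
  }

  // Illustrative featurizer for events of the form "x,y".
  def featurize(event: String): Vector[Double] =
    event.split(",").map(_.trim.toDouble).toVector

  // The model stays fixed; each incoming micro-batch of events is classified with it.
  def classifyBatches(model: CentroidModel, batches: Seq[Seq[String]]): Seq[Seq[Int]] =
    batches.map(batch => batch.map(event => model.predict(featurize(event))))

  def main(args: Array[String]): Unit = {
    val model = CentroidModel(Vector(Vector(0.0, 0.0), Vector(10.0, 10.0)))
    println(classifyBatches(model, Seq(Seq("1,1", "9,8"), Seq("10,11"))))
  }
}
```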
Performance
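Given Spark Streaming's unique design, how fast does it run? In practice, Spark Streaming's ability to batch data and leverage the Spark engine gives it throughput comparable to or higher than other streaming systems. In terms of latency, Spark Streaming can achieve latencies as low as a few hundred milliseconds. Developers sometimes ask whether micro-batching adds too much latency. In practice, batching latency is only a small part of end-to-end pipeline latency. For example, many applications, whether they run on Spark or on a continuous operator system, compute results over a sliding window, and that window is only updated periodically (e.g. a 20-second window that slides every 2 seconds, meaning that every 2 seconds the result is recomputed over the last 20 seconds of data). Many pipelines also collect records from multiple sources and wait for a short period to handle delayed or out-of-order data. Finally, automatic triggering algorithms tend to wait for some time before firing. Compared with end-to-end latency, batching therefore rarely adds significant overhead. In fact, the throughput gains of DStreams often mean that the same workload can be handled with fewer machines.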
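The 20-second window sliding every 2 seconds from the example above can be expressed in terms of micro-batches. Assuming, for illustration, a 2-second batch interval, the window spans 20 / 2 = 10 batches and a new result is produced after every batch. The helper below is a hypothetical plain-Scala sketch, not the implementation of Spark's `window` or `reduceByKeyAndWindow`; early windows that are not yet full simply sum the batches available so far.

```scala
// Hypothetical plain-Scala sketch of a sliding window over micro-batches.
object SlidingWindow {
  // batchCounts(i) is the count observed in batch i. Every `slideBatches` batches,
  // emit the total over the most recent `windowBatches` batches (fewer at the start).
  def windowedCounts(batchCounts: Seq[Long], windowBatches: Int, slideBatches: Int): Seq[Long] =
    (slideBatches to batchCounts.size by slideBatches).map { end =>
      batchCounts.slice(math.max(0, end - windowBatches), end).sum
    }

  def main(args: Array[String]): Unit = {
    val batchSec = 2
    val windowBatches = 20 / batchSec // 20-second window = 10 batches
    val slideBatches = 2 / batchSec   // slides every 2 seconds = every batch
    val counts = (1L to 12L).toSeq    // illustrative per-batch counts
    println(windowedCounts(counts, windowBatches, slideBatches))
  }
}
```

With per-batch counts 1, 2, ..., 12, the tenth result is 55 (the first full window) and the last is 75 (batches 3 through 12). Because the windowed result only changes once per slide anyway, a batch interval equal to the slide adds little extra delay.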
Future Directions for Spark Streaming
Spark Streaming is one of the most widely used components of Spark, and much more is coming for streaming users. Some of the highest-priority items our team is working on are discussed below; you can expect them to appear in the next few Spark releases:
- Backpressure: streaming workloads often experience bursts of data (e.g. a sudden spike in tweets during the Oscars), and the processing system must handle them gracefully. In Spark 1.5, Spark will add better backpressure mechanisms that let Spark Streaming dynamically control the ingestion rate during such bursts. This feature is joint work between engineers at Databricks and Typesafe;
- Dynamic scaling: controlling the ingestion rate alone is not enough to handle longer-term variations in data rates (e.g. a sustained higher tweet rate during the day than at night). Such variations can be handled by dynamically scaling cluster resources according to processing demand. This is easy to do in the Spark Streaming architecture: because the computation is already divided into small tasks, they can be dynamically redistributed to a larger cluster when more nodes are acquired from the cluster manager (e.g. YARN, Mesos, Amazon EC2). We plan to add support for automatic dynamic scaling;
- Event time and out-of-order data: in practice, users sometimes have records that arrive out of order. Spark Streaming will support event time by letting users supply a custom time extraction function;
- UI enhancements: finally, we want to make it easy for developers to debug their streaming applications. To this end, Spark 1.4 added new visualizations to the Spark Streaming UI that let developers closely monitor the performance of their applications. In Spark 1.5, we are further improving this by showing more input information (e.g. Kafka message offsets).
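The backpressure idea can be sketched as a feedback loop: after each batch, the next ingestion limit is derived from the rate at which the previous batch was actually processed. This is a deliberately simplified proportional rule of our own, not Spark's rate estimator, and `Backpressure`, `nextRate`, `simulate` and the fixed-capacity processing model are hypothetical. In Spark 1.5 and later, the built-in mechanism is switched on with the `spark.streaming.backpressure.enabled` configuration property.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical feedback loop illustrating backpressure (not Spark's rate estimator).
object Backpressure {
  // Proportional rule: the next rate limit is the rate at which the last batch was
  // actually processed, never dropping below minRate records/second.
  def nextRate(processedRecords: Long, processingTimeMs: Long, minRate: Double): Double =
    math.max(minRate, processedRecords * 1000.0 / processingTimeMs)

  // Simulates one batch per interval. Arrivals beyond the current limit stay queued
  // at the source (e.g. in Kafka). Returns how many records each batch admitted.
  def simulate(arrivals: Seq[Long], batchMs: Long, capacityPerSec: Double): Vector[Long] = {
    var rateLimit = Double.PositiveInfinity // no feedback before the first batch
    var backlog = 0L
    val admitted = ArrayBuffer.empty[Long]
    for (arriving <- arrivals) {
      backlog += arriving
      val allowed =
        if (rateLimit.isInfinity) backlog
        else math.min(backlog, (rateLimit * batchMs / 1000.0).toLong)
      backlog -= allowed
      admitted += allowed
      if (allowed > 0) {
        // Processing time for this batch under the assumed fixed capacity (>= 1 ms).
        val processingMs = math.ceil(allowed * 1000.0 / capacityPerSec).toLong
        rateLimit = nextRate(allowed, processingMs, minRate = 1.0)
      }
    }
    admitted.toVector
  }

  def main(args: Array[String]): Unit =
    println(simulate(Seq(5000L, 5000L, 0L, 0L), batchMs = 1000, capacityPerSec = 1000.0))
}
```

With an assumed capacity of 1,000 records per second, 1-second batches and a burst of 5,000 records in each of the first two intervals, the first batch admits all 5,000 records (there is no feedback yet) and takes 5 seconds; from then on each batch admits 1,000 records and the rest of the burst waits at the source.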