Äú¿ÉÒÔ¾èÖú£¬Ö§³ÖÎÒÃǵĹ«ÒæÊÂÒµ¡£

1Ôª 10Ôª 50Ôª





ÈÏÖ¤Â룺  ÑéÖ¤Âë,¿´²»Çå³þ?Çëµã»÷Ë¢ÐÂÑéÖ¤Âë ±ØÌî



  ÇóÖª ÎÄÕ ÎÄ¿â Lib ÊÓÆµ iPerson ¿Î³Ì ÈÏÖ¤ ×Éѯ ¹¤¾ß ½²×ù Model Center   Code  
»áÔ±   
   
 
     
   
 ¶©ÔÄ
  ¾èÖú
ÈçºÎ»ùÓÚSpark Streaming¹¹½¨ÊµÊ±¼ÆËãÆ½Ì¨
 
×÷ÕߣºÅ˹úÇì ·¢²¼ÓÚ£º2017-10-17
  3514  次浏览      28
 

Ëæ×Å»¥ÁªÍø¼¼ÊõµÄѸËÙ·¢Õ¹£¬Óû§¶ÔÓÚÊý¾Ý´¦ÀíµÄʱЧÐÔ¡¢×¼È·ÐÔÓëÎȶ¨ÐÔÒªÇóÔ½À´Ô½¸ß£¬ÈçºÎ¹¹½¨Ò»¸öÎȶ¨Ò×Óò¢ÌṩÆë±¸µÄ¼à¿ØÓëÔ¤¾¯¹¦ÄܵÄʵʱ¼ÆËãÆ½Ì¨Ò²³ÉÁ˺ܶ๫˾һ¸öºÜ´óµÄÌôÕ½¡£

×Ô2015ÄêЯ³Ìʵʱ¼ÆËãÆ½Ì¨´î½¨ÒÔÀ´£¬¾­¹ýÁ½Äê¶à²»¶ÏµÄ¼¼ÊõÑݽø£¬Ä¿Ç°ÊµÊ±¼¯Èº¹æÄ£ÒÑ´ïÉϰŲ̀£¬Æ½Ì¨º­¸Ç¸÷¸öSBUÓ빫¹²²¿ÃÅÊý°Ù¸öʵʱӦÓã¬È«ÄêJStorm¼¯ÈºÎȶ¨ÐÔ´ïµ½100%¡£Ä¿Ç°ÊµÊ±Æ½Ì¨Ö÷Òª»ùÓÚJStormÓëSpark Streaming¹¹½¨¶ø³É£¬±¾´Î·ÖÏí½«×ÅÖØÓÚ½éÉÜЯ³ÌÈçºÎ»ùÓÚSpark Streaming¹¹½¨ÊµÊ±¼ÆËãÆ½Ì¨£¬ÎÄÕ½«´ÓÒÔϼ¸¸ö·½Ãæ·Ö±ð²ûÊöƽ̨µÄ¹¹½¨ÓëÓ¦Óãº

  • Spark Streaming vs JStorm
  • Spark StreamingÉè¼ÆÓë·â×°
  • Spark StreamingÔÚЯ³ÌµÄʵ¼ù
  • Ôø¾­²È¹ýµÄ¿Ó
  • δÀ´Õ¹Íû

1 Spark Streaming vs JStorm

Я³Ìʵʱƽ̨ÔÚ½ÓÈëSpark Streaming֮ǰ£¬JStormÒÑÎȶ¨ÔËÐÐÓÐÒ»Äê°ë£¬»ù±¾Äܹ»Âú×ã´ó²¿·ÖµÄÓ¦Óó¡¾°¡£½ÓÈëSpark StreamingÖ÷ÒªÓÐÒÔϼ¸µã¿¼ÂÇ£ºÊ×ÏÈЯ³ÌʹÓõÄJStorm°æ±¾Îª2.1.1°æ±¾£¬´Ë°æ±¾µÄJStorm·â×°Óë³éÏó³Ì¶È½ÏµÍ£¬²¢Ã»ÓÐÌṩHigh Level³éÏó·½·¨ÒÔ¼°¶Ô´°¿Ú¡¢×´Ì¬ºÍSqlµÈ·½ÃæµÄ¹¦ÄÜÖ§³Ö£¬Õâ´ó´óµÄÌá¸ßÁËÓû§Ê¹ÓÃJStormʵÏÖʵʱӦÓõÄÃż÷ÒÔ¼°¿ª·¢¸´ÔÓʵʱӦÓó¡¾°µÄÄѶȡ£ÔÚÕ⼸¸ö·½Ã棬Spark Streaming±íÏÖ¾ÍÏà¶ÔºÃµÄ¶à£¬²»µ«ÌṩÁ˸߶ȼ¯³ÉµÄ³éÏó·½·¨£¨¸÷ÖÖËã×Ó£©£¬²¢ÇÒÓû§»¹¿ÉÒÔÓëSpark SQLÏà½áºÏÖ±½ÓʹÓÃSQL´¦ÀíÊý¾Ý¡£

Æä´Î£¬Óû§ÔÚ´¦ÀíÊý¾ÝµÄ¹ý³ÌÖÐÍùÍùÐèҪά»¤Á½Ì×Êý¾Ý´¦ÀíÂß¼­£¬ÊµÊ±¼ÆËãʹÓÃJStorm£¬ÀëÏß¼ÆËãʹÓÃHive»òSpark¡£ÎªÁ˽µµÍ¿ª·¢ºÍά»¤³É±¾£¬ÊµÏÖÁ÷ʽÓëÀëÏß¼ÆËãÒýÇæµÄͳһ£¬SparkΪ´ËÌṩÁËÁ¼ºÃµÄÖ§³Å¡£

×îºó£¬ÔÚÒýÈëSpark Streaming֮ǰ£¬ÎÒÃÇÖØµã·ÖÎöÁËSparkÓëFlinkÁ½Ì×¼¼ÊõµÄÒýÈë³É±¾¡£Flinkµ±Ê±µÄ°æ±¾Îª1.2°æ±¾£¬SparkµÄ°æ±¾Îª2.0.1¡£Ïà±È½ÏÓÚSpark£¬FlinkÔÚSQLÓëMLlibÉϵÄÖ§³ÖÏà¶ÔÈõÓÚSpark£¬²¢ÇÒ¹«Ë¾Ðí¶à²¿ÃŶ¼ÊÇ»ùÓÚSpark SQLÓëMLlib¿ª·¢ÀëÏßÈÎÎñÓëË㷨ģÐÍ£¬Ê¹µÃ´ó´ó½µµÍÁËÓû§Ê¹ÓÃSparkµÄѧϰ³É±¾¡£

ÏÂͼ¼òµ¥µÄ¸ø³öÁ˵±Ç°ÎÒÃÇʹÓÃSpark StreamingÓëJStormµÄ¶Ô±È£º

2 Spark StreamingÉè¼ÆÓë·â×°

ÔÚ½ÓÈëSpark StreamingµÄ³õÆÚ£¬Ê×ÏÈÐèÒª¿¼ÂǵÄÊÇÈçºÎ»ùÓÚÏÖÓеÄʵʱƽ̨ÎÞ·ìµÄǶÈëSpark Streaming¡£Ô­ÏȵÄʵʱƽ̨ÒѾ­°üº¬ÁËÐí¶à¹¦ÄÜ£ºÔªÊý¾Ý¹ÜÀí¡¢¼à¿ØÓë¸æ¾¯µÈ¹¦ÄÜ£¬ËùÒÔµÚÒ»²½ÎÒÃÇÏÈÕë¶ÔSpark Streaming½øÐÐÁË·â×°²¢ÌṩÁ˷ḻµÄ¹¦ÄÜ¡£ÕûÌ×Ìåϵ×ܹ²°üº¬ÁËMuise Spark Core¡¢Muise PortalÒÔ¼°Íⲿϵͳ¡£

Muise Spark Core

Muise Spark CoreÊÇÎÒÃÇ»ùÓÚSpark StreamingʵÏֵĶþ´Î·â×°£¬ÓÃÓÚÖ§³ÖЯ³Ì¶àÖÖÏûÏ¢¶ÓÁУ¬ÆäÖÐHermes KafkaÓëÔ´ÉúµÄKafka»ùÓÚDirect ApproachµÄ·½Ê½Ïû·ÑÊý¾Ý£¬Hermes MysqlÓëQmq»ùÓÚReceiverµÄ·½Ê½Ïû·ÑÊý¾Ý¡£½ÓÏÂÀ´½«Òª½²µÄÖî¶àÌØÐÔÖ÷ÒªÊÇÕë¶ÔKafkaÀàÐ͵ÄÊý¾ÝÔ´¡£

Muise spark coreÖ÷Òª°üº¬ÁËÒÔÏÂÌØÐÔ£º

  • Kafka Offset×Ô¶¯¹ÜÀí
  • Ö§³ÖExactly OnceÓëAt Least OnceÓïÒå
  • ÌṩMetric×¢²áϵͳ£¬Óû§¿É×¢²á×Ô¶¨Òåmetric
  • »ùÓÚϵͳÓëÓû§×Ô¶¨Òåmetric½øÐÐÔ¤¾¯
  • Long running on Yarn£¬ÌṩÈÝ´í»úÖÆ

Kafka Offset×Ô¶¯¹ÜÀí

·â×°muise spark coreµÄµÚһĿ±ê¾ÍÊǼòµ¥Ò×Óã¬ÈÃÓû§ÒÔ×î¼òµ¥µÄ·½Ê½Äܹ»ÉÏÊÖʹÓÃSpark Streaming¡£Ê×ÏÈÎÒÃÇʵÏÖÁ˰ïÖúÓû§×Ô¶¯¶ÁÈ¡Óë´æ´¢Kafka OffsetµÄ¹¦ÄÜ£¬Óû§ÎÞÐè¹ØÐÄOffsetÊÇÈçºÎ±»´¦ÀíµÄ¡£Æä´ÎÎÒÃÇÒ²¶ÔKafka OffsetµÄÓÐЧÐÔ½øÐÐÁËУÑ飬ÓеÄÓû§µÄ×÷Òµ¿ÉÄÜÔÚÍ£Ö¹Á˽ϳ¤Ê±¼äºóÖØÐÂÔËÐлá³öÏÖOffsetʧЧµÄÇéÐΣ¬ÎÒÃÇÒ²¶Ô´Ë×÷Á˶ÔÓ¦µÄ²Ù×÷£¬Ä¿Ç°µÄ²Ù×÷Êǽ«Ê§Ð§µÄOffsetÉèÖÃΪµ±Ç°ÓÐЧµÄ×îÀϵÄOffset¡£ÏÂͼչÏÖÁËÓû§»ùÓÚmuise spark core±àдһ¸öSpark streaming×÷ÒµµÄ¼òµ¥Ê¾Àý£¬Óû§Ö»ÐèÒª¶Ì¶Ì¼¸ÐдúÂë¼´¿ÉÍê³É´úÂëµÄ³õʼ»¯²¢´´½¨ºÃ¶ÔÓ¦µÄDStream£º

ĬÈÏÇé¿öÏ£¬×÷ҵÿ´Î¶¼ÊÇ»ùÓÚÉϴδ洢µÄKafka Offset¼ÌÐøÏû·Ñ£¬µ«ÊÇÓû§Ò²¿ÉÒÔ×ÔÐоö¶¨OffsetµÄÏû·ÑÆðµã¡£ÏÂͼÖÐչʾÁËÉèÖÃÏû·ÑÆðµãµÄÈýÖÖ·½Ê½:

Exactly OnceµÄʵÏÖ

Èç¹ûʵʱ×÷ҵҪʵÏֶ˶Զ˵Äexactly onceÔòÐèÒªÊý¾ÝÔ´¡¢Êý¾Ý´¦ÀíÓëÊý¾Ý´æ´¢µÄÈý¸ö½×¶Î¶¼±£Ö¤exactly onceµÄÓïÒ塣Ŀǰ»ùÓÚKafka Direct API¼ÓÉÏSpark RDDËã×Ó¾«È·Ò»´ÎµÄ±£Ö¤Äܹ»ÊµÏֶ˶Զ˵Äexactly onceµÄÓïÒå¡£ÔÚÊý¾Ý´æ´¢½×¶ÎÒ»°ãʵÏÖexactly onceÐèÒª±£Ö¤´æ´¢µÄ¹ý³ÌÊÇÃݵȲÙ×÷»òÊÂÎñ²Ù×÷¡£ºÜ¶àϵͳ±¾Éí¾ÍÖ§³ÖÁËÃݵȲÙ×÷£¬±ÈÈçÏàͬÊý¾Ýдhdfsͬһ¸öÎļþ£¬Õâ±¾Éí¾ÍÊÇÃݵȲÙ×÷£¬±£Ö¤Á˶à´Î²Ù×÷×îÖÕ»ñÈ¡µÄÖµ»¹ÊÇÏàͬ£»HBase¡¢ElasticSearchÓëredisµÈ¶¼Äܹ»ÊµÏÖÃݵȲÙ×÷¡£¶ÔÓÚ¹ØÏµÐÍÊý¾Ý¿âµÄ²Ù×÷Ò»°ã¶¼ÊÇÄܹ»Ö§³ÖÊÂÎñÐÔ²Ù×÷¡£

¹Ù·½ÔÚ´´½¨DirectKafkaInputStreamʱֻÐèÒªÊäÈëÏû·ÑKafkaµÄFrom Offset£¬È»ºóÆä×ÔÐлñÈ¡±¾´ÎÏû·ÑµÄEnd Offset£¬Ò²¾ÍÊǵ±Ç°×îеÄOffset¡£±£´æµÄOffsetÊDZ¾Åú´ÎµÄEnd Offset£¬Ï´ÎÏû·Ñ´ÓÉϴεÄEnd Offset¿ªÊ¼Ïû·Ñ¡£µ±³ÌÐòå´»ú»òÖØÆôÈÎÎñºó£¬ÕâÆäÖдæÔÚһЩÎÊÌâ¡£Èç¹ûÔÚÊý¾Ý´¦ÀíÍê³Éǰ´æ´¢Offset£¬Ôò¿ÉÄÜ´æÔÚ×÷Òµ´¦ÀíÊý¾Ýʧ°ÜÓë×÷ҵ崻úµÈÇé¿ö£¬ÖØÆôºó»áÎÞ·¨×·ËÝÉϴδ¦ÀíµÄÊý¾Ýµ¼ÖÂÊý¾Ý³öÏÖ¶ªÊ§¡£Èç¹ûÔÚÊý¾Ý´¦ÀíÍê³Éºó´æ´¢Offset£¬µ«ÊÇ´æ´¢Offset¹ý³ÌÖз¢Éúʧ°Ü»ò×÷ҵ崻úµÈÇé¿ö£¬ÔòÔÚÖØÆôºó»áÖØ¸´Ïû·ÑÉÏ´ÎÒѾ­Ïû·Ñ¹ýµÄÊý¾Ý¡£¶øÇÒ´ËʱÓÖÎÞ·¨±£Ö¤ÖØÆôºóÏû·ÑµÄÊý¾ÝÓëå´»úǰµÄÊý¾ÝÁ¿ÏàͬÊý¾ÝÏ൱£¬ÕâÓÖ»áÒýÈëÁíÍâÒ»¸öÎÊÌ⣬Èç¹ûÊÇ»ùÓÚ¾ÛºÏͳ¼ÆÖ¸±ê×÷¸üвÙ×÷£¬Õâ»á´øÀ´ÎÞ·¨ÅжÏÉÏ´ÎÊý¾ÝÊÇ·ñÒѾ­¸üгɹ¦¡£

ËùÒÔÔÚmuise spark coreÖÐÎÒÃǼÓÈëÁË×Ô¼ºµÄʵÏÖÓÃÒÔ±£Ö¤Exactly onceµÄÓïÒå¡£¾ßÌåµÄʵÏÖÊÇÎÒÃǶÔSparkÔ´Âë½øÐÐÁ˸ÄÔ죬±£Ö¤ÔÚ´´½¨DirectKafkaInputStream¿ÉÒÔͬʱÊäÈëFrom OffsetÓëEnd Offset£¬²¢ÇÒÎÒÃÇÔÚ´æ´¢Kafka OffsetµÄʱºò±£´æÁËÿ¸öÅú´ÎµÄÆðʼOffsetÓë½áÊøOffset£¬¾ßÌå¸ñʽÈçÏ£º

Èç´Ë×öµÄÓÃÒâÔÚÓÚÄܹ»È·±£ÎÞÂÛÊÇå´»ú»¹ÊÇÈËÎªÖØÆô£¬ÖØÆôºóµÄµÚÒ»¸öÅú´ÎÓëÖØÆôǰµÄ×îºóÒ»¸öÅú´ÎÊý¾ÝһģһÑù¡£ÕâÑùµÄÉè¼ÆÊ¹µÃºóÃæÓû§ÔÚºóÃæ¶ÔÓÚµÚÒ»¸öÅú´ÎµÄÊý¾Ý´¦Àí·Ç³£Áé»î¿É±ä£¬Èç¹ûÓû§Ö±½ÓºöÂÔµÚÒ»¸öÅú´ÎµÄÊý¾Ý£¬ÄÇ´Ëʱ±£Ö¤µÄÊÇat most onceµÄÓïÒ壬ÒòΪÎÒÃÇÎÞ·¨»ñÖªÖØÆôǰµÄ×îºóÒ»¸öÅú´ÎÊý¾Ý²Ù×÷ÊÇ·ñÓгɹ¦Íê³É£»Èç¹ûÓû§ÒÀÕÕÔ­ÓÐÂß¼­´¦ÀíµÚÒ»¸öÅú´ÎµÄÊý¾Ý£¬²»¶ÔÆä×öÈ¥ÖØ²Ù×÷£¬ÄÇ´Ëʱ±£Ö¤µÄÊÇat least onceµÄÓïÒ壬×îÖÕ½á¹ûÖпÉÄÜ´æÔÚÖØ¸´Êý¾Ý£»×îºóÈç¹ûÓû§ÏëҪʵÏÖexactly once£¬muise spark coreÌṩÁ˸ù¾Ýtopic¡¢partitionÓëoffsetÉú³ÉUIDµÄ¹¦ÄÜ£¬Ö»ÒªÈ·±£Á½¸öÅú´ÎÏû·ÑµÄOffsetÏàͬ£¬Ôò×îÖÕÉú³ÉµÄUIDÒ²Ïàͬ£¬Óû§¿ÉÒÔ¸ù¾Ý´ËUID×÷ΪÅжÏÉϸöÅú´ÎÊý¾ÝÊÇ·ñÓд洢³É¹¦µÄÒÀ¾Ý¡£ÏÂÃæ¼òµ¥µÄ¸ø³öÁËÖØÆôºóµÚÒ»¸öÅú´Î²Ù×÷µÄÐÐΪ¡£

Metricsϵͳ

Musie spark core»ùÓÚSpark±¾ÉíµÄmetricsϵͳ½øÐÐÁ˸ÄÔ죬Ìí¼ÓÁËÐí¶à¶¨ÖƵÄmetrics£¬²¢ÇÒÏòÓû§±©Â¶ÁËmetrics×¢²á½Ó¿Ú£¬Óû§¿ÉÒԷdz£·½±ãµÄ×¢²á×Ô¼ºµÄmetrics²¢ÔÚ³ÌÐòÖиüÐÂmetricsµÄÊýÖµ¡£×îºóËùÓеÄmetrics»á¸ù¾Ý×÷ÒµÉ趨µÄÅú´Î¼ä¸ôдÈëGraphite£¬»ùÓÚ¹«Ë¾¶¨ÖƵÄÔ¤¾¯ÏµÍ³½øÐб¨¾¯£¬Ç°¶Ë¿ÉÒÔͨ¹ýGrafanaÕ¹ÏÖ¸÷ÏîmetricsÖ¸±ê¡£

Muise spark core±¾Éí¶¨ÖÆµÄmetrics°üº¬ÒÔÏÂÈýÖÖ£º

  • Fail Åú´Îʱ¼äÄÚspark taskʧ°Ü´ÎÊý³¬¹ý4´Î±ã±¨¾¯£¬ÓÃÓÚ¼à¿Ø³ÌÐòµÄÔËÐÐ״̬¡£
  • Ack Åú´Îʱ¼äÄÚspark streaming´¦ÀíµÄÊý¾ÝÁ¿Ð¡0±ã±¨¾¯£¬ÓÃÓÚ¼à¿Ø³ÌÐòÊÇ·ñÔÚÕý³£Ïû·ÑÊý¾Ý¡£
  • Lag Åú´Îʱ¼äÄÚÊý¾ÝÏû·ÑÑÓ³Ù´óÓÚÉ趨ֵ±ã±¨¾¯¡£

ÆäÖÐÓÉÓÚÎÒÃǴ󲿷Ö×÷Òµ¿ªÆôÁËBack Pressure¹¦ÄÜ£¬Õâ¾Íµ¼ÖÂÔÚSpark UIÖп´µ½Ã¿¸öÅú´ÎÊý¾Ý¶¼ÄÜÔÚÕý³£Ê±¼äÄÚÏû·ÑÍê³É£¬È»¶ø¿ÉÄÜ´ËʱkafkaÖÐÒѾ­»ýѹÁË´óÁ¿Êý¾Ý£¬¹Êÿ¸öÅú´ÎÎÒÃǶ¼»á¼ÆË㵱ǰÏû·Ñʱ¼äÓëÊý¾Ý±¾Éíʱ¼äµÄÒ»¸öƽ¾ù²îÖµ£¬Èç¹ûÕâ¸ö²îÖµ´óÓÚÅú´Îʱ¼ä£¬ËµÃ÷±¾ÉíÊý¾ÝÏû·Ñ¾ÍÒѾ­´æÔÚÁËÑÓ³Ù¡£

ÏÂͼչÏÖÁËÔ¤¾¯ÏµÍ³ÖУ¬»ùÓÚÓû§×Ô¶¨Òå×¢²áµÄMetricsÒÔ¼°ÏµÍ³¶¨ÖƵÄMetrics½øÐÐÔ¤¾¯¡£

ÈÝ´í

ÆäʵÔÚÉÏÃæExactly OnceÒ»ÕÂÖÐÒѾ­ÏêϸµÄÃèÊöÁËmuise spark coreÈçºÎÔÚ³ÌÐòå´»úºóÄܹ»±£Ö¤Êý¾ÝÕýÈ·µÄ´¦Àí¡£µ«ÊÇΪÁËÄܹ»ÈÃSpark SreamingÄܹ»³¤Ê±¼äÎȶ¨µÄÔËÐÐÔÚYarn¼¯ÈºÉÏ£¬»¹ÐèÒªÌí¼ÓÐí¶àÅäÖ㬸ÐÐËȤµÄÅóÓÑ¿ÉÒԲ鿴£ºLong running Spark Streaming Jobs on Yarn Cluster£¨http://mkuthan.github.io/blog/2016/09/30/spark-streaming-on-yarn/£©¡£

³ýÁËÉÏÊöÈÝ´í±£Ö¤Ö®Í⣬Muise Portal£¨ºóÃæ»á½²£©Ò²ÌṩÁ˶ÔSpark Streaming×÷Òµ¶¨Ê±¼ì²âµÄ¹¦ÄÜ¡£Ä¿Ç°Ã¿¹ý5·ÖÖÓ¶Ôµ±Ç°ËùÓÐÊý¾Ý¿âÖÐ״̬±ê¼ÇΪRunningµÄSpark Streaming×÷Òµ½øÐÐ״̬¼ì²â£¬Í¨¹ýYarnÌṩµÄREST APIs¿ÉÒÔ¸ù¾Ýÿ¸ö×÷ÒµµÄApplication Id²éѯ×÷ÒµÔÚYarnÉϵÄ״̬£¬Èç¹û״̬´¦ÓÚ·ÇÔËÐÐ״̬£¬Ôò»á³¢ÊÔÖØÆô×÷Òµ¡£

Muise Portal

ÔÚ·â×°ÍêËùÓеÄSpark StreamingÖ®ºó£¬ÎÒÃǾÍÐèÒªÓÐÒ»¸öƽ̨Äܹ»¹ÜÀíÅäÖÃ×÷Òµ£¬Muise Portal¾ÍÊÇÕâÑùµÄ´æÔÚ¡£Muise PortalĿǰÖ÷ÒªÖ§³ÖÁËStormÓëSpark StreamingÁ½Àà×÷Òµ£¬Ö§³Öн¨×÷Òµ¡¢Jar°ü·¢²¼¡¢×÷ÒµÔËÐÐÓëÍ£Ö¹µÈһϵÁй¦ÄÜ¡£ÏÂͼչÏÖÁËн¨×÷ÒµµÄ½çÃæ£º

Spark Streaming×÷Òµ»ùÓÚYarn ClusterģʽÔËÐУ¬ËùÓÐ×÷ҵͨ¹ýÔÚMuise PortalÉϵÄSpark¿Í»§¶ËÌá½»µ½Yarn¼¯ÈºÉÏÔËÐС£¾ßÌåµÄÒ»¸ö×÷ÒµÔËÐÐÁ÷³ÌÈçÏÂͼËùʾ£º

ÕûÌå¼Ü¹¹

×îºóÕâ±ß¸ø³öÒ»ÏÂĿǰЯ³Ìʵʱƽ̨µÄÕûÌå¼Ü¹¹¡£

3 Spark StreamingÔÚЯ³ÌµÄʵ¼ù

ĿǰSpark StreamingÔÚЯ³ÌµÄÒµÎñ³¡¾°Ö÷Òª¿ÉÒÔ·ÖΪÒÔϼ¸¿é£ºETL¡¢ÊµÊ±±¨±íͳ¼Æ¡¢¸öÐÔ»¯ÍƼöÀàµÄÓªÏú³¡¾°ÒÔ¼°·ç¿ØÓ밲ȫµÄÓ¦ÓᣴӳéÏóÉÏÀ´Ëµ£¬Ö÷Òª¿ÉÒÔ·ÖΪÊý¾Ý¹ýÂ˳éÈ¡¡¢Êý¾ÝÖ¸±êͳ¼ÆÓëÄ£ÐÍËã·¨µÄʹÓá£

ETL

Èç½ñÊÐÃæÉÏÓÐÐÎÐÎɫɫµÄ¹¤¾ß¿ÉÒÔ´ÓKafkaʵʱÏû·ÑÊý¾Ý²¢½øÐйýÂËÇåÏ´×îÖÕÂ䵨µ½¶ÔÓ¦µÄ´æ´¢ÏµÍ³£¬È磺Camus¡¢FlumeµÈ¡£Ïà±È½ÏÓÚ´ËÀà²úÆ·£¬Spark StreamingµÄÓÅÊÆÊ×ÏÈÔÚÓÚ¿ÉÒÔÖ§³Ö¸üΪ¸´ÔӵĴ¦ÀíÂß¼­£¬Æä´Î»ùÓÚYarnϵͳµÄ×ÊÔ´µ÷¶ÈʹµÃSpark StreamingµÄ×ÊÔ´ÅäÖøü¼ÓÁé»î£¬×îºóÓû§¿ÉÒÔ½«Spark RDDÊý¾Ýת»»³ÉSpark DataframeÊý¾Ý£¬Ê¹µÃ¿ÉÒÔÓëSpark SQLÏà½áºÏ£¬²¢ÇÒ×îÖÕ½«Êý¾ÝÊä³öµ½HDFSºÍAlluxioµÈ·Ö²¼Ê½Îļþϵͳʱ¿ÉÒԴ洢ΪParquetÖ®ÀàµÄ¸ñʽ»¯Êý¾Ý£¬Óû§ÔÚºóÐøÊ¹ÓÃSpark SQL´¦ÀíÊý¾Ýʱ¸üΪµÄ¼ò±ã¡£

ĿǰÔÚETLʹÓó¡¾°ÖнÏΪµäÐ͵ÄÊÇЯ³Ì¶È¼Ù²¿ÃŵÄData LakeÓ¦Ó㬶ȼٲ¿ÃÅʹÓÃSpark Streaming¶ÔÊý¾Ý×öETL²Ù×÷×îÖÕ½«Êý¾Ý´æ´¢ÖÁAlluxio£¬ÆÚ¼ä»ùÓÚmuise-spark-coreµÄ×Ô¶¨Òåmetric¹¦ÄܶÔÊý¾ÝµÄÊý¾ÝÁ¿¡¢×Ö¶ÎÊý¡¢Êý¾Ý¸ñʽÓëÖØ¸´Êý¾Ý½øÐÐÁËÊý¾ÝÖÊÁ¿Ð£ÑéÓë¼à¿Ø£¬¾ßÌåµÄ¼à¿ØÔ¤¾¯ÒÑÔÚÉÏÃæËµ¹ý¡£

ʵʱ±¨±íͳ¼Æ

ʵʱ±¨±íͳ¼ÆÓëÕ¹ÏÖÒ²ÊÇSpark StreamingʹÓý϶àµÄÒ»¸ö³¡¾°£¬Êý¾Ý¿ÉÒÔ»ùÓÚProcess Timeͳ¼Æ£¬Ò²¿ÉÒÔ»ùÓÚEvent Timeͳ¼Æ¡£ÓÉÓÚ±¾ÉíSpark Streaming²»Í¬Åú´ÎµÄjob¿ÉÒÔÊÓΪһ¸ö¸öµÄ¹ö¶¯´°¿Ú£¬Ä³¸ö¶ÀÁ¢µÄ´°¿ÚÖаüº¬Á˶à¸öʱ¼ä¶ÎµÄÊý¾Ý£¬ÕâʹµÃʹÓÃSpark Streaming»ùÓÚEvent Timeͳ¼ÆÊ±´æÔÚÒ»¶¨µÄÏÞÖÆ¡£Ò»°ã½ÏΪ³£Óõķ½Ê½ÊÇͳ¼ÆÃ¿¸öÅú´ÎÖв»Í¬Ê±¼äά¶ÈµÄÀÛ»ýÖµ²¢µ¼Èëµ½Íⲿϵͳ£¬ÈçES£»È»ºóÔÚ±¨±íÕ¹ÏÖµÄʱ»ùÓÚʱ¼ä×ö¶þ´Î¾ÛºÏ»ñµÃÍêÕûµÄÀÛ¼ÓÖµ×îÖÕÇóµÃ¾ÛºÏÖµ¡£ÏÂͼչʾÁËЯ³ÌIBU»ùÓÚSpark StreamingʵÏÖµÄʵʱ¿´°å¡£

¸öÐÔ»¯ÍƼöÓë·ç¿Ø°²È«

ÕâÁ½ÀàÓ¦ÓõĹ²Í¬µãιýÓÚËüÃǶ¼ÐèÒª»ùÓÚË㷨ģÐͶÔÓû§µÄÐÐΪ×÷³öÏà¶ÔÓ¦µÄÔ¤²â»ò·ÖÀ࣬Я³ÌĿǰËùÓÐÄ£ÐͶ¼ÊÇ»ùÓÚÀëÏßÊý¾ÝÿÌ춨ʱÀëÏßѵÁ·¡£ÔÚÒýÈëSpark StreamingÖ®ºó£¬Ðí¶à²¿ÃÅ¿ªÊ¼»ý¼«µÄ³¢ÊÔÌØÕ÷µÄʵʱÌáÈ¡¡¢Ä£Ð͵ÄÔÚÏßѵÁ·¡£²¢ÇÒSpark Streaming¿ÉÒԺܺõÄÓëSpark MLlibÏà½áºÏ£¬ÆäÖÐ×îΪ³É¹¦µÄ°¸ÀýΪÐŰ²²¿ÃÅÒÔǰÊÇ»ùÓÚ¸÷Àà¹ýÂËÌõ¼þץȡ¹¥»÷ÇëÇ󣬺óÀ´ËûÃDzÉÓÃÀëÏßÄ£ÐÍѵÁ·£¬Spark Streaming¼ÓSpark MLlib¶ÔÓû§½øÐÐʵʱԤ²â£¬ÐÔÄÜÉϽÏJStorm£¨»ùÓÚ´óÁ¿ÕýÔò±í´ïʽƥÅäÓû§£¬Ê®·ÖÏûºÄCPU£©Ìá¸ßÁËÊ®±¶£¬Â©±¨ÂʽµµÍÁË20%¡£

4 Ôø¾­²È¹ýµÄ¿Ó

ĿǰЯ³ÌµÄSpark Streaming×÷ÒµÔËÐеÄYARN¼¯ÈºÓëÀëÏß×÷ҵͬÊôÒ»¸ö¼¯Èº£¬Õâ¶Ô×÷ÒµÎÞÂÛÊÇÐÔÄÜ»¹ÊÇÎȶ¨ÐÔ¶¼´øÀ´ÁËÖî¶àÓ°Ïì¡£ÓÈÆäÊǵ±YARN»òÕßHadoop¼¯ÈºÐèÒª¸üÐÂά»¤ÖØÆô·þÎñʱ£¬Ôںܴó³Ì¶ÈÉϻᵼÖÂSpark Streaming×÷Òµ³öÏÖ±¨´í¡¢¹ÒµôµÈ×´¿ö£¬ËäÈ»ÓÐÖî¶àµÄÈÝ´í±£ÕÏ£¬µ«Ò²»áµ¼ÖÂÊý¾Ý»ýѹÊý¾Ý´¦ÀíÑÓ³Ù¡£ºóÆÚ½«»á¶ÀÁ¢²¿ÊðHadoopÓëYarn¼¯Èº£¬ËùÓеÄʵʱ×÷Òµ¶¼ÔËÐÐÔÚ¶ÀÁ¢µÄ¼¯ÈºÉÏ£¬²»ÊÜÍⲿµÄÓ°Ï죬ÕâÒ²·½±ãºóÆÚ¶ÔÓÚFlink×÷ÒµµÄ¿ª·¢Óëά»¤¡£ºóÆÚͨ¹ýAlluxioʵÏÖÖ÷¼¯ÈºÓë×Ó¼¯Èº¼äµÄÊý¾Ý¹²Ïí¡£

ÔÚʹÓùý³ÌÖУ¬Ò²Óöµ½ÁËÐÎÐÎɫɫ²»Í¬µÄBug£¬Õâ±ß¼òµ¥µÄ½éÉܼ¸¸ö½ÏΪÑÏÖØµÄÎÊÌâ¡£Ê×ÏȵÚÒ»¸öÎÊÌâÊÇ£¬Spark Streamingÿ¸öÅú´ÎJob¶¼»áͨ¹ýDirectKafkaInputStreamµÄcomput·½·¨»ñÈ¡Ïû·ÑµÄKafka Topicµ±Ç°×îеÄoffset£¬Èç¹û´Ëʱkafka¼¯ÈºÓÉÓÚijЩԭÒò²»Îȶ¨£¬¾Í»áµ¼Ö java.lang.RuntimeException: No leader found for partition xxµÄÎÊÌ⣬ÓÉÓڴ˶δúÂëÔËÐÐÔÚDriver¶Ë£¬Èç¹ûûÓÐ×öÈκÎÅäÖúʹ¦ÀíµÄÇé¿öÏ£¬»áµ¼Ö³ÌÐòÖ±½Ó¹Òµô¡£¶ÔÓ¦µÄ½â¾ö·½·¨ÊÇÅäÖÃspark.streaming.kafka.maxRetries´óÓÚ1£¬²¢ÇÒ¿ÉÒÔͨ¹ýÅäÖÃrefresh.leader.backoff.ms²ÎÊýÉèÖÃÿ´ÎÖØÊԵļä¸ôʱ¼ä¡£

Æä´ÎÔÚʹÓÃSpark StreamingÓëSpark SqlÏà½áºÏµÄ¹ý³ÌÖУ¬Ò²»áÓÐÖî¶àÎÊÌâ¡£±ÈÈçÔÚʹÓùý³ÌÖпÉÄܳöÏÖout of memory£ºPermGen space£¬ÕâÊÇÓÉÓÚSpark sqlʹÓÃcode generatorµ¼Ö´óÁ¿Ê¹ÓÃPermGen space£¬Í¨¹ýÔÚspark.driver.extraJavaOptionsÖÐÌí¼Ó-XX:MaxPermSize=1024m -XX:PermSize=512m½â¾ö¡£»¹ÓÐSpark SqlÐèÒª´´½¨Spark Warehouse£¬Èç¹û»ùÓÚYarnÀ´ÔËÐУ¬Ä¬ÈÏ¿ÉÄÜÊÇÔÚHDFSÉÏ´´½¨Ïà¶ÔÓ¦µÄĿ¼£¬Èç¹ûûÓÐȨÏ޻ᱨ³öPermission deniedµÄÎÊÌ⣬Óû§¿ÉÒÔͨ¹ýÅäÖÃconfig("spark.sql.warehouse.dir", "file:${system:user.dir} /spark-warehouse")À´½â¾ö¡£

5 δÀ´Õ¹Íû

ÉÏÃæÖ÷ÒªÕë¶ÔSpark StreamingÔÚЯ³Ìʵʱƽ̨ÖеÄÔËÓÃ×öÁËÏêϸµÄ½éÉÜ£¬ÔÚʹÓÃSpark Streaming¹ý³ÌÖл¹ÊÇ´æÔÚһЩʹµã£¬±ÈÈç´°¿Ú¹¦ÄܱȽϵ¥Ò»¡¢»ùÓÚEvent Timeͳ¼ÆÖ¸±ê¹ýÓÚ·±ËöÒÔ¼°¹Ù·½ÔÚеİ汾Öлù±¾Ã»ÓÐеÄÌØÐÔ¼ÓÈëµÈ£¬ÕâʹµÃÎÒÃǸü¼ÓÇãÏòÓÚ³¢ÊÔFlink¡£Flink»ù±¾ÊµÏÖÁËGoogleÌá³öµÄ¸÷Ààʵʱ´¦ÀíµÄÀíÄÒýÈëÁËWaterMarkµÄʵÏÖ£¬¸ÐÐËȤµÄÅóÓÑ¿ÉÒԲ鿴Google¹Ù·½Îĵµ£ºThe world beyond batch: Streaming 102£¨https:// www.oreilly.com/ideas/ the-world-beyond-batch-streaming-102£©¡£

ĿǰFlink 1.4 release°æ±¾·¢²¼ÔÚ¼´£¬Spark 2.2.0»ùÓÚkafkaÊý¾ÝÔ´µÄStructured StreamingÒ²Ö§³ÖÁ˸ü¶àµÄÌØÐÔ¡£Ç°ÆÚÎÒÃÇÒѶÔFlink×öÁ˳ä·ÖµÄµ÷ÑУ¬Ï°ëÄêÖ÷Òª¹¤×÷½«·ÅÔÚFlinkµÄ¶Ô½ÓÉÏ¡£ÔÚÌṩÁËÖî¶àʵʱ¼ÆËã¿ò¼ÜµÄÖ§³Öºó£¬ËæÖ®¶øÀ´µÄÊÇ´øÀ´Á˸ü¶àµÄѧϰ³É±¾£¬½ñºóÎÒÃǵÄÖØÐĽ«·ÅÔÚÈçºÎʹÓû§¸ü¼ÓÈÝÒ×µÄʵÏÖʵʱ¼ÆËãÂß¼­¡£ÆäÖÐApache Beam¶Ô¸÷ÖÖʵʱ³¡¾°ÌṩÁËÁ¼ºÃµÄ·â×°²¢¶Ô¶àÖÖʵʱ¼ÆËãÒýÇæ×öÁËÖ§³Ö£¬Æä´Î»ùÓÚStream SqlʵÏÖ¸´ÔÓµÄʵʱӦÓó¡¾°¶¼½«ÊÇÎÒÃÇÖ÷Òªµ÷Ñеķ½Ïò¡£

   
3514 ´Îä¯ÀÀ       28
Ïà¹ØÎÄÕÂ

»ùÓÚEAµÄÊý¾Ý¿â½¨Ä£
Êý¾ÝÁ÷½¨Ä££¨EAÖ¸ÄÏ£©
¡°Êý¾Ýºþ¡±£º¸ÅÄî¡¢ÌØÕ÷¡¢¼Ü¹¹Óë°¸Àý
ÔÚÏßÉ̳ÇÊý¾Ý¿âϵͳÉè¼Æ ˼·+Ч¹û
 
Ïà¹ØÎĵµ

GreenplumÊý¾Ý¿â»ù´¡Åàѵ
MySQL5.1ÐÔÄÜÓÅ»¯·½°¸
ijµçÉÌÊý¾ÝÖÐ̨¼Ü¹¹Êµ¼ù
MySQL¸ßÀ©Õ¹¼Ü¹¹Éè¼Æ
Ïà¹Ø¿Î³Ì

Êý¾ÝÖÎÀí¡¢Êý¾Ý¼Ü¹¹¼°Êý¾Ý±ê×¼
MongoDBʵս¿Î³Ì
²¢·¢¡¢´óÈÝÁ¿¡¢¸ßÐÔÄÜÊý¾Ý¿âÉè¼ÆÓëÓÅ»¯
PostgreSQLÊý¾Ý¿âʵսÅàѵ