Äú¿ÉÒÔ¾èÖú£¬Ö§³ÖÎÒÃǵĹ«ÒæÊÂÒµ¡£

1Ôª 10Ôª 50Ôª





ÈÏÖ¤Â룺  ÑéÖ¤Âë,¿´²»Çå³þ?Çëµã»÷Ë¢ÐÂÑéÖ¤Âë ±ØÌî



  ÇóÖª ÎÄÕ ÎÄ¿â Lib ÊÓÆµ iPerson ¿Î³Ì ÈÏÖ¤ ×Éѯ ¹¤¾ß ½²×ù Model Center   Code  
»áÔ±   
   
 
     
   
 ¶©ÔÄ
  ¾èÖú
ÉîÈëÀí½â Flink ÈÝ´í»úÖÆ
 
×÷Õߣº Paul Lin
  1392  次浏览      27
 2021-11-5
 
±à¼­ÍƼö:
±¾ÎÄÏêϸ½â¶Á Flink µÄ´íÎó»Ö¸´»úÖÆ,·Ö±ð·ÖÎöÁ½¸ö´íÎó»Ö¸´²ßÂԵij¡¾°¼°ÊµÏÖ £¬Ï£Íû¶ÔÄúµÄѧϰÓÐËù°ïÖú¡£
±¾ÎÄÀ´×ÔÓÚ×ÔÍøÒ×ÓÎÏ·µÄÁÖС²¬ £¬ÓÉAlice±à¼­¡¢ÍƼö¡£

×÷Ϊ·Ö²¼Ê½ÏµÍ³£¬ÓÈÆäÊǶÔÑÓ³ÙÃô¸ÐµÄʵʱ¼ÆËãÒýÇæ£¬Apache Flink ÐèÒªÓÐÇ¿´óµÄÈÝ´í»úÖÆ£¬ÒÔÈ·±£ÔÚ³öÏÖ»úÆ÷¹ÊÕÏ»òÍøÂç·ÖÇøµÈ²»¿ÉÔ¤ÖªµÄÎÊÌâʱ¿ÉÒÔ¿ìËÙ×Ô¶¯»Ö¸´²¢ÒÀ¾ÉÄܲúÉú׼ȷµÄ¼ÆËã½á¹û¡£ÊÂʵÉÏ£¬Flink ÓÐÒ»Ì×ÏȽøµÄ¿ìÕÕ»úÖÆÀ´³Ö¾Ã»¯×÷ҵ״̬[1]£¬È·±£ÖмäÊý¾Ý²»»á¶ªÊ§£¬Õâͨ³£ÐèÒªºÍ´íÎó»Ö¸´»úÖÆ£¨×÷ÒµÖØÆô²ßÂÔ»ò failover ²ßÂÔ£©ÅäºÏʹÓá£ÔÚÓöµ½´íÎóʱ£¬Flink ×÷Òµ»á¸ù¾ÝÖØÆô²ßÂÔ×Ô¶¯ÖØÆô²¢´Ó×î½üÒ»¸ö³É¹¦µÄ¿ìÕÕ£¨checkpoint£©»Ö¸´×´Ì¬¡£ºÏÊʵÄÖØÆô²ßÂÔ¿ÉÒÔ¼õÉÙ×÷Òµ²»¿ÉÓÃʱ¼äºÍ±ÜÃâÈ˹¤½éÈë´¦Àí¹ÊÕϵÄÔËά³É±¾£¬Òò´Ë¶ÔÓÚ Flink ×÷ÒµÎȶ¨ÐÔÀ´ËµÓÐמÙ×ãÇáÖØµÄ×÷Óá£ÏÂÎľͽ«Ïêϸ½â¶Á Flink µÄ´íÎó»Ö¸´»úÖÆ¡£

Flink ÈÝ´í»úÖÆÖ÷ÒªÓÐ×÷ÒµÖ´ÐеÄÈÝ´íÒÔ¼°ÊØ»¤½ø³ÌµÄÈÝ´íÁ½·½Ã棬ǰÕß°üÀ¨ Flink runtime µÄ ExecutionGraph ºÍ Execution µÄÈÝ´í£¬ºóÕßÔò°üÀ¨ JobManager ºÍ TaskManager µÄÈÝ´í¡£

×÷ÒµÖ´ÐÐÈÝ´í

ÖÚËùÖÜÖª£¬Óû§Ê¹Óà Flink ±à³Ì API£¨DataStream/DataSet/Table/SQL£©±àдµÄ×÷Òµ×îÖջᱻ·­ÒëΪ JobGraph ¶ÔÏóÔÙÌá½»¸ø JobManager È¥Ö´ÐУ¬¶øºóÕ߻Ὣ JobGraph ½áºÏÆäËûÅäÖÃÉú³É¾ßÌåµÄ Task µ÷¶Èµ½ TaskManager ÉÏÖ´ÐС£

ÏàÐŲ»ÉÙ¶ÁÕßÓ¦¸Ã¼û¹ýÀ´×Ô¹ÙÍøÎĵµµÄÕâÕżܹ¹Í¼£¨Í¼1£©£¬ËüÇåÎúµØÃè»æÁË×÷ÒµµÄ·Ö²¼Ê½Ö´ÐлúÖÆ: Ò»¸ö×÷ÒµÓжà¸ö Operator£¬Ï໥ûÓÐÊý¾Ý shuffle ¡¢²¢ÐжÈÏàͬÇÒ·ûºÏÆäËûÓÅ»¯Ìõ¼þµÄÏàÁÚ Operator ¿ÉÒԺϲ¢³É OperatorChain£¬È»ºóÿ¸ö Operator »òÕß OperatorChain ³ÆÎªÒ»¸ö JobVertex£»ÔÚ·Ö²¼Ê½Ö´ÐÐʱ£¬Ã¿¸ö JobVertex »á×÷Ϊһ¸ö Task£¬Ã¿¸ö Task ÓÐÆä²¢ÐжÈÊýÄ¿µÄ SubTask£¬¶øÕâЩ SubTask ÔòÊÇ×÷Òµµ÷¶ÈµÄ×îСÂß¼­µ¥Ôª¡£

ͼ1. ×÷ÒµµÄ·Ö²¼Ê½Ö´ÐÐ

¸ÃͼÖ÷Òª´Ó TaskManager ½Ç¶È³ö·¢£¬¶øÆäʵÔÚ JobManager ¶ËÒ²´æÔÚÒ»¸öºËÐĵÄÊý¾Ý½á¹¹À´Ó³Éä×÷ÒµµÄ·Ö²¼Ê½Ö´ÐУ¬¼´ ExecutionGraph¡£ExecutionGraph ÀàËÆÓÚͼÖв¢ÐÐÊÓ½ÇµÄ Streaming Dataflow£¬Ëü´ú±íÁË Job µÄÒ»´ÎÖ´ÐС£´ÓijÖÖÒâÒåÉϽ²£¬Èç¹û JobGraph ÊÇÒ»¸öÀàµÄ»°£¬ExecutionGraph ÔòÊÇËüµÄÒ»¸öʵÀý¡£ExecutionGraph Öаüº¬µÄ½Úµã³ÆÎª ExecutionJobVertex£¬¶ÔÓ¦ JobGrap µÄÒ»¸ö JobVertex »òÕß˵ͼÖеÄÒ»¸ö Task¡£ExecutionJobVertex ¿ÉÒÔÓжà¸ö²¢ÐÐʵÀý£¬¼´ ExecutionVertex£¬¶ÔӦͼÖеÄÒ»¸ö SubTask¡£ÔÚÒ»¸ö ExecutionGraph µÄÉúÃüÖÜÆÚÖУ¬Ò»¸ö ExecutionVertex ¿ÉÒÔ±»Ö´ÐУ¨ÖØÆô£©¶à´Î£¬Ã¿´ÎÔò³ÆÎªÒ»¸ö Execution¡£Ð¡½áһϣ¬ExecutionGraph ¶ÔÓ¦ Flink Job µÄÒ»´ÎÖ´ÐУ¬Execution ¶ÔÓ¦ SubTask µÄÒ»´ÎÖ´ÐС£

Ïà¶ÔµØ£¬Flink µÄ´íÎó»Ö¸´»úÖÆ·ÖΪ¶à¸ö¼¶±ð£¬¼´ Execution ¼¶±ðµÄ Failover ²ßÂÔºÍ ExecutionGraph ¼¶±ðµÄ Job Restart ²ßÂÔ¡£µ±³öÏÖ´íÎóʱ£¬Flink »áÏȳ¢ÊÔ´¥·¢·¶Î§Ð¡µÄ´íÎó»Ö¸´»úÖÆ£¬Èç¹ûÈÔ´¦Àí²»Á˲ŻáÉý¼¶Îª¸ü´ó·¶Î§µÄ´íÎó»Ö¸´»úÖÆ£¬¾ßÌå¿ÉÒÔÓÃÏÂÃæµÄÐòÁÐͼÀ´±í´ï£¨ÆäÖÐÊ¡ÂÔÁËExection ºÍ ExecutionGraph µÄ·Ç¹Ø¼ü״̬ת»»£©¡£

ͼ2. ×÷ÒµÖ´ÐÐÈÝ´í

µ± Task ·¢Éú´íÎó£¬TaskManager »áͨ¹ý RPC ֪ͨ JobManager£¬ºóÕß½«¶ÔÓ¦ Execution µÄ״̬תΪ failed ²¢´¥·¢ Failover ²ßÂÔ¡£Èç¹û·ûºÏ Failover ²ßÂÔ£¬JobManager »áÖØÆô Execution£¬·ñÔòÉý¼¶Îª ExecutionGraph µÄʧ°Ü¡£ExecutionGraph ʧ°ÜÔò½øÈë failing µÄ״̬£¬ÓÉ Restart ²ßÂÔ¾ö¶¨ÆäÖØÆô£¨ restarting ״̬£©»¹ÊÇÒì³£Í˳ö£¨ failed ״̬£©¡£

ÏÂÃæ·Ö±ð·ÖÎöÁ½¸ö´íÎó»Ö¸´²ßÂԵij¡¾°¼°ÊµÏÖ¡£

Task Failover ²ßÂÔ

×÷Ϊ¼ÆËãµÄ×îСִÐе¥Î»£¬Task ´íÎóÊÇÊ®·Ö³£¼ûµÄ£¬±ÈÈç»úÆ÷¹ÊÕÏ¡¢Óû§´úÂëÅ׳ö´íÎó»òÕßÍøÂç¹ÊÕϵȵȶ¼¿ÉÄÜÔì³É Task ´íÎó¡£¶ÔÓÚ·Ö²¼Ê½ÏµÍ³À´Ëµ£¬Í¨³£µ¥¸ö Task ´íÎóµÄ´¦Àí·½Ê½Êǽ«Õâ¸ö Task ÖØÐµ÷¶ÈÖÁÐ嵀 worker ÉÏ£¬²»Ó°ÏìÆäËû Task ºÍÕûÌå Job µÄÔËÐУ¬È»¶øÕâ¸ö·½Ê½¶ÔÓÚÁ÷´¦ÀíµÄ Flink À´Ëµ²¢²»¿ÉÓá£

Flink µÄÈÝ´í»úÖÆÖ÷Òª·ÖΪ´Ó checkpoint »Ö¸´×´Ì¬ºÍÖØÁ÷Êý¾ÝÁ½²½£¬ÕâÒ²ÊÇΪʲô Flink ͨ³£ÒªÇóÊý¾ÝÔ´µÄÊý¾ÝÊÇ¿ÉÒÔÖØ¸´¶ÁÈ¡µÄ¡£¶ÔÓÚÖØÆôºóµÄРTask À´Ëµ£¬Ëü¿ÉÒÔͨ¹ý¶ÁÈ¡ checkpoint ºÜÈÝÒ׵ػָ´×´Ì¬ÐÅÏ¢£¬µ«ÊÇÈ´²»ÄܶÀÁ¢µØÖØÁ÷Êý¾Ý£¬ÒòΪ checkpoint ÊDz»°üº¬Êý¾ÝµÄ£¬ÒªÖØÁ÷Êý¾ÝÖ»¿ÉÒÔÒªÇóÒÀÀµµ½µÄÈ«²¿ÉÏÓÎ Task ÖØÐ¼ÆË㣬ͨ³£À´Ëµ»áÒ»Ö±×·Ëݵ½Êý¾ÝÔ´ Task¡£ÊìϤ Spark µÄͬѧ´ó¸Å»áÁªÏëµ½ Spark µÄѪԵ»úÖÆ¡£¼òµ¥À´Ëµ£¬Spark ÒÀ¾ÝÊÇ·ñÐèÒª shuffle ½«×÷Òµ·Ö»®Îª¶à¸ö Stage£¬Ã¿¸ö Stage µÄ¼ÆËã¶¼ÊǶÀÁ¢µÄ Task£¬Æä½á¹û¿ÉÒÔ±»»º´æÆðÀ´¡£Èç¹ûij¸ö Task Ö´ÐÐʧ°Ü£¬ÄÇôËüÖ»ÒªÖØ¶ÁÉϸö Stage µÄ¼ÆË㻺´æ½á¹û¼´¿É£¬²»Ó°ÏìÆäËû Task µÄ¼ÆËã¡£Spark ¿ÉÒÔ¶ÀÁ¢µØ»Ö¸´Ò»¸ö Task£¬ºÜ´ó³Ì¶ÈÉÏÊÇÒòΪËüµÄÅú´¦ÀíÌØÐÔ£¬ÕâÔÊÐíÁË×÷ҵͨ¹ý»º´æÖÐ¼ä¼ÆËã½á¹ûÀ´½âñîÉÏÏÂÓÎ Task µÄÁªÏµ¡£¶ø Flink ×÷ΪÁ÷¼ÆËãÒýÇæ£¬ÏÔÈ»ÊÇÎÞ·¨¼òµ¥×öµ½ÕâµãµÄ¡£

Òª×öµ½Ï¸Á£¶ÈµÄ´íÎó»Ö¸´»úÖÆ£¬¼õСµ¥¸ö Task ´íÎó¶ÔÓÚÕûÌå×÷ÒµµÄÓ°Ï죬Flink ÐèҪʵÏÖÒ»Ì׸ü¼Ó¸´ÔÓµÄËã·¨£¬Ò²¾ÍÊÇ FLIP-1 [2] ÒýÈëµÄ Task Failover ²ßÂÔ¡£Task Failover ²ßÂÔĿǰÓÐÈý¸ö£¬·Ö±ðÊÇ RestartAll ¡¢ RestartIndividualStrategy ºÍ RestartPipelinedRegionStrategy ¡£

ͼ3. Restart Region ²ßÂÔÖØÆôÓÐÊý¾Ý½»»»µÄ Task

  • RestartAll: ÖØÆôÈ«²¿ Task£¬Êǻָ´×÷ÒµÒ»ÖÂÐÔµÄ×ȫ²ßÂÔ£¬»áÔÚÆäËû Failover ²ßÂÔʧ°Üʱ×÷Ϊ±£µ×²ßÂÔʹÓá£
  • ĿǰÊÇĬÈ쵀 Task Failover ²ßÂÔ¡£
  • RestartPipelinedRegionStrategy: ÖØÆô´íÎó Task ËùÔÚ Region µÄÈ«²¿ Task¡£
  • Task Region ÊÇÓÉ Task µÄÊý¾Ý´«Êä¾ö¶¨µÄ£¬ÓÐÊý¾Ý´«ÊäµÄ Task »á±»·ÅÔÚͬһ¸ö Region£¬¶ø²»Í¬ Region Ö®¼äûÓÐÊý¾Ý½»»»¡£
  • RestartIndividualStrategy: »Ö¸´µ¥¸ö Task¡£
  • ÒòΪÈç¹û¸Ã Task ûÓаüº¬Êý¾ÝÔ´£¬Õâ»áµ¼ÖÂËü²»ÄÜÖØÁ÷Êý¾Ý¶øµ¼ÖÂÒ»²¿·ÖÊý¾Ý¶ªÊ§¡£
  • ¿¼Âǵ½ÖÁÉÙÌṩ׼ȷһ´ÎµÄͶµÝÓïÒ壬Õâ¸ö²ßÂÔµÄʹÓ÷¶Î§±È½ÏÓÐÏÞ£¬Ö»Ó¦ÓÃÓÚ Task ¼äûÓÐÊý¾Ý´«ÊäµÄ×÷Òµ¡£
  • ²»¹ýÒ²Óв¿·ÖÒµÎñ³¡¾°¿ÉÄÜÐèÒªÕâÖÖ at-most-once µÄͶµÝÓïÒ壬±ÈÈç¶ÔÑÓ³ÙÃô¸Ð¶ø¶ÔÊý¾ÝÒ»ÖÂÐÔÒªÇóÏà¶ÔµÍµÄÍÆ¼öϵͳ¡£
  • ×ÜÌåÀ´Ëµ£¬ RestartAll ½ÏΪ±£ÊØ»áÔì³É×ÊÔ´ÀË·Ñ£¬¶ø RestartIndividualStrategy ÔòÌ«¹ý¼¤½ø²»Äܱ£Ö¤Êý¾ÝÒ»ÖÂÐÔ£¬¶ø RestartPipelinedRegionStrategy ÖØÆôµÄÊÇËùÓÐ Task Àï×îС±ØÒª×Ó¼¯£¬ÆäʵÊÇ×îºÃµÄ Failover ²ßÂÔ¡£¶øÊµ¼ÊÉÏ Apache ÉçÇøÒ²Õý×¼±¸ÔÚ 1.9 °æ±¾½«ÆäÉèΪĬÈ쵀 Failover ²ßÂÔ[3]¡£²»¹ýÖµµÃ×¢ÒâµÄÊÇ£¬ÔÚ 1.9 °æ±¾ÒÔǰ RestartPipelinedRegionStrategy ÓиöÑÏÖØµÄÎÊÌâÊÇÔÚÖØÆô Task ʱ²¢²»»á»Ö¸´Æä״̬[4]£¬ËùÒÔÇëÔÚ 1.9 °æ±¾ÒÔºó²ÅʹÓÃËü£¬³ý·ÇÄãÔÚÅÜÒ»¸öÎÞ״̬µÄ×÷Òµ¡£

    Job Restart ²ßÂÔ

    Èç¹û Task ´íÎó×îÖÕ´¥·¢ÁË Full Restart£¬´Ëʱ Job Restart ²ßÂÔ½«»á¿ØÖÆÊÇ·ñÐèÒª»Ö¸´×÷Òµ¡£Flink ÌṩÈýÖÖ Job ¾ßÌåµÄ Restart Strategy¡£

  • FixedDelayRestartStrategy: ÔÊÐíÖ¸¶¨´ÎÊýÄÚµÄ Execution ʧ°Ü£¬Èç¹û³¬¹ý¸Ã´ÎÊýÔòµ¼Ö Job ʧ°Ü¡£
  • FixedDelayRestartStrategy ÖØÆô¿ÉÒÔÉèÖÃÒ»¶¨µÄÑÓ³Ù£¬ÒÔ¼õÉÙÆµ·±ÖØÊÔ¶ÔÍⲿϵͳ´øÀ´µÄ¸ºÔغͲ»±ØÒªµÄ´íÎóÈÕÖ¾¡£
  • Ŀǰ FixedDelayRestartStrategy ÊÇĬÈ쵀 Restart Strategy¡£
  • FailureRateRestartStrategy: ÔÊÐíÔÚÖ¸¶¨Ê±¼ä´°¿ÚÄÚµÄÖ¸¶¨´ÎÊýÄÚµÄ Execution ʧ°Ü£¬Èç¹û³¬¹ýÕâ¸öƵÂÊÔòµ¼Ö Job ʧ°Ü¡£
  • ͬÑùµØ£¬FailureRateRestartStrategy Ò²¿ÉÒÔÉèÖÃÒ»¶¨µÄÖØÆôÑÓ³Ù¡£
  • NoRestartStrategy: ÔÚ Execution ʧ°Üʱֱ½ÓÈà Job ʧ°Ü¡£
  • ĿǰµÄ Restart Strategy ¿ÉÒÔ»ù±¾Âú×ã¡°×Ô¶¯ÖØÆô¹ÒµôµÄ×÷Òµ¡±ÕâÑùµÄ¼òµ¥ÐèÇó£¬È»¶ø²¢Ã»ÓÐÇø·Ö×÷Òµ³ö´íµÄÔ­Òò£¬Õâµ¼Ö¿ÉÄÜ»á¶Ô²»¿É»Ö¸´µÄ´íÎ󣨱ÈÈçÓû§´úÂëÅ׳öµÄ NPE »òÕßijЩ²Ù×÷±¨ Permission Denied£©½øÐв»±ØÒªµÄÖØÊÔ£¬½øÒ»²½µÄºó¹ûÊÇûÓеÚһʱ¼äÍ˳ö£¬¿ÉÄܵ¼ÖÂÓû§Ã»Óм°Ê±·¢ÏÖÎÊÌ⣬ÆäÍâ¶ÔÓÚ×ÊÔ´À´ËµÒ²ÊÇÒ»ÖÖÀË·Ñ£¬×îºó»¹¿ÉÄܵ¼ÖÂһЩ¸±×÷Ó㨱ÈÈçÓÐЩ at-leaset-once µÄ²Ù×÷±»Ö´Ðжà´Î£©¡£

    ¶Ô´Ë£¬ÉçÇøÔÚ 1.7 °æ±¾ÒýÈëÁË Exception µÄ·ÖÀà[5]£¬¾ßÌå»á½« Runtime Å׳öµÄ Exception ·ÖΪÒÔϼ¸Àà:

  • NonRecoverableError: ²»¿É»Ö¸´µÄ´íÎó¡£
  • ²»¶Ô´ËÀà´íÎó½øÐÐÖØÊÔ¡£
  • PartitionDataMissingError: µ±Ç° Task ¶Á²»µ½ÉÏÓÎ Task µÄijЩÊý¾Ý£¬ÐèÒªÉÏÓÎ Task ÖØÅܺÍÖØ·¢Êý¾Ý¡£
  • EnvironmentError: Ö´Ðл·¾³µÄ´íÎó£¬Í¨³£ÊÇ Flink ÒÔÍâµÄÎÊÌ⣬±ÈÈç»úÆ÷ÎÊÌâ¡¢ÒÀÀµÎÊÌâ¡£
  • ÕâÖÖ´íÎóµÄÒ»¸öÃ÷ÏÔÌØÕ÷ÊÇ»áÔÚijЩ»úÆ÷ÉÏÖ´Ðгɹ¦£¬µ«ÔÚÁíÍâһЩ»úÆ÷ÉÏÖ´ÐÐʧ°Ü¡£ Flink ºóÐø¿ÉÒÔÒýÈëºÚÃûµ¥»úÆ÷À´¸ü´ÏÃ÷µØ½øÐÐ Task µ÷¶ÈÒÔÔÝʱ±ÜÃâÕâÀàÎÊÌâµÄÓ°Ïì¡£
  • RecoverableError: ¿É»Ö¸´´íÎó¡£
  • ²»ÊôÓÚÉÏÊöÀàÐ͵ĴíÎó¶¼ÔÝÉèΪ¿É»Ö¸´µÄ¡£
  • ÆäʵÕâ¸ö·ÖÀà»áÓ¦ÓÃÓÚ Task Failover ²ßÂÔºÍ Job Restart ²ßÂÔ£¬²»¹ýĿǰֻÓкóÕß»á·ÖÀà´¦Àí£¬¶øÇÒ Job Restart ²ßÂÔ¶Ô Flink ×÷ÒµµÄÎȶ¨ÐÔÓ°ÏìÏÔÈ»¸ü´ó£¬Òò´Ë·ÅÔÚÕâ¸öµØ·½½²¡£ÖµµÃ×¢ÒâµÄÊÇ£¬½ØÖÁĿǰ£¨1.8 °æ±¾£©Õâ¸ö·ÖÀàÖ»´¦Óںܳõ¼¶µÄ½×¶Î£¬Ïñ NonRecoverable Ö»°üº¬ÁË×÷Òµ State ÃüÃû³åÍ»µÈÉÙÊý¼¸¸öÄÚ²¿´íÎ󣬶ø PartitionDataMissingError ºÍ EnvironmentError »¹Î´ÓÐÓ¦Óã¬ËùÒÔ¾ø´ó¶àÊýµÄ´íÎóÈÔÊÇ RecoverableError¡£

    ÊØ»¤½ø³ÌÈÝ´í

    ¶ÔÓÚ·Ö²¼Ê½ÏµÍ³À´Ëµ£¬ÊØ»¤½ø³ÌµÄÈÝ´íÊÇ»ù±¾ÒªÇó¶øÇÒÒѾ­±È½Ï³ÉÊ죬»ù±¾°üÀ¨¹ÊÕϼì²âºÍ¹ÊÕϻָ´Á½¸ö²¿·Ö£º¹ÊÕϼì²âͨ³£Í¨¹ýÐÄÌøµÄ·½Ê½À´ÊµÏÖ£¬ÐÄÌø¿ÉÒÔÔÚÄÚ²¿×é¼þ¼äʵÏÖ»òÕßÒÀÀµÓÚ zookeeper µÈÍⲿ·þÎñ£»¶ø¹ÊÕϻָ´Ôòͨ³£ÒªÇó½«×´Ì¬³Ö¾Ã»¯µ½Íⲿ´æ´¢£¬È»ºóÔÚ¹ÊÕϳöÏÖʱÓÃÓÚ³õʼеĽø³Ì¡£

    ÒÔ×îΪ³£ÓÃµÄ on YARN µÄ²¿ÊðģʽÀ´½²£¬Flink ¹Ø¼üµÄÊØ»¤½ø³ÌÓÐ JobManager ºÍ TaskManager Á½¸ö£¬ÆäÖÐ JobManager µÄÖ÷ÒªÖ°ÔðЭµ÷×ÊÔ´ºÍ¹ÜÀí×÷ÒµµÄÖ´ÐзֱðΪ ResourceManager ºÍ JobMaster Á½¸öÊØ»¤Ï̳߳е££¬ÈýÕßÖ®¼äµÄ¹ØÏµÈçÏÂͼËùʾ¡£

    ͼ4. ResourceManager¡¢JobMaster ºÍ TaskManager ÈýÕß¹ØÏµ

    ÔÚÈÝ´í·½Ãæ£¬Èý¸ö½ÇÉ«Á½Á½Ö®¼äÏ໥·¢ËÍÐÄÌøÀ´½øÐй²Í¬µÄ¹ÊÕϼì²â[7]¡£´ËÍâÔÚ HA ³¡¾°Ï£¬ResourceManager ºÍ JobMaster ¶¼»á×¢²áµ½ zookeeper ½ÚµãÉÏÒÔʵÏÖ leader Ëø¡£

    TaskManager µÄÈÝ´í

    Èç¹û ResouceManager ͨ¹ýÐÄÌø³¬Ê±¼ì²âµ½»òÕßͨ¹ý¼¯Èº¹ÜÀíÆ÷µÄ֪ͨÁ˽⵽ TaskManager ¹ÊÕÏ£¬Ëü»á֪ͨ¶ÔÓ¦µÄ JobMaster ²¢Æô¶¯Ò»¸öÐ嵀 TaskManager ÒÔ×ö´úÌæ¡£×¢Òâ ResouceManager ²¢²»¹ØÐÄ Flink ×÷ÒµµÄÇé¿ö£¬ÕâÊÇ JobMaster µÄÖ°ÔðÈ¥¹ÜÀí Flink ×÷ÒµÒª×öºÎÖÖ·´Ó¦¡£

    Èç¹û JobMaster ͨ¹ý ResouceManager µÄ֪ͨÁ˽⵽»òÕßͨ¹ýÐÄÌø³¬Ê±¼ì²âµ½ TaskManager ¹ÊÕÏ£¬ËüÊ×ÏÈ»á´Ó×Ô¼ºµÄ slot pool ÖÐÒÆ³ý¸Ã TaskManager£¬²¢½«¸Ã TaskManager ÉÏÔËÐеÄËùÓÐ Tasks ±ê¼ÇΪʧ°Ü£¬´Ó¶ø´¥·¢ Flink ×÷ÒµÖ´ÐеÄÈÝ´í»úÖÆÒÔ»Ö¸´×÷Òµ¡£

    TaskManager µÄ״̬ÒѾ­Ð´Èë checkpoint ²¢»áÔÚÖØÆôºó×Ô¶¯»Ö¸´£¬Òò´Ë²»»áÔì³ÉÊý¾Ý²»Ò»ÖµÄÎÊÌâ¡£

    ResourceManager µÄÈÝ´í

    Èç¹û TaskManager ͨ¹ýÐÄÌø³¬Ê±¼ì²âµ½ ResourceManager ¹ÊÕÏ£¬»òÕßÊÕµ½ zookeeper µÄ¹ØÓÚ ResourceManager ʧȥ leadership ֪ͨ£¬TaskManager »áѰÕÒÐ嵀 leader ResourceManager ²¢½«×Ô¼ºÖØÆô×¢²áµ½ÆäÉÏ£¬ÆÚ¼ä²¢²»»áÖÐ¶Ï Task µÄÖ´ÐС£

    Èç¹û JobMaster ͨ¹ýÐÄÌø³¬Ê±¼ì²âµ½ ResourceManager ¹ÊÕÏ£¬»òÕßÊÕµ½ zookeeper µÄ¹ØÓÚ ResourceManager ʧȥ leadership ֪ͨ£¬JobMaster ͬÑù»áµÈ´ýÐ嵀 ResourceManager ±ä³É leader£¬È»ºóÖØÐÂÇëÇóËùÓÐµÄ TaskManager¡£ ¿¼Âǵ½ TaskManager Ò²¿ÉÄܳɹ¦»Ö¸´£¬ÕâÑùµÄ»° JobMaster ÐÂÇëÇóµÄ TaskManager »áÔÚ¿ÕÏÐÒ»¶Îʱ¼äºó±»ÊÍ·Å¡£

    ResourceManager Éϱ£³ÖÁ˺ܶà״̬ÐÅÏ¢£¬°üÀ¨»îÔ¾µÄ container¡¢¿ÉÓÃµÄ TaskManager¡¢TaskManager ºÍ JobMaster µÄÓ³Éä¹ØÏµµÈµÈÐÅÏ¢£¬²»¹ýÕâЩÐÅÏ¢²¢²»ÊÇ ground truth£¬¿ÉÒÔ´ÓÓë JobMaster ¼° TaskManager µÄ״̬ͬ²½ÖÐÔÙÖØÐ»ñµÃ£¬ËùÒÔÕâЩÐÅÏ¢²¢²»ÐèÒª³Ö¾Ã»¯¡£

    JobMaster µÄÈÝ´í

    Èç¹û TaskManager ͨ¹ýÐÄÌø³¬Ê±¼ì²âµ½ JobMaster ¹ÊÕÏ£¬»òÕßÊÕµ½ zookeeper µÄ¹ØÓÚ JobMaster ʧȥ leadership ֪ͨ£¬TaskManager »á´¥·¢×Ô¼ºµÄ´íÎó»Ö¸´£¨Ä¿Ç°ÊÇÊÍ·ÅËùÓÐ Task£©£¬È»ºóµÈ´ýÐ嵀 JobMaster¡£Èç¹ûÐ嵀 JobMaster ÔÚÒ»¶¨Ê±¼äºóÈÔδ³öÏÖ£¬TaskManager »á½«Æä slot ±ê¼ÇΪ¿ÕÏв¢¸æÖª ResourceManager¡£

    Èç¹û ResourceManager ͨ¹ýÐÄÌø³¬Ê±¼ì²âµ½ JobMaster ¹ÊÕÏ£¬»òÕßÊÕµ½ zookeeper µÄ¹ØÓÚ JobMaster ʧȥ leadership ֪ͨ£¬ResourceManager »á½«Æä¸æÖª TaskManager£¬ÆäËû²»×÷´¦Àí¡£

    JobMaster ±£´æÁ˺ܶà¶Ô×÷ÒµÖ´ÐÐÖÁ¹ØÖØÒªµÄ״̬£¬ÆäÖÐ JobGraph ºÍÓû§´úÂë»áÖØÐÂ´Ó HDFS µÈ³Ö¾Ã»¯´æ´¢ÖлñÈ¡£¬checkpoint ÐÅÏ¢»á´Ó zookeeper »ñµÃ£¬Task µÄÖ´ÐÐÐÅÏ¢¿ÉÒÔ²»»Ö¸´ÒòΪÕû¸ö×÷Òµ»áÖØÐµ÷¶È£¬¶ø³ÖÓÐµÄ slot Ôò´Ó ResourceManager µÄ TaskManager µÄͬ²½ÐÅÏ¢Öлָ´¡£

    ²¢·¢¹ÊÕÏ

    ÔÚ on YARN ²¿ÊðģʽÏ£¬ÒòΪ JobMaster ºÍ ResourceManager ¶¼ÔÚ JobManager ½ø³ÌÄÚ£¬Èç¹û JobManager ½ø³Ì³öÎÊÌ⣬ͨ³£ÊÇ JobMaster ºÍ ResourceManager ²¢·¢¹ÊÕÏ£¬ÄÇô TaskManager »á°´ÒÔϲ½Öè´¦Àí:

  • °´ÕÕÆÕͨµÄ JobMaster ¹ÊÕÏ´¦Àí¡£
  • ÔÚÒ»¶Îʱ¼äÄÚ²»¶Ï³¢ÊÔ½« slot Ìṩ¸øÐ嵀 JobMaster¡£
  • ²»¶Ï³¢ÊÔ½«×Ô¼º×¢²áµ½ ResourceManager ÉÏ¡£
  • ÖµµÃ×¢ÒâµÄÊÇ£¬Ð JobManager µÄÀ­ÆðÊÇÒÀ¿¿ YARN µÄ Application attempt ÖØÊÔ»úÖÆÀ´×Ô¶¯Íê³ÉµÄ£¬¶ø¸ù¾Ý Flink ÅäÖÃµÄ YARN Application keep-containers-across-application-attempts ÐÐΪ£¬TaskManager ²»»á±»ÇåÀí£¬Òò´Ë¿ÉÒÔÖØÐÂ×¢²áµ½ÐÂÆô¶¯µÄ Flink ResourceManager ºÍ JobMaster ÖС£

    Flink ÈÝ´í»úÖÆÈ·±£ÁË Flink µÄ¿É¿¿ÐԺͳ־ÃÐÔ£¬ÊÇ Flink Ó¦ÓÃÓÚÆóÒµ¼¶Éú²ú»·¾³µÄÖØÒª±£Ö¤£¬¾ßÌåÀ´ËµËü°üÀ¨×÷ÒµÖ´ÐеÄÈÝ´íºÍÊØ»¤½ø³ÌµÄÈÝ´íÁ½¸ö·½Ãæ¡£ÔÚ×÷ÒµÖ´ÐÐÈÝ´í·½Ãæ£¬Flink Ìṩ Task ¼¶±ðµÄ Failover ²ßÂÔºÍ Job ¼¶±ðµÄ Restart ²ßÂÔÀ´½øÐйÊÕÏÇé¿öϵÄ×Ô¶¯ÖØÊÔ¡£ÔÚÊØ»¤½ø³ÌµÄÈÝ´í·½Ãæ£¬ÔÚon YARN ģʽÏ£¬Flink ͨ¹ýÄÚ²¿×é¼þµÄÐÄÌøºÍ YARN µÄ¼à¿Ø½øÐйÊÕϼì²â¡£TaskManager µÄ¹ÊÕÏ»áͨ¹ýÉêÇëÐ嵀 TaskManager ²¢ÖØÆô Task »ò Job À´»Ö¸´£¬JobManager µÄ¹ÊÕÏ»áͨ¹ý¼¯Èº¹ÜÀíÆ÷µÄ×Ô¶¯À­ÆðРJobManager ºÍ TaskManager µÄÖØÐÂ×¢²áµ½Ð leader JobManager À´»Ö¸´¡£

     

     

     

       
    1392 ´Îä¯ÀÀ       27
    Ïà¹ØÎÄÕÂ

    »ùÓÚEAµÄÊý¾Ý¿â½¨Ä£
    Êý¾ÝÁ÷½¨Ä££¨EAÖ¸ÄÏ£©
    ¡°Êý¾Ýºþ¡±£º¸ÅÄî¡¢ÌØÕ÷¡¢¼Ü¹¹Óë°¸Àý
    ÔÚÏßÉ̳ÇÊý¾Ý¿âϵͳÉè¼Æ ˼·+Ч¹û
     
    Ïà¹ØÎĵµ

    GreenplumÊý¾Ý¿â»ù´¡Åàѵ
    MySQL5.1ÐÔÄÜÓÅ»¯·½°¸
    ijµçÉÌÊý¾ÝÖÐ̨¼Ü¹¹Êµ¼ù
    MySQL¸ßÀ©Õ¹¼Ü¹¹Éè¼Æ
    Ïà¹Ø¿Î³Ì

    Êý¾ÝÖÎÀí¡¢Êý¾Ý¼Ü¹¹¼°Êý¾Ý±ê×¼
    MongoDBʵս¿Î³Ì
    ²¢·¢¡¢´óÈÝÁ¿¡¢¸ßÐÔÄÜÊý¾Ý¿âÉè¼ÆÓëÓÅ»¯
    PostgreSQLÊý¾Ý¿âʵսÅàѵ
    ×îл¼Æ»®
    DeepSeekÔÚÈí¼þ²âÊÔÓ¦ÓÃʵ¼ù 4-12[ÔÚÏß]
    DeepSeek´óÄ£ÐÍÓ¦Óÿª·¢Êµ¼ù 4-19[ÔÚÏß]
    UAF¼Ü¹¹ÌåϵÓëʵ¼ù 4-11[±±¾©]
    AIÖÇÄÜ»¯Èí¼þ²âÊÔ·½·¨Óëʵ¼ù 5-23[ÉϺ£]
    »ùÓÚ UML ºÍEA½øÐзÖÎöÉè¼Æ 4-26[±±¾©]
    ÒµÎñ¼Ü¹¹Éè¼ÆÓ뽨ģ 4-18[±±¾©]
     
    ×îÐÂÎÄÕÂ
    ´óÊý¾Ýƽ̨ϵÄÊý¾ÝÖÎÀí
    ÈçºÎÉè¼ÆÊµÊ±Êý¾Ýƽ̨£¨¼¼Êõƪ£©
    ´óÊý¾Ý×ʲú¹ÜÀí×ÜÌå¿ò¼Ü¸ÅÊö
    Kafka¼Ü¹¹ºÍÔ­Àí
    ELK¶àÖּܹ¹¼°ÓÅÁÓ
    ×îпγÌ
    ´óÊý¾Ýƽ̨´î½¨Óë¸ßÐÔÄܼÆËã
    ´óÊý¾Ýƽ̨¼Ü¹¹ÓëÓ¦ÓÃʵս
    ´óÊý¾ÝϵͳÔËά
    ´óÊý¾Ý·ÖÎöÓë¹ÜÀí
    Python¼°Êý¾Ý·ÖÎö
    ³É¹¦°¸Àý
    ijͨÐÅÉ豸ÆóÒµ PythonÊý¾Ý·ÖÎöÓëÍÚ¾ò
    Ä³ÒøÐÐ È˹¤ÖÇÄÜ+Python+´óÊý¾Ý
    ±±¾© Python¼°Êý¾Ý·ÖÎö
    ÉñÁúÆû³µ ´óÊý¾Ý¼¼Êõƽ̨-Hadoop
    ÖйúµçÐÅ ´óÊý¾Ýʱ´úÓëÏÖ´úÆóÒµµÄÊý¾Ý»¯ÔËӪʵ¼ù