Äú¿ÉÒÔ¾èÖú£¬Ö§³ÖÎÒÃǵĹ«ÒæÊÂÒµ¡£

1Ôª 10Ôª 50Ôª





ÈÏÖ¤Â룺  ÑéÖ¤Âë,¿´²»Çå³þ?Çëµã»÷Ë¢ÐÂÑéÖ¤Âë ±ØÌî



  ÇóÖª ÎÄÕ ÎÄ¿â Lib ÊÓÆµ iPerson ¿Î³Ì ÈÏÖ¤ ×Éѯ ¹¤¾ß ½²×ù Model Center   Code  
»áÔ±   
   
 
     
   
 ¶©ÔÄ
  ¾èÖú
Apache Flink Âþ̸ϵÁÐ - ¸ÅÊö
 
  2502  次浏览      29
 2018-9-30 
 
±à¼­ÍƼö:
À´Ô´51cto£¬±¾ÆªÎÄÕÂÎÒÃÇÓÃÒ»¾ä»°ÁÄÁÄʲôÊÇ Apache Flink µÄÃüÂö£¿ÎҵĴð°¸ÊÇ£ºApache Flink ÊÇÒÔ"ÅúÊÇÁ÷µÄÌØÀý"µÄÈÏÖª½øÐÐϵͳÉè¼ÆµÄ¡£

Ò»¡¢Apache Flink µÄÃüÂö

"ÃüÂö" ¼´ÉúÃüÓëѪÂö£¬³£Ó÷¼«ÎªÖØÒªµÄÊÂÎϵÁеÄÊׯª£¬ÊׯªµÄÊ׶β»ÁÄApache FlinkµÄÀúÊ·£¬²»ÁÄApache FlinkµÄ¼Ü¹¹£¬²»ÁÄApache FlinkµÄ¹¦ÄÜÌØÐÔ£¬ÎÒÃÇÓÃÒ»¾ä»°ÁÄÁÄʲôÊÇ Apache Flink µÄÃüÂö?ÎҵĴð°¸ÊÇ£ºApache Flink ÊÇÒÔ"ÅúÊÇÁ÷µÄÌØÀý"µÄÈÏÖª½øÐÐϵͳÉè¼ÆµÄ¡£

¶þ¡¢Î¨¿ì²»ÆÆ

ÎÒÃǾ­³£Ìý˵ "ÌìÏÂÎ书£¬Î¨¿ì²»ÆÆ"£¬´ó¸ÅÒâ˼ÊÇ˵ "ÈκÎÒ»ÖÖÎ书µÄÕÐÊý¶¼ÊÇÓвðÕеģ¬Î¨ÓÐËٶȿ죬¿ìµ½¶ÔÊÖ¸ù±¾À´²»¼°·´Ó¦£¬Äã¾Í½«¶ÔÊÖKOÁË£¬¶ÔÊÖûÓлú»á²ðÕУ¬ËùÒÔΨ¿ì²»ÆÆ"¡£ ÄÇôÕâÓëApache FlinkÓÐʲô¹ØÏµÄØ?Apache FlinkÊÇNative Streaming(´¿Á÷ʽ)¼ÆËãÒýÇæ£¬ÔÚʵʱ¼ÆË㳡¾°×î¹ØÐĵľÍÊÇ"¿ì",Ò²¾ÍÊÇ "µÍÑÓʱ"¡£

¾ÍĿǰ×îÈȵÄÁ½ÖÖÁ÷¼ÆËãÒýÇæApache SparkºÍApache Flink¶øÑÔ£¬Ë­×îÖÕ»á³ÉΪNo1ÄØ?µ¥´Ó "µÍÑÓʱ" µÄ½Ç¶È¿´£¬SparkÊÇMicro Batching(΢Åúʽ)ģʽ£¬×îµÍÑÓ³ÙSparkÄÜ´ïµ½0.5~2Ãë×óÓÒ£¬FlinkÊÇNative Streaming(´¿Á÷ʽ)ģʽ£¬×îµÍÑÓʱÄܴﵽ΢Ãë¡£ºÜÏÔÈ»ÊÇÏà¶Ô½ÏÍí³öµÀµÄ Apache Flink ºóÀ´Õß¾ÓÉÏ¡£ ÄÇôΪʲôApache FlinkÄÜ×öµ½Èç´ËÖ® "¿ì"ÄØ?¸ù±¾Ô­ÒòÊÇApache Flink Éè¼ÆÖ®³õ¾ÍÈÏΪ "ÅúÊÇÁ÷µÄÌØÀý"£¬Õû¸öϵͳÊÇNative StreamingÉè¼Æ£¬Ã¿À´Ò»ÌõÊý¾Ý¶¼Äܹ»´¥·¢¼ÆËã¡£Ïà¶ÔÓÚÐèÒª¿¿Ê±¼äÀ´»ýÔÜÊý¾ÝMicro BatchingģʽÀ´Ëµ£¬Ôڼܹ¹ÉϾÍÒѾ­Õ¼¾ÝÁ˾ø¶ÔÓÅÊÆ¡£

ÄÇôΪʲô¹ØÓÚÁ÷¼ÆËã»áÓÐÁ½ÖÖ¼ÆËãÄ£Ê½ÄØ?¹éÆä¸ù±¾ÊÇÒòΪ¶ÔÁ÷¼ÆËãµÄÈÏÖª²»Í¬£¬ÊÇ"Á÷ÊÇÅúµÄÌØÀý" ºÍ "ÅúÊÇÁ÷µÄÌØÀý" Á½ÖÖ²»Í¬ÈÏÖª²úÎï¡£

1. Micro Batching ģʽ

Micro-Batching ¼ÆËãģʽÈÏΪ "Á÷ÊÇÅúµÄÌØÀý"£¬ Á÷¼ÆËã¾ÍÊǽ«Á¬Ðø²»¶ÏµÄÅú½øÐгÖÐø¼ÆË㣬Èç¹ûÅú×㹻СÄÇô¾ÍÓÐ×㹻СµÄÑÓʱ£¬ÔÚÒ»¶¨³Ì¶ÈÉÏÂú×ãÁË99%µÄʵʱ¼ÆË㳡¾°¡£ÄÇôÄÇ1%Ϊɶ×ö²»µ½ÄØ?Õâ¾ÍÊǼܹ¹µÄ÷ÈÁ¦£¬ÔÚMicro-BatchingģʽµÄ¼Ü¹¹ÊµÏÖÉϾÍÓÐÒ»¸ö×ÔÈ»Á÷Êý¾ÝÁ÷Èëϵͳ½øÐÐÔÜÅúµÄ¹ý³Ì£¬ÕâÔÚÒ»¶¨³Ì¶ÈÉϾÍÔö¼ÓÁËÑÓʱ¡£¾ßÌåÈçÏÂʾÒâͼ£º

ºÜÏÔÈ»Micro-BatchingģʽÓÐÆäÌìÉúµÄµÍÑÓʱƿ¾±£¬µ«ÈκÎÊÂÎïµÄ´æÔÚ¶¼ÓÐÁ½ÃæÐÔ£¬ÔÚ´óÊý¾Ý¼ÆËãµÄ·¢Õ¹ÀúÊ·ÉÏ£¬×î³õHadoopÉϵÄMapReduce¾ÍÊÇÓÅÐãµÄÅúģʽ¼ÆËã¿ò¼Ü£¬Micro-BatchingÔÚÉè¼ÆºÍʵÏÖÉÏ¿ÉÒÔ½è¼øºÜ¶à³ÉÊìʵ¼ù¡£

2. Native Streaming ģʽ

Native Streaming ¼ÆËãģʽÈÏΪ ""ÅúÊÇÁ÷µÄÌØÀý"£¬Õâ¸öÈÏÖª¸üÌùÇÐÁ÷µÄ¸ÅÄ±ÈÈçһЩ¼à¿ØÀàµÄÏûÏ¢Á÷£¬Êý¾Ý¿â²Ù×÷µÄbinlog£¬ÊµÊ±µÄÖ§¸¶½»Ò×ÐÅÏ¢µÈµÈ×ÔÈ»Á÷Êý¾Ý¶¼ÊÇÒ»Ìõ£¬Ò»ÌõµÄÁ÷Èë¡£Native Streaming ¼ÆËãģʽÿÌõÊý¾ÝµÄµ½À´¶¼½øÐмÆË㣬ÕâÖÖ¼ÆËãģʽÏԵøü×ÔÈ»£¬²¢ÇÒÑÓʱÐÔÄÜ´ïµ½¸üµÍ¡£¾ßÌåÈçÏÂʾÒâͼ£º

ºÜÃ÷ÏÔNative Streamingģʽռ¾ÝÁËÁ÷¼ÆËãÁìÓò "µÍÑÓʱ" µÄºËÐľºÕùÁ¦£¬µ±È»Native StreamingģʽµÄʵÏÖ¿ò¼ÜÊÇÒ»¸öÀúÊ·ÏȺӣ¬µÚÒ»¸öʵÏÖNative StreamingģʽµÄÁ÷¼ÆËã¿ò¼ÜÊǵÚÒ»¸ö³Ôó¦Ð·µÄÈË£¬ÐèÒªÃæÁÙ¸ü¶àµÄÌôÕ½£¬ºóÐøÕ½ÚÎÒÃÇ»áÂýÂý½éÉÜ¡£µ±È»Native StreamingģʽµÄ¿ò¼ÜʵÏÖÉÏÃæºÜÈÝÒ×ʵÏÖMicro-BatchingºÍBatchingģʽģʽµÄ¼ÆË㣬Apache Flink¾ÍÊÇNative Streaming¼ÆËãģʽµÄÁ÷ÅúͳһµÄ¼ÆËãÒýÇæ¡£

Èý¡¢·á¸»µÄ²¿Êðģʽ

Apache Flink °´²»Í¬µÄÐèÇóÖ§³ÖLocal£¬Cluster£¬CloudÈýÖÖ²¿Êðģʽ£¬Í¬Ê±Apache FlinkÔÚ²¿ÊðÉÏÄܹ»ÓëÆäËû³ÉÊìµÄÉú̬²úÆ·½øÐÐÍêÃÀ¼¯³É£¬Èç ClusterģʽÏ¿ÉÒÔÀûÓÃYARN(Yet Another Resource Negotiator)/Mesos¼¯³É½øÐÐ×ÊÔ´¹ÜÀí£¬ÔÚCloud²¿ÊðģʽÏ¿ÉÒÔÓëGCE(Google Compute Engine), EC2(Elastic Compute Cloud)½øÐм¯³É¡£

1. Local ģʽ

¸ÃģʽÏÂApache Flink ÕûÌåÔËÐÐÔÚSingle JVMÖУ¬ÔÚ¿ª·¢Ñ§Ï°ÖÐʹÓã¬Í¬Ê±Ò²¿ÉÒÔ°²×°µ½ºÜ¶à¶ËÀàÉ豸ÉÏ¡£

2. Clusterģʽ

¸ÃģʽÊǵäÐ͵ÄͶ²úµÄ¼¯ÈºÄ£Ê½£¬Apache Flink ¼È¿ÉÒÔStandaloneµÄ·½Ê½½øÐв¿Êð£¬Ò²¿ÉÒÔÓëÆäËû×ÊÔ´¹ÜÀíϵͳ½øÐм¯³É²¿Ê𣬱ÈÈçÓëYARN½øÐм¯³É¡£

ÕâÖÖ²¿ÊðģʽÊǵäÐ͵ÄMaster/Slaveģʽ£¬ÎÒÃÇÒÔStandalone ClusterģʽΪÀýʾÒâÈçÏ£º

ÆäÖÐJM(JobManager)ÊÇMaster£¬TM(TaskManager)ÊÇSlave£¬ÕâÖÖMaster/SlaveģʽÓÐÒ»¸öµäÐ͵ÄÎÊÌâ¾ÍÊÇSPOF(single point of failure), SPOFÈçºÎ½â¾öÄØ?Apache Flink ÓÖÌṩÁËHA(High Availability)·½°¸£¬Ò²¾ÍÊÇÌṩ¶à¸öMaster£¬ÔÚÈκÎʱºò×ÜÓÐÒ»¸öJM·þÒÛ£¬N(N>=1)¸öJMºòÑ¡,½ø¶ø½â¾öSPOFÎÊÌ⣬ʾÒâÈçÏ£º

ÔÚʵ¼ÊµÄÉú²ú»·¾³ÎÒÃǶ¼»áÅäÖÃHA·½°¸£¬Ä¿Ç°AlibabaÄÚ²¿Ê¹ÓõÄÒ²ÊÇ»ùÓÚYARN ClusterµÄHA·½°¸¡£

3. Cloud ģʽ

¸ÃģʽÖ÷ÒªÊÇÓë³ÉÊìµÄÔÆ²úÆ·½øÐм¯³É£¬Apache Flink¹ÙÍø½éÉÜÁËGoogleµÄGCE ²Î¿¼£¬AmazonµÄEC2 ²Î¿¼£¬ÔÚAlibabaÎÒÃÇÒ²¿ÉÒÔ½«Apache Flink²¿Êðµ½AlibabaµÄECS(Elastic Compute Service)¡£

ËÄ¡¢ÍêÉÆµÄÈÝ´í»úÖÆ

1. ʲôÊÇÈÝ´í

ÈÝ´í(Fault Tolerance) ÊÇÖ¸ÈÝÈ̹ÊÕÏ£¬ÔÚ¹ÊÕÏ·¢ÉúʱÄܹ»×Ô¶¯¼ì²â³öÀ´²¢Ê¹ÏµÍ³Äܹ»×Ô¶¯»Ø¸´Õý³£ÔËÐС£µ±³öÏÖijЩָ¶¨µÄÍøÂç¹ÊÕÏ¡¢Ó²¼þ¹ÊÕÏ¡¢Èí¼þ´íÎóʱ£¬ÏµÍ³ÈÔÄÜÖ´Ðй涨µÄÒ»×é³ÌÐò£¬»òÕß˵³ÌÐò²»»áÒòϵͳÖеĹÊÕ϶øÖÐÖ¹£¬²¢ÇÒÖ´Ðнá¹ûÒ²²»»áÒòϵͳ¹ÊÕ϶øÒýÆð¼ÆËã²î´í¡£

2. ÈÝ´íµÄ´¦Àíģʽ

ÔÚÒ»¸ö·Ö²¼Ê½ÏµÍ³ÖÐÓÉÓÚµ¥¸ö½ø³Ì»òÕß½Úµãå´»ú¶¼ÓпÉÄܵ¼ÖÂÕû¸öJobʧ°Ü£¬ÄÇôÈÝ´í»úÖÆ³ýÁËÒª±£Ö¤ÔÚÓöµ½·ÇÔ¤ÆÚÇé¿öϵͳÄܹ»"ÔËÐÐ"Í⣬»¹ÒªÇóÄÜ"ÕýÈ·ÔËÐÐ",Ò²¾ÍÊÇÊý¾ÝÄܰ´Ô¤ÆÚµÄ´¦Àí·½Ê½½øÐд¦Àí£¬±£Ö¤¼ÆËã½á¹ûµÄÕýÈ·ÐÔ¡£¼ÆËã½á¹ûµÄÕýÈ·ÐÔÈ¡¾öÓÚϵͳ¶ÔÿһÌõ¼ÆËãÊý¾Ý´¦Àí»úÖÆ£¬Ò»°ãÓÐÈçÏÂÈýÖÖ´¦Àí»úÖÆ£º

At Most Once£º×î¶àÏû·ÑÒ»´Î£¬ÕâÖÖ´¦Àí»úÖÆ»á´æÔÚÊý¾Ý¶ªÊ§µÄ¿ÉÄÜ¡£

At Least Once£º×îÉÙÏû·ÑÒ»´Î£¬ÕâÖÖ´¦Àí»úÖÆÊý¾Ý²»»á¶ªÊ§£¬µ«ÊÇÓпÉÄÜÖØ¸´Ïû·Ñ¡£

Exactly Once£º¾«È·Ò»´Î£¬ÎÞÂÛºÎÖÖÇé¿öÏ£¬Êý¾Ý¶¼Ö»»áÏû·ÑÒ»´Î£¬ÕâÖÖ»úÖÆÊǶÔÊý¾Ý׼ȷÐÔµÄ×î¸ßÒªÇó£¬ÔÚ½ðÈÚÖ§¸¶£¬ÒøÐÐÕËÎñµÈÁìÓò±ØÐë²ÉÓÃÕâÖÖģʽ¡£

3. Apache FlinkµÄÈÝ´í»úÖÆ

Apache FlinkµÄJob»áÉæ¼°µ½3¸ö²¿·Ö£¬ÍⲿÊý¾ÝÔ´(External Input), FlinkÄÚ²¿Êý¾Ý´¦Àí(Flink Data Flow)ºÍÍⲿÊä³ö(External Output)¡£ÈçÏÂʾÒâͼ:

ĿǰApache Flink Ö§³ÖÁ½ÖÖÊý¾ÝÈÝ´í»úÖÆ£º

At Least Once

Exactly Once

ÆäÖÐ Exactly Once ÊÇ×îÑϸñµÄÈÝ´í»úÖÆ£¬¸ÃģʽҪÇóÿÌõÊý¾Ý±ØÐë´¦ÀíÇÒ½ö´¦ÀíÒ»´Î¡£ÄÇô¶ÔÓÚÕâÖÖÑϸñÈÝ´í»úÖÆ£¬Ò»¸öÍêÕûµÄFlink JobÈÝ´íÒª×öµ½ End-to-End µÄ ÈÝ´í±ØÐë½áºÏÈý¸ö²¿·Ö½øÐÐÁªºÏ´¦Àí£¬¸ù¾ÝÉÏͼÎÒÃÇ¿¼ÂÇÈý¸ö³¡¾°£º

³¡¾°Ò»£ºFlinkµÄSource Operator ÔÚ¶ÁÈ¡µ½KaflaÖÐpos=2000µÄÊý¾Ýʱºò£¬ÓÉÓÚijÖÖÔ­Òòå´»úÁË£¬Õâ¸öʱºòFlink¿ò¼Ü»á·ÖÅäÒ»¸öеĽڵã¼ÌÐø¶ÁÈ¡KaflaÊý¾Ý£¬ÄÇôеĴ¦Àí½ÚµãÔõÑù´¦Àí²ÅÄܱ£Ö¤Êý¾Ý´¦ÀíÇÒÖ»±»´¦ÀíÒ»´ÎÄØ?

³¡¾°¶þ£ºFlink Data FlowÄÚ²¿Ä³¸ö½Úµã£¬Èç¹ûÉÏͼµÄagg()½Úµã·¢ÉúÎÊÌ⣬ÔÚ»Ö¸´Ö®ºóÔõÑù´¦Àí²ÅÄܱ£³Ömap()Á÷³öµÄÊý¾Ý´¦ÀíÇÒÖ»±»´¦ÀíÒ»´Î?

³¡¾°Èý£ºFlinkµÄSink Operator ÔÚдÈëKafka¹ý³ÌÖÐ×ÔÉí½Úµã³öÏÖÎÊÌ⣬ÔÚ»Ö¸´Ö®ºóÈçºÎ´¦Àí£¬¼ÆËã½á¹û²ÅÄܱ£Ö¤Ð´ÈëÇÒÖ»±»Ð´ÈëÒ»´Î?

4. ϵͳÄÚ²¿ÈÝ´í

Apache FlinkÀûÓÃCheckpointing»úÖÆÀ´´¦ÀíÈÝ´í£¬CheckpointingµÄÀíÂÛ»ù´¡ Stephan ÔÚ Lightweight Asynchronous Snapshots for Distributed Dataflows ½øÐÐÁËϸ½ÚÃèÊö£¬¸Ã»úÖÆÔ´ÓÚÓÐK. MANI CHANDYºÍLESLIE LAMPORT ·¢±íµÄ Determining-Global-States-of-a-Distributed-System Paper¡£Apache Flink »ùÓÚCheckpointing»úÖÆ¶ÔFlink Data FlowʵÏÖÁËAt Least Once ºÍ Exactly Once Á½ÖÖÈÝ´í´¦Àíģʽ¡£

Apache Flink CheckpointingµÄÄÚ²¿ÊµÏÖ»áÀûÓà Barriers£¬StateBackendµÈºóÐøÕ½ڻáÏêϸ½éÉܵļ¼ÊõÀ´½«Êý¾ÝµÄ´¦Àí½øÐÐMarker¡£Apache Flink»áÀûÓÃBarrier½«Õû¸öÁ÷½øÐбê¼ÇÇз֣¬ÈçÏÂʾÒâͼ£º

ÕâÑùApache FlinkµÄÿ¸öOperator¶¼»á¼Ç¼µ±Ç°³É¹¦´¦ÀíµÄCheckpoint£¬Èç¹û·¢Éú´íÎ󣬾ͻá´ÓÉÏÒ»¸ö³É¹¦µÄCheckpoint¿ªÊ¼¼ÌÐø´¦ÀíºóÐøÊý¾Ý¡£±ÈÈç Soruce Operator»á½«¶ÁÈ¡ÍⲿÊý¾ÝÔ´µÄPositionʵʱµÄ¼Ç¼µ½CheckpointÖУ¬Ê§°Üʱºò»á´ÓCheckpointÖжÁÈ¡³É¹¦µÄposition¼ÌÐø¾«×¼µÄÏû·ÑÊý¾Ý¡£Ã¿¸öËã×Ó»áÔÚCheckpointÖмǼ×Ô¼º»Ö¸´Ê±ºò±ØÐëµÄÊý¾Ý£¬±ÈÈçÁ÷µÄԭʼÊý¾ÝºÍÖÐ¼ä¼ÆËã½á¹ûµÈÐÅÏ¢£¬ÔÚ»Ö¸´µÄʱºò´ÓCheckpointÖжÁÈ¡²¢³ÖÐø´¦ÀíÁ÷Êý¾Ý¡£

5. ÍⲿSourceÈÝ´í

Apache Flink Òª×öµ½ End-to-End µÄ Exactly Once ÐèÒªÍⲿSourceµÄÖ§³Ö£¬±ÈÈçÉÏÃæÎÒÃÇ˵¹ý Apache FlinkµÄCheckpointing»úÖÆ»áÔÚSource½Úµã¼Ç¼¶ÁÈ¡µÄPosition£¬ÄǾÍÐèÒªÍⲿÊý¾ÝÌṩ¶ÁÈ¡µÄPositionºÍÖ§³Ö¸ù¾ÝPosition½øÐÐÊý¾Ý¶ÁÈ¡¡£

6. ÍⲿSinkÈÝ´í

Apache Flink Òª×öµ½ End-to-End µÄ Exactly Once Ïà¶Ô±È½ÏÀ§ÄÑ£¬ÈçÉϳ¡¾°ÈýËùÊö£¬µ±Sink Operator½Úµãå´»ú£¬ÖØÐ»ָ´Ê±ºò¸ù¾ÝApache Flink ÄÚ²¿ÏµÍ³ÈÝ´í exactly onceµÄ±£Ö¤,ϵͳ»á»Ø¹öµ½Éϴγɹ¦µÄCheckpoin¼ÌÐøÐ´È룬µ«ÊÇÉϴγɹ¦CheckpointÖ®ºóµ±Ç°CheckpointδÍê³É֮ǰÒѾ­°ÑÒ»²¿·ÖÐÂÊý¾ÝдÈëµ½kafkaÁË. Apache Flink×ÔÉϴγɹ¦µÄCheckpoint¼ÌÐøÐ´Èëkafka£¬¾ÍÔì³ÉÁËkafkaÔٴνÓÊÕµ½Ò»·ÝͬÑùµÄÀ´×ÔSink OperatorµÄÊý¾Ý,½ø¶øÆÆ»µÁËEnd-to-End µÄ Exactly Once ÓïÒå(ÖØ¸´Ð´Èë¾Í±ä³ÉÁËAt Least OnceÁË)£¬Èç¹ûÒª½â¾öÕâÒ»ÎÊÌ⣬Apache Flink ÀûÓÃTwo phase commit(Á½½×¶ÎÌá½»)µÄ·½Ê½À´½øÐд¦Àí¡£±¾ÖÊÉÏÊÇSink Operator ÐèÒª¸ÐÖªÕûÌåCheckpointµÄÍê³É£¬²¢ÔÚÕûÌåCheckpointÍê³Éʱºò½«¼ÆËã½á¹ûдÈëKafka¡£

Îå¡¢Á÷ÅúͳһµÄ¼ÆËãÒýÇæ

ÅúÓëÁ÷ÊÇÁ½ÖÖ²»Í¬µÄÊý¾Ý´¦Àíģʽ£¬ÈçApache StormÖ»Ö§³ÖÁ÷ģʽµÄÊý¾Ý´¦Àí£¬Apache SparkÖ»Ö§³ÖÅú(Micro Batching)ģʽµÄÊý¾Ý´¦Àí¡£ÄÇôApache Flink ÊÇÈçºÎ×öµ½¼ÈÖ§³ÖÁ÷´¦ÀíģʽҲ֧³ÖÅú´¦ÀíÄ£Ê½ÄØ?

1. ͳһµÄÊý¾Ý´«Êä²ã

¿ªÆªÎÒÃǾͽéÉÜApache Flink µÄ "ÃüÂö"ÊÇÒÔ"ÅúÊÇÁ÷µÄÌØÀý"Ϊµ¼ÏòÀ´½øÐÐÒýÇæµÄÉè¼ÆµÄ£¬ÏµÍ³Éè¼Æ³ÉΪ "Native Streaming"µÄģʽ½øÐÐÊý¾Ý´¦Àí¡£ÄÇôApache FLink½«ÅúģʽִÐеÄÈÎÎñ¿´×öÊÇÁ÷ʽ´¦ÀíÈÎÎñµÄÌØÊâÇé¿ö£¬Ö»ÊÇÔÚÊý¾ÝÉÏÅúÊÇÓнçµÄ(ÓÐÏÞÊýÁ¿µÄÔªËØ)¡£

Apache Flink ÔÚÍøÂç´«Êä²ãÃæÓÐÁ½ÖÖÊý¾Ý´«Êäģʽ£º

PIPELINEDģʽ - ¼´Ò»ÌõÊý¾Ý±»´¦ÀíÍê³ÉÒÔºó£¬Á¢¿Ì´«Êäµ½ÏÂÒ»¸ö½Úµã½øÐд¦Àí¡£

BATCH ģʽ - ¼´Ò»ÌõÊý¾Ý±»´¦ÀíÍê³Éºó£¬²¢²»»áÁ¢¿Ì´«Êäµ½ÏÂÒ»¸ö½Úµã½øÐд¦Àí£¬¶øÊÇдÈëµ½»º´æÇø£¬Èç¹û»º´æÐ´Âú¾Í³Ö¾Ã»¯µ½±¾µØÓ²ÅÌÉÏ£¬×îºóµ±ËùÓÐÊý¾Ý¶¼±»´¦ÀíÍê³Éºó£¬²Å½«Êý¾Ý´«Êäµ½ÏÂÒ»¸ö½Úµã½øÐд¦Àí¡£

¶ÔÓÚÅúÈÎÎñ¶øÑÔͬÑù¿ÉÒÔÀûÓÃPIPELINEDģʽ£¬±ÈÈçÎÒÒª×öcountͳ¼Æ£¬ÀûÓÃPIPELINEDģʽÄÜÄõ½¸üºÃµÄÖ´ÐÐÐÔÄÜ¡£Ö»ÓÐÔÚÌØÊâÇé¿ö£¬±ÈÈçSortMergeJoin£¬ÕâʱºòÎÒÃÇÐèҪȫ¾ÖÊý¾ÝÅÅÐò£¬²ÅÐèÒªBATCHģʽ¡£´ó²¿·ÖÇé¿öÁ÷ÓëÅú¿ÉÓÃͳһµÄ´«Êä²ßÂÔ£¬Ö»ÓÐÌØÊâÇé¿ö£¬²Å½«Åú¿´×öÊÇÁ÷µÄÒ»¸öÌØÀý¼ÌÐøÌØÊâ´¦Àí¡£

2. ͳһÈÎÎñµ÷¶È²ã

Apache Flink ÔÚÈÎÎñµ÷¶ÈÉÏÁ÷ÓëÅú¹²ÏíͳһµÄ×ÊÔ´ºÍÈÎÎñµ÷¶È»úÖÆ(ºóÐøÕ½ڻáÏêϸ½éÉÜ)¡£

3. ͳһµÄÓû§API²ã

Apache Flink ÔÚDataStremAPIºÍDataSetAPI»ù´¡ÉÏ£¬ÎªÓû§ÌṩÁËÁ÷ÅúͳһµÄÉϲãTableAPIºÍSQL£¬ÔÚÓï·¨ºÍÓïÒåÉÏÁ÷Åú½øÐи߶Èͳһ¡£(ÆäÖÐDataStremAPIºÍDataSetAPI¶ÔÁ÷ºÍÅú½øÐÐÁË·Ö±ð³éÏó£¬ÕâÒ»µã²¢²»ÓÅÑÅ£¬ÔÚAlibabaÄÚ²¿¶ÔÆä½øÐÐÁËͳһ³éÏó)¡£

4. Çóͬ´æÒì

Apache Flink ÊÇÁ÷ÅúͳһµÄ¼ÆËãÒýÇæ£¬²¢²»Òâζ×ÅÁ÷ÓëÅúµÄÈÎÎñ¶¼×ßͳһµÄcode path£¬ÔڶԵײãµÄ¾ßÌåËã×ÓµÄʵÏÖÒ²ÊÇÓи÷×ԵĴ¦ÀíµÄ£¬ÔÚ¾ßÌ幦ÄÜÉÏÃæ»á¸ù¾Ý²»Í¬µÄÌØÐÔÇø±ð´¦Àí¡£±ÈÈç ÅúûÓÐCheckpoint»úÖÆ£¬Á÷Éϲ»ÄÜ×öSortMergeJoin¡£

Áù¡¢Apache Flink ¼Ü¹¹

1. ×é¼þÕ»

ÎÒÃÇÉÏÃæÄÚÈÝÒѾ­½éÉÜÁ˺ܶàApache FlinkµÄ¸÷ÖÖ×é¼þ£¬ÏÂÃæÎÒÃÇÕûÌå¸ÅÀÀÒ»ÏÂȫò£¬ÈçÏ£º

TableAPIºÍSQL¶¼½¨Á¢ÔÚDataSetAPIºÍDataStreamAPIµÄ»ù´¡Ö®ÉÏ£¬ÄÇôTableAPIºÍSQLÊÇÈçºÎת»»ÎªDataStreamºÍDataSetµÄÄØ?

2. TableAPI&SQLµ½DataStrem&DataSetµÄ¼Ü¹¹

TableAPI&SQL×îÖջᾭ¹ýCalciteÓÅ»¯Ö®ºóת»»ÎªDataStreamºÍDataSet£¬¾ßÌåת»»Ê¾ÒâÈçÏ£º

¶ÔÓÚÁ÷ÈÎÎñ×îÖÕ»áת»»³ÉDataStream£¬¶ÔÓÚÅúÈÎÎñ×îÖÕ»áת»»³ÉDataSet¡£

3. ANSI-SQLµÄÖ§³Ö

Apache Flink Ö®ËùÒÔÀûÓÃANSI-SQL×÷ΪÓû§Í³Ò»µÄ¿ª·¢ÓïÑÔ£¬ÊÇÒòΪSQLÓÐ×ŷdz£Ã÷ÏÔµÄÓŵ㣬ÈçÏ£º

Declarative - Óû§Ö»ÐèÒª±í´ïÎÒÏëҪʲô£¬²»ÓùØÐÄÈçºÎ¼ÆËã¡£

Optimized - ²éѯÓÅ»¯Æ÷¿ÉÒÔΪÓû§µÄ SQL Éú³É×îÓŵÄÖ´Ðмƻ®£¬»ñÈ¡×îºÃµÄ²éѯÐÔÄÜ¡£

Understandable - SQLÓïÑÔ±»²»Í¬ÁìÓòµÄÈËËùÊìÖª£¬ÓÃSQL ×÷Ϊ¿çÍŶӵĿª·¢ÓïÑÔ¿ÉÒԺܴóµØÌá¸ßЧÂÊ¡£

Stable - SQL ÊÇÒ»¸öÓµÓм¸Ê®ÄêÀúÊ·µÄÓïÑÔ£¬ÊÇÒ»¸ö·Ç³£Îȶ¨µÄÓïÑÔ£¬ºÜÉÙÓб䶯¡£

Unify - Apache FlinkÔÚÒýÇæÉ϶ÔÁ÷ÓëÅú½øÐÐͳһ£¬Í¬Ê±ÓÖÀûÓÃANSI-SQLÔÚÓï·¨ºÍÓïÒå²ãÃæ½øÐÐͳһ¡£

4. ÎÞÏÞÀ©Õ¹µÄÓÅ»¯»úÖÆ

Apache Flink ÀûÓÃApache Calcite¶ÔSQL½øÐнâÎöºÍÓÅ»¯£¬Apache Calcite²ÉÓÃCalciteÊÇ¿ªÔ´µÄÒ»ÌײéѯÒýÇæ£¬ÊµÏÖÁËÁ½Ì×Planner£º

HepPlanner - ÊÇRBO(Rule Base Optimize)ģʽ£¬»ùÓÚ¹æÔòµÄÓÅ»¯¡£

VolcanoPlanner - ÊÇCBO(Cost Base Optimize)ģʽ£¬»ùÓڳɱ¾µÄÓÅ»¯¡£

Flink SQL»áÀûÓÃCalcite½âÎöÓÅ»¯Ö®ºó£¬×îÖÕת»»Îªµ×²ãµÄDataStremºÍDataset¡£ÉÏͼÖÐ Batch rulesºÍStream rules¿ÉÒÔ¸ù¾ÝÓÅ»¯ÐèÒªÎÞÏÞÌí¼ÓÓÅ»¯¹æÔò¡£

Æß¡¢·á¸»µÄÀà¿âºÍËã×Ó

Apache Flink ÓÅÐãµÄ¼Ü¹¹¾ÍÏñÒ»×ùĦÌì´óÏõĵػùÒ»ÑùΪApache Flink ³Ö¾ÃµÄÉúÃüÁ¦´òÏÂÁËÁ¼ºÃµÄ»ù´¡£¬Îª´òÔìApache Flink·á¸»µÄ¹¦ÄÜÉú̬ÁôÏÂÎÞÏ޵Ŀռ䡣

1. Àà¿â

CEP - ¸´ÔÓʼþ´¦ÀíÀà¿â£¬ºËÐÄÊÇÒ»¸ö״̬»ú£¬¹ã·ºÓ¦ÓÃÓÚʼþÇý¶¯µÄ¼à¿ØÔ¤¾¯ÀàÒµÎñ³¡¾°¡£

ML - »úÆ÷ѧϰÀà¿â£¬»úÆ÷ѧϰÖ÷ÒªÊÇʶ±ðÊý¾ÝÖеĹØÏµ¡¢Ç÷ÊÆºÍģʽ£¬Ò»°ãÓ¦ÓÃÔÚÔ¤²âÀàÒµÎñ³¡¾°¡£

GELLY - ͼ¼ÆËãÀà¿â£¬Í¼¼ÆËã¸ü¶àµÄÊÇ¿¼ÂDZߺ͵ãµÄ¸ÅÄһ°ã±»ÓÃÀ´½â¾öÍø×´¹ØÏµµÄÒµÎñ³¡¾°¡£

2. Ëã×Ó

Apache Flink ÌṩÁ˷ḻµÄ¹¦ÄÜËã×Ó£¬¶ÔÓÚÊý¾ÝÁ÷µÄ´¦ÀíÀ´½²£¬¿ÉÒÔ·ÖΪµ¥Á÷´¦Àí(Ò»¸öÊý¾ÝÔ´)ºÍ¶àÁ÷´¦Àí(¶à¸öÊý¾ÝÔ´)¡£

3. ¶àÁ÷²Ù×÷

UNION - ½«¶à¸ö×Ö¶ÎÀàÐÍÒ»ÖÂÊý¾ÝÁ÷ºÏ²¢ÎªÒ»¸öÊý¾ÝÁ÷£¬ÈçÏÂʾÒ⣺

JOIN - ½«¶à¸öÊý¾ÝÁ÷(Êý¾ÝÀàÐÍ¿ÉÒÔ²»Ò»ÖÂ)Áª½ÓΪһ¸öÊý¾ÝÁ÷£¬ÈçÏÂʾÒ⣺

ÈçÉÏͨ¹ýUIONºÍJOINÎÒÃÇ¿ÉÒÔ½«¶àÁ÷×îÖÕ±ä³Éµ¥Á÷£¬Apache Flink ÔÚµ¥Á÷ÉÏÌṩÁ˸ü¶àµÄ²Ù×÷Ëã×Ó¡£

4. µ¥Á÷²Ù×÷

½«¶àÁ÷±ä³Éµ¥Á÷Ö®ºó£¬ÎÒÃǰ´Êý¾ÝÊäÈëÊä³öµÄ²»Í¬¹éÀàÈçÏ£º

ÈçÉϱí¸ñ¶Ôµ¥Á÷ÉÏÃæ²Ù×÷×ö¼òµ¥¹éÀ࣬³ý´ËÖ®Í⻹¿ÉÒÔ×ö ¹ýÂË£¬ÅÅÐò£¬´°¿ÚµÈ²Ù×÷£¬ÎÒÃǺóÐøÕ½ڻáÖðÒ»½éÉÜ¡£

4. ´æÔÚµÄÎÊÌâ

Apache Flink ĿǰµÄ¼Ü¹¹»¹´æÔںܴóµÄÓÅ»¯¿Õ¼ä£¬±ÈÈçÇ°ÃæÌáµ½µÄDataStreamAPIºÍDataSetAPIÆäʵÊÇÁ÷ÓëÅúÔÚAPI²ãÃæ²»Í³Ò»µÄÌåÏÖ£¬Í¬Ê±¿´¾ßÌåʵÏֻᷢÏÖDataStreamAPI»áÉú³ÉTransformation treeÈ»ºóÉú³ÉStreamGraph£¬×îºóÉú³ÉJobGraph£¬µ×²ã¶ÔÓ¦StreamTask£¬µ«DataSetAPI»áÐγÉOperator tree£¬flink-optimizeÄ£¿é»á¶ÔBatch Plan½øÐÐÓÅ»¯£¬ÐγÉOptimized Plan ºóÐγÉJobGraph,×îºóÐγÉBatchTask¡£¾ßÌåʾÒâÈçÏ£º

ÕâÖÖÇé¿öÆäʵ DataStreamAPIµ½Runtime ºÍ DataSetAPIµ½RuntimeµÄʵÏÖÉϲ¢Ã»Óеõ½×î´ó³Ì¶ÈµÄͳһºÍ¸´Óá£ÔÚÕâÒ»µãÉÏÃæAalibab ÆóÒµ°æµÄFlinkÔڼܹ¹ºÍʵÏÖÉ϶¼½øÐÐÁ˽øÒ»²½ÓÅ»¯¡£

°Ë¡¢AlibabaÆóÒµ°æFlink¼Ü¹¹

1. ×é¼þÕ»

Alibaba ¶ÔApache Flink½øÐÐÁË´óÁ¿µÄ¼Ü¹¹ÓÅ»¯£¬Èçϼܹ¹ÊÇһֱŬÁ¦µÄ·½Ïò£¬´ó²¿·Ö¹¦ÄÜ»¹ÔÚ³ÖÐø¿ª·¢ÖУ¬¾ßÌåÈçÏ£º

ÈçÉϼܹ¹ÎÒÃÇ·¢ÏֽϴóµÄ±ä»¯ÊÇ£º

Query Processing - ÎÒÃÇÔö¼ÓÁËQuery Processing²ã£¬ÔÚÕâÒ»²ã½øÐÐͳһµÄÁ÷ºÍÅúµÄ²éѯÓÅ»¯ºÍµ×²ãËã×ÓµÄת»»¡£

DAG API - ÎÒÃÇÔÚRuntime²ãÃæÍ³Ò»³éÏóAPI½Ó¿Ú£¬ÔÚAPI²ã¶ÔÁ÷ÓëÅú½øÐÐͳһ¡£

2. TableAPI&SQLµ½RuntimeµÄ¼Ü¹¹

Apache FlinkÖ´ÐвãÊÇÁ÷ÅúͳһµÄÉè¼Æ£¬ÔÚAPIºÍËã×ÓÉè¼ÆÉÏÃæÎÒÃǾ¡Á¿´ïµ½Á÷ÅúµÄ¹²Ïí£¬ÔÚTableAPIºÍSQL²ãÎÞÂÛÊÇÁ÷ÈÎÎñ»¹ÊÇÅúÈÎÎñ×îÖÕ¶¼×ª»»ÎªÍ³Ò»µÄµ×²ãʵÏÖ¡£Ê¾ÒâͼÈçÏ£º

Õâ¸ö²ãÃæ×îºËÐĵı仯ÊÇÅú×îÖÕÒ²»áÉú³ÉStreamGraph£¬Ö´ÐвãÔËÐÐStream Task¡£

¾Å¡¢Ìرð˵Ã÷

ºóÐøÕ½ڻáÒÔAlibaba ÆóÒµ°æ FlinkΪÖ÷½éÉܹ¦ÄÜËã×Ó£¬ÆªÕÂÖзÖÏíµÄ¹¦ÄÜ¿ÉÄÜ¿ªÔ´ÔÝʱûÓУ¬µ«ÕâЩµÄÄÚÈݺóÐøAlibaba»á¹²Ïí¸øÉçÇø£¬ÐèÒª´ó¼ÒÄÍÐĵȴý¡£

Ê®¡¢Ð¡½á

±¾Æª¸ÅÒªµÄ½éÉÜÁË"ÅúÊÇÁ÷µÄÌØÀý"ÕâÒ»Éè¼Æ¹ÛµãÊÇApache FlinkµÄ"ÃüÂö"£¬Ëü¾ö¶¨ÁËApache FlinkµÄÔËÐÐģʽÊÇ´¿Á÷ʽµÄ£¬ÕâÔÚʵʱ¼ÆË㳡¾°µÄ"µÍÑÓ³Ù"ÐèÇóÉÏ£¬Ïà¶ÔÓÚMicro Batchingģʽռ¾ÝÁ˼ܹ¹µÄ¾ø¶ÔÓÅÊÆ£¬Í¬Ê±¸ÅÒªµÄÏò´ó¼Ò½éÉÜÁËApache FlinkµÄ²¿Êðģʽ£¬ÈÝ´í´¦Àí£¬ÒýÇæµÄͳһÐÔºÍApache FlinkµÄ¼Ü¹¹£¬×îºóºÍ´ó¼Ò·ÖÏíÁËAlibabaÆóÒµ°æµÄFlinkµÄ¼Ü¹¹£¬ÒÔ¼°¶Ô¿ªÔ´Apache FlinkËù×÷³öµÄÓÅ»¯¡£

±¾ÆªÃ»ÓжԾßÌå¼¼Êõ½øÐÐÏêϸչ¿ª£¬´ó¼ÒÖ»Òª¶ÔApache FlinkÓгõ²½¸ÐÖª£¬Í·ÄÔÖÐÖªµÀAlibaba¶ÔApache Flink½øÐÐÁ˼ܹ¹ÓÅ»¯£¬Ôö¼ÓÁËÖڶ๦ÄܾͿÉÒÔÁË£¬ÖÁÓÚApache FlinkµÄ¾ßÌå¼¼Êõϸ½ÚºÍʵÏÖÔ­Àí£¬ÒÔ¼°Alibaba¶ÔApache Flink×öÁËÄÄЩ¼Ü¹¹ÓÅ»¯ºÍÔö¼ÓÁËÄÄЩ¹¦ÄܺóÐøÕ½ڻáÕ¹¿ª½éÉÜ!

   
2502 ´Îä¯ÀÀ       29
Ïà¹ØÎÄÕÂ

»ùÓÚEAµÄÊý¾Ý¿â½¨Ä£
Êý¾ÝÁ÷½¨Ä££¨EAÖ¸ÄÏ£©
¡°Êý¾Ýºþ¡±£º¸ÅÄî¡¢ÌØÕ÷¡¢¼Ü¹¹Óë°¸Àý
ÔÚÏßÉ̳ÇÊý¾Ý¿âϵͳÉè¼Æ ˼·+Ч¹û
 
Ïà¹ØÎĵµ

GreenplumÊý¾Ý¿â»ù´¡Åàѵ
MySQL5.1ÐÔÄÜÓÅ»¯·½°¸
ijµçÉÌÊý¾ÝÖÐ̨¼Ü¹¹Êµ¼ù
MySQL¸ßÀ©Õ¹¼Ü¹¹Éè¼Æ
Ïà¹Ø¿Î³Ì

Êý¾ÝÖÎÀí¡¢Êý¾Ý¼Ü¹¹¼°Êý¾Ý±ê×¼
MongoDBʵս¿Î³Ì
²¢·¢¡¢´óÈÝÁ¿¡¢¸ßÐÔÄÜÊý¾Ý¿âÉè¼ÆÓëÓÅ»¯
PostgreSQLÊý¾Ý¿âʵսÅàѵ