±à¼ÍƼö: |
À´Ô´51cto£¬±¾ÆªÎÄÕÂÎÒÃÇÓÃÒ»¾ä»°ÁÄÁÄʲôÊÇ
Apache Flink µÄÃüÂö£¿ÎҵĴð°¸ÊÇ£ºApache Flink ÊÇÒÔ"ÅúÊÇÁ÷µÄÌØÀý"µÄÈÏÖª½øÐÐϵͳÉè¼ÆµÄ¡£ |
|
Ò»¡¢Apache Flink µÄÃüÂö
"ÃüÂö" ¼´ÉúÃüÓëѪÂö£¬³£Ó÷¼«ÎªÖØÒªµÄÊÂÎϵÁеÄÊׯª£¬ÊׯªµÄÊ׶β»ÁÄApache
FlinkµÄÀúÊ·£¬²»ÁÄApache FlinkµÄ¼Ü¹¹£¬²»ÁÄApache FlinkµÄ¹¦ÄÜÌØÐÔ£¬ÎÒÃÇÓÃÒ»¾ä»°ÁÄÁÄʲôÊÇ
Apache Flink µÄÃüÂö?ÎҵĴð°¸ÊÇ£ºApache Flink ÊÇÒÔ"ÅúÊÇÁ÷µÄÌØÀý"µÄÈÏÖª½øÐÐϵͳÉè¼ÆµÄ¡£
¶þ¡¢Î¨¿ì²»ÆÆ
ÎÒÃǾ³£Ìý˵ "ÌìÏÂÎ书£¬Î¨¿ì²»ÆÆ"£¬´ó¸ÅÒâ˼ÊÇ˵ "ÈκÎÒ»ÖÖÎ书µÄÕÐÊý¶¼ÊÇÓвðÕеģ¬Î¨ÓÐËٶȿ죬¿ìµ½¶ÔÊÖ¸ù±¾À´²»¼°·´Ó¦£¬Äã¾Í½«¶ÔÊÖKOÁË£¬¶ÔÊÖûÓлú»á²ðÕУ¬ËùÒÔΨ¿ì²»ÆÆ"¡£
ÄÇôÕâÓëApache FlinkÓÐʲô¹ØÏµÄØ?Apache FlinkÊÇNative Streaming(´¿Á÷ʽ)¼ÆËãÒýÇæ£¬ÔÚʵʱ¼ÆË㳡¾°×î¹ØÐĵľÍÊÇ"¿ì",Ò²¾ÍÊÇ
"µÍÑÓʱ"¡£
¾ÍĿǰ×îÈȵÄÁ½ÖÖÁ÷¼ÆËãÒýÇæApache SparkºÍApache Flink¶øÑÔ£¬Ë×îÖÕ»á³ÉΪNo1ÄØ?µ¥´Ó
"µÍÑÓʱ" µÄ½Ç¶È¿´£¬SparkÊÇMicro Batching(΢Åúʽ)ģʽ£¬×îµÍÑÓ³ÙSparkÄÜ´ïµ½0.5~2Ãë×óÓÒ£¬FlinkÊÇNative
Streaming(´¿Á÷ʽ)ģʽ£¬×îµÍÑÓʱÄܴﵽ΢Ãë¡£ºÜÏÔÈ»ÊÇÏà¶Ô½ÏÍí³öµÀµÄ Apache Flink
ºóÀ´Õß¾ÓÉÏ¡£ ÄÇôΪʲôApache FlinkÄÜ×öµ½Èç´ËÖ® "¿ì"ÄØ?¸ù±¾ÔÒòÊÇApache
Flink Éè¼ÆÖ®³õ¾ÍÈÏΪ "ÅúÊÇÁ÷µÄÌØÀý"£¬Õû¸öϵͳÊÇNative StreamingÉè¼Æ£¬Ã¿À´Ò»ÌõÊý¾Ý¶¼Äܹ»´¥·¢¼ÆËã¡£Ïà¶ÔÓÚÐèÒª¿¿Ê±¼äÀ´»ýÔÜÊý¾ÝMicro
BatchingģʽÀ´Ëµ£¬Ôڼܹ¹ÉϾÍÒѾռ¾ÝÁ˾ø¶ÔÓÅÊÆ¡£
ÄÇôΪʲô¹ØÓÚÁ÷¼ÆËã»áÓÐÁ½ÖÖ¼ÆËãÄ£Ê½ÄØ?¹éÆä¸ù±¾ÊÇÒòΪ¶ÔÁ÷¼ÆËãµÄÈÏÖª²»Í¬£¬ÊÇ"Á÷ÊÇÅúµÄÌØÀý"
ºÍ "ÅúÊÇÁ÷µÄÌØÀý" Á½ÖÖ²»Í¬ÈÏÖª²úÎï¡£
1. Micro Batching ģʽ
Micro-Batching ¼ÆËãģʽÈÏΪ "Á÷ÊÇÅúµÄÌØÀý"£¬ Á÷¼ÆËã¾ÍÊǽ«Á¬Ðø²»¶ÏµÄÅú½øÐгÖÐø¼ÆË㣬Èç¹ûÅú×㹻СÄÇô¾ÍÓÐ×㹻СµÄÑÓʱ£¬ÔÚÒ»¶¨³Ì¶ÈÉÏÂú×ãÁË99%µÄʵʱ¼ÆË㳡¾°¡£ÄÇôÄÇ1%Ϊɶ×ö²»µ½ÄØ?Õâ¾ÍÊǼܹ¹µÄ÷ÈÁ¦£¬ÔÚMicro-BatchingģʽµÄ¼Ü¹¹ÊµÏÖÉϾÍÓÐÒ»¸ö×ÔÈ»Á÷Êý¾ÝÁ÷Èëϵͳ½øÐÐÔÜÅúµÄ¹ý³Ì£¬ÕâÔÚÒ»¶¨³Ì¶ÈÉϾÍÔö¼ÓÁËÑÓʱ¡£¾ßÌåÈçÏÂʾÒâͼ£º

ºÜÏÔÈ»Micro-BatchingģʽÓÐÆäÌìÉúµÄµÍÑÓʱƿ¾±£¬µ«ÈκÎÊÂÎïµÄ´æÔÚ¶¼ÓÐÁ½ÃæÐÔ£¬ÔÚ´óÊý¾Ý¼ÆËãµÄ·¢Õ¹ÀúÊ·ÉÏ£¬×î³õHadoopÉϵÄMapReduce¾ÍÊÇÓÅÐãµÄÅúģʽ¼ÆËã¿ò¼Ü£¬Micro-BatchingÔÚÉè¼ÆºÍʵÏÖÉÏ¿ÉÒÔ½è¼øºÜ¶à³ÉÊìʵ¼ù¡£
2. Native Streaming ģʽ
Native Streaming ¼ÆËãģʽÈÏΪ ""ÅúÊÇÁ÷µÄÌØÀý"£¬Õâ¸öÈÏÖª¸üÌùÇÐÁ÷µÄ¸ÅÄ±ÈÈçһЩ¼à¿ØÀàµÄÏûÏ¢Á÷£¬Êý¾Ý¿â²Ù×÷µÄbinlog£¬ÊµÊ±µÄÖ§¸¶½»Ò×ÐÅÏ¢µÈµÈ×ÔÈ»Á÷Êý¾Ý¶¼ÊÇÒ»Ìõ£¬Ò»ÌõµÄÁ÷Èë¡£Native
Streaming ¼ÆËãģʽÿÌõÊý¾ÝµÄµ½À´¶¼½øÐмÆË㣬ÕâÖÖ¼ÆËãģʽÏԵøü×ÔÈ»£¬²¢ÇÒÑÓʱÐÔÄÜ´ïµ½¸üµÍ¡£¾ßÌåÈçÏÂʾÒâͼ£º

ºÜÃ÷ÏÔNative Streamingģʽռ¾ÝÁËÁ÷¼ÆËãÁìÓò "µÍÑÓʱ" µÄºËÐľºÕùÁ¦£¬µ±È»Native
StreamingģʽµÄʵÏÖ¿ò¼ÜÊÇÒ»¸öÀúÊ·ÏȺӣ¬µÚÒ»¸öʵÏÖNative StreamingģʽµÄÁ÷¼ÆËã¿ò¼ÜÊǵÚÒ»¸ö³Ôó¦Ð·µÄÈË£¬ÐèÒªÃæÁÙ¸ü¶àµÄÌôÕ½£¬ºóÐøÕ½ÚÎÒÃÇ»áÂýÂý½éÉÜ¡£µ±È»Native
StreamingģʽµÄ¿ò¼ÜʵÏÖÉÏÃæºÜÈÝÒ×ʵÏÖMicro-BatchingºÍBatchingģʽģʽµÄ¼ÆË㣬Apache
Flink¾ÍÊÇNative Streaming¼ÆËãģʽµÄÁ÷ÅúͳһµÄ¼ÆËãÒýÇæ¡£
Èý¡¢·á¸»µÄ²¿Êðģʽ
Apache Flink °´²»Í¬µÄÐèÇóÖ§³ÖLocal£¬Cluster£¬CloudÈýÖÖ²¿Êðģʽ£¬Í¬Ê±Apache
FlinkÔÚ²¿ÊðÉÏÄܹ»ÓëÆäËû³ÉÊìµÄÉú̬²úÆ·½øÐÐÍêÃÀ¼¯³É£¬Èç ClusterģʽÏ¿ÉÒÔÀûÓÃYARN(Yet
Another Resource Negotiator)/Mesos¼¯³É½øÐÐ×ÊÔ´¹ÜÀí£¬ÔÚCloud²¿ÊðģʽÏ¿ÉÒÔÓëGCE(Google
Compute Engine), EC2(Elastic Compute Cloud)½øÐм¯³É¡£
1. Local ģʽ
¸ÃģʽÏÂApache Flink ÕûÌåÔËÐÐÔÚSingle JVMÖУ¬ÔÚ¿ª·¢Ñ§Ï°ÖÐʹÓã¬Í¬Ê±Ò²¿ÉÒÔ°²×°µ½ºÜ¶à¶ËÀàÉ豸ÉÏ¡£
2. Clusterģʽ
¸ÃģʽÊǵäÐ͵ÄͶ²úµÄ¼¯ÈºÄ£Ê½£¬Apache Flink ¼È¿ÉÒÔStandaloneµÄ·½Ê½½øÐв¿Êð£¬Ò²¿ÉÒÔÓëÆäËû×ÊÔ´¹ÜÀíϵͳ½øÐм¯³É²¿Ê𣬱ÈÈçÓëYARN½øÐм¯³É¡£
ÕâÖÖ²¿ÊðģʽÊǵäÐ͵ÄMaster/Slaveģʽ£¬ÎÒÃÇÒÔStandalone ClusterģʽΪÀýʾÒâÈçÏ£º

ÆäÖÐJM(JobManager)ÊÇMaster£¬TM(TaskManager)ÊÇSlave£¬ÕâÖÖMaster/SlaveģʽÓÐÒ»¸öµäÐ͵ÄÎÊÌâ¾ÍÊÇSPOF(single
point of failure), SPOFÈçºÎ½â¾öÄØ?Apache Flink ÓÖÌṩÁËHA(High
Availability)·½°¸£¬Ò²¾ÍÊÇÌṩ¶à¸öMaster£¬ÔÚÈκÎʱºò×ÜÓÐÒ»¸öJM·þÒÛ£¬N(N>=1)¸öJMºòÑ¡,½ø¶ø½â¾öSPOFÎÊÌ⣬ʾÒâÈçÏ£º

ÔÚʵ¼ÊµÄÉú²ú»·¾³ÎÒÃǶ¼»áÅäÖÃHA·½°¸£¬Ä¿Ç°AlibabaÄÚ²¿Ê¹ÓõÄÒ²ÊÇ»ùÓÚYARN ClusterµÄHA·½°¸¡£
3. Cloud ģʽ
¸ÃģʽÖ÷ÒªÊÇÓë³ÉÊìµÄÔÆ²úÆ·½øÐм¯³É£¬Apache Flink¹ÙÍø½éÉÜÁËGoogleµÄGCE ²Î¿¼£¬AmazonµÄEC2
²Î¿¼£¬ÔÚAlibabaÎÒÃÇÒ²¿ÉÒÔ½«Apache Flink²¿Êðµ½AlibabaµÄECS(Elastic
Compute Service)¡£
ËÄ¡¢ÍêÉÆµÄÈÝ´í»úÖÆ
1. ʲôÊÇÈÝ´í
ÈÝ´í(Fault Tolerance) ÊÇÖ¸ÈÝÈ̹ÊÕÏ£¬ÔÚ¹ÊÕÏ·¢ÉúʱÄܹ»×Ô¶¯¼ì²â³öÀ´²¢Ê¹ÏµÍ³Äܹ»×Ô¶¯»Ø¸´Õý³£ÔËÐС£µ±³öÏÖijЩָ¶¨µÄÍøÂç¹ÊÕÏ¡¢Ó²¼þ¹ÊÕÏ¡¢Èí¼þ´íÎóʱ£¬ÏµÍ³ÈÔÄÜÖ´Ðй涨µÄÒ»×é³ÌÐò£¬»òÕß˵³ÌÐò²»»áÒòϵͳÖеĹÊÕ϶øÖÐÖ¹£¬²¢ÇÒÖ´Ðнá¹ûÒ²²»»áÒòϵͳ¹ÊÕ϶øÒýÆð¼ÆËã²î´í¡£
2. ÈÝ´íµÄ´¦Àíģʽ
ÔÚÒ»¸ö·Ö²¼Ê½ÏµÍ³ÖÐÓÉÓÚµ¥¸ö½ø³Ì»òÕß½Úµãå´»ú¶¼ÓпÉÄܵ¼ÖÂÕû¸öJobʧ°Ü£¬ÄÇôÈÝ´í»úÖÆ³ýÁËÒª±£Ö¤ÔÚÓöµ½·ÇÔ¤ÆÚÇé¿öϵͳÄܹ»"ÔËÐÐ"Í⣬»¹ÒªÇóÄÜ"ÕýÈ·ÔËÐÐ",Ò²¾ÍÊÇÊý¾ÝÄܰ´Ô¤ÆÚµÄ´¦Àí·½Ê½½øÐд¦Àí£¬±£Ö¤¼ÆËã½á¹ûµÄÕýÈ·ÐÔ¡£¼ÆËã½á¹ûµÄÕýÈ·ÐÔÈ¡¾öÓÚϵͳ¶ÔÿһÌõ¼ÆËãÊý¾Ý´¦Àí»úÖÆ£¬Ò»°ãÓÐÈçÏÂÈýÖÖ´¦Àí»úÖÆ£º
At Most Once£º×î¶àÏû·ÑÒ»´Î£¬ÕâÖÖ´¦Àí»úÖÆ»á´æÔÚÊý¾Ý¶ªÊ§µÄ¿ÉÄÜ¡£
At Least Once£º×îÉÙÏû·ÑÒ»´Î£¬ÕâÖÖ´¦Àí»úÖÆÊý¾Ý²»»á¶ªÊ§£¬µ«ÊÇÓпÉÄÜÖØ¸´Ïû·Ñ¡£
Exactly Once£º¾«È·Ò»´Î£¬ÎÞÂÛºÎÖÖÇé¿öÏ£¬Êý¾Ý¶¼Ö»»áÏû·ÑÒ»´Î£¬ÕâÖÖ»úÖÆÊǶÔÊý¾Ý׼ȷÐÔµÄ×î¸ßÒªÇó£¬ÔÚ½ðÈÚÖ§¸¶£¬ÒøÐÐÕËÎñµÈÁìÓò±ØÐë²ÉÓÃÕâÖÖģʽ¡£
3. Apache FlinkµÄÈÝ´í»úÖÆ
Apache FlinkµÄJob»áÉæ¼°µ½3¸ö²¿·Ö£¬ÍⲿÊý¾ÝÔ´(External Input), FlinkÄÚ²¿Êý¾Ý´¦Àí(Flink
Data Flow)ºÍÍⲿÊä³ö(External Output)¡£ÈçÏÂʾÒâͼ:

ĿǰApache Flink Ö§³ÖÁ½ÖÖÊý¾ÝÈÝ´í»úÖÆ£º
At Least Once
Exactly Once
ÆäÖÐ Exactly Once ÊÇ×îÑϸñµÄÈÝ´í»úÖÆ£¬¸ÃģʽҪÇóÿÌõÊý¾Ý±ØÐë´¦ÀíÇÒ½ö´¦ÀíÒ»´Î¡£ÄÇô¶ÔÓÚÕâÖÖÑϸñÈÝ´í»úÖÆ£¬Ò»¸öÍêÕûµÄFlink
JobÈÝ´íÒª×öµ½ End-to-End µÄ ÈÝ´í±ØÐë½áºÏÈý¸ö²¿·Ö½øÐÐÁªºÏ´¦Àí£¬¸ù¾ÝÉÏͼÎÒÃÇ¿¼ÂÇÈý¸ö³¡¾°£º
³¡¾°Ò»£ºFlinkµÄSource Operator ÔÚ¶ÁÈ¡µ½KaflaÖÐpos=2000µÄÊý¾Ýʱºò£¬ÓÉÓÚijÖÖÔÒòå´»úÁË£¬Õâ¸öʱºòFlink¿ò¼Ü»á·ÖÅäÒ»¸öеĽڵã¼ÌÐø¶ÁÈ¡KaflaÊý¾Ý£¬ÄÇôеĴ¦Àí½ÚµãÔõÑù´¦Àí²ÅÄܱ£Ö¤Êý¾Ý´¦ÀíÇÒÖ»±»´¦ÀíÒ»´ÎÄØ?

³¡¾°¶þ£ºFlink Data FlowÄÚ²¿Ä³¸ö½Úµã£¬Èç¹ûÉÏͼµÄagg()½Úµã·¢ÉúÎÊÌ⣬ÔÚ»Ö¸´Ö®ºóÔõÑù´¦Àí²ÅÄܱ£³Ömap()Á÷³öµÄÊý¾Ý´¦ÀíÇÒÖ»±»´¦ÀíÒ»´Î?

³¡¾°Èý£ºFlinkµÄSink Operator ÔÚдÈëKafka¹ý³ÌÖÐ×ÔÉí½Úµã³öÏÖÎÊÌ⣬ÔÚ»Ö¸´Ö®ºóÈçºÎ´¦Àí£¬¼ÆËã½á¹û²ÅÄܱ£Ö¤Ð´ÈëÇÒÖ»±»Ð´ÈëÒ»´Î?

4. ϵͳÄÚ²¿ÈÝ´í
Apache FlinkÀûÓÃCheckpointing»úÖÆÀ´´¦ÀíÈÝ´í£¬CheckpointingµÄÀíÂÛ»ù´¡
Stephan ÔÚ Lightweight Asynchronous Snapshots for Distributed
Dataflows ½øÐÐÁËϸ½ÚÃèÊö£¬¸Ã»úÖÆÔ´ÓÚÓÐK. MANI CHANDYºÍLESLIE LAMPORT
·¢±íµÄ Determining-Global-States-of-a-Distributed-System
Paper¡£Apache Flink »ùÓÚCheckpointing»úÖÆ¶ÔFlink Data FlowʵÏÖÁËAt
Least Once ºÍ Exactly Once Á½ÖÖÈÝ´í´¦Àíģʽ¡£
Apache Flink CheckpointingµÄÄÚ²¿ÊµÏÖ»áÀûÓà Barriers£¬StateBackendµÈºóÐøÕ½ڻáÏêϸ½éÉܵļ¼ÊõÀ´½«Êý¾ÝµÄ´¦Àí½øÐÐMarker¡£Apache
Flink»áÀûÓÃBarrier½«Õû¸öÁ÷½øÐбê¼ÇÇз֣¬ÈçÏÂʾÒâͼ£º

ÕâÑùApache FlinkµÄÿ¸öOperator¶¼»á¼Ç¼µ±Ç°³É¹¦´¦ÀíµÄCheckpoint£¬Èç¹û·¢Éú´íÎ󣬾ͻá´ÓÉÏÒ»¸ö³É¹¦µÄCheckpoint¿ªÊ¼¼ÌÐø´¦ÀíºóÐøÊý¾Ý¡£±ÈÈç
Soruce Operator»á½«¶ÁÈ¡ÍⲿÊý¾ÝÔ´µÄPositionʵʱµÄ¼Ç¼µ½CheckpointÖУ¬Ê§°Üʱºò»á´ÓCheckpointÖжÁÈ¡³É¹¦µÄposition¼ÌÐø¾«×¼µÄÏû·ÑÊý¾Ý¡£Ã¿¸öËã×Ó»áÔÚCheckpointÖмǼ×Ô¼º»Ö¸´Ê±ºò±ØÐëµÄÊý¾Ý£¬±ÈÈçÁ÷µÄÔʼÊý¾ÝºÍÖÐ¼ä¼ÆËã½á¹ûµÈÐÅÏ¢£¬ÔÚ»Ö¸´µÄʱºò´ÓCheckpointÖжÁÈ¡²¢³ÖÐø´¦ÀíÁ÷Êý¾Ý¡£
5. ÍⲿSourceÈÝ´í
Apache Flink Òª×öµ½ End-to-End µÄ Exactly Once ÐèÒªÍⲿSourceµÄÖ§³Ö£¬±ÈÈçÉÏÃæÎÒÃÇ˵¹ý
Apache FlinkµÄCheckpointing»úÖÆ»áÔÚSource½Úµã¼Ç¼¶ÁÈ¡µÄPosition£¬ÄǾÍÐèÒªÍⲿÊý¾ÝÌṩ¶ÁÈ¡µÄPositionºÍÖ§³Ö¸ù¾ÝPosition½øÐÐÊý¾Ý¶ÁÈ¡¡£
6. ÍⲿSinkÈÝ´í
Apache Flink Òª×öµ½ End-to-End µÄ Exactly Once Ïà¶Ô±È½ÏÀ§ÄÑ£¬ÈçÉϳ¡¾°ÈýËùÊö£¬µ±Sink
Operator½Úµãå´»ú£¬ÖØÐ»ָ´Ê±ºò¸ù¾ÝApache Flink ÄÚ²¿ÏµÍ³ÈÝ´í exactly onceµÄ±£Ö¤,ϵͳ»á»Ø¹öµ½Éϴγɹ¦µÄCheckpoin¼ÌÐøÐ´È룬µ«ÊÇÉϴγɹ¦CheckpointÖ®ºóµ±Ç°CheckpointδÍê³É֮ǰÒѾ°ÑÒ»²¿·ÖÐÂÊý¾ÝдÈëµ½kafkaÁË.
Apache Flink×ÔÉϴγɹ¦µÄCheckpoint¼ÌÐøÐ´Èëkafka£¬¾ÍÔì³ÉÁËkafkaÔٴνÓÊÕµ½Ò»·ÝͬÑùµÄÀ´×ÔSink
OperatorµÄÊý¾Ý,½ø¶øÆÆ»µÁËEnd-to-End µÄ Exactly Once ÓïÒå(ÖØ¸´Ð´Èë¾Í±ä³ÉÁËAt
Least OnceÁË)£¬Èç¹ûÒª½â¾öÕâÒ»ÎÊÌ⣬Apache Flink ÀûÓÃTwo phase commit(Á½½×¶ÎÌá½»)µÄ·½Ê½À´½øÐд¦Àí¡£±¾ÖÊÉÏÊÇSink
Operator ÐèÒª¸ÐÖªÕûÌåCheckpointµÄÍê³É£¬²¢ÔÚÕûÌåCheckpointÍê³Éʱºò½«¼ÆËã½á¹ûдÈëKafka¡£
Îå¡¢Á÷ÅúͳһµÄ¼ÆËãÒýÇæ
ÅúÓëÁ÷ÊÇÁ½ÖÖ²»Í¬µÄÊý¾Ý´¦Àíģʽ£¬ÈçApache StormÖ»Ö§³ÖÁ÷ģʽµÄÊý¾Ý´¦Àí£¬Apache SparkÖ»Ö§³ÖÅú(Micro
Batching)ģʽµÄÊý¾Ý´¦Àí¡£ÄÇôApache Flink ÊÇÈçºÎ×öµ½¼ÈÖ§³ÖÁ÷´¦ÀíģʽҲ֧³ÖÅú´¦ÀíÄ£Ê½ÄØ?
1. ͳһµÄÊý¾Ý´«Êä²ã
¿ªÆªÎÒÃǾͽéÉÜApache Flink µÄ "ÃüÂö"ÊÇÒÔ"ÅúÊÇÁ÷µÄÌØÀý"Ϊµ¼ÏòÀ´½øÐÐÒýÇæµÄÉè¼ÆµÄ£¬ÏµÍ³Éè¼Æ³ÉΪ
"Native Streaming"µÄģʽ½øÐÐÊý¾Ý´¦Àí¡£ÄÇôApache FLink½«ÅúģʽִÐеÄÈÎÎñ¿´×öÊÇÁ÷ʽ´¦ÀíÈÎÎñµÄÌØÊâÇé¿ö£¬Ö»ÊÇÔÚÊý¾ÝÉÏÅúÊÇÓнçµÄ(ÓÐÏÞÊýÁ¿µÄÔªËØ)¡£
Apache Flink ÔÚÍøÂç´«Êä²ãÃæÓÐÁ½ÖÖÊý¾Ý´«Êäģʽ£º
PIPELINEDģʽ - ¼´Ò»ÌõÊý¾Ý±»´¦ÀíÍê³ÉÒÔºó£¬Á¢¿Ì´«Êäµ½ÏÂÒ»¸ö½Úµã½øÐд¦Àí¡£
BATCH ģʽ - ¼´Ò»ÌõÊý¾Ý±»´¦ÀíÍê³Éºó£¬²¢²»»áÁ¢¿Ì´«Êäµ½ÏÂÒ»¸ö½Úµã½øÐд¦Àí£¬¶øÊÇдÈëµ½»º´æÇø£¬Èç¹û»º´æÐ´Âú¾Í³Ö¾Ã»¯µ½±¾µØÓ²ÅÌÉÏ£¬×îºóµ±ËùÓÐÊý¾Ý¶¼±»´¦ÀíÍê³Éºó£¬²Å½«Êý¾Ý´«Êäµ½ÏÂÒ»¸ö½Úµã½øÐд¦Àí¡£
¶ÔÓÚÅúÈÎÎñ¶øÑÔͬÑù¿ÉÒÔÀûÓÃPIPELINEDģʽ£¬±ÈÈçÎÒÒª×öcountͳ¼Æ£¬ÀûÓÃPIPELINEDģʽÄÜÄõ½¸üºÃµÄÖ´ÐÐÐÔÄÜ¡£Ö»ÓÐÔÚÌØÊâÇé¿ö£¬±ÈÈçSortMergeJoin£¬ÕâʱºòÎÒÃÇÐèҪȫ¾ÖÊý¾ÝÅÅÐò£¬²ÅÐèÒªBATCHģʽ¡£´ó²¿·ÖÇé¿öÁ÷ÓëÅú¿ÉÓÃͳһµÄ´«Êä²ßÂÔ£¬Ö»ÓÐÌØÊâÇé¿ö£¬²Å½«Åú¿´×öÊÇÁ÷µÄÒ»¸öÌØÀý¼ÌÐøÌØÊâ´¦Àí¡£
2. ͳһÈÎÎñµ÷¶È²ã
Apache Flink ÔÚÈÎÎñµ÷¶ÈÉÏÁ÷ÓëÅú¹²ÏíͳһµÄ×ÊÔ´ºÍÈÎÎñµ÷¶È»úÖÆ(ºóÐøÕ½ڻáÏêϸ½éÉÜ)¡£
3. ͳһµÄÓû§API²ã
Apache Flink ÔÚDataStremAPIºÍDataSetAPI»ù´¡ÉÏ£¬ÎªÓû§ÌṩÁËÁ÷ÅúͳһµÄÉϲãTableAPIºÍSQL£¬ÔÚÓï·¨ºÍÓïÒåÉÏÁ÷Åú½øÐи߶Èͳһ¡£(ÆäÖÐDataStremAPIºÍDataSetAPI¶ÔÁ÷ºÍÅú½øÐÐÁË·Ö±ð³éÏó£¬ÕâÒ»µã²¢²»ÓÅÑÅ£¬ÔÚAlibabaÄÚ²¿¶ÔÆä½øÐÐÁËͳһ³éÏó)¡£
4. Çóͬ´æÒì
Apache Flink ÊÇÁ÷ÅúͳһµÄ¼ÆËãÒýÇæ£¬²¢²»Òâζ×ÅÁ÷ÓëÅúµÄÈÎÎñ¶¼×ßͳһµÄcode path£¬ÔڶԵײãµÄ¾ßÌåËã×ÓµÄʵÏÖÒ²ÊÇÓи÷×ԵĴ¦ÀíµÄ£¬ÔÚ¾ßÌ幦ÄÜÉÏÃæ»á¸ù¾Ý²»Í¬µÄÌØÐÔÇø±ð´¦Àí¡£±ÈÈç
ÅúûÓÐCheckpoint»úÖÆ£¬Á÷Éϲ»ÄÜ×öSortMergeJoin¡£
Áù¡¢Apache Flink ¼Ü¹¹
1. ×é¼þÕ»
ÎÒÃÇÉÏÃæÄÚÈÝÒѾ½éÉÜÁ˺ܶàApache FlinkµÄ¸÷ÖÖ×é¼þ£¬ÏÂÃæÎÒÃÇÕûÌå¸ÅÀÀÒ»ÏÂȫò£¬ÈçÏ£º

TableAPIºÍSQL¶¼½¨Á¢ÔÚDataSetAPIºÍDataStreamAPIµÄ»ù´¡Ö®ÉÏ£¬ÄÇôTableAPIºÍSQLÊÇÈçºÎת»»ÎªDataStreamºÍDataSetµÄÄØ?
2. TableAPI&SQLµ½DataStrem&DataSetµÄ¼Ü¹¹
TableAPI&SQL×îÖջᾹýCalciteÓÅ»¯Ö®ºóת»»ÎªDataStreamºÍDataSet£¬¾ßÌåת»»Ê¾ÒâÈçÏ£º

¶ÔÓÚÁ÷ÈÎÎñ×îÖÕ»áת»»³ÉDataStream£¬¶ÔÓÚÅúÈÎÎñ×îÖÕ»áת»»³ÉDataSet¡£
3. ANSI-SQLµÄÖ§³Ö
Apache Flink Ö®ËùÒÔÀûÓÃANSI-SQL×÷ΪÓû§Í³Ò»µÄ¿ª·¢ÓïÑÔ£¬ÊÇÒòΪSQLÓÐ×ŷdz£Ã÷ÏÔµÄÓŵ㣬ÈçÏ£º

Declarative - Óû§Ö»ÐèÒª±í´ïÎÒÏëҪʲô£¬²»ÓùØÐÄÈçºÎ¼ÆËã¡£
Optimized - ²éѯÓÅ»¯Æ÷¿ÉÒÔΪÓû§µÄ SQL Éú³É×îÓŵÄÖ´Ðмƻ®£¬»ñÈ¡×îºÃµÄ²éѯÐÔÄÜ¡£
Understandable - SQLÓïÑÔ±»²»Í¬ÁìÓòµÄÈËËùÊìÖª£¬ÓÃSQL ×÷Ϊ¿çÍŶӵĿª·¢ÓïÑÔ¿ÉÒԺܴóµØÌá¸ßЧÂÊ¡£
Stable - SQL ÊÇÒ»¸öÓµÓм¸Ê®ÄêÀúÊ·µÄÓïÑÔ£¬ÊÇÒ»¸ö·Ç³£Îȶ¨µÄÓïÑÔ£¬ºÜÉÙÓб䶯¡£
Unify - Apache FlinkÔÚÒýÇæÉ϶ÔÁ÷ÓëÅú½øÐÐͳһ£¬Í¬Ê±ÓÖÀûÓÃANSI-SQLÔÚÓï·¨ºÍÓïÒå²ãÃæ½øÐÐͳһ¡£
4. ÎÞÏÞÀ©Õ¹µÄÓÅ»¯»úÖÆ
Apache Flink ÀûÓÃApache Calcite¶ÔSQL½øÐнâÎöºÍÓÅ»¯£¬Apache Calcite²ÉÓÃCalciteÊÇ¿ªÔ´µÄÒ»ÌײéѯÒýÇæ£¬ÊµÏÖÁËÁ½Ì×Planner£º
HepPlanner - ÊÇRBO(Rule Base Optimize)ģʽ£¬»ùÓÚ¹æÔòµÄÓÅ»¯¡£
VolcanoPlanner - ÊÇCBO(Cost Base Optimize)ģʽ£¬»ùÓڳɱ¾µÄÓÅ»¯¡£
Flink SQL»áÀûÓÃCalcite½âÎöÓÅ»¯Ö®ºó£¬×îÖÕת»»Îªµ×²ãµÄDataStremºÍDataset¡£ÉÏͼÖÐ
Batch rulesºÍStream rules¿ÉÒÔ¸ù¾ÝÓÅ»¯ÐèÒªÎÞÏÞÌí¼ÓÓÅ»¯¹æÔò¡£
Æß¡¢·á¸»µÄÀà¿âºÍËã×Ó
Apache Flink ÓÅÐãµÄ¼Ü¹¹¾ÍÏñÒ»×ùĦÌì´óÏõĵػùÒ»ÑùΪApache Flink ³Ö¾ÃµÄÉúÃüÁ¦´òÏÂÁËÁ¼ºÃµÄ»ù´¡£¬Îª´òÔìApache
Flink·á¸»µÄ¹¦ÄÜÉú̬ÁôÏÂÎÞÏ޵Ŀռ䡣
1. Àà¿â
CEP - ¸´ÔÓʼþ´¦ÀíÀà¿â£¬ºËÐÄÊÇÒ»¸ö״̬»ú£¬¹ã·ºÓ¦ÓÃÓÚʼþÇý¶¯µÄ¼à¿ØÔ¤¾¯ÀàÒµÎñ³¡¾°¡£
ML - »úÆ÷ѧϰÀà¿â£¬»úÆ÷ѧϰÖ÷ÒªÊÇʶ±ðÊý¾ÝÖеĹØÏµ¡¢Ç÷ÊÆºÍģʽ£¬Ò»°ãÓ¦ÓÃÔÚÔ¤²âÀàÒµÎñ³¡¾°¡£
GELLY - ͼ¼ÆËãÀà¿â£¬Í¼¼ÆËã¸ü¶àµÄÊÇ¿¼ÂDZߺ͵ãµÄ¸ÅÄһ°ã±»ÓÃÀ´½â¾öÍø×´¹ØÏµµÄÒµÎñ³¡¾°¡£
2. Ëã×Ó
Apache Flink ÌṩÁ˷ḻµÄ¹¦ÄÜËã×Ó£¬¶ÔÓÚÊý¾ÝÁ÷µÄ´¦ÀíÀ´½²£¬¿ÉÒÔ·ÖΪµ¥Á÷´¦Àí(Ò»¸öÊý¾ÝÔ´)ºÍ¶àÁ÷´¦Àí(¶à¸öÊý¾ÝÔ´)¡£
3. ¶àÁ÷²Ù×÷
UNION - ½«¶à¸ö×Ö¶ÎÀàÐÍÒ»ÖÂÊý¾ÝÁ÷ºÏ²¢ÎªÒ»¸öÊý¾ÝÁ÷£¬ÈçÏÂʾÒ⣺

JOIN - ½«¶à¸öÊý¾ÝÁ÷(Êý¾ÝÀàÐÍ¿ÉÒÔ²»Ò»ÖÂ)Áª½ÓΪһ¸öÊý¾ÝÁ÷£¬ÈçÏÂʾÒ⣺

ÈçÉÏͨ¹ýUIONºÍJOINÎÒÃÇ¿ÉÒÔ½«¶àÁ÷×îÖÕ±ä³Éµ¥Á÷£¬Apache Flink ÔÚµ¥Á÷ÉÏÌṩÁ˸ü¶àµÄ²Ù×÷Ëã×Ó¡£
4. µ¥Á÷²Ù×÷
½«¶àÁ÷±ä³Éµ¥Á÷Ö®ºó£¬ÎÒÃǰ´Êý¾ÝÊäÈëÊä³öµÄ²»Í¬¹éÀàÈçÏ£º

ÈçÉϱí¸ñ¶Ôµ¥Á÷ÉÏÃæ²Ù×÷×ö¼òµ¥¹éÀ࣬³ý´ËÖ®Í⻹¿ÉÒÔ×ö ¹ýÂË£¬ÅÅÐò£¬´°¿ÚµÈ²Ù×÷£¬ÎÒÃǺóÐøÕ½ڻáÖðÒ»½éÉÜ¡£
4. ´æÔÚµÄÎÊÌâ
Apache Flink ĿǰµÄ¼Ü¹¹»¹´æÔںܴóµÄÓÅ»¯¿Õ¼ä£¬±ÈÈçÇ°ÃæÌáµ½µÄDataStreamAPIºÍDataSetAPIÆäʵÊÇÁ÷ÓëÅúÔÚAPI²ãÃæ²»Í³Ò»µÄÌåÏÖ£¬Í¬Ê±¿´¾ßÌåʵÏֻᷢÏÖDataStreamAPI»áÉú³ÉTransformation
treeÈ»ºóÉú³ÉStreamGraph£¬×îºóÉú³ÉJobGraph£¬µ×²ã¶ÔÓ¦StreamTask£¬µ«DataSetAPI»áÐγÉOperator
tree£¬flink-optimizeÄ£¿é»á¶ÔBatch Plan½øÐÐÓÅ»¯£¬ÐγÉOptimized
Plan ºóÐγÉJobGraph,×îºóÐγÉBatchTask¡£¾ßÌåʾÒâÈçÏ£º

ÕâÖÖÇé¿öÆäʵ DataStreamAPIµ½Runtime ºÍ DataSetAPIµ½RuntimeµÄʵÏÖÉϲ¢Ã»Óеõ½×î´ó³Ì¶ÈµÄͳһºÍ¸´Óá£ÔÚÕâÒ»µãÉÏÃæAalibab
ÆóÒµ°æµÄFlinkÔڼܹ¹ºÍʵÏÖÉ϶¼½øÐÐÁ˽øÒ»²½ÓÅ»¯¡£
°Ë¡¢AlibabaÆóÒµ°æFlink¼Ü¹¹
1. ×é¼þÕ»
Alibaba ¶ÔApache Flink½øÐÐÁË´óÁ¿µÄ¼Ü¹¹ÓÅ»¯£¬Èçϼܹ¹ÊÇһֱŬÁ¦µÄ·½Ïò£¬´ó²¿·Ö¹¦ÄÜ»¹ÔÚ³ÖÐø¿ª·¢ÖУ¬¾ßÌåÈçÏ£º

ÈçÉϼܹ¹ÎÒÃÇ·¢ÏֽϴóµÄ±ä»¯ÊÇ£º
Query Processing - ÎÒÃÇÔö¼ÓÁËQuery Processing²ã£¬ÔÚÕâÒ»²ã½øÐÐͳһµÄÁ÷ºÍÅúµÄ²éѯÓÅ»¯ºÍµ×²ãËã×ÓµÄת»»¡£
DAG API - ÎÒÃÇÔÚRuntime²ãÃæÍ³Ò»³éÏóAPI½Ó¿Ú£¬ÔÚAPI²ã¶ÔÁ÷ÓëÅú½øÐÐͳһ¡£
2. TableAPI&SQLµ½RuntimeµÄ¼Ü¹¹
Apache FlinkÖ´ÐвãÊÇÁ÷ÅúͳһµÄÉè¼Æ£¬ÔÚAPIºÍËã×ÓÉè¼ÆÉÏÃæÎÒÃǾ¡Á¿´ïµ½Á÷ÅúµÄ¹²Ïí£¬ÔÚTableAPIºÍSQL²ãÎÞÂÛÊÇÁ÷ÈÎÎñ»¹ÊÇÅúÈÎÎñ×îÖÕ¶¼×ª»»ÎªÍ³Ò»µÄµ×²ãʵÏÖ¡£Ê¾ÒâͼÈçÏ£º

Õâ¸ö²ãÃæ×îºËÐĵı仯ÊÇÅú×îÖÕÒ²»áÉú³ÉStreamGraph£¬Ö´ÐвãÔËÐÐStream Task¡£
¾Å¡¢Ìرð˵Ã÷
ºóÐøÕ½ڻáÒÔAlibaba ÆóÒµ°æ FlinkΪÖ÷½éÉܹ¦ÄÜËã×Ó£¬ÆªÕÂÖзÖÏíµÄ¹¦ÄÜ¿ÉÄÜ¿ªÔ´ÔÝʱûÓУ¬µ«ÕâЩµÄÄÚÈݺóÐøAlibaba»á¹²Ïí¸øÉçÇø£¬ÐèÒª´ó¼ÒÄÍÐĵȴý¡£
Ê®¡¢Ð¡½á
±¾Æª¸ÅÒªµÄ½éÉÜÁË"ÅúÊÇÁ÷µÄÌØÀý"ÕâÒ»Éè¼Æ¹ÛµãÊÇApache FlinkµÄ"ÃüÂö"£¬Ëü¾ö¶¨ÁËApache
FlinkµÄÔËÐÐģʽÊÇ´¿Á÷ʽµÄ£¬ÕâÔÚʵʱ¼ÆË㳡¾°µÄ"µÍÑÓ³Ù"ÐèÇóÉÏ£¬Ïà¶ÔÓÚMicro
Batchingģʽռ¾ÝÁ˼ܹ¹µÄ¾ø¶ÔÓÅÊÆ£¬Í¬Ê±¸ÅÒªµÄÏò´ó¼Ò½éÉÜÁËApache FlinkµÄ²¿Êðģʽ£¬ÈÝ´í´¦Àí£¬ÒýÇæµÄͳһÐÔºÍApache
FlinkµÄ¼Ü¹¹£¬×îºóºÍ´ó¼Ò·ÖÏíÁËAlibabaÆóÒµ°æµÄFlinkµÄ¼Ü¹¹£¬ÒÔ¼°¶Ô¿ªÔ´Apache FlinkËù×÷³öµÄÓÅ»¯¡£
±¾ÆªÃ»ÓжԾßÌå¼¼Êõ½øÐÐÏêϸչ¿ª£¬´ó¼ÒÖ»Òª¶ÔApache FlinkÓгõ²½¸ÐÖª£¬Í·ÄÔÖÐÖªµÀAlibaba¶ÔApache
Flink½øÐÐÁ˼ܹ¹ÓÅ»¯£¬Ôö¼ÓÁËÖڶ๦ÄܾͿÉÒÔÁË£¬ÖÁÓÚApache FlinkµÄ¾ßÌå¼¼Êõϸ½ÚºÍʵÏÖÔÀí£¬ÒÔ¼°Alibaba¶ÔApache
Flink×öÁËÄÄЩ¼Ü¹¹ÓÅ»¯ºÍÔö¼ÓÁËÄÄЩ¹¦ÄܺóÐøÕ½ڻáÕ¹¿ª½éÉÜ! |