±à¼ÍƼö: |
±¾ÎĶÔÕûÌå½øÐÐÁ˽éÉÜ£¬Êµ¼ùÓÅ»¯¡¢È»ºó½éÉÜÁËÁ÷ÅúÒ»Ìå¡¢×îºó¶ÔδÀ´½øÐÐÁ˹滮¡£
±¾ÎÄÀ´×Ô΢ÐŹ«ÖÚºÅFlink ÖÐÎÄÉçÇø£¬ÓÉ»ðÁú¹ûÈí¼þAnna±à¼¡¢ÍƼö¡£ |
|
Ò»¡¢ÕûÌå½éÉÜ

2018 Äê 12 Ô Blink Ðû²¼¿ªÔ´£¬¾ÀúÁËÔ¼Ò»ÄêµÄʱ¼ä Flink 1.9 ÓÚ 2019
Äê 8 Ô 22 ·¢²¼¡£ÔÚ Flink 1.9 ·¢²¼Ö®Ç°×Ö½ÚÌø¶¯ÄÚ²¿»ùÓÚ master ·ÖÖ§½øÐÐÄÚ²¿µÄ
SQL ƽ̨¹¹½¨¡£¾ÀúÁË 2~3 ¸öÔµÄʱ¼ä×Ö½ÚÄÚ²¿ÔÚ 19 Äê 10 Ô·ݷ¢²¼ÁË»ùÓÚ Flink
1.9 µÄ Blink planner ¹¹½¨µÄ Streaming SQL ƽ̨£¬²¢½øÐÐÄÚ²¿Íƹ㡣ÔÚÕâ¸ö¹ý³ÌÖз¢ÏÖÁËһЩ±È½ÏÓÐÒâ˼µÄÐèÇ󳡾°£¬ÒÔ¼°Ò»Ð©½ÏÎªÆæ¹ÖµÄ
BUG¡£
»ùÓÚ 1.9 µÄ Flink SQL À©Õ¹
ËäÈ»×îÐ嵀 Flink °æ±¾ÒѾ֧³Ö SQL µÄ DDL£¬µ« Flink 1.9 ²¢²»Ö§³Ö¡£×Ö½ÚÄÚ²¿»ùÓÚ
Flink 1.9 ½øÐÐÁË DDL µÄÀ©Õ¹Ö§³ÖÒÔÏÂÓï·¨£º
create table
create view
create function
add resource
ͬʱ Flink 1.9 °æ±¾²»Ö§³ÖµÄ watermark ¶¨ÒåÔÚ DDL À©Õ¹ºóÒ²Ö§³ÖÁË¡£
ÎÒÃÇÔÚÍÆ¼ö´ó¼Ò¾¡Á¿µÄÈ¥Óà SQL ±í´ï×÷ҵʱÊÕµ½ºÜ¶à¡°SQL ÎÞ·¨±í´ï¸´ÔÓµÄÒµÎñÂß¼¡±µÄ·´À¡¡£Ê±¼ä¾ÃÁË·¢ÏÖÆäʵºÜ¶àÓû§ËùνµÄ¸´ÔÓÒµÎñÂß¼ÓеÄÊÇ×öһЩÍⲿµÄ
RPC µ÷Óã¬×Ö½ÚÄÚ²¿Õë¶ÔÕâ¸ö³¡¾°×öÁËÒ»¸ö RPC µÄά±íºÍ sink£¬ÈÃÓû§¿ÉÒÔÈ¥¶Áд RPC ·þÎñ£¬¼«´óµÄÀ©Õ¹ÁË
SQL µÄʹÓó¡¾°£¬°üÀ¨ FaaS Æäʵ¸ú RPC Ò²ÊÇÀàËÆµÄ¡£ÔÚ×Ö½ÚÄÚ²¿Ìí¼ÓÁË Redis/Abase/Bytable/ByteSQL/RPC/FaaS
µÈά±íµÄÖ§³Ö¡£
ͬʱ»¹ÊµÏÖÁ˶à¸öÄÚ²¿Ê¹ÓÃµÄ connectors£º
source: RocketMQ
sink: RocketMQ/ClickHouse/Doris/LogHouse/Redis/Abase/Bytable/ByteSQL/RPC/Print/Metrics
²¢ÇÒΪ connector ¿ª·¢ÁËÅäÌ×µÄ format£ºPB/Binlog/Bytes¡£
ÔÚÏߵĽçÃæ»¯ SQL ƽ̨

³ýÁË¶Ô Flink ±¾Éí¹¦ÄܵÄÀ©Õ¹£¬×Ö½ÚÄÚ²¿Ò²ÉÏÏßÁËÒ»¸ö SQL ƽ̨£¬Ö§³ÖÒÔϹ¦ÄÜ£º
SQL ±à¼
SQL ½âÎö
SQL µ÷ÊÔ
×Ô¶¨Òå UDF ºÍ Connector
°æ±¾¿ØÖÆ
ÈÎÎñ¹ÜÀí
¶þ¡¢Êµ¼ùÓÅ»¯
³ýÁ˶Թ¦ÄܵÄÀ©Õ¹£¬Õë¶Ô Flink 1.9 SQL µÄ²»×ãÖ®´¦Ò²×öÁËһЩÓÅ»¯¡£
Window ÐÔÄÜÓÅ»¯
1¡¢Ö§³ÖÁË window Mini-Batch
Mini-Batch ÊÇ Blink planner µÄÒ»¸ö±È½ÏÓÐÌØÉ«µÄ¹¦ÄÜ£¬ÆäÖ÷Ҫ˼ÏëÊÇ»ýÔÜÒ»ÅúÊý¾Ý£¬ÔÙ½øÐÐÒ»´Î״̬·ÃÎÊ£¬´ïµ½¼õÉÙ·ÃÎÊ״̬µÄ´ÎÊý½µµÍÐòÁл¯·´ÐòÁл¯µÄ¿ªÏú¡£Õâ¸öÓÅ»¯Ö÷ÒªÊÇÔÚ
RocksDB µÄ³¡¾°¡£Èç¹ûÊÇ Heap ״̬ Mini-Batch ²¢Ã»Ê²Ã´ÓÅ»¯¡£ÔÚһЩµäÐ͵ÄÒµÎñ³¡¾°ÖУ¬µÃµ½µÄ·´À¡ÊÇÄܼõÉÙ
20~30% ×óÓÒµÄ CPU ¿ªÏú¡£
2¡¢À©Õ¹ window ÀàÐÍ
Ŀǰ SQL ÖеÄÈýÖÖÄÚÖà window£¬¹ö¶¯´°¿Ú¡¢»¬¶¯´°¿Ú¡¢session ´°¿Ú£¬ÕâÈýÖÖÓïÒâµÄ´°¿ÚÎÞ·¨Âú×ãһЩÓû§³¡¾°µÄÐèÇó¡£±ÈÈçÔÚÖ±²¥µÄ³¡¾°£¬·ÖÎöʦÏëͳ¼ÆÒ»¸öÖ÷²¥ÔÚ¿ª²¥Ö®ºó£¬Ã¿Ò»¸öСʱµÄ
UV(Unique Visitor)¡¢GMV(Gross Merchandise Volume) µÈÖ¸±ê¡£×ÔÈ»µÄ¹ö¶¯´°¿ÚµÄ»®·Ö·½Ê½²¢²»Äܹ»Âú×ãÓû§µÄÐèÇó£¬×Ö½ÚÄÚ²¿¾Í×öÁËһЩ¶¨ÖƵĴ°¿ÚÀ´Âú×ãÓû§µÄһЩ¹²ÐÔÐèÇó¡£
-- my_window
Ϊ×Ô¶¨ÒåµÄ´°¿Ú£¬Âú×ãÌØ¶¨µÄ»®·Ö·½Ê½
SELECT
room_id,
COUNT(DISTINCT user_id)
FROM MySource
GROUP BY
room_id,
my_window(ts, INTERVAL '1' HOURS) |
3¡¢window offset
ÕâÊÇÒ»¸ö½ÏΪͨÓõŦÄÜ£¬ÔÚ Datastream API ²ãÊÇÖ§³ÖµÄ£¬µ«
SQL Öв¢Ã»ÓС£ÕâÀïÓиö±È½ÏÓÐÒâ˼µÄ³¡¾°£¬Óû§ÏëÒª¿ªÒ»ÖܵĴ°¿Ú£¬Ò»ÖܵĴ°¿Ú±ä³ÉÁË´ÓÖÜËÄ¿ªÊ¼µÄ·Ç×ÔÈ»ÖÜ¡£ÒòΪËÒ²²»»áÏëµ½
1970 Äê 1 Ô 1 ºÅÄÇÌì¾ÓÈ»ÊÇÖÜËÄ¡£ÔÚ¼ÓÈëÁË offset µÄÖ§³Öºó¾Í¿ÉÒÔÖ§³ÖÕýÈ·µÄ×ÔÈ»ÖÜ´°¿Ú¡£
SELECT
room_id,
COUNT(DISTINCT user_id)
FROM MySource
GROUP BY
room_id,
TUMBLE(ts, INTERVAL '7' DAY, INTERVAL '3', DAY) |
ά±íÓÅ»¯
1¡¢ÑÓ³Ù Join
ά±í Join µÄ³¡¾°ÏÂÒòΪά±í¾³£·¢Éú±ä»¯ÓÈÆäÊÇÐÂÔöά¶È£¬¶ø Join ²Ù×÷·¢ÉúÔÚά¶ÈÐÂÔö֮ǰ£¬¾³£µ¼Ö¹ØÁª²»ÉÏ¡£
ËùÒÔÓû§Ï£ÍûÈç¹û Join ²»µ½£¬ÔòÔÝʱ½«Êý¾Ý»º´æÆðÀ´Ö®ºóÔÙ½øÐг¢ÊÔ£¬²¢ÇÒ¿ÉÒÔ¿ØÖƳ¢ÊÔ´ÎÊý£¬Äܹ»×Ô¶¨ÒåÑÓ³Ù
Join µÄ¹æÔò¡£Õâ¸öÐèÇ󳡾°²»µ¥µ¥ÔÚ×Ö½ÚÄÚ²¿£¬ÉçÇøµÄºÜ¶àͬѧҲÓÐÀàËÆµÄÐèÇó¡£
»ùÓÚÉÏÃæµÄ³¡¾°ÊµÏÖÁËÑÓ³Ù Join ¹¦ÄÜ£¬Ìí¼ÓÁËÒ»¸ö¿ÉÒÔÖ§³ÖÑÓ³Ù Join ά±íµÄËã×Ó¡£µ± Join
ûÓÐÃüÖУ¬local cache ²»»á»º´æ¿ÕµÄ½á¹û£¬Í¬Ê±½«Êý¾ÝÔÝʱ±£´æÔÚÒ»¸ö״̬ÖУ¬Ö®ºó¸ù¾ÝÉèÖö¨Ê±Æ÷ÒÔ¼°ËüµÄÖØÊÔ´ÎÊý½øÐÐÖØÊÔ¡£

2¡¢Î¬±í Keyby ¹¦ÄÜ

ͨ¹ýÍØÆËÎÒÃÇ·¢ÏÖ Cacl Ëã×ÓºÍ lookUpJoin Ëã×ÓÊÇ chain ÔÚÒ»ÆðµÄ¡£ÒòΪËüûÓÐÒ»¸ö
key µÄÓïÒå¡£
µ±×÷Òµ²¢ÐжȱȽϴó£¬Ã¿Ò»¸öά±í Join µÄ subtask£¬·ÃÎʵÄÊÇËùÓеĻº´æ¿Õ¼ä£¬ÕâÑù¶Ô»º´æÀ´ËµÓкܴóµÄѹÁ¦¡£
µ«¹Û²ì Join µÄ SQL£¬µÈÖµ Join ÊÇÌìÈ»¾ßÓÐ Hash ÊôÐԵġ£Ö±½Ó¿ª·ÅÁËÅäÖã¬ÔËÐÐÓû§Ö±½Ó°Ñά±í
Join µÄ key ×÷Ϊ Hash µÄÌõ¼þ£¬½«Êý¾Ý½øÐзÖÇø¡£ÕâÑù¾ÍÄܱ£Ö¤ÏÂÓÎÿһ¸öËã× subtask
Ö®¼äµÄ·ÃÎʿռäÊǶÀÁ¢µÄ£¬ÕâÑù¿ÉÒÔ´ó´óµÄÌáÉý¿ªÊ¼µÄ»º´æÃüÖÐÂÊ¡£
³ýÁËÒÔÉϵÄÓÅ»¯£¬»¹ÓÐÁ½µãĿǰÕýÔÚ¿ª·¢µÄά±íÓÅ»¯¡£
¹ã²¥Î¬±í£ºÓÐЩ³¡¾°ÏÂά±í±È½ÏС£¬¶øÇÒ¸üв»Æµ·±£¬µ«×÷ÒµµÄ QPS ÌØ±ð¸ß¡£Èç¹ûÒÀÈ»·ÃÎÊÍⲿϵͳ½øÐÐ
Join£¬ÄÇôѹÁ¦»á·Ç³£´ó¡£²¢ÇÒµ±×÷Òµ Failover µÄʱºò local cache »áÈ«²¿Ê§Ð§£¬½ø¶øÓÖ¶ÔÍⲿϵͳÔì³ÉºÜ´ó·ÃÎÊѹÁ¦¡£ÄÇô¸Ä½øµÄ·½°¸ÊǶ¨ÆÚÈ«Á¿
scan ά±í£¬Í¨¹ýJoin key hash µÄ·½Ê½·¢Ë͵½ÏÂÓΣ¬¸üÐÂÿ¸öά±í subtask µÄ»º´æ¡£
Mini-Batch£ºÖ÷ÒªÕë¶ÔһЩ I/O ÇëÇó±È½Ï¸ß£¬ÏµÍ³ÓÖÖ§³Ö batch ÇëÇóµÄÄÜÁ¦£¬±ÈÈç˵
RPC¡¢HBase¡¢Redis µÈ¡£ÒÔÍùµÄ·½Ê½¶¼ÊÇÖðÌõµÄÇëÇó£¬ÇÒ Async I/O Ö»Äܽâ¾ö I/O
ÑÓ³ÙµÄÎÊÌâ,²¢²»Äܽâ¾ö·ÃÎÊÁ¿µÄÎÊÌ⡣ͨ¹ýʵÏÖ Mini-Batch °æ±¾µÄά±íËã×Ó£¬´óÁ¿½µµÍά±í¹ØÁª·ÃÎÊÍⲿ´æ´¢´ÎÊý¡£
Join ÓÅ»¯
Ŀǰ Flink Ö§³ÖµÄÈýÖÖ Join ·½Ê½£»·Ö±ðÊÇ Interval Join¡¢Regular Join¡¢Temporal
Table Function¡£
ǰÁ½ÖÖÓïÒåÊÇÒ»ÑùµÄÁ÷ºÍÁ÷ Join¡£¶ø Temporal Table ÊÇÁ÷ºÍ±íµÄµÄ Join£¬ÓұߵÄÁ÷»áÒÔÖ÷¼üµÄÐÎʽÐγÉÒ»ÕÅ±í£¬×ó±ßµÄÁ÷È¥
Join ÕâÕÅ±í£¬ÕâÑùÒ»´Î Join Ö»ÄÜÓÐÒ»ÌõÊý¾Ý²ÎÓë²¢ÇÒÖ»·µ»ØÒ»¸ö½á¹û¡£¶ø²»ÊÇÓжàÉÙÌõ¶¼ÄÜ Join
µ½¡£
ËüÃÇÖ®¼äµÄÇø±ðÁÐÁ˼¸µã£º

¿ÉÒÔ¿´µ½ÈýÖÖ Join ·½Ê½¶¼ÓÐËü±¾ÉíµÄһЩȱÏÝ¡£
Interval Join ĿǰʹÓÃÉϵÄȱÏÝÊÇËü»á²úÉúÒ»¸ö out join Êý¾ÝºÍ watermark
ÂÒÐòµÄÇé¿ö¡£
Regular Join µÄ»°£¬Ëü×î´óµÄȱÏÝÊÇ retract ·Å´ó(Ö®ºó»áÏêϸ˵Ã÷Õâ¸öÎÊÌâ)¡£
Temporal table function µÄÎÊÌâ½ÏÆäËü¶àһЩ£¬ÓÐÈý¸öÎÊÌâ¡£
²»Ö§³Ö DDl
²»Ö§³Ö out join µÄÓïÒå (FLINK-7865 µÄÏÞÖÆ)
ÓÒ²àÊý¾Ý¶ÏÁ÷µ¼Ö watermark ²»¸üУ¬ÏÂÓÎÎÞ·¨ÕýÈ·¼ÆËã (FLINK-18934)
¶ÔÓÚÒÔÉϵIJ»×ãÖ®´¦×Ö½ÚÄÚ²¿¶¼×öÁ˶ÔÓ¦µÄÐ޸ġ£
ÔöÇ¿ Checkpoint »Ö¸´ÄÜÁ¦
¶ÔÓÚ SQL ×÷ÒµÀ´ËµÒ»µ©·¢ÉúÌõ¼þ±ä»¯¶¼ºÜÄÑ´Ó checkpoint Öлָ´¡£
SQL ×÷ҵȷʵ´Ó checkpoint »Ö¸´µÄÄÜÁ¦±È½ÏÈõ£¬ÒòΪÓÐʱºò×öһЩ¿´ÆðÀ´²»Ì«Ó°Ïì checkpoint
µÄÐ޸ģ¬ËüÈÔÈ»ÎÞ·¨»Ö¸´¡£ÎÞ·¨»Ö¸´Ö÷ÒªÓÐÁ½µã£»
µÚÒ»µã£ºoperate ID ÊÇ×Ô¶¯Éú³ÉµÄ£¬È»ºóÒòΪijЩÔÒòµ¼ÖÂËüÉú³ÉµÄ ID ¸Ä±äÁË¡£
µÚ¶þµã£ºËã×ӵļÆËãµÄÂß¼·¢ÉúÁ˸ı䣬¼´Ëã×ÓÄÚ²¿µÄ״̬µÄ¶¨Òå·¢ÉúÁ˱仯¡£
Àý×Ó1£º²¢Ðжȷ¢ÉúÐ޸ĵ¼ÖÂÎÞ·¨»Ö¸´¡£

source ÊÇÒ»¸ö×î³£¼ûµÄÓÐ״̬µÄËã×Ó£¬source Èç¹ûºÍÖ®ºóµÄËã× operator chain
Âß¼·¢ÉúÁ˸ı䣬ÊÇÍêÈ«ÎÞ·¨»Ö¸´µÄ¡£
ÏÂͼ×óÉÏÊÇÕý³£µÄÉçÇø°æµÄ×÷Òµ»á²úÉúµÄÒ»¸öÂß¼£¬ source ºÍºóÃæµÄ²¢ÐжÈÒ»ÑùµÄËã×ӻᱻ chain
ÔÚÒ»Æð£¬Óû§ÊÇÎÞ·¨È¥¸Ä±äµÄ¡£µ«Ëã×Ó²¢ÐжÈÊdz£»á»á·¢ÉúÐ޸쬱ÈÈç˵ source ÓÉÔÀ´µÄ 100 ÐÞ¸ÄΪ
50£¬cacl µÄ²¢·¢ÊÇ 100¡£´Ëʱ chain µÄÂß¼¾Í»á·¢Éú±ä»¯¡£

Õë¶ÔÕâÖÖÇé¿ö£¬×Ö½ÚÄÚ²¿×öÁËÐ޸ģ¬ÔÊÐíÓû§È¥ÅäÖ㬼´Ê¹ source µÄ²¢ÐжȸúºóÃæÕûÌåµÄ×÷ÒµµÄ²¢ÐжÈÊÇÒ»ÑùµÄ£¬Ò²ÈÃÆä²»ÓëÖ®ºóµÄËã×Ó
chain ÔÚÒ»Æð¡£
Àý×Ó2£ºDAG ¸Ä±äµ¼ÖÂÎÞ·¨»Ö¸´¡£

ÕâÊÇÒ»ÖֱȽÏÌØÊâµÄÇé¿ö£¬ÓÐÒ»Ìõ SQL (ÉÏͼ)£¬¿ÉÒÔ¿´µ½ source ûÓз¢Éú±ä»¯£¬Ö®ºóµÄÈý¸ö¾ÛºÏ»¥ÏàÖ®¼äûÓйØÏµ£¬×´Ì¬¾¹È»Ò²ÊÇÎÞ·¨»Ö¸´¡£
×÷ÒµÖ®ËùÒÔÎÞ·¨»Ö¸´£¬ÊÇÒòΪ operator ID Éú³É¹æÔòµ¼Öµġ£Ä¿Ç° SQL ÖÐ operator
ID µÄÉú³ÉµÄ¹æÔòÓëÉÏÓΡ¢±¾ÉíÅäÖÃÒÔ¼°ÏÂÓοÉÒÔ chain ÔÚÒ»ÆðµÄËã×ÓµÄÊýÁ¿¶¼ÓйØÏµ¡£ ÒòΪÐÂÔöÖ¸±ê£¬»áµ¼ÖÂÐÂÔöÒ»¸ö
Calc µÄÏÂÓνڵ㣬½ø¶øµ¼Ö operator ID ·¢Éú±ä»¯¡£
ΪÁË´¦ÀíÕâÖÖÇé¿ö£¬Ö§³ÖÁËÒ»ÖÖÌØÊâµÄÅäÖÃģʽ£¬ÔÊÐíÓû§ÅäÖÃÉú³É operator ID µÄʱºò¿ÉÒÔºöÂÔÏÂÓÎ
chain ÔÚÒ»ÆðËã×ÓÊýÁ¿µÄÌõ¼þ¡£
Àý×Ó3£ºÐÂÔö¾ÛºÏÖ¸±êµ¼ÖÂÎÞ·¨»Ö¸´
Õâ¿éÊÇÓû§ËßÇó×î´óµÄ£¬Ò²ÊÇ×ÔӵIJ¿·Ö¡£Óû§ÆÚÍûÐÂÔöһЩ¾ÛºÏÖ¸±êºó£¬ÔÀ´µÄÖ¸±êÒªÄÜ´Ó checkpoint
Öлָ´¡£

¿ÉÒÔ¿´µ½Í¼ÖÐ×󲿷ÖÊÇ SQL Éú³ÉµÄËã×ÓÂß¼¡£count£¬sum£¬sum£¬count£¬distinct
»áÒÔÒ»¸ö BaseRow µÄ½á¹¹´æ´¢ÔÚ ValueState ÖС£distinct ±È½ÏÌØÊâһЩ£¬»¹»áµ¥¶À´æ´¢ÔÚÒ»¸ö
MapState ÖС£
Õâµ¼ÖÂÁËÈçÐÂÔö»òÕß¼õÉÙÖ¸±ê£¬¶¼»áʹÔÏȵÄ״̬û°ì·¨´Ó ValueState ÖÐÕý³£»Ö¸´£¬ÒòΪ VauleState
Öд洢µÄ״̬ ¡°schema¡± ºÍеģ¨ÐÞ¸ÄÖ¸±êºó£©µÄ ¡°schema¡±²»Æ¥Å䣬ÎÞ·¨Õý³£·´ÐòÁл¯¡£


ÔÚÌÖÂÛ½â¾ö·½°¸Ö®Ç°£¬ÎÒÃÇÏȻعËÒ»ÏÂÕý³£µÄ»Ö¸´Á÷¡£ÏÈ´Ó checkpoint Öлָ´³ö״̬µÄ serializer£¬ÔÙͨ¹ý
serializer °Ñ״̬»Ö¸´¡£½ÓÏÂÀ´ operator È¥×¢²áеÄ״̬¶¨Ò壬еÄ״̬¶¨Òå»áºÍÔÏȵÄ״̬¶¨Òå½øÐÐÒ»¸ö¼æÈÝÐԶԱȣ¬Èç¹ûÊǼæÈÝÔò״̬»Ö¸´³É¹¦£¬Èç¹û²»¼æÈÝÔòÅ׳öÒì³£ÈÎÎñʧ°Ü¡£
²»¼æÈݵÄÁíÒ»ÖÖ´¦ÀíÇé¿öÊÇÔÊÐí·µ»ØÒ»¸ö migration£¨ÊµÏÖÁ½¸ö²»Æ¥ÅäÀàÐ͵Ä״̬»Ö¸´£©ÄÇôҲ¿ÉÒÔ»Ö¸´³É¹¦¡£
Õë¶ÔÉÏÃæµÄÁ÷³Ì×ö³ö¶ÔÓ¦µÄÐ޸ģº
µÚÒ»²½Ê¹ÐÂ¾É serializer »¥ÏàÖªµÀ¶Ô·½µÄÐÅÏ¢£¬Ìí¼ÓÒ»¸ö½Ó¿Ú£¬ÇÒÐÞ¸ÄÁË statebackend
resolve compatibility µÄ¹ý³Ì£¬°Ñ¾ÉµÄÐÅÏ¢´«µÝ¸øÐµģ¬²¢Ê¹Æä»ñÈ¡Õû¸ö migrate
¹ý³Ì¡£
µÚ¶þ²½ÅжÏÐÂÀÏÖ®¼äÊÇ·ñ¼æÈÝ£¬Èç¹û²»¼æÈÝÊÇ·ñÐèÒª×öÒ»´Î migration¡£È»ºóÈÃ¾ÉµÄ serializer
È¥»Ö¸´Ò»±é״̬£¬²¢Ê¹ÓÃÐ嵀 serializer дÈëеÄ״̬¡£
¶Ô aggregation µÄ´úÂëÉú³É½øÐд¦Àí£¬µ±·¢ÏÖ aggregation Äõ½µÄÊÇÖ¸±êÊÇ null£¬ÄÇô½«×öһЩ³õʼ»¯µÄ¹¤×÷¡£
ͨ¹ýÒÔÉϵÄÐ޸Ļù±¾¾Í¿ÉÒÔ×öµ½Õý³£µÄ£¬ÐÂÔöµÄ¾ÛºÏÖ¸±ê´Ó²ð¿ªµÄ·½°¸»Ö¸´¡£
Èý¡¢Á÷ÅúÒ»Ìå̽Ë÷
ÒµÎñÏÖ×´
×Ö½ÚÌø¶¯ÄÚ²¿¶ÔÁ÷ÅúÒ»ÌåºÍÒµÎñÍÆ¹ã֮ǰ£¬¼¼ÊõÍŶÓÌáǰ×öÁË´óÁ¿¼¼Êõ·½ÃæµÄ̽Ë÷¡£ÕûÌåÅжÏÊÇ SQL ÕâÒ»²ãÊÇ¿ÉÒÔ×öµ½Á÷ÅúÒ»ÌåµÄÓïÒ壬µ«Êµ¼ùÖÐÈ´ÓÖ·¢ÏÖ²»ÉÙ²»Í¬¡£
±ÈÈç˵Á÷¼ÆËãµÄ session window£¬»òÊÇ»ùÓÚ´¦Àíʱ¼äµÄ window£¬ÔÚÅú¼ÆËãÖÐÎÞ·¨×öµ½¡£Í¬Ê±
SQL ÔÚÅú¼ÆËãÖÐһЩ¸´Ô over window£¬ÔÚÁ÷¼ÆËãÖÐҲûÓжÔÓ¦µÄʵÏÖ¡£
µ«ÕâÐ©ÌØ±ðµÄ³¡¾°¿ÉÄÜÖ»Õ¼ 10% ÉõÖÁ¸üÉÙ£¬ËùÒÔÓà SQL È¥ÂäʵÁ÷ÅúÒ»ÌåÊÇ¿ÉÐеġ£

Á÷ÅúÒ»Ìå
ÕâÕÅͼÊDZȽϳ£¼ûµÄºÍ´ó¶àÊý¹«Ë¾ÀïµÄ¼Ü¹¹¶¼ÀàËÆ¡£ÕâÖּܹ¹ÓÐʲôȱÏÝÄØ£¿
Êý¾Ý²»Í¬Ô´£ºÅúÈÎÎñÒ»°ã»áÓÐÒ»´ÎǰÖô¦ÀíÈÎÎñ£¬²»¹ÜÊÇÀëÏßµÄÒ²ºÃʵʱµÄÒ²ºÃ£¬Ô¤ÏȽø¹ýÒ»²ã¼Ó¹¤ºóдÈë Hive¡£¶øÊµÊ±ÈÎÎñÊÇ´Ó
kafka ¶ÁÈ¡ÔʼµÄÊý¾Ý£¬¿ÉÄÜÊÇ json ¸ñʽ£¬Ò²¿ÉÄÜÊÇ avro µÈµÈ¡£Ö±½Óµ¼ÖÂÅúÈÎÎñÖпÉÖ´ÐеÄ
SQL ÔÚÁ÷ÈÎÎñÖÐûÓнá¹ûÉú³É»òÕßÖ´Ðнá¹û²»¶Ô¡£
¼ÆË㲻ͬԴ£ºÅúÈÎÎñÒ»°ãÊÇ Hive + Spark µÄ¼Ü¹¹£¬¶øÁ÷ÈÎÎñ»ù±¾¶¼ÊÇ»ùÓÚ Flink¡£²»Í¬µÄÖ´ÐÐÒýÇæÔÚʵÏÖÉ϶¼»áÓÐһЩ²îÒ죬µ¼Ö½á¹û²»Ò»Ö¡£²»Í¬µÄÖ´ÐÐÒýÇæÓв»Í¬µÄ
API ¶¨Òå UDF£¬ËüÃÇÖ®¼äÒ²ÊÇÎÞ·¨±»¹«Óõġ£´ó²¿·ÖÇé¿ö϶¼ÊÇά»¤Á½Ì×»ùÓÚ²»Í¬ API ʵÏÖµÄÏàͬ¹¦ÄܵÄ
UDF¡£
¼øÓÚÉÏÃæµÄÎÊÌ⣬Ìá³öÁË»ùÓÚ Flink µÄÁ÷ÅúÒ»Ìå¼Ü¹¹À´½â¾ö¡£
Êý¾Ý²»Í¬Ô´£ºÁ÷ʽ´¦ÀíÏÈͨ¹ý Flink ´¦ÀíÖ®ºóдÈë MQ ¹©ÏÂÓÎÁ÷ʽ Flink job È¥Ïû·Ñ£¬¶ÔÓÚÅúʽ´¦ÀíÓÉ
Flink ´¦ÀíºóÁ÷ʽдÈëµ½ Hive£¬ÔÙÓÉÅúʽµÄ Flink job È¥´¦Àí¡£
ÒýÇæ²»Í¬Ô´£º¼ÈÈ»¶¼ÊÇ»ùÓÚ Flink ¿ª·¢µÄÁ÷ʽ£¬Åúʽ job£¬×ÔȻûÓмÆË㲻ͬԴÎÊÌ⣬ͬʱҲ±ÜÃâÁËά»¤¶àÌ×Ïàͬ¹¦ÄܵÄ
UDF¡£
»ùÓÚ Flink ʵÏÖµÄÁ÷ÅúÒ»Ìå¼Ü¹¹£º

ÒµÎñÊÕÒæ
ͳһµÄ SQL£ºÍ¨¹ýÒ»Ì× SQL À´±í´ïÁ÷ºÍÅú¼ÆËãÁ½ÖÖ³¡¾°£¬¼õÉÙ¿ª·¢Î¬»¤¹¤×÷¡£
¸´Óà UDF£ºÁ÷ʽºÍÅúʽ¼ÆËã¿ÉÒÔ¹²ÓÃÒ»Ì× UDF¡£Õâ¶ÔÒµÎñÀ´ËµÊÇÓлý¼«ÒâÒåµÄ¡£
ÒýÇæÍ³Ò»£º¶ÔÓÚÒµÎñµÄѧϰ³É±¾ºÍ¼Ü¹¹µÄά»¤³É±¾¶¼»á½µµÍºÜ¶à¡£
ÓÅ»¯Í³Ò»£º´ó²¿·ÖµÄÓÅ»¯¶¼ÊÇ¿ÉÒÔͬʱ×÷ÓÃÔÚÁ÷ʽºÍÅúʽ¼ÆËãÉÏ£¬±ÈÈç¶Ô planner¡¢operator
µÄÓÅ»¯Á÷ºÍÅú¿ÉÒÔ¹²Ïí¡£
ËÄ¡¢Î´À´¹¤×÷ºÍ¹æ»®
ÓÅ»¯ retract ·Å´óÎÊÌâ

ʲôÊÇ retract ·Å´ó£¿
ÉÏͼÓÐ 4 ÕÅ±í£¬µÚÒ»ÕÅ±í½øÐÐÈ¥ÖØ²Ù×÷ (Dedup)£¬Ö®ºó·Ö±ðºÍÁíÍâÈýÕűí×ö Join¡£Âß¼±È½Ï¼òµ¥£¬±í
A ÊäÈë(A1)£¬×îºó²ú³ö (A1,B1,C1,D1) µÄ½á¹û¡£
µ±±í A ÊäÈëÒ»¸ö A2£¬ÒòΪ Dedup Ëã×Ó£¬µ¼ÖÂÊý¾ÝÐèÒªÈ¥ÖØ£¬ÔòÏòÏÂÓη¢ËÍÒ»¸ö³·»Ø A1 µÄ²Ù×÷
-(A1) ºÍÒ»¸öÐÂÔö A2 µÄ²Ù×÷ +(A2)¡£µÚÒ»¸ö Join Ëã×ÓÊÕµ½ -(A1) ºó»á½« -(A1)
±ä³É -(A1,B1) ºÍ +(null,B1)(ΪÁ˱£³ÖËüÈÏΪµÄÕýÈ·ÓïÒå) ·¢Ë͵½ÏÂÓΡ£Ö®ºóÓÖÊÕµ½ÁË
+(A2) £¬ÔòÓÖÏòÏÂÓη¢ËÍ -(null,B1) ºÍ +(A2,B1) ÕâÑù²Ù×÷¾Í·Å´óÁËÁ½±¶¡£ÔÙ¾ÓÉÏÂÓεÄËã×Ó²Ù×÷»áÒ»Ö±±»·Å´ó£¬µ½×îÖÕµÄ
sink Êä³ö¿ÉÄܻᱻ·Å´ó 1000 ±¶Ö®¶à¡£

ÈçºÎ½â¾ö£¿
½«ÔÏÈ retract µÄÁ½ÌõÊý¾Ý±ä³ÉÒ»Ìõ changelog µÄ¸ñʽÊý¾Ý£¬ÔÚËã×ÓÖ®¼ä´«µÝ¡£Ëã×Ó½ÓÊÕµ½
changelog ºó´¦Àí±ä¸ü£¬È»ºó½ö½öÏòÏÂÓη¢ËÍÒ»¸ö±ä¸ü changelog ¼´¿É¡£
δÀ´¹æ»®

1.¹¦ÄÜÓÅ»¯
Ö§³ÖËùÓÐÀàÐ;ۺÏÖ¸±ê±ä¸üµÄ checkpoint »Ö¸´ÄÜÁ¦
window local-global
ʼþʱ¼äµÄ Fast Emit
¹ã²¥Î¬±í
¸ü¶àËã× Mini-Batch Ö§³Ö:ά±í£¬TopN£¬Join µÈ
È«Ãæ¼æÈÝ Hive SQL Óï·¨
2.ÒµÎñÀ©Õ¹
½øÒ»²½Íƶ¯Á÷ʽ SQL ´ïµ½ 80%
̽Ë÷Â䵨Á÷ÅúÒ»Ìå²úÆ·ÐÎ̬
ÍÆ¶¯ÊµÊ±Êý²Ö±ê×¼»¯
|