Äú¿ÉÒÔ¾èÖú£¬Ö§³ÖÎÒÃǵĹ«ÒæÊÂÒµ¡£

1Ôª 10Ôª 50Ôª





ÈÏÖ¤Â룺  ÑéÖ¤Âë,¿´²»Çå³þ?Çëµã»÷Ë¢ÐÂÑéÖ¤Âë ±ØÌî



  ÇóÖª ÎÄÕ ÎÄ¿â Lib ÊÓÆµ iPerson ¿Î³Ì ÈÏÖ¤ ×Éѯ ¹¤¾ß ½²×ù Model Center   Code  
»áÔ±   
   
 
     
   
 ¶©ÔÄ
  ¾èÖú
Flink SQL 1.11 й¦ÄÜÓë×î¼Ñʵ¼ù
 
×÷ÕߣºÎéÁˆ£¨ÔÆÐ°£©
  2469  次浏览      27
 2021-3-2
 
±à¼­ÍƼö:
±¾ÎÄÖ÷Òª½éÉÜÁË Flink 1.11 °æ±¾ÔÚ connectivity ºÍ simplicity ·½Ãæ¶¼´øÀ´µÄй¦ÄÜ¡£
±¾ÎÄÀ´×ÔÓÚFlinkÖÐÎÄÉçÇø £¬ÓÉAlice±à¼­¡¢ÍƼö¡£

±¾ÎÄÕûÀí×Ô Flink PMC member ÔÆÐ°ÔÚ Apache Flink Meetup 2020 ¡¤ ÉϺ£Õ¾µÄ talk£¬Ö¼ÔÚ°ïÖúÓû§¿ìËÙÁ˽âа汾 Table & SQL ÔÚ Connectivity ºÍ Simplicity µÈ·½ÃæµÄÓÅ»¯¼°Êµ¼Ê¿ª·¢Ê¹ÓõÄ×î¼Ñʵ¼ù£¬Ö÷Òª·ÖΪÒÔÏÂËĸö²¿·Ö£º

1.¼òÒª»Ø¹Ë Flink 1.8 ~ Flink 1.11 °æ±¾ÔÚ Apache ÉçÇøµÄ·¢Õ¹Ç÷ÊÆ£¬ÆäÖйúÄÚ¿ª·¢ÕߵĻý¼«²ÎÓëºÍÖÐÎÄÉçÇøµÄÅ·¢Õ¹¶Ô Flink ÔÚÉçÇøºÍ GitHub µÄ»îÔ¾¶È×ö³öÁËÖØÒª¹±Ïס£

2.Ïêϸ½â¶Á Flink SQL 1.11 й¦ÄÜ£¬E.g. connectors ²ÎÊý¼ò»¯ + ¶¯Ì¬ Table ²ÎÊý¼õÉÙ´úÂëÈßÓ࣬ÄÚÖà connectors + LIKE Óï·¨°ïÖú¿ìËÙ²âÊÔ£¬Öع¹µÄ TableEnvironment ¡¢TableSource / TableSink ½Ó¿ÚÌáÉýÒ×ÓÃÐÔ£¬Hive Dialect + CDC ½øÒ»²½Ö§³ÖÁ÷ÅúÒ»Ìå¡£

3.ÖØµãչʾа汾¶Ô Hive Êý²Öʵʱ»¯µÄÖ§³ÖºÍ Flink SQL ÒýÈë CDC µÄÊý¾Ýͬ²½×î¼Ñʵ¼ù¡£

4.¼òÒª½â¶Á Flink SQL 1.12 δÀ´¹æ»®¡£

1 Flink 1.8 ~ 1.11 ÉçÇø·¢Õ¹Ç÷ÊÆ»Ø¹Ë

×Ô 2019 Äê³õ°¢Àï°Í°ÍÐû²¼Ïò Flink ÉçÇø¹±Ï× Blink Ô´Âë²¢ÔÚͬÄê 4 Ô·¢²¼ Flink 1.8 °æ±¾ºó£¬Flink ÔÚÉçÇøµÄ»îÔ¾³Ì¶ÈÓÌÈç×øÉÏС»ð¼ý°ãÉÏÉý£¬Ã¿¸ö°æ±¾°üº¬µÄ git commits ÊýÁ¿ÒÔ 50% µÄÔöËÙ³ÖÐøÉÏÕÇ£¬ ÎüÒýÁËÒ»´óÅú¹úÄÚ¿ª·¢ÕߺÍÓû§²ÎÓëµ½ÉçÇøµÄÉú̬·¢Õ¹ÖÐÀ´£¬ÖÐÎÄÓû§ÓʼþÁÐ±í£¨user-zh@£©¸üÊÇÔÚ½ñÄê 6 ÔÂÊ״㬳öÓ¢ÎÄÓû§ÓʼþÁÐ±í£¨user@£©£¬ÔÚ 7 Ô³¬³ö±ÈÀý´ïµ½ÁË 50%¡£¶Ô±ÈÆäËü Apache ¿ªÔ´ÉçÇøÈç Spark£¬Kafka µÄÓû§ÓʼþÁбíÊý£¨Ã¿ÔÂÔ¼ 200 ·â×óÓÒ£©¿ÉÒÔ¿´³ö£¬Õû¸ö Flink ÉçÇøµÄ·¢Õ¹ÒÀÈ»·Ç³£½¡¿µºÍ»îÔ¾¡£

2 Flink SQL й¦Äܽâ¶Á

ÔÚÁ˽â Flink ÕûÌå·¢Õ¹Ç÷ÊÆºó£¬ÎÒÃÇÀ´¿´ÏÂ×î½ü·¢²¼µÄ Flink 1.11 °æ±¾ÔÚ connectivity ºÍ simplicity ·½Ãæ¶¼´øÀ´ÁËÄÄЩÁîÈ˶úĿһÐµĹ¦ÄÜ¡£

FLIP-122£º¼ò»¯ connector ²ÎÊý

Õû¸ö Flink SQL 1.11 ÔÚÎ§ÈÆÒ×ÓÃÐÔ·½Ãæ×öÁ˺ܶàÓÅ»¯£¬±ÈÈç FLIP-122£¬ÓÅ»¯ÁË connector µÄ property ²ÎÊýÃû³ÆÈß³¤µÄÎÊÌâ¡£ÒÔ Kafka ΪÀý£¬ÔÚ 1.11 °æ±¾Ö®Ç°Óû§µÄ DDL ÐèÒªÉùÃ÷³ÉÈçÏ·½Ê½

CREATE TABLE user_behavior (
...
) WITH (
'connector.type'='kafka',
'connector.version'='universal',
'connector.topic'='user_behavior',
'connector.startup-mode'='earliest-offset',
'connector.properties.zookeeper.connect'='localhost:2181',
'connector.properties.bootstrap.servers'='localhost:9092',
'format.type'='json'
);

¶øÔÚ Flink SQL 1.11 ÖÐÔò¼ò»¯Îª

CREATE TABLE user_behavior (
...
) WITH (
'connector'='kafka',
'topic'='user_behavior',
'scan.startup.mode'='earliest-offset',
'properties.zookeeper.connect'='localhost:2181',
'properties.bootstrap.servers'='localhost:9092',
'format'='json'
);

DDL ±í´ïµÄÐÅÏ¢Á¿Ë¿ºÁδÉÙ£¬µ«ÊÇ¿´ÆðÀ´ÇåˬÐí¶à :) Flink µÄ¿ª·¢ÕßÃÇΪÕâ¸öÓÅ»¯×öÁ˺ܶàÌÖÂÛ£¬ÓÐÐËȤ¿ÉÒÔΧ¹Û FLIP-122 Discussion Thread¡£

FLINK-16743£ºÄÚÖà connectors

Flink SQL 1.11 мÓÈëÁËÈýÖÖÄÚÖÃµÄ connectors£¬ÈçϱíËùʾ

ÔÚÍⲿ connector »·¾³»¹Ã»ÓÐ ready ʱ£¬Óû§¿ÉÒÔÑ¡Ôñ datagen source ºÍ print sink ¿ìËÙ¹¹½¨ pipeline ÊìϤ Flink SQL£»¶ÔÓÚÏëÒª²âÊÔ Flink SQL ÐÔÄܵÄÓû§£¬¿ÉÒÔʹÓà blackhole ×÷Ϊ sink£»¶ÔÓÚµ÷ÊÔÅÅ´í³¡¾°£¬print sink »á½«¼ÆËã½á¹û´òµ½±ê×¼Êä³ö£¨±ÈÈ缯Ⱥ»·¾³Ï¾ͻá´òµ½ taskmanager.out Îļþ£©£¬Ê¹µÃ¶¨Î»ÎÊÌâµÄ³É±¾´ó´ó½µµÍ¡£

FLIP-110£ºLIKE Óï·¨

Flink SQL 1.11 Ö§³ÖÓû§´ÓÒѶ¨ÒåºÃµÄ table DDL ÖпìËÙ ¡°fork¡± ×Ô¼ºµÄ°æ±¾²¢½øÒ»²½ÐÞ¸Ä watermark »òÕß connector µÈÊôÐÔ¡£±ÈÈçÏÂÃæÕâÕÅ base_table ÉÏÏë¼ÓÒ»¸ö watermark£¬ÔÚ Flink 1.11 °æ±¾Ö®Ç°£¬Óû§Ö»ÄÜÖØÐ½«±íÉùÃ÷Ò»±é£¬²¢¼ÓÈë×Ô¼ºµÄÐ޸ģ¬¿Éν ¡°Ç£Ò»·¢¶ø¶¯È«Éí¡±¡£

-- before Flink SQL 1.11
CREATE TABLE base_table (
id BIGINT,
name STRING,
ts TIMESTAMP
) WITH (
'connector.type'='kafka',
...
);

CREATE TABLE derived_table (
id BIGINT,
name STRING,
ts TIMESTAMP,
WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
) WITH (
'connector.type'='kafka',
...
);

´Ó Flink 1.11 ¿ªÊ¼£¬Óû§Ö»ÐèҪʹÓà CREATE TABLE LIKE Óï·¨¾Í¿ÉÒÔÍê³É֮ǰµÄ²Ù×÷

-- Flink SQL 1.11
CREATE TABLE base_table (
id BIGINT,
name STRING,
ts TIMESTAMP
) WITH (
'connector'='kafka',
...
);

CREATE TABLE derived_table (
WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
) LIKE base_table;

¶øÄÚÖà connector Óë CREATE TABLE LIKE Óï·¨´îÅäʹÓÃÔò»áÈçÏÂͼһ°ã²úÉú¡°ÌìÀ×¹´µØ»ð¡±µÄЧ¹û£¬¼«´óÌáÉý¿ª·¢Ð§ÂÊ¡£

FLIP-113£º¶¯Ì¬ Table ²ÎÊý

¶ÔÓÚÏñ Kafka ÕâÖÖÏûÏ¢¶ÓÁУ¬ÔÚÉùÃ÷ DDL ʱͨ³£»áÓÐÒ»¸öÆô¶¯µãλȥָ¶¨¿ªÊ¼Ïû·ÑÊý¾ÝµÄʱ¼ä£¬Èç¹ûÐèÒª¸ü¸ÄÆô¶¯µã룬ÔÚÀϰ汾ÉϾÍÐèÒªÖØÐÂÉùÃ÷Ò»±éеãλµÄ DDL£¬·Ç³£²»·½±ã¡£

CREATE TABLE user_behavior (
user_id BIGINT,
behavior STRING,
ts TIMESTAMP(3)
) WITH (
'connector'='kafka',
'topic'='user_behavior',
'scan.startup.mode'='timestamp',
'scan.startup.timestamp-millis'='123456',
'properties.bootstrap.servers'='localhost:9092',
'format'='json'
);

´Ó Flink 1.11 ¿ªÊ¼£¬Óû§¿ÉÒÔÔÚ SQL client Öа´ÈçÏ·½Ê½ÉèÖÿªÆô SQL ¶¯Ì¬²ÎÊý£¨Ä¬ÈÏÊǹرյģ©£¬Èç´Ë¼´¿ÉÔÚ DML ÀïÖ¸¶¨¾ßÌåµÄÆô¶¯µãλ¡£

SET 'table.dynamic-table-options.enabled' = 'true';
SELECT user_id, COUNT(DISTINCT behaviro)
FROM user_behavior /*+ OPTIONS('scan.startup.timestamp-millis'='1596282223') */
GROUP BY user_id;

³ýÆô¶¯µãλÍ⣬¶¯Ì¬²ÎÊý»¹Ö§³ÖÏñ sink.partition ¡¢ scan.startup.mode µÈ¸ü¶àÔËÐÐʱ²ÎÊý£¬¸ÐÐËȤ¿ÉÒÆ²½ FLIP-113 »ñµÃ¸ü¶àÐÅÏ¢¡£

FLIP-84£ºÖع¹ÓÅ»¯ TableEnvironment ½Ó¿Ú

Flink SQL 1.11 ÒÔǰµÄ TableEnvironment ½Ó¿Ú¶¨ÒåºÍÐÐΪÓÐһЩ²»¹»ÇåÎú£¬±ÈÈç

TableEnvironment#sqlUpdate() ·½·¨¶ÔÓÚ DDL »áÁ¢¼´Ö´ÐУ¬µ«¶ÔÓÚ INSERT INTO DML Óï¾äÈ´ÊÇ buffer סµÄ£¬Ö±µ½µ÷Óà TableEnvironment#execute() ²Å»á±»Ö´ÐУ¬ËùÒÔÔÚÓû§¿´ÆðÀ´Ë³ÐòÖ´ÐеÄÓï¾ä£¬Êµ¼Ê²úÉúµÄЧ¹û¿ÉÄܻ᲻һÑù¡£

´¥·¢×÷ÒµÌá½»ÓÐÁ½¸öÈë¿Ú£¬Ò»¸öÊÇ TableEnvironment#execute(), ÁíÒ»¸öÊÇ StreamExecutionEnvironment#execute()£¬ÓÚÓû§¶øÑÔºÜÄÑÀí½âÓ¦¸ÃʹÓÃÄĸö·½·¨´¥·¢×÷ÒµÌá½»¡£

µ¥´ÎÖ´Ðв»½ÓÊܶà¸ö INSERT INTO Óï¾ä¡£

Õë¶ÔÕâЩÎÊÌ⣬Flink SQL 1.11 ÌṩÁËРAPI£¬¼´ TableEnvironment#executeSql()£¬ËüͳһÁËÖ´ÐÐ sql µÄÐÐΪ£¬ ÎÞÂÛ½ÓÊÕ DDL¡¢²éѯ query »¹ÊÇ INSERT INTO ¶¼»áÁ¢¼´Ö´ÐС£Õë¶Ô¶à sink ³¡¾°ÌṩÁË StatementSet ºÍ TableEnvironment#createStatementSet() ·½·¨£¬ÔÊÐíÓû§Ìí¼Ó¶àÌõ INSERT Óï¾äÒ»ÆðÖ´ÐС£

³ý´ËÖ®Í⣬Ð嵀 execute ·½·¨¶¼Óзµ»ØÖµ£¬Óû§¿ÉÒÔÔÚ·µ»ØÖµÉÏÖ´ÐÐ print, collect µÈ·½·¨¡£

ÐÂ¾É API ¶Ô±ÈÈçϱíËùʾ

¶ÔÓÚÔÚ Flink 1.11 ÉÏʹÓÃнӿÚÓöµ½µÄһЩ³£¼ûÎÊÌâ£¬ÔÆÐ°×öÁËͳһ½â´ð£¬¿ÉÔÚ Appendix ²¿·Ö²é¿´¡£

FLIP-95£ºTableSource & TableSink ÖØ¹¹

¿ª·¢ÕßÃÇÔÚ Flink SQL 1.11 °æ±¾»¨ÁË´óÁ¿¾­Àú¶Ô TableSource ºÍ TableSink API ½øÐÐÁËÖØ¹¹£¬ºËÐÄÓÅ»¯µãÈçÏÂ

ÒÆ³ýÀàÐÍÏà¹Ø½Ó¿Ú£¬¼ò»¯¿ª·¢£¬½â¾öÃÔ»óµÄÀàÐÍÎÊÌ⣬֧³ÖÈ«ÀàÐÍ

ѰÕÒ Factory ʱ£¬¸üÇåÎúµÄ±¨´íÐÅÏ¢

½â¾öÕÒ²»µ½ primary key µÄÎÊÌâ

ͳһÁËÁ÷Åú source£¬Í³Ò»ÁËÁ÷Åú sink

Ö§³Ö¶ÁÈ¡ CDC ºÍÊä³ö CDC

Ö±½Ó¸ßЧµØÉú³É Flink SQL ÄÚ²¿Êý¾Ý½á¹¹ RowData

ÀÏ TableSink API ÈçÏÂËùʾ£¬ÆäÖÐÓÐ 6 ¸ö·½·¨ÊÇÀàÐÍÏà¹Ø²¢ÇÒ»¹³ä³â×Å deprecated ·½·¨£¬µ¼Ö connector ¾­³£³ö bug¡£Ð DynamicTableSink API È¥µôÁËËùÓÐÀàÐÍÏà¹Ø½Ó¿Ú£¬ÒòΪËùÓеÄÀàÐͶ¼ÊÇ´Ó DDL À´µÄ£¬²»ÐèÒª TableSink ¸æËß¿ò¼ÜÊÇʲôÀàÐÍ¡£¶ø¶ÔÓÚÓû§À´Ëµ£¬×îÖ±¹ÛµÄÌåÑé¾ÍÊÇÔÚÀϰ汾ÉÏÓöµ½¸÷ÖÖÆæÆæ¹Ö¹Ö±¨´íµÄ¸ÅÂʽµµÍÁ˺ܶ࣬±ÈÈç²»Ö§³ÖµÄ¾«¶ÈÀàÐͺÍÕÒ²»µ½ primary key / table factory µÄ¹îÒ챨´íÔÚа汾É϶¼²»¸´´æÔÚÁË¡£¹ØÓÚ Flink 1.11 ÊÇÈçºÎ½â¾öÕâЩÎÊÌâµÄÏêϸ¿ÉÒÔÔÚ Appendix ²¿·ÖÔĶÁ¡£

FLIP-123£ºHive Dialect

Flink 1.10 °æ±¾¶Ô Hive connector µÄÖ§³Ö´ïµ½ÁËÉú²ú¿ÉÓ㬵«ÊÇÀϰ汾µÄ Flink SQL ²»Ö§³Ö Hive DDL ¼°Ê¹Óà Hive syntax£¬ÕâÎÞÒÉÏÞÖÆÁË Flink connectivity¡£ÔÚа汾ÖУ¬¿ª·¢ÕßÃÇΪ֧³Ö HiveQL ÒýÈëÁËРparser£¬Óû§¿ÉÒÔÔÚ SQL client µÄ yaml ÎļþÖÐÖ¸¶¨ÊÇ·ñʹÓà Hive Óï·¨£¬Ò²¿ÉÒÔÔÚ SQL client ÖÐͨ¹ý set table.sql-dialect=hive/default ¶¯Ì¬Çл»¡£¸ü¶àÐÅÏ¢¿ÉÒԲο¼ FLIP-123¡£

ÒÔÉϼòÒª½éÉÜÁË Flink 1.11 ÔÚ ¼õÉÙÓû§²»±ØÒªµÄÊäÈëºÍ²Ù×÷·½Ãæ¶Ô connectivity ºÍ simplicity ·½Ãæ×ö³öµÄÓÅ»¯¡£ÏÂÃæ»áÖØµã½éÉÜÔÚÍⲿϵͳºÍÊý¾ÝÉú̬·½Ãæ¶Ô connectivity ºÍ simplicity µÄÁ½¸öºËÐÄÓÅ»¯£¬²¢¸½ÉÏ×î¼Ñʵ¼ù½éÉÜ¡£

3 Hive Êý²Öʵʱ»¯ & Flink SQL + CDC ×î¼Ñʵ¼ù

FLINK-17433£ºHive Êý²Öʵʱ»¯

ÏÂͼÊÇÒ»Õŷdz£¾­µäµÄ Lambda Êý²Ö¼Ü¹¹£¬ÔÚÕû¸ö´óÊý¾ÝÐÐÒµ´ÓÅú´¦ÀíÖð²½Óµ±§Á÷¼ÆËãµÄÐí¶àÄêÀï´ú±í¡°×îÏȽøµÄÉú²úÁ¦¡±¡£È»¶øËæ×ÅÒµÎñ·¢Õ¹ºÍ¹æÄ£À©´ó£¬Á½Ì×µ¥¶ÀµÄ¼Ü¹¹Ëù´øÀ´µÄ¿ª·¢¡¢ÔËά¡¢¼ÆËã³É±¾ÎÊÌâÒѾ­ÈÕÒæÍ¹ÏÔ¡£

¶ø Flink ×÷Ϊһ¸öÁ÷ÅúÒ»ÌåµÄ¼ÆËãÒýÇæ£¬ÔÚ×î³õµÄÉè¼ÆÉϾÍÈÏΪ¡°ÍòÎï±¾ÖʽÔÊÇÁ÷¡±£¬Åú´¦ÀíÊÇÁ÷¼ÆËãµÄÌØÀý£¬Èç¹ûÄܹ»ÔÚ×ÔÉíÌṩ¸ßЧÅú´¦ÀíÄÜÁ¦µÄͬʱÓëÏÖÓеĴóÊý¾ÝÉú̬½áºÏ£¬ÔòÄÜÒÔ×îСÇÖÈëµÄ·½Ê½¸ÄÔìÏÖÓеÄÊý²Ö¼Ü¹¹Ê¹ÆäÖ§³ÖÁ÷ÅúÒ»Ìå¡£ÔÚа汾ÖУ¬Flink SQL ÌṩÁË¿ªÏä¼´ÓÃµÄ ¡°Hive Êý²Öͬ²½¡±¹¦ÄÜ£¬¼´ËùÓеÄÊý¾Ý¼Ó¹¤Âß¼­ÓÉ Flink SQL ÒÔÁ÷¼ÆËãģʽִÐУ¬ÔÚÊý¾ÝдÈë¶Ë£¬×Ô¶¯½« ODS£¬DWD ºÍ DWS ²ãµÄÒѾ­¼Ó¹¤ºÃµÄÊý¾Ýʵʱ»ØÁ÷µ½ Hive table¡£One size (sql) fits for all suites (tables) µÄÉè¼Æ£¬Ê¹µÃÔÚ batch ²ã²»ÔÙÐèҪά»¤ÈκμÆËã pipeline¡£

¶Ô±È´«Í³¼Ü¹¹£¬Ëü´øÀ´µÄºÃ´¦ºÍ½â¾öµÄÎÊÌâÓÐÄÄÐ©ÄØ£¿

¼ÆËã¿Ú¾¶Óë´¦ÀíÂß¼­Í³Ò»£¬½µµÍ¿ª·¢ºÍÔËά³É±¾

´«Í³¼Ü¹¹Î¬»¤Á½Ì×Êý¾Ý pipeline ×î´óµÄÎÊÌâÔÚÓÚÐèÒª±£³ÖËüÃÇ´¦ÀíÂß¼­µÄµÈ¼ÛÐÔ£¬µ«ÓÉÓÚʹÓÃÁ˲»Í¬µÄ¼ÆËãÒýÇæ£¨±ÈÈçÀëÏßʹÓà Hive£¬ÊµÊ±Ê¹Óà Flink »ò Spark Streaming£©£¬SQL ÍùÍù²»ÄÜÖ±½ÓÌ×Ó㬴æÔÚ´úÂëÉϵIJîÒìÐÔ£¬¾­ÄêÀÛÔÂÏÂÀ´£¬ÀëÏߺÍʵʱ´¦ÀíÂß¼­ºÜ¿ÉÄÜ»áÍêÈ« diverge£¬ÓÐЩ´óµÄ¹«Ë¾ÉõÖÁ»á´æÔÚÁ½¸öÍŶӷֱðȥά»¤ÊµÊ±ºÍÀëÏßÊý²Ö£¬ÈËÁ¦ÎïÁ¦³É±¾·Ç³£¸ß¡£Flink Ö§³Ö Hive Streaming Sink ºó£¬ÊµÊ±´¦Àí½á¹û¿ÉÒÔʵʱ»ØÁ÷µ½ Hive ±í£¬ÀëÏߵļÆËã²ã¿ÉÒÔÍêȫȥµô£¬´¦ÀíÂß¼­ÓÉ Flink SQL ͳһά»¤£¬ÀëÏß²ãÖ»ÐèҪʹÓûØÁ÷ºÃµÄ ODS¡¢DWD¡¢DWS ±í×ö½øÒ»²½ ad-hoc ²éѯ¼´¿É¡£

ÀëÏß¶ÔÓÚ¡°Êý¾ÝÆ¯ÒÆ¡±µÄ´¦Àí¸ü×ÔÈ»£¬ÀëÏßÊý²Ö¡°ÊµÊ±»¯¡±

ÀëÏßÊý²Ö pipeline ·Ç data-driven µÄµ÷¶ÈÖ´Ðз½Ê½£¬ÔÚ¿ç·ÖÇøµÄÊý¾Ý±ß½ç´¦ÀíÉÏÍùÍùÐèÒªºÜ¶à trick À´±£Ö¤·ÖÇøÊý¾ÝµÄÍêÕûÐÔ£¬¶øÔÚÁ½Ì×Êý²Ö¼Ü¹¹²¢ÐеÄÇé¿öÏ£¬ÓÐʱ»á´æÔÚ¶Ô late event ´¦Àí²îÒìµ¼ÖÂÊý¾Ý¶Ô±È²»Ò»ÖµÄÎÊÌâ¡£¶øÊµÊ± data-driven µÄ´¦Àí·½Ê½ºÍ Flink ¶ÔÓÚ event time µÄÓѺÃÖ§³Ö±¾Éí¾ÍÒâζ×ÅÒÔÒµÎñʱ¼äΪ·ÖÇø£¨window£©£¬Í¨¹ý event time + watermark ¿ÉÒÔͳһ¶¨ÒåʵʱºÍÀëÏßÊý¾ÝµÄÍêÕûÐÔºÍʱЧÐÔ£¬Hive Streaming Sink ¸üÊǽâ¾öÁËÀëÏßÊý²Öͬ²½µÄ¡°×îºóÒ»¹«ÀïÎÊÌ⡱¡£

FLIP-105£ºÖ§³Ö Change Data Capture (CDC)

³ýÁË¶Ô Hive Streaming Sink µÄÖ§³Ö£¬Flink SQL 1.11 µÄÁíÒ»´óÁÁµã¾ÍÊÇÒýÈëÁË CDC »úÖÆ¡£CDC µÄÈ«³ÆÊÇ Change Data Capture£¬ÓÃÓÚ tracking Êý¾Ý¿â±íµÄÔöɾ¸Ä²é²Ù×÷£¬ÊÇĿǰ·Ç³£³ÉÊìµÄͬ²½Êý¾Ý¿â±ä¸üµÄÒ»ÖÖ·½°¸¡£ÔÚ¹úÄÚ³£¼ûµÄ CDC ¹¤¾ß¾ÍÊǰ¢À↑ԴµÄ Canal£¬ÔÚ¹úÍâ±È½ÏÁ÷ÐеÄÓÐ Debezium¡£Flink SQL ÔÚÉè¼ÆÖ®³õ¾ÍÌá³öÁË Dynamic Table ºÍ¡°Á÷±í¶þÏóÐÔ¡±µÄ¸ÅÄ²¢ÇÒÔÚ Flink SQL ÄÚ²¿ÍêÕûÖ§³ÖÁË Changelog ¹¦ÄÜ£¬Ïà¶ÔÓÚÆäËû¿ªÔ´Á÷¼ÆËãϵͳÊÇÒ»¸öÖØÒªÓÅÊÆ¡£±¾ÖÊÉÏ Changelog ¾ÍµÈ¼ÛÓÚÒ»ÕÅÒ»Ö±Ôڱ仯µÄÊý¾Ý¿âµÄ±í¡£Dynamic Table Õâ¸ö¸ÅÄîÊÇ Flink SQL µÄ»ùʯ£¬ Flink SQL µÄ¸÷¸öËã×ÓÖ®¼ä´«µÝµÄ¾ÍÊÇ Changelog£¬ÍêÕûµØÖ§³ÖÁË Insert¡¢Delete¡¢Update Õ⼸ÖÖÏûÏ¢ÀàÐÍ¡£

µÃÒæÓÚ Flink SQL ÔËÐÐʱµÄÇ¿´ó£¬Flink Óë CDC ¶Ô½ÓÖ»ÐèÒª½«ÍⲿµÄÊý¾ÝÁ÷תΪ Flink ϵͳÄÚ²¿µÄ Insert¡¢Delete¡¢Update ÏûÏ¢¼´¿É¡£½øÈëµ½ Flink ÄÚ²¿ºó£¬¾Í¿ÉÒÔÁé»îµØÓ¦Óà Flink ¸÷ÖÖ query Óï·¨ÁË¡£

ÔÚʵ¼ÊÓ¦ÓÃÖУ¬°Ñ Debezium Kafka Connect Service ×¢²áµ½ Kafka ¼¯Èº²¢´øÉÏÏëͬ²½µÄÊý¾Ý¿â±íÐÅÏ¢£¬Kafka Ôò»á×Ô¶¯´´½¨ topic ²¢¼àÌý Binlog£¬°Ñ±ä¸üͬ²½µ½ topic ÖС£ÔÚ Flink ¶ËÏëÒªÏû·Ñ´ø CDC µÄÊý¾ÝÒ²ºÜ¼òµ¥£¬Ö»ÐèÒªÔÚ DDL ÖÐÉùÃ÷ format = debezium-json ¼´¿É¡£

ÔÚ Flink 1.11 ÉÏ¿ª·¢ÕßÃÇ»¹×öÁËһЩÓÐȤµÄ̽Ë÷£¬¼ÈÈ» Flink SQL ÔËÐÐʱÄܹ»ÍêÕûÖ§³Ö Changelog£¬ÄÇÊÇ·ñÓпÉÄܲ»ÐèÒª Debezium »òÕß Canal µÄ·þÎñ£¬Ö±½Óͨ¹ý Flink »ñÈ¡ MySQL µÄ±ä¸üÄØ£¿´ð°¸µ±È»ÊÇ¿ÉÒÔ£¬Debezium Àà¿âµÄÁ¼ºÃÉè¼ÆÊ¹µÃËüµÄ API ¿ÉÒÔ±»·âװΪ Flink µÄ Source Function£¬²»ÐèÒªÔÙÆð¶îÍâµÄ Service£¬Ä¿Ç°Õâ¸öÏîÄ¿ÒѾ­¿ªÔ´£¬Ö§³ÖÁË MySQL ºÍ Postgres µÄ CDC ¶ÁÈ¡£¬ºóÐøÒ²»áÖ§³Ö¸ü¶àÀàÐ͵ÄÊý¾Ý¿â£¬¿ÉÒÆ²½½âËø¸ü¶àʹÓÃ×ËÊÆ¡£

ÏÂÃæµÄ Demo »á½éÉÜÈçºÎʹÓà flink-cdc-connectors ²¶»ñ mysql ºÍ postgres µÄÊý¾Ý±ä¸ü£¬²¢ÀûÓà Flink SQL ×ö¶àÁ÷ join ºóʵʱͬ²½µ½ elasticsearch ÖС£

¼ÙÉèÄãÔÚÒ»¸öµçÉ̹«Ë¾£¬¶©µ¥ºÍÎïÁ÷ÊÇÄã×îºËÐĵÄÊý¾Ý£¬ÄãÏëҪʵʱ·ÖÎö¶©µ¥µÄ·¢»õÇé¿ö¡£ÒòΪ¹«Ë¾ÒѾ­ºÜ´óÁË£¬ËùÒÔÉÌÆ·µÄÐÅÏ¢¡¢¶©µ¥µÄÐÅÏ¢¡¢ÎïÁ÷µÄÐÅÏ¢£¬¶¼·ÖÉ¢ÔÚ²»Í¬µÄÊý¾Ý¿âºÍ±íÖС£ÎÒÃÇÐèÒª´´½¨Ò»¸öÁ÷ʽ ETL£¬È¥ÊµÊ±Ïû·ÑËùÓÐÊý¾Ý¿âÈ«Á¿ºÍÔöÁ¿µÄÊý¾Ý£¬²¢½«ËûÃǹØÁªÔÚÒ»Æð£¬´ò³ÉÒ»¸ö´ó¿í±í¡£´Ó¶ø·½±ãÊý¾Ý·ÖÎöʦºóÐøµÄ·ÖÎö¡£

4 Flink SQL 1.12 δÀ´¹æ»®

ÒÔÉϽéÉÜÁË Flink SQL 1.11 µÄºËÐŦÄÜÓë×î¼Ñʵ¼ù£¬¶ÔÓÚϸö°æ±¾£¬ÔÆÐ°Ò²¸ø³öÁËһЩ ongoing µÄ¼Æ»®£¬²¢»¶Ó­´ó¼ÒÔÚÉçÇø»ý¼«Ìá³öÒâ¼û & ½¨Òé¡£

1.FLIP-132£ºTemporal Table DDL £¨Binlog ģʽµÄά±í¹ØÁª£©

2.FLIP-129£ºÖع¹ Descriptor API £¨Table API µÄ DDL£©

3.Ö§³Ö Schema Registry Avro ¸ñʽ

4.CDC ¸üÍêÉÆµÄÖ§³Ö£¨Åú´¦Àí£¬upsert Êä³öµ½ Kafka »ò Hive£©

5.ÓÅ»¯ Streaming File Sink СÎļþÎÊÌâ

6.N-ary input operator £¨Batch ÐÔÄÜÌáÉý£©

5 Appendix

ʹÓÃа汾 TableEnvironment Óöµ½µÄ³£¼û±¨´í¼°Ô­Òò

µÚÒ»¸ö³£¼û±¨´íÊÇ No operators defined in streaming topology. Óöµ½Õâ¸öÎÊÌâµÄÔ­ÒòÊÇÔÚÀϰ汾ÖÐÖ´ÐÐ INSERT INTO Óï¾äµÄÏÂÃæÁ½¸ö·½·¨

TableEnvironment#sqlUpdate()
TableEnvironment#execute()

ÔÚа汾ÖÐûÓÐÍêÈ«Ïòǰ¼æÈÝ£¨·½·¨»¹ÔÚ£¬Ö´ÐÐÂß¼­±äÁË£©£¬Èç¹ûûÓн« Table ת»»Îª AppendedStream/RetractStream ʱ£¨Í¨¹ýStreamExecutionEnvironmen t#toAppendStream / toRetractStream £©£¬ÉÏÃæµÄ´úÂëÖ´Ðоͻá³öÏÖÉÏÊö´íÎó£»Óë´Ëͬʱ£¬Ò»µ©×öÁËÉÏÊöת»»£¬¾Í±ØÐëʹÓà StreamExecutionEnvironment#execute() À´´¥·¢×÷ÒµÖ´ÐС£ËùÒÔ½¨ÒéÓû§»¹ÊÇÇ¨ÒÆµ½Ð°汾µÄ API ÉÏÃæ£¬ÓïÒåÉÏÒ²»á¸üÇåÎúһЩ¡£

µÚ¶þ¸öÎÊÌâÊǵ÷ÓÃÐ嵀 TableEnvironemnt#executeSql() ºó print ûÓп´µ½·µ»ØÖµ£¬Ô­ÒòÊÇÒòΪĿǰ print ÒÀÀµÁË checkpoint »úÖÆ£¬¿ªÆô exactly-onece ºó¾Í¿ÉÒÔÁË£¬Ð°汾»áÓÅ»¯´ËÎÊÌâ¡£

Àϰ汾µÄ StreamTableSource¡¢StreamTableSink ³£¼û±¨´í¼°Ð°汾ÓÅ»¯

µÚÒ»¸ö³£¼û±¨´íÊDz»Ö§³Ö¾«¶ÈÀàÐÍ£¬¾­³£³öÏÖÔÚ JDBC »òÕß HBase Êý¾ÝÔ´ÉÏ £¬ÔÚа汾ÉÏÕâ¸öÎÊÌâ¾Í²»»áÔÙ³öÏÖÁË¡£

µÚ¶þ¸ö³£¼û±¨´íÊÇ Sink ʱÕÒ²»µ½ PK£¬ÒòΪÀ쵀 StreamSink ÐèҪͨ¹ý query È¥ÍÆµ¼³ö PK£¬µ± query ±äµÃ¸´ÔÓʱÓпÉÄܻᶪʧ PK ÐÅÏ¢£¬µ«Êµ¼ÊÉÏ PK ÐÅÏ¢ÔÚ DDL Àï¾Í¿ÉÒÔ»ñÈ¡£¬Ã»ÓбØÒªÍ¨¹ý query È¥ÍÆµ¼£¬ËùÒÔа汾µÄ Sink ¾Í²»»áÔÙ³öÏÖÕâ¸ö´íÎóÀ²¡£

µÚÈý¸ö³£¼û±¨´íÊÇÔÚ½âÎö Source ºÍ Sink ʱ£¬Èç¹ûÓû§ÉÙÌî»òÕßÌî´íÁ˲ÎÊý£¬¿ò¼Ü·µ»ØµÄ±¨´íÐÅÏ¢ºÜÄ£ºý£¬¡°ÕÒ²»µ½ table factory¡±£¬Óû§Ò²²»ÖªµÀ¸ÃÔõôÐ޸ġ£ÕâÊÇÒòΪÀϰ汾 SPI Éè¼ÆµÃ±È½ÏͨÓã¬Ã»ÓÐ¶Ô Source ºÍ Sink ½âÎöµÄÂß¼­×öµ¥¶À´¦Àí£¬µ±Æ¥Åä²»µ½ÍêÕû²ÎÊýÁбíµÄʱºò¿ò¼ÜÒѾ­Ä¬Èϵ±Ç°µÄ table factory ²»ÊÇÒªÕҵģ¬È»ºó±éÀúËùÓÐµÄ table factories ·¢ÏÖÒ»¸öÒ²²»Æ¥Å䣬¾Í±¨ÁËÕâ¸ö´í¡£ÔÚаæµÄ¼ÓÔØÂß¼­ÀFlink »áÏÈÅÐ¶Ï connector ÀàÐÍ£¬ÔÙÆ¥ÅäÊ£ÓàµÄ²ÎÊýÁÐ±í£¬Õâ¸öʱºòÈç¹û±ØÌîµÄ²ÎÊýȱʧ»òÌî´íÁË£¬¿ò¼Ü¾Í¿ÉÒÔ¾«×¼±¨´í¸øÓû§¡£

   
2469 ´Îä¯ÀÀ       27
Ïà¹ØÎÄÕÂ

»ùÓÚEAµÄÊý¾Ý¿â½¨Ä£
Êý¾ÝÁ÷½¨Ä££¨EAÖ¸ÄÏ£©
¡°Êý¾Ýºþ¡±£º¸ÅÄî¡¢ÌØÕ÷¡¢¼Ü¹¹Óë°¸Àý
ÔÚÏßÉ̳ÇÊý¾Ý¿âϵͳÉè¼Æ ˼·+Ч¹û
 
Ïà¹ØÎĵµ

GreenplumÊý¾Ý¿â»ù´¡Åàѵ
MySQL5.1ÐÔÄÜÓÅ»¯·½°¸
ijµçÉÌÊý¾ÝÖÐ̨¼Ü¹¹Êµ¼ù
MySQL¸ßÀ©Õ¹¼Ü¹¹Éè¼Æ
Ïà¹Ø¿Î³Ì

Êý¾ÝÖÎÀí¡¢Êý¾Ý¼Ü¹¹¼°Êý¾Ý±ê×¼
MongoDBʵս¿Î³Ì
²¢·¢¡¢´óÈÝÁ¿¡¢¸ßÐÔÄÜÊý¾Ý¿âÉè¼ÆÓëÓÅ»¯
PostgreSQLÊý¾Ý¿âʵսÅàѵ
×îл¼Æ»®
DeepSeek´óÄ£ÐÍÓ¦Óÿª·¢ 6-12[ÏÃÃÅ]
È˹¤ÖÇÄÜ.»úÆ÷ѧϰTensorFlow 6-22[Ö±²¥]
»ùÓÚ UML ºÍEA½øÐзÖÎöÉè¼Æ 6-30[±±¾©]
ǶÈëʽÈí¼þ¼Ü¹¹-¸ß¼¶Êµ¼ù 7-9[±±¾©]
Óû§ÌåÑé¡¢Ò×ÓÃÐÔ²âÊÔÓëÆÀ¹À 7-25[Î÷°²]
ͼÊý¾Ý¿âÓë֪ʶͼÆ× 8-23[±±¾©]
 
×îÐÂÎÄÕÂ
´óÊý¾Ýƽ̨ϵÄÊý¾ÝÖÎÀí
ÈçºÎÉè¼ÆÊµÊ±Êý¾Ýƽ̨£¨¼¼Êõƪ£©
´óÊý¾Ý×ʲú¹ÜÀí×ÜÌå¿ò¼Ü¸ÅÊö
Kafka¼Ü¹¹ºÍÔ­Àí
ELK¶àÖּܹ¹¼°ÓÅÁÓ
×îпγÌ
´óÊý¾Ýƽ̨´î½¨Óë¸ßÐÔÄܼÆËã
´óÊý¾Ýƽ̨¼Ü¹¹ÓëÓ¦ÓÃʵս
´óÊý¾ÝϵͳÔËά
´óÊý¾Ý·ÖÎöÓë¹ÜÀí
Python¼°Êý¾Ý·ÖÎö
³É¹¦°¸Àý
ijͨÐÅÉ豸ÆóÒµ PythonÊý¾Ý·ÖÎöÓëÍÚ¾ò
Ä³ÒøÐÐ È˹¤ÖÇÄÜ+Python+´óÊý¾Ý
±±¾© Python¼°Êý¾Ý·ÖÎö
ÉñÁúÆû³µ ´óÊý¾Ý¼¼Êõƽ̨-Hadoop
ÖйúµçÐÅ ´óÊý¾Ýʱ´úÓëÏÖ´úÆóÒµµÄÊý¾Ý»¯ÔËӪʵ¼ù