Editor's pick:
This article describes how Tencent's big data team builds a real-time data warehouse on Apache Flink and Apache Iceberg. We hope you find it useful.
Source: Tencent Cloud. Edited and recommended by Linda.
I. Background and Pain Points
As Figure 1 shows, we currently serve a number of internal applications. Two of them, Mini Programs (小程序) and Channels (视频号), each generate data at the PB or even EB scale every day or every month.

Figure 1
When the teams behind these applications build their own data analytics platforms, they usually adopt an architecture like the one in Figure 2, which most readers will already know well.
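1. Data platform architecture
Business teams, such as those behind Tencent Kandian (腾讯看点) or Channels, typically collect front-end event tracking data and application service logs. This data enters the data warehouse or the real-time compute engine through message middleware (Kafka/RocketMQ) or data synchronization services (Flume/NiFi/DataX).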
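The data warehouse stack contains many big data components, such as Hive, HBase, HDFS, and S3 for storage, and MapReduce, Spark, and Flink as compute engines. Users assemble these into a storage and processing platform according to their needs. After the data is processed and analyzed on the platform, the results are saved to relational or non-relational databases that support fast queries, such as MySQL and Elasticsearch. The application layer can then use this data for BI reporting, user profiling, or interactive queries with an OLAP tool such as Presto.

Figure 2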
2. Pain points of the Lambda architecture
Throughout this process we often rely on an offline scheduling system to run Spark analysis jobs periodically (T+1 or every few hours) for data input, output, or ETL. Offline processing inevitably introduces latency: whether in data ingestion or in intermediate analysis, the delay is large, on the order of hours or even days. In other scenarios we also build a separate real-time pipeline to meet low-latency requirements, for example a stream processing system based on Flink and Kafka.
Overall, the data warehouse architecture contains a great many components, which substantially increases its complexity and its operations cost.
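The figure below shows the Lambda architecture, which many companies have used or are still using. Lambda splits the data warehouse into an offline layer and a real-time layer, with two independent processing flows: batch and streaming. The same data is processed at least twice, and the same business logic has to be implemented twice, once for each engine. Since the Lambda architecture is already familiar to most readers, the following focuses on the pain points we ran into while building a data warehouse with it.

Figure 3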
Take real-time user metrics as an example. To see the current PV and UV, we send the data to the real-time layer, and the metric values are displayed as they are computed. To also see the user growth trend, however, we need metrics computed over the whole previous day. That requires a scheduled batch job, for example a Spark job launched by the scheduler at two or three in the morning that reprocesses all of the day's data.
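Because the two pipelines process the same data at different times, their results may be inconsistent. When one or a few records are updated, the entire offline analysis chain has to be rerun, so updates are expensive. In addition, two compute platforms, offline and real-time, must be maintained, so both the development effort and the operations cost of the two layers are very high.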
ΪÁ˽â¾ö Lambda ¼Ü¹¹´øÀ´µÄ¸÷ÖÖÎÊÌ⣬¾Íµ®ÉúÁË Kappa ¼Ü¹¹£¬Õâ¸ö¼Ü¹¹´ó¼ÒÓ¦¸ÃÒ²·Ç³£µÄÊìϤ¡£
3. Pain points of the Kappa architecture
Figure 4 shows the Kappa architecture. It uses message queues in the middle and Flink to connect the whole pipeline. Kappa removes the high operations and development cost that Lambda incurs from using different engines for the offline and real-time layers, but it has pain points of its own.
First, Kappa is used to build near-real-time business scenarios. But if you want to run simple OLAP analysis on an intermediate warehouse layer such as ODS, or process the data further, for example by writing it to the Kafka topic of the DWD layer, you need to attach another Flink job. Likewise, when data has to be exported from the DWD-layer Kafka to ClickHouse, Elasticsearch, MySQL, or Hive for further analysis, the architecture clearly becomes more complex.
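Second, Kappa depends heavily on message queues. The correctness of computation over a message queue pipeline depends strictly on the order of the upstream data, and the more queues are chained together, the more likely records are to arrive out of order. ODS-layer data is generally accurate, but it can become disordered when sent to the next Kafka, and again when DWD-layer data is sent on to DWS, so the data inconsistency can become severe.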
Third, Kafka is a sequential storage system, and OLAP optimizations cannot be applied directly on top of such a system. Predicate pushdown, for example, is difficult to implement on Kafka.
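So is there an architecture that meets both real-time and offline computing requirements, reduces development and operations cost, and also resolves the pain points of building Kappa on message queues? The answer is yes, and the rest of this article explains it in detail.

Figure 4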
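4. Summary of pain points
■ Traditional T+1 jobs
Delays in massive, TB-scale T+1 jobs make downstream data delivery times unstable.
Recovering from a job failure by retrying is expensive.
The data architecture struggles with deduplication and exactly-once semantics.
The architecture is complex: it involves coordinating multiple systems and relies on the scheduler to build job dependencies.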
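■ Lambda architecture pain points
Two engines, real-time and offline, must be maintained at the same time, so operations cost is high.
The two platforms need two codebases on different frameworks that implement the same business logic, so development cost is high.
Data flows through two different pipelines, which easily leads to inconsistent results.
Data updates are expensive because the pipeline has to be rerun.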
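■ Kappa architecture pain points
It places high demands on message queue storage, and a message queue's ability to replay history is weaker than that of offline storage.
Message queues retain data only for a limited time, and OLAP engines currently cannot analyze data in a message queue directly.
Real-time computation that depends on message queues end to end can produce incorrect results because of data ordering issues.

Figure 5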
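5. Requirements for a real-time data warehouse
Is there a storage technology that supports efficient data backtracking and data updates, supports both batch and streaming reads and writes, and can ingest data with minute-level to second-level latency?
This is what building a real-time data warehouse urgently needs (Figure 6). In practice, the Kappa architecture can be upgraded to solve some of the problems described above. The rest of this article focuses on a data lake technology that is currently popular: Iceberg.

Figure 6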
II. Introduction to the Apache Iceberg Data Lake
1. What is Iceberg
First, what is Iceberg? The official website describes it as follows:
Apache Iceberg is an open table format for huge analytic
datasets. Iceberg adds tables to Presto and Spark
that use a high-performance format that works just
like a SQL table.
Iceberg officially defines itself as a table format. It can be understood simply as a middle layer between the compute layer (Flink, Spark) and the storage layer (ORC, Parquet, Avro). Flink or Spark writes data into Iceberg, and the table can then be read by other engines such as Spark, Flink, or Presto.

Figure 7
2. The Iceberg table format
Iceberg is designed for analyzing massive datasets and is defined as a table format. A table format sits between the compute layer and the storage layer.
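A table format manages the files in the storage system below it and exposes interfaces to the compute layer above it. Files in a storage system are always organized in some way. When reading a Hive table, for example, the table's data on HDFS comes with partitions, a storage format, a compression format, and an HDFS directory location. All of this information is kept in the Metastore, so the Metastore can itself be regarded as a kind of file organization format.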
A well-designed file organization format such as Iceberg lets the compute layer access files on disk more efficiently when it performs operations such as list, rename, or lookup.
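3. Summary of Iceberg capabilities
Iceberg currently supports three file formats: Parquet, Avro, and ORC. As Figure 7 shows, the files on HDFS or S3 include both row-oriented and columnar formats; their roles are discussed in more detail later. Iceberg's own capabilities are summarized below (Figure 8). They are essential for building a real-time data warehouse on Iceberg.

Figure 8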
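Snapshot-based read/write separation and backtracking
Unified batch and streaming writes and reads
No hard binding to a particular compute or storage engine
ACID semantics and multi-versioned data
Table, schema, and partition evolution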
4. Iceberg file organization
The figure below shows Iceberg's overall file organization. From top to bottom:
First, the top layer is the snapshot. In Iceberg, a snapshot is the basic unit of data that a user can read: every time a user reads all the data in a table, they read the data under one snapshot.
Second, the manifest. A snapshot contains multiple manifests. In the figure, snapshot-0 has two manifests and snapshot-1 has three. Each manifest manages one or more data files (DataFiles).
Third, DataFiles. A manifest file stores metadata about the data. If you open a manifest file, you will see that it is essentially a list of data file paths, one per line.
The figure also shows that snapshot-1 contains the data of snapshot-0, while the only data written at snapshot-1 is in manifest2. This property is what makes incremental reads possible later on.

Figure 9
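To make the layering concrete, here is a minimal Python sketch of the snapshot, manifest, and data file hierarchy described above. It is a toy in-memory model, not the Iceberg API: the class names, the function `files_in`, and the file paths are all invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Manifest:
    """A manifest lists the paths of the data files it manages."""
    name: str
    data_files: Tuple[str, ...]


@dataclass(frozen=True)
class Snapshot:
    """A snapshot is the unit a reader sees: a set of manifests."""
    snapshot_id: int
    manifests: Tuple[Manifest, ...]


def files_in(snapshot: Snapshot) -> List[str]:
    """All data files visible when reading the table at this snapshot."""
    return [path for m in snapshot.manifests for path in m.data_files]


# Hypothetical layout mirroring the figure: snapshot-0 has two manifests,
# snapshot-1 reuses them and adds manifest2.
manifest0 = Manifest("manifest0", ("data/f0.parquet", "data/f1.parquet"))
manifest1 = Manifest("manifest1", ("data/f2.parquet",))
manifest2 = Manifest("manifest2", ("data/f3.parquet", "data/f4.parquet"))

snapshot0 = Snapshot(0, (manifest0, manifest1))
snapshot1 = Snapshot(1, (manifest0, manifest1, manifest2))

print(files_in(snapshot0))  # the three files of manifest0 and manifest1
print(files_in(snapshot1))  # the same three plus the two in manifest2
```

Because snapshot-1 reuses the manifests of snapshot-0 rather than copying them, the data that is new in snapshot-1 can be identified by looking only at manifest2.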
5. Iceberg read and write process
■ Apache Iceberg reads and writes
First, consider a write operation. While snapshot-1 is being written, it is drawn as a dashed box, meaning the commit has not happened yet. At this point snapshot-1 is not readable, because readers can only see snapshots that have been committed. It becomes readable only after the commit. The same applies to snapshot-2 and snapshot-3.
An important capability that Iceberg provides is read/write separation. Writing snapshot-4 does not affect reads of snapshot-2 and snapshot-3 at all. This is one of the capabilities that matter most for building a real-time data warehouse.

Figure 10
Likewise, reads can run concurrently: the snapshots s1, s2, and s3 can all be read at the same time, which makes it possible to go back and read the data of snapshot-2 or snapshot-3. When the write of snapshot-4 completes, a commit takes place. Snapshot-4 then becomes a solid box and is readable. The current snapshot pointer also moves to s4. By default, a read on a table reads the snapshot that the current snapshot pointer refers to, but this does not affect reads of earlier snapshots.
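The commit and visibility rules above can be sketched with a small toy model. This is not the Iceberg API; `ToyTable` and its methods are hypothetical names, and rows are plain strings. The sketch only shows that uncommitted data is invisible, that a commit moves the current snapshot pointer, and that older snapshots stay readable.

```python
from typing import Dict, List, Optional


class ToyTable:
    """Toy model of snapshot isolation; not the Iceberg API."""

    def __init__(self) -> None:
        self._snapshots: Dict[int, List[str]] = {}  # committed snapshots only
        self._current: Optional[int] = None         # current snapshot pointer
        self._next_id = 1

    def commit(self, new_rows: List[str]) -> int:
        """Publish a new snapshot containing all earlier rows plus new_rows."""
        base = self._snapshots[self._current] if self._current is not None else []
        snapshot_id = self._next_id
        self._snapshots[snapshot_id] = base + list(new_rows)
        self._current = snapshot_id  # readers see the new data from here on
        self._next_id += 1
        return snapshot_id

    def read(self, snapshot_id: Optional[int] = None) -> List[str]:
        """Read the current snapshot by default, or an older one by id."""
        if snapshot_id is None:
            snapshot_id = self._current
        if snapshot_id is None:
            return []
        return list(self._snapshots[snapshot_id])


table = ToyTable()
s1 = table.commit(["a"])
s2 = table.commit(["b"])

pending = ["c"]             # written but not committed: invisible to readers
before = table.read()       # still the contents of s2

s3 = table.commit(pending)  # the commit makes "c" visible
after = table.read()        # now the contents of s3
old = table.read(s2)        # reading s2 is unaffected by the new commit
```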
■ Apache Iceberg incremental reads
Next, incremental reads. Iceberg reads can only be based on a committed snapshot, say snapshot-1. Suppose snapshot-2 is then committed. Every snapshot contains all the data of the snapshots before it, so if the compute engine reads the full data set each time, reads become very expensive across the pipeline.
If you only want the data added since the last read, you can use Iceberg's snapshot backtracking mechanism to read just the increment between snapshot-1 and snapshot-2, which is the purple region in the figure.

Figure 11
Similarly, for s3 you can read only the yellow region, or you can read the increment from s1 to s3. We have implemented this incremental read internally on top of the streaming reader of the Flink source, and it is already running in production. This leads to an important point: now that Iceberg supports read/write separation, concurrent reads, and incremental reads, integrating Iceberg with Flink also requires an Iceberg sink.
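The difference between a full read and an incremental read can be illustrated with a short sketch. It assumes an append-only table in which each snapshot is described by the files it added; the snapshot ids, file names, and function names are invented, and overwrites or deletes are not modeled.

```python
from typing import Dict, List

# Toy model: each committed snapshot id maps to the files it ADDED.
# Snapshot ids and file names are invented for illustration.
added_files: Dict[int, List[str]] = {
    1: ["f1.parquet"],
    2: ["f2.parquet", "f3.parquet"],
    3: ["f4.parquet"],
}


def full_scan(snapshot_id: int) -> List[str]:
    """Everything visible at snapshot_id: files added by it and all earlier ones."""
    return [f for sid in sorted(added_files) if sid <= snapshot_id
            for f in added_files[sid]]


def incremental_scan(from_exclusive: int, to_inclusive: int) -> List[str]:
    """Only the files added after from_exclusive, up to and including to_inclusive."""
    return [f for sid in sorted(added_files)
            if from_exclusive < sid <= to_inclusive
            for f in added_files[sid]]


print(full_scan(3))            # all four files
print(incremental_scan(1, 2))  # only what snapshot 2 added
print(incremental_scan(1, 3))  # what snapshots 2 and 3 added
```

A streaming reader can be thought of as calling something like `incremental_scan` repeatedly, remembering the last snapshot id it consumed and asking only for what came after it.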
■ Small files in real-time writes
The community has refactored the FlinkIcebergSink in Flink and added a global committer. Our architecture is consistent with the community's. The part inside the curved box in the figure is the FlinkIcebergSink.
ÔÚÓжà¸ö IcebergStreamWriter ºÍÒ»¸ö IcebergFileCommitter
µÄÇé¿öÏ£¬ÉÏÓεÄÊý¾Ýдµ½ IcebergStreamWriter µÄʱºò£¬Ã¿¸ö writer ÀïÃæ×öµÄÊÂÇé¶¼ÊÇȥд
datafiles Îļþ¡£

Figure 12
When a writer finishes its current batch of small data files, it sends a message to the IcebergFileCommitter to say that they are ready to commit. When the IcebergFileCommitter receives these messages, it commits all of the data files at once in a single commit operation.
The commit itself only modifies metadata. The data has already been written to disk, and the commit merely turns it from invisible to visible. Iceberg therefore needs only one commit to make the data visible.
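The writer and committer roles described above can be sketched as follows. This is a simplified toy, not the code of IcebergStreamWriter or IcebergFileCommitter: the class `ToyCommitter`, its fields, and the file names are hypothetical, and the trigger used here (every writer has reported) is a simplification chosen for the sketch.

```python
from typing import List


class ToyCommitter:
    """Single committer: collects file lists from writers, publishes once."""

    def __init__(self, num_writers: int) -> None:
        self.num_writers = num_writers
        self.pending: List[List[str]] = []  # reports received for this round
        self.visible: List[str] = []        # files that readers can see
        self.commits = 0

    def report(self, files: List[str]) -> None:
        """Called by a writer when its batch of data files is on disk."""
        self.pending.append(files)
        if len(self.pending) == self.num_writers:
            self._commit()

    def _commit(self) -> None:
        # One metadata change makes every pending file visible at once.
        self.visible.extend(f for batch in self.pending for f in batch)
        self.pending = []
        self.commits += 1


committer = ToyCommitter(num_writers=3)
committer.report(["w0-0.parquet"])
committer.report(["w1-0.parquet", "w1-1.parquet"])
visible_before = list(committer.visible)  # nothing is visible yet
committer.report(["w2-0.parquet"])        # last writer reports, commit fires
```

Keeping the committer at a parallelism of one means there is a single place where table metadata changes, so the files of all writers become visible together or not at all.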
■ Real-time small-file compaction
Flink streaming jobs usually run in the cluster for a long time. To keep data fresh, the Iceberg commit interval is typically set to 30 seconds or one minute. With one commit per minute, a job that runs for a day produces 1,440 commits, and a job that runs for a month produces far more. The shorter the commit interval, the more snapshots are generated. A streaming job that has been running for a while therefore produces a large number of small files.
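The arithmetic behind these numbers is easy to check. The sketch below computes commits per day for a given interval and a rough lower bound on the number of data files, assuming each writer produces at least one file per commit. The writer parallelism of 10 is an assumed value for illustration, not a figure from the article.

```python
def commits_per_day(commit_interval_seconds: int) -> int:
    """Number of commits (and snapshots) produced in 24 hours."""
    return 24 * 60 * 60 // commit_interval_seconds


def files_per_day(commit_interval_seconds: int, writer_parallelism: int,
                  files_per_writer_per_commit: int = 1) -> int:
    """Rough lower bound on data files written per day."""
    return (commits_per_day(commit_interval_seconds)
            * writer_parallelism * files_per_writer_per_commit)


print(commits_per_day(60))          # 1440
print(commits_per_day(30))          # 2880
print(files_per_day(60, 10))        # 14400
print(files_per_day(60, 10) * 30)   # 432000 after 30 days
```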
If this problem is not solved, the Iceberg sink on the Flink engine is not usable in practice. Internally we implemented a feature called the data compaction operator, which runs alongside the Flink sink. Each time the FlinkIcebergSink completes a commit, it sends a message downstream to FileScanTaskGen to tell it that a commit has finished.

Figure 13
FileScanTaskGen contains the logic that generates file compaction tasks, based on user configuration or the characteristics of the current disk. What FileScanTaskGen sends to DataFileRewitre is the list of files that need to be merged. Because merging files takes time, the work is dispatched asynchronously to different task rewrite operators.
As mentioned above, Iceberg relies on commits, and the rewritten files need a new snapshot. For Iceberg this is also a commit, so it is handled by a single-parallelism operator, just like the normal commit.
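The article does not describe the planning logic inside FileScanTaskGen, so the following is only one plausible way to turn a list of small files into rewrite tasks: a greedy grouping by target size. The function name `plan_rewrite_tasks`, its parameters, and the 128 MB target are assumptions made for this sketch.

```python
from typing import List, Tuple

File = Tuple[str, int]  # (path, size in bytes)


def plan_rewrite_tasks(files: List[File], target_bytes: int,
                       min_files: int = 2) -> List[List[File]]:
    """Group files smaller than target_bytes into merge tasks.

    Greedy: add files to the current group until adding another
    would exceed target_bytes, then start a new group.
    """
    small = sorted((f for f in files if f[1] < target_bytes),
                   key=lambda f: f[1])
    tasks: List[List[File]] = []
    group: List[File] = []
    group_size = 0
    for f in small:
        if group and group_size + f[1] > target_bytes:
            tasks.append(group)
            group, group_size = [], 0
        group.append(f)
        group_size += f[1]
    if group:
        tasks.append(group)
    # A group with a single file has nothing to merge with.
    return [t for t in tasks if len(t) >= min_files]


MB = 1024 * 1024
files = [("a", 10 * MB), ("b", 20 * MB), ("c", 200 * MB),
         ("d", 40 * MB), ("e", 70 * MB)]
tasks = plan_rewrite_tasks(files, target_bytes=128 * MB)
print([[path for path, _ in task] for task in tasks])  # [['a', 'b', 'd']]
```

Each resulting group could be handed to a separate rewrite task and processed in parallel, with the rewritten outputs then published by a single commit as the text describes.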
Across the whole pipeline, small-file compaction currently goes through a commit as well. If that commit blocks, it affects the upstream writes, and we will keep optimizing this. We have also opened a design doc in the Iceberg community and are discussing the compaction work with the community.
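III. Building a Real-Time Data Warehouse with Flink + Iceberg
1. Near-real-time data ingestion
As described above, Iceberg supports read/write separation, concurrent reads, incremental reads, and small-file compaction, with latency from seconds to minutes. Based on these strengths, we are using Iceberg to build a Flink-based real-time data warehouse architecture that unifies batch and streaming across the entire pipeline.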
As the figure below shows, every Iceberg commit changes the visibility of data, for example by turning data from invisible to visible. In this way data is recorded in near real time.

Figure 14
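2. Real-time data warehouse: a data lake analytics system
Previously, data had to be ingested first, for example by an offline scheduled Spark job that pulled and extracted the data and finally wrote it into a Hive table, a process with high latency. With the Iceberg table structure, Flink or Spark Streaming can be used in between to ingest data in near real time.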
With these capabilities in mind, let us revisit the Kappa architecture and its pain points described earlier. Since Iceberg is a capable table format that supports both a streaming reader and a streaming sink, can we consider replacing Kafka with Iceberg?
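Iceberg's underlying storage is inexpensive storage such as HDFS or S3, and Iceberg supports the Parquet, ORC, and Avro file formats. With the columnar formats among them (Parquet and ORC), basic OLAP optimizations such as predicate pushdown become possible, and computation can be done directly on the intermediate layers. Combined with the streaming reader built on Iceberg snapshots, this can cut the day-level or hour-level latency of offline jobs substantially and turn the system into a near-real-time data lake analytics system.

Figure 15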
In the intermediate layers, Presto can be used for simple queries. Because Iceberg supports streaming reads, Flink can also be attached directly to an intermediate layer to run batch or streaming jobs there and send the further-processed intermediate results downstream.
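To show why file-based storage makes predicate pushdown practical where a message queue does not, here is a minimal sketch of file pruning with per-file column statistics. It is a toy, not the Iceberg API: the `FileStats` class, the `ts` column, and the file names are invented. The idea is that a file whose minimum and maximum values cannot match the filter is skipped without being opened.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileStats:
    """Per-file min/max of one column, kept in table metadata."""
    path: str
    min_ts: int
    max_ts: int


def prune(files: List[FileStats], lo: int, hi: int) -> List[str]:
    """Keep only files whose [min_ts, max_ts] range overlaps [lo, hi]."""
    return [f.path for f in files if f.max_ts >= lo and f.min_ts <= hi]


files = [
    FileStats("f0.parquet", 0, 99),
    FileStats("f1.parquet", 100, 199),
    FileStats("f2.parquet", 200, 299),
]

# Query: WHERE ts BETWEEN 150 AND 220
to_read = prune(files, 150, 220)
print(to_read)  # ['f1.parquet', 'f2.parquet']
```

In a sequential log, answering the same query would generally mean scanning records from some offset onward, because there are no per-file statistics to consult.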
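■ Pros and cons of replacing Kafka
Overall, the main advantages of replacing Kafka with Iceberg are:
Unified batch and streaming at the storage layer
OLAP analysis on the intermediate layers
Full support for efficient backtracking
Lower storage cost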
There are also some drawbacks:
Data latency goes from real time to near real time
Integrating with other data systems requires additional development

Figure 16
■ Second-level analysis: data lake acceleration
Iceberg stores all of its data files on HDFS, and HDFS read/write performance does not fully meet our needs for second-level analysis. We therefore plan to support a cache such as Alluxio underneath Iceberg and use it to accelerate the data lake. This part of the architecture is included in our future planning and development.

Figure 17
3. Best practices
■ Real-time small-file compaction
As Figure 18 shows, Tencent has made Iceberg fully usable through SQL internally. Small-file compaction parameters can be set in the table properties, for example the number of snapshots after which a compaction is triggered. A single INSERT statement then starts the Flink ingestion job, the job runs continuously, and the data files are compacted automatically in the background.

Figure 18
The figure below shows the data files in Iceberg and their corresponding metadata files. The community's open-source FlinkIcebergSink does not yet support file compaction. If you start a small streaming job on your own machine, you will see the number of files in the corresponding directory surge after the Flink job has been running for a while.

Figure 19
With Iceberg's real-time small-file compaction enabled, the number of files stays at a fairly stable level.
■ Flink real-time incremental reads
Real-time incremental reads can be configured through Iceberg table properties, including the snapshot from which to start consuming. Once a starting snapshot is specified, each time the Flink job starts it reads only the data newly added in the latest snapshot.

Figure 20
In this example, small-file compaction is enabled, and a Flink sink ingestion job is started with SQL.
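■ Managing files with SQL extensions
Users increasingly want to do everything in SQL. The small-file compaction feature described above only suits Flink jobs running online, where, compared with offline jobs, the number and size of files generated in each commit cycle are not particularly large.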
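But once a job has been running for a long time, there may already be thousands or tens of thousands of files underneath. Compacting them with the online real-time job is clearly not appropriate and could hurt that job's timeliness. Instead, SQL extensions can be used to compact small files, delete orphan files, or expire snapshots.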
Internally we already manage Iceberg's data files and metadata files on disk through SQL extensions, and we will keep adding features to the SQL extensions to improve Iceberg's usability and the user experience.

Figure 21
IV. Future Plans

Figure 22
1. Improving Iceberg core capabilities
Row-level delete. When the whole data pipeline is built on Iceberg, what happens when data needs to be updated? Iceberg currently supports only copy-on-write updates, and copy-on-write amplifies writes. To build a truly real-time processing pipeline end to end, an efficient merge-on-read update capability is needed. This is very important. We will continue to work with the community, and Tencent will also experiment internally, to improve row-level delete.
Better SQL extensions. We will continue to improve the capabilities of the SQL extensions.
A unified index to speed up data retrieval. Iceberg does not yet have a unified index for this. We are working with the community, which has proposed a Bloom filter index. Building a unified index can speed up how Iceberg locates files.
For the Iceberg core, our main goal is to complete these features first.
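The write amplification of copy-on-write, and the way merge-on-read avoids it, can be illustrated with a toy model. This is not how Iceberg stores data; the function names, the use of row ids, and the file sizes are all assumptions made for the sketch. It only compares how many rows have to be written to delete a single row.

```python
from typing import Dict, List, Set

# Toy model: a data file is a list of row ids; the sizes are illustrative.
data_files: Dict[str, List[int]] = {
    "f0": list(range(0, 1000)),
    "f1": list(range(1000, 2000)),
}


def delete_copy_on_write(files: Dict[str, List[int]], row_id: int) -> int:
    """Rewrite every file that contains row_id; return rows written."""
    written = 0
    for name, rows in files.items():
        if row_id in rows:
            files[name] = [r for r in rows if r != row_id]
            written += len(files[name])
    return written


def delete_merge_on_read(deletes: Set[int], row_id: int) -> int:
    """Record the deleted id on the side; return rows written."""
    deletes.add(row_id)
    return 1


def read_merge_on_read(files: Dict[str, List[int]], deletes: Set[int]) -> List[int]:
    """Apply the recorded deletes while reading."""
    return [r for rows in files.values() for r in rows if r not in deletes]


cow_files = {name: list(rows) for name, rows in data_files.items()}
cow_written = delete_copy_on_write(cow_files, 42)   # rewrites all of f0

deletes: Set[int] = set()
mor_written = delete_merge_on_read(deletes, 42)     # writes one delete record

print(cow_written, mor_written)  # 999 1
```

Both approaches return the same rows on read. Copy-on-write pays at write time by rewriting the whole file, while merge-on-read pays a smaller cost at read time by filtering out the recorded deletes.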
2. Platform development
On the platform side, we will work on the following:
First, automatic schema detection and table creation. We want to create tables automatically from the schema of the incoming data, making the whole data lake ingestion flow easier for users.
Second, more convenient metadata management. Iceberg metadata is currently exposed in raw form, stored directly in the Hive Metastore. Users who want to inspect it still have to run SQL. We hope to improve this as part of the platform.
Third, a data acceleration layer based on Alluxio. We want to build a data lake acceleration layer with Alluxio so that upper layers can achieve second-level analysis more easily.
Fourth, integration with internal systems. We have many internal systems for real-time and offline analysis, and our platform needs to be connected with each of them.