A First Look at Kudu's Design Principles
 
×÷Õߣº NoSQLÂþ̸
 
2020-4-23
 
Editor's recommendation:
How can OLTP-style random read/write and OLAP-style analytics be combined in a single system? Kudu offers an excellent design approach.

This article comes from CSDN and was edited and recommended by Anna of 火龙果软件 (Huolongguo Software).

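This article starts from Kudu's design paper and, through a comparison with HBase, offers a preliminary look at Kudu's design principles. Some of these designs may be outdated in the latest Kudu releases, but the original design ideas are still worth learning from.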

1 Kudu's Original Design Goals

Before describing what Kudu is, let us briefly review some pain points of existing systems for storing and querying structured data. Structured data is usually stored in one of two ways:

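Static data is usually stored directly in HDFS in Parquet, Carbon, or Avro format, which is generally the better fit for analytical workloads. However, whichever format is used, data stored in HDFS can hardly support updates to individual records, and random reads are not efficient either.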

Mutable data is usually stored in HBase or Cassandra, because they support efficient record-level random reads and writes. This kind of storage, however, is a poor fit for offline analytics, because performance is poor when retrieving data in large batches. For HBase there are two main reasons. First, the HFile format itself organizes data by row, which incurs heavy IO for most analytical workloads because a lot of unnecessary data may be read; Parquet, by comparison, includes many optimizations for analytics. Second, because of HBase's LSM-Tree architecture, its read path must consider not only the data in memory but also one or more HFiles in HDFS, which makes the read path considerably longer than reading a file directly from HDFS.

As we can see, both storage approaches have clear strengths and weaknesses:

Storing data directly in HDFS suits offline analytics but is poorly suited to record-level random reads and writes.

Storing data directly in HBase/Cassandra suits record-level random reads and writes but is unfriendly to offline analytics.

In many real business scenarios, however, the two workloads coexist. Common approaches include the following:

Store the data in HBase and run analytical jobs with Spark/Hive on HBase, which performs poorly.

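Where analytical performance matters more, keep a redundant copy of the data in HDFS/Hive, or periodically export the data from HBase into Parquet/Carbon format. This approach clearly places heavy demands on the business application and easily leads to consistency problems between online and offline data.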

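Kudu's design tries to find the best balance point between OLAP and OLTP, so that a single copy of data in a single system can support both OLTP-style real-time reads and writes and OLAP-style analytics. Another design goal, mentioned in Cloudera's paper "Kudu: New Apache Hadoop Storage for Fast Analytics on Fast Data", is that Kudu, as a new distributed storage system, aims to use the CPU more efficiently; poor CPU efficiency is precisely the biggest problem of systems such as HBase and Cassandra. The following sections interpret Kudu's design principles mainly on the basis of what the paper reveals.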

2 How Kudu Works

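Kudu's architecture borrows in part from the designs of Bigtable, HBase, and Spanner. Several of the paper's authors are Committers/PMC members of the HBase community, so HBase's influence on Kudu's design is strongly felt throughout the paper. For this reason, this article discusses the similarities and differences between the designs of Kudu and HBase in several places.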

2.1 Tables and Schemas

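Kudu is designed for structured storage, so users must define a Kudu table's Schema when creating the table. The Schema consists of the column definitions (including types) and the Primary Key definition (an ordered combination of one or more user-specified columns). Row uniqueness depends on the uniqueness of the values of the column combination in the user-supplied Primary Key. Kudu provides an Alter command for adding and dropping columns, but columns that are part of the Primary Key cannot be dropped.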

Kudu does not currently support secondary indexes.

2.2 API

Kudu provides APIs in two languages, Java and C++ (a Python API is also provided, but it is still experimental). These APIs support the following operations:

Insert/Update/Delete

Bulk data import/update

Scan (with support for simple filters)
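To make the Primary Key uniqueness rule and the operations above concrete, here is a minimal in-memory sketch. `ToyTable` and its methods are hypothetical names for illustration only, not the Kudu Java/C++ client API: rows are keyed by the ordered tuple of Primary Key values, an insert with an existing key is rejected, a scan returns rows in key order with an optional simple filter, and dropping a Primary Key column is refused, mirroring the Alter rule described in 2.1.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple

Row = Dict[str, Any]


@dataclass
class ToyTable:
    """Hypothetical in-memory model of a Kudu-style table (not the Kudu client API)."""
    columns: List[str]
    key_columns: List[str]  # ordered Primary Key columns
    rows: Dict[Tuple[Any, ...], Row] = field(default_factory=dict)

    def _key(self, row: Row) -> Tuple[Any, ...]:
        # The Primary Key is the ordered combination of the key column values.
        return tuple(row[c] for c in self.key_columns)

    def insert(self, row: Row) -> None:
        key = self._key(row)
        if key in self.rows:
            raise KeyError(f"duplicate primary key {key}")
        self.rows[key] = {c: row.get(c) for c in self.columns}

    def update(self, row: Row) -> None:
        key = self._key(row)
        if key not in self.rows:
            raise KeyError(f"no row with primary key {key}")
        for c, v in row.items():
            if c not in self.key_columns:  # key columns identify the row; never rewritten
                self.rows[key][c] = v

    def delete(self, key: Tuple[Any, ...]) -> None:
        del self.rows[key]  # raises KeyError if the row does not exist

    def drop_column(self, name: str) -> None:
        # Mirrors the Alter rule: Primary Key columns cannot be dropped.
        if name in self.key_columns:
            raise ValueError(f"cannot drop primary key column {name}")
        self.columns.remove(name)
        for row in self.rows.values():
            row.pop(name, None)

    def scan(self, predicate: Optional[Callable[[Row], bool]] = None,
             projection: Optional[List[str]] = None) -> Iterator[Row]:
        # Rows come back in Primary Key order; the predicate acts as a simple filter.
        for key in sorted(self.rows):
            row = self.rows[key]
            if predicate is None or predicate(row):
                yield {c: row[c] for c in (projection or self.columns)}
```

For example, with `key_columns=["host", "ts"]`, inserting `{"host": "a", "ts": 1, ...}` twice raises `KeyError` on the second call, just as a duplicate Primary Key is rejected in the text.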

2.3 Transactions and Consistency Model

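Kudu provides only single-row transactions; multi-row transactions are not supported. In this respect it is similar to HBase. Its data consistency model, however, differs considerably from HBase's. Kudu offers the following two consistency models: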

Snapshot Consistency

This is Kudu's default consistency model. It only guarantees that a client can see its own committed writes; it does not guarantee global (cross-client) transaction visibility.

External Consistency

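The External Consistency mechanism was probably first proposed in Google's Spanner paper. The two-phase commit used in traditional relational databases requires two rounds of communication, which is costly, and its complex locking can also cause availability problems. A better approach to distributed transactions and consistency is based on globally issued timestamps. Spanner introduced the commit-wait mechanism to guarantee a global order of transactions: if the commit of transaction T1 precedes the start of transaction T2, then T1's Timestamp must be smaller than T2's Timestamp.

Such a guarantee is hard to make in a distributed system. In HBase, we can imagine that if the SequenceIDs of all RegionServers were issued from a single source, many of HBase's transactional problems would be easily solved; the biggest problem, however, is that such a global SequenceID source would become the performance bottleneck of the whole system.

Returning to External Consistency: Spanner implements it with high-precision local clocks whose error is predictable (the TrueTime API). In other words, it requires a highly reliable, high-precision clock source with a predictable error bound; interested readers can consult the Spanner paper, and we will not go into it here. Kudu offers a different way to achieve External Consistency, based on timestamp propagation: clients can communicate with each other to share the Timestamps of their commits, thereby guaranteeing a global order. This mechanism is also relatively complex.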
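The timestamp-propagation idea can be illustrated with a small sketch. It uses a plain Lamport-style logical counter as an assumed stand-in for Kudu's real clock (Kudu's actual clock combines physical and logical time), and the `Server` and `Client` classes are hypothetical. Without propagation, a write on a server whose clock lags can receive a smaller timestamp than a write that already finished elsewhere; once the earlier client passes its commit timestamp along, the later write is forced above it.

```python
class Server:
    """Hypothetical tablet server with a logical clock (stand-in for Kudu's real clock)."""

    def __init__(self, clock: int = 0) -> None:
        self.clock = clock

    def commit(self, client_ts: int) -> int:
        # Assign a timestamp greater than both the local clock and any
        # timestamp the client has already observed.
        self.clock = max(self.clock, client_ts) + 1
        return self.clock


class Client:
    """Hypothetical client that remembers the latest timestamp it has seen."""

    def __init__(self) -> None:
        self.last_seen = 0

    def write(self, server: Server) -> int:
        ts = server.commit(self.last_seen)
        self.last_seen = max(self.last_seen, ts)
        return ts

    def learn(self, ts: int) -> None:
        # Out-of-band propagation: another client tells us its commit timestamp.
        self.last_seen = max(self.last_seen, ts)


fast, slow = Server(clock=100), Server(clock=0)
a, b = Client(), Client()
t1 = a.write(fast)            # commits at 101
t_bad = Client().write(slow)  # no propagation: commits at 1, "before" t1
b.learn(t1)                   # a tells b about its commit
t2 = b.write(slow)            # now commits at 102, after t1
```

The cost of this design is visible even in the toy: ordering is only guaranteed when clients actually exchange timestamps, which is why the text calls the mechanism relatively complex.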

Like Spanner, Kudu does not allow users to assign their own Timestamps to data. HBase is different in this respect: users can supply their own Timestamps, and can issue a query at a specific Timestamp.

2.4 Kudu's Architecture

Kudu also uses a Master-Slave architecture with a central node. The management node is called the Kudu Master, and the data nodes are called Tablet Servers (comparable to the RegionServer role in HBase). A table's data is split into one or more Tablets, and Tablets are deployed on Tablet Servers, which serve reads and writes.

Within a Kudu cluster, the Kudu Master plays the following roles:

1. Stores table Schema information and handles requests such as table creation.

2. Tracks and manages all Tablet Servers in the cluster, and coordinates redeployment of data after a Tablet Server fails.

3. Stores the placement of Tablets on Tablet Servers.

A Tablet is broadly similar to a Region in HBase, but there are some clear differences:

Tablets support two partitioning strategies. One is Hash Partitioning, under which user data is distributed fairly evenly across Tablets, but the original sort order of the data is lost. The other is Range Partitioning, under which data is partitioned according to the order of the string formed by combining the user-specified, ordered Primary Key Columns. HBase, by contrast, offers only one scheme: Range Partitioning on the user's RowKey.

A Tablet can be deployed on multiple Tablet Servers. In HBase's original architecture, a Region could be deployed on only one RegionServer, and replication of its data was left to HDFS. Starting with version 1.0, HBase has the Region Replica feature (HBASE-10070), which allows a Region to be deployed on multiple RegionServers to improve read availability, but the data among Region replicas is not synchronized in real time.

Figure 1 Kudu's data replication mechanism

Figure 2 HBase's data replication mechanism
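The two Tablet partitioning strategies described above can be contrasted with a short sketch. Everything here is assumed for illustration: `encode_key` joins the ordered Primary Key columns with a zero byte (it assumes key strings contain no zero bytes), `zlib.crc32` stands in for whatever hash function Kudu actually uses, and the split points are made up. Hash partitioning spreads adjacent keys across tablets, while range partitioning keeps them together in key order.

```python
import bisect
import zlib
from typing import List, Tuple

Key = Tuple[str, ...]


def encode_key(key: Key) -> bytes:
    # Hypothetical encoding: ordered key columns joined by a zero byte, so that
    # byte order follows the column-by-column order of the key.
    return b"\x00".join(part.encode("utf-8") for part in key)


def hash_tablet(key: Key, num_buckets: int) -> int:
    # Hash partitioning: even spread, but key order is not preserved.
    return zlib.crc32(encode_key(key)) % num_buckets


def range_tablet(key: Key, split_points: List[bytes]) -> int:
    # Range partitioning: tablet i covers [split_points[i-1], split_points[i]).
    return bisect.bisect_right(split_points, encode_key(key))


splits = [encode_key(("g",)), encode_key(("p",))]  # three tablets: < g, [g, p), >= p
keys = [("host-%03d" % i,) for i in range(100)]     # sequential, adjacent keys
```

With these inputs, all 100 sequential `host-...` keys land in the same range tablet, which keeps range scans cheap but concentrates sequential writes; under hash partitioning the same keys spread over several buckets.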

2.5 Kudu's Underlying Data Model

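For its underlying data files, Kudu does not use a higher-level distributed file system such as HDFS. Instead, it implements its own low-level storage system organized around the Table/Tablet/Replica view. This implementation is based on the following design goals: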

Fast columnar scans.

Fast random updates.

More stable, predictable query performance.

To meet these goals, Kudu adopted a hybrid columnar storage architecture similar to Fractured Mirrors. At the lower level, a Tablet is further subdivided into units called RowSets:

Figure 3 RowSets

MemRowSets are comparable to the MemStore in HBase, and DiskRowSets to HBase's HFiles. Data in MemRowSets is stored in row-oriented form, in a B-Tree. When data in a MemRowSet is flushed to disk, it becomes DiskRowSets: the flushed data is divided, in key order, into individual DiskRowSets of 32MB each.
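The flush step can be sketched as follows: sort the in-memory rows by key and cut them into consecutive chunks no larger than a target size, so each resulting DiskRowSet covers a disjoint key range. This is a hypothetical simplification: a Python dict stands in for the MemRowSet's B-Tree, a row's size is approximated by the lengths of its key and value, and real DiskRowSets are columnar rather than lists of pairs.

```python
from typing import Dict, List, Tuple

TARGET_BYTES = 32 * 1024 * 1024  # 32MB per DiskRowSet, as described above


def flush(memrowset: Dict[str, bytes],
          target_bytes: int = TARGET_BYTES) -> List[List[Tuple[str, bytes]]]:
    """Split a (hypothetical) MemRowSet into key-ordered, size-capped DiskRowSets."""
    rowsets: List[List[Tuple[str, bytes]]] = []
    current: List[Tuple[str, bytes]] = []
    size = 0
    for key in sorted(memrowset):  # a B-Tree would already yield keys in order
        value = memrowset[key]
        row_bytes = len(key) + len(value)  # rough size estimate (assumption)
        if current and size + row_bytes > target_bytes:
            rowsets.append(current)  # close the current DiskRowSet
            current, size = [], 0
        current.append((key, value))
        size += row_bytes
    if current:
        rowsets.append(current)
    return rowsets
```

Because the input is sorted before it is cut, the DiskRowSets produced by a single flush never overlap in key range; overlap only appears across successive flushes, which is what compaction later cleans up.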

Data in a DiskRowSet is organized by column, similar to Parquet; this is what allows Kudu to support analytical queries. Each column's data is stored in a contiguous region, which is further subdivided into small Pages, similar to the Blocks in an HBase HFile. Each Column Page can use an encoding algorithm as well as a general-purpose compression algorithm.
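As a small illustration of per-page encoding, the sketch below splits one column into fixed-size pages and run-length encodes each page. Run-length encoding is chosen here only as an example of an encoding that suits sorted or low-cardinality columns; the page size and function names are assumptions, and a general-purpose compressor could be layered on top of the encoded bytes.

```python
from itertools import groupby
from typing import Any, List, Tuple


def split_pages(values: List[Any], page_size: int) -> List[List[Any]]:
    # A column's contiguous values are cut into small fixed-size pages.
    return [values[i:i + page_size] for i in range(0, len(values), page_size)]


def rle_encode(page: List[Any]) -> List[Tuple[Any, int]]:
    # Each run of equal adjacent values becomes a (value, run_length) pair.
    return [(value, len(list(run))) for value, run in groupby(page)]


def rle_decode(runs: List[Tuple[Any, int]]) -> List[Any]:
    return [value for value, count in runs for _ in range(count)]
```

Changing one value inside an encoded page means decoding and re-encoding the whole page, which is the practical reason in-place updates of single records become hard.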

Since Column Pages can be encoded and compressed, modifying a single record in place becomes difficult. So how does Kudu support the record-level updates and deletes mentioned earlier? As in HBase, it adds a new record that describes each update or delete. A DiskRowSet contains two parts: the Base Data and the changed data (Delta Stores). Records generated by update/delete operations are stored in the Delta Stores.

Figure 4 Delta Store Design

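From the figure above (taken from Kudu's source project files), the Delta data consists of two parts, REDO and UNDO. These are similar to the REDO and UNDO logs of relational databases (there, the REDO log records updated data and can be used to recover changes from successful transactions that have not yet been written to the data files, while the UNDO log records data as it was before a transaction's update and can be used to roll back a failed transaction), though they differ in some details: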

REDO Delta Files contain the changes made to the Base Data since it was last flushed or compacted. REDO Delta Files are ordered by Timestamp.

UNDO Delta Files contain the changes made before the Base Data's last flush or compaction. This ensures that a query at an old Timestamp still sees a consistent view. UNDO records are ordered by Timestamp in descending order.
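The REDO/UNDO ordering can be made concrete with a sketch of reading one cell at a given snapshot Timestamp. `ToyCell` is hypothetical and tracks a single value rather than a whole row, and `None` stands for "row does not exist". A read starts from the Base Data, applies REDO records (oldest first) up to the snapshot, and applies UNDO records (newest first) to roll back changes newer than the snapshot.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ToyCell:
    """Hypothetical single-column view of one row in a DiskRowSet."""
    base: Optional[int]  # value as of the last flush/compaction
    # UNDO: (ts, value before the change at ts), newest first.
    undo: List[Tuple[int, Optional[int]]] = field(default_factory=list)
    # REDO: (ts, value after the change at ts), oldest first.
    redo: List[Tuple[int, Optional[int]]] = field(default_factory=list)

    def read(self, snapshot_ts: int) -> Optional[int]:
        value = self.base
        # Roll forward through changes made after the flush, up to the snapshot.
        for ts, new_value in self.redo:
            if ts > snapshot_ts:
                break
            value = new_value
        # Roll back changes made after the snapshot but before the flush.
        for ts, old_value in self.undo:
            if ts <= snapshot_ts:
                break
            value = old_value
        return value


# Inserted at ts=1 with 10, updated to 20 at ts=3, flushed at ts=5, updated to 30 at ts=7.
cell = ToyCell(base=20, undo=[(3, 10), (1, None)], redo=[(7, 30)])
```

For the history in the comment, `cell.read(1)` returns 10, `cell.read(4)` returns 20, and `cell.read(7)` returns 30; reading at timestamp 0 returns `None` because the row had not been inserted yet.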

2.6 Read and Write Paths

The write path is shown in the figure below:

Figure 5 Write Path

Kudu does not allow duplicate Primary Keys, so before writing data inside a Tablet it must first check whether the Primary Key of the new row already exists in the existing data. Although DiskRowSets include BloomFilters to make this check more efficient, this design can be expected to noticeably increase write latency.
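The duplicate-key check can be sketched as a three-stage filter per DiskRowSet: skip the rowset if the key falls outside its key range, skip it if its Bloom filter says the key is definitely absent, and only otherwise perform an exact lookup. The Bloom filter below is a generic textbook one built on `hashlib.sha256`; its size, hash count, and the class names are assumptions, not Kudu's actual parameters or code. The `exact_lookups` counter shows which checks reached the expensive path.

```python
import bisect
import hashlib
from typing import Iterator, List, Set


class BloomFilter:
    """Generic Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key: str) -> Iterator[int]:
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class ToyDiskRowSet:
    """Hypothetical DiskRowSet holding only its sorted Primary Keys."""

    def __init__(self, keys: List[str]) -> None:
        self.keys = sorted(keys)
        self.min_key, self.max_key = self.keys[0], self.keys[-1]
        self.bloom = BloomFilter()
        for key in self.keys:
            self.bloom.add(key)
        self.exact_lookups = 0

    def contains(self, key: str) -> bool:
        if not (self.min_key <= key <= self.max_key):
            return False  # outside this rowset's key range
        if not self.bloom.might_contain(key):
            return False  # Bloom filter: definitely absent
        self.exact_lookups += 1  # expensive path: search the actual keys
        i = bisect.bisect_left(self.keys, key)
        return i < len(self.keys) and self.keys[i] == key


def key_exists(key: str, memrowset: Set[str], rowsets: List[ToyDiskRowSet]) -> bool:
    # A new row may be inserted only if this returns False.
    return key in memrowset or any(rs.contains(key) for rs in rowsets)
```

Even with these filters, every insert must consult the MemRowSet and each candidate DiskRowSet, which is the source of the extra write latency the text anticipates.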

Data is first held in the MemRowSet and, once its size exceeds a threshold, is flushed into DiskRowSets; this was described in detail with Figure 4. As flushes accumulate, so do DiskRowSets. Kudu therefore also has a Compaction process that re-sorts multiple existing DiskRowSets whose Primary Key ranges overlap and produces new DiskRowSets, as shown in the figure below:

Figure 6 RowSet Compaction
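At its core, the RowSet compaction described above is a k-way merge of already-sorted inputs. The sketch below, with hypothetical function names, merges DiskRowSets (represented only by their sorted keys) with `heapq.merge` and re-cuts the result into non-overlapping outputs. It ignores the Delta Stores, which a real compaction would also fold into the new Base Data.

```python
import heapq
from typing import List


def overlaps(a: List[str], b: List[str]) -> bool:
    # Two sorted rowsets overlap if their [min, max] key ranges intersect.
    return a[0] <= b[-1] and b[0] <= a[-1]


def compact(rowsets: List[List[str]], max_rows_per_output: int) -> List[List[str]]:
    # k-way merge of sorted inputs, then split into disjoint, ordered outputs.
    merged = list(heapq.merge(*rowsets))
    return [merged[i:i + max_rows_per_output]
            for i in range(0, len(merged), max_rows_per_output)]
```

Before compacting `["a", "d", "g"]` and `["b", "e", "h"]`, a lookup of key `e` must consult both inputs because both key ranges cover it; after compaction into `["a", "b", "d"]` and `["e", "g", "h"]`, only one output does.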

The read path must consider the MemRowSet in memory as well as one or more DiskRowSets on disk; at the level of the Scanner's high-level abstraction, it should be similar to HBase. Some notable optimizations:

By comparing the Scan's range with each DiskRowSet's Primary Key range, DiskRowSets that need not participate in the Scan can be filtered out up front.

For record-level changes, the Delta Store records the Offset of the corresponding original row in the Base Data, which makes it faster to determine whether a record has been changed.

Because the underlying files of DiskRowSets are organized by column, a query that filters on some columns can first eliminate unneeded Primary Keys. Kudu does not read all the columns of a row at the outset; it first reads the columns involved in the filter conditions, matches them against the predicates, and only then decides whether to read the remaining columns of the qualifying rows. This saves disk IO, and it is the Lazy Materialization feature that Kudu provides.
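The two scan-side optimizations above, key-range pruning and Lazy Materialization, can be sketched together. `ToyColumnarRowSet` and `scan` are hypothetical; each column is a plain Python list, and `values_read` counts how many column values were fetched as a rough proxy for disk IO. The key and filter columns are read first, and the projected columns are fetched only for rows that pass.

```python
from typing import Any, Callable, Dict, List


class ToyColumnarRowSet:
    """Hypothetical columnar DiskRowSet: one list per column."""

    def __init__(self, columns: Dict[str, List[Any]], key_column: str) -> None:
        self.columns = columns
        keys = columns[key_column]
        self.min_key, self.max_key = min(keys), max(keys)
        self.values_read = 0  # proxy for disk IO

    def read(self, name: str, indices: List[int]) -> List[Any]:
        self.values_read += len(indices)
        column = self.columns[name]
        return [column[i] for i in indices]


def scan(rowsets: List[ToyColumnarRowSet], key_column: str, key_lo: Any, key_hi: Any,
         filter_column: str, predicate: Callable[[Any], bool],
         projection: List[str]) -> List[Dict[str, Any]]:
    results: List[Dict[str, Any]] = []
    for rs in rowsets:
        # 1. Prune whole rowsets whose key range cannot intersect the scan range.
        if rs.max_key < key_lo or rs.min_key > key_hi:
            continue
        all_rows = list(range(len(rs.columns[key_column])))
        # 2. Read only the key and filter columns first.
        keys = rs.read(key_column, all_rows)
        values = rs.read(filter_column, all_rows)
        matches = [i for i in all_rows
                   if key_lo <= keys[i] <= key_hi and predicate(values[i])]
        # 3. Materialize the projected columns only for the matching rows.
        fetched = {name: rs.read(name, matches) for name in projection}
        results.extend({name: fetched[name][j] for name in projection}
                       for j in range(len(matches)))
    return results
```

In a small example with one rowset of four rows where only one row matches, the scan reads 4 keys, 4 filter values, and 1 payload (9 values instead of 12), and a second rowset outside the key range is never read at all.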

2.7 Raft

Kudu uses the Raft protocol for data consensus among replicas. Raft is a consensus protocol that is easier to understand and simpler than Paxos.

For more information about Raft, see: https://raft.github.io/
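One core Raft rule is easy to show in code: the leader may treat a log entry as committed once it is stored on a majority of replicas. The sketch computes that index from each replica's highest replicated log index; the parameter name echoes Raft's matchIndex, but the function itself is hypothetical. It deliberately omits Raft's additional requirement that the entry belong to the leader's current term.

```python
from typing import List


def majority_commit_index(match_indexes: List[int]) -> int:
    """Highest log index replicated on a majority of replicas.

    match_indexes holds, for every replica including the leader, the highest
    log index known to be stored there. Raft's extra check that the entry is
    from the leader's current term is omitted in this sketch.
    """
    ordered = sorted(match_indexes, reverse=True)
    majority = len(ordered) // 2 + 1
    # The majority-th largest value is stored on at least `majority` replicas.
    return ordered[majority - 1]
```

With three replicas a majority is two, so `majority_commit_index([5, 3, 4])` is 4: a single lagging or failed replica does not block commits.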

3 Differences Between Kudu and HBase

To summarize, here are the major differences between Kudu and HBase:

Kudu offers more varied data partitioning schemes, while HBase's are more limited.

Kudu's Tablets have their own replication mechanism, while HBase's Regions rely on the replication of the underlying HDFS.

Kudu uses the local file system directly, while HBase depends on HDFS.

Kudu's underlying file format is a Parquet-like columnar format, while HBase's underlying HFiles are organized by row.

Kudu can automatically adjust its underlying Flush and Compaction tasks according to whether the system is busy or idle. HBase does not yet have this scheduling capability.

Kudu's Compaction makes no Minor/Major distinction and caps the total IO of each Compaction at 128MB, so there are no long-running Compaction tasks. Compaction runs on demand; for example, if all writes are sequential, no Compaction is triggered.
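The 128MB cap and the "sequential writes need no compaction" behavior can be illustrated with a greedy selection sketch. It is a simplification under stated assumptions, not Kudu's actual compaction policy: for each rowset it gathers the rowsets whose key ranges overlap it, adds them smallest first while the total stays within the IO budget, and keeps the largest group found. `RowSetMeta` and `pick_compaction` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RowSetMeta:
    """Hypothetical metadata for one DiskRowSet."""
    name: str
    min_key: str
    max_key: str
    size_mb: int


def overlaps(a: RowSetMeta, b: RowSetMeta) -> bool:
    return a.min_key <= b.max_key and b.min_key <= a.max_key


def pick_compaction(rowsets: List[RowSetMeta], budget_mb: int = 128) -> List[str]:
    """Greedy sketch: largest overlapping group whose total size fits the budget."""
    best: List[RowSetMeta] = []
    for anchor in rowsets:
        if anchor.size_mb > budget_mb:
            continue
        chosen, total = [anchor], anchor.size_mb
        others = sorted((r for r in rowsets if r is not anchor and overlaps(r, anchor)),
                        key=lambda r: r.size_mb)
        for r in others:  # smallest first, while the IO budget allows
            if total + r.size_mb <= budget_mb:
                chosen.append(r)
                total += r.size_mb
        # Compacting a single rowset would not reduce overlap, so require >= 2.
        if len(chosen) >= 2 and len(chosen) > len(best):
            best = chosen
    return sorted(r.name for r in best)
```

Rowsets produced by purely sequential writes have disjoint key ranges, so no group of two or more ever forms and the function returns an empty list, matching the on-demand behavior described above.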

Kudu's design balances analytical query capability with random read/write capability, which inevitably comes at some cost. For example, the Primary Key uniqueness constraint requires checking whether a Primary Key already exists before writing, which inevitably increases write latency. And although the underlying files use a Parquet-like columnar design, the long, HBase-like read path also affects analytical queries. In addition, this design is relatively costly when reading entire rows.

4 Integration with Existing Systems

Kudu integrates with the following systems:

MapReduce: provides Input and Output integration for jobs over Kudu tables.

Spark: provides integration with Spark SQL and DataFrames.

Impala: Kudu itself provides no shell or SQL parser, so its SQL capability comes from its integration with Impala. This integration is aware of the locality of Kudu table data and makes full use of the filters Kudu provides to optimize queries; Impala's own DDL/DML syntax has also been extended for Kudu. Cloudera will presumably keep investing in the Impala-Kudu integration.

5 Where Kudu Fits

In his talk "Kudu: Resolving transactional and analytic trade-offs in Hadoop" at Strata+Hadoop World 2015, Todd Lipcon described the scenarios Kudu is suited for as follows:

6 Analysis of Kudu Benchmark Data

Below is a brief analysis of some of the benchmark results provided in the Kudu white paper (see Section 6 of the paper for the full results):

1. TPC-H: a comparison of Impala on Parquet and Impala on Kudu showed that Impala on Kudu was on average 31% faster than Impala on Parquet. This comes from Kudu's Lazy Materialization feature and its improved CPU efficiency.

2. Impala-Kudu vs. Phoenix-HBase: the test used the lineitem table from TPC-H, loading 62GB of data in CSV format. Data was loaded into Phoenix with Phoenix's CsvBulkLoadTool. Some of the test configuration:

The Phoenix table was split into 100 Hash Partitions; 100 Tablets were created for Kudu.

HBase used the default Block Cache policy, with 9.6GB of cache memory configured per RegionServer. Kudu was configured with 1GB of in-process Block Cache memory, but also relies on the operating system's buffer cache.

The HBase table used the FAST_DIFF Block Encoding algorithm, with no compression enabled.

After the data was loaded into HBase, a Major Compaction was triggered manually to ensure data locality. The 62GB of raw data took about 570GB after loading into HBase (because compression was not enabled, and because storing each column as a separate cell inflates the data), while in Kudu it took about 227GB. The comparison scenarios and results are as follows:

Except for whole-row lookups by key, where Phoenix has a clear advantage, Impala-Kudu has a clear advantage for full-table scans and for queries on a subset of columns.

Scan + Filter queries are not HBase's strength to begin with.

3. Random read/write comparison

The comparison scenarios were as follows:

The comparison results were as follows:

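For the load phase and the Zipfian-distribution workloads, HBase's advantage is more pronounced; Kudu is currently working on optimizations for the Zipfian case (KUDU-749). Under the Uniform distribution, HBase's advantage is smaller. Overall, for random reads and writes, Kudu's design is at some disadvantage relative to HBase; this is part of the price paid for supporting analytical queries.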

 

 

 
   