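Editor's recommendation:
This is an introductory article aimed at newcomers. It covers what Kudu is and what it offers, and closes with a small case study. We hope you find it helpful.
This article comes from cnblogs and was edited and recommended by Anna of 火龙果软件.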
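What Is Cloudera Kudu?
Kudu is a high-speed, distributed, columnar storage database that Cloudera began developing in secret in 2012, positioned between HDFS and HBase. It combines HBase's real-time access, HDFS's high throughput, and the SQL support of traditional databases. As a storage system that sits between real-time and offline workloads, its role resembles Spark's role among compute systems. If MapReduce + HDFS is the standard stack for offline computing and Storm + HBase is the standard stack for real-time computing, then Spark + Kudu may become one of the most competitive architectures in the future.
That is, a Kafka -> Spark -> Kudu architecture. Whether it will become popular remains to be seen.
Kudu is a new columnar storage system open-sourced by Cloudera and one of the newer members of the Apache Hadoop ecosystem (incubating). It is designed for fast analytics on rapidly changing data and fills a gap in the Hadoop storage layer.
Kudu's development was led by Todd Lipcon at Cloudera. Its overall usage pattern is close to HBase's: it supports row-level random reads and writes as well as bulk sequential scans.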
Kudu is a columnar storage manager developed for the Apache Hadoop platform. It shares the common technical properties of Hadoop ecosystem applications: it runs on commodity hardware, is horizontally scalable, and supports highly available operation.
Kudu's goals are: fast analytics over full datasets together with real-time processing; full use of modern CPU and I/O resources; support for data updates; and a simple, extensible data model.
From the official Kudu website:

A new addition to the open source Apache Hadoop
ecosystem, Apache Kudu completes Hadoop's storage
layer to enable fast analytics on fast data.

Background: A Gap in Functionality
The Hadoop ecosystem has many components, each with a different function. In practice, users often have to deploy several Hadoop tools together to solve a single problem; this is called a hybrid architecture. For example, users rely on HBase's fast inserts and fast random-access reads to ingest data. HBase also lets users modify data and is very quick at serving large numbers of small queries. At the same time, users run HDFS/Parquet + Impala/Hive to query and analyze very large datasets, a scenario where a columnar file format such as Parquet has a big advantage.
Many companies have successfully deployed the HDFS/Parquet + HBase hybrid architecture, but it is complex and hard to maintain. First, data is loaded into HBase with an ingest tool such as Flume or Kafka, and users may modify it there. Then, at regular intervals (daily or weekly), the data is exported from HBase into Parquet files and placed on HDFS as a new partition. Finally, a compute engine such as Impala queries it to produce the final reports.

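Such a toolchain is tedious and complex, and it raises many problems, for example:
(1) How should a failure in one of the steps be handled?
(2) How often should data be exported from HBase to files?
(3) When the final report is generated, the most recent data is not reflected in the query results.
(4) How can critical jobs be kept from failing during cluster maintenance?
(5) Parquet is immutable, so when historical data is deleted or modified in HBase, manual intervention is often needed to keep the two in sync.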
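Problems (3) and (5) can be made concrete with a toy model. The sketch below is only an illustration in plain Python: `HybridPipeline`, its fields, and the sample rows are hypothetical names invented here, with a dictionary standing in for HBase and a list of frozen dictionaries standing in for Parquet partitions on HDFS. It does not use any real Hadoop API.

```python
from dataclasses import dataclass, field


@dataclass
class HybridPipeline:
    """Toy stand-in for a mutable store plus immutable exported partitions."""
    serving_store: dict = field(default_factory=dict)  # plays the role of HBase
    partitions: list = field(default_factory=list)     # plays the role of Parquet files
    exported_keys: set = field(default_factory=set)

    def ingest(self, key, value):
        # New or modified rows always land in the mutable store first.
        self.serving_store[key] = value

    def export_batch(self):
        # Periodic job: copy rows that were never exported into a new partition.
        new_rows = {k: v for k, v in self.serving_store.items()
                    if k not in self.exported_keys}
        if new_rows:
            self.partitions.append(dict(new_rows))
            self.exported_keys.update(new_rows)

    def report_total(self):
        # Analytic queries read only the exported partitions.
        return sum(v for part in self.partitions for v in part.values())


pipeline = HybridPipeline()
pipeline.ingest("order-1", 10)
pipeline.ingest("order-2", 20)
pipeline.export_batch()
pipeline.ingest("order-3", 5)    # arrives after the export
pipeline.ingest("order-1", 12)   # late change to a row that was already exported

stale_total = pipeline.report_total()
live_total = sum(pipeline.serving_store.values())
print(stale_total, live_total)
```

In this toy run the report misses both the new row and the late modification until someone re-exports or patches the old partition, which is the synchronization burden described in the list above.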
At this point users want an elegant storage solution that can handle different types of workloads while keeping high computing performance.
Cloudera recognized this problem early. It began planning the Kudu storage system in 2012 and released it as open source in 2015. Kudu complements HDFS and HBase: it provides fast analytics and real-time computing, makes full use of CPU and I/O resources, supports in-place data modification, and offers a simple, extensible data model.
Background: New Hardware
RAM technology has advanced quickly: it keeps getting cheaper and larger. Cloudera's customer data shows that the servers its customers deployed had only 32 GB of RAM per node in 2012, and have since grown to 128 GB or 256 GB per node. Storage devices are also changing quickly, and SSDs are now common in ordinary servers. HBase, HDFS, and other Hadoop tools keep improving to adapt to these hardware upgrades. Fundamentally, however, HDFS is based on GFS from 2003 and HBase on BigTable from 2005, a time when the system bottleneck was mainly the speed of the underlying disks. When disks are slow, the root cause of low CPU utilization is the disk bottleneck; once disks get faster, CPU utilization rises and the CPU often becomes the bottleneck instead. HBase and HDFS are old enough that their basic architecture is hard to change, whereas Kudu is a completely new design, so it can make fuller use of RAM and I/O resources and optimize CPU usage. In short, compared with earlier systems, Kudu uses less CPU, uses more I/O, and makes fuller use of RAM.
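1. Introduction to Kudu
Kudu was designed from the start to address the following: high performance for both data scans and random access, which simplifies users' complex hybrid architectures; high CPU efficiency, so that users get the best return on the advanced processors they buy; high I/O performance, making full use of modern storage media; and in-place data updates, which avoid extra data processing and data movement.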
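The combination of scans, random access, and in-place updates in one store can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not Kudu's storage format or API: `ToyColumnTable`, its methods, and the sample rows are hypothetical names made up for this example.

```python
class ToyColumnTable:
    """In-memory table that keeps each column in its own list and indexes rows by primary key."""

    def __init__(self, key_column, columns):
        self.key_column = key_column
        self.columns = {name: [] for name in [key_column, *columns]}
        self.row_index = {}  # primary key -> row position

    def upsert(self, row):
        key = row[self.key_column]
        pos = self.row_index.get(key)
        if pos is None:
            # Insert: append one value to every column (None if not supplied).
            self.row_index[key] = len(self.columns[self.key_column])
            for name, values in self.columns.items():
                values.append(row.get(name))
        else:
            # Update in place: overwrite only the columns that were supplied.
            for name, value in row.items():
                self.columns[name][pos] = value

    def get(self, key):
        # Random access: one index lookup, then one read per column.
        pos = self.row_index[key]
        return {name: values[pos] for name, values in self.columns.items()}

    def scan(self, column):
        # Analytic scan: touches a single column and nothing else.
        return list(self.columns[column])


table = ToyColumnTable("user_id", ["city", "clicks"])
table.upsert({"user_id": 1, "city": "Beijing", "clicks": 3})
table.upsert({"user_id": 2, "city": "Shanghai", "clicks": 7})
table.upsert({"user_id": 1, "clicks": 4})  # in-place update of an existing row

total_clicks = sum(table.scan("clicks"))
print(table.get(1), total_clicks)
```

Because the update changes the row where it lives, the next scan sees it immediately and no separate export step is required. A real system also has to handle persistence, concurrency, and compression, which this toy ignores.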
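2. Kudu Supports Cross-Data-Center Replication
Many of Kudu's features resemble HBase's: it supports lookups and modifications by index key.
Cloudera once considered building on a modified HBase, but concluded that the changes would be very large, since Kudu's data model and on-disk storage both differ from HBase's. HBase itself is already used successfully in many other scenarios, so modifying it would likely have been a thankless effort. In the end Cloudera decided to develop a completely new storage system.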

3. Kudu's External Interfaces
Kudu provides C++ and Java APIs for reading and writing data one row at a time or in batches, and for creating and altering schemas. Beyond that, Kudu is being integrated with other tools in the Hadoop ecosystem. At present the Kudu beta has fairly complete Impala support: most operations, such as creating tables and deleting or modifying data, can be done through Impala. Kudu also implements KuduTableInputFormat and KuduTableOutputFormat, which enable MapReduce reads and writes, with support for data locality. Spark support is not yet complete: Spark can only read data.
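4. Nodes
Kudu-master: the master node. It maintains table metadata and tracks and coordinates the state and data of all tservers. Install an odd number of masters (at least three).
Kudu-tserver: the worker node, which stores the actual table data. A table's data can have multiple replicas, but only the leader replica handles write requests; both the leader and the followers can serve read requests. Install at least three tservers.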
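The advice to run an odd number of nodes, at least three, follows from majority-quorum arithmetic. The sketch below assumes a majority-based consensus scheme (Kudu's documentation describes Raft, which works this way); the function names `majority` and `tolerated_failures` are made up for this example and are not part of any Kudu API.

```python
def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas can fail while a majority is still reachable."""
    return replicas - majority(replicas)


for n in (1, 2, 3, 4, 5):
    print(f"replicas={n} majority={majority(n)} tolerated_failures={tolerated_failures(n)}")
```

Under this assumption, three replicas survive one failure, while a fourth replica adds cost without adding fault tolerance, and five replicas are needed to survive two failures.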
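Use Case: Xiaomi
Xiaomi is used as the case study here because it has been at the forefront of Kudu adoption.
Xiaomi is a heavy HBase user, with about 5 billion user records per day. Xiaomi has also been using an HDFS + HBase hybrid architecture. The resulting pipeline is fairly complex, with data stored in SequenceFile, HBase, and Parquet.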

After adopting Kudu, Kudu serves as the unified data warehouse and supports both offline analytics and real-time interactive analytics.
