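Editor's pick:
This article introduces Apache Kylin in detail from four angles: concepts, working mechanism, data backup, and strengths and weaknesses.
The article comes from Xiaomi Operations and was edited and recommended by Anna of Huolongguo Software.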
Introduction to Apache Kylin
1. Apache Kylin is an open-source distributed pre-processing engine for massive data. Through an ANSI SQL interface, it provides multidimensional analysis (OLAP) on extremely large Hadoop-based datasets (TB to PB scale).

2. Kylin enables sub-second-latency queries on extremely large datasets in three steps:
1) Identify a star-schema dataset on Hadoop.
2) Build a data cube from it.
3) Query through interfaces such as ODBC, JDBC, and the RESTful API with sub-second latency.
Core Concepts of Apache Kylin
1. Table: Tables are defined in Hive and are the data source of a data cube. They must be synced into Kylin before a cube is built.
2. Model: A model describes a star-schema data structure. It defines the join and filter relationships between one fact table and multiple lookup tables.
3. Cube: A cube defines the model it uses, the dimensions of the tables in that model, the measures (generally aggregate functions such as sum, count, and average), and rules such as how segments are partitioned and how segments are auto-merged.
4. Cube Segment: A segment is the data carrier produced by a cube build. One segment maps to one table in HBase, and each build of a cube instance produces a new segment. When the source data of an already built cube changes, only the segments associated with the changed time range need to be refreshed.
5. Job: A build request issued against a cube instance produces a job. The job records the information of every step of the build, and its status reflects the outcome of the build. A status of RUNNING means the cube instance is being built, FINISHED means the build succeeded, and ERROR means the build failed. All job statuses are listed below:
1) NEW - The job has just been created.
2) PENDING - The job is paused by the job scheduler and is waiting for resources.
3) RUNNING - The job is in progress.
4) FINISHED - The job has finished successfully.
5) ERROR - The job was aborted with errors.
6) DISCARDED - The job was cancelled by the end user.
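
As a rough illustration of how a client could interpret these states, the sketch below models them as a Python enum. The `JobStatus` class and the `build_outcome` helper are hypothetical names introduced here and are not part of Kylin's API; only the six state names come from the list above.

```python
from enum import Enum


class JobStatus(Enum):
    """The six job states listed above (names taken from the text)."""
    NEW = "NEW"
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"
    ERROR = "ERROR"
    DISCARDED = "DISCARDED"


def build_outcome(status_text):
    """Map a job status string to what it says about the cube build.

    Hypothetical helper for illustration only.
    """
    status = JobStatus(status_text.strip().upper())
    if status is JobStatus.FINISHED:
        return "cube build succeeded"
    if status is JobStatus.ERROR:
        return "cube build failed"
    if status is JobStatus.DISCARDED:
        return "cube build cancelled"
    # NEW, PENDING and RUNNING all mean the build has not completed yet.
    return "cube build not finished yet"
```

A status string that is not one of the six names raises `ValueError`, which makes unexpected states visible instead of silently ignoring them.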
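How Apache Kylin Works
1. The secret behind Apache Kylin's low (sub-second) latency is precomputation: for a data cube with a star topology, Kylin precomputes the measures for many combinations of dimensions, stores the results in HBase, and exposes JDBC, ODBC, and REST API query interfaces, which makes real-time queries possible. A data cube is generally made up of one fact table and multiple lookup tables in Hive. In Kylin, this precomputation is the cube build process, as shown in the figure below: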

2. Apache Kylin currently builds data cubes with the by-layer cubing algorithm (By Layer Cubing). A future release will adopt the fast cubing algorithm (Fast Cubing).
A brief introduction to by-layer cubing follows:
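A complete data cube consists of layers: the N-dimension cuboid, the (N-1)-dimension cuboids, the (N-2)-dimension cuboids, and so on down to the 0-dimension cuboid. Except for the N-dimension cuboid, which is computed from the source data, the cuboids in every layer can be computed from the cuboids in their parent layer. The core of the algorithm is therefore N sequential rounds of MapReduce computation.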
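In the MapReduce model, the key is composed of a combination of dimensions and the value is composed of a combination of measures. When a mapper reads a key-value pair, it computes all of its child cuboids: for each child cuboid, the mapper removes one dimension from the key and emits the new key and the value to the reducer. The cube is complete only after all layers have been computed. The process is shown in the figure below.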
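
The layer-by-layer idea described above can be sketched in a few lines of plain Python. This is a single-process illustration under simplifying assumptions, not Kylin's MapReduce implementation: there is one measure, it is aggregated with sum, and the function name `by_layer_cubing` and the sample data are made up for the example.

```python
from collections import defaultdict


def by_layer_cubing(fact_rows, dimensions):
    """Compute every cuboid of a cube, one layer at a time.

    fact_rows:  list of (row_dict, measure) pairs from the fact table.
    dimensions: ordered list of dimension names (N of them).
    Returns {cuboid_dimensions: {key_tuple: summed_measure}}.
    """
    # Layer N: the base cuboid is the only one computed from source data.
    base = defaultdict(int)
    for row, measure in fact_rows:
        base[tuple(row[d] for d in dimensions)] += measure

    cube = {tuple(dimensions): dict(base)}
    current_layer = dict(cube)

    # N sequential rounds: each round derives one layer from its parent layer.
    for _ in range(len(dimensions)):
        next_layer = {}
        for parent_dims, parent_data in current_layer.items():
            for i in range(len(parent_dims)):
                child_dims = parent_dims[:i] + parent_dims[i + 1:]
                if child_dims in next_layer:
                    continue  # already derived from another parent cuboid
                child = defaultdict(int)
                for key, value in parent_data.items():
                    # "Map": drop dimension i from the key.
                    # "Reduce": sum values that now share the same key.
                    child[key[:i] + key[i + 1:]] += value
                next_layer[child_dims] = dict(child)
        cube.update(next_layer)
        current_layer = next_layer
    return cube


rows = [
    ({"country": "CN", "year": 2015}, 10),
    ({"country": "CN", "year": 2016}, 20),
    ({"country": "US", "year": 2015}, 5),
]
cube = by_layer_cubing(rows, ["country", "year"])
```

With two dimensions the result holds four cuboids: `("country", "year")`, `("country",)`, `("year",)` and the 0-dimension cuboid `()`, which contains the grand total.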

3. After the cube has been computed, a task named Convert Cuboid Data to HFile converts the reduce output (cuboid data) into HBase's storage format (HFile), and the HFiles are finally loaded into an HBase table for querying. The rowkey of the table is composed of the dimension combination, and the measure values for that combination make up the column family. To reduce storage space, the values of the rowkey and the column family are encoded and compressed; the default codec is Snappy.
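
The text does not spell out how the rowkey values are encoded. One common space-saving technique is dictionary encoding, where each distinct dimension value is replaced by a small integer ID. The sketch below illustrates that idea only; it is an assumption made for illustration, not Kylin's actual rowkey layout, and the helper names are hypothetical.

```python
def build_dictionary(values):
    """Assign a small integer ID to each distinct dimension value."""
    return {value: index for index, value in enumerate(sorted(set(values)))}


def encode_rowkey(dim_values, dictionaries):
    """Encode one dimension combination as a compact byte string.

    Assumes fewer than 256 distinct values per dimension, so that each
    dimension fits in a single byte (a simplification for this sketch).
    """
    return bytes(dictionaries[i][value] for i, value in enumerate(dim_values))


dictionaries = [
    build_dictionary(["CN", "US", "CN"]),      # country dimension
    build_dictionary([2015, 2016, 2015]),      # year dimension
]
rowkey = encode_rowkey(("US", 2016), dictionaries)
```

Here the combination `("US", 2016)` shrinks to two bytes, compared with the six or more bytes needed to store the raw values as text.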

4. The overall cube build flow is as follows:

5. The Apache Kylin architecture is as follows:

6. Core components:
1) Cube Build Engine: the underlying computation engines currently supported include MapReduce1, MapReduce2, and Spark.
2) REST Server: Kylin currently serves requests through its REST API, JDBC, and ODBC interfaces.
3) Query Engine: after the REST Server receives a query request, the query engine parses the SQL statement, generates an execution plan, forwards the query to HBase, and finally returns the result to the REST Server.
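
To make the REST Server's role concrete, the sketch below builds (but does not send) a query request using only the Python standard library. It assumes Kylin's query endpoint at `/kylin/api/query` with HTTP basic authentication; the host, port, credentials, project name, and table name are placeholders chosen for the example and will differ in a real deployment.

```python
import base64
import json
import urllib.request


def build_query_request(base_url, sql, project, user, password):
    """Build a POST request carrying a SQL query for the REST Server."""
    payload = json.dumps({"sql": sql, "project": project}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/kylin/api/query",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# All argument values below are placeholders, not values from the article.
request = build_query_request(
    "http://localhost:7070",
    "select count(*) from kylin_sales",
    "learn_kylin",
    "ADMIN",
    "KYLIN",
)

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Keeping request construction separate from sending makes the request easy to inspect or log before it reaches a real cluster.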
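Apache Kylin Metadata Backup
1. Backing up metadata
Kylin organizes all of its metadata (including cube descriptions and instances, projects, inverted index descriptions and instances, jobs, tables, and dictionaries) as a hierarchical file system. However, Kylin stores the metadata in HBase rather than in an ordinary file system. If you look at Kylin's configuration file (kylin.properties), you will find a line like this: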
## The metadata store in hbase
kylin.metadata.url=kylin_metadata@hbase
This means the metadata is stored in an HBase table (htable) named "kylin_metadata". You can scan that htable in the HBase shell to inspect it.
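
As a small illustration of how that setting can be read, the sketch below parses properties-style text and splits the `kylin.metadata.url` value at the `@` sign. The helper names are hypothetical, and the parser handles only simple `key=value` lines and `#` comments, which is enough for the line shown above.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if separator:
            props[key.strip()] = value.strip()
    return props


def metadata_location(props):
    """Split 'table@storage' into its two parts."""
    table, _, storage = props["kylin.metadata.url"].partition("@")
    return table, storage


sample = """## The metadata store in hbase
kylin.metadata.url=kylin_metadata@hbase
"""
table, storage = metadata_location(parse_properties(sample))
```

For the sample line, `table` is the htable name and `storage` is the backing store.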
2. Backing up the Metadata Store with the binary package
Sometimes you need to back up Kylin's Metadata Store from HBase to the disk file system. In that case, assuming you are on the Hadoop CLI (or sandbox) where Kylin is deployed, go to KYLIN_HOME and run:
./bin/metastore.sh backup
This exports your metadata to a local directory under KYLIN_HOME/meta_backups. The directory name is derived from the current time: KYLIN_HOME/meta_backups/meta_year_month_day_hour_minute_second.
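
The naming rule makes it easy to find the most recent backup programmatically. The sketch below reproduces the `meta_year_month_day_hour_minute_second` pattern and picks the newest name from a list. It assumes every field is zero-padded, which is what lets a plain string sort match chronological order; the helper names are hypothetical.

```python
from datetime import datetime


def backup_dir_name(moment):
    """Format a timestamp as meta_year_month_day_hour_minute_second."""
    return f"meta_{moment:%Y_%m_%d_%H_%M_%S}"


def latest_backup(names):
    """Return the newest backup directory name, or None if there is none.

    Relies on zero-padded fields so that lexicographic order equals
    chronological order (an assumption of this sketch).
    """
    candidates = sorted(name for name in names if name.startswith("meta_"))
    return candidates[-1] if candidates else None


example = backup_dir_name(datetime(2016, 3, 4, 5, 6, 7))
newest = latest_backup([
    "meta_2016_03_04_05_06_07",
    "meta_2016_11_01_00_00_00",
    "notes.txt",
])
```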
3. Restoring the Metadata Store with the binary package
If you find your metadata has been messed up and want to restore an earlier backup:
First, reset the Metadata Store (this wipes everything in Kylin's Metadata Store in HBase, so make sure you have a backup first):
./bin/metastore.sh reset
Then upload the backed-up metadata to Kylin's Metadata Store:
./bin/metastore.sh restore $KYLIN_HOME/meta_backups/meta_xxxx_xx_xx_xx_xx_xx
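
Because the reset step is destructive, it can help to write the whole sequence down before running anything. The sketch below only assembles the command strings from this section in a safe order (a fresh backup, then the reset, then the restore) and executes nothing. The function name and the example path are hypothetical.

```python
import shlex


def restore_plan(kylin_home, backup_name):
    """Return the commands for a restore, to be run from KYLIN_HOME.

    Nothing is executed here; the list is meant to be reviewed first.
    """
    backup_path = f"{kylin_home}/meta_backups/{backup_name}"
    return [
        "./bin/metastore.sh backup",   # safety copy of the current state
        "./bin/metastore.sh reset",    # wipes the Metadata Store in HBase
        f"./bin/metastore.sh restore {shlex.quote(backup_path)}",
    ]


# Placeholder install path and backup name, chosen for the example.
plan = restore_plan("/opt/kylin", "meta_2016_03_04_05_06_07")
```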
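4. Backing up / restoring metadata in a development environment
When developing and debugging Kylin, the typical setup is a development machine with an IDE plus a backend sandbox. You usually write code and run test cases on the development machine, and it is tedious to push the binary package to the sandbox every time just to check the metadata. A utility class named SandboxMetastoreCLI can help you download / upload metadata locally on the development machine.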
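5. Cleaning up unused resources from the Metadata Store
As time goes on, resources such as dictionaries and table snapshots become useless (because cube segments are discarded or merged), yet they still take up space. You can run commands to find and remove them:
First, run a check. This is safe because it does not change anything: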
./bin/metastore.sh clean
The resources that would be deleted are listed.
Next, add the "--delete true" parameter to actually clean up these resources. Before doing so, make sure you have backed up the metadata store:
./bin/metastore.sh clean --delete true
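
The two-step cleanup can be captured as a small guard that never proposes the destructive command unless a backup has been confirmed. This is a sketch that only assembles the two commands shown above; the function name and the `backup_confirmed` flag are hypothetical.

```python
def cleanup_commands(backup_confirmed):
    """Return the cleanup commands that are safe to run right now.

    The dry run is always included; the deleting run is added only
    when the caller confirms that a metadata backup exists.
    """
    commands = ["./bin/metastore.sh clean"]  # lists candidates, changes nothing
    if backup_confirmed:
        commands.append("./bin/metastore.sh clean --delete true")
    return commands


dry_run_only = cleanup_commands(backup_confirmed=False)
full_cleanup = cleanup_commands(backup_confirmed=True)
```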
Strengths and Weaknesses of Apache Kylin
1. Very stable performance. All the services Kylin depends on, such as Hive and HBase, are very mature, and Kylin's own logic is not complicated, so stability is well assured. In production, stability can currently be kept above 99.99%. Query latency is also fairly good.

2. A particularly important point is the requirement for data accuracy. At present only Kylin can actually meet it, so there are not many other options on this front.
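3. In terms of ease of use, Kylin also has many advantages. First, the surrounding services: anyone running a Hadoop stack generally already has Hive and HBase, so no extra work is needed. Deployment, operations, and usage costs are all relatively low. Kylin also exposes a general-purpose web server where any user can test and define cubes; an administrator only needs to review things before they go live, which makes for a much better experience.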
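4. Query flexibility. Business teams often ask what happens if a cube has not been defined. For now, the query simply fails. This shows that some query patterns are not fixed: someone may suddenly need a number once and never query it again. In fact, OLAP engines that require predefinition generally do not support this kind of need well.
From the dimension perspective, the number of dimensions is typically between 5 and 20, which suits Kylin reasonably well. Another characteristic is that there is usually a date dimension, which may cover a single day, a week, a month, or an arbitrary time range. There are also quite a few hierarchical dimensions; for example, an organizational hierarchy running from the top-level region down to the cell at the bottom is a typical hierarchical dimension.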
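From the metric perspective, the number of metrics is usually within 50. Kylin's limits on metrics are comparatively loose and can meet the need. Many of them are expression metrics, however, and in Kylin the argument of an aggregate function can only be a single column, so something like sum(if…) is not supported and special workarounds are needed. Another very important issue is data accuracy. In the OLAP field today, systems generally use approximate algorithms such as HyperLogLog for distinct counting, mainly out of cost considerations, but business scenarios sometimes require the data to be exact. This is therefore also a key problem to solve.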