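Editor's recommendation:
This article traces how data architecture evolved from the data warehouse to the data lake; read on for details.
This article comes from Bingo Cloud and was edited and recommended by Alice of 火龙果软件 (Huolongguo Software).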
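Seen from today's big-data perspective, the challenges facing traditional data warehouse technology are well understood by practitioners. A data architecture that has been in service for more than 20 years must have its merits; yet precisely because it is so old and carries so much legacy, moving it forward has become hard. In the era of cloud and 5G, ultra-dense network integration and the demand for big-data insight bring new challenges to enterprise customers. Moving from the data warehouse to the data lake is not only a change of architecture but also an upgrade in the way of thinking. This article attempts to trace the evolution of data architecture.
Contents
History of the data warehouse
Data warehouse concepts
Data warehouse architecture
Data cubes
Database modeling
Big data architecture
Data lake architecture
Evolution path in practice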
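1. History of the data warehouse
In the 1970s, the relational database research prototypes System R and INGRES appeared. Both were designed for on-line transaction processing (OLTP) applications (Qiu Yang's note: the Chinese term for OLTP is 联机事务处理). Truly usable relational database products did not appear until the 1980s, namely DB2 and INGRES.
Other databases, including Sybase, Oracle, and Informix, followed the same basic database model. The hallmarks of a relational database are row-oriented storage of relational tables, B-trees or derived tree structures for indexing, a cost-based optimizer, and ACID guarantees.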
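Around 1990 a new trend emerged: for business intelligence purposes, enterprises needed to collect data from multiple operational databases into a single data warehouse. Although the investment was huge and the functionality limited, enterprises that invested in data warehouses still achieved a good return on investment.
From then on, data warehouses began to support the business decision-making of large enterprises. The key technologies of a data warehouse include data modeling, ETL, OLAP, and reporting.
Major data warehouse vendors today include Oracle, IBM, Microsoft, SAS, Teradata, Sybase, and Business Objects (acquired by SAP).
The telecom industry was one of the earliest adopters of data warehouse technology. Because telecom companies operate in a fast-changing, highly competitive environment and have large customer bases, they generate and store massive amounts of high-quality data.
Telecom companies use data mining to reduce marketing costs, detect fraud, and better manage their networks.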
2. Data warehouse concepts
The definition proposed by Bill Inmon, the father of the data warehouse, in his 1991 book "Building the Data Warehouse" is widely accepted: a data warehouse is a subject-oriented, integrated, non-volatile, and time-variant collection of data used to support management decision making.
The definition is somewhat academic, but it pins down precisely what distinguishes a data warehouse from other database systems.
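"A data warehouse is a subject-oriented, integrated,
time-variant, and nonvolatile collection of data in
support of management's decision-making process."
-- W. H. Inmon
To understand the concept of a data warehouse, it helps to compare it with a database system.
The database emerged, and was defined, as "the single source of data for all processing".
Two factors drove the emergence of the database. First, before the 1970s, large numbers of applications and master files were stored in scattered places, which caused chaos and a great deal of redundant data. Second, the arrival of direct-access storage devices made addressing by record possible. DBMS-based online transaction processing opened up entirely new horizons for business.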
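A database system is designed for transaction processing, that is, for record updates and transactions. Data access is typically by primary key, in large numbers of small, atomic, isolated transactions. Concurrency and recoverability are the key properties, and maximum transaction throughput is the key metric; database designs reflect these requirements.
A data warehouse is designed for decision support. Historical, summarized, and aggregated data matter far more than the original records. The query workload consists mainly of ad hoc queries and complex queries that involve joins and aggregations.
Compared with a database system, query throughput and response time matter far more than transaction throughput.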
In a word, the difference between a data warehouse and a database system is the difference between OLAP and OLTP: a database supports OLTP, while a data warehouse supports OLAP (Qiu Yang's note: OLAP stands for On-Line Analytical Processing).
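The contrast between the two workloads can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the `orders` table, its columns, and the sample rows are hypothetical and do not come from the original article.

```python
import sqlite3

# Hypothetical table and data, used only to contrast the two access patterns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "north", 2014, 100.0),
        (2, "north", 2015, 150.0),
        (3, "south", 2014, 80.0),
        (4, "south", 2015, 120.0),
    ],
)

# OLTP style: a small, atomic transaction that touches one record by primary key.
with conn:  # commits on success, rolls back on error
    conn.execute(
        "UPDATE orders SET amount = amount + 10 WHERE order_id = ?", (3,)
    )
oltp_row = conn.execute(
    "SELECT amount FROM orders WHERE order_id = ?", (3,)
).fetchone()

# OLAP style: a read-only query that scans and aggregates many records.
olap_rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

print(oltp_row)
print(olap_rows)
```

The first statement is judged by how many such transactions per second the system sustains; the second is judged by how quickly one large scan returns.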
OLTP and OLAP also differ along another dimension: timeliness. OLTP places high demands on the timeliness of transactions, whereas OLAP does not.
-- Cao Hongwei
A data warehouse is generally implemented on top of a database, but it is deployed and maintained separately. A data warehouse implemented on a relational database is called ROLAP; one implemented on multidimensional data structures is called MOLAP.
3. Data warehouse architecture
A data warehouse is an architecture, not a technology. Its core falls into two parts:
RDBMS-based dimensional modeling
Cube-based OLAP queries

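The data warehouse architecture includes ETL tools that extract data from external data sources or databases. ETL is also responsible for transforming and cleansing the data and then loading it into the warehouse's storage. In general, the data is loaded into slower storage and kept there as raw data.
To improve query efficiency, the raw data is classified by subject and stored in aggregated form in data marts; this is called aggregated data.
As the figure below shows, raw data usually has several aggregation paths: the time dimension is the most basic built-in aggregation path, and administrative divisions and product attributes are other common ones.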

The architecture also includes front-end tools for querying, reporting, and data mining.
Last but most important, every data warehouse architecture contains a metadata repository used to build the warehouse.
The metadata repository holds database schemas and views, metadata for ETL, metadata for data aggregation, metadata for report rendering, SQL templates, and so on. Data warehouses often adopt a metadata-driven architecture, which makes this repository crucial.
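As for the concept of dimension mentioned above: a dimension is an angle from which things are observed, and also the hierarchy used to classify the data in a database fact table. In the data a dimension is represented as a column, and in SQL it is used for filtering and grouping.
Abstracting data along multiple dimensions as in the figure above, and building the multidimensional OLAP operations (roll up, drill down, slice and dice, pivot) on basic database operations such as select and group by, is called the multidimensional data model.
To facilitate complex analysis and visualization, data in a warehouse is usually modeled multidimensionally. Each dimension is called a hierarchy, and three dimensions form a data cube. Because dimensions are typically used for filtering and grouping, a data cube is also described as a union of group-bys.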
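The idea of a cube as a union of group-bys can be made concrete with a small sketch: compute one group-by result for every subset of the dimensions and collect them. The fact rows, the dimension names, and the helper functions `group_by` and `cube` are hypothetical and only serve as illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows with three dimensions and one measure.
facts = [
    {"year": 2014, "region": "north", "product": "phone", "sales": 10},
    {"year": 2014, "region": "south", "product": "phone", "sales": 20},
    {"year": 2015, "region": "north", "product": "tablet", "sales": 30},
]
dims = ("year", "region", "product")


def group_by(rows, keys, measure):
    """Sum `measure` for each distinct combination of the `keys` columns."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row[measure]
    return dict(totals)


def cube(rows, dims, measure):
    """Return one group-by result per subset of `dims` (2**len(dims) in total)."""
    return {
        keys: group_by(rows, keys, measure)
        for n in range(len(dims) + 1)
        for keys in combinations(dims, n)
    }


cuboids = cube(facts, dims, "sales")
print(len(cuboids))          # number of group-by results in the cube
print(cuboids[()])           # grand total (no grouping column)
print(cuboids[("year",)])    # totals per year
```

With three dimensions the cube holds eight group-by results, from the grand total up to the fully detailed grouping by all three columns.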
OLAP can therefore also be described as the collection of analysis-oriented operations built on the warehouse's multidimensional model.
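4. Data cubes
"Data cube" is just a vivid name for the multidimensional model. A cube has only three dimensions, but a multidimensional model is not limited to three and can combine more.
The name is used partly because it is easier to explain and describe and gives the mind something to picture, and partly to distinguish the model from the two-dimensional tables of a traditional relational database (see the figure below).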

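OLAP operations consist mainly of queries, that is, database SELECT operations, but the queries can be complex: a query on a relational database may join several tables and use aggregate functions such as COUNT, SUM, and AVG.
The multidimensional analysis operations of OLAP are drill-down, roll-up, slice, dice, and pivot, explained one by one as follows:
Even after seeing the cube operations in the figure above, some readers still find them abstract. Below is a SQL example that illustrates concrete data cube operations.
select // aggregate expressions must be used together with group by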
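Since the article's own SQL listing survives only as the fragment above, here is a separate minimal sketch of how the five operations might map to SQL, run through Python's `sqlite3` module. The `sales` table, its columns, and its rows are hypothetical; this is not the author's original example.

```python
import sqlite3

# Hypothetical sales table; the schema and rows are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "year INTEGER, quarter TEXT, region TEXT, product TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        (2015, "Q1", "north", "phone", 10),
        (2015, "Q1", "south", "phone", 20),
        (2015, "Q2", "north", "tablet", 30),
        (2015, "Q2", "south", "phone", 40),
    ],
)


def q(sql, params=()):
    """Run a query and return all rows."""
    return conn.execute(sql, params).fetchall()


# Roll-up: aggregate to a coarser level of the time dimension (year).
roll_up = q("SELECT year, SUM(amount) FROM sales GROUP BY year")

# Drill-down: go to a finer level (year -> quarter).
drill_down = q(
    "SELECT year, quarter, SUM(amount) FROM sales "
    "GROUP BY year, quarter ORDER BY quarter"
)

# Slice: fix one dimension to a single value.
slice_ = q(
    "SELECT quarter, product, SUM(amount) FROM sales "
    "WHERE region = ? GROUP BY quarter, product ORDER BY quarter",
    ("north",),
)

# Dice: restrict several dimensions at once to get a sub-cube.
dice = q(
    "SELECT region, SUM(amount) FROM sales "
    "WHERE quarter = 'Q2' AND product IN ('phone', 'tablet') "
    "GROUP BY region ORDER BY region"
)

# Pivot: turn the values of one dimension (quarter) into columns.
pivot = q(
    "SELECT region, "
    "SUM(CASE WHEN quarter = 'Q1' THEN amount ELSE 0 END) AS q1, "
    "SUM(CASE WHEN quarter = 'Q2' THEN amount ELSE 0 END) AS q2 "
    "FROM sales GROUP BY region ORDER BY region"
)

for name, rows in [("roll_up", roll_up), ("drill_down", drill_down),
                   ("slice", slice_), ("dice", dice), ("pivot", pivot)]:
    print(name, rows)
```

In every case the operation reduces to choosing which columns appear in WHERE and which in GROUP BY, which is why aggregate expressions always travel together with group by.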
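The strengths of OLAP rest on two things: the subject-oriented, integrated, history-preserving, and non-volatile data storage of the data warehouse, and the multi-angle, multi-level data organization of the multidimensional model. Without these two, OLAP would cease to exist and there would be no strengths to speak of.
Organizing data by a multidimensional model makes its presentation more intuitive. It mirrors the way we ordinarily look at things, discovering their different characteristics from multiple angles and at multiple levels; OLAP applies this everyday way of thinking to data analysis.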
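5. Database modeling
If the multidimensional data model is mapped onto a relational database and SQL queries (ROLAP), how should the database be designed?
Most data warehouses use a "star schema" to represent the multidimensional data model. A star schema has a single fact table and a separate table for each dimension.
Each tuple in the fact table holds foreign keys that point to the primary keys of the dimension tables, and the columns of each dimension table are the attributes that make up that dimension, as shown in the figure below.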
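A star schema of this kind might look as follows in a small `sqlite3` sketch. The table names (`fact_sales`, `dim_date`, `dim_product`), the columns, and the rows are hypothetical and chosen only to show one fact table joined to its dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One fact table in the middle, one table per dimension around it.
# All names and rows below are assumptions for illustration.
conn.executescript(
    """
    CREATE TABLE dim_date (
        date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER
    );
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY, name TEXT, category TEXT
    );
    CREATE TABLE fact_sales (
        date_id    INTEGER REFERENCES dim_date(date_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        amount     INTEGER
    );
    INSERT INTO dim_date VALUES (1, 2015, 1), (2, 2015, 2);
    INSERT INTO dim_product VALUES (1, 'phone', 'device'), (2, 'sim', 'service');
    INSERT INTO fact_sales VALUES (1, 1, 100), (1, 2, 5), (2, 1, 200);
    """
)

# A typical star-join query: join the fact table to its dimensions,
# then filter or group by dimension attributes.
rows = conn.execute(
    """
    SELECT d.month, p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_date AS d ON f.date_id = d.date_id
    JOIN dim_product AS p ON f.product_id = p.product_id
    GROUP BY d.month, p.category
    ORDER BY d.month, p.category
    """
).fetchall()
print(rows)
```

The fact table carries only keys and measures; every descriptive attribute used for grouping lives in a dimension table.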

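Another common design is the "snowflake schema". It improves on the star schema, which does not explicitly represent dimension hierarchies, by defining separate dimension tables; this amounts to normalizing the dimension tables, as shown in the figure below. The star schema, however, is better suited to browsing dimension hierarchies.

Besides fact tables and dimension tables, a data warehouse also needs pre-aggregation tables that store selected summary data.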
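6. Big data architecture
In "The Pathologies of Big Data", published in Communications of the ACM, Dr. Adam Jacobs, a senior software engineer at 1010data, points out that the pathology of big data lies in analysis rather than storage: we expect to get analysis results within minutes or seconds from data accumulated over months and years!
In effect the author is pointing out the pathology of relational databases in the big-data era. The figure below shows the performance of a data warehouse analysis SQL query once the data volume exceeds 1 million records.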

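Therefore, the data warehouse is regarded as a solution to the query performance problems of databases. In the 1990s people already faced a data explosion, and the data warehouse arose to solve the "big data" problem of that era.
In the early 1980s, big data meant data sets that exceeded the capacity of tape drives.
In the 1990s, big data meant data sets that exceeded the capacity of Microsoft Excel or a desktop PC.
Today, big data means data sets that exceed the capacity of relational databases.
Looking back at the history of data architecture from the vantage point of the big-data era, and thinking about the definition of big data from a technical angle: whatever data the currently popular technology cannot handle is big data.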
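The essence of a data warehouse is to make data smaller, generally in two ways:
First, through extraction, transformation, loading, and cleansing.
Second, through pre-aggregation, which produces a separate copy of the data. A data warehouse can therefore be defined as follows:
To facilitate querying and analysis, a separate copy of the data is taken from the relational database and then transformed through ETL or ELT.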
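For big data, simply building a data warehouse is not enough. How should data be structured to make analysis easier? How should databases and analysis tools be designed to process big data more efficiently?
Recognizing the inherent temporal and spatial attributes of big data is an important prerequisite for understanding why relational databases have performance problems with it.
If data is our recorded observation of the world, then big data is our repeated observation of the world along the dimensions of time and/or space. This is the spatio-temporal nature of big data, and also the principle on which the warehouse's multidimensional model is built.
Today's mainstream database model is the relational model, which explicitly ignores the order of rows in a table. This inevitably leads applications to query data in a non-sequential manner.
In that situation a traditional data architecture can ease the performance problem by introducing caches, whereas big data greatly amplifies the performance impact of suboptimal access patterns.
The figure below shows the difference between random access and sequential access.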

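Hence the conclusions we want to introduce and derive: denormalization, sequential storage, and immutable (append-only) data sets. Follow the storage stack downward, all the way to the data storage format: it is time to give up the relational database.
A brief explanation of denormalization. The guiding idea behind all the normal forms in classical relational database theory is normalization: reduce duplicated data, and where data would repeat, move it into a separate table linked by a foreign key, in order to save storage space (storage was expensive back then).
Denormalization, by contrast, allows duplication across columns, as shown in the figure below.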
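As a rough illustration of the trade-off, the sketch below flattens a normalized pair of tables into one wide, repetitive record set, so that an aggregation needs only a single sequential pass and no join. The `customers` and `orders` data and the `denormalize` helper are hypothetical.

```python
# Normalized form: customer attributes are stored once and referenced by key.
# All data here is hypothetical.
customers = {
    1: {"name": "alice", "city": "Beijing"},
    2: {"name": "bob", "city": "Shanghai"},
}
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 5},
    {"order_id": 11, "customer_id": 2, "amount": 7},
    {"order_id": 12, "customer_id": 1, "amount": 3},
]


def denormalize(orders, customers):
    """Copy the customer attributes into every order row (duplication allowed)."""
    wide = []
    for order in orders:
        customer = customers[order["customer_id"]]
        row = dict(order)
        row.update({"customer_" + key: value for key, value in customer.items()})
        wide.append(row)
    return wide


wide_orders = denormalize(orders, customers)

# With the wide rows, an aggregate by city is one sequential scan, no lookup.
totals_by_city = {}
for row in wide_orders:
    city = row["customer_city"]
    totals_by_city[city] = totals_by_city.get(city, 0) + row["amount"]

print(wide_orders[0])
print(totals_by_city)
```

The price is the repeated `customer_name` and `customer_city` values in every row, which is exactly the duplication that normalization was designed to remove when storage was expensive.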

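In my view, NoSQL key-value storage uses a minimal absence of structure to achieve the denormalization of structured storage. A key-value store is at once minimally structured and minimally unstructured.
On sequential storage and immutable data sets, see Pat Helland's "Immutability Changes Everything", which agrees with my description above.
Another discussion of traditional relational databases is "One Size Fits All" by the well-known database expert and 2014 Turing Award laureate Michael Stonebraker. From the two angles of data warehousing and stream processing, it argues that the database's unchanged, 25-year-old one-size-fits-all remedy no longer suits today's business needs.
The article's central idea is also very much in the same spirit as the Lambda architecture proposed by Nathan Marz.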

speed layer
(i) compensates for the high latency of updates to the serving layer
(ii) deals with recent data only
serving layer
(i) indexes the batch views
(ii) can be queried in a low-latency, ad hoc way
batch layer
(i) manages the master dataset (an immutable, append-only set of raw data)
(ii) pre-computes the batch views
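The interplay of the three layers listed above can be sketched in a few lines. This is a toy model under stated assumptions, not a real Lambda implementation: the page-view events, the function names `ingest`, `run_batch`, and `query`, and the use of in-memory counters are all hypothetical.

```python
from collections import Counter

# Batch layer input: the master dataset, an append-only list of raw events.
master_dataset = []


def ingest(event, speed_view):
    """Append the raw event to the master dataset and update the speed layer."""
    master_dataset.append(event)
    speed_view[event["page"]] += 1


def run_batch(speed_view):
    """Recompute the batch view from all raw data, then reset the speed layer.

    After a batch run the speed layer only has to cover events that arrive
    later, which is how it compensates for the batch layer's latency.
    """
    batch_view = Counter(event["page"] for event in master_dataset)
    speed_view.clear()
    return batch_view


def query(page, batch_view, speed_view):
    """Serving side: merge the precomputed batch view with the recent data."""
    return batch_view[page] + speed_view[page]


speed_view = Counter()
for page in ["home", "home", "cart"]:
    ingest({"page": page}, speed_view)

batch_view = run_batch(speed_view)      # batch view now covers three events

ingest({"page": "home"}, speed_view)    # arrives after the batch run
ingest({"page": "pay"}, speed_view)

home_hits = query("home", batch_view, speed_view)
pay_hits = query("pay", batch_view, speed_view)
print(home_hits, pay_hits, len(master_dataset))
```

Raw events are never updated in place; views are always derived from them, so a bug in a view can be fixed by recomputing it from the master dataset.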
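The Lambda architecture unifies the near-real-time online queries of the traditional data warehouse era, the newly emerging real-time stream processing (online), and batch data analysis (offline), giving data architects a comprehensive reference.
Combining it with semi-structured and structured data storage and a mix of SQL and NoSQL, we arrive at the following typical data architecture: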

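The discussion above concerns the micro-level of architecture; let us return to the macro-level guidance for big data architecture.
The industry's current consensus definition of big data is the five Vs, as shown in the figure below.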

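From a technical standpoint we need to focus on three of the Vs. After reading a large body of literature, I arrived at the following paradigm:
1. Leverage open-source software to handle the challenge of data variety
2. Use distributed technology to solve the problem of data volume
3. Use real-time stream processing to solve the problem of data velocity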

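For traditional OLAP the need for real-time results is not pronounced; the strong demand for real-time analysis is one of the reasons big data technology arose.
-- Cao Hongwei
On this basis, the big data architecture I personally recommend is BDAS, the Berkeley Data Analytics Stack. It not only covers the three dimensions mentioned above but also provides a blueprint for an entire big data architecture. There is a lot in it; tackle the components one by one as you use them. I will not go into detail here.

After all this discussion, here are the key points of big data architecture:
Distributed computing
Real-time stream processing
Online and offline
SQL and NoSQL: a hybrid architecture is also one of the evolution paths
Denormalization, sequential storage, and immutable data sets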
7. Data lake architecture
James Dixon, CTO of Pentaho, proposed the concept of the "data lake" in 2011. Facing the challenge of big data, he argued: stop thinking of a data "warehouse" and think of a data "lake". The major difference between the two concepts is that data must be classified in advance before it enters a warehouse, to ease future analysis. That was common in the OLAP era but makes no sense for offline analysis; it is better to first keep large amounts of raw data, which today's cheap storage makes possible.
Nearly unlimited potential for operational insight
and data discovery. As data volumes, data variety,
and metadata richness grow, so does the benefit.
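Pictured as in the figure below, the data lake architecture integrates multiple data sources without constraining the schema, which preserves the fidelity of the data. A data lake can meet the needs of real-time analysis and can also serve as a data warehouse for batch data mining. It also gives data scientists the chance to discover more insights from the data.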

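By comparison, the data warehouse is a highly structured architecture: data cannot be loaded into the warehouse until it has been transformed, and users get analysis-ready data directly. In a data lake, data is loaded directly and transformed later according to the needs of the analysis.

Below I have compiled a detailed comparison of the data warehouse and the data lake along several dimensions.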

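In summary, the data lake architecture has the following notable characteristics:
Data storage: high capacity at low cost
Data fidelity: the data lake stores data in its original format
Data usage: data in the lake can be used easily
Late binding: the data lake offers flexible, task-oriented data binding, with no need to define a data model in advance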
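Late binding, often called schema-on-read, can be illustrated with a small sketch: raw records are kept exactly as they arrived, and each analysis task applies its own schema only when it reads. The sample JSON lines, the field names, and the `read_with_schema` helper are hypothetical.

```python
import json

# The "lake": raw records kept in their original form, with no common schema.
# These sample lines are assumptions for illustration.
raw_lake = [
    '{"device": "bs-1", "ts": 1, "cpu": 0.5}',
    '{"device": "bs-2", "ts": 2, "temperature": 41, "cpu": 0.7}',
    '{"user": "u1", "ts": 3, "bytes": 1024}',
]


def read_with_schema(lines, fields):
    """Bind a task-specific schema at read time.

    Records that lack any of the requested fields are skipped for this task;
    they stay untouched in the lake for other tasks.
    """
    for line in lines:
        record = json.loads(line)
        if all(field in record for field in fields):
            yield tuple(record[field] for field in fields)


# Two different tasks bind two different schemas to the same raw data.
cpu_view = list(read_with_schema(raw_lake, ("device", "cpu")))
usage_view = list(read_with_schema(raw_lake, ("user", "bytes")))
print(cpu_view)
print(usage_view)
```

Nothing had to be decided about `cpu` or `bytes` when the records were written, which is the difference from the schema-on-write approach of a warehouse.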

Of course, criticism of the data lake architecture has never stopped. Some critics say that a pool of assorted, messy data is really a data swamp. Martin Fowler has also raised doubts about the security and privacy of the data in a data lake.
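8. Evolution path in practice
The current architecture is a typical data warehouse architecture, as shown in the figure below. Its design has the following key points:
ROLAP: built on the Oracle database, but without using Oracle's data warehouse product; the warehouse is built separately.
Metadata-driven architecture: metadata covers the entire data pipeline. When new data needs to be integrated, only new metadata has to be edited; the system itself needs no change.
Schema design: there are two main kinds of tables, raw data tables and aggregate tables. Each kind has three layers: the table, views used for aggregation, and views used for reporting. Different applications operate on the data through different views. When the structure of a raw data table changes, the views at each layer can be changed as needed.
Schema evolution: this is a fairly big topic. Relational data is schema-on-write, so adding any column requires altering the table structure, which causes long downtime for customer systems. The raw tables are therefore designed with 1000 columns (the maximum number of columns Oracle supports), and columns are only added, never removed. This avoids database schema changes and lowers the cost of migration between releases.
Data storage: raw data is purged periodically and only aggregated data is kept.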
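One way to picture the combination of a pre-allocated wide table, metadata, and views is the sketch below. It is a guess at the general mechanism, not the system described in the article: it uses SQLite instead of Oracle, 8 generic columns instead of 1000, and the names `raw_data`, `report_view`, `metadata`, `add_field`, `rebuild_view`, and `insert` are all hypothetical.

```python
import sqlite3

N_COLUMNS = 8  # stand-in for the 1000 pre-allocated columns in the article

conn = sqlite3.connect(":memory:")
generic_columns = ", ".join(f"c{i} TEXT" for i in range(1, N_COLUMNS + 1))
conn.execute(f"CREATE TABLE raw_data ({generic_columns})")

# Metadata maps logical field names to pre-allocated physical columns.
metadata = {"device": "c1", "cpu": "c2"}


def add_field(name):
    """Integrate a new field by editing metadata only; no ALTER TABLE needed."""
    used = set(metadata.values())
    free = next(
        f"c{i}" for i in range(1, N_COLUMNS + 1) if f"c{i}" not in used
    )
    metadata[name] = free
    return free


def rebuild_view():
    """Regenerate the reporting view from the current metadata."""
    select_list = ", ".join(f"{col} AS {name}" for name, col in metadata.items())
    conn.execute("DROP VIEW IF EXISTS report_view")
    conn.execute(f"CREATE VIEW report_view AS SELECT {select_list} FROM raw_data")


def insert(record):
    """Write a logical record into the physical columns given by the metadata."""
    columns = ", ".join(metadata[name] for name in record)
    marks = ", ".join("?" for _ in record)
    conn.execute(
        f"INSERT INTO raw_data ({columns}) VALUES ({marks})",
        list(record.values()),
    )


rebuild_view()
insert({"device": "bs-1", "cpu": "0.5"})

new_column = add_field("temperature")   # metadata edit, table untouched
rebuild_view()
insert({"device": "bs-2", "cpu": "0.7", "temperature": "41"})

rows = conn.execute(
    "SELECT device, temperature FROM report_view ORDER BY device"
).fetchall()
column_count = len(conn.execute("PRAGMA table_info(raw_data)").fetchall())
print(new_column, rows, column_count)
```

Applications read through the view, so a new field becomes visible without any change to the physical table, at the cost of opaque column names and untyped storage.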

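Why does the current architecture need to evolve?
First, it faces a scalability challenge. Database scalability relies mainly on Oracle RAC, which does not scale linearly and adds considerable management and maintenance cost. And because of hardware limits, vertical scaling is not a long-term solution.
Second, current storage costs are too high, so moving off IOE (IBM, Oracle, EMC) has become a goal.
Third, the need for real-time processing is also an important driver of the architecture's evolution.
The architecture then became this: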

Traditional SQL, re-based on cloud platforms, has been redefined as NewSQL; in the same way, the data warehouse can be redefined as the New Data Warehouse.
-- Cao Hongwei
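Whether this architecture is a New Data Warehouse, I do not know; it may be. The biggest change is replacing the Oracle database with HDFS and using SQL on Hadoop (for example Hive SQL or Spark SQL) to keep the SQL interface, so the front-end analysis engine stays unchanged. The metadata part keeps the original data modeling, and the way data is integrated does not change. Such an architecture inherits the classic warehouse architecture, improves scalability, and protects existing investment as far as possible while meeting business needs.
Some lessons learned during this evolution:
SQL on Hadoop is a must. Customers want continuity of the SQL interface.
Hybrid data warehouse architecture: use different storage (Oracle and HDFS) for different workloads. Large data volumes go to HDFS; volumes that are not large enough to pose a scalability challenge can stay in a relational database.
Denormalization has a major impact on performance. With a denormalized design, query performance comparable to a relational database can be reached. Whether denormalization has other effects still needs study.
Compared with sequence files and RC files, the ORC file format performs best.
The real-time pipeline is implemented with Storm and Kafka.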
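Just as there is NewSQL, there can be a New Data Warehouse: the fusion of the data warehouse with cloud computing, in which the warehouse's storage layer sits on a cloud platform and uses a distributed system. On the application side the existing ways of working remain valid, so assets are not wasted but effectively inherited. It is also a fairly safe step on the way to the data lake.
-- Cao Hongwei
Once Lao Cao put it that way, everything became clear. When we talk about the data warehouse architecture evolving toward a big data architecture, we are really talking about the New Data Warehouse architecture.
Just as the data warehouse originally emerged to make up for the limitations of database systems, today's big data platforms make up for the problems of data warehouse systems.
Their technical thinking, technical architecture, and user requirements are to some extent the same; put differently, the core ideas are the same. Where they differ, it is only in adjustments to the technical solution made for the sake of performance.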
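First, the data integration architecture. As the figure below shows, the Hadoop-based data integration architecture is consistent with the traditional one based on relational databases.
The difference is that, because of the growth in data volume, the architecture on the left stores its data on the Hadoop platform, which is characterized by denormalization, sequential storage, and immutable data sets.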

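Second, the data analysis method. Although the Hadoop-based data integration architecture uses the Hadoop data storage platform (with the built-in MapReduce data processing engine), its data operations and analysis methods follow the same idea: extracting valuable information from large data sets. The figure below shows the correspondence between the warehouse's operations (group-by-aggregation) and the MapReduce functions.
So the core idea of MapReduce is to turn the group-by-aggregation operation of the data warehouse into distributed execution on top of data partitioning. MapReduce and the traditional data warehouse share the same idea.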
The Map-Reduce programming model provides a good
abstraction of group-by-aggregation operations over
a cluster of machines.
The programmer provides a map function that performs
grouping and a reduce function that performs aggregation.
The underlying run-time system achieves parallelism
by partitioning the data and processing different
partitions concurrently using multiple machines.
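The correspondence described in the quotation can be sketched in plain Python: the map function does the grouping by emitting a key, and the reduce function does the aggregation per key. This single-process toy is an assumption-laden illustration, not Hadoop's API; the record layout and the names `map_fn`, `reduce_fn`, and `map_reduce` are hypothetical.

```python
from collections import defaultdict


def map_fn(record):
    """Grouping step: emit (group-by key, measure) for one input record."""
    yield record["region"], record["amount"]


def reduce_fn(key, values):
    """Aggregation step: combine all values that share the same key."""
    return key, sum(values)


def map_reduce(partitions, map_fn, reduce_fn):
    """Run map over every partition, shuffle by key, then reduce each key.

    In a real cluster each partition would be mapped on a different machine;
    here the partitions are processed one after another.
    """
    shuffled = defaultdict(list)
    for partition in partitions:
        for record in partition:
            for key, value in map_fn(record):
                shuffled[key].append(value)
    return dict(reduce_fn(key, values) for key, values in shuffled.items())


# Hypothetical data already split into two partitions.
partitions = [
    [{"region": "north", "amount": 10}, {"region": "south", "amount": 20}],
    [{"region": "north", "amount": 30}],
]

# Equivalent in spirit to: SELECT region, SUM(amount) ... GROUP BY region
totals = map_reduce(partitions, map_fn, reduce_fn)
print(totals)
```

Swapping `sum` for another aggregate in `reduce_fn` corresponds to changing the aggregate function in the SQL query, while the key emitted by `map_fn` plays the role of the group-by clause.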
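Innovation, inheritance, and development: that is probably what it looks like. No wonder Michael Stonebraker, together with David DeWitt, wrote "MapReduce: A major step backwards", arguing that MapReduce is a giant step backward, which set off a heated debate with the MapReduce community.
Google was even granted a patent on MapReduce in 2010, but in my view MapReduce is not a major fundamental innovation; in essence it is still data warehouse technology for the cloud era (the New Data Warehouse). Yet its fame as one of Google's "troika" of technologies led people to largely overlook the technical ideas of the traditional data warehouse and misdirected the technical worship of many young students.
This article therefore tries to offer a technical thread, Data Warehouse -> New Data Warehouse -> Data Lake, to explain the architectural evolution behind big data technology. It is offered as a starting point; criticism and corrections are welcome.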
A giant step backward in the programming paradigm
for large-scale data intensive applications.
Not novel at all -- it represents a specific implementation
of well known techniques developed nearly 25 years
ago.
To draw an analogy to SQL, map is like the group-by
clause of an aggregate query. Reduce is analogous
to the aggregate function (e.g., average) that is
computed over all the rows with the same group-by
attribute.
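On the basis of the New Data Warehouse architecture, how do we evolve toward the data lake?
Take the telecom industry as an example. NFV and SDN are driving the separation of the control plane and the data plane in telecom network equipment, and telecom equipment data will move toward a data lake architecture.
Equipment data will converge, operational data will converge, and eventually everything will converge into one. In summary, telecom big data's embrace of the data lake architecture is driven from four directions, which I express as four derivation chains: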
5G -> Big Data (semi-structured and unstructured) -> Modern Data Architecture for Enterprise -> Data Lake Storage Architecture -> Data Lake
Cloud -> Network Function Cloudification -> Network Function Virtualization -> stateless VNF -> Distributed Sharing Storage -> Data Lake
Distributed analytics -> Data Lake
Hierarchy architecture -> Flat operations architecture -> Data Lake
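We have tried having the system learn the database schema automatically during data loading, and showed that the idea is feasible. For structured data the process is very easy; for unstructured data it remains a big challenge.
With a machine-learning approach, I suspect the cost of training the model would be as large as the effort of extracting the schema by hand. I have, however, seen some CMDB databases claim to support automatic upgrades of the database schema; I will look into that before saying more.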
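For the easy, structured case, learning a schema while loading can be as simple as recording which fields and value types appear. The sketch below is a guess at the general idea rather than the approach the author's team used; the sample JSON lines and the `infer_schema` helper are hypothetical.

```python
import json


def infer_schema(lines):
    """Scan JSON records during loading and collect field names and types.

    Returns a mapping from field name to the sorted list of Python type
    names observed for that field.
    """
    observed = {}
    for line in lines:
        for field, value in json.loads(line).items():
            observed.setdefault(field, set()).add(type(value).__name__)
    return {field: sorted(types) for field, types in observed.items()}


# Hypothetical input records; a field may appear late or change its type.
incoming = [
    '{"device": "bs-1", "cpu": 0.5}',
    '{"device": "bs-2", "cpu": 1, "up": true}',
]

schema = infer_schema(incoming)
print(schema)
```

A field seen with more than one type, such as `cpu` here, is the kind of conflict that a loader would have to resolve before it could emit a table definition; free text and other unstructured inputs offer no such field names to start from.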