Äú¿ÉÒÔ¾èÖú£¬Ö§³ÖÎÒÃǵĹ«ÒæÊÂÒµ¡£

1Ôª 10Ôª 50Ôª





ÈÏÖ¤Â룺  ÑéÖ¤Âë,¿´²»Çå³þ?Çëµã»÷Ë¢ÐÂÑéÖ¤Âë ±ØÌî



  ÇóÖª ÎÄÕ ÎÄ¿â Lib ÊÓÆµ iPerson ¿Î³Ì ÈÏÖ¤ ×Éѯ ¹¤¾ß ½²×ù Model Center   Code  
»áÔ±   
   
 
     
   
 ¶©ÔÄ
  ¾èÖú
Apache BeamʵսָÄÏÖ®»ù´¡ÈëÃÅ
 
À´Ô´£ºinfoq ·¢²¼ÓÚ£º2017-11-3
  3898  次浏览      31
 

ǰÑÔ£º´óÊý¾Ý 2.0 ʱ´ú²»ÆÚ¶øÖÁ

Ëæ×Å´óÊý¾Ý 2.0 ʱ´úÇÄÈ»µ½À´£¬´óÊý¾Ý´Ó¼òµ¥µÄÅú´¦ÀíÀ©Õ¹µ½ÁËʵʱ´¦Àí¡¢Á÷´¦Àí¡¢½»»¥Ê½²éѯºÍ»úÆ÷ѧϰӦÓá£ÔçÆÚµÄ´¦ÀíÄ£ÐÍ (Map/Reduce) ÔçÒѾ­Á¦²»´ÓÐÄ£¬¶øÇÒÒ²ºÜÄÑÓ¦Óõ½´¦ÀíÁ÷³Ì³¤ÇÒ¸´ÔÓµÄÊý¾ÝÁ÷Ë®ÏßÉÏ¡£ÁíÍ⣬½üÄêÀ´Ó¿ÏÖ³öÖî¶à´óÊý¾ÝÓ¦ÓÃ×é¼þ£¬Èç HBase¡¢Hive¡¢Kafka¡¢Spark¡¢Flink µÈ¡£¿ª·¢Õß¾­³£ÒªÓõ½²»Í¬µÄ¼¼Êõ¡¢¿ò¼Ü¡¢API¡¢¿ª·¢ÓïÑÔºÍ SDK À´Ó¦¶Ô¸´ÔÓÓ¦ÓõĿª·¢¡£Õâ´ó´óÔö¼ÓÁËÑ¡ÔñºÏÊʹ¤¾ßºÍ¿ò¼ÜµÄÄѶȣ¬¿ª·¢ÕßÏëÒª½«ËùÓеĴóÊý¾Ý×é¼þÊìÁ·ÔËÓü¸ºõÊÇÒ»Ïî²»¿ÉÄÜÍê³ÉµÄÈÎÎñ¡£

Ãæ¶ÔÕâÖÖÇé¿ö£¬Google ÔÚ 2016 Äê 2 ÔÂÐû²¼½«´óÊý¾ÝÁ÷Ë®Ïß²úÆ·£¨Google DataFlow£©¹±Ï׸ø Apache »ù½ð»á·õ»¯£¬2017 Äê 1 Ô Apache ¶ÔÍâÐû²¼¿ªÔ´ Apache Beam£¬2017 Äê 5 ÔÂÓ­À´ÁËËüµÄµÚÒ»¸öÎȶ¨°æ±¾ 2.0.0¡£ÔÚ¹úÄÚ£¬´ó²¿·Ö¿ª·¢Õß¶ÔÓÚ Beam »¹È±·¦Á˽⣬ÉçÇøÖÐÎÄ×ÊÁÏÒ²±È½ÏÉÙ¡£InfoQ ÆÚÍûͨ¹ý Apache Beam ʵսָÄÏϵÁÐÎÄÕÂ ÍÆ¶¯ Apache Beam ÔÚ¹úÄÚµÄÆÕ¼°¡£

±¾ÎĽ«¼òÒª½éÉÜ Apache Beam µÄ·¢Õ¹ÀúÊ·¡¢Ó¦Óó¡¾°¡¢Ä£ÐͺÍÔËÐÐÁ÷³Ì¡¢SDKs ºÍ Beam µÄÓ¦ÓÃʾÀý¡£»¶Ó­¼ÓÈë Beam ÖÐÎÄÉçÇøÉîÈëÌÖÂۺͽ»Á÷¡£

¸ÅÊö

´óÊý¾Ý´¦ÀíÁìÓòµÄÒ»´óÎÊÌâÊÇ£º¿ª·¢Õß¾­³£ÒªÓõ½ºÜ¶à²»Í¬µÄ¼¼Êõ¡¢¿ò¼Ü¡¢API¡¢¿ª·¢ÓïÑÔºÍ SDK¡£È¡¾öÓÚÐèÒªÍê³ÉµÄÊÇʲôÈÎÎñ£¬ÒÔ¼°ÔÚʲôÇé¿öϽøÐУ¬¿ª·¢ÕߺܿÉÄÜ»áÓà MapReduce ½øÐÐÅú´¦Àí£¬Óà Apache Spark SQL ½øÐн»»¥ÇëÇó£¨interactive queries£©£¬Óà Apache Flink ½øÐÐʵʱÁ÷´¦Àí£¬»¹ÓпÉÄÜÓõ½»ùÓÚÔÆ¶ËµÄ»úÆ÷ѧϰ¿ò¼Ü¡£

½üÁ½ÄêÓ¿ÏֵĿªÔ´´ó³±£¬Îª´óÊý¾Ý¿ª·¢ÕßÌṩÁËÊ®·Ö¸»ÓàµÄ¹¤¾ß¡£µ«ÕâͬʱҲÔö¼ÓÁË¿ª·¢ÕßÑ¡ÔñºÏÊʹ¤¾ßµÄÄѶȣ¬ÓÈÆä¶ÔÓÚÐÂÈëÐеĿª·¢ÕßÀ´Ëµ¡£ÕâºÜ¿ÉÄÜÍÏÂý¡¢ÉõÖÁ×è°­¿ªÔ´¹¤¾ßµÄ·¢Õ¹£º°Ñ¸÷ÖÖ¿ªÔ´¿ò¼Ü¡¢¹¤¾ß¡¢¿â¡¢Æ½Ì¨È˹¤ÕûºÏµ½Ò»ÆðËùÐ蹤×÷Ö®¸´ÔÓ£¬ÊÇ´óÊý¾Ý¿ª·¢Õß³£Óеı§Ô¹Ö®Ò»£¬Ò²ÊÇËûÃÇÖ§³ÖרÓдóÊý¾Ýƽ̨µÄÊ×ÒªÔ­Òò¡£

Apache Beam ·¢Õ¹ÀúÊ·

Beam ÔÚ 2016 Äê 2 Ô³ÉΪ Apache ·õ»¯Æ÷ÏîÄ¿£¬²¢ÔÚ 2016 Äê 12 ÔÂÉý¼¶³ÉΪ Apache »ù½ð»áµÄ¶¥¼¶ÏîÄ¿¡£Í¨¹ýÊ®Îå¸öÔµÄŬÁ¦£¬Ò»¸öÉÔÏÔ»ìÂҵĴúÂë¿â£¬´Ó¶à¸ö×éÖ¯ºÏ²¢£¬ÒÑ·¢Õ¹³ÉΪÊý¾Ý´¦ÀíµÄͨÓÃÒýÇæ£¬¼¯³É¶à¸ö´¦ÀíÊý¾Ý¿ò¼Ü£¬¿ÉÒÔ×öµ½¿ç»·¾³¡£

Beam ¾­¹ýÈý¸ö·õ»¯Æ÷°æ±¾ºÍÈý¸öºó·õ»¯Æ÷°æ±¾µÄÑÝ»¯ºÍ¸Ä½ø£¬×îÖÕÔÚ 2017 Äê 5 Ô 17 ÈÕÓ­À´ÁËËüµÄµÚÒ»¸öÎȶ¨°æ 2.0.0¡£·¢²¼Îȶ¨°æ±¾ 3 ¸öÔÂÒÔÀ´£¬Apache Beam ÒѾ­³öÏÖÃ÷ÏÔµÄÔö³¤£¬ÎÞÂÛÊÇͨ¹ý¹Ù·½»¹ÊÇÉçÇøµÄ¹±Ï×ÊýÁ¿¡£Apache Beam ÔڹȸèÔÆ·½ÃæÒ²ÒѾ­Õ¹Ê¾³öÁË¡°²Å¸É¡±¡£

Beam 2.0.0 ¸Ä½øÁËÓû§ÌåÑ飬֨µãÔÚÓÚ¿ò¼Ü¿ç»·¾³µÄÎÞ·ìÒÆÖ²ÄÜÁ¦£¬ÕâЩִÐл·¾³°üÀ¨Ö´ÐÐÒýÇæ¡¢²Ù×÷ϵͳ¡¢±¾µØ¼¯Èº¡¢ÔƶËÒÔ¼°Êý¾Ý´æ´¢ÏµÍ³¡£Beam µÄÆäËûÌØÐÔ»¹°üÀ¨Èçϼ¸µã£º

API Îȶ¨ÐԺͶÔδÀ´°æ±¾µÄ¼æÈÝÐÔ¡£

ÓÐ״̬µÄÊý¾Ý´¦Àíģʽ£¬¸ßЧµÄÖ§³ÖÒÀÀµÓÚÊý¾ÝµÄ¼ÆËã¡£

Ö§³ÖÓû§À©Õ¹µÄÎļþϵͳ£¬Ö§³Ö Hadoop ·Ö²¼Ê½·¢Îļþϵͳ¼°ÆäËû¡£

ÌṩÁËÒ»¸ö¶ÈÁ¿Ö¸±êϵͳ£¬¿ÉÓÃÓÚ¸ú×ٹܵÀµÄÖ´ÐÐ×´¿ö¡£

ÍøÉÏÒѾ­ÓкܶàÈËд¹ý Beam 2.0.0 °æ±¾Ö®Ç°µÄ×ÊÁÏ£¬µ«ÊÇ 2.0.0 °æ±¾ºó API ºÜ¶àд·¨±ä¶¯½Ï´ó£¬±¾ÎĽ«´ø×Å´ó¼Ò´ÓÁã»ù´¡µ½ Apache Beam ÈëÃÅ¡£

Apache Beam Ó¦Óó¡¾°

Google Cloud¡¢PayPal¡¢Talend µÈ¹«Ë¾¶¼ÔÚʹÓà Beam£¬¹úÄÚ°üÀ¨°¢Àï°Í°Í¡¢°Ù¶È¡¢½ðɽ¡¢ËÕÄþ¡¢¾Å´Î·½´óÊý¾Ý¡¢360¡¢»Û¾ÛÊýͨÐÅÏ¢¼¼ÊõÓÐÏÞ¹«Ë¾µÈÒ²ÔÚʹÓà Beam£¬Í¬Ê±»¹ÓÐһЩ´óÊý¾Ý¹«Ë¾µÄ¼Ü¹¹Ê¦»òÑз¢ÈËÔ±ÕýÔÚÒ»Æð½øÐÐÑо¿¡£Apache Beam ÖÐÎÄÉçÇøÕýÔÚ¼¯³ÉһЩ¹¤×÷ÖÐµÄ runners ºÍ sdk IO£¬°üÀ¨È˹¤ÖÇÄÜ¡¢»úÆ÷ѧϰºÍʱÐòÊý¾Ý¿âµÈһЩ¹¦ÄÜ¡£

ÒÔÏÂΪӦÓó¡¾°µÄ¼¸¸öÀý×Ó£º

Beam ¿ÉÒÔÓÃÓÚ ETL Job ÈÎÎñ

Beam µÄÊý¾Ý¿ÉÒÔͨ¹ý SDKs µÄ IO ½ÓÈ룬ͨ¹ý¹ÜµÀ¿ÉÒÔÓúóÃæµÄ Runners ×öÇåÏ´¡£

Beam Êý¾Ý²Ö¿â¿ìËÙÇл»¡¢¿ç²Ö¿â

ÓÉÓÚ Beam µÄÊý¾ÝÔ´ÊǶàÑù IO£¬ËùÒÔÓà Beam ¿ÉÒÔ¿ìËÙÇл»ÈκÎÊý¾Ý²Ö¿â¡£

Beam ¼ÆËã´¦ÀíÆ½Ì¨Çл»¡¢¿çƽ̨

Runners ĿǰÌṩÁË 3-4 ÖÖ¿ÉÒÔÇл»µÄƽ̨£¬Ëæ×Å Beam µÄÇ¿´óÓ¦¸Ã»áÓиü¶àµÄƽ̨Ìṩ¸ø´ó¼ÒʹÓá£

Apache Beam ÔËÐÐÁ÷³Ì

4-1 Êý¾Ý´¦ÀíÁ÷³Ì

Èçͼ 4-1 Ëùʾ£¬Apache Beam ´óÌåÔËÐÐÁ÷³Ì·Ö³ÉÈý´ó²¿·Ö£º

Modes

Modes ÊÇ Beam µÄÄ£ÐÍ»ò½ÐÊý¾ÝÀ´Ô´µÄ IO£¬ËüÊÇÓɶàÖÖÊý¾ÝÔ´»ò²Ö¿âµÄ IO ×é³É£¬Êý¾ÝÔ´Ö§³ÖÅú´¦ÀíºÍÁ÷´¦Àí¡£

Pipeline

Pipeline ÊÇ Beam µÄ¹ÜµÀ£¬ËùÓеÄÅú´¦Àí»òÁ÷´¦Àí¶¼ÒªÍ¨¹ýÕâ¸ö¹ÜµÀ°ÑÊý¾Ý´«Êäµ½ºó¶ËµÄ¼ÆËãÆ½Ì¨¡£Õâ¸ö¹ÜµÀÏÖÔÚÊÇΨһµÄ¡£Êý¾ÝÔ´¿ÉÒÔÇл»¶àÖÖ£¬¼ÆËãÆ½Ì¨»ò´¦ÀíÆ½Ì¨Ò²Ö§³Ö¶àÖÖ¡£ÐèҪעÒâµÄÊÇ£¬¹ÜµÀÖ»ÓÐÒ»Ìõ£¬ËüµÄ×÷ÓÃÊÇÁ¬½ÓÊý¾ÝºÍ Runtimes ƽ̨¡£

Runtimes

Runtimes ÊÇ´óÊý¾Ý¼ÆËã»ò´¦ÀíÆ½Ì¨£¬Ä¿Ç°Ö§³Ö Apache Flink¡¢Apache Spark¡¢Direct Pipeline ºÍ Google Clound Dataflow ËÄÖÖ¡£ÆäÖÐ Apache Flink ºÍ Apache Spark ͬʱ֧³Ö±¾µØºÍÔÆ¶Ë¡£Direct Pipeline ½öÖ§³Ö±¾µØ£¬Google Clound Dataflow ½öÖ§³ÖÔÆ¶Ë¡£³ý´ËÖ®Í⣬ºóÆÚ Beam ¹úÍâÑз¢ÍŶӻ¹»á¼¯³ÉÆäËû´óÊý¾Ý¼ÆËãÆ½Ì¨¡£ÓÉÓڹȸèδ½øÈëÖйú£¬Ä¿Ç°¹úÄÚ¿ª·¢ÈËÔ±ÔÚ¹¤×÷ÖжԹȸèÔÆµÄʹÓÃÓ¦¸Ã²»ÊǺܶ࣬Ö÷ÒªÒÔǰÁ½ÖÖΪÖ÷¡£ÎªÁËʹ¶ÁÕß¶ÁÍêÎÄÕºóÄÜ¿ìËÙѧϰÇÒ¸üÌù½üʵ¼Ê¹¤×÷»·¾³£¬ºóÐøÎÄÕÂÖÐÎÒ»áÒÔǰÁ½ÖÖ×÷Ϊ´óÊý¾Ý¼ÆËã»ò´¦ÀíÆ½Ì¨½øÐÐÑÝʾ¡£

Beam Model ¼°Æä¹¤×÷Á÷³Ì

Beam Model Ö¸µÄÊÇ Beam µÄ±à³Ì·¶Ê½£¬¼´ Beam SDK ±³ºóµÄÉè¼ÆË¼Ïë¡£ÔÚ½éÉÜ Beam Model ֮ǰ£¬ÏȼòÒª½éÉÜһϠBeam Model Òª´¦ÀíµÄÎÊÌâÓòÓëһЩ»ù±¾¸ÅÄî¡£

Êý¾ÝÔ´ÀàÐÍ¡£·Ö²¼Ê½Êý¾ÝÀ´Ô´ÀàÐÍÒ»°ã¿ÉÒÔ·ÖΪÁ½À࣬ÓнçµÄÊý¾Ý¼¯ºÍÎÞ½çµÄÊý¾ÝÁ÷¡£ÓнçµÄÊý¾Ý¼¯£¬±ÈÈçÒ»¸ö Ceph ÖеÄÎļþ£¬Ò»¸ö Mongodb ±íµÈ£¬ÌصãÊÇÊý¾ÝÒѾ­´æÔÚ£¬Êý¾Ý¼¯ÓÐÒÑÖªµÄ¡¢¹Ì¶¨µÄ´óС£¬Ò»°ã´æÔÚ´ÅÅÌÉÏ£¬²»»áͻȻÏûʧ¡£¶øÎÞ½çµÄÊý¾ÝÁ÷£¬±ÈÈç Kafka ÖÐÁ÷¹ýÀ´µÄÊý¾ÝÁ÷£¬ÕâÖÖÊý¾ÝµÄÌØµãÊÇÊý¾Ý¶¯Ì¬Á÷È롢ûÓб߽硢ÎÞ·¨È«²¿³Ö¾Ã»¯µ½´ÅÅÌÉÏ¡£Beam ¿ò¼ÜÉè¼ÆÊ±ÐèÒªÕë¶ÔÕâÁ½ÖÖÊý¾ÝµÄ´¦Àí½øÐп¼ÂÇ£¬¼´Åú´¦ÀíºÍÁ÷´¦Àí¡£

ʱ¼ä¡£·Ö²¼Ê½¿ò¼ÜµÄʱ¼ä´¦ÀíÓÐÁ½ÖÖ£¬Ò»ÖÖÊÇÈ«Á¿¼ÆË㣬ÁíÒ»ÖÖÊDz¿·ÖÔöÁ¿¼ÆËã¡£ÎÒ¸ø´ó¼Ò¾Ù¸öÀý×Ó£ºÀýÈçÎÒÃÇÍæ¡°ÍõÕßũҩ¡±ÓÎÏ·£¬ÓÎÏ·µÄÊý¾ÝÐèҪʵʱµØÁ÷Ïò·þÎñÆ÷£¬µôѪÇé¿ö»áËæ×Åʱ¼äʵʱ±ä»¯£¬µ«ÊÇÅÅÐаñµÄÊý¾ÝÔòÊÇÈ«²¿Íæ¼ÒÔÚÒ»¶¨Ê±¼äÄÚµÄÅÅÃû£¬ÀýÈçÒ»ÖÜ»òÒ»¸öÔ¡£Beam Õë¶ÔÕâÁ½ÖÖÇé¿ö¶¼Éè¼ÆÁ˶ÔÓ¦µÄ´¦Àí·½Ê½¡£

ÂÒÐò¡£¶ÔÓÚÁ÷´¦Àí¿ò¼Ü´¦ÀíµÄÊý¾ÝÁ÷À´Ëµ£¬Êý¾Ýµ½´ï´óÌå·ÖÁ½ÖÖ£¬Ò»ÖÖÊǰ´ÕÕ Process Time ¶¨Òåʱ¼ä´°¿Ú£¬ÕâÖÖ²»Óÿ¼ÂÇÂÒÐòÎÊÌ⣬ÒòΪ¶¼Êǹرյ±Ç°´°¿Úºó²Å½øÐÐÏÂÒ»¸ö´°¿Ú²Ù×÷£¬ÐèÒªµÈ´ý£¬ËùÒÔÖ´Ðж¼ÊÇÓÐÐòµÄ¡£¶øÁíÒ»ÖÖ£¬Event Time ¶¨ÒåµÄʱ¼ä´°¿ÚÔò²»ÐèÒªµÈ´ý£¬¿ÉÄܵ±Ç°²Ù×÷»¹Ã»Óд¦ÀíÍ꣬¾ÍÖ±½ÓÖ´ÐÐÏÂÒ»¸ö²Ù×÷£¬Ôì³ÉÏûϢ˳Ðò´¦Àíµ«½á¹û²»Êǰ´Ë³ÐòÅÅÐòÁË¡£ÀýÈçÎÒÃǵĶ©µ¥ÏûÏ¢£¬²ÉÓÃÁË·Ö²¼Ê½´¦Àí£¬Èç¹ûϵ¥²Ù×÷ËùÊô·þÎñÆ÷´¦ÀíËٶȱȽÏÂý£¬¶øÓû§Ö§¸¶µÄ·þÎñÆ÷Ëٶȷdz£¿ì£¬Õâʱ×îºóµÄ¶©µ¥²Ù×÷ʱ¼äÖá¾Í»á³öÏÖÒ»ÖÖÇé¿ö£¬Ïµ¥ÔÚÖ§¸¶µÄºóÃæ¡£¶ÔÓÚÕâÖÖÇé¿ö£¬ÈçºÎÈ·¶¨³Ùµ½Êý¾Ý£¬ÒÔ¼°¶ÔÓÚ³Ùµ½Êý¾ÝÈçºÎ´¦Àíͨ³£ÊǺÜÂé·³µÄÊÂÇé¡£

Beam Model ´¦ÀíµÄÄ¿±êÊý¾ÝÊÇÎÞ½çµÄʱ¼äÂÒÐòÊý¾ÝÁ÷£¬²»¿¼ÂÇʱ¼ä˳Ðò»òÓнçµÄÊý¾Ý¼¯¿É¿´×öÊÇÎÞ½çÂÒÐòÊý¾ÝÁ÷µÄÒ»¸öÌØÀý¡£Beam Model ´ÓÏÂÃæËĸöά¶È¹éÄÉÁËÓû§ÔÚ½øÐÐÊý¾Ý´¦ÀíµÄʱºòÐèÒª¿¼ÂǵÄÎÊÌ⣺

What¡£ÈçºÎ¶ÔÊý¾Ý½øÐмÆË㣿ÀýÈ磬»úÆ÷ѧϰÖÐѵÁ·Ñ§Ï°Ä£ÐÍ¿ÉÒÔÓà Sum »òÕß Join µÈ¡£ÔÚ Beam SDK ÖÐÓÉ Pipeline ÖеIJÙ×÷·ûÖ¸¶¨¡£

Where¡£Êý¾ÝÔÚʲô·¶Î§ÖмÆË㣿ÀýÈ磬»ùÓÚ Process-Time µÄʱ¼ä´°¿Ú¡¢»ùÓÚ Event-Time µÄʱ¼ä´°¿Ú¡¢»¬¶¯´°¿ÚµÈµÈ¡£ÔÚ Beam SDK ÖÐÓÉ Pipeline µÄ´°¿ÚÖ¸¶¨¡£

When¡£ºÎʱÊä³ö¼ÆËã½á¹û£¿ÀýÈ磬ÔÚ 1 СʱµÄ Event-Time ʱ¼ä´°¿ÚÖУ¬Ã¿¸ô 1 ·ÖÖÓ½«µ±Ç°´°¿Ú¼ÆËã½á¹ûÊä³ö¡£ÔÚ Beam SDK ÖÐÓÉ Pipeline µÄ Watermark ºÍ´¥·¢Æ÷Ö¸¶¨¡£

How¡£³Ùµ½Êý¾ÝÈçºÎ´¦Àí£¿ÀýÈ磬½«³Ùµ½Êý¾Ý¼ÆËãÔöÁ¿½á¹ûÊä³ö£¬»òÊǽ«³Ùµ½Êý¾Ý¼ÆËã½á¹ûºÍ´°¿ÚÄÚÊý¾Ý¼ÆËã½á¹ûºÏ²¢³ÉÈ«Á¿½á¹ûÊä³ö¡£ÔÚ Beam SDK ÖÐÓÉ Accumulation Ö¸¶¨¡£

Beam Model ½«¡°WWWH¡±Ëĸöά¶È³éÏó³öÀ´×é³ÉÁË Beam SDK£¬Óû§ÔÚ»ùÓÚ Beam SDK ¹¹½¨Êý¾Ý´¦ÀíÒµÎñÂß¼­Ê±£¬Ã¿Ò»²½Ö»ÐèÒª¸ù¾ÝÒµÎñÐèÇó°´ÕÕÕâËĸöά¶Èµ÷ÓþßÌåµÄ API£¬¼´¿ÉÉú³É·Ö²¼Ê½Êý¾Ý´¦Àí Pipeline£¬²¢Ìá½»µ½¾ßÌåÖ´ÐÐÒýÇæÉÏÖ´ÐС£¡°WWWH¡±Ëĸöά¶ÈÖ»ÊÇ´ÓÒµÎñµÄ½Ç¶È¿´´ýÎÊÌ⣬²¢²»ÊÇÈ«²¿ÊÊÓÃÓÚ×Ô¼ºµÄÒµÎñ¡£×ö¼¼Êõ¼Ü¹¹Ò»¶¨Òª½áºÏ×Ô¼ºµÄÒµÎñʹÓÃÏàÓ¦µÄ¼¼ÊõÌØÐÔ»ò¿ò¼Ü¡£Beam ×öΪ¡°Ò»Í³¡±µÄ¿ò¼Ü£¬Îª¿ª·¢Õß´øÀ´ÁË·½±ã¡£

Beam SDKs

Beam SDK ¸øÉϲãÓ¦ÓõĿª·¢ÕßÌṩÁËÒ»¸öͳһµÄ±à³Ì½Ó¿Ú£¬¿ª·¢Õß²»ÐèÒªÁ˽âµ×²ãµÄ¾ßÌåµÄ´óÊý¾Ýƽ̨µÄ¿ª·¢½Ó¿ÚÊÇʲô£¬Ö±½Óͨ¹ý Beam SDK µÄ½Ó¿Ú¾Í¿ÉÒÔ¿ª·¢Êý¾Ý´¦ÀíµÄ¼Ó¹¤Á÷³Ì£¬²»¹ÜÊäÈëÊÇÓÃÓÚÅú´¦ÀíµÄÓнçÊý¾Ý¼¯£¬»¹ÊÇÁ÷ʽµÄÎÞ½çÊý¾Ý¼¯¡£¶ÔÓÚÕâÁ½ÀàÊäÈëÊý¾Ý£¬Beam SDK ¶¼Ê¹ÓÃÏàͬµÄÀàÀ´±íÏÖ£¬²¢ÇÒʹÓÃÏàͬµÄת»»²Ù×÷½øÐд¦Àí¡£Beam SDK ÓµÓв»Í¬±à³ÌÓïÑÔµÄʵÏÖ£¬Ä¿Ç°ÒѾ­ÍêÕûµØÌṩÁË Java µÄ SDK£¬Python µÄ SDK »¹ÔÚ¿ª·¢ÖУ¬ÏàÐÅδÀ´»á·¢²¼¸ü¶à²»Í¬±à³ÌÓïÑ﵀ SDK¡£

Beam 2.0 µÄ SDKs ĿǰÓУº

Amqp£º¸ß¼¶ÏûÏ¢¶ÓÁÐЭÒé¡£

Cassandra£ºCassandra ÊÇÒ»¸ö NoSQL ÁÐ×壨column family£©ÊµÏÖ£¬Ê¹ÓÃÓÉ Amazon Dynamo ÒýÈëµÄ¼Ü¹¹·½ÃæµÄÌØÐÔÀ´Ö§³Ö Big Table Êý¾ÝÄ£ÐÍ¡£Cassandra µÄһЩÓÅÊÆÈçÏÂËùʾ£º

¸ß¶È¿ÉÀ©Õ¹ÐԺ͸߶ȿÉÓÃÐÔ£¬Ã»Óе¥µã¹ÊÕÏ

NoSQL ÁÐ×åʵÏÖ

·Ç³£¸ßµÄдÈëÍÌÍÂÁ¿ºÍÁ¼ºÃµÄ¶ÁÈ¡ÍÌÍÂÁ¿

ÀàËÆ SQL µÄ²éѯÓïÑÔ£¨´Ó 0.8 °æ±¾Æð£©£¬²¢Í¨¹ý¶þ¼¶Ë÷ÒýÖ§³ÖËÑË÷

¿Éµ÷½ÚµÄÒ»ÖÂÐԺͶԸ´ÖƵÄÖ§³ÖÁé»îµÄģʽ

Elasticesarch£ºÒ»¸öʵʱµÄ·Ö²¼Ê½ËÑË÷ÒýÇæ¡£

Google-cloud-platform£º¹È¸èÔÆ IO¡£

Hadoop-file-system£º²Ù×÷ Hadoop ÎļþϵͳµÄ IO¡£

Hadoop-hbase£º²Ù×÷ Hadoop É쵀 Hbase µÄ½Ó¿Ú IO¡£

Hcatalog£ºHcatalog ÊÇ Apache ¿ªÔ´µÄ¶ÔÓÚ±íºÍµ×²ãÊý¾Ý¹ÜÀíͳһ·þÎñƽ̨¡£

Jdbc£ºÁ¬½Ó¸÷ÖÖÊý¾Ý¿âµÄÊý¾Ý¿âÁ¬½ÓÆ÷¡£

Jms£ºJava ÏûÏ¢·þÎñ£¨Java Message Service£¬¼ò³Æ JMS£©ÊÇÓÃÓÚ·ÃÎÊÆóÒµÏûϢϵͳµÄ¿ª·¢ÉÌÖÐÁ¢µÄ API¡£ÆóÒµÏûϢϵͳ¿ÉÒÔЭÖúÓ¦ÓÃÈí¼þͨ¹ýÍøÂç½øÐÐÏûÏ¢½»»¥¡£JMS ÔÚÆäÖаçÑݵĽÇÉ«Óë JDBC ºÜÏàËÆ£¬ÕýÈç JDBC ÌṩÁËÒ»Ì×ÓÃÓÚ·ÃÎʸ÷ÖÖ²»Í¬¹ØÏµÊý¾Ý¿âµÄ¹«¹² API£¬JMS Ò²ÌṩÁ˶ÀÁ¢ÓÚÌØ¶¨³§ÉÌµÄÆóÒµÏûϢϵͳ·ÃÎÊ·½Ê½¡£

Kafka£º´¦ÀíÁ÷Êý¾ÝµÄÇáÁ¿¼¶´óÊý¾ÝÏûϢϵͳ£¬»ò½ÐÏûÏ¢×ÜÏß¡£

Kinesis£º¶Ô½ÓÑÇÂíÑ·µÄ·þÎñ£¬¿ÉÒÔ¹¹½¨ÓÃÓÚ´¦Àí»ò·ÖÎöÁ÷Êý¾ÝµÄ×Ô¶¨ÒåÓ¦ÓóÌÐò£¬ÒÔÂú×ãÌØ¶¨ÐèÇó¡£

Mongodb£ºMongoDB ÊÇÒ»¸ö»ùÓÚ·Ö²¼Ê½Îļþ´æ´¢µÄÊý¾Ý¿â¡£

Mqtt£ºIBM ¿ª·¢µÄÒ»¸ö¼´Ê±Í¨Ñ¶Ð­Òé¡£

Solr£ºÑÇʵʱµÄ·Ö²¼Ê½ËÑË÷ÒýÇæ¼¼Êõ¡£

xml£ºÒ»ÖÖÊý¾Ý¸ñʽ¡£

Beam Pipeline Runners

Beam Pipeline Runner ½«Óû§Óà Beam Ä£ÐͶ¨Ò忪·¢µÄ´¦ÀíÁ÷³Ì·­Òë³Éµ×²ãµÄ·Ö²¼Ê½Êý¾Ý´¦ÀíÆ½Ì¨Ö§³ÖµÄÔËÐÐʱ»·¾³¡£ÔÚÔËÐÐ Beam ³ÌÐòʱ£¬ÐèÒªÖ¸Ã÷µ×²ãµÄÕýÈ· Runner ÀàÐÍ£¬Õë¶Ô²»Í¬µÄ´óÊý¾Ýƽ̨£¬»áÓв»Í¬µÄ Runner¡£Ä¿Ç° Flink¡¢Spark¡¢Apex ÒÔ¼°¹È¸èµÄ Cloud DataFlow ¶¼ÓÐÖ§³Ö Beam µÄ Runner¡£

ÐèҪעÒâµÄÊÇ£¬ËäÈ» Apache Beam ÉçÇø·Ç³£Ï£ÍûËùÓÐµÄ Beam Ö´ÐÐÒýÇæ¶¼Äܹ»Ö§³Ö Beam SDK ¶¨ÒåµÄ¹¦ÄÜÈ«¼¯£¬µ«ÊÇÔÚʵ¼ÊʵÏÖÖпÉÄÜÎÞ·¨´ïµ½ÕâÒ»ÆÚÍû¡£ÀýÈ磬»ùÓÚ MapReduce µÄ Runner ÏÔÈ»ºÜÄÑʵÏÖºÍÁ÷´¦ÀíÏà¹ØµÄ¹¦ÄÜÌØÐÔ¡£¾ÍĿǰ״̬¶øÑÔ£¬¶Ô Beam Ä£ÐÍÖ§³Ö×îºÃµÄ¾ÍÊÇÔËÐÐÓڹȸèÔÆÆ½Ì¨Ö®É쵀 Cloud Dataflow£¬ÒÔ¼°¿ÉÒÔÓÃÓÚ×Ô½¨»ò²¿ÊðÔڷǹȸèÔÆÖ®É쵀 Apache Flink¡£µ±È»£¬ÆäËüµÄ Runner Ò²ÕýÔÚÓ­Í·¸ÏÉÏ£¬Õû¸öÐÐÒµÒ²ÔÚ³¯×ÅÖ§³Ö Beam Ä£Ð͵ķ½Ïò·¢Õ¹¡£

Beam 2.0 µÄ Runners ¿ò¼ÜÈçÏ£º

Apex

µ®ÉúÓÚ 2015 Äê 6 Ô嵀 Apache Apex£¬ÆäͬÑùÔ´×Ô DataTorrent ¼°ÆäÁîÈËÓ¡ÏóÉî¿ÌµÄ RTS ƽ̨£¬ÆäÖаüº¬Ò»Ì׺ËÐÄ´¦ÀíÒýÇæ¡¢ÒDZí°å¡¢Õï¶ÏÓë¼à¿Ø¹¤¾ßÌ×¼þÍâ¼ÓרÃÅÃæÏòÊý¾Ý¿ÆÑ§¼ÒÓû§µÄͼÐÎÁ÷±à³Ìϵͳ dtAssemble¡£Ö÷ÒªÓÃÓÚÁ÷´¦Àí£¬³£ÓÃÓÚÎïÁªÍøµÈ³¡¾°¡£

Direct-java

±¾µØ´¦ÀíºÍÔËÐÐ runner¡£

Flink_2.10

Flink ÊÇÒ»¸öÕë¶ÔÁ÷Êý¾ÝºÍÅúÊý¾ÝµÄ·Ö²¼Ê½´¦ÀíÒýÇæ¡£

Gearpump

Gearpump ÊÇÒ»¸ö»ùÓÚ Akka Actor µÄÇáÁ¿¼¶µÄʵʱÁ÷¼ÆËãÒýÇæ¡£Èç½ñÁ÷ƽ̨ÐèÒª´¦ÀíÀ´×Ô¸÷ÖÖÒÆ¶¯¶ËºÍÎïÁªÍøÉ豸µÄº£Á¿Êý¾Ý£¬ÏµÍ³ÒªÄܲ»¼ä¶ÏµØÌṩ·þÎñ£¬¶ÔÊý¾ÝµÄ´¦ÀíÒªÄÜ×öµ½²»¶ªÊ§²»Öظ´£¬¶Ô¸÷ÖÖÈíÓ²¼þ´íÎóÄÜÆ½»¬´¦Àí£¬¶ÔÓû§µÄÊäÈëÒªÄÜʵʱÏìÓ¦¡£³ýÁËÕâЩϵͳ²ãÃæµÄÐèÇóÍ⣬Óû§²ãÃæµÄ½Ó¿Ú»¹ÒªÄÜ×öµ½·á¸»¶øÁé»î£¬Ò»·½Ã棬ƽ̨ҪÌṩ×ã¹»·á¸»µÄ»ù´¡ÉèÊ©£¬ÄÜ×î¼ò»¯Ó¦ÓóÌÐòµÄ±àд£»ÁíÒ»·½Ã棬Õâ¸öƽ̨ӦÌṩ¾ßÓбíÏÖÁ¦µÄ±à³Ì API£¬ÈÃÓû§ÄÜÁé»î±í´ï¸÷ÖÖ¼ÆË㣬²¢ÇÒÕû¸öϵͳ¿ÉÒÔ¶¨ÖÆ£¬ÔÊÐíÓû§Ñ¡Ôñµ÷¶È²ßÂԺͲ¿Êð»·¾³£¬ÔÊÐíÓû§ÔÚ²»Í¬µÄÖ¸±ê¼ä×öÕÛÖÐÈ¡ÉᣬÒÔÂú×ãÌØ¶¨µÄÐèÇó¡£Akka Actor ÌṩÁËͨÐÅ¡¢²¢·¢¡¢¸ôÀë¡¢ÈÝ´íµÄ»ù´¡ÉèÊ©£¬Gearpump ͨ¹ý°Ñ³éÏó²ã´ÎÌáÉýµ½ Actor ÕâÒ»²ã£¬ÆÁ±ÎÁ˵ײãµÄϸ½Ú£¬×¨×¢ÓÚÁ÷´¦ÀíÐèÇó±¾Éí£¬Äܸü¼òµ¥¶øÓÖ¸ßЧµØ½â¾öÉÏÊöÎÊÌâ¡£

Dataflow

2016 Äê 2 Ô·ݣ¬¹È¸è¼°ÆäºÏ×÷»ï°éÏò Apache ¾èÔùÁËÒ»´óÅú´úÂ룬´´Á¢ÁË·õ»¯ÖÐµÄ Beam ÏîÄ¿£¨×î³õ½Ð Apache Dataflow£©¡£ÕâЩ´úÂëÖеĴ󲿷ÖÀ´×ÔÓڹȸè Cloud Dataflow SDK¡ª¡ª¿ª·¢ÕßÓÃÀ´Ð´Á÷´¦ÀíºÍÅú´¦Àí¹ÜµÀ£¨pipelines£©µÄ¿â£¬¿ÉÔÚÈκÎÖ§³ÖµÄÖ´ÐÐÒýÇæÉÏÔËÐС£µ±Ê±£¬Ö§³ÖµÄÖ÷ÒªÒýÇæÊǹȸè Cloud Dataflow¡£

Spark

Apache Spark ÊÇÒ»¸öÕýÔÚ¿ìËٳɳ¤µÄ¿ªÔ´¼¯Èº¼ÆËãϵͳ¡£Apache Spark Éú̬ϵͳÖеİüºÍ¿ò¼ÜÈÕÒæ·á¸»£¬Ê¹µÃ Spark Äܹ»Ö´Ðи߼¶Êý¾Ý·ÖÎö¡£Apache Spark µÄ¿ìËٳɹ¦µÃÒæÓÚËüµÄÇ¿´ó¹¦ÄܺÍÒ×ÓÃÐÔ¡£Ïà±ÈÓÚ´«Í³µÄ MapReduce ´óÊý¾Ý·ÖÎö£¬Spark ЧÂʸü¸ß¡¢ÔËÐÐʱËٶȸü¿ì¡£Apache Spark ÌṩÁËÄÚ´æÖеķֲ¼Ê½¼ÆËãÄÜÁ¦£¬¾ßÓÐ Java¡¢Scala¡¢Python¡¢R ËÄÖÖ±à³ÌÓïÑ﵀ API ±à³Ì½Ó¿Ú¡£

ʵս£º¿ª·¢µÚÒ»¸ö Beam ³ÌÐò

8.1 ¿ª·¢»·¾³

ÏÂÔØ°²×° JDK 7 »ò¸üеİ汾£¬¼ì²â JAVA_HOME »·¾³±äÁ¿¡£±¾ÎÄʾÀýʹÓõÄÊÇ JDK 1.8¡£

ÏÂÔØ maven ²¢ÅäÖ㬱¾ÎÄʾÀýʹÓõÄÊÇ maven-3.3.3¡£

¿ª·¢»·¾³ myeclipse¡¢Spring Tool Suite ¡¢IntelliJ IDEA£¬Õâ¸ö¿ÉÒÔ°´ÕÕ¸öÈËϲºÃ£¬±¾ÎÄʾÀýÓõÄÊÇ STS¡£

8.2 ¿ª·¢µÚÒ»¸ö wordCount ³ÌÐò²¢ÇÒÔËÐÐ

1 н¨Ò»¸ö maven ÏîÄ¿

2 ÔÚ pom.xml ÎļþÖÐÌí¼ÓÁ½¸ö jar °ü

3 н¨Ò»¸ö txtIOTest.java

дÈëÒÔÏ´úÂ룺

4 ÒòΪ Windows É쵀 Beam2.0.0 ²»Ö§³Ö±¾µØÂ·¾¶£¬ÐèÒª²¿Êðµ½ Linux ÉÏ£¬ÐèÒª´ò°üÈçͼ£¬´Ë´¦×¢ÒâÒª°ÑÒÀÀµ jar ¶¼´ò°ü½øÈ¥¡£

5 ²¿Êð beam.jar µ½ Linux »·¾³ÖÐ

ʹÓà Xshell 5 µÇ¼ÐéÄâ»ú»òÕß Linux ϵͳ¡£Óà rz ÃüÁî°Ñ¸Õ²Å´ò°üµÄÎļþÉÏ´«ÉÏÈ¥¡£ÆäÖÐÐéÄâ»úÒª°²×°ÉÏ jdk ²¢ÅäÖúû·¾³±äÁ¿¡£

ÎÒÃÇ¿ÉÒÔÓÃÊäÈë javac ÃüÁî²âÊÔһϡ£

ÎÒÃÇ°Ñ beam.jar ÉÏ´«µ½ /usr/local/ Ŀ¼ÏÂÃæ£¬È»ºóн¨Ò»¸öÎļþ£¬Ò²¾ÍÊÇÔ´Îļþ¡£ÃüÁtouch text.txt¡¡ÃüÁchmod o+rwx¡¡text.txt

ÐÞ¸Ä text.txt ²¢Ìí¼ÓÊý¾Ý¡£¡¡ÃüÁvi text.txt

ÔËÐÐÃüÁjava -jar beam.jar£¬Éú³ÉÎļþ¡£

Óà cat ÃüÁî²é¿´ÎļþÄÚÈÝ£¬ÀïÃæ¾ÍÊÇͳ¼ÆµÄ½á¹û¡£

8.3 ʵսÆÊÎö

ÎÒÃÇ¿ÉÒÔͨ¹ýÒÔÉÏʵս´úÂë½øÒ»²½Á˽â Beam µÄÔËÓÃÔ­Àí¡£

µÚÒ»¼þÊÂÇéÊǴһ¸ö¹ÜµÀ£¨Pipeline£©£¬ÀýÈçÎÒÃÇСʱºò¼ÒÀï½½µØÓõġ°Ë®¹Ü¡±¡£Ëü¾ÍÊÇÁ¬½ÓˮԴºÍ´¦ÀíµÄÇÅÁº¡£

PipelineOptions pipelineOptions = PipelineOptionsFactory.create();// ´´½¨¹ÜµÀ

µÚ¶þ¼þÊÂÇéÊÇÈÃÎÒÃǵĹܵÀÓÐÒ»¸ö´¦Àí¿ò¼Ü£¬Ò²¾ÍÊÇÎÒÃÇµÄ Runtimes ¡£ÀýÈçÎÒÃǽӵ½Ë®ÒªÔõô´¦Àí£¬ÊÇÊäË͸øÎÒÃdzÇÊеÄÎÛË®´¦Àí³§£¬»¹ÊÇÆäËû¡£Õâ¸öÎÛË®´¦Àí³§¾ÍÏ൱ÓÚÎÒÃǵĴ¦Àí¿ò¼Ü£¬ÀýÈçÏÖÔÚÁ÷ÐÐµÄ Apache Spark »ò Apache Flink¡£Õâ¸öÒª¸ù¾Ý×Ô¼ºµÄÒµÎñÖ¸¶¨£¬ÈçÏ´úÂëÖÐÎÒÖ¸¶¨Á˱¾µØµÄ´¦Àí¿ò¼Ü¡£

pipelineOptions.setRunner(DirectRunner.class);

µÚÈý¼þÊÂÇéÒ²ÊÇ Beam ×îºóÒ»¸öÖØÒªµÄµØ·½£¬¾ÍÊÇÄ£ÐÍ (Model)£¬Í¨Ë׵㽲¾ÍÊÇÎÒÃǵÄÊý¾ÝÀ´Ô´¡£Èç¹û½áºÏÒÔÉϵÚÒ»¼þºÍµÚ¶þ¼þµÄÊÂÇé˵¾ÍÊÇË®´ÓÄÄÀïÀ´£¬Ë®µÄÀ´Ô´¿ÉÄÜÊǺÓÀï¡¢¿ÉÄÜÊÇÎÛˮͨµÀµÈµÈ¡£±¾ÊµÀýÓõÄÊÇÓнç¹Ì¶¨´óСµÄÎı¾Îļþ¡£µ±È» Model »¹°üº¬ÎÞ½çÊý¾Ý£¬ÀýÈç kafka µÈµÈ£¬¿ÉÒÔ¸ù¾ÝµÄÐèÇóÁé»îÔËÓá£

pipeline.apply (TextIO.read(). from ("/usr/local/text.txt")). apply
(" ExtractWords", ParDo.of (new DoFn<String, String>() //ºóÊ¡ÂÔ

×îºóÒ»²½ÊÇ´¦Àí½á¹û£¬Õâ¸ö±È½Ï¼òµ¥£¬¿ÉÒÔ¸ù¾Ý×Ô¼ºµÄÐèÇó´¦Àí¡£Ï£Íûͨ¹ý´úÂëµÄʵս½áºÏÔ­ÀíÆÊÎö¿ÉÒÔ°ïÖú´ó¼Ò¸ü¿ìµØÊìϤ Beam ²¢Äܹ»¼òµ¥µØÔËÓà Beam¡£

×ܽá

Apache Beam ÊǼ¯³ÉÁ˺ܶàÊý¾ÝÄ£Ð͵ÄÒ»¸öͳһ»¯Æ½Ì¨£¬ËüΪ´óÊý¾Ý¿ª·¢¹¤³ÌʦƵ·±»»Êý¾ÝÔ´»ò¶àÊý¾ÝÔ´¡¢¶à¼ÆËã¿ò¼ÜÌṩÁ˼¯³Éͳһ¿ò¼Üƽ̨¡£Apache Beam ÉçÇøÏÖÔÚÒѾ­¼¯³ÉÁËÊý¾Ý¿âµÄÇл» IO£¬Î´À´ Beam ÖÐÎÄÉçÇø»¹½«Îª Beam ¼¯³É¸ü¶àµÄ Model ºÍ¼ÆËã¿ò¼Ü£¬Îª´ó¼ÒÌṩ·½±ã¡£

   
3898 ´Îä¯ÀÀ       31
Ïà¹ØÎÄÕÂ

»ùÓÚEAµÄÊý¾Ý¿â½¨Ä£
Êý¾ÝÁ÷½¨Ä££¨EAÖ¸ÄÏ£©
¡°Êý¾Ýºþ¡±£º¸ÅÄî¡¢ÌØÕ÷¡¢¼Ü¹¹Óë°¸Àý
ÔÚÏßÉ̳ÇÊý¾Ý¿âϵͳÉè¼Æ ˼·+Ч¹û
 
Ïà¹ØÎĵµ

GreenplumÊý¾Ý¿â»ù´¡Åàѵ
MySQL5.1ÐÔÄÜÓÅ»¯·½°¸
ijµçÉÌÊý¾ÝÖÐ̨¼Ü¹¹Êµ¼ù
MySQL¸ßÀ©Õ¹¼Ü¹¹Éè¼Æ
Ïà¹Ø¿Î³Ì

Êý¾ÝÖÎÀí¡¢Êý¾Ý¼Ü¹¹¼°Êý¾Ý±ê×¼
MongoDBʵս¿Î³Ì
²¢·¢¡¢´óÈÝÁ¿¡¢¸ßÐÔÄÜÊý¾Ý¿âÉè¼ÆÓëÓÅ»¯
PostgreSQLÊý¾Ý¿âʵսÅàѵ