ǰÑÔ£º´óÊý¾Ý
2.0 ʱ´ú²»ÆÚ¶øÖÁ
Ëæ×Å´óÊý¾Ý 2.0 ʱ´úÇÄÈ»µ½À´£¬´óÊý¾Ý´Ó¼òµ¥µÄÅú´¦ÀíÀ©Õ¹µ½ÁËʵʱ´¦Àí¡¢Á÷´¦Àí¡¢½»»¥Ê½²éѯºÍ»úÆ÷ѧϰӦÓá£ÔçÆÚµÄ´¦ÀíÄ£ÐÍ
(Map/Reduce) ÔçÒѾÁ¦²»´ÓÐÄ£¬¶øÇÒÒ²ºÜÄÑÓ¦Óõ½´¦ÀíÁ÷³Ì³¤ÇÒ¸´ÔÓµÄÊý¾ÝÁ÷Ë®ÏßÉÏ¡£ÁíÍ⣬½üÄêÀ´Ó¿ÏÖ³öÖî¶à´óÊý¾ÝÓ¦ÓÃ×é¼þ£¬Èç
HBase¡¢Hive¡¢Kafka¡¢Spark¡¢Flink µÈ¡£¿ª·¢Õß¾³£ÒªÓõ½²»Í¬µÄ¼¼Êõ¡¢¿ò¼Ü¡¢API¡¢¿ª·¢ÓïÑÔºÍ
SDK À´Ó¦¶Ô¸´ÔÓÓ¦ÓõĿª·¢¡£Õâ´ó´óÔö¼ÓÁËÑ¡ÔñºÏÊʹ¤¾ßºÍ¿ò¼ÜµÄÄѶȣ¬¿ª·¢ÕßÏëÒª½«ËùÓеĴóÊý¾Ý×é¼þÊìÁ·ÔËÓü¸ºõÊÇÒ»Ïî²»¿ÉÄÜÍê³ÉµÄÈÎÎñ¡£
Ãæ¶ÔÕâÖÖÇé¿ö£¬Google ÔÚ 2016 Äê 2 ÔÂÐû²¼½«´óÊý¾ÝÁ÷Ë®Ïß²úÆ·£¨Google DataFlow£©¹±Ï׸ø
Apache »ù½ð»á·õ»¯£¬2017 Äê 1 Ô Apache ¶ÔÍâÐû²¼¿ªÔ´ Apache Beam£¬2017
Äê 5 ÔÂÓÀ´ÁËËüµÄµÚÒ»¸öÎȶ¨°æ±¾ 2.0.0¡£ÔÚ¹úÄÚ£¬´ó²¿·Ö¿ª·¢Õß¶ÔÓÚ Beam »¹È±·¦Á˽⣬ÉçÇøÖÐÎÄ×ÊÁÏÒ²±È½ÏÉÙ¡£InfoQ
ÆÚÍûͨ¹ý Apache Beam ʵսָÄÏϵÁÐÎÄÕÂ ÍÆ¶¯ Apache Beam ÔÚ¹úÄÚµÄÆÕ¼°¡£
±¾ÎĽ«¼òÒª½éÉÜ Apache Beam µÄ·¢Õ¹ÀúÊ·¡¢Ó¦Óó¡¾°¡¢Ä£ÐͺÍÔËÐÐÁ÷³Ì¡¢SDKs ºÍ Beam
µÄÓ¦ÓÃʾÀý¡£»¶Ó¼ÓÈë Beam ÖÐÎÄÉçÇøÉîÈëÌÖÂۺͽ»Á÷¡£
¸ÅÊö
´óÊý¾Ý´¦ÀíÁìÓòµÄÒ»´óÎÊÌâÊÇ£º¿ª·¢Õß¾³£ÒªÓõ½ºÜ¶à²»Í¬µÄ¼¼Êõ¡¢¿ò¼Ü¡¢API¡¢¿ª·¢ÓïÑÔºÍ SDK¡£È¡¾öÓÚÐèÒªÍê³ÉµÄÊÇʲôÈÎÎñ£¬ÒÔ¼°ÔÚʲôÇé¿öϽøÐУ¬¿ª·¢ÕߺܿÉÄÜ»áÓÃ
MapReduce ½øÐÐÅú´¦Àí£¬Óà Apache Spark SQL ½øÐн»»¥ÇëÇó£¨interactive
queries£©£¬Óà Apache Flink ½øÐÐʵʱÁ÷´¦Àí£¬»¹ÓпÉÄÜÓõ½»ùÓÚÔÆ¶ËµÄ»úÆ÷ѧϰ¿ò¼Ü¡£
½üÁ½ÄêÓ¿ÏֵĿªÔ´´ó³±£¬Îª´óÊý¾Ý¿ª·¢ÕßÌṩÁËÊ®·Ö¸»ÓàµÄ¹¤¾ß¡£µ«ÕâͬʱҲÔö¼ÓÁË¿ª·¢ÕßÑ¡ÔñºÏÊʹ¤¾ßµÄÄѶȣ¬ÓÈÆä¶ÔÓÚÐÂÈëÐеĿª·¢ÕßÀ´Ëµ¡£ÕâºÜ¿ÉÄÜÍÏÂý¡¢ÉõÖÁ×è°¿ªÔ´¹¤¾ßµÄ·¢Õ¹£º°Ñ¸÷ÖÖ¿ªÔ´¿ò¼Ü¡¢¹¤¾ß¡¢¿â¡¢Æ½Ì¨È˹¤ÕûºÏµ½Ò»ÆðËùÐ蹤×÷Ö®¸´ÔÓ£¬ÊÇ´óÊý¾Ý¿ª·¢Õß³£Óеı§Ô¹Ö®Ò»£¬Ò²ÊÇËûÃÇÖ§³ÖרÓдóÊý¾Ýƽ̨µÄÊ×ÒªÔÒò¡£
Apache Beam ·¢Õ¹ÀúÊ·
Beam ÔÚ 2016 Äê 2 Ô³ÉΪ Apache ·õ»¯Æ÷ÏîÄ¿£¬²¢ÔÚ 2016 Äê 12 ÔÂÉý¼¶³ÉΪ
Apache »ù½ð»áµÄ¶¥¼¶ÏîÄ¿¡£Í¨¹ýÊ®Îå¸öÔµÄŬÁ¦£¬Ò»¸öÉÔÏÔ»ìÂҵĴúÂë¿â£¬´Ó¶à¸ö×éÖ¯ºÏ²¢£¬ÒÑ·¢Õ¹³ÉΪÊý¾Ý´¦ÀíµÄͨÓÃÒýÇæ£¬¼¯³É¶à¸ö´¦ÀíÊý¾Ý¿ò¼Ü£¬¿ÉÒÔ×öµ½¿ç»·¾³¡£
Beam ¾¹ýÈý¸ö·õ»¯Æ÷°æ±¾ºÍÈý¸öºó·õ»¯Æ÷°æ±¾µÄÑÝ»¯ºÍ¸Ä½ø£¬×îÖÕÔÚ 2017 Äê 5 Ô 17 ÈÕÓÀ´ÁËËüµÄµÚÒ»¸öÎȶ¨°æ
2.0.0¡£·¢²¼Îȶ¨°æ±¾ 3 ¸öÔÂÒÔÀ´£¬Apache Beam ÒѾ³öÏÖÃ÷ÏÔµÄÔö³¤£¬ÎÞÂÛÊÇͨ¹ý¹Ù·½»¹ÊÇÉçÇøµÄ¹±Ï×ÊýÁ¿¡£Apache
Beam ÔڹȸèÔÆ·½ÃæÒ²ÒѾչʾ³öÁË¡°²Å¸É¡±¡£
Beam 2.0.0 ¸Ä½øÁËÓû§ÌåÑ飬֨µãÔÚÓÚ¿ò¼Ü¿ç»·¾³µÄÎÞ·ìÒÆÖ²ÄÜÁ¦£¬ÕâЩִÐл·¾³°üÀ¨Ö´ÐÐÒýÇæ¡¢²Ù×÷ϵͳ¡¢±¾µØ¼¯Èº¡¢ÔƶËÒÔ¼°Êý¾Ý´æ´¢ÏµÍ³¡£Beam
µÄÆäËûÌØÐÔ»¹°üÀ¨Èçϼ¸µã£º
API Îȶ¨ÐԺͶÔδÀ´°æ±¾µÄ¼æÈÝÐÔ¡£
ÓÐ״̬µÄÊý¾Ý´¦Àíģʽ£¬¸ßЧµÄÖ§³ÖÒÀÀµÓÚÊý¾ÝµÄ¼ÆËã¡£
Ö§³ÖÓû§À©Õ¹µÄÎļþϵͳ£¬Ö§³Ö Hadoop ·Ö²¼Ê½·¢Îļþϵͳ¼°ÆäËû¡£
ÌṩÁËÒ»¸ö¶ÈÁ¿Ö¸±êϵͳ£¬¿ÉÓÃÓÚ¸ú×ٹܵÀµÄÖ´ÐÐ×´¿ö¡£
ÍøÉÏÒѾÓкܶàÈËд¹ý Beam 2.0.0 °æ±¾Ö®Ç°µÄ×ÊÁÏ£¬µ«ÊÇ 2.0.0 °æ±¾ºó API ºÜ¶àд·¨±ä¶¯½Ï´ó£¬±¾ÎĽ«´ø×Å´ó¼Ò´ÓÁã»ù´¡µ½
Apache Beam ÈëÃÅ¡£
Apache Beam Ó¦Óó¡¾°
Google Cloud¡¢PayPal¡¢Talend µÈ¹«Ë¾¶¼ÔÚʹÓà Beam£¬¹úÄÚ°üÀ¨°¢Àï°Í°Í¡¢°Ù¶È¡¢½ðɽ¡¢ËÕÄþ¡¢¾Å´Î·½´óÊý¾Ý¡¢360¡¢»Û¾ÛÊýͨÐÅÏ¢¼¼ÊõÓÐÏÞ¹«Ë¾µÈÒ²ÔÚʹÓÃ
Beam£¬Í¬Ê±»¹ÓÐһЩ´óÊý¾Ý¹«Ë¾µÄ¼Ü¹¹Ê¦»òÑз¢ÈËÔ±ÕýÔÚÒ»Æð½øÐÐÑо¿¡£Apache Beam ÖÐÎÄÉçÇøÕýÔÚ¼¯³ÉһЩ¹¤×÷ÖеÄ
runners ºÍ sdk IO£¬°üÀ¨È˹¤ÖÇÄÜ¡¢»úÆ÷ѧϰºÍʱÐòÊý¾Ý¿âµÈһЩ¹¦ÄÜ¡£
ÒÔÏÂΪӦÓó¡¾°µÄ¼¸¸öÀý×Ó£º
Beam ¿ÉÒÔÓÃÓÚ ETL Job ÈÎÎñ
Beam µÄÊý¾Ý¿ÉÒÔͨ¹ý SDKs µÄ IO ½ÓÈ룬ͨ¹ý¹ÜµÀ¿ÉÒÔÓúóÃæµÄ Runners ×öÇåÏ´¡£
Beam Êý¾Ý²Ö¿â¿ìËÙÇл»¡¢¿ç²Ö¿â
ÓÉÓÚ Beam µÄÊý¾ÝÔ´ÊǶàÑù IO£¬ËùÒÔÓà Beam ¿ÉÒÔ¿ìËÙÇл»ÈκÎÊý¾Ý²Ö¿â¡£
Beam ¼ÆËã´¦ÀíÆ½Ì¨Çл»¡¢¿çƽ̨
Runners ĿǰÌṩÁË 3-4 ÖÖ¿ÉÒÔÇл»µÄƽ̨£¬Ëæ×Å Beam µÄÇ¿´óÓ¦¸Ã»áÓиü¶àµÄƽ̨Ìṩ¸ø´ó¼ÒʹÓá£
Apache Beam ÔËÐÐÁ÷³Ì

4-1 Êý¾Ý´¦ÀíÁ÷³Ì
Èçͼ 4-1 Ëùʾ£¬Apache Beam ´óÌåÔËÐÐÁ÷³Ì·Ö³ÉÈý´ó²¿·Ö£º
Modes
Modes ÊÇ Beam µÄÄ£ÐÍ»ò½ÐÊý¾ÝÀ´Ô´µÄ IO£¬ËüÊÇÓɶàÖÖÊý¾ÝÔ´»ò²Ö¿âµÄ IO ×é³É£¬Êý¾ÝÔ´Ö§³ÖÅú´¦ÀíºÍÁ÷´¦Àí¡£
Pipeline
Pipeline ÊÇ Beam µÄ¹ÜµÀ£¬ËùÓеÄÅú´¦Àí»òÁ÷´¦Àí¶¼ÒªÍ¨¹ýÕâ¸ö¹ÜµÀ°ÑÊý¾Ý´«Êäµ½ºó¶ËµÄ¼ÆËãÆ½Ì¨¡£Õâ¸ö¹ÜµÀÏÖÔÚÊÇΨһµÄ¡£Êý¾ÝÔ´¿ÉÒÔÇл»¶àÖÖ£¬¼ÆËãÆ½Ì¨»ò´¦ÀíÆ½Ì¨Ò²Ö§³Ö¶àÖÖ¡£ÐèҪעÒâµÄÊÇ£¬¹ÜµÀÖ»ÓÐÒ»Ìõ£¬ËüµÄ×÷ÓÃÊÇÁ¬½ÓÊý¾ÝºÍ
Runtimes ƽ̨¡£
Runtimes
Runtimes ÊÇ´óÊý¾Ý¼ÆËã»ò´¦ÀíÆ½Ì¨£¬Ä¿Ç°Ö§³Ö Apache Flink¡¢Apache Spark¡¢Direct
Pipeline ºÍ Google Clound Dataflow ËÄÖÖ¡£ÆäÖÐ Apache Flink
ºÍ Apache Spark ͬʱ֧³Ö±¾µØºÍÔÆ¶Ë¡£Direct Pipeline ½öÖ§³Ö±¾µØ£¬Google
Clound Dataflow ½öÖ§³ÖÔÆ¶Ë¡£³ý´ËÖ®Í⣬ºóÆÚ Beam ¹úÍâÑз¢ÍŶӻ¹»á¼¯³ÉÆäËû´óÊý¾Ý¼ÆËãÆ½Ì¨¡£ÓÉÓڹȸèδ½øÈëÖйú£¬Ä¿Ç°¹úÄÚ¿ª·¢ÈËÔ±ÔÚ¹¤×÷ÖжԹȸèÔÆµÄʹÓÃÓ¦¸Ã²»ÊǺܶ࣬Ö÷ÒªÒÔǰÁ½ÖÖΪÖ÷¡£ÎªÁËʹ¶ÁÕß¶ÁÍêÎÄÕºóÄÜ¿ìËÙѧϰÇÒ¸üÌù½üʵ¼Ê¹¤×÷»·¾³£¬ºóÐøÎÄÕÂÖÐÎÒ»áÒÔǰÁ½ÖÖ×÷Ϊ´óÊý¾Ý¼ÆËã»ò´¦ÀíÆ½Ì¨½øÐÐÑÝʾ¡£
Beam Model ¼°Æä¹¤×÷Á÷³Ì
Beam Model Ö¸µÄÊÇ Beam µÄ±à³Ì·¶Ê½£¬¼´ Beam SDK ±³ºóµÄÉè¼ÆË¼Ïë¡£ÔÚ½éÉÜ Beam
Model ֮ǰ£¬ÏȼòÒª½éÉÜһϠBeam Model Òª´¦ÀíµÄÎÊÌâÓòÓëһЩ»ù±¾¸ÅÄî¡£
Êý¾ÝÔ´ÀàÐÍ¡£·Ö²¼Ê½Êý¾ÝÀ´Ô´ÀàÐÍÒ»°ã¿ÉÒÔ·ÖΪÁ½À࣬ÓнçµÄÊý¾Ý¼¯ºÍÎÞ½çµÄÊý¾ÝÁ÷¡£ÓнçµÄÊý¾Ý¼¯£¬±ÈÈçÒ»¸ö
Ceph ÖеÄÎļþ£¬Ò»¸ö Mongodb ±íµÈ£¬ÌصãÊÇÊý¾ÝÒѾ´æÔÚ£¬Êý¾Ý¼¯ÓÐÒÑÖªµÄ¡¢¹Ì¶¨µÄ´óС£¬Ò»°ã´æÔÚ´ÅÅÌÉÏ£¬²»»áͻȻÏûʧ¡£¶øÎÞ½çµÄÊý¾ÝÁ÷£¬±ÈÈç
Kafka ÖÐÁ÷¹ýÀ´µÄÊý¾ÝÁ÷£¬ÕâÖÖÊý¾ÝµÄÌØµãÊÇÊý¾Ý¶¯Ì¬Á÷È롢ûÓб߽硢ÎÞ·¨È«²¿³Ö¾Ã»¯µ½´ÅÅÌÉÏ¡£Beam
¿ò¼ÜÉè¼ÆÊ±ÐèÒªÕë¶ÔÕâÁ½ÖÖÊý¾ÝµÄ´¦Àí½øÐп¼ÂÇ£¬¼´Åú´¦ÀíºÍÁ÷´¦Àí¡£
ʱ¼ä¡£·Ö²¼Ê½¿ò¼ÜµÄʱ¼ä´¦ÀíÓÐÁ½ÖÖ£¬Ò»ÖÖÊÇÈ«Á¿¼ÆË㣬ÁíÒ»ÖÖÊDz¿·ÖÔöÁ¿¼ÆËã¡£ÎÒ¸ø´ó¼Ò¾Ù¸öÀý×Ó£ºÀýÈçÎÒÃÇÍæ¡°ÍõÕßũҩ¡±ÓÎÏ·£¬ÓÎÏ·µÄÊý¾ÝÐèҪʵʱµØÁ÷Ïò·þÎñÆ÷£¬µôѪÇé¿ö»áËæ×Åʱ¼äʵʱ±ä»¯£¬µ«ÊÇÅÅÐаñµÄÊý¾ÝÔòÊÇÈ«²¿Íæ¼ÒÔÚÒ»¶¨Ê±¼äÄÚµÄÅÅÃû£¬ÀýÈçÒ»ÖÜ»òÒ»¸öÔ¡£Beam
Õë¶ÔÕâÁ½ÖÖÇé¿ö¶¼Éè¼ÆÁ˶ÔÓ¦µÄ´¦Àí·½Ê½¡£
ÂÒÐò¡£¶ÔÓÚÁ÷´¦Àí¿ò¼Ü´¦ÀíµÄÊý¾ÝÁ÷À´Ëµ£¬Êý¾Ýµ½´ï´óÌå·ÖÁ½ÖÖ£¬Ò»ÖÖÊǰ´ÕÕ Process Time ¶¨Òåʱ¼ä´°¿Ú£¬ÕâÖÖ²»Óÿ¼ÂÇÂÒÐòÎÊÌ⣬ÒòΪ¶¼Êǹرյ±Ç°´°¿Úºó²Å½øÐÐÏÂÒ»¸ö´°¿Ú²Ù×÷£¬ÐèÒªµÈ´ý£¬ËùÒÔÖ´Ðж¼ÊÇÓÐÐòµÄ¡£¶øÁíÒ»ÖÖ£¬Event
Time ¶¨ÒåµÄʱ¼ä´°¿ÚÔò²»ÐèÒªµÈ´ý£¬¿ÉÄܵ±Ç°²Ù×÷»¹Ã»Óд¦ÀíÍ꣬¾ÍÖ±½ÓÖ´ÐÐÏÂÒ»¸ö²Ù×÷£¬Ôì³ÉÏûϢ˳Ðò´¦Àíµ«½á¹û²»Êǰ´Ë³ÐòÅÅÐòÁË¡£ÀýÈçÎÒÃǵĶ©µ¥ÏûÏ¢£¬²ÉÓÃÁË·Ö²¼Ê½´¦Àí£¬Èç¹ûϵ¥²Ù×÷ËùÊô·þÎñÆ÷´¦ÀíËٶȱȽÏÂý£¬¶øÓû§Ö§¸¶µÄ·þÎñÆ÷Ëٶȷdz£¿ì£¬Õâʱ×îºóµÄ¶©µ¥²Ù×÷ʱ¼äÖá¾Í»á³öÏÖÒ»ÖÖÇé¿ö£¬Ïµ¥ÔÚÖ§¸¶µÄºóÃæ¡£¶ÔÓÚÕâÖÖÇé¿ö£¬ÈçºÎÈ·¶¨³Ùµ½Êý¾Ý£¬ÒÔ¼°¶ÔÓÚ³Ùµ½Êý¾ÝÈçºÎ´¦Àíͨ³£ÊǺÜÂé·³µÄÊÂÇé¡£
Beam Model ´¦ÀíµÄÄ¿±êÊý¾ÝÊÇÎÞ½çµÄʱ¼äÂÒÐòÊý¾ÝÁ÷£¬²»¿¼ÂÇʱ¼ä˳Ðò»òÓнçµÄÊý¾Ý¼¯¿É¿´×öÊÇÎÞ½çÂÒÐòÊý¾ÝÁ÷µÄÒ»¸öÌØÀý¡£Beam
Model ´ÓÏÂÃæËĸöά¶È¹éÄÉÁËÓû§ÔÚ½øÐÐÊý¾Ý´¦ÀíµÄʱºòÐèÒª¿¼ÂǵÄÎÊÌ⣺
What¡£ÈçºÎ¶ÔÊý¾Ý½øÐмÆË㣿ÀýÈ磬»úÆ÷ѧϰÖÐѵÁ·Ñ§Ï°Ä£ÐÍ¿ÉÒÔÓà Sum »òÕß Join µÈ¡£ÔÚ
Beam SDK ÖÐÓÉ Pipeline ÖеIJÙ×÷·ûÖ¸¶¨¡£
Where¡£Êý¾ÝÔÚʲô·¶Î§ÖмÆË㣿ÀýÈ磬»ùÓÚ Process-Time µÄʱ¼ä´°¿Ú¡¢»ùÓÚ Event-Time
µÄʱ¼ä´°¿Ú¡¢»¬¶¯´°¿ÚµÈµÈ¡£ÔÚ Beam SDK ÖÐÓÉ Pipeline µÄ´°¿ÚÖ¸¶¨¡£
When¡£ºÎʱÊä³ö¼ÆËã½á¹û£¿ÀýÈ磬ÔÚ 1 СʱµÄ Event-Time ʱ¼ä´°¿ÚÖУ¬Ã¿¸ô 1 ·ÖÖÓ½«µ±Ç°´°¿Ú¼ÆËã½á¹ûÊä³ö¡£ÔÚ
Beam SDK ÖÐÓÉ Pipeline µÄ Watermark ºÍ´¥·¢Æ÷Ö¸¶¨¡£
How¡£³Ùµ½Êý¾ÝÈçºÎ´¦Àí£¿ÀýÈ磬½«³Ùµ½Êý¾Ý¼ÆËãÔöÁ¿½á¹ûÊä³ö£¬»òÊǽ«³Ùµ½Êý¾Ý¼ÆËã½á¹ûºÍ´°¿ÚÄÚÊý¾Ý¼ÆËã½á¹ûºÏ²¢³ÉÈ«Á¿½á¹ûÊä³ö¡£ÔÚ
Beam SDK ÖÐÓÉ Accumulation Ö¸¶¨¡£
Beam Model ½«¡°WWWH¡±Ëĸöά¶È³éÏó³öÀ´×é³ÉÁË Beam SDK£¬Óû§ÔÚ»ùÓÚ Beam
SDK ¹¹½¨Êý¾Ý´¦ÀíÒµÎñÂ߼ʱ£¬Ã¿Ò»²½Ö»ÐèÒª¸ù¾ÝÒµÎñÐèÇó°´ÕÕÕâËĸöά¶Èµ÷ÓþßÌåµÄ API£¬¼´¿ÉÉú³É·Ö²¼Ê½Êý¾Ý´¦Àí
Pipeline£¬²¢Ìá½»µ½¾ßÌåÖ´ÐÐÒýÇæÉÏÖ´ÐС£¡°WWWH¡±Ëĸöά¶ÈÖ»ÊÇ´ÓÒµÎñµÄ½Ç¶È¿´´ýÎÊÌ⣬²¢²»ÊÇÈ«²¿ÊÊÓÃÓÚ×Ô¼ºµÄÒµÎñ¡£×ö¼¼Êõ¼Ü¹¹Ò»¶¨Òª½áºÏ×Ô¼ºµÄÒµÎñʹÓÃÏàÓ¦µÄ¼¼ÊõÌØÐÔ»ò¿ò¼Ü¡£Beam
×öΪ¡°Ò»Í³¡±µÄ¿ò¼Ü£¬Îª¿ª·¢Õß´øÀ´ÁË·½±ã¡£
Beam SDKs
Beam SDK ¸øÉϲãÓ¦ÓõĿª·¢ÕßÌṩÁËÒ»¸öͳһµÄ±à³Ì½Ó¿Ú£¬¿ª·¢Õß²»ÐèÒªÁ˽âµ×²ãµÄ¾ßÌåµÄ´óÊý¾Ýƽ̨µÄ¿ª·¢½Ó¿ÚÊÇʲô£¬Ö±½Óͨ¹ý
Beam SDK µÄ½Ó¿Ú¾Í¿ÉÒÔ¿ª·¢Êý¾Ý´¦ÀíµÄ¼Ó¹¤Á÷³Ì£¬²»¹ÜÊäÈëÊÇÓÃÓÚÅú´¦ÀíµÄÓнçÊý¾Ý¼¯£¬»¹ÊÇÁ÷ʽµÄÎÞ½çÊý¾Ý¼¯¡£¶ÔÓÚÕâÁ½ÀàÊäÈëÊý¾Ý£¬Beam
SDK ¶¼Ê¹ÓÃÏàͬµÄÀàÀ´±íÏÖ£¬²¢ÇÒʹÓÃÏàͬµÄת»»²Ù×÷½øÐд¦Àí¡£Beam SDK ÓµÓв»Í¬±à³ÌÓïÑÔµÄʵÏÖ£¬Ä¿Ç°ÒѾÍêÕûµØÌṩÁË
Java µÄ SDK£¬Python µÄ SDK »¹ÔÚ¿ª·¢ÖУ¬ÏàÐÅδÀ´»á·¢²¼¸ü¶à²»Í¬±à³ÌÓïÑ﵀ SDK¡£
Beam 2.0 µÄ SDKs ĿǰÓУº
Amqp£º¸ß¼¶ÏûÏ¢¶ÓÁÐÐÒé¡£
Cassandra£ºCassandra ÊÇÒ»¸ö NoSQL ÁÐ×壨column family£©ÊµÏÖ£¬Ê¹ÓÃÓÉ
Amazon Dynamo ÒýÈëµÄ¼Ü¹¹·½ÃæµÄÌØÐÔÀ´Ö§³Ö Big Table Êý¾ÝÄ£ÐÍ¡£Cassandra
µÄһЩÓÅÊÆÈçÏÂËùʾ£º
¸ß¶È¿ÉÀ©Õ¹ÐԺ͸߶ȿÉÓÃÐÔ£¬Ã»Óе¥µã¹ÊÕÏ
NoSQL ÁÐ×åʵÏÖ
·Ç³£¸ßµÄдÈëÍÌÍÂÁ¿ºÍÁ¼ºÃµÄ¶ÁÈ¡ÍÌÍÂÁ¿
ÀàËÆ SQL µÄ²éѯÓïÑÔ£¨´Ó 0.8 °æ±¾Æð£©£¬²¢Í¨¹ý¶þ¼¶Ë÷ÒýÖ§³ÖËÑË÷
¿Éµ÷½ÚµÄÒ»ÖÂÐԺͶԸ´ÖƵÄÖ§³ÖÁé»îµÄģʽ
Elasticesarch£ºÒ»¸öʵʱµÄ·Ö²¼Ê½ËÑË÷ÒýÇæ¡£
Google-cloud-platform£º¹È¸èÔÆ IO¡£
Hadoop-file-system£º²Ù×÷ Hadoop ÎļþϵͳµÄ IO¡£
Hadoop-hbase£º²Ù×÷ Hadoop É쵀 Hbase µÄ½Ó¿Ú IO¡£
Hcatalog£ºHcatalog ÊÇ Apache ¿ªÔ´µÄ¶ÔÓÚ±íºÍµ×²ãÊý¾Ý¹ÜÀíͳһ·þÎñƽ̨¡£
Jdbc£ºÁ¬½Ó¸÷ÖÖÊý¾Ý¿âµÄÊý¾Ý¿âÁ¬½ÓÆ÷¡£
Jms£ºJava ÏûÏ¢·þÎñ£¨Java Message Service£¬¼ò³Æ JMS£©ÊÇÓÃÓÚ·ÃÎÊÆóÒµÏûϢϵͳµÄ¿ª·¢ÉÌÖÐÁ¢µÄ
API¡£ÆóÒµÏûϢϵͳ¿ÉÒÔÐÖúÓ¦ÓÃÈí¼þͨ¹ýÍøÂç½øÐÐÏûÏ¢½»»¥¡£JMS ÔÚÆäÖаçÑݵĽÇÉ«Óë JDBC ºÜÏàËÆ£¬ÕýÈç
JDBC ÌṩÁËÒ»Ì×ÓÃÓÚ·ÃÎʸ÷ÖÖ²»Í¬¹ØÏµÊý¾Ý¿âµÄ¹«¹² API£¬JMS Ò²ÌṩÁ˶ÀÁ¢ÓÚÌØ¶¨³§ÉÌµÄÆóÒµÏûϢϵͳ·ÃÎÊ·½Ê½¡£
Kafka£º´¦ÀíÁ÷Êý¾ÝµÄÇáÁ¿¼¶´óÊý¾ÝÏûϢϵͳ£¬»ò½ÐÏûÏ¢×ÜÏß¡£
Kinesis£º¶Ô½ÓÑÇÂíÑ·µÄ·þÎñ£¬¿ÉÒÔ¹¹½¨ÓÃÓÚ´¦Àí»ò·ÖÎöÁ÷Êý¾ÝµÄ×Ô¶¨ÒåÓ¦ÓóÌÐò£¬ÒÔÂú×ãÌØ¶¨ÐèÇó¡£
Mongodb£ºMongoDB ÊÇÒ»¸ö»ùÓÚ·Ö²¼Ê½Îļþ´æ´¢µÄÊý¾Ý¿â¡£
Mqtt£ºIBM ¿ª·¢µÄÒ»¸ö¼´Ê±Í¨Ñ¶ÐÒé¡£
Solr£ºÑÇʵʱµÄ·Ö²¼Ê½ËÑË÷ÒýÇæ¼¼Êõ¡£
xml£ºÒ»ÖÖÊý¾Ý¸ñʽ¡£
Beam Pipeline Runners
Beam Pipeline Runner ½«Óû§Óà Beam Ä£ÐͶ¨Ò忪·¢µÄ´¦ÀíÁ÷³Ì·Òë³Éµ×²ãµÄ·Ö²¼Ê½Êý¾Ý´¦ÀíÆ½Ì¨Ö§³ÖµÄÔËÐÐʱ»·¾³¡£ÔÚÔËÐÐ
Beam ³ÌÐòʱ£¬ÐèÒªÖ¸Ã÷µ×²ãµÄÕýÈ· Runner ÀàÐÍ£¬Õë¶Ô²»Í¬µÄ´óÊý¾Ýƽ̨£¬»áÓв»Í¬µÄ Runner¡£Ä¿Ç°
Flink¡¢Spark¡¢Apex ÒÔ¼°¹È¸èµÄ Cloud DataFlow ¶¼ÓÐÖ§³Ö Beam µÄ
Runner¡£
ÐèҪעÒâµÄÊÇ£¬ËäÈ» Apache Beam ÉçÇø·Ç³£Ï£ÍûËùÓÐµÄ Beam Ö´ÐÐÒýÇæ¶¼Äܹ»Ö§³Ö Beam
SDK ¶¨ÒåµÄ¹¦ÄÜÈ«¼¯£¬µ«ÊÇÔÚʵ¼ÊʵÏÖÖпÉÄÜÎÞ·¨´ïµ½ÕâÒ»ÆÚÍû¡£ÀýÈ磬»ùÓÚ MapReduce µÄ Runner
ÏÔÈ»ºÜÄÑʵÏÖºÍÁ÷´¦ÀíÏà¹ØµÄ¹¦ÄÜÌØÐÔ¡£¾ÍĿǰ״̬¶øÑÔ£¬¶Ô Beam Ä£ÐÍÖ§³Ö×îºÃµÄ¾ÍÊÇÔËÐÐÓڹȸèÔÆÆ½Ì¨Ö®ÉϵÄ
Cloud Dataflow£¬ÒÔ¼°¿ÉÒÔÓÃÓÚ×Ô½¨»ò²¿ÊðÔڷǹȸèÔÆÖ®É쵀 Apache Flink¡£µ±È»£¬ÆäËüµÄ
Runner Ò²ÕýÔÚÓÍ·¸ÏÉÏ£¬Õû¸öÐÐÒµÒ²ÔÚ³¯×ÅÖ§³Ö Beam Ä£Ð͵ķ½Ïò·¢Õ¹¡£
Beam 2.0 µÄ Runners ¿ò¼ÜÈçÏ£º
Apex
µ®ÉúÓÚ 2015 Äê 6 Ô嵀 Apache Apex£¬ÆäͬÑùÔ´×Ô DataTorrent ¼°ÆäÁîÈËÓ¡ÏóÉî¿ÌµÄ
RTS ƽ̨£¬ÆäÖаüº¬Ò»Ì׺ËÐÄ´¦ÀíÒýÇæ¡¢ÒDZí°å¡¢Õï¶ÏÓë¼à¿Ø¹¤¾ßÌ×¼þÍâ¼ÓרÃÅÃæÏòÊý¾Ý¿ÆÑ§¼ÒÓû§µÄͼÐÎÁ÷±à³Ìϵͳ
dtAssemble¡£Ö÷ÒªÓÃÓÚÁ÷´¦Àí£¬³£ÓÃÓÚÎïÁªÍøµÈ³¡¾°¡£
Direct-java
±¾µØ´¦ÀíºÍÔËÐÐ runner¡£
Flink_2.10
Flink ÊÇÒ»¸öÕë¶ÔÁ÷Êý¾ÝºÍÅúÊý¾ÝµÄ·Ö²¼Ê½´¦ÀíÒýÇæ¡£
Gearpump
Gearpump ÊÇÒ»¸ö»ùÓÚ Akka Actor µÄÇáÁ¿¼¶µÄʵʱÁ÷¼ÆËãÒýÇæ¡£Èç½ñÁ÷ƽ̨ÐèÒª´¦ÀíÀ´×Ô¸÷ÖÖÒÆ¶¯¶ËºÍÎïÁªÍøÉ豸µÄº£Á¿Êý¾Ý£¬ÏµÍ³ÒªÄܲ»¼ä¶ÏµØÌṩ·þÎñ£¬¶ÔÊý¾ÝµÄ´¦ÀíÒªÄÜ×öµ½²»¶ªÊ§²»Öظ´£¬¶Ô¸÷ÖÖÈíÓ²¼þ´íÎóÄÜÆ½»¬´¦Àí£¬¶ÔÓû§µÄÊäÈëÒªÄÜʵʱÏìÓ¦¡£³ýÁËÕâЩϵͳ²ãÃæµÄÐèÇóÍ⣬Óû§²ãÃæµÄ½Ó¿Ú»¹ÒªÄÜ×öµ½·á¸»¶øÁé»î£¬Ò»·½Ã棬ƽ̨ҪÌṩ×ã¹»·á¸»µÄ»ù´¡ÉèÊ©£¬ÄÜ×î¼ò»¯Ó¦ÓóÌÐòµÄ±àд£»ÁíÒ»·½Ã棬Õâ¸öƽ̨ӦÌṩ¾ßÓбíÏÖÁ¦µÄ±à³Ì
API£¬ÈÃÓû§ÄÜÁé»î±í´ï¸÷ÖÖ¼ÆË㣬²¢ÇÒÕû¸öϵͳ¿ÉÒÔ¶¨ÖÆ£¬ÔÊÐíÓû§Ñ¡Ôñµ÷¶È²ßÂԺͲ¿Êð»·¾³£¬ÔÊÐíÓû§ÔÚ²»Í¬µÄÖ¸±ê¼ä×öÕÛÖÐÈ¡ÉᣬÒÔÂú×ãÌØ¶¨µÄÐèÇó¡£Akka
Actor ÌṩÁËͨÐÅ¡¢²¢·¢¡¢¸ôÀë¡¢ÈÝ´íµÄ»ù´¡ÉèÊ©£¬Gearpump ͨ¹ý°Ñ³éÏó²ã´ÎÌáÉýµ½ Actor
ÕâÒ»²ã£¬ÆÁ±ÎÁ˵ײãµÄϸ½Ú£¬×¨×¢ÓÚÁ÷´¦ÀíÐèÇó±¾Éí£¬Äܸü¼òµ¥¶øÓÖ¸ßЧµØ½â¾öÉÏÊöÎÊÌâ¡£
Dataflow
2016 Äê 2 Ô·ݣ¬¹È¸è¼°ÆäºÏ×÷»ï°éÏò Apache ¾èÔùÁËÒ»´óÅú´úÂ룬´´Á¢ÁË·õ»¯ÖÐµÄ Beam
ÏîÄ¿£¨×î³õ½Ð Apache Dataflow£©¡£ÕâЩ´úÂëÖеĴ󲿷ÖÀ´×ÔÓڹȸè Cloud Dataflow
SDK¡ª¡ª¿ª·¢ÕßÓÃÀ´Ð´Á÷´¦ÀíºÍÅú´¦Àí¹ÜµÀ£¨pipelines£©µÄ¿â£¬¿ÉÔÚÈκÎÖ§³ÖµÄÖ´ÐÐÒýÇæÉÏÔËÐС£µ±Ê±£¬Ö§³ÖµÄÖ÷ÒªÒýÇæÊǹȸè
Cloud Dataflow¡£
Spark
Apache Spark ÊÇÒ»¸öÕýÔÚ¿ìËٳɳ¤µÄ¿ªÔ´¼¯Èº¼ÆËãϵͳ¡£Apache Spark Éú̬ϵͳÖеİüºÍ¿ò¼ÜÈÕÒæ·á¸»£¬Ê¹µÃ
Spark Äܹ»Ö´Ðи߼¶Êý¾Ý·ÖÎö¡£Apache Spark µÄ¿ìËٳɹ¦µÃÒæÓÚËüµÄÇ¿´ó¹¦ÄܺÍÒ×ÓÃÐÔ¡£Ïà±ÈÓÚ´«Í³µÄ
MapReduce ´óÊý¾Ý·ÖÎö£¬Spark ЧÂʸü¸ß¡¢ÔËÐÐʱËٶȸü¿ì¡£Apache Spark ÌṩÁËÄÚ´æÖеķֲ¼Ê½¼ÆËãÄÜÁ¦£¬¾ßÓÐ
Java¡¢Scala¡¢Python¡¢R ËÄÖÖ±à³ÌÓïÑ﵀ API ±à³Ì½Ó¿Ú¡£
ʵս£º¿ª·¢µÚÒ»¸ö Beam ³ÌÐò
8.1 ¿ª·¢»·¾³
ÏÂÔØ°²×° JDK 7 »ò¸üеİ汾£¬¼ì²â JAVA_HOME »·¾³±äÁ¿¡£±¾ÎÄʾÀýʹÓõÄÊÇ JDK
1.8¡£
ÏÂÔØ maven ²¢ÅäÖ㬱¾ÎÄʾÀýʹÓõÄÊÇ maven-3.3.3¡£
¿ª·¢»·¾³ myeclipse¡¢Spring Tool Suite ¡¢IntelliJ IDEA£¬Õâ¸ö¿ÉÒÔ°´ÕÕ¸öÈËϲºÃ£¬±¾ÎÄʾÀýÓõÄÊÇ
STS¡£
8.2 ¿ª·¢µÚÒ»¸ö wordCount ³ÌÐò²¢ÇÒÔËÐÐ
1 н¨Ò»¸ö maven ÏîÄ¿

2 ÔÚ pom.xml ÎļþÖÐÌí¼ÓÁ½¸ö jar °ü

3 н¨Ò»¸ö txtIOTest.java

дÈëÒÔÏ´úÂ룺

4 ÒòΪ Windows É쵀 Beam2.0.0 ²»Ö§³Ö±¾µØÂ·¾¶£¬ÐèÒª²¿Êðµ½ Linux ÉÏ£¬ÐèÒª´ò°üÈçͼ£¬´Ë´¦×¢ÒâÒª°ÑÒÀÀµ
jar ¶¼´ò°ü½øÈ¥¡£


5 ²¿Êð beam.jar µ½ Linux »·¾³ÖÐ
ʹÓà Xshell 5 µÇ¼ÐéÄâ»ú»òÕß Linux ϵͳ¡£Óà rz ÃüÁî°Ñ¸Õ²Å´ò°üµÄÎļþÉÏ´«ÉÏÈ¥¡£ÆäÖÐÐéÄâ»úÒª°²×°ÉÏ
jdk ²¢ÅäÖúû·¾³±äÁ¿¡£
ÎÒÃÇ¿ÉÒÔÓÃÊäÈë javac ÃüÁî²âÊÔһϡ£

ÎÒÃÇ°Ñ beam.jar ÉÏ´«µ½ /usr/local/ Ŀ¼ÏÂÃæ£¬È»ºóн¨Ò»¸öÎļþ£¬Ò²¾ÍÊÇÔ´Îļþ¡£ÃüÁtouch
text.txt¡¡ÃüÁchmod o+rwx¡¡text.txt
ÐÞ¸Ä text.txt ²¢Ìí¼ÓÊý¾Ý¡£¡¡ÃüÁvi text.txt

ÔËÐÐÃüÁjava -jar beam.jar£¬Éú³ÉÎļþ¡£

Óà cat ÃüÁî²é¿´ÎļþÄÚÈÝ£¬ÀïÃæ¾ÍÊÇͳ¼ÆµÄ½á¹û¡£

8.3 ʵսÆÊÎö
ÎÒÃÇ¿ÉÒÔͨ¹ýÒÔÉÏʵս´úÂë½øÒ»²½Á˽â Beam µÄÔËÓÃÔÀí¡£
µÚÒ»¼þÊÂÇéÊǴһ¸ö¹ÜµÀ£¨Pipeline£©£¬ÀýÈçÎÒÃÇСʱºò¼ÒÀï½½µØÓõġ°Ë®¹Ü¡±¡£Ëü¾ÍÊÇÁ¬½ÓˮԴºÍ´¦ÀíµÄÇÅÁº¡£
PipelineOptions
pipelineOptions = PipelineOptionsFactory.create();//
´´½¨¹ÜµÀ |
µÚ¶þ¼þÊÂÇéÊÇÈÃÎÒÃǵĹܵÀÓÐÒ»¸ö´¦Àí¿ò¼Ü£¬Ò²¾ÍÊÇÎÒÃÇµÄ Runtimes ¡£ÀýÈçÎÒÃǽӵ½Ë®ÒªÔõô´¦Àí£¬ÊÇÊäË͸øÎÒÃdzÇÊеÄÎÛË®´¦Àí³§£¬»¹ÊÇÆäËû¡£Õâ¸öÎÛË®´¦Àí³§¾ÍÏ൱ÓÚÎÒÃǵĴ¦Àí¿ò¼Ü£¬ÀýÈçÏÖÔÚÁ÷ÐеÄ
Apache Spark »ò Apache Flink¡£Õâ¸öÒª¸ù¾Ý×Ô¼ºµÄÒµÎñÖ¸¶¨£¬ÈçÏ´úÂëÖÐÎÒÖ¸¶¨Á˱¾µØµÄ´¦Àí¿ò¼Ü¡£
pipelineOptions.setRunner(DirectRunner.class); |
µÚÈý¼þÊÂÇéÒ²ÊÇ Beam ×îºóÒ»¸öÖØÒªµÄµØ·½£¬¾ÍÊÇÄ£ÐÍ (Model)£¬Í¨Ë׵㽲¾ÍÊÇÎÒÃǵÄÊý¾ÝÀ´Ô´¡£Èç¹û½áºÏÒÔÉϵÚÒ»¼þºÍµÚ¶þ¼þµÄÊÂÇé˵¾ÍÊÇË®´ÓÄÄÀïÀ´£¬Ë®µÄÀ´Ô´¿ÉÄÜÊǺÓÀï¡¢¿ÉÄÜÊÇÎÛˮͨµÀµÈµÈ¡£±¾ÊµÀýÓõÄÊÇÓнç¹Ì¶¨´óСµÄÎı¾Îļþ¡£µ±È»
Model »¹°üº¬ÎÞ½çÊý¾Ý£¬ÀýÈç kafka µÈµÈ£¬¿ÉÒÔ¸ù¾ÝµÄÐèÇóÁé»îÔËÓá£
pipeline.apply
(TextIO.read(). from ("/usr/local/text.txt")).
apply
(" ExtractWords", ParDo.of (new DoFn<String,
String>() //ºóÊ¡ÂÔ |
×îºóÒ»²½ÊÇ´¦Àí½á¹û£¬Õâ¸ö±È½Ï¼òµ¥£¬¿ÉÒÔ¸ù¾Ý×Ô¼ºµÄÐèÇó´¦Àí¡£Ï£Íûͨ¹ý´úÂëµÄʵս½áºÏÔÀíÆÊÎö¿ÉÒÔ°ïÖú´ó¼Ò¸ü¿ìµØÊìϤ
Beam ²¢Äܹ»¼òµ¥µØÔËÓà Beam¡£
×ܽá
Apache Beam ÊǼ¯³ÉÁ˺ܶàÊý¾ÝÄ£Ð͵ÄÒ»¸öͳһ»¯Æ½Ì¨£¬ËüΪ´óÊý¾Ý¿ª·¢¹¤³ÌʦƵ·±»»Êý¾ÝÔ´»ò¶àÊý¾ÝÔ´¡¢¶à¼ÆËã¿ò¼ÜÌṩÁ˼¯³Éͳһ¿ò¼Üƽ̨¡£Apache
Beam ÉçÇøÏÖÔÚÒѾ¼¯³ÉÁËÊý¾Ý¿âµÄÇл» IO£¬Î´À´ Beam ÖÐÎÄÉçÇø»¹½«Îª Beam ¼¯³É¸ü¶àµÄ
Model ºÍ¼ÆËã¿ò¼Ü£¬Îª´ó¼ÒÌṩ·½±ã¡£ |