Introduction to Apache Storm
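Apache Storm grew out of the Twitter Storm platform and is now governed by the Apache Software Foundation. It is a free, open-source, distributed real-time computation system. Storm makes it easy to process streams of data reliably, doing for real-time processing what Hadoop did for batch processing. Storm is simple and can be used with any programming language; Storm itself was written mainly in Clojure. Its use cases include real-time analytics, online machine learning, continuous computation, distributed RPC, and ETL. Storm is also fast: one benchmark measured over a million tuples processed per second on a single node.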
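1. Storm Cluster Architecture
A Storm cluster uses a master/worker architecture: the master node runs Nimbus, the worker nodes run Supervisors, and scheduling-related state is stored in a ZooKeeper cluster. The architecture is shown in the figure below: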

Nimbus
The master node of a Storm cluster. It distributes user code and assigns work to worker processes on specific Supervisor nodes, which run the tasks of the topology's components (spouts and bolts).
Supervisor
A worker node of a Storm cluster. It starts and stops every worker process running on that node. The supervisor.slots.ports setting in Storm's configuration file determines the maximum number of slots on a Supervisor; each slot is uniquely identified by a port number, and each port corresponds to one worker process (if that worker has been started).
ZooKeeper
Coordinates Nimbus and the Supervisors. If a Supervisor fails and can no longer run a topology, Nimbus detects this right away and reassigns the topology to other available Supervisors.
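2. Runtime Components
At runtime a Storm application consists of two kinds of components, spouts and bolts. Data enters through a spout and is sent to bolts as tuples. Bolts can be chained one after another, and a single bolt can also subscribe to several spouts or bolts. The runtime principle is shown in the figure below: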

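The components are defined as follows:
Spout: the data source; it continuously emits tuples.
Tuple: the abstraction for a unit of data. A tuple can carry data of any type, but that data must be serializable.
Stream: a collection of tuples. All tuples in a stream come from the same source.
Bolt: a node that consumes tuples. After consuming a tuple, a bolt may emit new tuples to the same stream, emit to other streams, or emit nothing at all. Bolts can run in parallel.
Topology: the graph that ties spouts and bolts together. It defines how spouts and bolts are connected, their parallelism, their configuration, and so on.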
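To make these definitions concrete, here is a toy, single-threaded model in plain Java. It deliberately does not use the Storm API: ToyTopology, sentenceSpout, splitBolt, and countBolt are hypothetical names chosen for illustration, and a real topology would run these pieces in parallel across a cluster.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: plain Java, not the Storm API. All names are hypothetical.
class ToyTopology {

    // "Spout": the data source. Each element stands for one emitted tuple.
    static List<String> sentenceSpout() {
        return Arrays.asList("the quick fox", "the lazy dog");
    }

    // "Bolt" 1: consumes one sentence tuple and emits one tuple per word.
    static List<String> splitBolt(String sentence) {
        return Arrays.asList(sentence.split(" "));
    }

    // "Bolt" 2: consumes word tuples and keeps a running count; emits nothing.
    static void countBolt(String word, Map<String, Integer> counts) {
        Integer old = counts.get(word);
        counts.put(word, old == null ? 1 : old + 1);
    }

    // "Topology": wires spout -> splitBolt -> countBolt and drains the stream.
    static Map<String, Integer> run() {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String sentence : sentenceSpout()) {
            for (String word : splitBolt(sentence)) {
                countBolt(word, counts);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints {dog=1, fox=1, lazy=1, quick=1, the=2}
    }
}
```

The second bolt illustrates the "emit nothing at all" case: it only updates state.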
3. How a Topology Runs
The spouts and bolts above are assembled into a topology. The topology is then packaged as a jar and submitted with the appropriate command to start the application. When Storm runs a topology on a cluster, three kinds of entities carry out the work:
(1). Worker (process)
(2). Executor (thread)
(3). Task
The figure below briefly illustrates the relationship among the three:

A worker process executes a subset of one topology (note: a worker never serves more than one topology). A worker process starts one or more executor threads to run the components (spouts or bolts) of that topology. A running topology therefore consists of many worker processes spread across multiple machines in the cluster.
An executor is a separate thread started by a worker process. Each executor runs tasks of exactly one component (spout or bolt) of one topology (note: an executor may run one or more tasks; by default Storm creates one task per executor, and the executor thread calls all of its task instances in turn on each loop iteration).
A task is the unit that actually runs the code of a spout or bolt (note: a task is one instance of a spout or bolt, and the executor thread calls its nextTuple or execute method). Once a topology has started, the number of tasks of a component is fixed, but the number of executor threads the component uses can be adjusted dynamically (one executor thread can run one or more task instances of the component). For any component this means #threads <= #tasks, that is, the number of threads never exceeds the number of tasks. By default the number of tasks equals the number of executor threads, so each executor runs exactly one task.
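The constraint #threads <= #tasks can be illustrated with a small sketch that spreads a component's tasks over its executors. This is not Storm's scheduler; TaskAssignment and assign are hypothetical names, and the contiguous split is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: not Storm's scheduler. Names are hypothetical.
class TaskAssignment {

    // Splits task ids 0..numTasks-1 into contiguous groups, one group per executor.
    static List<List<Integer>> assign(int numTasks, int numExecutors) {
        if (numExecutors < 1 || numExecutors > numTasks) {
            throw new IllegalArgumentException("need 1 <= #threads <= #tasks");
        }
        List<List<Integer>> executors = new ArrayList<List<Integer>>();
        int base = numTasks / numExecutors;   // every executor gets at least this many
        int extra = numTasks % numExecutors;  // the first 'extra' executors get one more
        int next = 0;
        for (int e = 0; e < numExecutors; e++) {
            int size = base + (e < extra ? 1 : 0);
            List<Integer> tasks = new ArrayList<Integer>();
            for (int i = 0; i < size; i++) {
                tasks.add(next++);
            }
            executors.add(tasks);
        }
        return executors;
    }

    public static void main(String[] args) {
        System.out.println(assign(4, 2)); // prints [[0, 1], [2, 3]]
        System.out.println(assign(3, 3)); // prints [[0], [1], [2]]
    }
}
```

With 4 tasks and 2 executors each thread loops over two task instances; with 3 tasks and 3 executors the sketch reproduces the default of one task per executor.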
The overall topology processing flow is shown below:

4. Stream Groupings
Stream grouping is arguably the most important abstraction in Storm. It controls how the tasks of a spout or bolt distribute tuples, that is, which tasks of the destination bolt each emitted tuple is sent to.

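Storm currently supports the following stream grouping types:
Shuffle Grouping: random grouping; tuples are spread as evenly as possible across the downstream bolt's tasks.
With a shuffle grouping, the input from the spout is shuffled, that is, distributed at random among the tasks of the bolt. Shuffle grouping gives each task a roughly equal share of tuples.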
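One way to get both randomness and an even spread is to shuffle the list of target tasks once and then walk it round-robin. The sketch below takes that approach as an assumption; it is a stand-in written for this article, not Storm's source code, and ShuffleSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative only: a stand-in for shuffle grouping, not Storm's source code.
class ShuffleSketch {
    private final List<Integer> targets; // downstream task ids in random order
    private int position = 0;

    ShuffleSketch(List<Integer> targetTasks, long seed) {
        targets = new ArrayList<Integer>(targetTasks);
        Collections.shuffle(targets, new Random(seed));
    }

    // Returns the task id that should receive the next tuple.
    int chooseTask() {
        int task = targets.get(position);
        position = (position + 1) % targets.size();
        return task;
    }

    // Sends 'tuples' tuples and reports how many each task id (0..numTasks-1) received.
    static int[] distribute(int numTasks, int tuples, long seed) {
        List<Integer> ids = new ArrayList<Integer>();
        for (int i = 0; i < numTasks; i++) ids.add(i);
        ShuffleSketch grouping = new ShuffleSketch(ids, seed);
        int[] received = new int[numTasks];
        for (int i = 0; i < tuples; i++) received[grouping.chooseTask()]++;
        return received;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(distribute(4, 400, 42L))); // prints [100, 100, 100, 100]
    }
}
```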
Fields Grouping: grouping by field; the stream is partitioned by the value of the chosen field, and tuples with the same field value are sent to the same task.
This guarantee is critical for WordCount: if the same word did not always go to the same task, the resulting word counts would be wrong. A small example: "if the stream is grouped by the 'user-id' field, tuples with the same 'user-id' will always go to the same task".
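The "same value, same task" guarantee follows naturally from hashing the field value and taking it modulo the number of tasks. The sketch below assumes that hash-and-modulo scheme; FieldsGroupingSketch and chooseTask are hypothetical names, not part of Storm.

```java
// Illustrative only: hash-based routing in the spirit of fields grouping.
class FieldsGroupingSketch {

    // Maps a field value to one of numTasks task indexes (0..numTasks-1).
    static int chooseTask(Object fieldValue, int numTasks) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(fieldValue.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int first = chooseTask("user-42", 4);
        int second = chooseTask("user-42", 4);
        System.out.println(first == second); // prints true
    }
}
```

Because the target depends only on the field value, a word-count bolt sees every occurrence of a given word in one task and can keep its counter in local state.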
All grouping: broadcast.
Every tuple is replicated to every task of the bolt.
Global grouping: the whole stream goes to a single task of one bolt; this is used to implement transactional topologies.
All tuples in the stream are sent to the same bolt task, specifically the task with the lowest task id.
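The target selection for the all and global groupings described above is simple enough to state in a few lines. BroadcastAndGlobalSketch is a hypothetical helper written for this article, not Storm code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only: target selection for all grouping and global grouping.
class BroadcastAndGlobalSketch {

    // All grouping: every task of the downstream bolt receives a copy.
    static List<Integer> allGrouping(List<Integer> targetTasks) {
        return new ArrayList<Integer>(targetTasks);
    }

    // Global grouping: only the task with the lowest id receives the tuple.
    static List<Integer> globalGrouping(List<Integer> targetTasks) {
        return Collections.singletonList(Collections.min(targetTasks));
    }

    public static void main(String[] args) {
        List<Integer> tasks = Arrays.asList(7, 5, 6);
        System.out.println(allGrouping(tasks));    // prints [7, 5, 6]
        System.out.println(globalGrouping(tasks)); // prints [5]
    }
}
```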
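None grouping: no grouping.
Use this when you do not care how the stream is partitioned for load balancing. It is currently equivalent to shuffle grouping; in addition, Storm aims to run a bolt with a none grouping in the same thread as the upstream task that feeds it, where possible.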
Direct grouping: the emitter picks the target task.
Normally the receiving bolt decides which component's tuples it consumes. Direct grouping is a special case: the producer of a tuple decides which task of the consumer will receive it.
Direct grouping can only be declared on streams that have been declared as direct streams, and tuples on such a stream must be emitted with one of the emitDirect methods. A bolt can obtain the task ids of its consumers from the TopologyContext, or by keeping track of the return value of OutputCollector.emit, which returns the ids of the tasks the tuple was sent to.
Storm also provides an interface for user-defined stream groupings. If none of the built-in groupings meets your requirements, you can write your own by implementing the backtype.storm.grouping.CustomStreamGrouping interface, which declares the following method:
List<Integer> chooseTasks(int taskId, List<Object> values)
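As a sketch of what such an implementation might look like, the example below routes each tuple by a numeric id in its first value. To keep it compilable without Storm on the classpath it uses ToyStreamGrouping, a hypothetical stand-in interface with the same chooseTasks signature; the real interface's prepare method takes additional topology context arguments that are omitted here. ModuloGrouping is likewise a made-up name.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in so the example compiles without Storm on the classpath.
// The real CustomStreamGrouping.prepare also receives topology context arguments.
interface ToyStreamGrouping {
    void prepare(List<Integer> targetTasks);
    List<Integer> chooseTasks(int taskId, List<Object> values);
}

// Routes each tuple by its first value, assumed here to be a numeric id.
class ModuloGrouping implements ToyStreamGrouping {
    private List<Integer> targets;

    public void prepare(List<Integer> targetTasks) {
        this.targets = targetTasks;
    }

    public List<Integer> chooseTasks(int taskId, List<Object> values) {
        long id = ((Number) values.get(0)).longValue();
        // floorMod keeps the index non-negative for negative ids.
        int index = (int) Math.floorMod(id, (long) targets.size());
        return Collections.singletonList(targets.get(index));
    }

    public static void main(String[] args) {
        ModuloGrouping grouping = new ModuloGrouping();
        grouping.prepare(Arrays.asList(10, 11, 12));
        List<Object> tuple = Arrays.<Object>asList(7L, "payload");
        System.out.println(grouping.chooseTasks(0, tuple)); // prints [11]
    }
}
```

Returning a list rather than a single id is what lets a custom grouping send one tuple to several tasks if it needs to.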
Of the built-in stream groupings, Shuffle Grouping, Fields Grouping, and Direct Grouping are probably the most commonly used; the others serve more specific needs.
5. Reliability
(1) Spout reliability
A spout keeps track of the tuples it has emitted, so that it can re-emit a tuple when any downstream bolt fails to process it. To get reliable processing, each tuple emitted from the spout's nextTuple() must carry a unique ID, passed as an argument to the emit() method of SpoutOutputCollector:
collector.emit(new Values("value1","value2"), tupleID);
(Values extends ArrayList<Object>.)
While a tuple is being processed, every bolt that receives it must report success or failure upstream. Only when all bolts in the tuple tree have acknowledged does Storm call the spout's ack() method with that message ID, indicating that the message (the complete tuple tree) has been fully processed. If processing fails or times out, the spout's fail() method is called instead.
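(2) Bolt reliability
Reliable message processing in a bolt involves two steps:
a. When emitting a derived tuple, anchor it to the input tuple.
b. When it has finished handling a tuple, ack it or fail it.
Anchoring is done through an overload of emit() in OutputCollector that takes the input tuple as its first argument: collector.emit(tuple, new Values(word)); The bolt must then call this.collector.ack(tuple) once to acknowledge the input.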
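Why anchoring matters can be seen in a toy model of a single tuple tree: an anchored emit adds a node that must itself be acked before the root counts as fully processed. TupleTreeSketch is a hypothetical class written for this article; it tracks ids in a set for clarity and makes no claim about how Storm's own tracking is implemented.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: a toy tracker for one tuple tree, not Storm's acker.
class TupleTreeSketch {
    private final Set<String> unacked = new HashSet<String>();
    private boolean failed = false;

    TupleTreeSketch(String rootTupleId) {
        unacked.add(rootTupleId);
    }

    // A bolt emits a new tuple anchored to an input tuple: the tree grows.
    void emitAnchored(String inputTupleId, String newTupleId) {
        if (!unacked.contains(inputTupleId)) {
            throw new IllegalStateException("anchor must be an unacked tuple of this tree");
        }
        unacked.add(newTupleId);
    }

    void ack(String tupleId) { unacked.remove(tupleId); }

    void fail(String tupleId) { failed = true; }

    // The spout's ack() would fire only once this returns true.
    boolean fullyProcessed() { return !failed && unacked.isEmpty(); }

    boolean hasFailed() { return failed; }

    public static void main(String[] args) {
        TupleTreeSketch tree = new TupleTreeSketch("sentence-1");
        tree.emitAnchored("sentence-1", "word-1");
        tree.ack("sentence-1");
        System.out.println(tree.fullyProcessed()); // prints false
        tree.ack("word-1");
        System.out.println(tree.fullyProcessed()); // prints true
    }
}
```

Note the order in the usage: the bolt emits the anchored tuple before acking its input, which mirrors the emit-then-ack sequence in the text.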
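6. Comparing Storm and Spark as High-Performance Parallel Computing Engines
Spark is built on the idea that moving the computation to the data is more efficient than moving the data to the computation. Each node stores (or caches) its part of the data set, and tasks are submitted to the nodes. The input is read once, distributed across the machines, and intermediate results are kept in memory as RDDs. A batch job is broken into different stages based on the lazily evaluated RDDs, with the output of one stage serving as the input of the next. All of this happens in memory (spilling to disk when memory runs short). In other words, the process is shipped to the data. This closely resembles Hadoop map/reduce, except that Spark uses memory aggressively to avoid I/O, which makes iterative algorithms (where the output of one step is the input of the next) much faster.
Storm's architecture is the exact opposite. Storm is a distributed stream-processing engine: each node implements a basic computation, and data items flow in and out through a network of interconnected nodes. Unlike Spark, this ships the data to the process. Once submitted, a Storm job forms a topology that keeps pulling and processing data indefinitely; it does not stop until it is explicitly killed. A Spark job, by contrast, can be thought of as a one-off task that ends when its data has been processed (Spark Streaming is not one-off).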
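Both frameworks are used for parallel computation over large amounts of data.
Storm is better at dynamically processing large numbers of small data items as they are generated (for example, computing aggregates or analytics on the Twitter stream in real time).
Spark works on an existing data set (such as Hadoop data) that has already been loaded into the Spark cluster. Thanks to its in-memory management, Spark can scan the data quickly and minimize the overall I/O of iterative algorithms.
The Spark streaming module (Spark Streaming) is similar to Storm, since both are stream-processing engines, though they are not identical.
Spark Streaming first collects data into batches and then distributes those blocks (treating them as immutable data), whereas Storm processes and forwards each item as soon as it arrives.
It is unclear which approach has the edge in throughput, but Storm has lower latency.
To sum up, Spark and Storm follow opposite designs, and it is Spark Streaming that resembles Storm. Spark Streaming provides sliding windows out of the box, while with Storm you have to maintain the window yourself.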
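To give a sense of what maintaining the window yourself involves, here is a minimal time-based sliding window counter of the kind a bolt might keep as local state. SlidingWindowCounter is a hypothetical class, and the sketch assumes events arrive in non-decreasing timestamp order. Later Storm releases added built-in windowing support, so this applies mainly to the older API discussed in this article.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative only: a hand-maintained sliding window, e.g. as bolt state.
class SlidingWindowCounter {
    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<Long>();

    SlidingWindowCounter(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Records one event; timestamps are assumed to arrive in non-decreasing order.
    void record(long timestampMillis) {
        timestamps.addLast(timestampMillis);
    }

    // Counts recorded events newer than (nowMillis - windowMillis), dropping expired ones.
    int count(long nowMillis) {
        while (!timestamps.isEmpty() && timestamps.peekFirst() <= nowMillis - windowMillis) {
            timestamps.removeFirst(); // expired: slid out of the window
        }
        return timestamps.size();
    }

    public static void main(String[] args) {
        SlidingWindowCounter counter = new SlidingWindowCounter(1000);
        counter.record(0);
        counter.record(500);
        counter.record(900);
        System.out.println(counter.count(1000)); // prints 2
        System.out.println(counter.count(1500)); // prints 1
    }
}
```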