ÔÚµÚ¶þ´ÎÉϺ£´óÊý¾ÝÁ÷´¦Àí¾Û»áÉÏ£¬À´×Ô Intel£¬´óÖÚµãÆÀÓë Cloudera µÄ´óÊý¾Ý¹¤³ÌʦÓë´ó¼Ò·ÖÏíʱÏ´óÊý¾ÝÁ÷´¦Àí×î»ðÈȵϰÌâ¡£
ÕÅÌìÂ×£ºStorm over Gearpump
Intel´óÊý¾Ý¹¤³ÌʦÕÅÌìÂ×
Ñݽ²¿ªÊ¼Ç°£¬ÕÅÌìÂ×ͨ¹ý½éÉÜ×Ô¼ºµÄ¹¤×÷±í´ïÁ˶ÔstreamingµÄÐËȤ£¬²¢Ïò´ó¼ÒÍÆ¼ö×Ô¼ºËѼ¯ÔÚGitHubÉϵÄprojects£¬Ï£ÍûÓиü¶àÓÐÐËȤµÄÈËÒ»Æð½»Á÷¡£
Gearpump - Distributed Real-time Streaming Engine
Storm over Gearpump,¼´ÔÚGearpumpÉÏÌṩһ¸öStormµÄ͸Ã÷µÄ¼æÈݲ㣬Óû§¿ÉÒÔ²»¸ÄÒ»ÐдúÂ룬²»ÓÃÖØÐ¶¨ÒåËüµÄjar°ü,¾Í¿ÉÒÔ°ÑStorm ÔËÐе½GearpumpÉÏ¡£Storm ÊÇÒµ½çʹÓÃ×î¹ã·ºµÄÁ÷´¦ÀíÒýÇæ£¬µ«Ò²±©Â¶³öÁ˲»ÉÙ¾ÖÏÞÐÔ¡£ÕâЩ¾ÖÏÞÐÔÔÚ Intel ×îпªÔ´µÄÁ÷´¦Àíϵͳ Gearpump Öж¼µÃµ½ÁËÁ¼ºÃµÄ½â¾ö¡£ÎªÁËÈùã´ó Storm Óû§Áã³É±¾µØÌåÑéµ½ Gearpump µÄÓÅÁ¼ÌØÐÔ£¬Gearpump ʵÏÖÁË¶Ô Storm µÄ͸Ã÷¼æÈÝ£¬¼´Óû§ÎÞÐèÐ޸ĴúÂë£¬ÖØÐ±àÒ룬¾Í¿ÉÒÔÖ±½Ó½«¶þ½øÖưüÔËÐÐÔÚ Gearpump ÉÏ¡£
ͼ1»ùÓÚAkka/Actor Ä£Ð͵ÄGearpump²ã¼¶Í¼
Gearpump »ùÓÚ Akka ºÍ Actor Ä£Ð͹¹½¨ÁËÒ»¸ö¸ß¿É¿¿¸ßÐÔÄܵÄʵʱÁ÷´¦Àíϵͳ£¬Í¼1ΪGearpumpµÄÒ»¸öcluster£¬ËüÓÐÒ»¸ömasterºÍ¶à¸öworker£¬Ã¿¸öworker»á¹ÜÀí¸÷¸ö¼¯ÈºÉϸ÷¸ö½ÚµãµÄ×ÊÔ´¡£²ã¼¶»¯µÄºÃ´¦ÊÇϵͳһ²¿·Ö³öÎÊÌâʱ²»»áÓ°Ïìµ½ÆäËü²¿·Ö£¬ÔÚGearpumpÀһ¸ömaster¿ÉÒÔ¹ÜÀí¶à¸öworker,¶à¸öÓ¦ÓÃÖ®¼äÊÇÏ໥¸ôÀëµÄ£¬Ã¿¸öÓ¦ÓûáÓÐÒ»¸ö¶ÔÓ¦µÄappmaster£¬Ã¿¸öappmasterÏòmasterÉêÇë×ÊÔ´ºó¿ÉÒÔÔÚworkerÉϲ¿Êðexecuter,Ï൱ÓÚÒ»¸öJVM£¬¾ßÌåµÄÖ´Ðе¥ÔªÊÇTask£¬Ò»¸öTask¾ÍÊÇÒ»¸öActor¡£GearpumpµÄDynamic DAGÊÇ¿ÉÒÔÔÚÏßÐ޸ĵ쬶øÇÒ¼ÆËã¿ì£¬ÑÓʱµÍ¡£
Gearpump Updates
ͼ2¹ØÓÚ0.7.0°æ±¾µÄ²âÊÔ
ΪʲôҪ×öGearpumpÔÚStormÉϵļæÈÝÐÔÄØ£¿
StormÊÇ×îΪ¹ã·ºµÄÁ÷´¦Àíϵͳ£¬ÔÚʹÓùý³ÌÖÐÒ²·¢ÏÖÁËËüµÄһЩ¾ÖÏÞÐÔ£¬GearpumpµÄÉè¼ÆÖ®³õ¾ÍÊÇΪÁ˿˷þStormµÄ¾ÖÏÞÐÔ£¬Í¬Ê±Ï£Íû¹ã´óµÄStormÓû§¿ÉÒÔÎÞ´ú¼ÛµÄÏíÊܵ½GearpumpÕâЩÓÅÁ¼µÄÌØÐÔ£¬ËùÒÔҪʵÏÖÔÚGearpumpÉÏÄܹ»Í¸Ã÷µÄÖ§³ÖStormµÄÓ¦Óá£
Storm over Gearpump ¨C Features
GearpumpÏÖÔÚÖ§³Ö0.9°æ±¾µÄStorm£¬Ö§³Ömulti-lang£¬¾ÍÊÇStormÖ§³ÖһЩPython/Ruby/NodeµÄ½Å±¾£¬Ö§³ÖStormÀïDRPCµÄ¹¦ÄÜ£¬Ò²Ö§³ÖKafkaSpout / KafkaBolt£¬TridentÕⲿ·ÖµÄ¹¤×÷Ŀǰ»¹ÔÚ½øÐе±ÖС£
Similarities of Gearpump and Storm
Ê×ÏÈGearpumpºÍStorm¶¼ÊǶԵ¥ÌõÊý¾Ýµ¥¸öÏûÏ¢½øÐд¦Àí£¬Æä´ÎËüÃǵÄÓû§½Ó¿Ú¶¼ÊÇÏàËÆµÄ£¬StormÊÇͨ¹ýTopolgy£¬GearpumpÊÇͨ¹ýDAG¡£ÁíÍâTask½Ó¿ÚÒ²ÊÇÏàËÆµÄ¡£
ͼ3 StromºÍGearpumpµÄTask½Ó¿Úͼ
ͼ3×ó±ßΪStormµÄTask½Ó¿Ú£¬StormÀïÃæÓÐÁ½ÖÖ½ÇÉ«£¬SpoutºÍBolt£¬¶ÔÓÚSpoutµÄÉúÃüÖÜÆÚ»áÓÐÒ»¸ö³õʼµÄOpen·½Ê½£¬È»ºó²»Í£µÄÑ»·µ÷ÓÃnextToupleÏòÏÂÓη¢ÏûÏ¢£¬¶ÔÓÚBoltÀ´Ëµ¿ªÊ¼»áÓÐÒ»¸öprepare½×¶Î£¬Ã¿ÊÕµ½Ò»¸öÏûÏ¢Ëü»áµ÷ÓÃexecute¡£ÓÒ±ßÊÇGearpumpµÄTask½Ó¿Ú£¬Ò»¿ªÊ¼»áÓÐÒ»¸öonStart½×¶Î£¬Ã¿ÊÕµ½Ò»ÌõÏûÏ¢»áµ÷ÓÃonNext¡£ÕâÁ½ÕßÊǷdz£ÏàËÆµÄ£¬µ«ÊÇStromµÄTask»áÕ¼Óõ½Ò»¸öỊ̈߳¬GearpumpµÄTaskÊÇÒ»¸öActor£¬ÊDZÈÏ̸߳üСµÄÒ»¸öÖ´Ðе¥Ôª¡£
Storm over Gearpump ¨C Overview
ͼ4ʵÏָſöͼ
ͼ4Éϰ벿·Ö×óÓÒÁ½±ßµÄʵÏÖ¹ý³ÌºÜÏàËÆ¡£Storm NimbusÓëClientÊÇͨ¹ýÐÒé½øÐн»»¥µÄ£¬ÎÒÃÇ¿ÉÒÔÁé»îµÄÔÚGearpumpÀïʵÏÖNimbusÀ´×ñÊØStormµÄÐÒ飬¾ßÌå×ö·¨ÊÇÔÚStormClientÌá½»ÇëÇóʱÔÚ±¾µØÆðÒ»¸öGearpumpµÄNimbus£¬ÔÚGearpumpµÄNimbusÀï°ÑStorm Topology·Òë³ÉGearpump DAG£¬ÔÙ°ÑGearpump DAGÌá½»¸øGearpump master£¬ÕâÑù£¬Ò»¸öStormÓ¦ÓþͿÉÒÔÔËÐÐÔÚGearpump clusterÉÏÃæÁË¡£
ͼ5 DAG·Òë¹ý³Ì
ͼ5ÖУ¬Spout·Òë³ÉGearpumpµÄÂß¼½ÚµãProcessor£¬ËùÓеÄGrouperÇ£Òýµ½GearpumpÀïµÄpartitioner£¬Õû¸öͼʵÏÖÊÇÒ»Ò»¶ÔÓ¦µÄ¡£
ͼ6 ¾ßÌåTaskÖ´ÐÐͼ
ͼ6ÖÐÔÚGearpumpÀïÔËÐÐStormµÄSpoutºÍBolt£¬²¢ÇҰѶþÕßµÄÉúÃüÖÜÆÚ½áºÏÔÚÒ»Æð¡£´«ÊäSpoutÐèÒªGearpumpµÄ´«Êä²ã£¬ËùÒÔÿ´Îµ÷ÓÃonStartʱ£¬»á°ÑGearpumpµÄcollector×¢²áµ½SpoutÀͨ¹ýÕâÌõͨµÀ»á°ÑStormµÄÏûÏ¢´«µ½GearpumpÀïÃæ¡£
Storm over Gearpump - Flow Control
ͼ7 StromºÍGearpumpµÄÁ÷¿ØÖÆ
StormÊÇͨ¹ýAckerʵÏÖÁ÷¿ØÖÆ£¬µ±ÓÐÒ»¶¨ÊýÁ¿µÄÏûϢûÓб»Ackºó£¬StormÔÚSpout¶Ë¾Í²»»áÏòÏÂÓη¢ÏûÏ¢ÁË£¬ÔÚGearpumpÀïÃæ£¬Ã¿¸ôÒ»¶¨µÄÏûÏ¢ÊýÁ¿ÉÏÓÎTask»áÏòÏÂÓÎTask·¢Ò»¸öAckRequest£¬ÏÂÓÎÊÕµ½AckRequestºó»á¸øÉÏÓλØÒ»¸öAck£¬ÕâʱÔÚÒ»¸öTaskÀïÎÒÃÇά»¤ÁËÒ»¸ö»¬¶¯´°µÄ¸ÅÄͨ¹ýÕâÑùÒ»ÖÖ»úÖÆ£¬ÏÂÓεÄѹÁ¦»áÒ»²ã²ãµÄÏòÉÏÓδ«²¥£¬Ö±µ½´«µ½SpoutÕâÒ»²ã£¬Spout»áÍ£Ö¹»òÕß¼õÂýÏòÏÂÓη¢ÏûÏ¢¡£
Storm over Gearpump - At Least Once
ͼ8 At least onceÔÀíͼ
Stormm»á°ÑKafkaµÄOffset´æµ½ZookeeperÀÈç¹û³ö´íºó£¬Ëü¿ÉÒÔÖØÐ´ÓZookeeperÀïÁ˽⵽ÏûÏ¢·¢µ½ÄÄÀïÁË£¬ÔÚGearpumpÀïÒ²Ö§³ÖKafkaµÄAt least onceµÄÓïÒ⣬µ«ÊǺÍStrom×ö·¨²»Ò»Ñù£¬Í¼8ÖУ¬2ÖпªÊ¼·¢ÏûϢʱÊÇ´¦ÓÚpendingµÄλÖã¬3¡¢4·¢ÏûϢʱֱ½ÓAck£¬ÓÉÓÚ2ÖÐûÓÐAck£¬´ËʱStormµÄKafkaÀïOffsetûÓиüУ¬¼´Ê¹³öÎÊÌâʱÒÀÈ»»ØÀ´´Ó2¿ªÊ¼£¬ÕâÊÇΪÁ˱ÜÃâStormµÄSpoutÖÐÀÛ»ý´óÁ¿ÏûÏ¢¡£ËùÓÐÏûÏ¢ÔÚGearpumpÉ϶¼»á´òÉÏϵͳµÄʱ¼äÓ¡¼Ç£¬µ±ÏÂÓεÄTaskÊÕµ½ÏûÏ¢ºó»á»ã±¨¸øClock Service£¬Clock Service»áά»¤È«¾ÖµÄ×îСµÄAckʱ¼ä¡£
ÐÔÄÜ
ͼ9ÍÌÍÂÁ¿ÐÔÄÜͼ
ʵÏÖÕâÑùµÄ¼æÈݲãµÄÐÔÄÜÔõôÑùÄØ£¿ÎÒÃÇÔÚËĸö½ÚµãÉÏ×öÁ˲âÊÔ£¬ÓõÄÊÇ storm-benchmark ÖÐµÄ SOL£¬Õâ¸öÀý×ÓûÓÐÈκÎÓû§Âß¼£¬´¿´â²âÊÔµÄÊÇϵͳ¿ò¼Ü±¾ÉíµÄÐÔÄÜ£¬ÓÐ48¸öSpoutsºÍBolts£¬16¸öworkers¡£Í¼9ÖпÉÒÔ¿´³öGearpumpµÄÐÔÄÜÁ¼ºÃ¡£
ÏÂÒ»²½¹¤×÷
Ôö¼ÓÔÚ½çÃæÉÏÖ±½ÓÌá½»Storm Job£¬Ôö¼ÓStorm 0.10µÄÖ§³Ö£¬Ôö¼ÓAt least once¶Ô¸ü¶àSpoutsµÄÖ§³Ö£¬Ôö¼ÓTridentµÄÖ§³Ö¡£
Íõдº£º Storm ¼ÆËãÆ½Ì¨ÔÚ´óÖÚµãÆÀµÄʵ¼ù
´óÖÚµãÆÀÊý¾ÝÖÐÐÄ»ù´¡¼Ü¹¹×é¼Ü¹¹Ê¦Íõдº
µãÆÀµÄʵʱӦÓó¡¾°
Á÷Á¿¡¢½»Ò×Ïà¹ØµÄ Dashboard¡£°üÀ¨UV£¨Ã¿Ìì¶ÀÁ¢Óû§·ÃÎÊÊý£©Ïà¹ØµÄ¸÷¸öƽ̨£ºÖ÷APP£¨Android/iPhone/iPad£©¡¢ÍÅAPP¡¢Öܱ߿ì²é¡¢PC¡¢MÕ¾£»Ð¼¤»îÓû§Êý£»·Ö²ã´Î·ÖÆ·ÀàµÄʵʱ½»Ò×¶î¡£
¸öÐÔ»¯ËÑË÷ÓëÍÆ¼ö¡£Óû§ÔÚµãÆÀµÄÿһ²½ÓмÛÖµµÄ²Ù×÷£¨°üÀ¨£ºËÑË÷¡¢µã»÷¡¢ä¯ÀÀ¡¢¹ºÂò¡¢Êղصȣ©£¬¶¼½«ÊµÊ±¡¢ÖÇÄܵÄÓ°ÏìËÑË÷ÅÅÐò£¬´Ó¶øÏÔÖøÌáÉýÓû§ËÑË÷ÌåÑé¡¢ËÑË÷ת»¯ÂÊ¡£
¹ã¸æµÄµã»÷ÂÊ£¬¹ã¸æµÄ·´×÷±×¡¢ÊµÊ±¼Æ·Ñ£¬¼°·´ÅÀ³æ·´×÷±×µÄÒµÎñ°²È«¡£
µãÆÀµÄʵʱƽ̨
»ù±¾¼Ü¹¹
ͼ1»ù±¾¼Ü¹¹
µãÆÀµÄʵʱƽ̨¼Ü¹¹Ö÷Òª·Ö¼¸¿é£¬Ê×ÏÈÊÇÊý¾ÝÔ´£¬Êý¾ÝµÄÊäÈë°üÀ¨PCºÍAPPÉÏ´òµãµÄÊý¾Ý£¬´òµãÊý¾ÝÊÇÖ¸Óû§µÄä¯ÀÀÊý¾Ý£¬BlackholeÖ÷ÒªÊÇÖ§³ÖÈÕÖ¾ÀàµÄ£¬PUMAÖ÷ÒªÊÇ»ñÈ¡MySQLÏßÉÏÊý¾Ý¿âµÄ£¬SwallowÖ÷ÒªÊÇMQϵͳ¡£ÄÇôÔÚStormÉÏÄÜÄõ½ÄÄЩÊý¾ÝÄØ£¿¼¸ºõËùÓеãÆÀÏßÉϲúÉúµÄÊý¾Ý¶¼¿ÉÒÔÃë¼¶ÄÚÄõ½£¬·â×°¶ÔÓ¦µÄÊý¾ÝÊäÈëÔ´Spout£»Í¨¹ýBlackholeÖ§³ÖÈÕÖ¾Ààʵʱ»ñÈ¡£¬´òµãÈÕÖ¾/ÒµÎñLog/NginxÈÕÖ¾µÈ£»ÕûºÏPuma Client£¨MySQL Binlog£©£ºµÚһʱ¼ä»ñÈ¡Êý¾Ý¿âÊý¾Ý±ä¸ü£»ÕûºÏSwallow£¨MQ£©£º»ñȡӦÓÃÏûÏ¢ÕûºÏPigeon£¨RPC ¿ò¼Ü£©£ºÖ§³Öµ÷ÓõÚÈý·½ÒµÎñ¡£ÆäËûÒµÎñÈçºÎ»ñÈ¡Storm¼ÆËãµÄÊý¾ÝÊä³öÄØ£¿ÊµÊ±¼ÆËãºÍµÚÈý·½ÒµÎñ½âñͨ¹ýdata-service·þÎñ£¬Êý¾Ý¿ÉÒÔÊä³öRedis/HBase/MySQLµÈ´æ´¢ÖУ»Í¬Ñù£¬µÚÈý·½ÒµÎñͨ¹ýdata-service·þÎñ£¬»ñÈ¡Storm¼ÆËãµÄ³Ö¾Ã»¯Êý¾Ý¡£
ͼ2 Blackhole¼Ü¹¹
ÿ̨ÏßÉϵÄÓ¦Ó÷þÎñÆ÷¶¼ÊÇÓÐAgent²¿ÊðµÄ£¬ÓëKafka²»Ì«Ò»ÑùµÄÊÇSupervisor×öËùÓÐÐÅÏ¢µÄ¿ØÖÆ£¬´«ÊäÊý¾Ýʱͨ¹ýBroker£¬ÔÚBlackholeÉÏÒ»ÊÇ×öʵʱÊý¾ÝÏû·Ñ£¬¶þÊÇÀÈ¡µ½ÏßÉϺܶàÓ¦ÓÃÈÕÖ¾Ö±½Ó·Åµ½HDFSÀï¡£
¼¯Èº¼à¿Ø
ͼ3ij¸ö¼¯ÈºµÄ»ù±¾×´Ì¬
ÓÃgangliaËѼ¯Ò»Ð©Ö÷»ú²ã´ÎµÄÊý¾Ý£¬Ò²¾ÍÊǼ¯ÈºµÄ»ù±¾µÄ״̬£¬°üÀ¨Load¡¢ÄÚ´æ¡¢cpu¡¢ÍøÂçµÄ×ÜÌåÇé¿ö£¬ÕâÑùÔÚÅŲéÎÊÌâʱ¾Í¿ÉÒÔ¿´µ½Ö÷»ú²ã´ÎµÄ»ù±¾ÐÅÏ¢ÓÐûÓÐÒì³£¡£
ͼ4ij¸öworker»ù±¾ÐÅÏ¢
¾ßÌåµ½Ò»¸öworkerʱ£¬ÊǰÑͳһµÄÐÅÏ¢´òµ½µãÆÀÄÚ²¿µÄ¼à¿ØÏµÍ³CATÀïÃæ£¬ºÜ¶àÓ¦ÓÃÀàµÄ¼à¿ØÒ²»áÒÀÀµCAT¡£
ÒµÎñ¼à¿Ø
ͼ5ÕûÌåµÄTopologyά¶ÈµÄÏàÓ¦Êý¾Ý
°ÑËùÓеÄTopologyͨ¹ýNimbus APIºÍMetric API°ÑÊý¾Ý×¥³öÀ´£¬Êä³öµ½µãÆÀµÄCATÀïÃæ¡£¿ÉÒÔ¶ÔÿÕÅͼÅ䱨¾¯Öµ£¬ËüµÄµ±Ç°ÖµºÍ¼«ÏÞÖµ£¬±ä»¯ÂÊ£¬Á¬Ðø¼¸·ÖÖӵĵøÂä¶àÉÙ£¬×îСֵ×î´óÖµ£¬ÕâЩ¿ÉÒÔÅäÏàÓ¦µÄ±¨¾¯¹æÔò£¬³öÎÊÌâʱ¾Í»áÊÕµ½±¨¾¯£¬Ïñ΢ÐÅ¡¢Óʼþ¡¢¶ÌÐŵȡ£
ͼ6ÒµÎñ×Ô¼º´òµãÏàÓ¦Êý¾Ý
ÎÒÃÇ¿ÉÒÔ°Ñ×Ô¼º¹ØÐĵÄÒµÎñÊý¾Ý´òµ½CATÀïÃæÈ¥¿´£¬SpoutÀïBlackholeÏû·ÑµÄTopicÊý¾Ý£¬Ê§°ÜÊýÁ¿£¬TPSµÈ£¬ÕâÑùÒ²¿ÉÒÔÉèÖÃһЩ±¨¾¯¹æÔò¡£
µãÆÀʹÓà Storm ÒÔÀ´µÄ¾Ñé½ÌѵÓë½â¾ö·½°¸
1¡¢Ä³¸öWorker³ÔµôÁ˼¸ºõËùÓеÄCPU£¬ÆäËûTopologyÒ²ÔâÑ꣺TopologyÒ»¹²¾Í2¸öWorker²¢ÇÒBoltÀïÃæ×Ô¼ºÆô¶¯ÁË200¸öỊ̈߳¬½âÎöjson¡£
½â¾ö·½°¸£ºCGroupÏÞÖÆµ¥¸öWorkerµÄ×ÊÔ´¡£
2¡¢StormÎÞȨÏÞ¹ÜÀí£¬ÊÕµ½Ò»¶Ñ±¨¾¯£¬Owner²»ÖªµÀÊÇË¡£
3¡¢TopologyÌá½»µ½Storm£¬È´Æô¶¯²»ÆðÀ´£ºCause£ºfree slot < worker num¡£
½â¾ö·½°¸£º
import backtype.storm.nimbus.ITopologyValidator; public class DPTopologyValidator implements ITopologyValidator { //Topology NameºÏ·¨ÐÔ //Worker¡¢ExecutorÊýÁ¿ºÏÀíÐÔ //free slotÊýÁ¿±ØÐë´óÓÚµ¥Supervisor½ÚµãÊýÁ¿£¬±£Ö¤¼¯ÈºÕûÌå¿É¿¿ÐÔ } |
4¡¢Zookeeper´ÅÅ̱»Ë¢±¬ÁË£ºCasue£ºStorm¼¯ÈºÉϵÄTopology Task´óÁ¿ÐÄÌøÐÅÏ¢£¬zk²úÉúµÄÈÕÖ¾> 20G/H¡£
½â¾ö·½°¸£º
¼õÉÙZookeeperÈÕÖ¾±£´æµÄÊýÁ¿£»
¿ØÖƵ¥¼¯Èº¹æÄ££»
task.heartbeat.frequency.secs ĬÈÏ3s£¬Êʵ±Ôö´ó¡£
5¡¢Namenode±»Topology DDoSÁË
Cause£º
Hadoop¼¯Èº¿ªÆôÁËSecurity£»
Á÷Á¿ÈÕÖ¾ÒÀÀµstorm-hdfsдHDFS£¬ÒµÎñÖØ¹¹Âß¼£¬writeʧ°Üºó·´¸´ÖØÊÔ£¬Namenode £»RPC ³¬¹ý8000QPS¡¢¸ºÔعý¸ß£¬ËùÓÐÀëÏßJob¶¼Êܵ½Ó°Ïì¡£
½â¾ö·½°¸£º
ÌṩͳһµÄдHDFSµÄ·þÎñ£¬Ö»ÐèÒª°ÑÐèҪдÈëµÄÊý¾Ý·¢Ë͸øblackhole¡£
6¡¢Á÷Á¿ÉÏÀ´£¬Worker OOMÁË
Cause£ºStormĿǰÎÞbackpress»úÖÆ£¨JStorm 2.1.0ÐÂÔö£©
½â¾ö·½°¸£º
¿ªÆôACK£»
ÉèÖÃtopology.max.spout.pending¡£
ÎÞ²»ÒýÆðÁËС»ï°éÃǵļ«´ó¹²Ãù£¬ÕâÕæÊÇ ¡°ÄÇЩÄ꣬ÎÒÃÇÒ»Æð²È¹ýµÄ Storm ¿Ó°¡¡°¡£
´ËÍ⣬Ëû»¹·ÖÏíÁËһЩ Storm Topology ÓÅ»¯µÄС¼¼ÇÉ£¬¿ÉνÊǸɻõÊ®×ã¡£
ºóÐø¹æ»®
WorkerÈÕ־ͳһÊÕ¼¯ºÍÕ¹ÏÖ£¨doing£©£ºÏÖÓÐlog²é¿´±È½Ï²»·½±ã£¬Topology¡¢Worker¡¢Package¡¢Class¡¢LevelµÈ¶àά¶ÈͳһչÏÖ¡£
¹ÜÀíÆ½Ì¨¼¯³É¸ü¶à¼à¿ØÊý¾Ý£ºÖ§³ÖϸÁ£¶ÈµÄtracking£¬Storm/JStormͬʱ֧³Ö¡£
Storm on Docker £ºÔöÇ¿¸ôÀëÐÔ£¬Topology»·¾³¿ÉÒÔ¶ÀÁ¢¿ª¡£
³ÌºÆ£ºStreamingSQL on Spark
Intel ´óÊý¾Ý¹¤³Ìʦ³ÌºÆ
Ñݽ²Ç°£¬³ÌºÆ¼òÒª½éÉÜÁËËûÃÇÍŶÓÊÇ Spark¿ªÔ´ÉçÇøµÄ»îÔ¾¿ª·¢Õߣ¬Ëû´øÀ´µÄÊÇ Intel ¿ªÔ´ÏîÄ¿ StreamingSQL µÄµÚÒ»ÊÖ×ÊÁÏ¡£
ΪʲôÐèÒªStreamingSQL£¿
´ÓÓû§Ê¹ÓõIJãÃæÉϽ²£¬ÈçºÎΪÓû§Ìṩһ¸ö·Ç³£¼òÒײÙ×÷µÄÒ»Ì×·â×°£¬Èôó¼Ò¸üºÃµÄÈ¥²Ù×÷Streaming£¬¼ò»¯ÎÒÃǵIJÙ×÷¡£StreamingSQL µÄ³öÏÖʹµÃ¿ª·¢Õß¿ÉÒÔͨ¹ý SQL Î޷켯³ÉÁ÷´¦ÀíºÍÅú´¦ÀíÁ½ÖÖÔËË㣬´ó´ó½µµÍÁË¿ª·¢ºÍά»¤³É±¾¡£³ÌºÆ´Ó Spark Streaming ºÍ Spark SQL µÄ»ù±¾ÔÀí½²Æð£¬SparkStreaming¾ÍÊǰÑÖ¸¶¨µÄʱ¼äƬ¶ÎÄÚµÄÅúÁ¿µÄÊý¾Ý×éÖ¯³ÉÒ»¸öÃÔÄãµÄBatches£¬È»ºó·â×°³ÉÒ»¸öRDD£¬¿ÉÒÔ¶ÔRDD½øÐи÷ÖÖת»»²Ù×÷£¬×îºó°ÑСµÄÊý¾ÝÌá½»¸øSparkÖ´ÐÐÒýÇæÈ¥Ö´ÐУ¬ËüÊÇÒ»¸ö²»¶ÏµÄÏòǰµÄµü´úµÄ¹ý³Ì¡£
ͼ1StreamµÄwindow¸ÅÄî
ÿËѼ¯Ò»¸öµ¥Î»Ê±¼äÄÚµÄÊý¾Ý´ú±íÒ»¸öС¸ñ×Ó£¬ÎÒÃÇ¿ÉÒÔÓÐÈô¸É¸öµ¥Î»Ê±¼ä×é³ÉÒ»¸öwindow£¬ÎÒÃÇÏ£Íû¶ÔwindowÄÚµÄÊý¾Ý½øÐвÙ×÷£¬ÓÉÓÚÊÇÁ÷ʽ´¦Àí£¬Õû¸öwindow¿ÉÒÔ²»Í£µÄÏòÇ°ÒÆ£¬Í¼1Öж¨ÒåÒ»¸öwindow³¤¶È30S£¬Ã¿¸ô20SÒªÖ´ÐÐÒ»´ÎÏà¹ØµÄ²Ù×÷¡£
StreamingSQL µÄÉè¼ÆË¼Ïë
ͼ2 SparkSQLµÄÓÅ»¯£¨execution pipeline£©
ͼ2±íÏÖ³öÁ½¸öÒâ˼¡£µÚÒ»£¬ASTºÍDataFrame×÷Ϊǰ¶ËµÄÊäÈ룬ֻÐè¸æËßËüÎÒÃÇÒª¸Éʲô£¬ÖÁÓÚÔõôȥ×öÊÇÒýÇæ×Ô¼ºµÄÊÂÇ飬±ÈÈç˵½øÐÐÓïÒâ·ÖÎö£¬¶ÔÆä½øÐÐÓÅ»¯£¬ÓÃÒ»¸ö´ú¼Û×îµÍµÄ·½Ê½°ïÎÒÃÇÈ¥Íê³É¡£µÚ¶þ£¬Spark SQL×îºóµÄÖ´ÐÐÊǽ»¸øSpark¼¯ÈºÀ´×öµÄ£¬Õû¸öµÄÖ´ÐÐÒýÇæ×îºóµÄÊä³öÊÇRDD¡£
ͼ3 Spark SQL Spark Plan & RDD
ͼ3ΪSparkµÄÒ»¸öÎïÀí¼Æ»®£¬×ó²àÊ÷״ͼ°ÑËüÀí½âΪÁ½¸ö±í£¬ÓÒ±ßΪORC£¬×ó±ßΪParquet£¬¶ÔORC½øÐйýÂ˺óÁ½¸ö±íµÄÊý¾Ý½øÐÐJoin£¬È»ºó½øÐÐͶӰ£¬ÔÙ°ÑÊý¾Ýд»ØÎļþ¡£¶ÔӦͼ×óµÄSparkplanµÄRDDÈçͼÓҲ࣬ËüÓÐÒ»¸ö·Ç³£¼òµ¥µÄ¶ÔÓ¦¹ØÏµ¡£
ÔõÑùʵÏÖSpark Streaming SQL
ÈçºÎ³ÐÓÃSpark SQLºÍSteraming SQLµÄ×齨£¬¼´ÈçºÎʹÓÃÏÖÓеĴúÂëÒÔ¼°ÈçºÎ¼Ì³ÐÏÖÓеŦÄÜ
ͼ4 The Key Classes
ͼ4ÖеÚÒ»¸öΪWindowedPhysicalPlan£¬ËüÊÇSparkPlanµÄÒ»¸ö×ÓÀ࣬µ«windowDurationºÍslideDurationÊÇÓÃÀ´ÃèÊöWindowµÄ£¬Õâ¾Í°ÑSparkºÍStreamingµÄ¸ÅÄî½áºÏÆðÀ´ÁË£¬executeµÄ·½·¨×îºóÒª·µ»ØÒ»¸öRDD£¬ÔÚÕû¸öµÄPlanning Stage,Èç¹ûÏë°ÑSQLµ±ÖеĸÅÄîºÍÁ÷µÄ¸ÅÄî½áºÏÆðÀ´£¬¾ÍÊÇͨ¹ýWindowPhysicalPlanÕâ¸ö²Ù×÷Ëã×ÓÁ¬½ÓÆðÀ´µÄ¡£Spark StreamingÒª²Ù×÷µÄÊÇDStream£¬µÚ¶þ¸öSchemaDStream·â×°ÁËÒ»¸östreamSQLContext£¬computeµÄ·½·¨×îºóÒ²Òª·µ»ØÒ»¸öRDD¡£
ͼ5 Ìæ»»ºóµÄSpark SQL Spark Plan & RDD
¼ÙÉèͼ5ÖÐ×ó±ßµÄ±íµÄParquetÌæ»»³ÉStreamingµÄDatasource£¬¼òµ¥°ÑÎïÀí¼Æ»®µÄ½Úµã»»³ÉWindowPhysicalPlanºó£¬»áÉú³ÉÒ»¸öRDD£¬°ïÖúÎÒÃǰÑStreamingºÍSQL½áºÏÆðÀ´£¬È»ºó°ÑÓÒ±ßÉú³ÉµÄRDDµÄDAG°ü×°³ÉSchemaDStream£¬·â×°ÆðÀ´£¬ÓÚÊÇÎÒÃǼȿÉÒÔͨ¹ý×ó±ßµÄ·½Ê½´ïµ½SQLµÄת»»±ä³ÉÒ»¸öRDD£¬Í¨¹ýÓұߵķ½Ê½°ÑÄõ½µÄRDD±ä³ÉDStreamÌá½»¸øSpark StreamingÈ¥¹¤×÷£¬Õâ¾ÍÍê³ÉÁËSpark Streaming SQLµÄ¹ý³Ì¡£
ÈçºÎÈ¥´´ÔìÒ»¸ö»ùÓÚÁ÷ʽµÄDatasource£¿
Spark Streaming SQLÍêÈ«×ßµÄÊÇSpark SQL±ê×¼µÄDatasourceµÄ½Ó¿ÚȥʵÏÖ£¬Ö»²»¹ýÒªÌØ±ð½¨Á¢Ò»¸ökafkaµÄdatasource¡£
ÈçºÎÔÚSQLÖж¨ÒåWindow
ÎÒÃǶÔSQL×öÁËÒ»µãµãÀ©Õ¹£¬¾ÍÊÇover£¬¹ØÓÚʱ¼äÐòÁе͍Ò塣ȥµôoverºóÓëSQLһģһÑù¡£
ͼ6 Spark Streaming V.S. Streaming SQL
Á½Õ߶Աȷ¢ÏÖ£¬Spark StreamingµÄ¿É¶ÁÐԽϲ¶øStreaming SQLһĿÁËÈ»¡£
¸ü¶àºÃ´¦
ͼ7 Spark Streaming SQL½á¹¹Í¼
ͼ7ÖеײãSpark Core£¬ÉÏÃæµÄDataFramesÅԱ߲åÈëStreaming SQL,ËüÊÇ»ùÓÚSpark SQLºÍSpark StreamingÀ´Íê³ÉÕâÑùµÄ×齨£¬ÍêÈ«¼æÈÝSpark SQL,Èç¹ûSQL»òStreamingÓÐʲô¸Ä½øÊ±£¬»á×Ô¶¯»ñµÃÕâЩºÃ´¦£¬²»ÐèÒª×ö¶îÍâµÄÐ޸쬶øÇÒÁ÷ʽÊý¾Ý¿ÉÒԺ;²Ì¬±í½øÐн»»¥²Ù×÷£¬´úÂë¼òµ¥¡£
ͼ8 Spark Streaming SQLʹÓð¸Àý
×îºó·ÖÏíÁËʹÓà StreamingSQL ´ÓÉãÏñÍ·Êý¾ÝÖÐʵʱ·ÖÎöºÍÔ¤´¦Àí£¬ÓëÀúÊ·Êý¾ÝºÍ·¸×ïÏÓÒÉÈËÊý¾Ý½øÐÐÆ¥Å䣬ץ²¶ÏÓÒÉ·¸µÄʹÓð¸Àý¡£StreamingSQL µ±Ç°»ùÓÚ Spark 1.4.1 °æ±¾£¬¶ø Spark ÉçÇøÀïÕýÔÚ´òÔìÁíÒ»Ì×ÈÚºÏÅúÁ¿Êý¾ÝºÍÁ÷ʽÊý¾ÝµÄ·½°¸£¬ Streaming DataFrames¡£³ÌºÆÈÏΪδÀ´ Spark ÖеÄÅú´¦ÀíºÍÁ÷´¦ÀíÁ½ÖÖÔËË㽫»áºÏ¶þΪһ¡£
Todd Lipcon£º Fast Analytics on fast data
ClouderaÃ÷Ðǹ¤³ÌʦTodd Lipcon
Todd ÊÇ Hadoop ÉçÇøµÄÉñÒ»ÑùµÄ¹¤³Ìʦ£¬Hadoop ºÍ HBase µÄ PMC ºÍ Committer£¬°üÀ¨highly-available metadata journaling (QJM) and automatic failover for HDFS¡£ ×Ô2012ÄêÆð£¬¿ªÊ¼ÔÚClouderaÁìµ¼KuduÏîÄ¿¡£
µ±Ç°µÄHadoopÉú̬ϵͳ
Ñݽ²ÖУ¬Todd Ê×ÏÈÖ¸³öÁ˵±½ñ Hadoop ´æ´¢ÏµÍ³ÖдæÔÚµÄÎÊÌâ¡£HDFS (Parquet) ÊʺÏ×ö´óÁ¿Êý¾ÝµÄÀëÏß·ÖÎö£¬HBase ÊʺÏ×÷ÔÚÏßÊý¾ÝµÄËæ»ú·ÃÎÊ£¬Ã»ÓÐÒ»¸öϵͳ¼æ¾ßÁ½ÕßµÄÓÅÊÆ¡£Ëæ×ų¬¿ìËÙ³¬´óÈÝÁ¿ÄÚ´æÉ豸µÄ³öÏÖ£¬´æ´¢ÏµÍ³µÄÆ¿¾±½«×ªÏò CPU£¬¶øµ±Ç°µÄ´æ´¢ÏµÍ³ÔÚÉè¼ÆÖ®³õ²¢Î´¿¼Âǵ½ CPU µÄЧÂÊ¡£
KuduÉè¼ÆÄ¿±ê
´óɨÃè¸ßͨÁ¿£¬¶ÌÆÚ·ÃÎʵĵÍÑÓ³Ù£¬ÀàÊý¾Ý¿âµÄÓïÒ壬¹ØÏµÊý¾ÝÄ£Ð͵ȡ£Kudu ¾ÍÊÇΪÁ˽â¾öÕâЩÎÊÌâ¶øµ®ÉúµÄ£¬ËüÄÜͬʱÂú×ã OLAP ºÍ OLTP Á½ÖÖÐèÇó¡£ÔÚʹÓÃÉÏ£¬Kudu µÄ±íÏñ´«Í³Êý¾Ý¿â£¬ÓÐÖ÷¼ü£¬²»ÄÜÎÞÏÞÖÆÌí¼ÓÁУ»Í¬Ê± Kudu ÌṩÁË NoSQL ·ç¸ñµÄÓû§½Ó¿Ú£»´ËÍ⣬Kudu ʵÏÖÁËÓë MapReduce£¬Spark Óë Impala µÄ¼¯³É¡£ÔÚÐÔÄÜÉÏ£¬Kudu ¿ÉÒÔÀ©Õ¹µ½ÉÏǧ¸ö½Úµã£¬´æ´¢ PB ¼¶Êý¾Ý£¬Ã¿Ãë´¦Àí°ÙÍò´Î¶Áд¡£
Kudu ×îÊʺÏÓÚ˳Ðò¶ÁдºÍËæ»ú¶Áд»ìºÏµÄÓ¦Óã¬ÕâÒ»µã Todd ÒÔСÃ×µÄʹÓ󡾰ΪÀý¡£½èÖú Kudu£¬ СÃ×¼ò»¯ÁË´óÊý¾Ý·ÖÎöƽ̨£¬ÔÊý¾ÝÎÞÐèͨ¹ýÆäËü×é¼þ¼´¿ÉÖ±½Óµ¼Èë Kudu ½øÐзÖÎö¡£Ã¿ÌìÓг¬¹ý 50 ÒÚÌõ¼Ç¼дÈë Kudu£¬ÑÓʱ´ÓСʱ£¨Ì죩¼¶Ï½µµ½Ãë¼¶¡£
KuduµÄʹÓð¸Àý
KuduÊÇËæ»ú¶ÁÈ¡ºÍдÈëͬʱ½áºÏ¡£
ʱ¼äÐòÁÐʾÀý£ºÁ÷Êг¡Êý¾Ý;ÆÛÕ©¼ì²âºÍÔ¤·À;·çÏÕ¼à¿Ø¡£
¹¤×÷¸ºÔØ£º²åÈ룬¸üУ¬É¨Ã裬²éÕÒ¡£
»úÆ÷Êý¾Ý·ÖÎöÀýÈç£ºÍøÂçÍþв¼ì²â¡£
¹¤×÷¸ºÔØ£º²åÈ룬ɨÃ裬²éÕÒ¡£
ÔÚÏß±¨¸æÊ¾Àý£ºODS¡£
¹¤×÷¸ºÔØ£º²åÈ룬¸üУ¬É¨Ã裬²éÕÒ¡£
ʲôÊÇKudu
Todd Ïêϸ½âÊÍÁË Kudu µÄ¼Ü¹¹Éè¼Æ¡£Kudu ÖÐ±í±»Ë®Æ½·Ö¸îΪ¶à¸ö Tablets£¬Êý¾Ý´æÔÚ·þÎñÆ÷µÄ±¾µØÓ²ÅÌÉÏ¡£Ã¿¸ö Tablet Óжà¸ö±¸·Ý£¬±¸·ÝÖ®¼äͨ¹ý Raft ÐÒé±£³ÖÒ»ÖÂÐÔ¡£master ¸ºÔð¹ÜÀí±íµÄÔªÊý¾Ý£¬ÎªÁËÌá¸ßÐÔÄÜ£¬ÔªÊý¾Ýͬʱ»º´æÔÚ¿Í»§¶ËµÄÄÚ´æÖС£Kudu ²ÉÓÃÁËÁд洢£¬Õâ¼ÈÄÜѹËõ½ÚÊ¡¿Õ¼äÌá¸ßÍÌÍÂÁ¿£¬Í¬Ê±¶ÔÓÚ¸ßÑ¡ÔñÐÔµÄÇëÇó·Ç³£¸ßЧ¡£
ÔÚ TPC-H µÄ´ó²¿·Ö²âÊÔÉÏ£¬Kudu ¶¼±È Parquet ÓиüºÃµÄÐÔÄÜ£¬ÕâÒ²ÔÚСÃ×µÄÕæÊµÒµÎñ²âÆÀÖеõ½ÑéÖ¤¡£´ËÍ⣬Kudu ÕýÔÚ³ÉΪ Apache Incubator ÏîÄ¿¡£
|