±¾ÎÄͨ¹ý¶Ô
ELK Stack¡¢Kafka¡¢Spark Streaming ÕûºÏ·½°¸µÄ½éÉÜÃèÊöÁËϵͳƽ̨ÈÕÖ¾´¦ÀíÁ÷³Ì£¬Ï£Íû¶ÔϵͳÔËά¹¤³Ìʦ¡¢Êý¾Ý¿â¹¤³ÌʦÔÚʵÏÖÊý¾Ýƽ̨¼¯ÖÐʽÔËά¹¤×÷ÖÐÓÐËù°ïÖú£¬¶ÁÕß»¹¿ÉÒԲο¼ÎÄÕÂĩβ¸ø³öµÄ»¥ÁªÍø¹«Ë¾ÔÚϵͳÔËά¡¢ÈÕÖ¾´¦Àí¹¤×÷ÖеÄʵ¼ù¾Ñ飬½áºÏ×ÔÉíÇé¿öÉè¼Æ×Ô¼ºµÄ·½°¸¡£
¸ÅÊö
´óÊý¾Ýʱ´ú£¬Ëæ×ÅÊý¾ÝÁ¿²»¶ÏÔö³¤£¬´æ´¢Óë¼ÆË㼯ȺµÄ¹æÄ£Ò²Öð½¥À©´ó£¬¼¸°ÙÉÏǧ̨µÄÔÆ¼ÆËã»·¾³ÒѲ»Ïʼû¡£ÏÖÔڵļ¯ÈºËùÐèÒª½â¾öµÄÎÊÌâ²»½ö½öÊǸßÐÔÄÜ¡¢¸ß¿É¿¿ÐÔ¡¢¸ß¿ÉÀ©Õ¹ÐÔ£¬»¹ÐèÒªÃæ¶ÔÒ×ά»¤ÐÔÒÔ¼°Êý¾Ýƽ̨ÄÚ²¿µÄÊý¾Ý¹²ÏíÐÔµÈÖî¶àÌôÕ½¡£ÓÅÐãµÄϵͳÔËάƽ̨¼ÈÄÜʵÏÖÊý¾Ýƽ̨¸÷×é¼þµÄ¼¯ÖÐʽ¹ÜÀí¡¢·½±ãϵͳÔËάÈËÔ±ÈÕ³£¼à²â¡¢ÌáÉýÔËάЧÂÊ£¬ÓÖÄÜ·´À¡ÏµÍ³ÔËÐÐ״̬¸øÏµÍ³¿ª·¢ÈËÔ±¡£ÀýÈç²É¼¯Êý¾Ý²Ö¿âµÄÈÕÖ¾¿ÉÒÔ°´ÕÕʱ¼äÐòÁв鿴¸÷Êý¾Ý¿âʵÀý¸÷ÖÖ¼¶±ðµÄÈÕÖ¾ÊýÁ¿ÓëÕ¼±È£¬²É¼¯
DB2 ±í¿Õ¼äÊý¾Ý·ÖÎö¿ÉµÃµ½Êý¾Ý¿â¼¯Èº½¡¿µ×´Ì¬£¬·ÖÎöÓ¦Ó÷þÎñÆ÷µÄÈÕÖ¾¿ÉÒԲ鿴³ö´í×î¶àµÄÄ£¿é¡¢ÏÂÔØ×î¶àµÄÎļþ¡¢Ê¹ÓÃ×î¶àµÄ¹¦Äܵȡ£´óÊý¾Ýʱ´úµÄÒµÎñÓëÔËά½«½ôÃܵĽáºÏÔÚÒ»Æð¡£
ÈÕÖ¾
1. ʲôÊÇÈÕÖ¾
ÈÕÖ¾ÊÇ´øÊ±¼ä´ÁµÄ»ùÓÚʱ¼äÐòÁеĻúÆ÷Êý¾Ý£¬°üÀ¨ IT ϵͳÐÅÏ¢£¨·þÎñÆ÷¡¢ÍøÂçÉ豸¡¢²Ù×÷ϵͳ¡¢Ó¦ÓÃÈí¼þ£©¡¢ÎïÁªÍø¸÷ÖÖ´«¸ÐÆ÷ÐÅÏ¢¡£ÈÕÖ¾¿ÉÒÔ·´Ó³Óû§Êµ¼ÊÐÐΪ£¬ÊÇÕæÊµµÄÊý¾Ý¡£
2. ÈÕÖ¾´¦Àí·½°¸Ñݽø

ͼ 1. ÈÕÖ¾´¦Àí·½°¸¾ÀúµÄ°æ±¾µü´ú
ÈÕÖ¾´¦Àí v1.0£ºÈÕ־ûÓм¯ÖÐʽ´¦Àí£»Ö»×öʺó×·²é£¬ºÚ¿ÍÈëÇÖºóɾ³ýÈÕÖ¾ÎÞ·¨²ì¾õ£»Ê¹ÓÃÊý¾Ý¿â´æ´¢ÈÕÖ¾£¬ÎÞ·¨Ê¤Èθ´ÔÓÊÂÎñ´¦Àí¡£
ÈÕÖ¾´¦Àí v2.0£ºÊ¹Óà Hadoop ƽ̨ʵÏÖÈÕÖ¾ÀëÏßÅú´¦Àí£¬È±µãÊÇʵʱÐԲʹÓÃ
Storm Á÷´¦Àí¿ò¼Ü¡¢Spark ÄÚ´æ¼ÆËã¿ò¼Ü´¦ÀíÈÕÖ¾£¬µ« Hadoop/Storm/Spark ¶¼ÊDZà³Ì¿ò¼Ü£¬²¢²»ÊÇÄÃÀ´¼´ÓÃµÄÆ½Ì¨¡£
ÈÕÖ¾´¦Àí v3.0£ºÊ¹ÓÃÈÕ־ʵʱËÑË÷ÒýÇæ·ÖÎöÈÕÖ¾£¬Ìص㣺µÚÒ»Êǿ죬ÈÕÖ¾´Ó²úÉúµ½ËÑË÷·ÖÎö³ö½á¹ûÖ»ÓÐÊýÃëÑÓʱ£»µÚ¶þÊÇ´ó£¬Ã¿Ìì´¦Àí
TB ÈÕÖ¾Á¿£»µÚÈýÊÇÁé»î£¬¿ÉËÑË÷·ÖÎöÈκÎÈÕÖ¾¡£×÷Ϊ´ú±íµÄ½â¾ö·½°¸ÓÐ Splunk¡¢ELK¡¢SILK¡£

ͼ 2. Éî¶ÈÕûºÏ ELK¡¢Spark¡¢Hadoop
¹¹½¨ÈÕÖ¾·ÖÎöϵͳ
ELK Stack
ELK Stack ÊÇ¿ªÔ´ÈÕÖ¾´¦ÀíÆ½Ì¨½â¾ö·½°¸£¬±³ºóµÄÉÌÒµ¹«Ë¾ÊÇ Elastic(https://www.elastic.co/)¡£ËüÓÉÈÕÖ¾²É¼¯½âÎö¹¤¾ß
Logstash¡¢»ùÓÚ Lucene µÄÈ«ÎÄËÑË÷ÒýÇæ Elasticsearch¡¢·ÖÎö¿ÉÊÓ»¯Æ½Ì¨ Kibana
×é³É¡£Ä¿Ç° ELK µÄÓû§ÓÐ Adobe¡¢Microsoft¡¢Mozilla¡¢Facebook¡¢Stackoverflow¡¢Cisco¡¢ebay¡¢Uber
µÈÖî¶àÖªÃû³§ÉÌ¡£
1. Logstash
Logstash ÊÇÒ»ÖÖ¹¦ÄÜÇ¿´óµÄÐÅÏ¢²É¼¯¹¤¾ß£¬ÀàËÆÓÚ Hadoop Éú̬ȦÀïµÄ
Flume¡£Í¨³£ÔÚÆäÅäÖÃÎļþ¹æ¶¨ Logstash ÈçºÎ´¦Àí¸÷ÖÖÀàÐ͵ÄʼþÁ÷£¬Ò»°ã°üº¬ input¡¢filter¡¢output
Èý¸ö²¿·Ö¡£Logstash Ϊ¸÷¸ö²¿·ÖÌṩÏàÓ¦µÄ²å¼þ£¬Òò¶øÓÐ input¡¢filter¡¢output ÈýÀà²å¼þÍê³É¸÷ÖÖ´¦ÀíºÍת»»£»ÁíÍâ
codec ÀàµÄ²å¼þ¿ÉÒÔ·ÅÔÚ input ºÍ output ²¿·Öͨ¹ý¼òµ¥±àÂëÀ´¼ò»¯´¦Àí¹ý³Ì¡£ÏÂÃæÒÔ DB2
µÄÒ»ÌõÈÕ־ΪÀý¡£

ͼ 3.DB2 Êý¾Ý¿â²úÉúµÄ°ë½á¹¹»¯ÈÕÖ¾ÑùÀý
ÕâÊÇÒ»ÖÖ¶àÐеÄÈÕÖ¾£¬Ã¿Ò»ÌõÈÕÖ¾ÒÔ£º¡°2014-10-19-12.19.46.033037-300¡±¸ñʽµÄʱ¼ä´ÁΪÆðʼ±êÖ¾¡£¿ÉÒÔÔÚ
input ²¿·ÖÒýÈë codec ²å¼þ multiline£¬À´½«Ò»ÌõÈÕÖ¾µÄ¶àÐÐÎı¾·â×°µ½Ò»ÌõÏûÏ¢£¨message£©ÖС£
input { file { path => "path/to/filename" codec => multiline { pattern => "^\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6}[\+-]\d{3}" negate => true what => previous } } } |
ʹÓà file ²å¼þµ¼ÈëÎļþÐÎʽµÄÈÕÖ¾£¬¶øÇ¶ÈëµÄ codec ²å¼þ multiline
µÄ²ÎÊý pattern ¾Í¹æ¶¨ÁË·Ö¸îÈÕÖ¾ÌõÄ¿µÄʱ¼ä´Á¸ñʽ¡£ÔÚ DataStage ¶àÐÐÈÕÖ¾µÄʵ¼ÊÓ¦ÓÃÖУ¬ÓÐʱһÌõÈÕÖ¾»á³¬¹ý
500 ÐУ¬Õⳬ³öÁË multiline ×é¼þĬÈϵÄʼþ·â×°µÄ×î´óÐÐÊý£¬ÕâÐèÒªÎÒÃÇÔÚ multiline
ÖÐÉèÖÃ max_lines ÊôÐÔ¡£
¾¹ý input ²¿·Ö¶ÁÈëÔ¤´¦ÀíºóµÄÊý¾ÝÁ÷Èë filter ²¿·Ö£¬ÆäʹÓÃ
grok¡¢mutate µÈ²å¼þÀ´¹ýÂËÎı¾ºÍÆ¥Åä×ֶΣ¬²¢ÇÒÎÒÃÇ×Ô¼º¿ÉÒÔΪʼþÁ÷Ìí¼Ó¶îÍâµÄ×Ö¶ÎÐÅÏ¢£º
filter { mutate{ gsub => ['message', "\n", " "] } grok { match => { "message" => "(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}-%{HOUR}\.%{MINUTE}\.%{SECOND})
%{INT:timezone}(?:%{SPACE}%{WORD:recordid}%{SPACE})(?:LEVEL%{SPACE}:%{SPACE}
%{DATA:level}%{SPACE})(?:PID%{SPACE}:%{SPACE}%{INT:processid}%{SPACE})(?:TID%{SPACE}:%{SPACE}
%{INT:threadid}%{SPACE})(?:PROC%{SPACE}:%{SPACE}%{DATA:process}%{SPACE})?(?:INSTANCE%{SPACE}:
%{SPACE}%{WORD:instance}%{SPACE})?(?:NODE%{SPACE}:%{SPACE}%{WORD:node}%{SPACE})?(?:DB%{SPACE}
:%{SPACE}%{WORD:dbname}%{SPACE})?(?:APPHDL%{SPACE}:%{SPACE}
%{NOTSPACE:apphdl}%{SPACE})?(?:APPID%{SPACE}:%{SPACE}%{NOTSPACE:appid}
%{SPACE})?(?:AUTHID%{SPACE}:%{SPACE}%{WORD:authid}%{SPACE})?(?:HOSTNAME%{SPACE}:
%{SPACE}%{HOSTNAME:hostname}%{SPACE})?(?:EDUID%{SPACE}:%{SPACE}%{INT:eduid}
%{SPACE})?(?:EDUNAME%{SPACE}:%{SPACE}%{DATA:eduname}%{SPACE})?(?:FUNCTION%{SPACE}:
%{SPACE}%{DATA:function}%{SPACE})(?:probe:%{SPACE}%{INT:probe}%{SPACE})%{GREEDYDATA:functionlog}" } } date { match => [ "timestamp", "YYYY-MM-dd-HH.mm.ss.SSSSSS" ] } } |
Ç°Ãæ input ²¿·ÖµÄ multiline ²å¼þ½«Ò»Ìõ¶àÐÐÈÕÖ¾Ïîת»¯ÎªÒ»ÐУ¬²¢ÒÔ¡°\n¡±Ìæ´úʵ¼ÊµÄ»»Ðзû¡£ÎªÁ˱ãÓÚºóÃæ´¦Àí£¬ÕâÀïµÄ
mutate ²å¼þ¾ÍÊǽ«ÕâЩ¡°\n¡±Ì滻Ϊ¿Õ¸ñ¡£¶ø grok ²å¼þÓÃÓÚÆ¥ÅäÌáÈ¡ÈÕÖ¾ÏîÖÐÓÐÒâÒåµÄ×Ö¶ÎÐÅÏ¢¡£×îºóµÄ
date ²å¼þÔòÊǰ´¸ñʽ¡°YYYY-MM-dd-HH.mm.ss.SSSSSS¡±½âÎöÌáÈ¡µÄʱ¼ä´Á×ֶΣ¬²¢¸³¸øÏµÍ³Ä¬ÈϵÄʱ¼ä´Á×ֶΡ°@timestamp¡±¡£Output
²å¼þÓÃÓÚÖ¸¶¨Ê¼þÁ÷µÄÈ¥Ïò£¬¿ÉÒÔÊÇÏûÏ¢¶ÓÁС¢È«ÎÄËÑË÷ÒýÇæ¡¢TCP Socket¡¢Email µÈ¼¸Ê®ÖÖÄ¿±ê¶Ë¡£
2. Elasticsearch
Elasticsearch ÊÇ»ùÓÚ Lucene µÄ½üʵʱËÑË÷ƽ̨£¬ËüÄÜÔÚÒ»ÃëÄÚ·µ»ØÄãÒª²éÕÒµÄÇÒÒѾÔÚ
Elasticsearch ×öÁËË÷ÒýµÄÎĵµ¡£ËüĬÈÏ»ùÓÚ Gossip ·ÓÉËã·¨µÄ×Ô¶¯·¢ÏÖ»úÖÆ¹¹½¨ÅäÖÃÓÐÏàͬ
cluster name µÄ¼¯Èº£¬µ«ÊÇÓеÄʱºòÕâÖÖ»úÖÆ²¢²»¿É¿¿£¬»á·¢ÉúÄÔÁÑÏÖÏó¡£¼øÓÚÖ÷¶¯·¢ÏÖ»úÖÆµÄ²»Îȶ¨ÐÔ£¬Óû§¿ÉÒÔÑ¡ÔñÔÚÿһ¸ö½ÚµãÉÏÅäÖü¯ÈºÆäËû½ÚµãµÄÖ÷»úÃû£¬ÔÚÆô¶¯¼¯ÈºÊ±½øÐб»¶¯·¢ÏÖ¡£
Elasticsearch ÖÐµÄ Index ÊÇÒ»×é¾ßÓÐÏàËÆÌØÕ÷µÄÎĵµ¼¯ºÏ£¬ÀàËÆÓÚ¹ØÏµÊý¾Ý¿âÄ£ÐÍÖеÄÊý¾Ý¿âʵÀý£¬Index
ÖпÉÒÔÖ¸¶¨ Type Çø·Ö²»Í¬µÄÎĵµ£¬ÀàËÆÓÚÊý¾Ý¿âʵÀýÖеĹØÏµ±í£¬Document ÊÇ´æ´¢µÄ»ù±¾µ¥Î»£¬¶¼ÊÇ
JSON ¸ñʽ£¬ÀàËÆÓÚ¹ØÏµ±íÖÐÐм¶¶ÔÏó¡£ÎÒÃÇ´¦ÀíºóµÄ JSON Îĵµ¸ñʽµÄÈÕÖ¾¶¼ÒªÔÚ Elasticsearch
ÖÐ×öË÷Òý£¬ÏàÓ¦µÄ Logstash ÓÐ Elasticsearch output ²å¼þ£¬¶ÔÓÚÓû§ÊÇ͸Ã÷µÄ¡£
Hadoop Éú̬ȦΪ´ó¹æÄ£Êý¾Ý¼¯µÄ´¦ÀíÌṩ¶àÖÖ·ÖÎö¹¦ÄÜ£¬µ«ÊµÊ±ËÑË÷Ò»Ö±ÊÇ
Hadoop µÄÈíÀß¡£Èç½ñ£¬Elasticsearch for Apache Hadoop£¨ES-Hadoop£©ÃÖ²¹ÁËÕâһȱÏÝ£¬ÎªÓû§ÕûºÏÁË
Hadoop µÄ´óÊý¾Ý·ÖÎöÄÜÁ¦ÒÔ¼° Elasticsearch µÄʵʱËÑË÷ÄÜÁ¦.

ͼ 4. Ó¦Óà es-hadoop ÕûºÏ
Hadoop Ecosystem Óë Elasticsearch ¼Ü¹¹Í¼£¨https://www.elastic.co/products/hadoop£©
3. Kibana
Kibana ÊÇרÃÅÉè¼ÆÓÃÀ´Óë Elasticsearch Ð×÷µÄ£¬¿ÉÒÔ×Ô¶¨Òå¶àÖÖ±í¸ñ¡¢Öù״ͼ¡¢±ý״ͼ¡¢ÕÛÏßͼ¶Ô´æ´¢ÔÚ
Elasticsearch ÖеÄÊý¾Ý½øÐÐÉîÈëÍÚ¾ò·ÖÎöÓë¿ÉÊÓ»¯¡£ÏÂͼ¶¨ÖƵÄÒDZíÅÌ¿ÉÒÔ¶¯Ì¬¼à²âÊý¾Ý¿â¼¯ÈºÖÐÿ¸öÊý¾Ý¿âʵÀý²úÉúµÄ¸÷ÖÖ¼¶±ðµÄÈÕÖ¾¡£

ͼ 5. ʵʱ¼à²â DB2 ʵÀýÔËÐÐ״̬µÄ¶¯Ì¬ÒDZíÅÌ
Kafka
Kafka ÊÇ LinkedIn ¿ªÔ´µÄ·Ö²¼Ê½ÏûÏ¢¶ÓÁУ¬Ëü²ÉÓÃÁ˶ÀÌØµÄÏû·ÑÕß-Éú²úÕ߼ܹ¹ÊµÏÖÊý¾Ýƽ̨¸÷×é¼þ¼äµÄÊý¾Ý¹²Ïí¡£¼¯Èº¸ÅÄîÖеÄ
server ÔÚ Kafka ÖгÆÖ®Îª broker£¬ËüʹÓÃÖ÷Ìâ¹ÜÀí²»Í¬Àà±ðµÄÊý¾Ý£¬±ÈÈç DB2 ÈÕÖ¾¹éΪһ¸öÖ÷Ì⣬tomcat
ÈÕÖ¾¹éΪһ¸öÖ÷Ìâ¡£ÎÒÃÇʹÓà Logstash ×÷Ϊ Kafka ÏûÏ¢µÄÉú²úÕßʱ£¬output ²å¼þ¾ÍÐèÒªÅäÖúÃ
Kafka broker µÄÁÐ±í£¬Ò²¾ÍÊÇ Kafka ¼¯ÈºÖ÷»úµÄÁÐ±í£»ÏàÓ¦µÄ£¬ÓÃ×÷ Kafka Ïû·ÑÕß½ÇÉ«µÄ
Logstash µÄ input ²å¼þ¾ÍÒªÅäÖúÃÐèÒª¶©ÔÄµÄ Kafka ÖеÄÖ÷ÌâÃû³ÆºÍ ZooKeeper
Ö÷»úÁÐ±í¡£Kafka ͨ¹ý½«Êý¾Ý³Ö¾Ã»¯µ½Ó²ÅÌµÄ Write Ahead Log£¨WAL£©±£Ö¤Êý¾Ý¿É¿¿ÐÔÓë˳ÐòÐÔ£¬µ«Õâ²¢²»»áÓ°ÏìʵʱÊý¾ÝµÄ´«ÊäËÙ¶È£¬ÊµÊ±Êý¾ÝÈÔÊÇͨ¹ýÄÚ´æ´«ÊäµÄ¡£Kafka
ÊÇÒÀÀµÓÚ ZooKeeper µÄ£¬Ëü½«Ã¿×éÏû·ÑÕßÏû·ÑµÄÏàÓ¦ topic µÄÆ«ÒÆÁ¿±£´æÔÚ ZooKeeper
ÖС£¾Ý³Æ LinkedIn ÄÚ²¿µÄ Kafka ¼¯ÈºÃ¿ÌìÒÑÄÜ´¦Àí³¬¹ý 1 ÍòÒÚÌõÏûÏ¢¡£

ͼ 6. »ùÓÚÏûÏ¢¶©ÔÄ»úÖÆµÄ Kafka
¼Ü¹¹
³ýÁ˿ɿ¿ÐԺͶÀÌØµÄ push&pull ¼Ü¹¹Í⣬Ïà½ÏÓÚÆäËûÏûÏ¢¶ÓÁУ¬Kafka
»¹ÓµÓиü´óµÄÍÌÍÂÁ¿£º

ͼ 7. »ùÓÚÏûÏ¢³Ö¾Ã»¯»úÖÆµÄÏûÏ¢¶ÓÁÐÍÌÍÂÁ¿±È½Ï
Spark Streaming
Spark ÓɼÓÖÝ´óѧ²®¿ËÀû·ÖУ AMP ʵÑéÊÒ (Algorithms,
Machines, and People Lab) ¿ª·¢£¬¿ÉÓÃÀ´¹¹½¨´óÐ͵ġ¢µÍÑÓ³ÙµÄÊý¾Ý·ÖÎöÓ¦ÓóÌÐò¡£Ëü½«Åú´¦Àí¡¢Á÷´¦Àí¡¢¼´Ï¯²éѯÈÚΪһÌå¡£Spark
ÉçÇøÒ²ÊÇÏ൱»ð±¬£¬Æ½¾ùÿÈý¸öÔµü´úÒ»´Î°æ±¾¸üÊÇÌåÏÖÁËËüÔÚ´óÊý¾Ý´¦ÀíÁìÓòµÄµØÎ»¡£
Spark Streaming ²»Í¬ÓÚ Storm£¬Storm ÊÇ»ùÓÚʼþ¼¶±ðµÄÁ÷´¦Àí£¬Spark
Streaming ÊÇ mini-batch ÐÎʽµÄ½üËÆÁ÷´¦ÀíµÄ΢ÐÍÅú´¦Àí¡£Spark Streaming
ÌṩÁËÁ½ÖÖ´Ó Kafka ÖлñÈ¡ÏûÏ¢µÄ·½Ê½£º
µÚÒ»ÖÖÊÇÀûÓà Kafka Ïû·ÑÕ߸߼¶ API ÔÚ Spark µÄ¹¤×÷½ÚµãÉÏ´´½¨Ïû·ÑÕßỊ̈߳¬¶©ÔÄ
Kafka ÖеÄÏûÏ¢£¬Êý¾Ý»á´«Êäµ½ Spark ¹¤×÷½ÚµãµÄÖ´ÐÐÆ÷ÖУ¬µ«ÊÇĬÈÏÅäÖÃÏÂÕâÖÖ·½·¨ÔÚ Spark
Job ³ö´íʱ»áµ¼ÖÂÊý¾Ý¶ªÊ§£¬Èç¹ûÒª±£Ö¤Êý¾Ý¿É¿¿ÐÔ£¬ÐèÒªÔÚ Spark Streaming ÖпªÆôWrite
Ahead Logs£¨WAL£©£¬Ò²¾ÍÊÇÉÏÎÄÌáµ½µÄ Kafka ÓÃÀ´±£Ö¤Êý¾Ý¿É¿¿ÐÔºÍÒ»ÖÂÐÔµÄÊý¾Ý±£´æ·½Ê½¡£¿ÉÒÔÑ¡ÔñÈÃ
Spark ³ÌÐò°Ñ WAL ±£´æÔÚ·Ö²¼Ê½Îļþϵͳ£¨±ÈÈç HDFS£©ÖС£
µÚ¶þÖÖ·½Ê½²»ÐèÒª½¨Á¢Ïû·ÑÕßỊ̈߳¬Ê¹Óà createDirectStream
½Ó¿ÚÖ±½ÓÈ¥¶ÁÈ¡ Kafka µÄ WAL£¬½« Kafka ·ÖÇøÓë RDD ·ÖÇø×öÒ»¶ÔÒ»Ó³É䣬Ïà½ÏÓÚµÚÒ»ÖÖ·½·¨£¬²»ÐèÔÙά»¤Ò»·Ý
WAL Êý¾Ý£¬Ìá¸ßÁËÐÔÄÜ¡£¶ÁÈ¡Êý¾ÝµÄÆ«ÒÆÁ¿ÓÉ Spark Streaming ³ÌÐòͨ¹ý¼ì²éµã»úÖÆ×ÔÉí´¦Àí£¬±ÜÃâÔÚ³ÌÐò³ö´íµÄÇé¿öÏÂÖØÏÖµÚÒ»ÖÖ·½·¨Öظ´¶ÁÈ¡Êý¾ÝµÄÇé¿ö£¬Ïû³ýÁË
Spark Streaming Óë ZooKeeper/Kafka Êý¾Ý²»Ò»ÖµķçÏÕ¡£±£Ö¤Ã¿ÌõÏûÏ¢Ö»»á±»
Spark Streaming ´¦ÀíÒ»´Î¡£ÒÔÏ´úÂëÆ¬Í¨¹ýµÚ¶þÖÖ·½Ê½¶ÁÈ¡ Kafka ÖеÄÊý¾Ý£º
// Create direct kafka stream with brokers and topics JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream( jssc, String.class, String.class, StringDecoder.class, StringDecoder.class, kafkaParams, topicsSet); messages.foreachRDD(new Function<JavaPairRDD<String,String>,Void>(){ public Void call(JavaPairRDD<String, String> v1) throws Exception { v1.foreach(new VoidFunction<Tuple2<String, String>>(){ public void call(Tuple2<String, String> tuple2) { try{ JSONObject a = new JSONObject(tuple2._2); ... |
Spark Streaming »ñÈ¡µ½ÏûÏ¢ºó±ã¿ÉÒÔͨ¹ý Tuple ¶ÔÏó×Ô¶¨Òå²Ù×÷ÏûÏ¢£¬ÈçÏÂͼÊÇÕë¶Ô
DB2 Êý¾Ý¿âÈÕÖ¾µÄÓʼþ¸æ¾¯£¬Éú³É¸æ¾¯Óʼþ·¢Ë͵½ Notes ÓÊÏ䣺

ͼ 8. »ùÓÚ Spark Streaming
¶Ô DB2 Òì³£ÈÕ־ʵÏÖ Notes Óʼþ¸æ¾¯
»¥ÁªÍøÐÐÒµÈÕÖ¾´¦Àí·½°¸¾ÙÀý½éÉÜÓëÓ¦ÓÃ
1. ÐÂÀË
ÐÂÀ˲ÉÓõļ¼Êõ¼Ü¹¹Êdz£¼ûµÄ Kafka ÕûºÏ ELK Stack ·½°¸¡£Kafka
×÷ΪÏûÏ¢¶ÓÁÐÓÃÀ´»º´æÓû§ÈÕÖ¾£»Ê¹Óà Logstash ×öÈÕÖ¾½âÎö£¬Í³Ò»³É JSON ¸ñʽÊä³ö¸ø Elasticsearch£»Ê¹ÓÃ
Elasticsearch ÌṩʵʱÈÕÖ¾·ÖÎöÓëÇ¿´óµÄËÑË÷ºÍͳ¼Æ·þÎñ£»Kibana ÓÃ×÷Êý¾Ý¿ÉÊÓ»¯×é¼þ¡£¸Ã¼¼Êõ¼Ü¹¹Ä¿Ç°·þÎñµÄÓû§°üÀ¨Î¢²©¡¢Î¢ÅÌ¡¢ÔÆ´æ´¢¡¢µ¯ÐÔ¼ÆËãÆ½Ì¨µÈÊ®¶à¸ö²¿ÃŵĶà¸ö²úÆ·µÄÈÕÖ¾ËÑË÷·ÖÎöÒµÎñ£¬Ã¿Ìì´¦ÀíÔ¼
32 ÒÚÌõ£¨2TB£©ÈÕÖ¾¡£
ÐÂÀ˵ÄÈÕÖ¾´¦ÀíÆ½Ì¨ÍÅ¶Ó¶Ô Elasticsearch ×öÁË´óÁ¿ÓÅ»¯£¨±ÈÈçµ÷Õû
max open files µÈ£©£¬²¢ÇÒ¿ª·¢ÁËÒ»¸ö¶ÀÁ¢µÄ Elasticsearch Index ¹ÜÀíϵͳ£¬¸ºÔðË÷ÒýÈÕ³£Î¬»¤ÈÎÎñ£¨±ÈÈçË÷ÒýµÄ´´½¨¡¢ÓÅ»¯¡¢É¾³ý¡¢Óë·Ö²¼Ê½ÎļþϵͳµÄÊý¾Ý½»»»µÈ£©µÄµ÷¶È¼°Ö´ÐС£Îª
Elasticsearch °²×°Á˹úÄÚÖÐÎķִʲå¼þ elasticsearch-analysis-ik£¬Âú×ã΢ÅÌËÑË÷¶ÔÖÐÎķִʵÄÐèÇó¡££¨¼û²Î¿¼×ÊÔ´
2£©
2. ÌÚѶ
ÌÚѶÀ¶¾¨Êý¾Ýƽ̨¸æ¾¯ÏµÍ³µÄ¼¼Êõ¼Ü¹¹Í¬Ñù»ùÓÚ·Ö²¼Ê½ÏûÏ¢¶ÓÁкÍÈ«ÎÄËÑË÷ÒýÇæ¡£µ«ÌÚѶµÄ¸æ¾¯Æ½Ì¨²»½öÏÞÓÚ´Ë£¬ËüµÄ¸´ÔÓµÄÖ¸±êÊý¾Ýͳ¼ÆÈÎÎñͨ¹ýʹÓÃ
Storm ×Ô¶¨ÒåÁ÷ʽ¼ÆËãÈÎÎñµÄ·½·¨ÊµÏÖ£¬Òì³£¼ì²âµÄʵÏÖÀûÓÃÁËÇúÏßµÄʱ¼äÖÜÆÚÐÔºÍÏà¹ØÇúÏßÖ®¼äµÄÏà¹ØÐÔÈ¥¶¨Ò嶯̬µÄãÐÖµ£¬²¢ÇÒ»ùÓÚ»úÆ÷ѧϰË㷨ʵÏÖÁ˸´ÔÓµÄÈÕÖ¾×Ô¶¯·ÖÀࣨ±ÈÈç
summo logic£©¡£
¸æ¾¯Æ½Ì¨°Ñ²¦²â£¨¶¨Ê± curl Ò»ÏÂij¸ö url£¬ÓÐÎÊÌâ¾Í¸æ¾¯£©¡¢ÈÕÖ¾¼¯ÖмìË÷¡¢ÈÕÖ¾¸æ¾¯£¨5
·ÖÖÓ Error ´óÓÚ X ´Î¸æ¾¯£©¡¢Ö¸±ê¸æ¾¯£¨cpu ʹÓÃÂÊ´óÓÚ X ¸æ¾¯£©ÕûºÏ½øÍ¬Ò»¸öÊý¾Ý¹ÜÏߣ¬¼ò»¯ÁËÕûÌåµÄ¼Ü¹¹¡££¨¼û²Î¿¼×ÊÔ´
3£©
3. ÆßÅ£
ÆßÅ£²ÉÓõļ¼Êõ¼Ü¹¹Îª Flume£«Kafka£«Spark£¬»ì²¿ÔÚ 8 ̨¸ßÅä»úÆ÷¡£¸ù¾ÝÆßÅ£¼¼Êõ²©¿ÍÌṩµÄÊý¾Ý£¬¸ÃÈÕÖ¾´¦ÀíÆ½Ì¨Ã¿Ìì´¦Àí
500 ÒÚÌõÊý¾Ý£¬·åÖµ 80 Íò TPS¡£
Flume Ïà½ÏÓÚ Logstash Óиü´óµÄÍÌÍÂÁ¿£¬¶øÇÒÓë HDFS
ÕûºÏµÄÐÔÄÜ±È Logstash Ç¿ºÜ¶à¡£ÆßÅ£¼¼Êõ¼Ü¹¹Ñ¡ÐÍÏÔÈ»¿¼ÂÇÁËÕâÒ»µã£¬ÆßÅ£ÔÆÆ½Ì¨µÄÈÕÖ¾Êý¾Ýµ½ Kafka
ºó£¬Ò»Â·Í¬²½µ½ HDFS£¬ÓÃÓÚÀëÏßͳ¼Æ£¬Áíһ·ÓÃÓÚʹÓà Spark Streaming ½øÐÐʵʱ¼ÆË㣬¼ÆËã½á¹û±£´æÔÚ
Mongodb ¼¯ÈºÖС£
Èκνâ¾ö·½°¸¶¼²»ÊÇʮȫʮÃÀµÄ£¬¾ßÌå²ÉÓÃÄÄЩ¼¼ÊõÒªÉîÈëÁ˽â×Ô¼ºµÄÓ¦Óó¡¾°¡£¾ÍĿǰÈÕÖ¾´¦ÀíÁìÓòµÄ¿ªÔ´×é¼þÀ´Ëµ£¬ÔÚÒÔϼ¸¸ö·½Ã滹±È½ÏǷȱ£º
Logstash µÄÄÚ²¿×´Ì¬»ñÈ¡²»µ½£¬Ä¿Ç°Ã»ÓкõijÉÊìµÄ¼à¿Ø·½°¸¡£
Elasticsearch ¾ßÓк£Á¿´æ´¢º£Á¿¾ÛºÏµÄÄÜÁ¦£¬µ«ÊÇͬ Mongodb
Ò»Ñù£¬²¢²»ÊʺÏÓÚдÈëÊý¾Ý·Ç³£¶à£¨1 Íò TPS ÒÔÉÏ£©µÄ³¡¾°¡£
ȱ·¦ÕæÕýʵÓõÄÒì³£¼ì²â·½·¨£»ÊµÊ±Í³¼Æ·½ÃæÈ±·¦³ÉÊìµÄ½â¾ö·½°¸£¬Storm
¾ÍÊÇÒ»¸öµ×²ãµÄÖ´ÐÐÒýÇæ£¬¶ø Spark »¹È±ÉÙʱ¼ä´°¿ÚµÈ³éÏó¡£
¶ÔÓÚÈÕÖ¾×Ô¶¯·ÖÀ࣬»¹Ã»ÓпªÔ´¹¤¾ß¿ÉÒÔ×öµ½ summo logic ÄÇÑùµÄЧ¹û¡£
½áÊøÓï
´óÊý¾Ýʱ´úµÄÔËά¹ÜÀíÒâÒåÖØ´ó£¬ºÃµÄÈÕÖ¾´¦ÀíÆ½Ì¨¿ÉÒÔʰ빦±¶µÄÌáÉý¿ª·¢ÈËÔ±ºÍÔËάÈËÔ±µÄЧÂÊ¡£±¾ÎÄͨ¹ý¼òµ¥ÓÃÀý½éÉÜÁË
ELK Stack¡¢Kafka ºÍ Spark Streaming ÔÚÈÕÖ¾´¦ÀíÆ½Ì¨Öи÷×ÔÔÚϵͳ¼Ü¹¹ÖеŦÄÜ¡£ÏÖʵÖÐÓ¦Óó¡¾°·±¶à¸´ÔÓ¡¢Êý¾ÝÐÎʽ¶àÖÖ¶àÑù£¬ÈÕÖ¾´¦Àí¹¤×÷²»ÊÇÒ»õí¶ø¾ÍµÄ£¬·ÖÎö´¦Àí¹ý³Ì»¹ÐèÒªÔÚʵ¼ùÖв»¶ÏÍÚ¾òºÍÓÅ»¯£¬±ÊÕßÒ²½«ÖÂÁ¦ÓÚ
DB2 Êý¾Ý¿âÔËÐÐ״̬¸üϸ½ÚÊý¾ÝµÄÊÕ¼¯ºÍ¸üÈ«ÃæÏ¸ÖÂµÄ¼à¿Ø¡£
|