The Spark Streaming Application and Practice series consists of the following six parts:
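1. Background and architecture changes
2. Implementing the details in code and running the project
3. An introduction to Streaming monitoring and solving real problems
4. Load testing the project and related optimizations
5. Continued Streaming optimization: HBase
6. Managing Streaming jobs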
The first part is available as a separate post. This article is the second part, covering continued Streaming optimization with HBase and the management of Streaming jobs.
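5. Continued Streaming Optimization: HBase
5.1 Configuring the WAL
With the WAL disabled, writes reached 200,000, but throughput was still not particularly stable and processing sometimes took a long time. It turned out that HBase was running a compaction during those periods.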

Figure: Streaming statistics, showing unstable processing time
Figure: statistics in the HBase UI
HBase follows a Log-Structured Merge Tree architecture. A write goes to the WAL first and then to the in-memory cache. Once certain conditions are met, the cached data is flushed to disk, producing a data file called an HFile. As more data is written, the number of flushes grows, and so does the number of HFiles. Too many data files increase the number of I/O operations per query, so HBase keeps merging these files. This merge process is called compaction.
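To make the write path concrete, here is a toy model in Python. It is only a sketch: the class `ToyStore`, its entry-count `flush_threshold`, and the `use_wal` switch are illustrative assumptions, not HBase APIs, and real HBase flushes by memstore size in bytes rather than by entry count.

```python
class ToyStore:
    """Toy model of one HBase store: WAL, in-memory cache, flushed files."""

    def __init__(self, flush_threshold=3, use_wal=True):
        self.flush_threshold = flush_threshold  # assumption: counted in entries
        self.use_wal = use_wal
        self.wal = []        # append-only log of every accepted write
        self.memstore = {}   # in-memory buffer, key -> value
        self.hfiles = []     # each flush adds one immutable, key-sorted file

    def put(self, key, value):
        if self.use_wal:
            self.wal.append((key, value))   # 1. log first
        self.memstore[key] = value          # 2. then buffer in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()                    # 3. flush once the buffer is full

    def flush(self):
        if self.memstore:
            self.hfiles.append(sorted(self.memstore.items()))
            self.memstore = {}


store = ToyStore(flush_threshold=3)
for i in range(10):
    store.put(f"row{i:02d}", i)

print(len(store.hfiles))    # 3 files after 10 distinct writes
print(len(store.memstore))  # 1 entry still buffered
```

Every three writes produce one more file, which is why a steady write load keeps adding HFiles until something merges them.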
A compaction selects some of the HFiles in one store of one region and merges them. The principle is simple: read the KeyValues from the files to be merged, sort them in ascending order, and write them to a new file. The new file then replaces all of the merged files and serves requests from that point on.
HBase divides compactions into two types by the scale of the merge: Minor Compaction and Major Compaction.
1. A Minor Compaction selects some small, adjacent StoreFiles and merges them into one larger StoreFile. Cells that are already Deleted or Expired are not processed during this step. The result of a Minor Compaction is fewer, larger StoreFiles.
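2. A Major Compaction merges all StoreFiles into a single StoreFile. It also cleans up three kinds of obsolete data: deleted data, data whose TTL has expired, and versions beyond the configured maximum number of versions. A Major Compaction usually runs for a long time and consumes a large amount of system resources, which has a significant impact on the applications above it. For this reason, production deployments generally disable automatic Major Compaction and trigger it manually during off-peak hours.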
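The difference between the two types can be sketched in a few lines of Python. This is a simplified model following the description above, not HBase's implementation: the `Cell` tuple, the use of `None` as a delete marker, and the integer timestamps are all assumptions made for the example.

```python
import heapq
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    row: str
    ts: int                # write timestamp; larger means newer
    value: Optional[str]   # None marks a delete (assumption of this sketch)


def sort_key(cell):
    # Rows ascending, and within one row the newest version first.
    return (cell.row, -cell.ts)


def minor_compact(files):
    """Merge sorted files into one; every cell is copied through unchanged."""
    return list(heapq.merge(*files, key=sort_key))


def major_compact(files, now, ttl, max_versions):
    """Merge all files and drop deleted, expired and surplus versions."""
    out, current_row, kept, deleted_at = [], None, 0, None
    for cell in heapq.merge(*files, key=sort_key):
        if cell.row != current_row:
            current_row, kept, deleted_at = cell.row, 0, None
        if cell.value is None:
            if deleted_at is None:
                deleted_at = cell.ts      # newest delete marker for this row
            continue                      # the marker itself is not kept
        if deleted_at is not None and cell.ts <= deleted_at:
            continue                      # removed by the delete
        if now - cell.ts > ttl:
            continue                      # TTL expired
        if kept >= max_versions:
            continue                      # beyond the configured version count
        out.append(cell)
        kept += 1
    return out


f1 = [Cell("r1", 50, "a"), Cell("r2", 50, "b"), Cell("r3", 10, "c")]
f2 = [Cell("r1", 60, "a2"), Cell("r2", 60, None), Cell("r4", 95, "d")]

print(len(minor_compact([f1, f2])))   # 6: nothing is dropped
print(major_compact([f1, f2], now=100, ttl=85, max_versions=1))
```

In the major pass, the old version of `r1` is dropped by the version limit, `r2` by the delete, and `r3` by the TTL, so only the newest `r1` cell and `r4` survive.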
5.2 Tuning Compaction
Production environments usually disable automatic major_compact (set hbase.hregion.majorcompaction to 0 in the configuration file) and run major_compact manually in a night-time window when few users are active.
Manual trigger: major_compact 'testtable'
If HBase is not updated very often, running major_compact on all tables once a week is enough. To judge the timing, note the total number of storefiles right after a major_compact. When the count has grown to nearly twice that number, run major_compact on all tables again. The operation takes a long time, so avoid peak hours.
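The "nearly double" rule of thumb is easy to script. The following Python sketch assumes the storefile counts have already been collected into dictionaries by some other means; the function name, the `factor` parameter, and the table `events` are hypothetical, and only `testtable` comes from the text above.

```python
def tables_needing_major_compact(baseline, current, factor=2.0):
    """Tables whose storefile count reached `factor` times the baseline.

    baseline: storefile count per table right after the last major_compact
    current:  storefile count per table now
    """
    return sorted(
        table
        for table, count in current.items()
        if table in baseline and count >= factor * baseline[table]
    )


baseline = {"testtable": 4, "events": 10}
current = {"testtable": 8, "events": 13}

# Print the HBase shell commands to run during the off-peak window.
for table in tables_needing_major_compact(baseline, current):
    print(f"major_compact '{table}'")
```

Lowering `factor` slightly below 2.0 would match "nearly twice" more literally; the right value depends on how quickly the tables accumulate files.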
Figure: viewing the statistics
Compaction is triggered under the following conditions:
1. After a memstore flush
2. By a client, through the shell or the API
3. Periodically, by the background thread CompactionChecker

The CompactionChecker runs with a period of hbase.server.thread.wakefrequency × hbase.server.compactchecker.interval.multiplier and triggers compaction when it fires. Further conditions also apply; they can be found in the source code.
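The check is made against this time range: mcTime lies between 7 - 7 × 0.5 days and 7 + 7 × 0.5 days, that is, between 3.5 and 10.5 days. It also looks at whether files were modified. For the exact logic, see the RatioBasedCompactionPolicy#isMajorCompaction method.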
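The two timing calculations above can be worked through in Python. The parameter defaults below (a 10,000 ms wake frequency, a multiplier of 1,000, a 7-day period and a jitter of 0.5) are my assumptions about the HBase defaults behind the figures in the text; check them against the configuration of the cluster in question.

```python
def checker_period_ms(wake_frequency_ms=10_000, multiplier=1_000):
    """Interval between CompactionChecker runs, in milliseconds."""
    return wake_frequency_ms * multiplier


def major_compaction_window_days(period_days=7.0, jitter=0.5):
    """Earliest and latest point at which a major compaction becomes due."""
    delta = period_days * jitter
    return period_days - delta, period_days + delta


print(checker_period_ms() / 3_600_000)   # hours between checks, about 2.78
print(major_compaction_window_days())    # (3.5, 10.5)
```

The jitter spreads major compactions of different regions over a window instead of letting them all start at the same moment.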
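5.3 Split
The screenshot above shows that the table has only one region, so all writes land on a single server. That is far from using the full capacity of the HBase cluster, so we split the region manually.

Figure: splitting the region through the HBase UI
After the split:

Figure: the regions after the split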
6. Managing Streaming Jobs
This is the last part of the Spark Streaming blog series. It covers how I divide up Spark Streaming jobs and an email-based monitor for them.
6.1 Dividing up Streaming jobs
Once a Spark Streaming application has been developed and tested, it goes into production. How the Spark Streaming jobs are divided, and how large the time window should be, depends on the business workload.
Each Kafka topic corresponds to one table in HBase.
Each Kafka topic has between 3 and 5 partitions.
Which topics should a given Streaming consumer handle? Why divide them this way, and what are the benefits?
Because each Kafka topic maps to a specific HBase table in the business, we monitor the insert traffic of the HBase tables to understand each table's insert pattern.
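The HBase tables fall into five categories by insert pattern: very high insert volume; many rows, each of them small; large batches at intervals; fairly even; and few, infrequent inserts.
Very high insert volume: a table that accounts for, say, 10% or 20% of total inserts is separated out, with one table per Streaming consumer.
Many rows, each of them small: tables with frequent inserts can be grouped together, and timeWindow can be reduced.
Large batches at intervals: inserts arrive 1,000 or 2,000 rows at a time with gaps in between, so increase the timeWindow interval and set maxRatePerPartition somewhat higher.
Fairly even: this is the easy case, and the parameters are straightforward to set.
Few, infrequent inserts: increase timeWindow to several seconds, and keep increasing it if inserts are very sparse.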
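The benefits should be clear by now: resources are used sensibly and Streaming is tuned per workload. Because timeWindow and maxRatePerPartition are set to suit different tables, concurrency is both increased and kept under control.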
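The first rule, giving a heavy table its own consumer, can be sketched in Python as follows. The function name, the `heavy_share` parameter, and the table names are hypothetical; the 10% cut-off follows the example in the text. Splitting the remaining tables further by insert pattern is left out of this sketch.

```python
def plan_consumers(insert_counts, heavy_share=0.10):
    """Give each heavy table its own consumer and pool the rest.

    insert_counts maps a table (one Kafka topic each) to the number of rows
    inserted during the observation period.
    """
    total = sum(insert_counts.values())
    dedicated, shared = [], []
    for table, count in sorted(insert_counts.items()):
        if total and count / total >= heavy_share:
            dedicated.append(table)
        else:
            shared.append(table)
    jobs = [[table] for table in dedicated]   # one job per heavy table
    if shared:
        jobs.append(shared)                   # one pooled job for the others
    return jobs


counts = {"orders": 600, "clicks": 250, "users": 90, "audit": 40, "config": 20}
print(plan_consumers(counts))
# [['clicks'], ['orders'], ['audit', 'config', 'users']]
```

Each resulting job can then get its own timeWindow and maxRatePerPartition according to the category its tables fall into.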
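6.2 Monitoring Streaming jobs
For monitoring Spark Streaming jobs, the built-in Streaming UI shows traffic, timing, and similar information, but it has no notification mechanism, so I built a simple one. I considered and tried several approaches, such as monitoring the pid, monitoring through a shell script, or calling methods in the Spark source directly. Some did not achieve the expected result, and others were hard to maintain or costly to develop.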
In the end I chose a fairly simple approach that still works reasonably well: a Python crawler fetches the relevant values from the original Streaming UI page and sends an email when a threshold is reached. The overall steps are:
1. Use the job name to find the ApplicationMaster URL on the YARN page at port 8088, /cluster/apps/RUNNING.
2. Follow that URL to the Streaming page and monitor the job's Scheduling Delay and Processing Time.

Figure: the YARN page on port 8088, /cluster/apps/RUNNING
The code:

Figure: Python monitoring crawler with email notification
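The following is a minimal Python sketch of the steps above, not the author's script. It makes several assumptions. It reads the list of running applications from the YARN ResourceManager REST endpoint /ws/v1/cluster/apps instead of scraping the HTML page. It assumes the two values have already been extracted from the Streaming page as strings such as "800 ms" or "12 s", because that extraction depends on the page layout of the Spark version in use. The job name, threshold values, and email addresses are placeholders.

```python
import json
import re
import urllib.request
from email.message import EmailMessage

UNIT_MS = {"ms": 1, "s": 1_000, "min": 60_000, "h": 3_600_000}


def fetch_running_apps(rm_host):
    """Query the YARN ResourceManager REST API for running applications."""
    url = f"http://{rm_host}:8088/ws/v1/cluster/apps?states=RUNNING"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def find_tracking_url(apps_json, job_name):
    """Return the ApplicationMaster tracking URL of the job, or None."""
    apps = (apps_json.get("apps") or {}).get("app") or []
    for app in apps:
        if app.get("name") == job_name:
            return app.get("trackingUrl")
    return None


def to_ms(text):
    """Convert a duration such as '250 ms' or '1.5 s' to milliseconds."""
    match = re.fullmatch(r"\s*([\d.]+)\s*(ms|s|min|h)\s*", text)
    if not match:
        raise ValueError(f"unrecognised duration: {text!r}")
    return float(match.group(1)) * UNIT_MS[match.group(2)]


def build_alert(job_name, metrics, limits_ms, sender, recipient):
    """Return an EmailMessage if any metric exceeds its limit, else None."""
    over = {k: v for k, v in metrics.items() if to_ms(v) > limits_ms[k]}
    if not over:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"[streaming alert] {job_name}"
    msg["From"], msg["To"] = sender, recipient
    msg.set_content("\n".join(f"{k}: {v}" for k, v in sorted(over.items())))
    return msg


# Offline example with hand-written inputs (no network access needed).
apps = {"apps": {"app": [{"name": "order-stream",
                          "trackingUrl": "http://rm:8088/proxy/application_1_0001/"}]}}
metrics = {"Scheduling Delay": "12 s", "Processing Time": "800 ms"}
limits = {"Scheduling Delay": 5_000, "Processing Time": 2_000}

print(find_tracking_url(apps, "order-stream"))
alert = build_alert("order-stream", metrics, limits,
                    "monitor@example.com", "oncall@example.com")
print(alert.get_content() if alert else "ok")
```

In a real deployment, `fetch_running_apps` would supply the input for `find_tracking_url`, and a returned message could be sent with `smtplib.SMTP(host).send_message(alert)` from a scheduled job.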