Editor's note: this article is from InfoQ.
Background
This project mainly addresses online queries against two historical-data tables, check and opinion (historical data means the complete intermediate process and result data produced while the business runs). The original implementation provided storage and query services on Oracle. As the data volume kept growing, it ran into performance problems on both write and read; moreover, the historical data is consulted only for business reference and does not affect the live workflow, so in terms of system structure it sits too heavily on the upstream business chain. This project moves it onto the downstream Hadoop distributed data-processing platform. Some concrete requirement figures:
1. Data volume: the check table currently holds 50M+ rows cumulatively, 11GB; the opinion table holds 300M+ rows, about 100GB. The daily increment is about 500K+ rows per table, insert only, never update.
2. Query requirements: the primary key of the check table is id (an Oracle global id) and the query key is check_id; one check_id maps to multiple records, so the list of matching records must be returned. The primary key of the opinion table is likewise id, its query keys are bussiness_no and buss_type, and it likewise returns a list. A single query returns a list of about 50 records or fewer, the query frequency is around 100 queries per day, and the required response time is 2s.
Technology selection
Given the data volume and query requirements, HBase is the first choice among distributed-platform components that provide both large-scale storage and real-time query capability. After an initial survey and evaluation against the requirements, we settled on HBase as the main storage component and split the work into two parts: writing to HBase and reading from HBase.
Reading from HBase has a fairly settled approach: design the RowKey around the requirements, then read the data through the rich APIs HBase provides (get, scan, and so on) and meet the performance targets.
The main ways to write into HBase are roughly the following:
1. Calling the native HBase API from Java, HTable.put(List<Put>).
2. A MapReduce job, using TableOutputFormat as the output.
3. Bulk Load: first generate persistent HFiles in HBase's internal data format, then copy them to the proper locations and notify the RegionServer, which completes the ingestion of massive data. The HFile-generation step can use either MapReduce or Spark.
This article takes the third approach, Spark + Bulk Load, to write into HBase. Compared with the other two it has the following advantages:
1. BulkLoad does not write the WAL, nor does it trigger flushes or splits.
2. Inserting data through massive PUT calls may cause heavy GC activity; besides hurting performance, in severe cases it can even destabilize HBase nodes. BulkLoad has no such concern.
3. The process avoids the performance cost of large numbers of interface calls.
4. It can take advantage of Spark's powerful computing capability.
The overall flow is illustrated below:

Design
Environment
Hadoop 2.5-2.7
HBase 0.98.6
Spark 2.0.0-2.1.1
Sqoop 1.4.6
Table design
This section focuses on the design of the HBase tables, of which the RowKey is the most important part. To make the discussion concrete, let us first look at the data format. The check table is used as the example below; opinion is analogous.
check table (the source table has 18 fields; for ease of description this article shows 5 of them)

As the figure above shows, the primary key is id, a random 32-character combination of letters and digits. The business query field check_id is variable length (no more than 32 characters), made up of letters and digits, and one check_id may correspond to multiple records; the remaining fields are related business fields. As is well known, HBase serves queries through the RowKey and requires the RowKey to be unique. RowKey design is mainly about how the data will be accessed. At first glance we have two design options.
1. Split into two tables: one with id as the RowKey and the check table's columns as its columns; the other an index table with check_id as the RowKey and one column per id. At query time, first look up the id list for the check_id, then fetch the corresponding records by id. Both are HBase get operations.
2. Treat the requirement as a range query rather than a point query: make check_id the RowKey prefix, followed by id. At query time, set the Scan's startRow and stopRow to find the matching record list.
The first option's advantage is a simple table structure and an easy RowKey; its drawbacks are that 1) each source row must be written to two tables, and before writing to the index table we must first scan whether that RowKey already exists, appending a column if it does and creating a new row otherwise, and 2) on read, even using List-based calls, at least two table reads are required. The second option has a more complex RowKey, but both the write and the read are one-shot. All things considered, we adopted the second option.
RowKey design
Hotspot problem
Rows in HBase are sorted by the lexicographic order of their RowKeys, and hotspots typically occur when a large number of clients hit one or a very few nodes of the cluster. By default a newly created table has only one region, which splits into more regions as it grows; only then do the regions spread over multiple region servers and balance the load. For our business the existing data is already large, so it is necessary to spread HBase's load over every region server from the start, i.e. to pre-split. Common hotspot countermeasures include salting, hashing, and reversing the monotonically increasing part (such as a timestamp).
RowKey design
Step1: determine the number of pre-split regions and create the HBase table
The way to pick the number differs across business scenarios and data characteristics; in my view it should weigh the data volume, the cluster size, and similar factors together. For example, the check table is about 11GB and the test cluster has 10 machines, with hbase.hregion.max.filesize=3G (a region splits in two once it exceeds this size), so at initialization each region should ideally be 1~2GB (so it will not split right away), which works out to 11G/2G ≈ 6 regions; to make full use of the cluster, however, this article splits the check table into 10 partitions. If the data volume were 100GB and growing while the cluster stayed the same, raising the region count to about 100G/2G = 50 would be more appropriate. The DDL for the HBase check table is:
create 'tinawang:check',
{NAME => 'f', COMPRESSION => 'SNAPPY', DATA_BLOCK_ENCODING => 'FAST_DIFF', BLOOMFILTER => 'ROW'},
{SPLITS => ['1','2','3','4','5','6','7','8','9']}
ÆäÖУ¬Column Family =¡®f¡¯£¬Ô½¶ÌÔ½ºÃ¡£
COMPRESSION => 'SNAPPY'£¬HBase Ö§³Ö 3 ÖÖѹËõ LZO, GZIP and Snappy¡£GZIP ѹËõÂʸߣ¬µ«ÊÇºÄ CPU¡£ºóÁ½Õ߲¶à£¬Snappy ÉÔ΢ʤ³öÒ»µã£¬cpu ÏûºÄµÄ±È GZIP ÉÙ¡£Ò»°ãÔÚ IO ºÍ CPU ¾ùºâÏ£¬Ñ¡Ôñ Snappy¡£
DATA_BLOCK_ENCODING => 'FAST_DIFF'£¬±¾°¸ÀýÖÐ RowKey ½ÏΪ½Ó½ü£¬Í¨¹ýÒÔÏÂÃüÁî²é¿´ key ³¤¶ÈÏà¶Ô value ½Ï³¤¡£
./hbase org.apache.hadoop.hbase.io.hfile.HFile -m -f /apps/hbase/data/data/tinawang/check/a661f0f95598662a53b3d8b1ae469fdf/f/a5fefc880f87492d908672e1634f2eed_SeqId_2_

Step2: RowKey composition
Salt
To distribute the data evenly over the regions, in combination with the pre-split, we take the hashcode of the query key, i.e. check_id of the check table, then use it modulo numRegions as the prefix, taking care to pad the result.
StringUtils.leftPad(Integer.toString(Math.abs(check_id.hashCode() % numRegion)), 1, '0')
Note: once the data volume reaches hundreds of GB, numRegions naturally grows to two digits, and so does the salt.
Hash
Because check_id itself is a variable-length string of letters and digits, we hash it to scatter the data and to make RowKey lookup and comparison straightforward: check_id is hashed with MD5, yielding a fixed 32-character value.
MD5Hash.getMD5AsHex(Bytes.toBytes(check_id))
Uniqueness
The salt+hash above serves as the RowKey prefix, and appending the check table's primary key id guarantees the RowKey's uniqueness. In summary, the RowKey of the check table is designed as follows (check_id=A208849559):

For readability, a custom separator such as '+' or '|' can also be inserted between the parts.
7+7c9498b4a83974da56b252122b9752bf+56B63AB98C2E00B4E053C501380709AD
This design guarantees that for every query the salt+hash prefix is deterministic and the matching rows fall within a single region. Note that the columns of the check table in HBase are stored one-for-one like the columns of the source check table in Oracle.
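Putting the pieces together, a minimal Java sketch of the key builder could look like the following (the '+' separator and numRegion=10 follow the design above; assumed imports: org.apache.commons.lang.StringUtils, org.apache.hadoop.hbase.util.Bytes, org.apache.hadoop.hbase.util.MD5Hash):

// Sketch only: salt + '+' + md5(check_id) + '+' + id, per the design above
public static String buildRowKey(String checkId, String id, int numRegion) {
    String salt = StringUtils.leftPad(
            Integer.toString(Math.abs(checkId.hashCode() % numRegion)), 1, '0');
    String hash = MD5Hash.getMD5AsHex(Bytes.toBytes(checkId)); // fixed 32 hex chars
    return salt + "+" + hash + "+" + id;
}

For check_id=A208849559 this yields the example key shown above.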
Web query design
RowKey design and querying are intimately related: the query pattern drives the RowKey design, and conversely, given the RowKey design above, a query completes simply by setting the Scan's [startRow, stopRow]. Take the query check_id=A208849559 as an example: following the RowKey design rules, we run the salt+hash computation on it to obtain the prefix.
startRow = 7+7c9498b4a83974da56b252122b9752bf
stopRow  = 7+7c9498b4a83974da56b252122b9752bg
Key parts of the implementation
Spark write to HBase
Step0: preparation
Because the business data is taken over from an upstream system, the existing data was pulled to HDFS with sqoop, while the daily increment arrives as files from an FTP site. Since the business data fields contain some newline characters and sqoop 1.4.6 currently supports only single-byte delimiters, this article chose '0x01' as the column separator and '0x10' as the row separator.
Step1: Spark read hdfs text file

SparkContext.textFile() uses "\n" as the default row delimiter; here we use "0x10", which must be set in the Configuration. With that configuration applied, we call the newAPIHadoopFile method to read the HDFS file, which returns a JavaPairRDD<LongWritable, Text>, where LongWritable and Text are Hadoop's Long and String types respectively (all Hadoop data types closely resemble Java's, except that they are specially optimized for network serialization). The data file we need sits in the pairRDD's value, i.e. the Text part. For convenience in later processing, the JavaPairRDD can be converted to a JavaRDD<String>.
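A minimal sketch of this step, assuming sc is the JavaSparkContext and inputPath points at the data (textinputformat.record.delimiter is Hadoop's standard key for a custom record delimiter; TextInputFormat here is the org.apache.hadoop.mapreduce.lib.input one):

// Read records separated by 0x10 instead of '\n'
Configuration hadoopConf = new Configuration();
hadoopConf.set("textinputformat.record.delimiter", new String(new byte[]{0x10}, StandardCharsets.UTF_8));
JavaPairRDD<LongWritable, Text> pairRdd = sc.newAPIHadoopFile(inputPath,
        TextInputFormat.class, LongWritable.class, Text.class, hadoopConf);
// Keep only the value (the record body) for later processing
JavaRDD<String> records = pairRdd.map(pair -> pair._2().toString());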
Step2: Transfer and sort RDD
① Convert the JavaRDD<String> into a JavaPairRDD<Tuple2<String, String>, String>, whose parts represent, in order, the RowKey, the column name, and the value. We do this because HBase is fundamentally sorted by RowKey, and when bulk loading into multiple pre-split regions, the data within each Spark partition must be ordered: RowKey, column family (cf), and column name all need to be sorted. In this case there is only one column family, so the RowKey and column name are combined into a Tuple2-format key. Note that one source database row (n fields) is split into n rows at this point.
② Perform a secondary sort on (RowKey, col) over the JavaPairRDD<Tuple2, String>. Without this sort, the following exception is thrown:
java.io.IOException: Added a key not lexically larger than previous key
③ Organize the data into the JavaPairRDD<ImmutableBytesWritable, KeyValue> hfileRdd that the HFile output requires.
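A condensed sketch of the three sub-steps, reusing records from Step 1 and the buildRowKey sketch above (COL_NAMES, the field positions, and the serializable KeyComparator are illustrative assumptions):

// ①: one source row (n fields, 0x01-separated per Step 0) becomes n (rowKey, colName) -> value pairs
JavaPairRDD<Tuple2<String, String>, String> cells = records.flatMapToPair(line -> {
    String[] fields = line.split("\u0001", -1);
    String rowKey = buildRowKey(fields[1], fields[0], 10); // assumed positions of check_id and id
    List<Tuple2<Tuple2<String, String>, String>> out = new ArrayList<>();
    for (int i = 0; i < COL_NAMES.length; i++) {
        out.add(new Tuple2<>(new Tuple2<>(rowKey, COL_NAMES[i]), fields[i]));
    }
    return out.iterator();
});
// ②: secondary sort on (rowKey, colName); the comparator must implement Serializable
JavaPairRDD<Tuple2<String, String>, String> sorted = cells.sortByKey(new KeyComparator(), true);
// ③: map each cell to (ImmutableBytesWritable, KeyValue) as required by HFileOutputFormat2
JavaPairRDD<ImmutableBytesWritable, KeyValue> hfileRdd = sorted.mapToPair(t -> {
    byte[] row = Bytes.toBytes(t._1()._1());
    KeyValue kv = new KeyValue(row, Bytes.toBytes("f"),
            Bytes.toBytes(t._1()._2()), Bytes.toBytes(t._2()));
    return new Tuple2<>(new ImmutableBytesWritable(row), kv);
});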
Step3: create HFiles and bulk load to HBase
① The key call is the saveAsNewAPIHadoopFile method:
hfileRdd.saveAsNewAPIHadoopFile(hfilePath, ImmutableBytesWritable.class,
        KeyValue.class, HFileOutputFormat2.class, config);
② Bulk load the HFiles into HBase
final Job job = Job.getInstance();
job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(KeyValue.class);
HFileOutputFormat2.configureIncrementalLoad(job, htable);
LoadIncrementalHFiles bulkLoader = new LoadIncrementalHFiles(config);
bulkLoader.doBulkLoad(new Path(hfilePath), htable);
Note: if the cluster has Kerberos enabled, the bulk load step must be placed inside the ugi.doAs() method and run only after the following authentication:
UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(keyUser, keytabPath);
UserGroupInformation.setLoginUser(ugi);
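A sketch of wrapping the bulk load in doAs, per the note above (PrivilegedExceptionAction is from java.security; bulkLoader, hfilePath, and htable come from the Step 3 snippet):

ugi.doAs(new PrivilegedExceptionAction<Void>() {
    @Override
    public Void run() throws Exception {
        // Runs under the re-authenticated UGI, so the HBase calls pass the Kerberos checks
        bulkLoader.doBulkLoad(new Path(hfilePath), htable);
        return null;
    }
});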
Visiting the HBase cluster's web UI on port 60010 shows the region distribution.

Read from HBase
This article uses the Spring Boot framework to build the web layer that reads the data in HBase.
use connection pool
Creating a connection is a fairly heavy operation; in a real HBase project we introduce a connection pool to share the ZooKeeper connection, the meta information cache, and the connections to the region servers and the master.
HConnection connection = HConnectionManager.createConnection(config);
HTableInterface table = connection.getTable("table1");
try {
    // Use the table as needed, for a single operation and a single thread
} finally {
    table.close();
}
The default thread pool can also be overridden through the following method:
HConnection createConnection(org.apache.hadoop.conf.Configuration conf, ExecutorService pool);
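For example (the pool size of 20 is an illustrative choice, not a recommendation from the original text):

ExecutorService pool = Executors.newFixedThreadPool(20); // shared by all tables on this connection
HConnection connection = HConnectionManager.createConnection(config, pool);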
process query
Step1: determine the RowKey prefix from the query conditions
As introduced in the RowKey design of section 3.3, both HBase writes and reads follow the same design rule. Here we apply the same method to turn the query conditions passed in by the web caller into the corresponding RowKey prefix. For example, for a check query with check_id=A208849559, the generated prefix is 7+7c9498b4a83974da56b252122b9752bf.
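A sketch of the prefix computation, reusing the write-side salt and hash logic (checkId and numRegion are illustrative names):

String salt = StringUtils.leftPad(
        Integer.toString(Math.abs(checkId.hashCode() % numRegion)), 1, '0');
String rowkeyPre = salt + "+" + MD5Hash.getMD5AsHex(Bytes.toBytes(checkId)); // e.g. 7+7c9498b4...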
Step2: determine the scan range
The query results for A208849559 live in the rows (RowKey and value) whose RowKey carries the prefix 7+7c9498b4a83974da56b252122b9752bf.
scan.setStartRow(Bytes.toBytes(rowkey_pre)); // 7+7c9498b4a83974da56b252122b9752bf
byte[] stopRow = Bytes.toBytes(rowkey_pre);
stopRow[stopRow.length - 1]++;
scan.setStopRow(stopRow); // 7+7c9498b4a83974da56b252122b9752bg
Step3: assemble the query results into the response object
Iterate over the ResultScanner, wrap each row's data into a table entity, and collect the entities into the list to return.
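A minimal sketch of this step (CheckEntity and the column names are illustrative; table and scan come from the snippets above):

List<CheckEntity> list = new ArrayList<>();
ResultScanner scanner = table.getScanner(scan);
try {
    for (Result r : scanner) {
        CheckEntity entity = new CheckEntity(); // hypothetical entity class
        entity.setCheckId(Bytes.toString(r.getValue(Bytes.toBytes("f"), Bytes.toBytes("check_id"))));
        // ...populate the remaining business fields the same way...
        list.add(entity);
    }
} finally {
    scanner.close();
}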
Testing
We randomly sampled 1000 check_id values from the raw data for a simulated test and issued three consecutive rounds of 2000 requests each (200 concurrent threads looping 10 times). The average response time was 51ms with an error rate of 0.


As the figure above shows, after N cumulative test rounds the Requests count on every region is close, matching the load-balancing intent of the design.
Lessons learned
1. Kerberos authentication
If the cluster has security authentication enabled, Kerberos authentication is required both when submitting the Spark job and when accessing HBase.
This article uses yarn cluster mode; submitting the job like an ordinary one may produce the following error.
ERROR StartApp: job failure,
java.lang.NullPointerException
    at com.tinawang.spark.hbase.utils.HbaseKerberos.<init>(HbaseKerberos.java:18)
    at com.tinawang.spark.hbase.job.SparkWriteHbaseJob.run(SparkWriteHbaseJob.java:60)
Line 18 of HbaseKerberos.java is:
this.keytabPath = (Thread.currentThread().getContextClassLoader().getResource(prop.getProperty("hbase.keytab"))).getPath();
This happens because the executor must re-authenticate when it connects to HBase, and the tina.keytab uploaded via --keytab is not visible to the HBase authentication code, so the keytab used for that authentication has to be uploaded separately via --files. For example:
--keytab /path/tina.keytab \
--principal tina@GNUHPC.ORG \
--files "/path/tina.keytab.hbase"
Here tina.keytab.hbase is a renamed copy of tina.keytab, because Spark does not allow the same file to be uploaded twice.
2. Serialization
org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2101)
    at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:370)
    at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:369)
    ...
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
Caused by: java.io.NotSerializableException: org.apache.spark.api.java.JavaSparkContext
Serialization stack:
    - object not serializable (class: org.apache.spark.api.java.JavaSparkContext, value: org.apache.spark.api.java.JavaSparkContext@24a16d8c)
    - field (class: com.tinawang.spark.hbase.processor.SparkReadFileRDD, name: sc, type: class org.apache.spark.api.java.JavaSparkContext)
    ...
Solution one:
If sc is a member variable of the class and is referenced inside a method, add the transient keyword so it is not serialized.
private transient JavaSparkContext sc;
Solution two:
Pass sc as a method parameter, and have the classes involved in RDD operations implement Serializable. Our code takes this second approach; see the code for details.
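A sketch of the second approach (the class name comes from the stack trace above; the method body is illustrative):

public class SparkReadFileRDD implements Serializable {
    // sc arrives as a parameter instead of being stored in a field,
    // so it is never dragged into the serialized closure
    public JavaRDD<String> read(JavaSparkContext sc, String path) {
        return sc.textFile(path);
    }
}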
3. Batch request testing
Exception in thread "http-nio-8091-Acceptor-0" java.lang.NoClassDefFoundError: org/apache/tomcat/util/ExceptionUtils
or
Exception in thread "http-nio-8091-exec-34" java.lang.NoClassDefFoundError: ch/qos/logback/classic/spi/ThrowableProxy
See the issue below, along with a write-up of one troubleshooting session; the cause is likely that the open-file limit was exceeded.
https://github.com/spring-projects/spring-boot/issues/1106
http://mp.weixin.qq.com/s/34GVlaYDOdY1OQ9eZs-iXg
ʹÓà ulimit-a ²é¿´Ã¿¸öÓû§Ä¬ÈÏ´ò¿ªµÄÎļþÊýΪ 1024¡£
ÔÚϵͳÎļþ /etc/security/limits.conf ÖÐÐÞ¸ÄÕâ¸öÊýÁ¿ÏÞÖÆ£¬ÔÚÎļþÖмÓÈëÒÔÏÂÄÚÈÝ, ¼´¿É½â¾öÎÊÌâ¡£
* soft nofile 65536
* hard nofile 65536