Big Data Queries: HBase Read/Write Design and Practice
 
 2018-1-10 
 
±à¼­ÍƼö:

±¾ÎÄÀ´×ÔÓÚ InfoQ£¬Õâ¸öÏîÄ¿Ö÷Òª½â¾ö check ºÍ opinion2 ÕÅÀúÊ·Êý¾Ý±í£¨ÀúÊ·Êý¾ÝÊÇÖ¸µ±ÒµÎñ·¢Éú¹ý³ÌÖеÄÍêÕûÖмäÁ÷³ÌºÍ½á¹ûÊý¾Ý£©µÄÔÚÏß²éѯ¡£

Background

This project addresses online queries over the check and opinion historical-data tables (historical data means the complete intermediate process and result data produced while business transactions run). The original implementation served storage and queries from Oracle; as data volume grew, both writes and reads ran into performance problems. Since historical data is consulted only for business reference and does not affect the actual workflow, keeping it on the upstream business chain was architecturally heavyweight, so this project moves it onto the downstream Hadoop distributed data-processing platform. The concrete requirement metrics are:

1. Data volume: the check table currently holds 50M+ rows (11 GB); the opinion table holds 300M+ rows (about 100 GB). The daily increment is about 500K+ rows per table, insert only, no updates.

2. ²éѯҪÇó£ºcheck ±íµÄÖ÷¼üΪ id£¨Oracle È«¾Ö id£©£¬²éѯ¼üΪ check_id£¬Ò»¸ö check_id ¶ÔÓ¦¶àÌõ¼Ç¼£¬ËùÒÔÐè·µ»Ø¶ÔÓ¦¼Ç¼µÄ list£» opinion ±íµÄÖ÷¼üÒ²ÊÇ id£¬²éѯ¼üÊÇ bussiness_no ºÍ buss_type£¬Í¬Àí·µ»Ø list¡£µ¥±Ê²éѯ·µ»Ø List ´óСԼ 50 ÌõÒÔÏ£¬²éѯƵÂÊΪ 100 ±Ê / Ìì×óÓÒ£¬²éѯÏìӦʱ¼ä 2s¡£

Technology Selection

Given the data volume and query requirements, HBase is the first choice among distributed-platform components that offer both large-scale storage and real-time query capability. After preliminary research and evaluation against the requirements, we settled on HBase as the primary storage component and split the work into two parts: writing to HBase and reading from HBase.

Reading from HBase is relatively straightforward: design the RowKey around the requirements, then read the data through HBase's rich API (get, scan, and so on) and meet the performance target.

дÈë HBase µÄ·½·¨´óÖÂÓÐÒÔϼ¸ÖÖ£º

1. Call the native HBase API from Java: HTable.put(List<Put>).

2. Run a MapReduce job with TableOutputFormat as the output format.

3. Bulk Load: first generate persistent HFiles in HBase's internal data format, then copy them to the right location and notify the RegionServer, which completes the ingestion of massive amounts of data. The HFile-generation step can be done with either MapReduce or Spark.

±¾ÎIJÉÓÃµÚ 3 ÖÖ·½Ê½£¬Spark + Bulk Load дÈë HBase¡£¸Ã·½·¨Ïà¶ÔÆäËû 2 ÖÖ·½Ê½ÓÐÒÔÏÂÓÅÊÆ£º

1. Bulk Load does not write the WAL and does not trigger flushes or splits.

2. Inserting data through a large number of PUT calls can cause heavy GC activity; besides hurting performance, in severe cases it can even destabilize HBase nodes. Bulk Load has no such concern.

3. The process avoids the performance cost of a large volume of API calls.

4. It can take advantage of Spark's computing power.

ͼʾÈçÏ£º

Design

Environment

Hadoop 2.5-2.7

HBase 0.98.6

Spark 2.0.0-2.1.1

Sqoop 1.4.6

Table Design

This section focuses on the design of the HBase tables, of which the RowKey is the most important part. To make the discussion concrete, let us first look at the data format. The check table is used as the example below; opinion is handled the same way.

check table (the original table has 18 fields; for brevity, 5 representative fields are shown)

As shown above, the primary key id is a random 32-character string of letters and digits; the business query field check_id is a variable-length alphanumeric field (at most 32 characters), and one check_id may map to multiple records; the remaining fields are related business fields. As is well known, HBase serves queries by RowKey and requires RowKeys to be unique. RowKey design is driven mainly by how the data will be accessed. At first glance there are two candidate designs.

1. Split into two tables: one table with id as the RowKey and the check table's columns as its columns; the other an index table with check_id as the RowKey and one column per matching id. A query first fetches the id list for a check_id, then fetches the records by id; both steps are HBase get operations.

2. Treat the requirement as a range query rather than a point query: use check_id as the RowKey prefix, followed by id. A query sets the Scan's startRow and stopRow and retrieves the matching record list in one pass.

The first design has a simple table structure and an easy RowKey, but two drawbacks: 1) each source row must be written to two tables, and before writing to the index table we must first check whether that RowKey already exists, appending a column if it does and creating a new row if it does not; 2) on reads, even with batched List calls, at least two table reads are needed. The second design has a more complex RowKey, but both writes and reads are single-pass. On balance, we chose the second design.

RowKey Design

ÈȵãÎÊÌâ

HBase ÖеÄÐÐÊÇÒÔ RowKey µÄ×ÖµäÐòÅÅÐòµÄ£¬ÆäÈȵãÎÊÌâͨ³£·¢ÉúÔÚ´óÁ¿µÄ¿Í»§¶ËÖ±½Ó·ÃÎʼ¯ÈºµÄÒ»¸ö»ò¼«ÉÙÊý½Úµã¡£Ä¬ÈÏÇé¿öÏ£¬ÔÚ¿ªÊ¼½¨±íʱ£¬±íÖ»»áÓÐÒ»¸ö region£¬²¢Ëæ×Å region Ôö´ó¶ø²ð·Ö³É¸ü¶àµÄ region£¬ÕâЩ region ²ÅÄÜ·Ö²¼ÔÚ¶à¸ö regionserver ÉÏ´Ó¶øÊ¹¸ºÔؾù·Ö¡£¶ÔÓÚÎÒÃǵÄÒµÎñÐèÇ󣬴æÁ¿Êý¾ÝÒѾ­½Ï´ó£¬Òò´ËÓбØÒªÔÚÒ»¿ªÊ¼¾Í½« HBase µÄ¸ºÔؾù̯µ½Ã¿¸ö regionserver£¬¼´×ö pre-split¡£³£¼ûµÄ·ÀÖÎÈȵãµÄ·½·¨Îª¼ÓÑΣ¬hash É¢ÁУ¬×ÔÔö²¿·Ö£¨Èçʱ¼ä´Á£©·­×ªµÈ¡£

RowKey Design Steps

Step 1: Determine the number of pre-split regions and create the HBase table

How to pick the number differs by business scenario and data characteristics; in my view it should weigh factors such as data size and cluster size. For example, the check table is about 11 GB, the test cluster has 10 machines, and hbase.hregion.max.filesize=3G (a region splits in two once it exceeds this size), so at initialization each region should be about 1~2 GB (it will not split right away), which works out to 11G/2G ≈ 6 regions; but to use the cluster fully, this article splits the check table into 10 regions. If the data volume were 100 GB and still growing, with the same cluster, a region count around 100G/2G = 50 would be more appropriate. The HBase create statement for the check table is:

create 'tinawang:check',
{NAME => 'f', COMPRESSION => 'SNAPPY', DATA_BLOCK_ENCODING => 'FAST_DIFF', BLOOMFILTER => 'ROW'},
{SPLITS => ['1','2','3','4','5','6','7','8','9']}

ÆäÖУ¬Column Family =¡®f¡¯£¬Ô½¶ÌÔ½ºÃ¡£

COMPRESSION => 'SNAPPY': HBase supports three compression codecs, LZO, GZIP and Snappy. GZIP compresses best but is CPU-hungry; the other two are similar, with Snappy slightly ahead and cheaper on CPU than GZIP. As a balance between IO and CPU, Snappy is the usual choice.

DATA_BLOCK_ENCODING => 'FAST_DIFF': the RowKeys in this case are close to one another, and the following command shows that keys are long relative to values.

./hbase org.apache.hadoop.hbase.io.hfile.HFile -m -f /apps/hbase/data/data/tinawang/check/a661f0f95598662a53b3d8b1ae469fdf/f/a5fefc880f87492d908672e1634f2eed_SeqId_2_

Step 2: RowKey composition

Salt

To spread data evenly across the regions, and in combination with the pre-split, we take the hashcode of the query key (the check table's check_id), use it modulo numRegions as the prefix, and left-pad the result to a fixed width:

StringUtils.leftPad(Integer.toString(Math.abs(check_id.hashCode() % numRegion)), 1, '0')

˵Ã÷£ºÈç¹ûÊý¾ÝÁ¿´ïÉϰ٠G ÒÔÉÏ£¬Ôò numRegions ×ÔÈ»µ½ 2 λÊý£¬Ôò salt ҲΪ 2 λ¡£

Hash

ÒòΪ check_id ±¾ÉíÊDz»¶¨³¤µÄ×Ö·ûÊý×Ö´®£¬ÎªÊ¹Êý¾ÝÉ¢Áл¯£¬·½±ã RowKey ²éѯºÍ±È½Ï£¬ÎÒÃÇ¶Ô check_id ²ÉÓà SHA1 É¢Áл¯£¬²¢Ê¹Ö® 32 붨³¤»¯¡£

MD5Hash.getMD5AsHex(Bytes.toBytes(check_id))

ΨһÐÔ

The salt+hash above forms the RowKey prefix, and appending the check table's primary key id guarantees RowKey uniqueness. Putting it together, the check table's RowKey looks as follows (for check_id=A208849559):

ΪÔöÇ¿¿É¶ÁÐÔ£¬Öм仹¿ÉÒÔ¼ÓÉÏ×Ô¶¨ÒåµÄ·Ö¸î·û£¬È硯+¡¯,¡¯|¡¯µÈ¡£

7+7c9498b4a83974da56b252122b9752bf+56B63AB98C2E00B4E053C501380709AD

This design guarantees that for any given query the salt+hash prefix is deterministic and that all matching rows land in the same region. Note that the check table's columns in HBase are stored one-to-one with the columns of the source check table in Oracle.
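To make the composition concrete, here is a minimal Java sketch of the rule above; the RowKeyBuilder class name and NUM_REGIONS constant are illustrative, not taken from the project's code:

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.MD5Hash;

public class RowKeyBuilder {
    // matches the 10 pre-split regions created above
    private static final int NUM_REGIONS = 10;

    // RowKey = salt + '+' + md5(check_id) + '+' + id
    public static String build(String checkId, String id) {
        String salt = StringUtils.leftPad(
                Integer.toString(Math.abs(checkId.hashCode() % NUM_REGIONS)), 1, "0");
        String hash = MD5Hash.getMD5AsHex(Bytes.toBytes(checkId));
        return salt + "+" + hash + "+" + id;
    }
}

For check_id=A208849559 and the id above, this reproduces the example RowKey shown earlier.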

Web Query Design

RowKey design and querying are inseparable: the query pattern drives the RowKey design, and conversely, given the RowKey design above, a query is completed simply by setting the Scan's [startRow, stopRow]. Take check_id=A208849559 as an example: applying the salt+hash computation per the RowKey design rules yields the prefix.

startRow = 7+7c9498b4a83974da56b252122b9752bf
stopRow  = 7+7c9498b4a83974da56b252122b9752bg

Key Implementation Flows

Spark write to HBase

Step 0: preparation

ÒòΪÊÇ´ÓÉÏÓÎϵͳ³Ð½ÓµÄÒµÎñÊý¾Ý£¬´æÁ¿Êý¾Ý²ÉÓà sqoop ³éµ½ hdfs£»ÔöÁ¿Êý¾ÝÿÈÕÒÔÎļþµÄÐÎʽ´Ó ftp Õ¾µã»ñÈ¡¡£ÒòΪҵÎñÊý¾Ý×Ö¶ÎÖаüº¬Ò»Ð©»»Ðзû£¬ÇÒ sqoop1.4.6 Ŀǰֻ֧³Öµ¥×Ö½Ú£¬ËùÒÔ±¾ÎÄÑ¡Ôñ¡¯0x01¡¯×÷ΪÁзָô·û£¬¡¯0x10¡¯×÷ΪÐзָô·û¡£

Step 1: Spark reads the HDFS text file

SparkContext.textFile() assumes "\n" as the row delimiter; here we use "0x10", which must be set in the Configuration. With that configuration applied, we call the newAPIHadoopFile method to read the HDFS file, which returns a JavaPairRDD<LongWritable, Text>, where LongWritable and Text are Hadoop's Long and String types (all Hadoop data types closely resemble Java's, except that they are specially optimized for network serialization). The data we need sits in the pairRDD's value, i.e. the Text part. For convenience in later processing, the JavaPairRDD can be converted to a JavaRDD<String>.
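A minimal sketch of this read, assuming an existing JavaSparkContext sc and a placeholder inputPath; the delimiter is passed through the standard textinputformat.record.delimiter property:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;

// set the custom row delimiter (0x10) that Step 0 wrote
Configuration hadoopConf = new Configuration();
hadoopConf.set("textinputformat.record.delimiter", new String(new byte[]{0x10}));

JavaPairRDD<LongWritable, Text> pairRdd = sc.newAPIHadoopFile(
        inputPath, TextInputFormat.class, LongWritable.class, Text.class, hadoopConf);

// keep only the record body (the Text value) for later processing
JavaRDD<String> rows = pairRdd.map(tuple -> tuple._2().toString());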

Step 2: Transform and sort the RDD

① Convert the JavaRDD<String> into a JavaPairRDD<Tuple2, String>, whose parts represent, in order, RowKey, col, and value. This conversion is needed because HBase fundamentally sorts by RowKey, and when data is bulk loaded into multiple pre-split regions, each Spark partition's data must be sorted: RowKey, column family (cf) and col name all need to be in order. In this case there is only one column family, so the RowKey and col name are packed together as a Tuple2-format key. Note that one original database row (n fields) is fanned out into n rows at this point (see the sketch after this list).

② Do a secondary sort on RowKey and col over the JavaPairRDD<Tuple2, String>. Without the sort, the following exception is thrown:

java.io.IOException: Added a key not lexically larger than previous key

③ Organize the data into the JavaPairRDD<ImmutableBytesWritable, KeyValue> hfileRDD that HFile generation requires.
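The following condensed sketch covers ①-③ under stated assumptions: it reuses the rows RDD from the previous sketch, rows are '0x01'-separated with id as the first field and check_id as the second, the COLUMNS array and the RowKeyBuilder helper are illustrative, and a string key "rowkey#col" stands in for the Tuple2 secondary sort (safe here because the RowKeys are fixed-length):

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.api.java.JavaPairRDD;
import scala.Tuple2;

final String[] COLUMNS = {"id", "check_id", "col3", "col4", "col5"}; // illustrative

// ① fan one source row (n fields) out into n (rowkey#col -> value) cells
JavaPairRDD<String, String> cellRdd = rows.flatMapToPair(line -> {
    String[] fields = line.split("\u0001", -1);      // 0x01 column separator
    String rowKey = RowKeyBuilder.build(fields[1], fields[0]);
    List<Tuple2<String, String>> cells = new ArrayList<>();
    for (int i = 0; i < COLUMNS.length; i++) {
        cells.add(new Tuple2<>(rowKey + "#" + COLUMNS[i], fields[i]));
    }
    return cells.iterator();                         // Spark 2.x expects an Iterator
});

// ② sort by (RowKey, col); skipping this triggers
//    "Added a key not lexically larger than previous key"
JavaPairRDD<String, String> sortedRdd = cellRdd.sortByKey();

// ③ turn each cell into (ImmutableBytesWritable, KeyValue) for HFileOutputFormat2
JavaPairRDD<ImmutableBytesWritable, KeyValue> hfileRdd = sortedRdd.mapToPair(cell -> {
    String[] parts = cell._1().split("#");
    byte[] row = Bytes.toBytes(parts[0]);
    KeyValue kv = new KeyValue(row, Bytes.toBytes("f"),
            Bytes.toBytes(parts[1]), Bytes.toBytes(cell._2()));
    return new Tuple2<>(new ImmutableBytesWritable(row), kv);
});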

Step 3: create HFiles and bulk load into HBase

① The main call is the saveAsNewAPIHadoopFile method:

hfileRdd.saveAsNewAPIHadoopFile(hfilePath, ImmutableBytesWritable.class,
    KeyValue.class, HFileOutputFormat2.class, config);

② Bulk load the HFiles into HBase:

final Job job = Job.getInstance();
job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(KeyValue.class);
HFileOutputFormat2.configureIncrementalLoad(job, htable);
LoadIncrementalHFiles bulkLoader = new LoadIncrementalHFiles(config);
bulkLoader.doBulkLoad(new Path(hfilePath), htable);

Note: if the cluster has Kerberos enabled, the bulk load step above must be wrapped in a ugi.doAs() call, after authenticating as follows:

UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(keyUser, keytabPath);
UserGroupInformation.setLoginUser(ugi);
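A minimal sketch of that wrapping, reusing the ugi, bulkLoader, hfilePath and htable from the snippets above:

import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.fs.Path;

ugi.doAs(new PrivilegedExceptionAction<Void>() {
    @Override
    public Void run() throws Exception {
        // the bulk load now runs with the Kerberos credentials held by ugi
        bulkLoader.doBulkLoad(new Path(hfilePath), htable);
        return null;
    }
});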

Visiting the HBase cluster's web UI on port 60010 shows how the regions are distributed.

Read from HBase

This article uses the Spring Boot framework to build the web side that reads data from HBase.

Use a connection pool

Creating a connection is a fairly heavy operation, so in real HBase projects we introduce a connection pool to share the ZooKeeper connection, the meta-information cache, and the connections to the region servers and the master.

HConnection connection = HConnectionManager.createConnection(config);
HTableInterface table = connection.getTable("table1");
try {
    // Use the table as needed, for a single operation and a single thread
} finally {
    table.close();
}

The default thread pool can also be overridden via:

HConnection createConnection(org.apache.hadoop.conf.Configuration conf, ExecutorService pool);
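For example (a sketch; the pool size here is arbitrary):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// share one bounded pool across all HBase requests from the web application
ExecutorService pool = Executors.newFixedThreadPool(20);
HConnection connection = HConnectionManager.createConnection(config, pool);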

process query

Step 1: derive the RowKey prefix from the query conditions

Per the RowKey design in section 3.3, HBase writes and reads both follow the same rule, so here we apply the same method to turn the query conditions passed in by the web caller into the corresponding RowKey prefix. For example, check_id=A208849559 passed in for the check table yields the prefix 7+7c9498b4a83974da56b252122b9752bf, computed as sketched below.
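A sketch of the prefix computation, reusing the illustrative StringUtils/MD5Hash calls and NUM_REGIONS constant from the RowKey section:

// salt + '+' + md5(check_id); no trailing separator, matching the startRow below
String salt = StringUtils.leftPad(
        Integer.toString(Math.abs(checkId.hashCode() % NUM_REGIONS)), 1, "0");
String rowkey_pre = salt + "+" + MD5Hash.getMD5AsHex(Bytes.toBytes(checkId));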

Step 2: determine the scan range

The query results for A208849559 live in the RowKeys (and their values) whose prefix is 7+7c9498b4a83974da56b252122b9752bf.

scan.setStartRow(Bytes.toBytes(rowkey_pre));   // 7+7c9498b4a83974da56b252122b9752bf
byte[] stopRow = Bytes.toBytes(rowkey_pre);
stopRow[stopRow.length - 1]++;
scan.setStopRow(stopRow);                      // 7+7c9498b4a83974da56b252122b9752bg

Step 3: assemble the query results into the response object

Iterate over the ResultScanner, wrap each row's data into a table entity, and return them as a list.
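A minimal sketch of this step, assuming a hypothetical CheckEntity POJO and the table and scan objects built earlier:

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.util.Bytes;

List<CheckEntity> entities = new ArrayList<>();
ResultScanner scanner = table.getScanner(scan);
try {
    for (Result row : scanner) {
        CheckEntity entity = new CheckEntity();
        // every cell was written under family 'f' with the source column name
        entity.setCheckId(Bytes.toString(
                row.getValue(Bytes.toBytes("f"), Bytes.toBytes("check_id"))));
        // ... populate the remaining fields the same way
        entities.add(entity);
    }
} finally {
    scanner.close();
}
// entities is the list handed back to the web caller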

Testing

´ÓԭʼÊý¾ÝÖÐËæ»úץȡ 1000 ¸ö check_id£¬ÓÃÓÚÄ£Äâ²âÊÔ£¬Á¬Ðø·¢Æð 3 ´ÎÇëÇóÊýΪ 2000£¨200 ¸öÏ̲߳¢·¢£¬Ñ­»· 10 ´Î£©£¬Æ½¾ùÏìӦʱ¼äΪ 51ms£¬´íÎóÂÊΪ 0¡£

ÈçÉÏͼ£¬¾­Àú N ´ÎÀۼƲâÊԺ󣬸÷¸ö region É쵀 Requests Êý½ÏΪ½Ó½ü£¬·ûºÏ¸ºÔؾùºâÉè¼ÆÖ®³õ¡£

Pitfall Notes

1. Kerberos authentication

If the cluster has security authentication enabled, Kerberos authentication is required both when submitting the Spark job and when accessing HBase.

±¾ÎIJÉÓà yarn cluster ģʽ£¬ÏñÌá½»ÆÕͨ×÷ÒµÒ»Ñù£¬¿ÉÄܻᱨÒÔÏ´íÎó¡£

ERROR StartApp: job failure,
java.lang.NullPointerException
at com.tinawang.spark.hbase.utils.HbaseKerberos.<init>(HbaseKerberos.java:18)
at com.tinawang.spark.hbase.job.SparkWriteHbaseJob.run(SparkWriteHbaseJob.java:60)

Line 18 of HbaseKerberos.java is:

this.keytabPath = (Thread.currentThread().getContextClassLoader().getResource(prop.getProperty("hbase.keytab"))).getPath();

This happens because the executor must re-authenticate when connecting to HBase, and the tina.keytab uploaded via --keytab is not visible to the HBase authentication code, so the keytab used for that authentication must be uploaded separately via --files. For example:

--keytab /path/tina.keytab \
--principal tina@GNUHPC.ORG \
--files "/path/tina.keytab.hbase"

Here tina.keytab.hbase is a copy of tina.keytab under a new name, because Spark does not allow the same file to be uploaded twice.

2. Serialization

org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2101)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:370)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:369)
...
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
Caused by: java.io.NotSerializableException: org.apache.spark.api.java.JavaSparkContext
Serialization stack:
- object not serializable (class: org.apache.spark.api.java.JavaSparkContext, value: org.apache.spark.api.java.JavaSparkContext@24a16d8c)
- field (class: com.tinawang.spark.hbase.processor.SparkReadFileRDD, name: sc, type: class org.apache.spark.api.java.JavaSparkContext)
...

Solution 1:

If sc is a member variable of the class and is referenced inside a method, mark it transient so it is excluded from serialization:

private transient JavaSparkContext sc;

Solution 2:

Pass sc as a method parameter instead, and have the classes that perform RDD operations implement Serializable. The code uses this second approach (a sketch follows); see the code for details.
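A minimal sketch of the idea; the class name comes from the stack trace above, while the method is illustrative:

import java.io.Serializable;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// no JavaSparkContext field: serializing this class into RDD closures
// no longer drags the non-serializable context along with it
public class SparkReadFileRDD implements Serializable {
    public JavaRDD<String> nonEmptyLines(JavaSparkContext sc, String path) {
        return sc.textFile(path).filter(line -> !line.isEmpty());
    }
}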

3. Batch request testing

Exception in thread "http-nio-8091-Acceptor-0" java.lang.NoClassDefFoundError: org/apache/tomcat/util/ExceptionUtils

or

Exception in thread "http-nio-8091-exec-34" java.lang.NoClassDefFoundError: ch/qos/logback/classic/spi/ThrowableProxy

The issue below, and a write-up of one debugging session, suggest the cause may be exceeding the open-file limit:

https://github.com/spring-projects/spring-boot/issues/1106

http://mp.weixin.qq.com/s/34GVlaYDOdY1OQ9eZs-iXg

ʹÓà ulimit-a ²é¿´Ã¿¸öÓû§Ä¬ÈÏ´ò¿ªµÄÎļþÊýΪ 1024¡£

ÔÚϵͳÎļþ /etc/security/limits.conf ÖÐÐÞ¸ÄÕâ¸öÊýÁ¿ÏÞÖÆ£¬ÔÚÎļþÖмÓÈëÒÔÏÂÄÚÈÝ, ¼´¿É½â¾öÎÊÌâ¡£

* soft nofile 65536
* hard nofile 65536
   