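Editor's recommendation:
This article introduces how to connect SequoiaDB (distributed storage) with Spark (distributed computing), and how to improve statistical-analysis performance on massive data sets.
This article comes from SequoiaDB and was edited and recommended by Alice of Huolongguo Software (火龙果软件).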
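Preface
In an era when enterprise production data keeps swelling, data is both where an enterprise's value lies and where its technical challenges lie. For massive data processing, people have realized that no single machine, however powerful, can keep up with ever-growing processing demands; a distributed architecture is the fundamental answer to this class of problem.
In the distributed field, two kinds of products are essential: distributed storage and distributed computing. Only by making full use of the characteristics of both can users truly bring out the storage and computing power of a distributed architecture.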
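Introduction to SequoiaDB
SequoiaDB is one of the few distributed databases developed independently in China. It supports both document storage and block storage, standard SQL and transactions, and complex index queries, and it is deeply integrated with Hadoop, Hive, and Spark. SequoiaDB is open source on GitHub.
For distributed storage, SequoiaDB offers more data-sharding rules than most big-data products: horizontal sharding, range sharding, multi-dimensional partitioning (similar to partitions), and multi-dimensional sharding. Users can choose the sharding method that fits each scenario to improve the system's storage capacity and operational performance.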
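Introduction to Spark
Spark has grown especially fast in recent years. After the official release of Spark 1.0 it won the support of many Silicon Valley heavyweights such as Cloudera, IBM, Hortonworks, and Intel. After Spark 2.0 announced support for all 99 TPC-DS queries, more and more developers began using SparkSQL for big-data processing and analysis. It is foreseeable that Spark will become the most important and popular distributed computing framework after Hadoop.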
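Introduction to SparkSQL
SparkSQL is a component of Spark; its SQL execution engine is built on Spark's RDD and DataFrame. SparkSQL can now run the complete set of 99 TPC-DS queries, which marks a further maturing of SparkSQL for data analysis and data processing.
SparkSQL resembles Hive, another popular big-data SQL product. For example, both use a Thrift server as their JDBC service, and both use the same metadata code (SparkSQL in fact reuses Hive's metadata code). The two products still differ fundamentally, and the biggest difference is the execution engine: Hive supports the Hadoop and Tez computing frameworks by default, whereas SparkSQL supports only the Spark RDD computing framework. In exchange, SparkSQL has deeper execution-plan optimization and processing-engine optimization.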
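Integrating SparkSQL with SequoiaDB
How it works
Readers familiar with Spark's internals will know that Spark itself is a distributed computing framework. Unlike Hadoop, it does not give developers both distributed computing and distributed storage. Instead it opens up a development interface for the storage layer: as long as a developer implements the interface methods according to Spark's specification, any storage product can serve as a data source for Spark computation, and therefore for SparkSQL as well.
SequoiaDB is a distributed database that can store massive amounts of data for users, but running statistics and analysis over that data still calls for the concurrent computing power of a distributed computing framework to improve efficiency.
For this reason SequoiaDB developed the SequoiaDB for Spark connector, which lets Spark fetch data from SequoiaDB concurrently and then carry out the corresponding computation.
Integration steps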
Connecting Spark to SequoiaDB is fairly simple: add the SequoiaDB for Spark connector, spark-sequoiadb.jar, and the SequoiaDB Java driver, sequoiadb.jar, to the CLASSPATH of every Spark Worker.
For example, to connect SparkSQL to SequoiaDB, add the SPARK_CLASSPATH parameter to the spark-env.sh configuration file. If the parameter already exists, append the new jars to it, for example:
SPARK_CLASSPATH="/media/psf/mnt/sequoiadb-driver-2.9.0-SNAPSHOT.jar:/media/psf/mnt/spark-sequoiadb_2.11-2.9.0-SNAPSHOT.jar"
After modifying spark-env.sh, restart spark-sql or the thriftserver, and the integration of Spark and SequoiaDB is complete.
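The "append if the parameter already exists" rule can be sketched as a small helper. This is only an illustration in Python: the function name `merge_classpath` and the pre-existing entry `/opt/libs/other.jar` are hypothetical, while the two jar paths are the ones from the example line. It builds the value that would then be written into spark-env.sh.

```python
def merge_classpath(existing, new_jars):
    """Return a SPARK_CLASSPATH value with new_jars appended to existing.

    existing: current value of SPARK_CLASSPATH ("" or None if unset).
    new_jars: jar paths to add; entries already present are not repeated.
    """
    entries = [p for p in (existing or "").split(":") if p]
    for jar in new_jars:
        if jar not in entries:
            entries.append(jar)
    return ":".join(entries)


jars = [
    "/media/psf/mnt/sequoiadb-driver-2.9.0-SNAPSHOT.jar",
    "/media/psf/mnt/spark-sequoiadb_2.11-2.9.0-SNAPSHOT.jar",
]
# "/opt/libs/other.jar" is a made-up pre-existing entry for the illustration.
value = merge_classpath("/opt/libs/other.jar", jars)
print('SPARK_CLASSPATH="%s"' % value)
```

Calling the helper a second time with the same jars leaves the value unchanged, so re-running a setup script would not duplicate entries.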
SparkSQL + SequoiaDB performance optimization
Performance optimization for SparkSQL + SequoiaDB is covered from four angles: how the connector works, SparkSQL optimization, SequoiaDB optimization, and connector parameter optimization.
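Introduction to the SequoiaDB for SparkSQL connector
1) How the connector works
Spark offers users a variety of functional modules, but all of them are data-computing modules. Spark itself has no storage capability; by default it reads data from the local file system or from HDFS. Spark also opens its storage-layer interface to developers: as long as a developer implements a storage-layer connector according to Spark's interface specification, any data source can become a data source for Spark computation.
The figure below shows the relationship between Spark workers and the datanodes in the storage layer.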

The figure below illustrates how the Spark computing framework relates to the storage layer.
When the Spark master receives a computing job, it first communicates once with the storage layer and learns, from the storage layer's access snapshot or storage plan, where all the data involved in the job is stored. What the storage layer returns to the Spark master is a queue of data-storage partitions.
The Spark master then assigns the partitions in that queue to the Spark workers one by one. Once a Spark worker receives its partition information, it knows how to obtain the data to compute on. The worker then actively connects to the storage-layer node, fetches the data, and starts computing according to the task the Spark master handed down to it.
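The master/worker flow described here can be illustrated with a toy simulation. It is a simplified sketch rather than Spark's actual scheduler: the round-robin assignment, the node and worker names, and the `fake_read` stand-in are all assumptions made for the example (a real scheduler also weighs data locality and free task slots).

```python
from collections import defaultdict


def assign_partitions(partition_queue, workers):
    """Give the queued partitions to the workers one at a time, in turn."""
    plan = defaultdict(list)
    for i, partition in enumerate(partition_queue):
        worker = workers[i % len(workers)]
        plan[worker].append(partition)
    return dict(plan)


def run_worker(partitions, read_from_node):
    """A worker connects to the storage node named in each partition it holds."""
    rows = []
    for partition in partitions:
        rows.extend(read_from_node(partition["node"], partition["blocks"]))
    return rows


def fake_read(node, blocks):
    # Stand-in for a real network read: one fake row per block.
    return [(node, b) for b in blocks]


# Hypothetical partition queue, as the storage layer might return it.
partition_queue = [
    {"node": "node-a", "blocks": [0, 1]},
    {"node": "node-b", "blocks": [2, 3]},
    {"node": "node-a", "blocks": [4]},
]
plan = assign_partitions(partition_queue, ["worker-1", "worker-2"])
for worker, parts in plan.items():
    print(worker, run_worker(parts, fake_read))
```

The point of the sketch is the division of labor: the master only distributes partition descriptions, and each worker pulls its own data directly from the storage node.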

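The SequoiaDB for Spark connector works essentially as described above. The difference is that, when generating the partition tasks for a computation, the connector uses the query conditions pushed down by Spark to produce a query plan in SequoiaDB.
If SequoiaDB can satisfy the query conditions with an index scan, the partition tasks generated by the connector have the Spark workers connect directly to SequoiaDB's data nodes.
If SequoiaDB cannot use an index scan for the query conditions, the connector fetches the information for all data blocks of the table concerned, and then, according to the partitionblocknum and partitionmaxnum parameters, generates partition tasks that each contain the connection information for several data blocks.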
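The two planning branches can be sketched as follows. This is a hypothetical model, not the connector's source code: the function name, the task dictionaries, and in particular the way partitionmaxnum is enforced (packing more blocks into each task once the cap would be exceeded) are assumptions; only the parameter names and their defaults of 4 and 1000 come from this article.

```python
import math


def plan_partitions(index_usable, data_nodes, data_blocks,
                    partitionblocknum=4, partitionmaxnum=1000):
    """Return a list of read tasks for the Spark workers."""
    if index_usable:
        # Index scan possible: one task per SequoiaDB data node.
        return [{"mode": "sharding", "node": n} for n in data_nodes]

    # No usable index: read by data block, partitionblocknum blocks per task.
    per_task = max(1, partitionblocknum)
    if math.ceil(len(data_blocks) / per_task) > partitionmaxnum:
        # Assumption: when the cap would be exceeded, pack more blocks per task.
        per_task = math.ceil(len(data_blocks) / partitionmaxnum)
    return [
        {"mode": "datablock", "blocks": data_blocks[i:i + per_task]}
        for i in range(0, len(data_blocks), per_task)
    ]


blocks = list(range(10))
print(len(plan_partitions(True, ["sdb1", "sdb2", "sdb3"], blocks)))
print(len(plan_partitions(False, ["sdb1", "sdb2", "sdb3"], blocks)))
print(len(plan_partitions(False, ["sdb1"], list(range(5000)))))
```

Under these assumptions, ten blocks without a usable index become three tasks of up to four blocks each, and a table with 5000 blocks is held to the 1000-task cap by reading five blocks per task.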
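2) Connector parameters
The SequoiaDB for Spark connector was refactored after SequoiaDB 2.10 to improve the performance of concurrent reads from SequoiaDB by Spark, and its parameters were adjusted accordingly.
To create a table in SparkSQL whose data source is SequoiaDB, use the following template: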
create [temporary] <table|view> <name>[(schema)]
using com.sequoiadb.spark
options (<options>);
The keywords in the SparkSQL create statement are:
1. temporary: marks the table or view as temporary. A table or view created with the temporary keyword is deleted automatically after the client restarts;
2. The schema may be omitted. If the user does not specify the table structure explicitly, SparkSQL detects it automatically from the existing data when the table is created;
3. com.sequoiadb.spark is the entry class of the SequoiaDB for Spark connector;
4. options holds the configuration parameters of the SequoiaDB for Spark connector;
An example SparkSQL table definition:
create table tableName (name string, id int)
using com.sequoiadb.spark
options (host 'sdb1:11810,sdb2:11810,sdb3:11810',
         collectionspace 'foo', collection 'bar',
         username 'sdbadmin', password 'sdbadmin');
The options parameters available when creating a SparkSQL table on SequoiaDB are listed below:

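SparkSQL optimization
To run statistical analysis over massive data with SparkSQL, tune performance in three areas:
1. Increase the maximum memory available to each Spark Worker, so that data does not outgrow memory during computation and have to be partly written to temporary files;
2. Increase the number of Spark Workers, and allow every Worker to use all of the CPU resources on its server, to raise concurrency;
3. Adjust Spark's runtime parameters;
The first two are set in the spark-env.sh configuration file: SPARK_WORKER_MEMORY controls the memory available to a Worker, and SPARK_WORKER_INSTANCES controls how many Workers are started on each server.
Runtime parameters are adjusted in the spark-defaults.conf configuration file. The parameters that most noticeably help statistical computation over massive data are: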
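1) spark.storage.memoryFraction: the fraction of each Worker's memory used to store intermediate computation data. The default is 0.6, meaning 60%;
2) spark.shuffle.memoryFraction: the fraction of each Worker's memory that shuffle may occupy during computation. The default is 0.2, meaning 20%. If little intermediate data is stored and the computation involves many group by, sort, and join operations, consider raising spark.shuffle.memoryFraction and lowering spark.storage.memoryFraction, so that less data overflows memory and has to be written to temporary files. Note that from Spark 1.6 onward these two settings take effect only when spark.memory.useLegacyMode is enabled;
3) spark.serializer: the serialization method Spark uses at run time. The default is org.apache.spark.serializer.JavaSerializer, but for better performance choose org.apache.spark.serializer.KryoSerializer.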
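To make the trade-off between the two memory fractions concrete, here is a rough arithmetic sketch. It is deliberately simplified: the 8 GB worker size and the 0.3/0.5 split are made-up illustration values rather than recommendations, the function name is hypothetical, and the calculation ignores the extra safety margins Spark applies internally.

```python
def legacy_memory_split(worker_memory_gb, storage_fraction=0.6, shuffle_fraction=0.2):
    """Split worker memory into storage, shuffle, and remaining shares (in GB)."""
    if (storage_fraction < 0 or shuffle_fraction < 0
            or storage_fraction + shuffle_fraction > 1.0):
        raise ValueError("fractions must be non-negative and sum to at most 1")
    rest = 1.0 - storage_fraction - shuffle_fraction
    return {
        "storage_gb": round(worker_memory_gb * storage_fraction, 2),
        "shuffle_gb": round(worker_memory_gb * shuffle_fraction, 2),
        "other_gb": round(worker_memory_gb * rest, 2),
    }


# Defaults (0.6 / 0.2) on a hypothetical 8 GB worker.
default_split = legacy_memory_split(8)
print(default_split)

# A shuffle-heavy workload (many group by / sort / join operations):
# shift memory from storage to shuffle.
shuffle_heavy = legacy_memory_split(8, storage_fraction=0.3, shuffle_fraction=0.5)
print(shuffle_heavy)
```

The second call shows the adjustment suggested above: the same worker gives shuffle well over twice the memory it had under the defaults, at the cost of space for stored intermediate data.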
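SequoiaDB optimization
In the SparkSQL + SequoiaDB combination the data is read from SequoiaDB, so three points matter for performance:
1. Spread the data of large tables across the cluster as much as possible. Tables that meet the conditions for two-dimensional sharding should combine multi-dimensional and hash sharding to balance their data across nodes;
2. When importing data, avoid importing into several collections of the same collection space at the same time. Collections in one collection space share a single data file, so concurrent imports scatter each collection's data blocks too widely, and Spark SQL then has to read too many data blocks when it fetches massive data from SequoiaDB;
3. If the SparkSQL queries contain query conditions, create indexes on the corresponding fields in SequoiaDB;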
Connector optimization
Parameter tuning for the SequoiaDB for Spark connector falls into two scenarios: reading data and writing data.
There is little room for tuning writes. Only one parameter can be adjusted, bulksize, whose default is 500: when writing to SequoiaDB, the connector packs 500 records into one network packet and then sends the write request. As a rule of thumb, set bulksize so that one network packet does not exceed 2 MB.
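The 2 MB rule of thumb translates into simple arithmetic. The helper below is a sketch under stated assumptions: the function name is hypothetical, the average record size has to be measured or estimated by the user, and protocol overhead per packet is ignored.

```python
def suggest_bulksize(avg_record_bytes, max_packet_bytes=2 * 1024 * 1024):
    """Largest record count per write packet that stays within max_packet_bytes."""
    if avg_record_bytes <= 0:
        raise ValueError("avg_record_bytes must be positive")
    return max(1, max_packet_bytes // avg_record_bytes)


# Records averaging about 1 KB and about 4 KB (illustrative sizes).
print(suggest_bulksize(1024))
print(suggest_bulksize(4096))
```

With records of roughly 4 KB the result lands close to the default of 500, while much smaller records leave room to raise bulksize.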
For reads, three parameters deserve attention: partitionmode, partitionblocknum, and partitionmaxnum.
partitionmode: the connector's partition mode. Possible values are single, sharding, datablock, and auto. The default is auto, which lets the connector choose the mode itself.
1. single: SparkSQL accesses SequoiaDB without regard to concurrency, using a single thread connected to a SequoiaDB Coord node. This value is generally used when sampling data to infer the table structure at table-creation time;
2. sharding: SparkSQL connects directly to each SequoiaDB datanode. This value is generally used when the SQL statement contains query conditions and the query can use an index in SequoiaDB;
3. datablock: SparkSQL reads SequoiaDB's data blocks over concurrent connections. This value is generally used when the SQL statement cannot use an index in SequoiaDB and the amount of data queried is large;
4. auto: the connector analyzes each situation and decides how SparkSQL accesses SequoiaDB;
partitionblocknum: takes effect only when partitionmode=datablock. It sets how many SequoiaDB data-block read tasks each Worker picks up at a time during computation; the default is 4. If SequoiaDB holds a large amount of data and a computation touches many data blocks, raise this parameter so that the number of SparkSQL tasks stays within a reasonable range and reads become more efficient.
partitionmaxnum: takes effect only when partitionmode=datablock. It sets the maximum number of data-block read tasks the connector may generate; the default is 1000. Its main purpose is to prevent a very large data volume in SequoiaDB, and hence a very large total number of data blocks, from producing so many SparkSQL tasks that overall computing performance drops.
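As an illustration of how these read parameters might be passed at table-creation time, the sketch below renders a create-table statement from a dictionary of options. The helper `create_table_sql`, the table name `bar_scan`, and the chosen values (datablock mode with 8 blocks per task) are hypothetical; the option names, the entry class, and the connection values come from this article. Option values are assumed to contain no single quotes.

```python
def create_table_sql(name, schema, options):
    """Render a SparkSQL create-table statement for the SequoiaDB connector."""
    rendered = ", ".join("%s '%s'" % (key, value) for key, value in options.items())
    return "create table %s (%s) using com.sequoiadb.spark options (%s);" % (
        name, schema, rendered)


read_options = {
    "host": "sdb1:11810,sdb2:11810,sdb3:11810",
    "collectionspace": "foo",
    "collection": "bar",
    "username": "sdbadmin",
    "password": "sdbadmin",
    # Read tuning: full scans over a large collection without a usable index.
    "partitionmode": "datablock",
    "partitionblocknum": "8",
    "partitionmaxnum": "1000",
}
statement = create_table_sql("bar_scan", "name string, id int", read_options)
print(statement)
```

Keeping the options in one dictionary makes it easy to define a second table over the same collection with a different partitionmode, for example sharding for indexed lookups.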
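Summary
This article has described performance-tuning methods for SparkSQL + SequoiaDB on massive data from three angles: Spark, SequoiaDB, and the SequoiaDB for Spark connector.
The methods described here are useful as a reference, but performance tuning has always been work that tests an engineer's skill. When tuning a distributed environment, engineers need to weigh data from many sources: hardware resource usage on the servers, the running state of Spark, whether the data distribution in SequoiaDB is reasonable, whether the connector parameters are set correctly, whether the SQL statements themselves can be tuned, and so on. Improving performance depends above all on finding the bottleneck in the whole system, and then adjusting parameters or changing the storage scheme so that the system runs more efficiently.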