In Spark, the resource unit generally refers to executors, analogous to Containers in YARN. In Spark on YARN mode, --num-executors is normally used to specify how many executors an Application uses, while --executor-memory and --executor-cores specify the memory and the number of virtual CPU cores per executor. Many people still specify resources this way when submitting Spark applications.
Consider the following scenario. With Hive, multiple users run hive-cli concurrently for data development and analysis, and resources are requested from YARN only when a user actually submits a Hive SQL statement for execution. When nothing is running, the session just sits at the hive-cli prompt, i.e. a single JVM, and no YARN resources are wasted. Now suppose we want to replace Hive with Spark SQL for data development and analysis, again with multiple concurrent users. Following the earlier approach, each user would run the spark-sql command line in yarn-client mode (http://lxw1234.com/archives/2015/08/448.htm) and specify, say, --num-executors 10 at startup. Each user would then occupy 10 YARN resources (Containers) from the moment of startup, and those 10 containers would stay occupied until the user exits the spark-sql command line.
Can spark-sql on YARN behave like Hive, requesting resources only while a SQL statement is executing and releasing them when it is not? In fact, since Spark 1.2, the on-YARN mode has supported Dynamic Resource Allocation, which grows and shrinks the number of executors according to the Application's load (its task situation). This strategy is a very good fit for using spark-sql on YARN for data development and analysis, and for running spark-sql as a long-lived service.
This article uses Spark 1.5.0 and hadoop-2.3.0-cdh5.0.0 to show how to use the dynamic resource allocation strategy in spark-sql on YARN mode.
YARN configuration
First, the YARN NodeManagers need to be configured to support Spark's Shuffle Service.
Modify yarn-site.xml on every NodeManager:
```xml
<!-- modify -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<!-- add -->
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
<property>
  <name>spark.shuffle.service.port</name>
  <value>7337</value>
</property>
```
Copy $SPARK_HOME/lib/spark-1.5.0-yarn-shuffle.jar to ${HADOOP_HOME}/share/hadoop/yarn/lib/ on every NodeManager.
Restart all NodeManagers.
Spark configuration
Add the following parameters to $SPARK_HOME/conf/spark-defaults.conf:
```
# Enable the external shuffle service
spark.shuffle.service.enabled true
# Shuffle service port; must match the one in yarn-site.xml
spark.shuffle.service.port 7337
# Enable dynamic resource allocation
spark.dynamicAllocation.enabled true
# Minimum number of executors allocated to each Application
spark.dynamicAllocation.minExecutors 1
# Maximum number of executors allocated concurrently to each Application
spark.dynamicAllocation.maxExecutors 30
spark.dynamicAllocation.schedulerBacklogTimeout 1s
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout 5s
```
The dynamic allocation strategy:
With dynamic allocation enabled, an application requests additional resources when tasks are pending for lack of resources, i.e. when the application's current executors cannot run all of its tasks in parallel. Spark requests resources round by round: once tasks have been pending for spark.dynamicAllocation.schedulerBacklogTimeout (default 1s), dynamic allocation kicks in; after that, another request is made every spark.dynamicAllocation.sustainedSchedulerBacklogTimeout (default 1s) until enough resources have been obtained. The number of executors requested grows exponentially from round to round: 1, 2, 4, 8, and so on.
Exponential growth is used for two reasons. First, the requests start small in case the application's demand can be satisfied almost immediately. Second, the amounts double so that an application that needs a lot of resources can obtain them within a small number of request rounds.
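The ramp-up schedule can be illustrated with a small sketch. This is my own illustration of the behavior described above, not Spark's actual implementation, and `ramp_up_rounds` is a hypothetical helper name:

```python
def ramp_up_rounds(needed, max_executors):
    """Return the cumulative executor count after each request round,
    assuming each round doubles the previous request, capped by the
    outstanding need and by max_executors."""
    target = min(needed, max_executors)
    granted = 0
    to_add = 1                      # first round asks for 1 executor
    history = []
    while granted < target:
        granted += min(to_add, target - granted)
        history.append(granted)
        to_add *= 2                 # exponential growth: 1, 2, 4, 8, ...
    return history

print(ramp_up_rounds(30, 30))  # -> [1, 3, 7, 15, 30]
print(ramp_up_rounds(10, 30))  # -> [1, 3, 7, 10]
```

With the maxExecutors of 30 configured above, the backlog is satisfied after five request rounds.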
The resource reclamation strategy:
When one of an application's executors has been idle for longer than spark.dynamicAllocation.executorIdleTimeout (default 60s), it is reclaimed.
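As a toy illustration of that threshold (the names here are my own, not Spark APIs):

```python
# spark.dynamicAllocation.executorIdleTimeout default, in seconds
IDLE_TIMEOUT_S = 60

def executors_to_release(idle_seconds_by_executor, idle_timeout_s=IDLE_TIMEOUT_S):
    """Return the indices of executors whose idle time exceeds the timeout."""
    return [i for i, t in enumerate(idle_seconds_by_executor)
            if t > idle_timeout_s]

# Executors 1 and 2 have been idle for more than 60s and would be reclaimed.
print(executors_to_release([0, 75, 120, 30]))  # -> [1, 2]
```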
Executing SQL with spark-sql on YARN and dynamic resource allocation
```shell
./spark-sql --master yarn-client \
--executor-memory 1G \
-e "SELECT COUNT(1) FROM ut.t_ut_site_log where pt >= '2015-12-09' and pt <= '2015-12-10'"
```

This query needs 123 tasks.

The AppMaster web UI shows 31 executors in total, one of which is the Driver, so 30 executors run the query concurrently; 30 is exactly the maximum concurrency configured via spark.dynamicAllocation.maxExecutors. If a query has only 10 tasks, only 10 executors' worth of resources are requested from YARN.
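These numbers follow from simple arithmetic. The sketch below assumes one core, hence one concurrent task, per executor (as in the command above, which does not set --executor-cores); `target_executors` is my own name, not a Spark API:

```python
import math

def target_executors(pending_tasks, cores_per_executor, max_executors):
    # Executors needed to run every pending task at once,
    # capped by spark.dynamicAllocation.maxExecutors.
    return min(math.ceil(pending_tasks / cores_per_executor), max_executors)

print(target_executors(123, 1, 30))  # 123 tasks -> capped at 30
print(target_executors(10, 1, 30))   # 10 tasks  -> only 10 requested
```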
Note:
If you use

```shell
./spark-sql --master yarn-client --executor-memory 1G
```

to enter the spark-sql command line, no SQL query you run there will ever execute. The reason is that when spark-sql is submitted to YARN this way it is already treated as an Application, yet apart from the Driver it is never allocated any executors, so the queries you submit cannot run for lack of executors.
¶øÕâ¸öÎÊÌ⣬ÎÒʹÓÃSparkµÄThriftServer£¨HiveServer2£©µÃÒÔ½â¾ö¡£
Executing SQL over Thrift JDBC with dynamic resource allocation
First, start Spark's ThriftServer service, i.e. HiveServer2, in yarn-client mode.
Configure the port and address the ThriftServer listens on:
```shell
vi $SPARK_HOME/conf/spark-env.sh

export HIVE_SERVER2_THRIFT_PORT=10000
export HIVE_SERVER2_THRIFT_BIND_HOST=0.0.0.0
```
Start the ThriftServer in yarn-client mode:
```shell
cd $SPARK_HOME/sbin/
./start-thriftserver.sh \
--master yarn-client \
--conf spark.driver.memory=3G \
--conf spark.shuffle.service.enabled=true \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.dynamicAllocation.minExecutors=1 \
--conf spark.dynamicAllocation.maxExecutors=30 \
--conf spark.dynamicAllocation.sustainedSchedulerBacklogTimeout=5s
```
Once started, the ThriftServer runs on YARN as a long-lived service:

Connecting to spark-sql with beeline over JDBC
```shell
cd $SPARK_HOME/bin
./beeline -u jdbc:hive2://localhost:10000 -n lxw1234
```

Execute a query:
```sql
select count(1) from ut.t_ut_site_log
where pt = '2015-12-10';
```
This job has 64 tasks:

while the concurrency shown on the monitoring page is still 30:

After the query finishes, only one executor remains (presumably holding cached data); all the others are reclaimed:

In this way, multiple users can connect to the Thrift Server via beeline over JDBC and run SQL queries, with resources allocated dynamically.
Note that the spark.dynamicAllocation.maxExecutors=30 specified when starting the ThriftServer is the maximum concurrent resource count for the entire ThriftServer; if multiple users connect at the same time, they share and compete for those 30 executors in total.
ÕâÑù£¬Ò²ËãÊǽâ¾öÁ˶àÓû§Í¬Ê±Ê¹ÓÃspark-sql£¬²¢ÇÒ¶¯Ì¬·ÖÅä×ÊÔ´µÄÐèÇóÁË¡£
Official documentation on Spark dynamic resource allocation: http://spark.apache.org/docs/1.5.0/job-scheduling.html#dynamic-resource-allocation