Spark Dynamic Resource Allocation
 
×÷Õߣºlxw1234@qq.com À´Ô´£ºlxwµÄ´óÊý¾ÝÌïµØ ·¢²¼ÓÚ 2016-2-29

In Spark, the unit of resource is generally the executor, which plays the same role as a Container in YARN. In Spark on YARN mode, --num-executors usually sets the number of executors an application uses, while --executor-memory and --executor-cores set the memory and the number of virtual CPU cores for each executor. Many people probably still specify resources this way when they submit Spark applications.

Consider the following scenario. With Hive, several users run hive-cli at the same time for data development and analysis. Resources are requested from YARN only when a user submits a Hive SQL statement for execution. Otherwise the session just sits at the hive-cli prompt as a single JVM and wastes no YARN resources. Now suppose we want to replace Hive with Spark SQL for the same work, again with several users at once. If each user runs the spark-sql command line in yarn-client mode as before (http://lxw1234.com/archives/2015/08/448.htm) and passes --num-executors 10 at startup, every user occupies 10 YARN Containers from the moment the shell starts, and those 10 Containers stay occupied until the user exits the spark-sql command line.

Can spark-sql on YARN behave like Hive, requesting resources only while SQL is executing and releasing them when idle? Since Spark 1.2, the on YARN mode has supported Dynamic Resource Allocation, which adds and removes executors according to the application's load (its tasks). This policy is a good fit for using spark-sql on YARN for data development and analysis, and for running spark-sql as a long-running service.

This article uses Spark 1.5.0 and hadoop-2.3.0-cdh5.0.0 to show how to use dynamic resource allocation with spark-sql on YARN.

YARN configuration

First, configure the YARN NodeManagers to support Spark's shuffle service.

Edit yarn-site.xml on every NodeManager:

<!-- Modify this existing property -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle,spark_shuffle</value>
</property>
<!-- Add the following properties -->
<property>
<name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
<value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
<property>
<name>spark.shuffle.service.port</name>
<value>7337</value>
</property>

Copy $SPARK_HOME/lib/spark-1.5.0-yarn-shuffle.jar to ${HADOOP_HOME}/share/hadoop/yarn/lib/ on every NodeManager.
Restart all NodeManagers.
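
Before restarting, it can help to confirm that every NodeManager's yarn-site.xml really carries the shuffle-service settings. The following is a minimal sketch, not part of the original setup: the helper names `load_properties` and `check_shuffle_service` are hypothetical, and the XML is an inline sample that mirrors the properties above rather than a file read from a real node.

```python
import xml.etree.ElementTree as ET

# Inline sample that mirrors the yarn-site.xml properties described above.
YARN_SITE = """<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle,spark_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
    <value>org.apache.spark.network.yarn.YarnShuffleService</value>
  </property>
  <property>
    <name>spark.shuffle.service.port</name>
    <value>7337</value>
  </property>
</configuration>"""


def load_properties(xml_text):
    """Return a {name: value} dict from Hadoop-style configuration XML."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


def check_shuffle_service(props, spark_port="7337"):
    """Return a list of problems; an empty list means the settings look consistent."""
    problems = []
    services = [s.strip() for s in (props.get("yarn.nodemanager.aux-services") or "").split(",")]
    if "spark_shuffle" not in services:
        problems.append("spark_shuffle is missing from yarn.nodemanager.aux-services")
    expected = "org.apache.spark.network.yarn.YarnShuffleService"
    if props.get("yarn.nodemanager.aux-services.spark_shuffle.class") != expected:
        problems.append("spark_shuffle class should be " + expected)
    # The port on the YARN side must match the one Spark is configured with.
    if props.get("spark.shuffle.service.port", "7337") != spark_port:
        problems.append("spark.shuffle.service.port differs from the Spark side")
    return problems


print(check_shuffle_service(load_properties(YARN_SITE)))
```

To check a real node, the same functions could be fed the text of that node's yarn-site.xml instead of the inline sample.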

Spark configuration

Edit $SPARK_HOME/conf/spark-defaults.conf and add the following parameters:

# Enable the external shuffle service
spark.shuffle.service.enabled true
# Shuffle service port; must match the value in yarn-site.xml
spark.shuffle.service.port 7337
# Enable dynamic resource allocation
spark.dynamicAllocation.enabled true
# Minimum number of executors allocated to each application
spark.dynamicAllocation.minExecutors 1
# Maximum number of executors allocated concurrently to each application
spark.dynamicAllocation.maxExecutors 30
spark.dynamicAllocation.schedulerBacklogTimeout 1s
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout 5s
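
These settings depend on each other: dynamic allocation needs the external shuffle service, and the minimum must not exceed the maximum. Below is a small sketch of a consistency check. The function names are hypothetical, the parser is a simplification that only handles whitespace-separated `key value` lines, and the fallback values for unset minimum and maximum (0 and the largest 32-bit integer) are assumptions of this sketch.

```python
# Sample text in spark-defaults.conf style, with comments on their own lines.
SPARK_DEFAULTS = """
# Enable the external shuffle service
spark.shuffle.service.enabled true
spark.shuffle.service.port 7337
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.minExecutors 1
spark.dynamicAllocation.maxExecutors 30
spark.dynamicAllocation.schedulerBacklogTimeout 1s
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout 5s
"""


def parse_spark_defaults(text):
    """Parse 'key value' lines, skipping blank lines and '#' comment lines."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        # Everything after the key is taken as the value.
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def check_dynamic_allocation(conf):
    """Return a list of problems found in the dynamic allocation settings."""
    problems = []
    if conf.get("spark.dynamicAllocation.enabled") == "true":
        if conf.get("spark.shuffle.service.enabled") != "true":
            problems.append("dynamic allocation needs spark.shuffle.service.enabled=true")
        low = int(conf.get("spark.dynamicAllocation.minExecutors", "0"))
        high = int(conf.get("spark.dynamicAllocation.maxExecutors", str(2**31 - 1)))
        if low > high:
            problems.append("minExecutors is greater than maxExecutors")
    return problems


print(check_dynamic_allocation(parse_spark_defaults(SPARK_DEFAULTS)))
```

Because everything after the key is treated as the value, an annotation placed on the same line as a setting would become part of that value, which is a reason to keep comments on their own lines.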

Dynamic resource allocation policy:

With dynamic allocation enabled, an application requests additional resources when tasks are left pending for lack of resources, which means its current executors cannot run all tasks in parallel. Spark requests resources in rounds. The first request is triggered once tasks have been pending for spark.dynamicAllocation.schedulerBacklogTimeout (default 1s). After that, a request is made every spark.dynamicAllocation.sustainedSchedulerBacklogTimeout (default 1s) until enough resources have been obtained. The number of executors requested in each round grows exponentially: 1, 2, 4, 8, and so on.

There are two reasons for the exponential growth. First, requests start small because the application may be satisfied by only a few executors. Second, the request size doubles so that an application that needs many executors gets them after only a few rounds.
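
The round-by-round growth can be illustrated with a short simulation. This is a simplified model rather than Spark's actual scheduler code: the function name `ramp_up_schedule` is hypothetical, the application is assumed to start with no executors, the backlog is assumed to persist until the cap is reached, and the timeouts use the values configured earlier (1s and 5s).

```python
def ramp_up_schedule(needed, max_executors, backlog_timeout=1, sustained_timeout=5):
    """Return (seconds_since_backlog_began, total_executors_requested) for each round."""
    cap = min(needed, max_executors)
    schedule = []
    total = 0
    step = 1                    # executors added in the first round
    elapsed = backlog_timeout   # the first request fires after the backlog timeout
    while total < cap:
        total = min(total + step, cap)
        schedule.append((elapsed, total))
        step *= 2               # 1, 2, 4, 8, ...
        elapsed += sustained_timeout
    return schedule


for seconds, total in ramp_up_schedule(needed=123, max_executors=30):
    print(f"t={seconds}s total requested={total}")
# t=1s total requested=1
# t=6s total requested=3
# t=11s total requested=7
# t=16s total requested=15
# t=21s total requested=30
```

Under these assumptions, an application that wants far more than 30 executors reaches the cap of 30 in five rounds, whereas adding one executor per round would take 30 rounds.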

Resource reclamation policy:

An executor of the application that has been idle for longer than spark.dynamicAllocation.executorIdleTimeout (default 60s) is reclaimed.
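
The reclamation rule can be sketched in the same spirit. The function below is a hypothetical illustration, not Spark's implementation: it takes a map of executor id to idle seconds and returns the executors whose idle time exceeds the timeout, while never dropping below the configured minimum. Releasing the longest-idle executors first is an assumption of this sketch.

```python
def executors_to_release(idle_seconds, idle_timeout=60, min_executors=1):
    """Return ids of executors idle for longer than idle_timeout, longest idle first.

    idle_seconds maps executor id to the number of seconds it has been idle.
    At least min_executors executors are always kept.
    """
    expired = sorted(
        (eid for eid, idle in idle_seconds.items() if idle > idle_timeout),
        key=lambda eid: idle_seconds[eid],
        reverse=True,
    )
    removable = max(len(idle_seconds) - min_executors, 0)
    return expired[:removable]


# Executor "2" is still within the timeout, so only "3" and "1" are released.
print(executors_to_release({"1": 75, "2": 10, "3": 120}))  # ['3', '1']

# Both executors have expired, but one is kept because min_executors is 1.
print(executors_to_release({"1": 61, "2": 62}))  # ['2']
```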

Running SQL with spark-sql on YARN using dynamic allocation

./spark-sql --master yarn-client \
--executor-memory 1G \
-e "SELECT COUNT(1) FROM ut.t_ut_site_log where pt >= '2015-12-09' and pt <= '2015-12-10'"

This query needs 123 tasks.

The ApplicationMaster web UI shows 31 executors in total. One of them is the driver, so 30 executors run concurrently, and 30 is exactly the maximum configured in spark.dynamicAllocation.maxExecutors. If a query has only 10 tasks, only 10 executors are requested from YARN.
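
The two cases above (123 tasks and 10 tasks) follow from clamping the demand between the configured minimum and maximum. Here is a minimal sketch of that arithmetic. The function name is hypothetical, and `tasks_per_executor=1` is an assumption that each executor runs one task at a time, which matches the command above only if --executor-cores is left at one core.

```python
import math


def target_executors(num_tasks, min_executors=1, max_executors=30, tasks_per_executor=1):
    """Clamp the number of executors the tasks could use to [min_executors, max_executors]."""
    needed = math.ceil(num_tasks / tasks_per_executor)
    return max(min_executors, min(needed, max_executors))


print(target_executors(123))  # 30, capped by maxExecutors
print(target_executors(10))   # 10, one executor per task
print(target_executors(0))    # 1, never below minExecutors

# With 30 single-slot executors, 123 tasks need at least this many waves.
print(math.ceil(123 / target_executors(123)))  # 5
```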

ÐèҪעÒ⣺

Èç¹ûʹÓÃ

./spark-sql ¨Cmaster yarn-client ¨Cexecutor-memory 1G

½øÈëspark-sqlÃüÁîÐУ¬ÔÚÃüÁîÐÐÖÐÖ´ÐÐÈκÎSQL²éѯ£¬¶¼²»»áÖ´ÐУ¬Ô­ÒòÊÇspark-sqlÔÚÌá½»µ½Yarnʱºò£¬ÒѾ­±»µ±³ÉÒ»¸öApplication£¬¶øÕâÖÖ£¬³ýÁËDriver£¬ÊDz»»á±»·ÖÅäµ½ÈκÎexecutors×ÊÔ´µÄ£¬ËùÓУ¬ÄãÌá½»µÄ²éѯÒòΪûÓÐexecutor¶ø²»Äܱ»Ö´ÐС£

¶øÕâ¸öÎÊÌ⣬ÎÒʹÓÃSparkµÄThriftServer£¨HiveServer2£©µÃÒÔ½â¾ö¡£

Running SQL over Thrift JDBC using dynamic allocation

First, start Spark's ThriftServer, that is, HiveServer2, in yarn-client mode.

Configure the port and address that the ThriftServer listens on:

vi $SPARK_HOME/conf/spark-env.sh
export HIVE_SERVER2_THRIFT_PORT=10000
export HIVE_SERVER2_THRIFT_BIND_HOST=0.0.0.0

Start the ThriftServer in yarn-client mode:

cd $SPARK_HOME/sbin/
./start-thriftserver.sh \
--master yarn-client \
--conf spark.driver.memory=3G \
--conf spark.shuffle.service.enabled=true \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.dynamicAllocation.minExecutors=1 \
--conf spark.dynamicAllocation.maxExecutors=30 \
--conf spark.dynamicAllocation.sustainedSchedulerBacklogTimeout=5s

Once started, the ThriftServer runs on YARN as a long-running service.

Connect to spark-sql over JDBC with beeline:

cd $SPARK_HOME/bin
./beeline -u jdbc:hive2://localhost:10000 -n lxw1234

Ö´Ðвéѯ£º

select count(1) from ut.t_ut_site_log where pt = ¡®2015-12-10¡ä;

¸ÃÈÎÎñÓÐ64¸öTask£º

¶ø¼à¿ØÒ³ÃæÉϵIJ¢·¢ÊýÈÔÈ»ÊÇ30£º

Ö´ÐÐÍêºó£¬executorsÊýֻʣÏÂ1¸ö£¬Ó¦¸ÃÊÇ»º´æÊý¾Ý£¬ÆäÓàµÄÈ«²¿±»»ØÊÕ£º

ÕâÑù£¬¶à¸öÓû§¿ÉÒÔͨ¹ýbeeline£¬JDBCÁ¬½Óµ½Thrift Server£¬Ö´ÐÐSQL²éѯ£¬¶ø×ÊÔ´Ò²ÊǶ¯Ì¬·ÖÅäµÄ¡£

ÐèҪעÒâµÄÊÇ£¬ÔÚÆô¶¯ThriftServerʱºòÖ¸¶¨µÄspark.dynamicAllocation.maxExecutors=30£¬ÊÇÕû¸öThriftServerͬʱ²¢·¢µÄ×î´ó×ÊÔ´Êý£¬Èç¹û¶à¸öÓû§Í¬Ê±Á¬½Ó£¬Ôò»á±»¶à¸öÓû§¹²Ïí¾ºÕù£¬×ܹ²30¸ö¡£

ÕâÑù£¬Ò²ËãÊǽâ¾öÁ˶àÓû§Í¬Ê±Ê¹ÓÃspark-sql£¬²¢ÇÒ¶¯Ì¬·ÖÅä×ÊÔ´µÄÐèÇóÁË¡£

Official Spark documentation on dynamic resource allocation: http://spark.apache.org/docs/1.5.0/job-scheduling.html#dynamic-resource-allocation
