I. Execution Modes
The common syntax for the spark-submit script:
./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]
Explanation of the options:
(1) --class: the main class, i.e., the class that contains the main function.
(2) --master: the master URL; see the detailed descriptions below.
(3) --deploy-mode: one of two modes, client or cluster.
(4) --conf: a configuration property in key=value form.
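None of the example scripts in the sections below pass --conf, so as a hedged illustration, here is a hypothetical submission that sets two properties; the class name, property values, and paths are placeholders, not values from this article:

# Hypothetical submission passing configuration via --conf; the
# class name, property values, and paths are placeholders.
./bin/spark-submit \
  --class com.example.MyApp \
  --master spark://207.184.161.138:7077 \
  --deploy-mode client \
  --conf spark.executor.memory=4g \
  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails" \
  /path/to/my-app.jar \
  arg1 arg2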
Each of the modes is described below:
1. local
Runs the program locally; generally used for testing. The number of threads can be specified.
# Run application locally on 8 cores
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[8] \
  /path/to/examples.jar \
  100
2. Standalone client
Application execution flow in this mode:
(1) The user starts the client, which runs the user program and launches the Driver process. Components such as the DAGScheduler are started or instantiated inside the Driver, and the client-side Driver registers with the Master.
(2) Workers register with the Master, and the Master instructs the Workers to start Executors. Each Worker creates an ExecutorRunner thread, inside which an ExecutorBackend process is launched.
(3) Once an ExecutorBackend starts, it registers with the SchedulerBackend inside the client's Driver process, so that the Driver can locate its compute resources. The Driver's DAGScheduler parses the application's RDD DAG and generates the corresponding Stages; the TaskSet contained in each Stage is assigned to Executors through the TaskScheduler. Inside each Executor, a thread pool is started to execute the Tasks in parallel.
Example script:
# Run on a Spark Standalone cluster in client deploy mode
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

3. Standalone cluster
Application execution flow in this mode:
(1) The user starts the client, which submits the application to the Master.
(2) The Master schedules the application and, for each application, dispatches it to a designated Worker, which launches the Driver (i.e., the SchedulerBackend). On receiving the Master's command, the Worker creates a DriverRunner thread, and the SchedulerBackend process is created inside that thread. The Driver acts as the master control process of the whole job. The Master also designates other Workers to start Executors, i.e., ExecutorBackend processes, which provide the compute resources. The flow is very similar to the one above:
(3) The Worker creates an ExecutorRunner thread, and the ExecutorRunner launches the ExecutorBackend process.
(4) Once the ExecutorBackend starts, it registers with the Driver's SchedulerBackend; having thus acquired compute resources, the Driver can schedule and dispatch tasks to the compute nodes for execution. The SchedulerBackend process contains the DAGScheduler, which splits Stages according to the RDD DAG, generates TaskSets, and schedules and dispatches Tasks to the Executors. The TaskSet of each Stage is stored in the TaskScheduler, which dispatches the tasks to the Executors, where they run as parallel threads.

Example script:
# Run on a Spark Standalone cluster in cluster deploy mode with supervise
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000
4. yarn-client
In yarn-client mode the Driver runs on the client and obtains resources from the ResourceManager (RM) through the ApplicationMaster. The local Driver is responsible for interacting with all the executor containers and for aggregating the final results. Killing the terminal therefore kills the Spark application. Generally speaking, this mode is appropriate when the results only need to be returned to the terminal.
After the client-side Driver submits the application to YARN, YARN starts the ApplicationMaster and then the executors. Both the ApplicationMaster and the executors run inside containers, whose default memory is 1 GB; the memory allocated to the ApplicationMaster is governed by driver-memory, and the memory allocated to each executor by executor-memory. Because the Driver is on the client, the program's results can be displayed on the client, and the Driver exists as a process named SparkSubmit.
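As a hedged illustration of these memory settings, here is a minimal sketch of a submission that sizes the driver and executors explicitly; the class name, jar path, and sizes are hypothetical placeholders, not values from this article:

# Hypothetical yarn-client submission; the class, path, and memory
# sizes below are illustrative placeholders.
./bin/spark-submit \
  --master yarn-client \
  --driver-memory 2G \
  --executor-memory 4G \
  --class com.example.MyApp \
  /path/to/my-app.jar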

Job execution flow in yarn-client mode:
1. The client generates the job information and submits it to the ResourceManager (RM).
2. The RM starts a container on a NodeManager and assigns the Application Master (AM) to that NodeManager (NM).
3. The NM receives the assignment from the RM and starts the Application Master, which initializes the job; note that in client mode the Driver itself keeps running in the client process, and the AM only negotiates resources.
4. The Application Master requests resources from the RM; as resources are allocated, the other NodeManagers are notified to start the corresponding Executors.
5. The Executors register and report back to the Driver and complete their tasks.
Example script:
# Run on a YARN client
export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-client \
  --executor-memory 20G \
  --num-executors 50 \
  /path/to/examples.jar \
  1000
5. yarn-cluster
The Spark Driver first starts inside the YARN cluster as an ApplicationMaster. Every job the client submits to the ResourceManager is assigned a unique ApplicationMaster on one of the cluster's worker nodes, and that ApplicationMaster manages the application over its entire lifecycle. Because the Driver program runs inside YARN, there is no need to start a Spark Master/Client beforehand. The application's results cannot be displayed on the client (they can be viewed in the history server), so it is best to save the results to HDFS rather than write them to stdout; the client terminal only shows a brief status report of the YARN job.
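Since the Driver's stdout stays on the cluster in this mode, one way to inspect it after the run is the YARN log CLI; a minimal sketch, where <application-id> stands for the id printed at submission time:

# Fetch the aggregated container logs (including the driver's stdout)
# once the application finishes; <application-id> is a placeholder,
# e.g. application_1409421698529_0012.
yarn logs -applicationId <application-id>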

Job execution flow in yarn-cluster mode:
1. The client generates the job information and submits it to the ResourceManager (RM).
2. The RM starts a container on some NodeManager (chosen by YARN) and assigns the Application Master (AM) to that NodeManager (NM).
3. The NM receives the assignment from the RM and starts the Application Master, which initializes the job; this NM now serves as the Driver.
4. The Application Master requests resources from the RM; as resources are allocated, the other NodeManagers are notified to start the corresponding Executors.
5. The Executors register and report back to the Application Master on that NM and complete their tasks.
Example script:
# Run on a YARN cluster
export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  --executor-memory 20G \
  --num-executors 50 \
  /path/to/examples.jar \
  1000
II. Notes on Execution
1. About jar packages
The Hadoop and Spark configurations are loaded into the SparkContext automatically, so when submitting an application you only need to submit the user code and its dependency packages. There are two ways to do this:
(1) Package the user code into a jar, then add the dependency jars with --jars when submitting the application (recommended).
(2) Package the user code together with its dependencies into one large assembly jar (the resulting jar may well exceed a hundred MB, so packaging and deployment are slow).
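A minimal sketch of approach (1); the class name, dependency jars, and paths below are hypothetical placeholders:

# Approach (1): ship the user code as a thin jar and list its
# dependencies with --jars (comma-separated). All names here are
# hypothetical placeholders.
./bin/spark-submit \
  --class com.example.MyApp \
  --master spark://207.184.161.138:7077 \
  --jars /path/to/dep-a.jar,/path/to/dep-b.jar \
  /path/to/my-app.jar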
2. Scheduling modes
Spark standalone mode: only FIFO scheduling is supported.
Spark on Mesos mode: two scheduling modes are available:
1) Coarse-grained Mode
2) Fine-grained Mode
Spark on YARN mode: currently only the Coarse-grained Mode is supported.
3. Configuration precedence
Configuration parameters set on the SparkConf in application code have the highest priority, followed by the parameters passed to the spark-submit script, and finally the parameters in the configuration file (conf/spark-defaults.conf).
If it is unclear where a configuration parameter comes from, you can use spark-submit's --verbose option to print fine-grained debugging information.
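A minimal sketch of the three layers; the property values and class name are hypothetical placeholders:

# Lowest priority: conf/spark-defaults.conf might contain
#   spark.executor.memory  1g
# Middle priority: the --conf flag below overrides the file.
# Highest priority: a SparkConf().set("spark.executor.memory", "8g")
# call in the application code would override both.
./bin/spark-submit \
  --conf spark.executor.memory=4g \
  --verbose \
  --class com.example.MyApp \
  /path/to/my-app.jar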