Cluster configuration
1 namenode, 4 datanodes
namenode: compute-n
datanodes: compute-0-1, compute-0-2,
compute-0-3, compute-0-4
Installed versions
Linux version:
Linux compute-n 2.6.32-38-generic #83-Ubuntu SMP Wed Jan 4 11:12:07 UTC 2012 x86_64 GNU/Linux
JDK:
java version "1.8.0_40"
Java(TM) SE Runtime Environment (build 1.8.0_40-b26)
Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode)
Hadoop:
Hadoop 2.6.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1
Compiled by jenkins on 2014-11-13T21:10Z
Compiled with protoc 2.5.0
From source with checksum 18e43357c8f927c0695f1e9522859d6a
This command was run using /home/hadoop/hadoop-2.6.0/share/hadoop/common/hadoop-common-2.6.0.jar
1. Download Spark and Scala
I downloaded Spark 1.3.1 (pre-built for Hadoop 2.6) and Scala 2.11.6.
Spark download page: http://spark.apache.org/downloads.html
Scala download page: http://www.scala-lang.org/download/
2. Extract Scala and configure its environment variables
tar -zxf scala-2.11.6.tgz
Then move the files to /usr/lib/scala:
sudo mkdir /usr/lib/scala
sudo mv scala-2.11.6 /usr/lib/scala
Copy Scala to the other machines:
sudo scp -r scala-2.11.6 hadoop@compute-0-1:/home/hadoop/Downloads/
ssh compute-0-1
3. Install Spark
3.1 Extract Spark and move it into place
tar -zxf spark-1.3.1-bin-hadoop2.6.tgz
Then copy the Spark files to /usr/local/spark.

3.2 Configure environment variables
Edit /etc/profile:
JAVA_HOME=/home/hadoop/jdk1.8.0_40
HADOOP_HOME=/home/hadoop/hadoop-2.6.0
SCALA_HOME=/usr/lib/scala/scala-2.11.6/
SPARK_HOME=/usr/local/spark/spark-1.3.1-bin-hadoop2.6
CLASSPATH=.:$JAVA_HOME/lib/tools.jar
PATH=${SCALA_HOME}/bin:$JAVA_HOME/bin:${SPARK_HOME}/bin:$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export SPARK_HOME SCALA_HOME JAVA_HOME CLASSPATH PATH
Save and exit, then run `source /etc/profile` to apply the configuration.
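A quick sanity check after reloading the profile catches typos (such as a stray character at the end of SPARK_HOME) before they surface later as confusing errors. A sketch; the variable names match the profile entries above:

```shell
# Print each variable from /etc/profile so a typo is caught immediately.
check_env() {
  for v in JAVA_HOME SCALA_HOME SPARK_HOME HADOOP_HOME; do
    eval "val=\$$v"
    if [ -n "$val" ]; then
      echo "$v=$val"
    else
      echo "$v is NOT set"
    fi
  done
}
check_env
```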
3.3 Configure Spark
Enter Spark's configuration directory, conf:
cp spark-env.sh.template spark-env.sh
cp slaves.template slaves
Edit spark-env.sh:
export JAVA_HOME=/home/hadoop/jdk1.8.0_40
export SCALA_HOME=/usr/lib/scala/scala-2.11.6/
export SPARK_MASTER_IP=10.119.178.200
export SPARK_WORKER_MEMORY=8G
export HADOOP_CONF_DIR=/home/hadoop/hadoop-2.6.0/etc/hadoop
Edit slaves:
compute-0-1
compute-0-2
compute-0-3
compute-0-4
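Since the worker names follow a pattern, the slaves file can also be generated instead of typed by hand. A sketch that writes to /tmp/slaves (on the cluster, point it at $SPARK_HOME/conf/slaves):

```shell
# conf/slaves is plain text: one worker hostname per line.
SLAVES=/tmp/slaves          # on the cluster: $SPARK_HOME/conf/slaves
: > "$SLAVES"
for i in 1 2 3 4; do
  echo "compute-0-$i" >> "$SLAVES"
done
cat "$SLAVES"
```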
4. Copy the same setup to the other machines
Then, from /usr/local, copy Spark to the other machines. Copying directly into /usr/local over scp is not permitted, so copy to the Downloads directory first, then move it into place on the slave:
sudo scp -r spark hadoop@compute-0-1:/home/hadoop/Downloads/
ssh compute-0-1
sudo cp -r /home/hadoop/Downloads/spark /usr/local/
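The two-hop copy has to be repeated for each of the four slaves, so it is worth scripting. A sketch using the hostnames and paths from this post; with DRY_RUN=1 (the default here) it only prints the commands, so unset it on the real cluster:

```shell
# Stage Spark in each slave's Downloads, then move it into /usr/local
# with sudo on the slave (scp cannot write to /usr/local directly).
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi; }
for host in compute-0-1 compute-0-2 compute-0-3 compute-0-4; do
  run scp -r /usr/local/spark "hadoop@$host:/home/hadoop/Downloads/"
  run ssh "$host" sudo cp -r /home/hadoop/Downloads/spark /usr/local/
done
```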
5. Start Spark
5.1 Enter the sbin directory under the Spark installation directory
Run the start script:
./start-all.sh
This failed with the following error:
mkdir: cannot create directory `/usr/local/spark/spark-1.3.1-bin-hadoop2.6/sbin/../logs': Permission denied
The message points to a permissions problem. Logging into the corresponding slave shows that the spark directory is owned by root. Change the ownership from the /usr/local directory:
sudo chown -R -v hadoop:hadoop spark
The -v flag lists every path whose owner was changed.

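To confirm the change took effect on each slave, stat can print the owner directly. A sketch on a temporary directory; on the cluster, point it at /usr/local/spark, where it should now print hadoop:hadoop:

```shell
# stat -c '%U:%G' prints the owner and group of a path (GNU coreutils).
dir=/tmp/spark-owner-demo    # stand-in for /usr/local/spark
mkdir -p "$dir"
stat -c '%U:%G' "$dir"
```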
5.2 Restart Spark, then use jps to check the processes on the master and the slaves
On the master, jps should now list a Master process alongside the Hadoop daemons; on each slave, a Worker process.
You can then open the Spark cluster's web page at compute-n:8080.

5.3 Enter Spark's bin directory and start the spark-shell console
We can also look at the SparkUI from the browser at compute-n:4040.

6. Run an example to verify the installation
6.1 First, note that the shell session provides a variable sc, an instance of SparkContext, created automatically when spark-shell starts. Any Spark program, whether it runs locally or on the cluster, needs a SparkContext instance.
Then go to /usr/local/spark/spark-1.3.1-bin-hadoop2.6 and copy the README into HDFS:
hadoop fs -copyFromLocal README.md ./
Then read the file:
val file = sc.textFile("hdfs://compute-n:8025/user/hadoop/README.md")
The shell prints the resulting RDD.
Then filter out all the lines containing the word "Spark":
val sparks = file.filter(line => line.contains("Spark"))
This produces a new, filtered RDD.
Then count how many lines contain "Spark":
sparks.count()
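The count can be cross-checked outside Spark: filter(...).count() counts matching lines, which is exactly what grep -c reports on a local copy of the file. A sketch on a made-up sample, since README.md's real contents are not reproduced in this post:

```shell
# grep -c counts lines that match, mirroring filter + count above.
cat > /tmp/sample.md <<'EOF'
Apache Spark
Spark is a fast and general engine
Hadoop MapReduce
EOF
grep -c Spark /tmp/sample.md    # prints 2: two lines contain "Spark"
```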
Then open the compute-n:4040 page to inspect the job in the web UI.