Python Study Notes on Big Data: Spark Introduction and Environment Setup
 
 2019-9-25
 
Editor's note: this article comes from CSDN. It introduces Spark and walks through Hadoop cluster setup, Scala installation, and Spark installation and configuration. We hope it helps your study.

Spark is the most popular big data processing engine among the Apache top-level projects. It handles big data computation workloads, including offline (batch) computation, interactive queries, data mining algorithms, stream processing, and graph computation.

The Spark ecosystem

The core components are:

Spark Core: contains Spark's basic functionality, in particular the API that defines RDDs, the transformations on them, and the actions over both. All other Spark libraries are built on top of RDDs and Spark Core.

Spark SQL: provides an API for interacting with Spark through HiveQL, Apache Hive's SQL variant. Each database table is treated as an RDD, and Spark SQL queries are translated into Spark operations. Anyone familiar with Hive and HiveQL can pick up Spark right away.

Spark Streaming: allows real-time data streams to be processed and controlled. Many real-time systems (such as Apache Storm) can process streaming data; Spark Streaming lets programs work with real-time data much as they would with ordinary RDDs.

MLlib: a library of common machine learning algorithms, implemented as Spark operations on RDDs. It contains scalable learning algorithms, such as classification and regression, that need to iterate over large datasets. Mahout, an earlier big data machine learning option, announced a move onto Spark.

GraphX: a collection of algorithms and tools for manipulating graphs and performing parallel graph operations and computations. GraphX extends the RDD API with operations for manipulating graphs, creating subgraphs, and accessing all vertices along a path.

Because these components cover so many big data needs, along with the algorithmic and computational demands of many data science tasks, Spark became popular quickly. Beyond that, Spark provides APIs in Scala, Java, and Python, meeting the needs of different communities and letting more data scientists easily adopt Spark as their big data solution.
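These libraries all share the RDD model of transformations and actions. As a rough illustration, here is a word count written in plain Python that mirrors the shape of Spark's classic flatMap/map/reduceByKey pipeline. This is a local sketch of ours, not Spark code; with a real SparkContext the same chain would run distributed across the cluster.

```python
from collections import defaultdict

# A local, single-process imitation of the classic Spark word count.
# In PySpark the equivalent pipeline would look roughly like:
#   sc.textFile(path).flatMap(str.split).map(lambda w: (w, 1)).reduceByKey(add)
lines = ["spark makes big data simple", "big data needs spark"]

# flatMap: split every line into words
words = [w for line in lines for w in line.split()]

# map: pair each word with a count of 1
pairs = [(w, 1) for w in words]

# reduceByKey: sum the counts per word
counts = defaultdict(int)
for word, n in pairs:
    counts[word] += n

print(dict(counts))
```

The point of the distributed version is that the `pairs` list never materializes on one machine; `reduceByKey` shuffles matching keys to the same partition before summing.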

Spark's storage layers

Spark can read files from any Hadoop distributed file system into distributed datasets, and it also supports other systems that implement the Hadoop interface, such as local files, Amazon S3, Hive, and HBase. (The original article includes a figure here showing the relationship between Hadoop and the cluster nodes.)
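In practice, which storage system Spark reads from is selected by the path's URI scheme (`file://`, `hdfs://`, `s3n://`, and so on). A quick illustration using Python's standard `urllib`; this aside is ours, not part of the original article:

```python
from urllib.parse import urlparse

# Paths handed to Spark select the storage backend via their URI scheme.
paths = [
    "file:///opt/data/input.txt",      # local filesystem
    "hdfs://master:9000/user/data",    # HDFS, as configured in core-site.xml
    "s3n://bucket/logs/2019/",         # Amazon S3 (the s3n connector of that era)
]

for p in paths:
    u = urlparse(p)
    print(u.scheme, u.netloc, u.path)
```

Note that the `hdfs://master:9000` authority matches the `fs.defaultFS` value configured later in core-site.xml.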

Spark on YARN

Apache Hadoop YARN (Yet Another Resource Negotiator) is Hadoop's newer resource manager: a general-purpose resource management system that provides unified resource management and scheduling for the applications running on top of it. The heart of YARN's layered architecture is the ResourceManager, which controls the entire cluster and manages the allocation of applications onto the underlying compute resources. The ResourceManager parcels out each kind of resource (CPU, memory, bandwidth, and so on) to the NodeManagers, YARN's per-node agents. From Hadoop 2 onward, with YARN in place, the cluster can run not only MapReduce but also other engines such as Spark.

1. Hadoop cluster setup (master + slave01)

Preparing the cluster machines

<1> Prepare two Ubuntu 14.04 virtual machines in VMware and set their hostnames to master and slave01; their hostnames and IP addresses are shown below (adjust to your own network environment):

<2> Edit the /etc/hosts file on both master and slave01 as follows:

127.0.0.1 localhost
192.168.1.123 master
192.168.1.124 slave01

Test connectivity between the two hosts with the ping command.
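A typo in /etc/hosts is a common cause of ping failures. The entries above can also be sanity-checked with a short script; this is a local sketch of ours (the variable names are not from the article):

```python
import ipaddress

# The /etc/hosts entries from the step above.
hosts_text = """\
127.0.0.1 localhost
192.168.1.123 master
192.168.1.124 slave01
"""

mapping = {}
for line in hosts_text.splitlines():
    if not line.strip() or line.lstrip().startswith("#"):
        continue  # skip blank lines and comments
    ip, *names = line.split()
    ipaddress.ip_address(ip)  # raises ValueError on a malformed address
    for name in names:
        mapping[name] = ip

print(mapping)
```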

Configuring passwordless SSH access within the cluster

<1> Run the following commands on both hosts:

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

<2> Copy slave01's public key id_dsa.pub to master:

scp ~/.ssh/id_dsa.pub itcast@master:/home/itcast/.ssh/id_dsa.pub.slave01

<3> Append slave01's public key to master's authorized_keys file:

cat id_dsa.pub.slave01 >> authorized_keys

<4> Copy master's combined authorized_keys file back into slave01's .ssh directory:

scp authorized_keys itcast@slave01:/home/itcast/.ssh/authorized_keys

SSH into slave01 to verify passwordless login works:

SSH into master to verify as well:

Installing the JDK and Hadoop packages

<1> Use the JDK 7u80 installation package. All software here is extracted into the /opt/software/ directory. Add these environment variables on both master and slave01:

export JAVA_HOME=/opt/software/java/jdk1.7.0_80
export JRE_HOME=/opt/software/java/jdk1.7.0_80/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin

<2> Use the Hadoop 2.6.0 installation package, likewise extracted into /opt/software/. Add the environment variable below on both master and slave01, remembering to also add Hadoop's bin directory to PATH:

export HADOOP_HOME=/opt/software/hadoop/hadoop-2.6.0

1.2. Configuring the Hadoop environment

Cluster configuration

<1> In the /opt/software/hadoop/hadoop-2.6.0/etc/hadoop directory, edit hadoop-env.sh and add the following:

export JAVA_HOME=/opt/software/java/jdk1.7.0_80
export HADOOP_PREFIX=/opt/software/hadoop/hadoop-2.6.0

<2> Edit core-site.xml; the tmp directory must be created beforehand:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/software/hadoop/hadoop-2.6.0/tmp</value>
    </property>
</configuration>

<3> Edit hdfs-site.xml to set the number of data replicas:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
</configuration>

<4> Edit mapred-site.xml:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

<5> Add the JAVA_HOME variable to yarn-env.sh as well.

<6> Edit yarn-site.xml:

<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
</configuration>
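All four Hadoop config files above share the same <configuration>/<property>/<name>/<value> shape, so they are easy to generate or check programmatically. A small helper sketch (the function name is ours, not a Hadoop API):

```python
import xml.etree.ElementTree as ET

def make_hadoop_config(props):
    """Build a Hadoop-style <configuration> document from a dict of properties."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root

# Reproduce the core-site.xml from step <2>.
root = make_hadoop_config({
    "fs.defaultFS": "hdfs://master:9000",
    "hadoop.tmp.dir": "/opt/software/hadoop/hadoop-2.6.0/tmp",
})
print(ET.tostring(root, encoding="unicode"))
```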

<7> Add the cluster hostnames to the slaves file:

master
slave01

Apply the same configuration on slave01.

Starting the Hadoop cluster

<1> Run start-dfs.sh to start the NameNode and DataNodes.

Use the jps command on master and slave01 to inspect the Java processes; you can also check the NameNode web UI at http://master:50070/.

<2> Run start-yarn.sh to start the ResourceManager and NodeManagers, then check the Java processes with jps on master and slave01.

If all of the above succeeded, the cluster has started successfully.

1.3. Installing Scala

<1> Use the scala-2.10.6 installation package, likewise extracted under /opt/software/scala/ (create the directories yourself).

<2>Ð޸ļÒĿ¼ÏµÄ.bashrcÎļþ£¬Ìí¼ÓÈçÏ»·¾³,¼ÇסÌí¼ÓscalaµÄPATH·¾¶£º

export SCALA_HOME=/opt/software/scala/scala-2.10.6
export SPARK_HOME=/opt/software/spark/spark-1.6.0-bin-hadoop2.6

Run source .bashrc to apply the environment variables.

<3> Apply the same configuration on slave01.

<4> Run the scala command to check that the installation took effect.

1.4. Installing and configuring Spark

Installing Spark

<1> For Spark, use the spark-1.6.0-bin-hadoop2.6 package and extract it into /opt/software/spark/ (matching the SPARK_HOME path below).

<2>Ð޸Ļ·¾³±äÁ¿Îļþ.bashrc£¬Ìí¼ÓÈçÏÂÄÚÈÝ

export SPARK_HOME=/opt/software/spark/spark-1.6.0-bin-hadoop2.6

# The complete PATH variable:
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin

Run source .bashrc to apply the environment variables.

Configuring Spark

<1> Go to the conf directory under Spark's installation directory and copy spark-env.sh.template to spark-env.sh:

cp spark-env.sh.template spark-env.sh

Edit spark-env.sh and add the following configuration:

export SCALA_HOME=/opt/software/scala/scala-2.10.6
export JAVA_HOME=/opt/software/java/jdk1.7.0_80
export SPARK_MASTER_IP=192.168.1.123
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=/opt/software/hadoop/hadoop-2.6.0/etc/hadoop

<2> Copy slaves.template to slaves and edit its contents to:

master
slave01

´ËÅäÖñíʾҪ¿ªÆôµÄworkerÖ÷»ú?<3>slave01ͬÑù²ÎÕÕmasterÅäÖÃ

Starting the Spark cluster

<1> Start the master node by running start-master.sh.

<2> Start all the worker nodes by running start-slaves.sh.

<3> Run jps to check what has started.

<4> Visit http://master:8080 in a browser to view the Spark cluster information.

   