Analyze data faster using Spark and IBM Cloud Object Storage

Source: IBM   Published: 2017-8-24

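Every industry is generating data at an astonishing rate, including genomic data from sequencing systems, media and entertainment data in ultra-high-definition formats, and Internet of Things (IoT) data from large numbers of sensors. IBM Cloud Object Storage (IBM COS, formerly Cleversafe) technology provides high-capacity, cost-effective storage for these applications. Storing the data is not enough, however: you also need to derive value from it, and you can do that with Apache Spark, a leading engine for data analytics processing. Spark can run up to 100 times faster than Hadoop MapReduce, and it combines SQL, stream processing, and complex analytics.

This article describes how to enable Spark to analyze data stored in IBM COS. We cover how to use Stocator, open source software that acts as a driver, and OpenStack Keystone, which provides authentication. Stocator takes advantage of object storage semantics and significantly improves performance compared with earlier Spark storage connectors that were designed to work with file systems. Stocator uses JOSS (an open source Java client) to generate HTTP REST commands, and these commands access IBM COS through the OpenStack Swift interface.

The following figure illustrates the three-way relationship among IBM COS, Stocator, and OpenStack Keystone.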

Install and configure Spark

Download Spark. The Spark website provides instructions for building, installing, and configuring Spark. Depending on your setup, you can configure Spark on a single standalone machine, or on a cluster using YARN, Mesos, or Spark's standalone cluster manager. In our example, we used IBM COS with Spark 2.0.1.

Install and configure IBM COS

Install Cloud Object Storage (COS). In our example, we set up Keystone authentication for IBM COS.

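Install and configure Stocator

To access IBM COS from Spark, we used the open source driver software Stocator. Stocator is a high-performance object storage connector for Spark that takes advantage of object storage semantics. It provides a complete driver for the OpenStack Swift API and can easily be extended to support other object storage interfaces. We relied on Stocator's ability to connect Spark to IBM COS through its Swift API.

To use Stocator, complete the following steps.

1. Download the source code from https://github.com/SparkTC/stocator, either by copying it or by cloning it with git.

2. From the Stocator directory, enter mvn clean package -Pall-in-one to build Stocator.

To configure Spark to access IBM COS through Stocator, you need to define Stocator and its settings. There are two ways to configure Stocator:

1. Add the parameters to a configuration file

2. Add the parameters in code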

Add the parameters to a configuration file

To create the core-site.xml configuration file, do one of the following:

1. Use the core-site.xml.template file in the Stocator/conf directory as a template

2. Use the Keystone Version 2 configuration file

3. Use the Keystone Version 3 configuration file

Use the core-site.xml.template file in the Stocator/conf directory

Access the example core-site.xml for configuring Keystone-based authentication, described in the Configuration Files section, by entering the following commands:

Listing 1. Accessing the core-site.xml.template file

ofer@beginnings:~$ cd ~/stocator/conf
ofer@beginnings:~/stocator/conf$ cp core-site.xml.template ~/spark-2.0.1/conf/core-site.x

Use the Keystone Version 2 configuration file

For Keystone Version 2, you can use this core-site.xml and make the replacements described below the listing.

Listing 2. Keystone Version 2 configuration file

<configuration>
<property>
<name>fs.swift2d.impl</name>
<value>com.ibm.stocator.fs.ObjectStoreFileSystem</value>
</property>

<!-- Keystone based authentication -->
<property>
<name>fs.swift2d.service.spark.auth.url</name>
<value>http://your.keystone.server.com:5000/v2.0/tokens</value>
</property>
<property>
<name>fs.swift2d.service.spark.public</name>
<value>true</value>
</property>
<property>
<name>fs.swift2d.service.spark.tenant</name>
<value>service</value>
</property>
<property>
<name>fs.swift2d.service.spark.password</name>
<value>passw0rd</value>
</property>
<property>
<name>fs.swift2d.service.spark.username</name>
<value>swift</value>
</property>
<property>
<name>fs.swift2d.service.spark.auth.method</name>
<value>keystone</value>
</property>
<property>
<name>fs.swift2d.service.spark.region</name>
<value>IBMCOS</value>
</property>

</configuration>

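1. Replace your.keystone.server.com with the real address of your Keystone server.

2. Replace all of the authentication credentials (tenant, username, and password) with valid credentials for your object store.

3. When you use a global Keystone, define the region property according to the Keystone region that is defined for IBM COS access.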
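
If you would rather generate core-site.xml than edit it by hand, the file can be rendered from a dictionary of property names and values. The following is a minimal sketch that uses only the Python standard library. The function name render_core_site is hypothetical, and the values are the placeholders from Listing 2, which you must replace as described above.

```python
import xml.etree.ElementTree as ET


def render_core_site(properties):
    """Return a Hadoop-style <configuration> document for name/value pairs."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder values copied from Listing 2; replace them with your own.
keystone_v2 = {
    "fs.swift2d.impl": "com.ibm.stocator.fs.ObjectStoreFileSystem",
    "fs.swift2d.service.spark.auth.url":
        "http://your.keystone.server.com:5000/v2.0/tokens",
    "fs.swift2d.service.spark.public": "true",
    "fs.swift2d.service.spark.tenant": "service",
    "fs.swift2d.service.spark.password": "passw0rd",
    "fs.swift2d.service.spark.username": "swift",
    "fs.swift2d.service.spark.auth.method": "keystone",
    "fs.swift2d.service.spark.region": "IBMCOS",
}

core_site_xml = render_core_site(keystone_v2)
print(core_site_xml)
```

Writing the returned string to conf/core-site.xml in your Spark directory should give Spark the same settings as the hand-edited file.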

Use the Keystone Version 3 configuration file

For Keystone Version 3, you can use this core-site.xml and make the replacements described below the listing. Remember that for Keystone Version 3, you use the userID and tenantID in place of the username and tenant values.

Listing 3. Keystone Version 3 configuration file

<configuration>
<property>
<name>fs.swift2d.impl</name>
<value>com.ibm.stocator.fs.ObjectStoreFileSystem</value>
</property>

<!-- Keystone based authentication -->
<property>
<name>fs.swift2d.service.spark.auth.url</name>
<value>http://your.keystone.server.com:5000/v3/auth/tokens</value>
</property>
<property>
<name>fs.swift2d.service.spark.public</name>
<value>true</value>
</property>
<property>
<name>fs.swift2d.service.spark.tenant</name>
<value>1c5c9e97c8db488baeca8d667497aef7</value>
</property>
<property>
<name>fs.swift2d.service.spark.password</name>
<value>passw0rd</value>
</property>
<property>
<name>fs.swift2d.service.spark.username</name>
<value>d2a2adb8bd924c2da1545e2e9ee7c4fe</value>
</property>
<property>
<name>fs.swift2d.service.spark.auth.method</name>
<value>keystoneV3</value>
</property>
<property>
<name>fs.swift2d.service.spark.region</name>
<value>IBMCOS</value>
</property>

</configuration>

1. Replace your.keystone.server.com with the real address of your Keystone server.

2. Replace all of the authentication credentials (tenant, username, and password) with valid credentials for your object store. Remember that for Keystone Version 3, you use the userID and tenantID in place of the username and tenant values.

3. When you use a global Keystone, define the region property according to the Keystone region that is defined for IBM COS access.
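
The Version 2 and Version 3 files differ only in a few values. The following sketch makes that difference explicit by building both property sets from one function and printing the entries that change. The function name keystone_settings and its parameters are hypothetical, and the host, IDs, and password are the placeholders from Listings 2 and 3. Port 5000 is taken from those listings and may differ in your deployment.

```python
def keystone_settings(version, host, tenant, user, password,
                      region="IBMCOS", service="spark"):
    """Return Stocator properties for Keystone Version 2 or 3.

    For Version 3, pass the tenantID and userID as `tenant` and `user`.
    """
    if version == 2:
        token_path, auth_method = "/v2.0/tokens", "keystone"
    elif version == 3:
        token_path, auth_method = "/v3/auth/tokens", "keystoneV3"
    else:
        raise ValueError("Keystone version must be 2 or 3")
    prefix = "fs.swift2d.service.{}.".format(service)
    return {
        "fs.swift2d.impl": "com.ibm.stocator.fs.ObjectStoreFileSystem",
        prefix + "auth.url": "http://{}:5000{}".format(host, token_path),
        prefix + "public": "true",
        prefix + "tenant": tenant,
        prefix + "password": password,
        prefix + "username": user,
        prefix + "auth.method": auth_method,
        prefix + "region": region,
    }


v2 = keystone_settings(2, "your.keystone.server.com",
                       "service", "swift", "passw0rd")
v3 = keystone_settings(3, "your.keystone.server.com",
                       "1c5c9e97c8db488baeca8d667497aef7",
                       "d2a2adb8bd924c2da1545e2e9ee7c4fe", "passw0rd")

# Print only the properties whose values differ between the two versions.
for name in v2:
    if v2[name] != v3[name]:
        print(name, "|", v2[name], "->", v3[name])
```

Only the authentication URL, the authentication method, and the tenant and username values change. The property names themselves stay the same even though Version 3 stores IDs in them.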

Specify the parameters programmatically in code

If you prefer to specify the parameters in code, you can use the following code example, in which SERVICE_NAME is spark.

Listing 4. Adding the configuration parameters in code

hconf = sc._jsc.hadoopConfiguration()
hconf.set("fs.swift2d.impl", "com.ibm.stocator.fs.ObjectStoreFileSystem")
hconf.set("fs.swift2d.service.spark.auth.url", "http://your.authentication.server.com/v2.0/tokens")
hconf.set("fs.swift2d.service.spark.public", "true")
hconf.set("fs.swift2d.service.spark.tenant", "service")
hconf.set("fs.swift2d.service.spark.username", "swift")
hconf.set("fs.swift2d.service.spark.auth.method", "keystone")
hconf.set("fs.swift2d.service.spark.password", "passw0rd")
hconf.set("fs.swift2d.service.spark.region", "IBMCOS")
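
Listing 4 hard-codes the service name spark in every key. As a sketch of how the same calls could be parameterized by SERVICE_NAME, the helper below loops over a dictionary of settings. The names apply_stocator_settings and RecordingConf are hypothetical. RecordingConf only stands in for sc._jsc.hadoopConfiguration() so that the example runs without a Spark installation.

```python
def apply_stocator_settings(hconf, service_name, settings):
    """Set the Stocator driver class and each per-service setting on hconf."""
    hconf.set("fs.swift2d.impl", "com.ibm.stocator.fs.ObjectStoreFileSystem")
    for key, value in settings.items():
        hconf.set("fs.swift2d.service.{}.{}".format(service_name, key), value)


class RecordingConf(object):
    """Stand-in for the Hadoop configuration object; it only records set()."""

    def __init__(self):
        self.values = {}

    def set(self, name, value):
        self.values[name] = value


hconf = RecordingConf()
apply_stocator_settings(hconf, "spark", {
    "auth.method": "keystone",  # placeholder values from Listing 4
    "region": "IBMCOS",
})
for name in sorted(hconf.values):
    print(name, "=", hconf.values[name])
```

In a real PySpark session you would pass sc._jsc.hadoopConfiguration() instead of RecordingConf(), together with the full set of settings from Listing 4.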

Table 1 describes each parameter.

Table 1. Configuration parameters

Start Spark with Stocator enabled

Before you can use Stocator to access IBM COS objects from Spark, you must either statically recompile Spark together with Stocator, or dynamically pass Stocator's libraries to Spark.

To recompile Spark from source so that it includes the Stocator driver, see the instructions in the Stocator repository on GitHub.

To use Spark with Stocator's standalone jar library without recompiling it, run Spark with the --jars option. In our environment, we used version 1.0.8 of Stocator, so the name of the standalone jar library is stocator-1.0.8-SNAPSHOT-jar-with-dependencies.jar. In our example environment, the option passed to Spark is: --jars stocator-1.0.8-SNAPSHOT-jar-with-dependencies.jar.

ofer@beginnings:~$ ~/spark-2.0.1/bin/spark-shell \
--jars stocator-1.0.8-SNAPSHOT-jar-with-dependencies.jar
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.1
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.7.0_111)
Type in expressions to have them evaluated.
Type :help for more information.

scala>

Access IBM COS objects from Spark

After Stocator is enabled in Spark, you can access IBM COS objects from Spark by using the pattern swift2d://<container>.<service>/. The swift2d keyword tells Spark which driver to use to access the storage. It indicates that you are using Stocator to access an object store. The container and the service are described in more detail in the next section.

For example, the following Python code reads a JSON object named data.json from IBM COS and writes it back as a Parquet object named data.parquet.

Listing 6. Accessing IBM COS objects

df = sqlContext.read.json("swift2d://vault.spark/data.json")
df.write.parquet("swift2d://vault.spark/data.parquet")
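
The swift2d://<container>.<service>/ pattern can also be composed and taken apart with the Python standard library, which may help when object names are built at run time. This is only a sketch: the function names are hypothetical, and it assumes that the service name is whatever follows the last dot in the host part of the URI.

```python
from urllib.parse import urlsplit


def swift2d_uri(container, service, object_name=""):
    """Compose a swift2d URI from a container, a service, and an object name."""
    return "swift2d://{}.{}/{}".format(container, service, object_name)


def split_swift2d_uri(uri):
    """Return (container, service, object_name) for a swift2d URI."""
    parts = urlsplit(uri)
    if parts.scheme != "swift2d":
        raise ValueError("not a swift2d URI: {!r}".format(uri))
    # Assumption: the service name is the text after the last dot.
    container, dot, service = parts.netloc.rpartition(".")
    if not (dot and container and service):
        raise ValueError("expected <container>.<service> in {!r}".format(uri))
    return container, service, parts.path.lstrip("/")


uri = swift2d_uri("vault", "spark", "data.json")
print(uri)
print(split_swift2d_uri(uri))
```

With the values from Listing 6, the composed URI is the same string that is passed to sqlContext.read.json.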

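Test the connection between Spark and IBM COS

To test the connection between Spark and IBM COS, we used a simple Python script that distributes a single list of six elements across the Spark cluster, writes the data to a Parquet object, and finally reads the object back. The name of the Parquet object is passed to the script as an argument.

The data is displayed twice: the first time before it is written to the object store, together with its schema, and the second time after it is read back from the object store.

Listing 7. Python script for testing the connection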

import sys

from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.types import IntegerType, StringType, StructField, StructType

sc = SparkContext()
sqlContext = SQLContext(sc)

# The name of the Parquet object to create is the only argument.
if len(sys.argv) != 2:
    print("ERROR: This program takes object name as input")
    sys.exit(1)

objectName = sys.argv[1]

myList = [[1,'a'],[2,'b'],[3,'c'],[4,'d'],[5,'e'],[6,'f']]
# Distribute the list across the cluster, then collect the rows back to the driver.
parallelList = sc.parallelize(myList).collect()
schema = StructType([StructField('column1', IntegerType(), False),
                     StructField('column2', StringType(), False)])
df = sqlContext.createDataFrame(parallelList, schema)
df.printSchema()
df.show()
# Reduce the DataFrame to a single partition before writing it.
dfTarget = df.coalesce(1)
dfTarget.write.parquet("swift2d://vault.spark/" + objectName)
dfRead = sqlContext.read.parquet("swift2d://vault.spark/" + objectName)
dfRead.show()
print("Done!")

To run the script, complete the following steps:

1. Save the code in Listing 7 as a file named sniff.test.py.

2. Create a container named vault.

3. Set the service (the word that appears after the container name in the URL) to the SERVICE_NAME defined in the core-site.xml file (remember that our example uses spark).

4. Issue the following command, in which testing.parquet is the name of the object to create and read: spark-submit --jars stocator-1.0.8-SNAPSHOT-jar-with-dependencies.jar sniff.test.py testing.parquet.

You will see a testing.parquet object in IBM COS, along with the following Spark output:

Listing 8. Spark output confirming the connection

root
 |-- column1: integer (nullable = false)
 |-- column2: string (nullable = false)

+-------+-------+
|column1|column2|
+-------+-------+
|      1|      a|
|      2|      b|
|      3|      c|
|      4|      d|
|      5|      e|
|      6|      f|
+-------+-------+

+-------+-------+
|column1|column2|
+-------+-------+
|      1|      a|
|      2|      b|
|      3|      c|
|      4|      d|
|      5|      e|
|      6|      f|
+-------+-------+

Done!
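
Listing 8 is checked by eye: the two tables should show the same rows. As a sketch of how that check could be automated, the helper below compares the rows that were written with the rows that were read back. It ignores row order, on the assumption that rows may not come back in the order they were written. The function name same_rows is hypothetical, and plain Python lists stand in for the results of df.collect() and dfRead.collect().

```python
from collections import Counter


def same_rows(written, read_back):
    """Return True when both collections hold the same rows, ignoring order."""
    return Counter(map(tuple, written)) == Counter(map(tuple, read_back))


written = [[1, 'a'], [2, 'b'], [3, 'c'], [4, 'd'], [5, 'e'], [6, 'f']]
# The same six rows in a different order, as a read might return them.
read_back = [(4, 'd'), (5, 'e'), (6, 'f'), (1, 'a'), (2, 'b'), (3, 'c')]

print(same_rows(written, read_back))
```

In the script from Listing 7, the equivalent call would be same_rows(df.collect(), dfRead.collect()), placed before the final print.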

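Conclusion

By configuring Spark, Stocator, and IBM Cloud Object Storage to work together, you can use object storage semantics to access and analyze your stored data faster, without relying on older storage connectors that were designed to work with file systems.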

 

   