Introduction
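Spark is an excellent computing platform: it supports multiple languages, and its in-memory computation is very fast. The open-source community around it is also very active.
However, Spark still falls short on ease of use. For newcomers, getting started and setting up an environment is somewhat difficult. And if you want to turn results into charts to share with others, there is still a long way to go.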
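Several solutions already exist:
[TBD] Jupyter Notebook
Very widely used, but it appears to be mainly an enhanced version of the earlier ipython-notebook.
The author is not yet very familiar with it.
DataBricks Community Edition, offered by DataBricks, the company behind Spark, which bundles a Spark cluster with a notebook.
Both usability and functionality are good. The drawback is that the cluster is hosted on AWS and cannot be connected to your own local Spark cluster.
Apache Zeppelin
This project has only just graduated from Incubation,
but it has already been adopted by a number of large companies, such as Meituan and Microsoft.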
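This article mainly describes how to set up Zeppelin locally to make Spark easier to use, and to make it convenient to present the results of your work to customers.
Here is a screenshot borrowed from someone else to open the post ^_^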

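Note:
Zeppelin ships with its own Spark instance, so you can learn Zeppelin without building a Spark cluster yourself.
The latest Zeppelin release at the time of writing (August 19, 2016) is 0.6.1, which is only compatible with Spark 2.0+.
1) If you have a local Spark cluster running 1.6.1 + Scala 2.10, download Zeppelin 0.6.0.
2) If the official site is slow, you can download from Baidu Pan using the link below.
Link: http://pan.baidu.com/s/1ctBBJo Password: e68g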
1. Download
If you need version 0.6.0, use the Baidu Pan download link above.
If you need version 0.6.1 or later, download it directly from the official site; the mirrors listed there are usually reasonably fast.
2. Installation
Versions: Zeppelin 0.6.0 + self-built Spark cluster (1.6.1)
Zeppelin still feels immature: it does not work out of the box and needs a fair amount of manual adjustment before it runs properly.
1) After unpacking, first create a new zeppelin-env.sh from the template and set SPARK_HOME. For example:
export SPARK_HOME=/usr/lib/spark
If the Spark cluster is built on Hadoop or Mesos, additional settings are required.
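The step above could be scripted roughly as follows. This is only a sketch: it assumes the template is named zeppelin-env.sh.template and sits in the same conf directory as the file to be created, and the helper name create_env_file is made up for illustration.

```python
import shutil
from pathlib import Path


def create_env_file(conf_dir, spark_home):
    """Copy zeppelin-env.sh.template to zeppelin-env.sh and append SPARK_HOME.

    Assumption: both files live in conf_dir (hypothetical layout).
    """
    conf = Path(conf_dir)
    template = conf / "zeppelin-env.sh.template"
    target = conf / "zeppelin-env.sh"
    if not target.exists():
        # Only create the file the first time; keep an existing one untouched.
        shutil.copy(template, target)
    export_line = "export SPARK_HOME={}".format(spark_home)
    if export_line not in target.read_text().splitlines():
        # Append the export once, so re-running the helper is harmless.
        with target.open("a") as fh:
            fh.write("\n" + export_line + "\n")
    return target
```

For example, create_env_file("conf", "/usr/lib/spark") run from the unpacked Zeppelin directory would leave the same export line as shown above in conf/zeppelin-env.sh.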
2) Create a new zeppelin-site.xml from the template and change the default port 8080 to, for example, 8089, to avoid conflicts with Tomcat and similar services.
<property>
<name>zeppelin.server.port</name>
<value>8089</value>
<description>Server port.</description>
</property>
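If you prefer to make this change programmatically, something like the following might work. It is a minimal sketch that assumes the file is a flat list of property elements like the one above; set_property is a hypothetical helper name. Note that the standard-library parser drops comments and processing instructions, so editing the file by hand may be preferable if you want to keep them.

```python
import xml.etree.ElementTree as ET


def set_property(xml_text, name, value):
    """Return xml_text with the <value> of the named <property> replaced."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            prop.find("value").text = str(value)
            break
    else:
        # No <property> with that <name> was found.
        raise KeyError(name)
    return ET.tostring(root, encoding="unicode")
```

Calling set_property(text, "zeppelin.server.port", 8089) on the contents of zeppelin-site.xml would return the XML with the port value changed, which you can then write back to the file.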
3) Replace the jackson libraries
a) The bundled version is 2.5.*, but the version actually required at runtime is 2.4.4.
b) 2.4.4 and 2.5.* may not be fully compatible.
c) So 2.5.* has to be replaced with 2.4.4. The following 3 jars need to be replaced:
jackson-annotations-2.4.4.jar
jackson-core-2.4.4.jar
jackson-databind-2.4.4.jar
d) This one really is a nasty trap...
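The jar swap in step 3 can be sketched as a small script. The article does not say which directory holds the bundled jars or where the 2.4.4 jars come from, so lib_dir and replacement_dir below are placeholders you would have to fill in yourself, and swap_jackson_jars is an illustrative name.

```python
import shutil
from pathlib import Path

JACKSON_MODULES = ("jackson-annotations", "jackson-core", "jackson-databind")


def swap_jackson_jars(lib_dir, replacement_dir, new_version="2.4.4"):
    """Remove bundled 2.5.* Jackson jars from lib_dir and copy in new_version.

    lib_dir and replacement_dir are assumptions: point them at wherever your
    Zeppelin install keeps its jars and wherever you downloaded the 2.4.4 ones.
    """
    lib = Path(lib_dir)
    replacements = [
        Path(replacement_dir) / "{}-{}.jar".format(module, new_version)
        for module in JACKSON_MODULES
    ]
    # Check every replacement first so a missing jar cannot leave a half-done swap.
    for jar in replacements:
        if not jar.is_file():
            raise FileNotFoundError(str(jar))
    for module, jar in zip(JACKSON_MODULES, replacements):
        for old in lib.glob("{}-2.5.*.jar".format(module)):
            old.unlink()
        shutil.copy(jar, lib / jar.name)
    return [jar.name for jar in replacements]
```

It is probably wise to back up the original jars before running anything like this.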
After the steps above, Zeppelin can be started.
Start/stop commands:
bin/zeppelin-daemon.sh start
bin/zeppelin-daemon.sh stop
Once it is running, open http://localhost:8089 to see the Zeppelin home page.

3. Configure the Spark interpreter
The Spark interpreter is very simple to configure; refer to the configuration shown in the figure below:

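4. A few usage tips
Zeppelin comes with a fairly detailed tutorial, and reading the bundled tutorial notebook is probably the better way to learn. Still, I ran into quite a few pitfalls the first time I used it, so I am recording them here for reference: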
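(1) A job does not stop automatically after it is submitted
After Zeppelin submits a job, the Spark Master UI shows that the job does not exit on its own, even once it has finished executing.
This is because Zeppelin by default behaves as if someone had manually run spark-shell spark://master-ip:7077: unless the shell is closed manually, it keeps holding on to its resources.
The solution is to restart the Spark interpreter.
Ways to restart it manually:
1. Open the Interpreter page, find the Spark section, and click restart.
2. Recommended: call the RESTful API to restart it.
a. You can use the Network panel in Chrome to see exactly which API is called after clicking restart, as shown in the figure below: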

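b. The ID (2BUDQXH2R) may differ from one environment to another. Also, this API uses the PUT method, so the following Python code can be run directly in the UI to restart the interpreter automatically:
%python
import requests
# Replace IP and the setting ID with the values from your own environment.
r = requests.put("http://IP:8089/api/interpreter/setting/restart/2BUDQXH2R")
print(r.text)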
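Since the ID differs between environments, one option is to look it up instead of hard-coding it. The sketch below uses only the Python 3 standard library and assumes that GET /api/interpreter/setting returns JSON shaped like {"status": "OK", "body": [{"id": ..., "name": ...}, ...]}; that shape, and the helper names, are assumptions to verify against your own Zeppelin version.

```python
import json
import urllib.request


def find_setting_id(payload, name="spark"):
    """Return the id of the interpreter setting called `name`.

    `payload` is the parsed JSON response; its shape is an assumption
    (see the note above) and may differ between Zeppelin versions.
    """
    for setting in payload.get("body", []):
        if setting.get("name") == name:
            return setting["id"]
    raise LookupError("no interpreter setting named {!r}".format(name))


def restart_interpreter(base_url, name="spark"):
    """Look up the setting id, then send the same PUT request as the snippet above."""
    with urllib.request.urlopen(base_url + "/api/interpreter/setting") as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    setting_id = find_setting_id(payload, name)
    req = urllib.request.Request(
        base_url + "/api/interpreter/setting/restart/" + setting_id,
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

With this, restart_interpreter("http://IP:8089") would restart whichever setting is named spark, without copying the ID out of the browser first.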
(2) Error message: Cannot call methods on a stopped SparkContext
For example, if you kill the current job in the Spark Master UI and then re-run the task in Zeppelin, you will hit this exception.
The fix is simple: restart the interpreter.
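(3) Do not call sc.stop() yourself
The official documentation states this explicitly: as with the Scala spark-shell, SparkContext, SqlContext and so on are initialized automatically.
You cannot call sc.stop() yourself and then create a new SparkContext.
Perhaps it is down to the author's skill level, but after trying to create a new sc myself I ran into all kinds of strange problems.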
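(4) About Python modules
The Python interpreter can use every module available to the Python installation on the machine where Zeppelin runs, and it supports both Python 2 and Python 3.
This is a very useful feature. For example, after Spark finishes a computation and produces a CSV file that is not too large, you can use Pandas and its powerful processing capabilities for a second pass, and finally use Zeppelin's automatic charting to produce a report.
Compared with BI tools such as Tableau it is less capable, but each has its strengths. For programmers, Zeppelin is a very convenient tool that greatly reduces the effort spent on simple day-to-day reports.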
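To illustrate the last step of that workflow, here is a rough sketch of turning CSV data into output that Zeppelin renders as a table or chart. It relies on Zeppelin's display system, where printed output beginning with %table followed by tab-separated rows is shown as a table. To keep the example dependency-free it uses the standard csv module rather than Pandas; with Pandas, df.to_csv(sep="\t", index=False) should give an equivalent body. The helper name to_zeppelin_table is made up.

```python
import csv
import io


def to_zeppelin_table(csv_text):
    """Convert CSV text into a %table string (tab-separated rows)."""
    rows = csv.reader(io.StringIO(csv_text))
    # Skip blank lines; the first remaining row is treated as the header.
    lines = ["\t".join(row) for row in rows if row]
    return "%table " + "\n".join(lines)
```

In a %python paragraph, print(to_zeppelin_table(open("result.csv").read())) would then hand the data to Zeppelin's built-in charting, where result.csv stands in for whatever file your Spark job produced.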
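(5) Notes can be scheduled to run automatically
At the very top of a note you can schedule the current notebook to run periodically. Note that you can also have the interpreter restarted automatically once the run finishes. See the figure below: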
