In this article we will learn how to build a real-time analytics dashboard using Apache
Spark Streaming, Kafka, Node.js, Socket.IO and Highcharts.

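Problem Statement
An e-commerce portal wants to build a real-time analytics dashboard that visualizes the number of orders shipped every minute, so that it can improve the efficiency of its logistics.
Solution
Before building the solution, let's take a quick look at the tools we will use: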
Apache Spark: a fast, general-purpose engine for large-scale data processing. Spark's batch processing can be up to about 10 times faster than Hadoop MapReduce, and its in-memory data analysis up to about 100 times faster.
Python: a widely used high-level, general-purpose, interpreted, dynamic programming language.
Kafka: a high-throughput, distributed publish-subscribe messaging system.
Node.js: an event-driven I/O server-side JavaScript environment that runs on the V8 engine.
Socket.io: Socket.IO is a JavaScript library for building real-time web applications. It enables real-time, bidirectional communication between web clients and servers.
Highcharts: interactive JavaScript charts for web pages.
CloudxLab: provides a real cloud-based environment for practicing and learning various tools. You can start practicing right away by signing up online.
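How do we build the data pipeline?
Below is the high-level architecture diagram of the data pipeline.

Data pipeline

Real-time analytics dashboard
Let's start with a description of each stage in the data pipeline and build the solution as we go.
Stage 1
When a customer buys an item, or when an order's status changes in the order management system, the corresponding order ID, together with the order status and time, is pushed to the corresponding Kafka topic.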
Dataset
Since we do not have a real online e-commerce portal, we will simulate one with a dataset of CSV files. Let's take a look at the dataset:
DateTime, OrderId, Status
2016-07-13 14:20:33,xxxxx-xxx,processing
2016-07-13 14:20:34,xxxxx-xxx,shipped
2016-07-13 14:20:35,xxxxx-xxx,delivered
The dataset contains three columns: "DateTime", "OrderId" and "Status". Each row represents the status of an order at a particular time. Here we use "xxxxx-xxx" to stand for the order ID. We are only interested in the number of orders shipped every minute, so we do not need the actual order IDs.
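To make the target metric concrete, here is a minimal Python sketch that counts shipped orders per minute from rows in the format shown above. The function name shipped_per_minute is an assumption made for illustration; it is not part of the project.

```python
import csv
from collections import Counter
from datetime import datetime
from io import StringIO


def shipped_per_minute(csv_text):
    """Count rows whose Status is 'shipped', grouped by minute."""
    counts = Counter()
    for row in csv.reader(StringIO(csv_text)):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        date_time, _order_id, status = (field.strip() for field in row)
        if status != "shipped":
            continue  # this also skips the header row
        try:
            ts = datetime.strptime(date_time, "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # skip rows with an unparseable timestamp
        counts[ts.strftime("%Y-%m-%d %H:%M")] += 1
    return dict(counts)


sample = """DateTime, OrderId, Status
2016-07-13 14:20:33,xxxxx-xxx,processing
2016-07-13 14:20:34,xxxxx-xxx,shipped
2016-07-13 14:20:35,xxxxx-xxx,delivered
"""
print(shipped_per_minute(sample))
```

The order ID is read but never used, which matches the point made above: only the timestamp and the status matter for this dashboard.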
The dataset is located in the project's spark-streaming/data/order_data folder.
Pushing the dataset to Kafka
A shell script reads each line from these CSV files and pushes it to Kafka. After pushing one CSV file, it waits 1 minute before pushing the next one. This simulates a live e-commerce portal, where order statuses are updated at varying intervals. In a real-world deployment, the order details would be pushed to Kafka whenever an order's status changes.
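The replay behaviour described above can be sketched in Python as follows. This is not the project's shell script: the function push_files and its parameters are hypothetical, and send stands in for whatever actually publishes a line to Kafka (for example a wrapper around a producer client). The pause function is injectable so the sketch can be tried without waiting a full minute.

```python
import time
from pathlib import Path


def push_files(data_dir, send, pause_seconds=60, sleep=time.sleep):
    """Send every non-empty line of every file in data_dir, pausing between files.

    `send` is a caller-supplied callable that publishes one line
    (hypothetical stand-in for a Kafka producer).
    Returns the number of lines sent.
    """
    files = sorted(p for p in Path(data_dir).iterdir() if p.is_file())
    sent = 0
    for index, path in enumerate(files):
        with path.open() as handle:
            for line in handle:
                line = line.strip()
                if line:
                    send(line)
                    sent += 1
        # Wait before the next file, but not after the last one.
        if index < len(files) - 1:
            sleep(pause_seconds)
    return sent
```

Calling push_files("../data/order_data/", print) would print each line instead of publishing it, which is a convenient way to check the replay order before wiring in a real producer.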
Run the shell script to push the data to the Kafka topic. Log in to the CloudxLab
web console and run the following commands.
# Clone the repository
git clone https://github.com/singhabhinav/cloudxlab.git
# Create the order-data topic in Kafka
export PATH=$PATH:/usr/hdp/current/kafka-broker/bin
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic order-data
# Go to the Kafka directory
cd cloudxlab/spark-streaming/kafka
# Run the script that pushes data to the Kafka topic
# ip-172-31-13-154.ec2.internal is the hostname of a broker.
# Find the list of brokers in Ambari (a.cloudxlab.com:8080).
# Use the hostname of any one of the brokers
# order-data is the Kafka topic
/bin/bash put_order_data_in_topic.sh ../data/order_data/ ip-172-31-13-154.ec2.internal:6667 order-data
# The script pushes the CSV files to the Kafka topic one by one, at one-minute intervals
# Let the script run. Do not close the terminal
Stage 2
After stage 1, each message in the Kafka "order-data" topic looks like this:
2016-07-13 14:20:33,xxxxx-xxx,processing
Stage 3
The Spark Streaming code reads data from the "order-data" Kafka topic in 60-second windows and processes it, counting the orders in each status within that window. After processing, the total count for each order status is pushed to the "order-one-min-data" Kafka topic.
Run the Spark Streaming code in the web console:
# Log in to the CloudxLab web console in a second tab
# Create the order-one-min-data Kafka topic
export PATH=$PATH:/usr/hdp/current/kafka-broker/bin
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic order-one-min-data
# Go to the spark directory
cd cloudxlab/spark-streaming/spark
# Run the Spark Streaming code
spark-submit --jars spark-streaming-kafka-assembly_2.10-1.6.0.jar spark_streaming_order_status.py localhost:2181 order-data
# Let the script run. Do not close the terminal
Stage 4
At this stage, each message in the "order-one-min-data" Kafka topic is a JSON string similar to the following:
{ "shipped": 657, "processing": 987, "delivered": 1024 }
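The aggregation that turns one window of "order-data" messages into a JSON string of this shape can be illustrated in plain Python. This is only a stand-in for the logic, not the Spark code in spark_streaming_order_status.py; the function name count_statuses is an assumption, and the input list plays the role of the messages collected in one 60-second window.

```python
import json
from collections import Counter


def count_statuses(messages):
    """Count messages per order status and return the totals as a JSON string.

    Each message is expected to look like
    '2016-07-13 14:20:33,xxxxx-xxx,processing'.
    """
    counts = Counter()
    for message in messages:
        parts = message.split(",")
        if len(parts) == 3:  # ignore malformed messages
            counts[parts[2].strip()] += 1
    return json.dumps(dict(counts))


window = [
    "2016-07-13 14:20:33,xxxxx-xxx,processing",
    "2016-07-13 14:20:34,xxxxx-xxx,shipped",
    "2016-07-13 14:20:35,xxxxx-xxx,shipped",
]
print(count_statuses(window))
```

In a streaming job the same count would be computed per window across the cluster; the sketch only shows what goes in and what comes out.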
Stage 5
Running the Node.js server
Now we will run a Node.js server that consumes messages from the "order-one-min-data" Kafka topic and pushes them to the web browser, so that the number of orders shipped per minute can be displayed there.
Run the following commands in the web console to start the Node.js server:
# Login to CloudxLab web console in the third tab
# Go to node directory
cd cloudxlab/spark-streaming/node
# Install dependencies as specified in package.json
npm install
# Run the node server
node index.js
# Let the server run. Do not close the terminal
The node server is now running on port 3001. If you get an "EADDRINUSE" error when starting the node server, edit the index.js file and change the port to 3002, 3003, 3004 and so on in turn. You can use any available port in the range 3001-3010 to run the node server.
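Instead of trying ports one by one, a small helper can probe the range first. The following Python sketch is an optional convenience and not part of the project; the function name first_free_port is an assumption. It tries to bind each port in 3001-3010 and returns the first one that succeeds.

```python
import socket


def first_free_port(start=3001, end=3010, host=""):
    """Return the first port in [start, end] that can be bound, or None.

    The default host "" means all interfaces.
    """
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # port is in use (EADDRINUSE) or not permitted
            return port
    return None


print(first_free_port())
```

The result is only a hint: another process could take the port between this check and the moment the node server starts, in which case the "EADDRINUSE" error would still appear.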
Accessing the dashboard in the browser
After starting the node server, go to http://YOUR_WEB_CONSOLE:PORT_NUMBER
to access the real-time analytics dashboard. If your web console is f.cloudxlab.com and the node server is running on port 3002, go to
http://f.cloudxlab.com:3002 to access the dashboard.
When we visit this URL, the socket.io-client library is loaded in the browser and opens a bidirectional communication channel between the server and the browser.
Stage 6
As soon as a new message arrives in the "order-one-min-data" Kafka topic, the node process consumes it. The consumed message is then sent to the web browser through socket.io.
Stage 7
As soon as socket.io-client in the web browser receives a new "message" event, the data in the event is processed. If the received data contains the "shipped" order status, its count is added to the Highcharts chart and displayed in the browser.
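In the project this filtering happens in browser JavaScript. The Python sketch below mirrors only the decision logic, for illustration: it parses a message in the Stage 4 format and extracts the "shipped" count as a chart point. The function name shipped_point and the [timestamp, value] point layout are assumptions, not taken from the project's code.

```python
import json


def shipped_point(message, timestamp_ms):
    """Return [timestamp_ms, shipped_count] for a message, or None.

    `message` is a JSON string such as
    '{ "shipped": 657, "processing": 987, "delivered": 1024 }'.
    """
    data = json.loads(message)
    if "shipped" not in data:
        return None  # nothing to plot for this minute
    return [timestamp_ms, int(data["shipped"])]


example = '{ "shipped": 657, "processing": 987, "delivered": 1024 }'
print(shipped_point(example, 1468419600000))
```

The other statuses are deliberately ignored, since the dashboard only plots orders shipped per minute.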
Screenshot
We have successfully built a real-time analytics dashboard. This is a basic example that shows how to integrate Spark Streaming, Kafka, Node.js and Socket.IO to build one. With these fundamentals in place, we can use the same tools to build more complex systems.