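As big data has grown, so have the demands placed on processing it. MapReduce, the established batch-processing framework, suits offline computation but cannot serve workloads with strict real-time requirements, such as real-time recommendation and user behavior analysis.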
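Spark Streaming is a real-time computing framework built on Spark. With its rich API and in-memory, high-speed execution engine, users can combine streaming, batch, and interactive query applications. This article describes in detail the principles, characteristics, and applicable scenarios of the Spark Streaming real-time computing framework.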
The Spark Streaming Real-Time Computing Framework
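Spark is a distributed computing framework similar to MapReduce. Its core abstraction is the resilient distributed dataset (RDD), which offers a richer model than MapReduce and allows fast, repeated in-memory iteration over a dataset, supporting complex data mining and graph computation algorithms. Spark Streaming is a real-time computing framework built on Spark that extends Spark's ability to process large-scale streaming data.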
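The advantages of Spark Streaming are:
It can run on 100+ nodes and achieve second-level latency.
It uses the in-memory Spark as its execution engine, which makes it efficient and fault-tolerant.
It integrates with Spark's batch processing and interactive queries.
It provides simple, batch-like interfaces for implementing complex algorithms.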
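Figure 1 shows the overall architecture of Spark Streaming based on Spark on Yarn on the Yunti cluster. The startup process of Spark on Yarn is described in detail in another article of mine ("An In-Depth Analysis of the Alibaba Yunti Yarn Cluster" (《深入剖析阿里巴巴云梯Yarn集群》), Programmer magazine (《程序员》), November 2013 issue) and is not repeated here. After Spark on Yarn starts, the Spark AppMaster submits the Receiver as a Task to one of the Spark Executors. Once started, the Receiver reads the input data, generates data blocks, and notifies the Spark AppMaster. The Spark AppMaster generates the corresponding Job from the data blocks and submits the Job's Tasks to idle Spark Executors for execution. The thick blue arrows in the figure show the data stream being processed. Input streams can come from disk, the network, HDFS, and so on; output can go to HDFS, databases, and so on.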

Figure 1: Overall architecture of Spark Streaming on Yunti
The basic principle of Spark Streaming is to split the input data stream into time slices (on the order of seconds) and then process the data of each time slice in a batch-like manner, as shown in Figure 2.

Figure 2: Basic principle of Spark Streaming
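First, Spark Streaming cuts the real-time input data stream into blocks by time slice Δt (for example, 1 second). It treats each block as an RDD and processes each small block with RDD operations. Each block generates one Spark Job, and the final results are likewise returned block by block.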
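The slicing step can be illustrated with a small toy model in plain Python. This is only a sketch under stated assumptions, not Spark code: `slice_into_batches` and `process_batch` are hypothetical names, events carry an explicit timestamp, each Python list stands in for one RDD, and a word count stands in for the per-slice Spark Job.

```python
from collections import defaultdict


def slice_into_batches(events, batch_interval):
    """Group (timestamp, value) events into consecutive time slices.

    Slice k holds events with k*batch_interval <= timestamp < (k+1)*batch_interval.
    Simplification: slices with no events are skipped in this toy model.
    """
    batches = defaultdict(list)
    for timestamp, value in events:
        batches[int(timestamp // batch_interval)].append(value)
    # Return the slices in time order; each list stands in for one RDD.
    return [batches[k] for k in sorted(batches)]


def process_batch(batch):
    """Stand-in for the job run on one slice: a word count."""
    counts = defaultdict(int)
    for word in batch:
        counts[word] += 1
    return dict(counts)


events = [(0.2, "a"), (0.7, "b"), (1.1, "a"), (1.9, "a"), (2.5, "c")]
batches = slice_into_batches(events, batch_interval=1.0)
results = [process_batch(b) for b in batches]  # one result per time slice
print(results)  # [{'a': 1, 'b': 1}, {'a': 2}, {'c': 1}]
```

Note that the output is a list of per-slice results rather than one global result, which mirrors the statement that results are returned block by block.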
The internal implementation of Spark Streaming is described next.
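Writing a program with Spark Streaming is very similar to writing a Spark program. A Spark program performs batch processing mainly through the interfaces provided by the RDD (Resilient Distributed Dataset), such as map, reduce, and filter. A Spark Streaming program instead works through the interfaces provided by the DStream (a sequence of RDDs representing a data stream), which are similar to those of the RDD. Figures 3 and 4 show how a Spark Streaming program is converted into Spark jobs.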
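The similarity between the two programming models can be sketched in plain Python. This is an analogy only, not the Spark or Spark Streaming API: a list stands in for an RDD, a list of lists stands in for a DStream, and the hypothetical `pipeline` function uses Python's built-in `filter`, `map`, and `functools.reduce`.

```python
from functools import reduce


def pipeline(records):
    """A batch-style computation on one dataset: sum of squares of the even numbers."""
    evens = filter(lambda x: x % 2 == 0, records)
    squares = map(lambda x: x * x, evens)
    return reduce(lambda a, b: a + b, squares, 0)


# Batch case: one dataset (stand-in for a single RDD).
batch_result = pipeline([1, 2, 3, 4])

# Streaming case: a sequence of datasets (stand-in for a DStream).
# The same pipeline runs once per time slice.
stream = [[1, 2], [3, 4], [5, 6, 8]]
stream_results = [pipeline(batch) for batch in stream]

print(batch_result)    # 20
print(stream_results)  # [4, 16, 100]
```

The point of the sketch is that the per-slice logic is unchanged between the two cases; only the outer loop over time slices is new.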

Figure 3: Conversion of a Spark Streaming program into a DStream Graph

Figure 4: Conversion of a DStream Graph into Spark jobs
In Figure 3, Spark Streaming converts the operations that the program applies to DStreams into a DStream Graph. In Figure 4, the DStream Graph produces one RDD Graph for each time slice. For each output operation (such as print or foreach), Spark Streaming creates a Spark action, and for each Spark action it produces a corresponding Spark job and hands it to the JobManager. The JobManager maintains a Jobs queue in which Spark jobs are stored. It submits each Spark job to the Spark Scheduler, which is responsible for scheduling Tasks onto the appropriate Spark Executors for execution.
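The conversion described here (operations recorded once as a graph, then instantiated per time slice as queued jobs) can be modeled in a few lines of plain Python. This is a hedged toy model, not Spark Streaming's implementation: `ToyDStream`, `build_job`, and `job_queue` are hypothetical names, and a FIFO loop stands in for the JobManager and scheduler.

```python
from collections import deque


class ToyDStream:
    """Records transformations lazily, like a chain of nodes in a DStream graph."""

    def __init__(self, ops=()):
        self.ops = tuple(ops)  # ordered (kind, function) pairs

    def map(self, fn):
        return ToyDStream(self.ops + (("map", fn),))

    def filter(self, fn):
        return ToyDStream(self.ops + (("filter", fn),))

    def build_job(self, batch, output_fn):
        """Instantiate the recorded graph for one time slice as a runnable job."""
        def job():
            data = list(batch)
            for kind, fn in self.ops:
                if kind == "map":
                    data = [fn(x) for x in data]
                else:
                    data = [x for x in data if fn(x)]
            output_fn(data)  # the output operation triggers the actual work
        return job


# The program defines the graph once; nothing is computed yet.
words = ToyDStream().map(str.lower).filter(lambda w: w != "the")

collected = []
job_queue = deque()
for batch in [["The", "Cat"], ["THE", "Dog", "Sat"]]:
    # One output operation yields one job per time slice, queued for scheduling.
    job_queue.append(words.build_job(batch, collected.append))

while job_queue:
    job_queue.popleft()()  # the stand-in scheduler runs jobs in FIFO order

print(collected)  # [['cat'], ['dog', 'sat']]
```

Separating graph definition from job creation is what lets the same program text be applied to every incoming time slice.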
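Another major advantage of Spark Streaming is its fault tolerance. An RDD remembers the operations that created it, and every batch of input data is replicated in memory. If a node failure causes the data on that node to be lost, the final result can be recomputed on other nodes from the replicated data.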
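The recovery idea (replicated input plus remembered operations) can be sketched as follows. This is a toy model under stated assumptions, not Spark's recovery mechanism: `ToyCluster`, `store_replicated`, and `compute` are hypothetical names, dictionaries stand in for node memory, and a list of functions stands in for the operations an RDD remembers.

```python
class ToyCluster:
    """Keeps each input batch on two nodes, as a stand-in for in-memory replication."""

    def __init__(self, node_names):
        self.nodes = {name: {} for name in node_names}

    def store_replicated(self, batch_id, data, primary, backup):
        self.nodes[primary][batch_id] = list(data)
        self.nodes[backup][batch_id] = list(data)

    def fail(self, name):
        del self.nodes[name]  # everything held by this node is lost

    def find_input(self, batch_id):
        for store in self.nodes.values():
            if batch_id in store:
                return store[batch_id]
        raise LookupError(f"no surviving replica of batch {batch_id}")


def compute(lineage, data):
    """Replay the recorded operations (the lineage) over the input data."""
    for fn in lineage:
        data = [fn(x) for x in data]
    return data


cluster = ToyCluster(["node1", "node2", "node3"])
cluster.store_replicated(0, [1, 2, 3], primary="node1", backup="node2")
lineage = [lambda x: x + 1, lambda x: x * 10]

first_result = compute(lineage, cluster.find_input(0))
cluster.fail("node1")  # assume the result computed on node1 is lost with it
recovered = compute(lineage, cluster.find_input(0))  # recompute from the replica

print(recovered)  # [20, 30, 40]
```

Because the operations are deterministic, replaying them over the surviving replica reproduces the lost result without storing the result itself redundantly.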
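True to its original goal, Spark Streaming uses a rich API and an in-memory, high-speed computing engine to let users combine stream processing, batch processing, and interactive queries. It is therefore well suited to applications that need to analyze historical and real-time data together. It is also fully capable of handling applications whose real-time requirements are not especially strict. In addition, the data reuse mechanism of RDDs makes more efficient fault-tolerant processing possible.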