±à¼ÍƼö: |
±¾ÎÄÖ÷Òª½éÉÜÁËspark
Éú̬¼°ÔËÐÐÔÀí¡¢sparkÔËÐмܹ¹¡¢SparkºËÐĸÅÄîÖ®RDD¡¢SparkºËÐĸÅÄîÖ®Jobs
/ Stage¡¢Spark StreamingÔËÐÐÔÀí¡¢Spark ×ÊÔ´µ÷ÓŵÈÄÚÈÝ
±¾ÎÄÀ´×Ô²©¿ÍÔ°£¬ÓÉ»ðÁú¹ûÈí¼þAnna±à¼¡¢ÍƼö¡£ |
|
µ¼Óï
spark ÒѾ³ÉΪ¹ã¸æ¡¢±¨±íÒÔ¼°ÍƼöϵͳµÈ´óÊý¾Ý¼ÆË㳡¾°ÖÐÊ×ѡϵͳ£¬ÒòЧÂʸߣ¬Ò×ÓÃÒÔ¼°Í¨ÓÃÐÔÔ½À´Ô½µÃµ½´ó¼ÒµÄÇàíù£¬ÎÒ×Ô¼º×î½ü°ëÄêÔÚ½Ó´¥sparkÒÔ¼°spark
streamingÖ®ºó£¬¶Ôspark¼¼ÊõµÄʹÓÃÓÐһЩ×Ô¼ºµÄ¾Ñé»ýÀÛÒÔ¼°ÐĵÃÌå»á£¬ÔÚ´Ë·ÖÏí¸ø´ó¼Ò¡£
±¾ÎÄÒÀ´Î´ÓsparkÉú̬£¬ÔÀí£¬»ù±¾¸ÅÄspark streamingÔÀí¼°Êµ¼ù£¬»¹ÓÐsparkµ÷ÓÅÒÔ¼°»·¾³´î½¨µÈ·½Ãæ½øÐнéÉÜ£¬Ï£Íû¶Ô´ó¼ÒÓÐËù°ïÖú¡£
spark Éú̬¼°ÔËÐÐÔÀí

Spark ÌØµã
ÔËÐÐËÙ¶È¿ì SparkÓµÓÐDAGÖ´ÐÐÒýÇæ£¬Ö§³ÖÔÚÄÚ´æÖжÔÊý¾Ý½øÐеü´ú¼ÆËã¡£¹Ù·½ÌṩµÄÊý¾Ý±íÃ÷£¬Èç¹ûÊý¾ÝÓÉ´ÅÅ̶ÁÈ¡£¬ËÙ¶ÈÊÇHadoop
MapReduceµÄ10±¶ÒÔÉÏ£¬Èç¹ûÊý¾Ý´ÓÄÚ´æÖжÁÈ¡£¬ËÙ¶È¿ÉÒԸߴï100¶à±¶¡£
ÊÊÓó¡¾°¹ã·º ´óÊý¾Ý·ÖÎöͳ¼Æ£¬ÊµÊ±Êý¾Ý´¦Àí£¬Í¼¼ÆËã¼°»úÆ÷ѧϰ
Ò×ÓÃÐÔ ±àд¼òµ¥£¬Ö§³Ö80ÖÖÒÔÉϵĸ߼¶Ëã×Ó£¬Ö§³Ö¶àÖÖÓïÑÔ£¬Êý¾ÝÔ´·á¸»£¬¿É²¿ÊðÔÚ¶àÖÖ¼¯ÈºÖÐ
ÈÝ´íÐԸߡ£SparkÒý½øÁ˵¯ÐÔ·Ö²¼Ê½Êý¾Ý¼¯RDD (Resilient Distributed Dataset)
µÄ³éÏó£¬ËüÊÇ·Ö²¼ÔÚÒ»×é½ÚµãÖеÄÖ»¶Á¶ÔÏ󼯺ϣ¬ÕâЩ¼¯ºÏÊǵ¯ÐԵģ¬Èç¹ûÊý¾Ý¼¯Ò»²¿·Ö¶ªÊ§£¬Ôò¿ÉÒÔ¸ù¾Ý¡°ÑªÍ³¡±£¨¼´³äÐí»ùÓÚÊý¾ÝÑÜÉú¹ý³Ì£©¶ÔËüÃǽøÐÐÖØ½¨¡£ÁíÍâÔÚRDD¼ÆËãʱ¿ÉÒÔͨ¹ýCheckPointÀ´ÊµÏÖÈÝ´í£¬¶øCheckPointÓÐÁ½ÖÖ·½Ê½£ºCheckPoint
Data£¬ºÍLogging The Updates£¬Óû§¿ÉÒÔ¿ØÖƲÉÓÃÄÄÖÖ·½Ê½À´ÊµÏÖÈÝ´í¡£
SparkµÄÊÊÓó¡¾°
Ŀǰ´óÊý¾Ý´¦Àí³¡¾°ÓÐÒÔϼ¸¸öÀàÐÍ£º
¸´ÔÓµÄÅúÁ¿´¦Àí£¨Batch Data Processing£©£¬Æ«ÖصãÔÚÓÚ´¦Àíº£Á¿Êý¾ÝµÄÄÜÁ¦£¬ÖÁÓÚ´¦ÀíËÙ¶È¿ÉÈÌÊÜ£¬Í¨³£µÄʱ¼ä¿ÉÄÜÊÇÔÚÊýÊ®·ÖÖÓµ½ÊýСʱ£»
»ùÓÚÀúÊ·Êý¾ÝµÄ½»»¥Ê½²éѯ£¨Interactive Query£©£¬Í¨³£µÄʱ¼äÔÚÊýÊ®Ãëµ½ÊýÊ®·ÖÖÓÖ®¼ä
»ùÓÚʵʱÊý¾ÝÁ÷µÄÊý¾Ý´¦Àí£¨Streaming Data Processing£©£¬Í¨³£ÔÚÊý°ÙºÁÃëµ½ÊýÃëÖ®¼ä
Spark³É¹¦°¸Àý
Ŀǰ´óÊý¾ÝÔÚ»¥ÁªÍø¹«Ë¾Ö÷ÒªÓ¦ÓÃÔÚ¹ã¸æ¡¢±¨±í¡¢ÍƼöϵͳµÈÒµÎñÉÏ¡£ÔÚ¹ã¸æÒµÎñ·½ÃæÐèÒª´óÊý¾Ý×öÓ¦Ó÷ÖÎö¡¢Ð§¹û·ÖÎö¡¢¶¨ÏòÓÅ»¯µÈ£¬ÔÚÍÆ¼öϵͳ·½ÃæÔòÐèÒª´óÊý¾ÝÓÅ»¯Ïà¹ØÅÅÃû¡¢¸öÐÔ»¯ÍƼöÒÔ¼°Èȵãµã»÷·ÖÎöµÈ¡£ÕâЩӦÓó¡¾°µÄÆÕ±éÌØµãÊǼÆËãÁ¿´ó¡¢Ð§ÂÊÒªÇó¸ß¡£
ÌÚѶ / yahoo / ÌÔ±¦ / ÓÅ¿áÍÁ¶¹
sparkÔËÐмܹ¹
spark»ù´¡ÔËÐмܹ¹ÈçÏÂËùʾ£º

spark½áºÏyarn¼¯Èº±³ºóµÄÔËÐÐÁ÷³ÌÈçÏÂËùʾ£º

spark ÔËÐÐÁ÷³Ì£º
Spark¼Ü¹¹²ÉÓÃÁË·Ö²¼Ê½¼ÆËãÖеÄMaster-SlaveÄ£ÐÍ¡£MasterÊǶÔÓ¦¼¯ÈºÖеĺ¬ÓÐMaster½ø³ÌµÄ½Úµã£¬SlaveÊǼ¯ÈºÖк¬ÓÐWorker½ø³ÌµÄ½Úµã¡£
Master×÷ΪÕû¸ö¼¯ÈºµÄ¿ØÖÆÆ÷£¬¸ºÔðÕû¸ö¼¯ÈºµÄÕý³£ÔËÐУ»
WorkerÏ൱ÓÚ¼ÆËã½Úµã£¬½ÓÊÕÖ÷½ÚµãÃüÁîÓë½øÐÐ״̬»ã±¨£»
Executor¸ºÔðÈÎÎñµÄÖ´ÐУ»
Client×÷ΪÓû§µÄ¿Í»§¶Ë¸ºÔðÌá½»Ó¦Óã»
Driver¸ºÔð¿ØÖÆÒ»¸öÓ¦ÓõÄÖ´ÐС£
Spark¼¯Èº²¿Êðºó£¬ÐèÒªÔÚÖ÷½ÚµãºÍ´Ó½Úµã·Ö±ðÆô¶¯Master½ø³ÌºÍWorker½ø³Ì£¬¶ÔÕû¸ö¼¯Èº½øÐпØÖÆ¡£ÔÚÒ»¸öSparkÓ¦ÓõÄÖ´Ðйý³ÌÖУ¬DriverºÍWorkerÊÇÁ½¸öÖØÒª½ÇÉ«¡£Driver
³ÌÐòÊÇÓ¦ÓÃÂß¼Ö´ÐÐµÄÆðµã£¬¸ºÔð×÷ÒµµÄµ÷¶È£¬¼´TaskÈÎÎñµÄ·Ö·¢£¬¶ø¶à¸öWorkerÓÃÀ´¹ÜÀí¼ÆËã½ÚµãºÍ´´½¨Executor²¢Ðд¦ÀíÈÎÎñ¡£ÔÚÖ´Ðн׶Σ¬Driver»á½«TaskºÍTaskËùÒÀÀµµÄfileºÍjarÐòÁл¯ºó´«µÝ¸ø¶ÔÓ¦µÄWorker»úÆ÷£¬Í¬Ê±Executor¶ÔÏàÓ¦Êý¾Ý·ÖÇøµÄÈÎÎñ½øÐд¦Àí¡£
Excecutor /Task ÿ¸ö³ÌÐò×ÔÓУ¬²»Í¬³ÌÐò»¥Ïà¸ôÀ룬task¶àÏ̲߳¢ÐÐ
¼¯Èº¶ÔSpark͸Ã÷£¬SparkÖ»ÒªÄÜ»ñÈ¡Ïà¹Ø½ÚµãºÍ½ø³Ì
Driver ÓëExecutor±£³ÖͨÐÅ£¬Ð×÷´¦Àí
ÈýÖÖ¼¯ÈºÄ£Ê½£º
1.Standalone ¶ÀÁ¢¼¯Èº
2.Mesos, apache mesos
3.Yarn, hadoop yarn
»ù±¾¸ÅÄ
Application ⇨SparkµÄÓ¦ÓóÌÐò£¬°üº¬Ò»¸öDriver
programºÍÈô¸ÉExecutor
SparkContext⇨ SparkÓ¦ÓóÌÐòµÄÈë¿Ú£¬¸ºÔðµ÷¶È¸÷¸öÔËËã×ÊÔ´£¬Ðµ÷¸÷¸öWorker
NodeÉϵÄExecutor
Driver Program ⇨ÔËÐÐApplicationµÄmain()º¯Êý²¢ÇÒ´´½¨SparkContext
Executor ⇨ÊÇΪApplicationÔËÐÐÔÚWorker
nodeÉϵÄÒ»¸ö½ø³Ì£¬¸Ã½ø³Ì¸ºÔðÔËÐÐTask£¬²¢ÇÒ¸ºÔð½«Êý¾Ý´æÔÚÄÚ´æ»òÕß´ÅÅÌÉÏ¡£Ã¿¸öApplication¶¼»áÉêÇë¸÷×ÔµÄExecutorÀ´´¦ÀíÈÎÎñ
Cluster Manager ⇨ÔÚ¼¯ÈºÉÏ»ñÈ¡×ÊÔ´µÄÍⲿ·þÎñ
(ÀýÈ磺Standalone¡¢Mesos¡¢Yarn)
Worker Node⇨ ¼¯ÈºÖÐÈκοÉÒÔÔËÐÐApplication´úÂëµÄ½Úµã£¬ÔËÐÐÒ»¸ö»ò¶à¸öExecutor½ø³Ì
Task ⇨ÔËÐÐÔÚExecutorÉϵŤ×÷µ¥Ôª
Job ⇨SparkContextÌá½»µÄ¾ßÌåAction²Ù×÷£¬³£ºÍAction¶ÔÓ¦
Stage ⇨ÿ¸öJob»á±»²ð·ÖºÜ¶à×étask£¬Ã¿×éÈÎÎñ±»³ÆÎªStage£¬Ò²³ÆTaskSet
RDD ⇨ÊÇResilient distributed
datasetsµÄ¼ò³Æ£¬ÖÐÎÄΪµ¯ÐÔ·Ö²¼Ê½Êý¾Ý¼¯;ÊÇSpark×îºËÐĵÄÄ£¿éºÍÀà
DAGScheduler ⇨¸ù¾ÝJob¹¹½¨»ùÓÚStageµÄDAG£¬²¢Ìá½»Stage¸øTaskScheduler
TaskScheduler ⇨½«TasksetÌá½»¸øWorker
node¼¯ÈºÔËÐв¢·µ»Ø½á¹û
Transformations ⇨ÊÇSpark APIµÄÒ»ÖÖÀàÐÍ£¬Transformation·µ»ØÖµ»¹ÊÇÒ»¸öRDD£¬ËùÓеÄTransformation²ÉÓõͼÊÇÀÁ²ßÂÔ£¬Èç¹ûÖ»Êǽ«TransformationÌá½»ÊDz»»áÖ´ÐмÆËãµÄ
Action ⇨ÊÇSpark APIµÄÒ»ÖÖÀàÐÍ£¬Action·µ»ØÖµ²»ÊÇÒ»¸öRDD£¬¶øÊÇÒ»¸öscala¼¯ºÏ£»¼ÆËãÖ»ÓÐÔÚAction±»Ìá½»µÄʱºò¼ÆËã²Å±»´¥·¢¡£
SparkºËÐĸÅÄîÖ®RDD

SparkºËÐĸÅÄîÖ®Transformations / Actions

Transformation·µ»ØÖµ»¹ÊÇÒ»¸öRDD¡£ËüʹÓÃÁËÁ´Ê½µ÷ÓõÄÉè¼ÆÄ£Ê½£¬¶ÔÒ»¸öRDD½øÐмÆËãºó£¬±ä»»³ÉÁíÍâÒ»¸öRDD£¬È»ºóÕâ¸öRDDÓÖ¿ÉÒÔ½øÐÐÁíÍâÒ»´Îת»»¡£Õâ¸ö¹ý³ÌÊÇ·Ö²¼Ê½µÄ¡£
Action·µ»ØÖµ²»ÊÇÒ»¸öRDD¡£ËüҪôÊÇÒ»¸öScalaµÄÆÕͨ¼¯ºÏ£¬ÒªÃ´ÊÇÒ»¸öÖµ£¬ÒªÃ´Êǿգ¬×îÖÕ»ò·µ»Øµ½Driver³ÌÐò£¬»ò°ÑRDDдÈëµ½ÎļþϵͳÖС£
ActionÊÇ·µ»ØÖµ·µ»Ø¸ødriver»òÕß´æ´¢µ½Îļþ£¬ÊÇRDDµ½resultµÄ±ä»»£¬TransformationÊÇRDDµ½RDDµÄ±ä»»¡£
Ö»ÓÐactionÖ´ÐÐʱ£¬rdd²Å»á±»¼ÆËãÉú³É£¬ÕâÊÇrddÀÁ¶èÖ´Ðеĸù±¾ËùÔÚ¡£
SparkºËÐĸÅÄîÖ®Jobs / Stage
Job ⇨°üº¬¶à¸ötaskµÄ²¢ÐмÆË㣬һ¸öaction´¥·¢Ò»¸öjob
stage ⇨Ò»¸öjob»á±»²ðΪ¶à×étask£¬Ã¿×éÈÎÎñ³ÆÎªÒ»¸östage£¬ÒÔshuffle½øÐл®·Ö

SparkºËÐĸÅÄîÖ®Shuffle
ÒÔreduceByKeyΪÀý½âÊÍshuffle¹ý³Ì¡£

ÔÚûÓÐtaskµÄÎļþ·ÖƬºÏ²¢ÏµÄshuffle¹ý³ÌÈçÏ£º£¨spark.shuffle.consolidateFiles=false£©

fetch À´µÄÊý¾Ý´æ·Åµ½ÄÄÀ
¸Õ fetch À´µÄ FileSegment ´æ·ÅÔÚ softBuffer »º³åÇø£¬¾¹ý´¦ÀíºóµÄÊý¾Ý·ÅÔÚÄÚ´æ
+ ´ÅÅÌÉÏ¡£ÕâÀïÎÒÃÇÖ÷ÒªÌÖÂÛ´¦ÀíºóµÄÊý¾Ý£¬¿ÉÒÔÁé»îÉèÖÃÕâЩÊý¾ÝÊÇ¡°Ö»ÓÃÄڴ桱»¹ÊÇ¡°Äڴ棫´ÅÅÌ¡±¡£Èç¹ûspark.shuffle.spill
= false¾ÍÖ»ÓÃÄÚ´æ¡£ÓÉÓÚ²»ÒªÇóÊý¾ÝÓÐÐò£¬shuffle write µÄÈÎÎñºÜ¼òµ¥£º½«Êý¾Ý partition
ºÃ£¬²¢³Ö¾Ã»¯¡£Ö®ËùÒÔÒª³Ö¾Ã»¯£¬Ò»·½ÃæÊÇÒª¼õÉÙÄÚ´æ´æ´¢¿Õ¼äѹÁ¦£¬ÁíÒ»·½ÃæÒ²ÊÇΪÁË fault-tolerance¡£
shuffleÖ®ËùÒÔÐèÒª°ÑÖмä½á¹û·Åµ½´ÅÅÌÎļþÖУ¬ÊÇÒòΪËäÈ»ÉÏÒ»Åútask½áÊøÁË£¬ÏÂÒ»Åútask»¹ÐèҪʹÓÃÄÚ´æ¡£Èç¹ûÈ«²¿·ÅÔÚÄÚ´æÖУ¬ÄÚ´æ»á²»¹»¡£ÁíÍâÒ»·½ÃæÎªÁËÈÝ´í£¬·ÀÖ¹ÈÎÎñ¹Òµô¡£
´æÔÚÎÊÌâÈçÏ£º
²úÉúµÄ FileSegment ¹ý¶à¡£Ã¿¸ö ShuffleMapTask ²úÉú R£¨reducer
¸öÊý£©¸ö FileSegment£¬M ¸ö ShuffleMapTask ¾Í»á²úÉú M * R ¸öÎļþ¡£Ò»°ã
Spark job µÄ M ºÍ R ¶¼ºÜ´ó£¬Òò´Ë´ÅÅÌÉÏ»á´æÔÚ´óÁ¿µÄÊý¾ÝÎļþ¡£
»º³åÇøÕ¼ÓÃÄÚ´æ¿Õ¼ä´ó¡£Ã¿¸ö ShuffleMapTask ÐèÒª¿ª R ¸ö bucket£¬M ¸ö ShuffleMapTask
¾Í»á²úÉú MR ¸ö bucket¡£ËäȻһ¸ö ShuffleMapTask ½áÊøºó£¬¶ÔÓ¦µÄ»º³åÇø¿ÉÒÔ±»»ØÊÕ£¬µ«Ò»¸ö
worker node ÉÏͬʱ´æÔÚµÄ bucket ¸öÊý¿ÉÒÔ´ïµ½ cores R ¸ö£¨Ò»°ã worker
ͬʱ¿ÉÒÔÔËÐÐ cores ¸ö ShuffleMapTask£©£¬Õ¼ÓõÄÄÚ´æ¿Õ¼äÒ²¾Í´ïµ½ÁËcores¡Á
R ¡Á 32 KB¡£¶ÔÓÚ 8 ºË 1000 ¸ö reducer À´Ëµ£¬Õ¼ÓÃÄÚ´æ¾ÍÊÇ 256MB¡£
ΪÁ˽â¾öÉÏÊöÎÊÌ⣬ÎÒÃÇ¿ÉÒÔʹÓÃÎļþºÏ²¢µÄ¹¦ÄÜ¡£
ÔÚ½øÐÐtaskµÄÎļþ·ÖƬºÏ²¢ÏµÄshuffle¹ý³ÌÈçÏ£º£¨spark.shuffle.consolidateFiles=true£©

¿ÉÒÔÃ÷ÏÔ¿´³ö£¬ÔÚÒ»¸ö core ÉÏÁ¬ÐøÖ´ÐÐµÄ ShuffleMapTasks ¿ÉÒÔ¹²ÓÃÒ»¸öÊä³öÎļþ
ShuffleFile¡£ÏÈÖ´ÐÐÍêµÄ ShuffleMapTask ÐÎ³É ShuffleBlock i£¬ºóÖ´ÐеÄ
ShuffleMapTask ¿ÉÒÔ½«Êä³öÊý¾ÝÖ±½Ó×·¼Óµ½ ShuffleBlock i ºóÃæ£¬ÐÎ³É ShuffleBlock
i'£¬Ã¿¸ö ShuffleBlock ±»³ÆÎª FileSegment¡£ÏÂÒ»¸ö stage µÄ reducer
Ö»ÐèÒª fetch Õû¸ö ShuffleFile ¾ÍÐÐÁË¡£ÕâÑù£¬Ã¿¸ö worker ³ÖÓеÄÎļþÊý½µÎª
cores¡Á R¡£FileConsolidation ¹¦ÄÜ¿ÉÒÔͨ¹ýspark.shuffle.consolidateFiles=trueÀ´¿ªÆô¡£
SparkºËÐĸÅÄîÖ®Cache
val rdd1 = ... // ¶ÁÈ¡hdfsÊý¾Ý£¬¼ÓÔØ³ÉRDD
rdd1.cache
val rdd2 = rdd1.map(...)
val rdd3 = rdd1.filter(...)
rdd2.take(10).foreach(println)
rdd3.take(10).foreach(println)
rdd1.unpersist
cacheºÍunpersisitÁ½¸ö²Ù×÷±È½ÏÌØÊ⣬ËûÃǼȲ»ÊÇactionÒ²²»ÊÇtransformation¡£cache»á½«±ê¼ÇÐèÒª»º´æµÄrdd£¬ÕæÕý»º´æÊÇÔÚµÚÒ»´Î±»Ïà¹Øactionµ÷Óúó²Å»º´æ£»unpersisitÊÇĨµô¸Ã±ê¼Ç£¬²¢ÇÒÁ¢¿ÌÊÍ·ÅÄÚ´æ¡£Ö»ÓÐactionÖ´ÐÐʱ£¬rdd1²Å»á¿ªÊ¼´´½¨²¢½øÐкóÐøµÄrdd±ä»»¼ÆËã¡£
cacheÆäʵҲÊǵ÷ÓõÄpersist³Ö¾Ã»¯º¯Êý£¬Ö»ÊÇÑ¡ÔñµÄ³Ö¾Ã»¯¼¶±ðΪMEMORY_ONLY¡£
persistÖ§³ÖµÄRDD³Ö¾Ã»¯¼¶±ðÈçÏ£º

ÐèҪעÒâµÄÎÊÌ⣺
Cache»òshuffle³¡¾°ÐòÁл¯Ê±£¬ sparkÐòÁл¯²»Ö§³Öprotobuf message£¬ÐèÒªjava
¿ÉÒÔserializableµÄ¶ÔÏó¡£Ò»µ©ÔÚÐòÁл¯Óõ½²»Ö§³Öjava serializableµÄ¶ÔÏó¾Í»á³öÏÖÉÏÊö´íÎó¡£
SparkֻҪд´ÅÅÌ£¬¾Í»áÓõ½ÐòÁл¯¡£³ýÁËshuffle½×¶ÎºÍpersist»áÐòÁл¯£¬ÆäËûʱºòRDD´¦Àí¶¼ÔÚÄÚ´æÖУ¬²»»áÓõ½ÐòÁл¯¡£
Spark StreamingÔËÐÐÔÀí
spark³ÌÐòÊÇʹÓÃÒ»¸ösparkÓ¦ÓÃʵÀýÒ»´ÎÐÔ¶ÔÒ»ÅúÀúÊ·Êý¾Ý½øÐд¦Àí£¬spark streamingÊǽ«³ÖÐø²»¶ÏÊäÈëµÄÊý¾ÝÁ÷ת»»³É¶à¸öbatch·ÖƬ£¬Ê¹ÓÃÒ»ÅúsparkÓ¦ÓÃʵÀý½øÐд¦Àí¡£

´ÓÔÀíÉÏ¿´£¬°Ñ´«Í³µÄsparkÅú´¦Àí³ÌÐò±ä³Éstreaming³ÌÐò£¬sparkÐèÒª¹¹½¨Ê²Ã´£¿


ÐèÒª¹¹½¨4¸ö¶«Î÷£º
Ò»¸ö¾²Ì¬µÄ RDD DAG µÄÄ£°å£¬À´±íʾ´¦ÀíÂß¼£»
Ò»¸ö¶¯Ì¬µÄ¹¤×÷¿ØÖÆÆ÷£¬½«Á¬ÐøµÄ streaming data ÇзÖÊý¾ÝƬ¶Î£¬²¢°´ÕÕÄ£°å¸´ÖƳöеÄ
RDD £»
DAG µÄʵÀý£¬¶ÔÊý¾ÝƬ¶Î½øÐд¦Àí£»
Receiver½øÐÐÔʼÊý¾ÝµÄ²úÉúºÍµ¼È룻Receiver½«½ÓÊÕµ½µÄÊý¾ÝºÏ²¢ÎªÊý¾Ý¿é²¢´æµ½ÄÚ´æ»òÓ²ÅÌÖУ¬¹©ºóÐøbatch
RDD½øÐÐÏû·Ñ£»
¶Ô³¤Ê±ÔËÐÐÈÎÎñµÄ±£ÕÏ£¬°üÀ¨ÊäÈëÊý¾ÝµÄʧЧºóµÄÖØ¹¹£¬´¦ÀíÈÎÎñµÄʧ°ÜºóµÄÖØµ÷¡£
¾ßÌåstreamingµÄÏêϸÔÀí¿ÉÒԲο¼¹ãµãͨ³öÆ·µÄÔ´Âë½âÎöÎÄÕ£º
¶ÔÓÚspark streamingÐèҪעÒâÒÔÏÂÈýµã£º
¾¡Á¿±£Ö¤Ã¿¸öwork½ÚµãÖеÄÊý¾Ý²»ÒªÂäÅÌ£¬ÒÔÌáÉýÖ´ÐÐЧÂÊ¡£

±£Ö¤Ã¿¸öbatchµÄÊý¾ÝÄܹ»ÔÚbatch intervalʱ¼äÄÚ´¦ÀíÍê±Ï£¬ÒÔÃâÔì³ÉÊý¾Ý¶Ñ»ý¡£

ʹÓÃstevenÌṩµÄ¿ò¼Ü½øÐÐÊý¾Ý½ÓÊÕʱµÄÔ¤´¦Àí£¬¼õÉÙ²»±ØÒªÊý¾ÝµÄ´æ´¢ºÍ´«Êä¡£´ÓtdbankÖнÓÊÕºóת´¢Ç°½øÐйýÂË£¬¶ø²»ÊÇÔÚtask¾ßÌå´¦Àíʱ²Å½øÐйýÂË¡£


Spark ×ÊÔ´µ÷ÓÅ
ÄÚ´æ¹ÜÀí£º

ExecutorµÄÄÚ´æÖ÷Òª·ÖΪÈý¿é£º
µÚÒ»¿éÊÇÈÃtaskÖ´ÐÐÎÒÃÇ×Ô¼º±àдµÄ´úÂëʱʹÓã¬Ä¬ÈÏÊÇÕ¼Executor×ÜÄÚ´æµÄ20%£»
µÚ¶þ¿éÊÇÈÃtaskͨ¹ýshuffle¹ý³ÌÀÈ¡ÁËÉÏÒ»¸östageµÄtaskµÄÊä³öºó£¬½øÐоۺϵȲÙ×÷ʱʹÓã¬Ä¬ÈÏÒ²ÊÇÕ¼Executor×ÜÄÚ´æµÄ20%£»
µÚÈý¿éÊÇÈÃRDD³Ö¾Ã»¯Ê±Ê¹Óã¬Ä¬ÈÏÕ¼Executor×ÜÄÚ´æµÄ60%¡£
ÿ¸ötaskÒÔ¼°Ã¿¸öexecutorÕ¼ÓõÄÄÚ´æÐèÒª·ÖÎöһϡ£Ã¿¸ötask´¦ÀíÒ»¸öpartiitonµÄÊý¾Ý£¬·ÖƬ̫ÉÙ£¬»áÔì³ÉÄÚ´æ²»¹»¡£
ÆäËû×ÊÔ´ÅäÖãº

|