
¡¡¡¡Spark ×÷Ϊһ¸ö»ùÓÚÄÚ´æµÄ·Ö²¼Ê½¼ÆËãÒýÇæ£¬ÆäÄÚ´æ¹ÜÀíÄ£¿éÔÚÕû¸öϵͳÖаçÑÝ×ŷdz£ÖØÒªµÄ½ÇÉ«¡£Àí½â
Spark ÄÚ´æ¹ÜÀíµÄ»ù±¾ÔÀí£¬ÓÐÖúÓÚ¸üºÃµØ¿ª·¢ Spark Ó¦ÓóÌÐòºÍ½øÐÐÐÔÄܵ÷ÓÅ¡£±¾ÎÄÖ¼ÔÚÊáÀí³ö
Spark ÄÚ´æ¹ÜÀíµÄÂöÂ磬Å×שÒýÓñ£¬Òý³ö¶ÁÕß¶ÔÕâ¸ö»°ÌâµÄÉîÈë̽ÌÖ¡£±¾ÎÄÖвûÊöµÄÔÀí»ùÓÚ Spark
2.1 °æ±¾£¬ÔĶÁ±¾ÎÄÐèÒª¶ÁÕßÓÐÒ»¶¨µÄ Spark ºÍ Java »ù´¡£¬Á˽â RDD¡¢Shuffle¡¢JVM
µÈÏà¹Ø¸ÅÄî¡£
¡¡¡¡ÔÚÖ´ÐÐ Spark µÄÓ¦ÓóÌÐòʱ£¬Spark ¼¯Èº»áÆô¶¯ Driver ºÍ Executor
Á½ÖÖ JVM ½ø³Ì£¬Ç°ÕßΪÖ÷¿Ø½ø³Ì£¬¸ºÔð´´½¨ Spark ÉÏÏÂÎÄ£¬Ìá½» Spark ×÷Òµ(Job)£¬²¢½«×÷ҵת»¯Îª¼ÆËãÈÎÎñ(Task)£¬ÔÚ¸÷¸ö
Executor ½ø³Ì¼äе÷ÈÎÎñµÄµ÷¶È£¬ºóÕ߸ºÔðÔÚ¹¤×÷½ÚµãÉÏÖ´ÐоßÌåµÄ¼ÆËãÈÎÎñ£¬²¢½«½á¹û·µ»Ø¸ø Driver£¬Í¬Ê±ÎªÐèÒª³Ö¾Ã»¯µÄ
RDD Ìṩ´æ´¢¹¦ÄÜ[1]¡£ÓÉÓÚ Driver µÄÄÚ´æ¹ÜÀíÏà¶ÔÀ´Ëµ½ÏΪ¼òµ¥£¬±¾ÎÄÖ÷Òª¶Ô Executor
µÄÄÚ´æ¹ÜÀí½øÐзÖÎö£¬ÏÂÎÄÖÐµÄ Spark ÄÚ´æ¾ùÌØÖ¸ Executor µÄÄÚ´æ¡£
¡¡¡¡1. ¶ÑÄںͶÑÍâÄÚ´æ¹æ»®
¡¡¡¡×÷Ϊһ¸ö JVM ½ø³Ì£¬Executor µÄÄÚ´æ¹ÜÀí½¨Á¢ÔÚ JVM µÄÄÚ´æ¹ÜÀíÖ®ÉÏ£¬Spark
¶Ô JVM µÄ¶ÑÄÚ(On-heap)¿Õ¼ä½øÐÐÁ˸üΪÏêϸµÄ·ÖÅ䣬ÒÔ³ä·ÖÀûÓÃÄڴ档ͬʱ£¬Spark ÒýÈëÁ˶ÑÍâ(Off-heap)Äڴ棬ʹ֮¿ÉÒÔÖ±½ÓÔÚ¹¤×÷½ÚµãµÄϵͳÄÚ´æÖпª±Ù¿Õ¼ä£¬½øÒ»²½ÓÅ»¯ÁËÄÚ´æµÄʹÓá£

¡¡¡¡Í¼ 1 . ¶ÑÄںͶÑÍâÄÚ´æÊ¾Òâͼ
¡¡¡¡1.1 ¶ÑÄÚÄÚ´æ
¡¡¡¡¶ÑÄÚÄÚ´æµÄ´óС£¬ÓÉ Spark Ó¦ÓóÌÐòÆô¶¯Ê±µÄ ¨Cexecutor-memory »ò spark.executor.memory
²ÎÊýÅäÖá£Executor ÄÚÔËÐеIJ¢·¢ÈÎÎñ¹²Ïí JVM ¶ÑÄÚÄڴ棬ÕâЩÈÎÎñÔÚ»º´æ RDD Êý¾ÝºÍ¹ã²¥(Broadcast)Êý¾ÝʱռÓõÄÄÚ´æ±»¹æ»®Îª´æ´¢(Storage)Äڴ棬¶øÕâЩÈÎÎñÔÚÖ´ÐÐ
Shuffle ʱռÓõÄÄÚ´æ±»¹æ»®ÎªÖ´ÐÐ(Execution)Äڴ棬ʣÓàµÄ²¿·Ö²»×öÌØÊâ¹æ»®£¬ÄÇЩ Spark
ÄÚ²¿µÄ¶ÔÏóʵÀý£¬»òÕßÓû§¶¨ÒåµÄ Spark Ó¦ÓóÌÐòÖеĶÔÏóʵÀý£¬¾ùÕ¼ÓÃÊ£ÓàµÄ¿Õ¼ä¡£²»Í¬µÄ¹ÜÀíģʽÏ£¬ÕâÈý²¿·ÖÕ¼ÓõĿռä´óС¸÷²»Ïàͬ(ÏÂÃæµÚ
2 С½Ú»á½øÐнéÉÜ)¡£
¡¡¡¡Spark ¶Ô¶ÑÄÚÄÚ´æµÄ¹ÜÀíÊÇÒ»ÖÖÂß¼Éϵġ±¹æ»®Ê½¡±µÄ¹ÜÀí£¬ÒòΪ¶ÔÏóʵÀýÕ¼ÓÃÄÚ´æµÄÉêÇëºÍÊͷŶ¼ÓÉ
JVM Íê³É£¬Spark Ö»ÄÜÔÚÉêÇëºóºÍÊÍ·Åǰ¼Ç¼ÕâЩÄڴ棬ÎÒÃÇÀ´¿´Æä¾ßÌåÁ÷³Ì£º
¡¡¡¡ÉêÇëÄڴ棺
¡¡¡¡Spark ÔÚ´úÂëÖÐ new Ò»¸ö¶ÔÏóʵÀý
¡¡¡¡JVM ´Ó¶ÑÄÚÄÚ´æ·ÖÅä¿Õ¼ä£¬´´½¨¶ÔÏó²¢·µ»Ø¶ÔÏóÒýÓÃ
¡¡¡¡Spark ±£´æ¸Ã¶ÔÏóµÄÒýÓ㬼Ǽ¸Ã¶ÔÏóÕ¼ÓõÄÄÚ´æ
¡¡¡¡ÊÍ·ÅÄڴ棺
¡¡¡¡Spark ¼Ç¼¸Ã¶ÔÏóÊͷŵÄÄڴ棬ɾ³ý¸Ã¶ÔÏóµÄÒýÓÃ
¡¡¡¡µÈ´ý JVM µÄÀ¬»ø»ØÊÕ»úÖÆÊͷŸöÔÏóÕ¼ÓõĶÑÄÚÄÚ´æ
¡¡¡¡ÎÒÃÇÖªµÀ£¬JVM µÄ¶ÔÏó¿ÉÒÔÒÔÐòÁл¯µÄ·½Ê½´æ´¢£¬ÐòÁл¯µÄ¹ý³ÌÊǽ«¶ÔÏóת»»Îª¶þ½øÖÆ×Ö½ÚÁ÷£¬±¾ÖÊÉÏ¿ÉÒÔÀí½âΪ½«·ÇÁ¬Ðø¿Õ¼äµÄÁ´Ê½´æ´¢×ª»¯ÎªÁ¬Ðø¿Õ¼ä»ò¿é´æ´¢£¬ÔÚ·ÃÎÊʱÔòÐèÒª½øÐÐÐòÁл¯µÄÄæ¹ý³Ì¡ª¡ª·´ÐòÁл¯£¬½«×Ö½ÚÁ÷ת»¯Îª¶ÔÏó£¬ÐòÁл¯µÄ·½Ê½¿ÉÒÔ½ÚÊ¡´æ´¢¿Õ¼ä£¬µ«Ôö¼ÓÁË´æ´¢ºÍ¶ÁȡʱºòµÄ¼ÆË㿪Ïú¡£
¡¡¡¡¶ÔÓÚ Spark ÖÐÐòÁл¯µÄ¶ÔÏó£¬ÓÉÓÚÊÇ×Ö½ÚÁ÷µÄÐÎʽ£¬ÆäÕ¼ÓõÄÄÚ´æ´óС¿ÉÖ±½Ó¼ÆË㣬¶ø¶ÔÓÚ·ÇÐòÁл¯µÄ¶ÔÏ󣬯äÕ¼ÓõÄÄÚ´æÊÇͨ¹ýÖÜÆÚÐԵزÉÑù½üËÆ¹ÀËã¶øµÃ£¬¼´²¢²»ÊÇÿ´ÎÐÂÔöµÄÊý¾ÝÏî¶¼»á¼ÆËãÒ»´ÎÕ¼ÓõÄÄÚ´æ´óС£¬ÕâÖÖ·½·¨½µµÍÁËʱ¼ä¿ªÏúµ«ÊÇÓпÉÄÜÎó²î½Ï´ó£¬µ¼ÖÂijһʱ¿ÌµÄʵ¼ÊÄÚ´æÓпÉÄÜÔ¶Ô¶³¬³öÔ¤ÆÚ[2]¡£´ËÍ⣬ÔÚ±»
Spark ±ê¼ÇΪÊͷŵĶÔÏóʵÀý£¬ºÜÓпÉÄÜÔÚʵ¼ÊÉϲ¢Ã»Óб» JVM »ØÊÕ£¬µ¼ÖÂʵ¼Ê¿ÉÓõÄÄÚ´æÐ¡ÓÚ Spark
¼Ç¼µÄ¿ÉÓÃÄÚ´æ¡£ËùÒÔ Spark ²¢²»ÄÜ׼ȷ¼Ç¼ʵ¼Ê¿ÉÓõĶÑÄÚÄڴ棬´Ó¶øÒ²¾ÍÎÞ·¨ÍêÈ«±ÜÃâÄÚ´æÒç³ö(OOM,
Out of Memory)µÄÒì³£¡£
¡¡¡¡ËäÈ»²»Äܾ«×¼¿ØÖƶÑÄÚÄÚ´æµÄÉêÇëºÍÊÍ·Å£¬µ« Spark ͨ¹ý¶Ô´æ´¢ÄÚ´æºÍÖ´ÐÐÄÚ´æ¸÷×Ô¶ÀÁ¢µÄ¹æ»®¹ÜÀí£¬¿ÉÒÔ¾ö¶¨ÊÇ·ñÒªÔÚ´æ´¢ÄÚ´æÀﻺ´æÐµÄ
RDD£¬ÒÔ¼°ÊÇ·ñΪеÄÈÎÎñ·ÖÅäÖ´ÐÐÄڴ棬ÔÚÒ»¶¨³Ì¶ÈÉÏ¿ÉÒÔÌáÉýÄÚ´æµÄÀûÓÃÂÊ£¬¼õÉÙÒì³£µÄ³öÏÖ¡£
¡¡¡¡1.2 ¶ÑÍâÄÚ´æ
¡¡¡¡ÎªÁ˽øÒ»²½ÓÅ»¯ÄÚ´æµÄʹÓÃÒÔ¼°Ìá¸ß Shuffle ʱÅÅÐòµÄЧÂÊ£¬Spark ÒýÈëÁ˶ÑÍâ(Off-heap)Äڴ棬ʹ֮¿ÉÒÔÖ±½ÓÔÚ¹¤×÷½ÚµãµÄϵͳÄÚ´æÖпª±Ù¿Õ¼ä£¬´æ´¢¾¹ýÐòÁл¯µÄ¶þ½øÖÆÊý¾Ý¡£ÀûÓÃ
JDK Unsafe API(´Ó Spark 2.0 ¿ªÊ¼£¬ÔÚ¹ÜÀí¶ÑÍâµÄ´æ´¢ÄÚ´æÊ±²»ÔÙ»ùÓÚ Tachyon£¬¶øÊÇÓë¶ÑÍâµÄÖ´ÐÐÄÚ´æÒ»Ñù£¬»ùÓÚ
JDK Unsafe API ʵÏÖ[3])£¬Spark ¿ÉÒÔÖ±½Ó²Ù×÷ϵͳ¶ÑÍâÄڴ棬¼õÉÙÁ˲»±ØÒªµÄÄڴ濪Ïú£¬ÒÔ¼°Æµ·±µÄ
GC ɨÃèºÍ»ØÊÕ£¬ÌáÉýÁË´¦ÀíÐÔÄÜ¡£¶ÑÍâÄÚ´æ¿ÉÒÔ±»¾«È·µØÉêÇëºÍÊÍ·Å£¬¶øÇÒÐòÁл¯µÄÊý¾ÝÕ¼ÓõĿռä¿ÉÒÔ±»¾«È·¼ÆË㣬ËùÒÔÏà±È¶ÑÄÚÄÚ´æÀ´Ëµ½µµÍÁ˹ÜÀíµÄÄѶȣ¬Ò²½µµÍÁËÎó²î¡£
¡¡¡¡ÔÚĬÈÏÇé¿ö϶ÑÍâÄÚ´æ²¢²»ÆôÓ㬿Éͨ¹ýÅäÖà spark.memory.offHeap.enabled
²ÎÊýÆôÓ㬲¢ÓÉ spark.memory.offHeap.size ²ÎÊýÉ趨¶ÑÍâ¿Õ¼äµÄ´óС¡£³ýÁËûÓÐ
other ¿Õ¼ä£¬¶ÑÍâÄÚ´æÓë¶ÑÄÚÄÚ´æµÄ»®·Ö·½Ê½Ïàͬ£¬ËùÓÐÔËÐÐÖеIJ¢·¢ÈÎÎñ¹²Ïí´æ´¢ÄÚ´æºÍÖ´ÐÐÄÚ´æ¡£
¡¡¡¡1.3 ÄÚ´æ¹ÜÀí½Ó¿Ú
¡¡¡¡Spark Ϊ´æ´¢ÄÚ´æºÍÖ´ÐÐÄÚ´æµÄ¹ÜÀíÌṩÁËͳһµÄ½Ó¿Ú¡ª¡ªMemoryManager£¬Í¬Ò»¸ö
Executor ÄÚµÄÈÎÎñ¶¼µ÷ÓÃÕâ¸ö½Ó¿ÚµÄ·½·¨À´ÉêÇë»òÊÍ·ÅÄÚ´æ:
¡¡¡¡Çåµ¥ 1 . ÄÚ´æ¹ÜÀí½Ó¿ÚµÄÖ÷Òª·½·¨
//ÉêÇë´æ´¢ÄÚ´æ def acquireStorageMemory(blockId: BlockId,
numBytes: Long, memoryMode: MemoryMode):
Boolean //ÉêÇëÕ¹¿ªÄÚ´æ def acquireUnrollMemory(blockId: BlockId,
numBytes:
Long, memoryMode: MemoryMode): Boolean //ÉêÇëÖ´ÐÐÄÚ´æ def acquireExecutionMemory(numBytes: Long,
taskAttemptId:
Long, memoryMode: MemoryMode): Long //ÊÍ·Å´æ´¢ÄÚ´æ def releaseStorageMemory(numBytes: Long,
memoryMode:
MemoryMode): Unit //ÊÍ·ÅÖ´ÐÐÄÚ´æ def releaseExecutionMemory(numBytes: Long,
taskAttemptId:
Long, memoryMode: MemoryMode): Unit //ÊÍ·ÅÕ¹¿ªÄÚ´æ def releaseUnrollMemory(numBytes: Long,
memoryMode:
MemoryMode): Unit |
ÎÒÃÇ¿´µ½£¬ÔÚµ÷ÓÃÕâЩ·½·¨Ê±¶¼ÐèÒªÖ¸¶¨ÆäÄÚ´æÄ£Ê½(MemoryMode)£¬Õâ¸ö²ÎÊý¾ö¶¨ÁËÊÇÔÚ¶ÑÄÚ»¹ÊǶÑÍâÍê³ÉÕâ´Î²Ù×÷¡£
¡¡¡¡MemoryManager µÄ¾ßÌåʵÏÖÉÏ£¬Spark 1.6 Ö®ºóĬÈÏΪͳһ¹ÜÀí(Unified
Memory Manager)·½Ê½£¬1.6 ֮ǰ²ÉÓõľ²Ì¬¹ÜÀí(Static Memory Manager)·½Ê½ÈÔ±»±£Áô£¬¿Éͨ¹ýÅäÖÃ
spark.memory.useLegacyMode ²ÎÊýÆôÓá£Á½ÖÖ·½Ê½µÄÇø±ðÔÚÓÚ¶Ô¿Õ¼ä·ÖÅäµÄ·½Ê½£¬ÏÂÃæµÄµÚ
2 С½Ú»á·Ö±ð¶ÔÕâÁ½ÖÖ·½Ê½½øÐнéÉÜ¡£
¡¡¡¡2 . ÄÚ´æ¿Õ¼ä·ÖÅä
¡¡¡¡2.1 ¾²Ì¬ÄÚ´æ¹ÜÀí
¡¡¡¡ÔÚ Spark ×î³õ²ÉÓõľ²Ì¬ÄÚ´æ¹ÜÀí»úÖÆÏ£¬´æ´¢ÄÚ´æ¡¢Ö´ÐÐÄÚ´æºÍÆäËûÄÚ´æµÄ´óСÔÚ Spark
Ó¦ÓóÌÐòÔËÐÐÆÚ¼ä¾ùΪ¹Ì¶¨µÄ£¬µ«Óû§¿ÉÒÔÓ¦ÓóÌÐòÆô¶¯Ç°½øÐÐÅäÖ㬶ÑÄÚÄÚ´æµÄ·ÖÅäÈçͼ 2 Ëùʾ£º

ͼ 2 . ¾²Ì¬ÄÚ´æ¹ÜÀíͼʾ¡ª¡ª¶ÑÄÚ
¡¡¡¡¿ÉÒÔ¿´µ½£¬¿ÉÓõĶÑÄÚÄÚ´æµÄ´óСÐèÒª°´ÕÕÏÂÃæµÄ·½Ê½¼ÆË㣺
¡¡¡¡Çåµ¥ 2 . ¿ÉÓöÑÄÚÄÚ´æ¿Õ¼ä
¿ÉÓõĴ洢ÄÚ´æ = systemMaxMemory * spark.storage
.memoryFraction * spark.storage.safetyFraction ¿ÉÓõÄÖ´ÐÐÄÚ´æ = systemMaxMemory * spark.shuffle.
memoryFraction * spark.shuffle.safetyFraction |
ÆäÖÐ systemMaxMemory È¡¾öÓÚµ±Ç° JVM ¶ÑÄÚÄÚ´æµÄ´óС£¬×îºó¿ÉÓõÄÖ´ÐÐÄÚ´æ»òÕß´æ´¢ÄÚ´æÒªÔÚ´Ë»ù´¡ÉÏÓë¸÷×ÔµÄ
memoryFraction ²ÎÊýºÍ safetyFraction ²ÎÊýÏà³ËµÃ³ö¡£ÉÏÊö¼ÆË㹫ʽÖеÄÁ½¸ö
safetyFraction ²ÎÊý£¬ÆäÒâÒåÔÚÓÚÔÚÂß¼ÉÏÔ¤Áô³ö 1-safetyFraction Õâôһ¿é±£ÏÕÇøÓò£¬½µµÍÒòʵ¼ÊÄڴ泬³öµ±Ç°Ô¤É跶Χ¶øµ¼ÖÂ
OOM µÄ·çÏÕ(ÉÏÎÄÌáµ½£¬¶ÔÓÚ·ÇÐòÁл¯¶ÔÏóµÄÄÚ´æ²ÉÑù¹ÀËã»á²úÉúÎó²î)¡£ÖµµÃ×¢ÒâµÄÊÇ£¬Õâ¸öÔ¤ÁôµÄ±£ÏÕÇøÓò½ö½öÊÇÒ»ÖÖÂß¼ÉϵĹ滮£¬ÔÚ¾ßÌåʹÓÃʱ
Spark ²¢Ã»ÓÐÇø±ð¶Ô´ý£¬ºÍ¡±ÆäËüÄڴ桱һÑù½»¸øÁË JVM È¥¹ÜÀí¡£
¡¡¡¡¶ÑÍâµÄ¿Õ¼ä·ÖÅä½ÏΪ¼òµ¥£¬Ö»Óд洢ÄÚ´æºÍÖ´ÐÐÄڴ棬Èçͼ 3 Ëùʾ¡£¿ÉÓõÄÖ´ÐÐÄÚ´æºÍ´æ´¢ÄÚ´æÕ¼ÓõĿռä´óСֱ½ÓÓɲÎÊý
spark.memory.storageFraction ¾ö¶¨£¬ÓÉÓÚ¶ÑÍâÄÚ´æÕ¼ÓõĿռä¿ÉÒÔ±»¾«È·¼ÆË㣬ËùÒÔÎÞÐèÔÙÉ趨±£ÏÕÇøÓò¡£

¡¡¡¡Í¼ 3 . ¾²Ì¬ÄÚ´æ¹ÜÀíͼʾ¡ª¡ª¶ÑÍâ
¡¡¡¡¾²Ì¬ÄÚ´æ¹ÜÀí»úÖÆÊµÏÖÆðÀ´½ÏΪ¼òµ¥£¬µ«Èç¹ûÓû§²»ÊìϤ Spark µÄ´æ´¢»úÖÆ£¬»òûÓиù¾Ý¾ßÌåµÄÊý¾Ý¹æÄ£ºÍ¼ÆËãÈÎÎñ»ò×öÏàÓ¦µÄÅäÖ㬺ÜÈÝÒ×Ôì³É¡±Ò»°ëº£Ë®£¬Ò»°ë»ðÑæ¡±µÄ¾ÖÃæ£¬¼´´æ´¢ÄÚ´æºÍÖ´ÐÐÄÚ´æÖеÄÒ»·½Ê£Óà´óÁ¿µÄ¿Õ¼ä£¬¶øÁíÒ»·½È´ÔçÔç±»Õ¼Âú£¬²»µÃ²»ÌÔÌ»òÒÆ³ö¾ÉµÄÄÚÈÝÒԴ洢еÄÄÚÈÝ¡£ÓÉÓÚеÄÄÚ´æ¹ÜÀí»úÖÆµÄ³öÏÖ£¬ÕâÖÖ·½Ê½Ä¿Ç°ÒѾºÜÉÙÓпª·¢ÕßʹÓ㬳öÓÚ¼æÈݾɰ汾µÄÓ¦ÓóÌÐòµÄÄ¿µÄ£¬Spark
ÈÔÈ»±£ÁôÁËËüµÄʵÏÖ¡£
¡¡¡¡2.2 ͳһÄÚ´æ¹ÜÀí
¡¡¡¡Spark 1.6 Ö®ºóÒýÈëµÄͳһÄÚ´æ¹ÜÀí»úÖÆ£¬Ó뾲̬ÄÚ´æ¹ÜÀíµÄÇø±ðÔÚÓÚ´æ´¢ÄÚ´æºÍÖ´ÐÐÄÚ´æ¹²Ïíͬһ¿é¿Õ¼ä£¬¿ÉÒÔ¶¯Ì¬Õ¼ÓöԷ½µÄ¿ÕÏÐÇøÓò£¬Èçͼ
4 ºÍͼ 5 Ëùʾ

¡¡¡¡Í¼ 4 . ͳһÄÚ´æ¹ÜÀíͼʾ¡ª¡ª¶ÑÄÚ

¡¡¡¡Í¼ 5 . ͳһÄÚ´æ¹ÜÀíͼʾ¡ª¡ª¶ÑÍâ
¡¡¡¡ÆäÖÐ×îÖØÒªµÄÓÅ»¯ÔÚÓÚ¶¯Ì¬Õ¼ÓûúÖÆ£¬Æä¹æÔòÈçÏ£º
¡¡¡¡É趨»ù±¾µÄ´æ´¢ÄÚ´æºÍÖ´ÐÐÄÚ´æÇøÓò(spark.storage.storageFraction
²ÎÊý)£¬¸ÃÉ趨ȷ¶¨ÁËË«·½¸÷×ÔÓµÓеĿռäµÄ·¶Î§
¡¡¡¡Ë«·½µÄ¿Õ¼ä¶¼²»×ãʱ£¬Ôò´æ´¢µ½Ó²ÅÌ;Èô¼º·½¿Õ¼ä²»×ã¶ø¶Ô·½¿ÕÓàʱ£¬¿É½èÓöԷ½µÄ¿Õ¼ä;(´æ´¢¿Õ¼ä²»×ãÊÇÖ¸²»×ãÒÔ·ÅÏÂÒ»¸öÍêÕûµÄ
Block)
¡¡¡¡Ö´ÐÐÄÚ´æµÄ¿Õ¼ä±»¶Ô·½Õ¼Óú󣬿ÉÈöԷ½½«Õ¼ÓõIJ¿·Öת´æµ½Ó²ÅÌ£¬È»ºó¡±¹é»¹¡±½èÓõĿռä
¡¡¡¡´æ´¢ÄÚ´æµÄ¿Õ¼ä±»¶Ô·½Õ¼Óúó£¬ÎÞ·¨ÈöԷ½¡±¹é»¹¡±£¬ÒòΪÐèÒª¿¼ÂÇ Shuffle ¹ý³ÌÖеĺܶàÒòËØ£¬ÊµÏÖÆðÀ´½ÏΪ¸´ÔÓ[4]

¡¡¡¡Í¼ 6 . ¶¯Ì¬Õ¼ÓûúÖÆÍ¼Ê¾
¡¡¡¡Æ¾½èͳһÄÚ´æ¹ÜÀí»úÖÆ£¬Spark ÔÚÒ»¶¨³Ì¶ÈÉÏÌá¸ßÁ˶ÑÄںͶÑÍâÄÚ´æ×ÊÔ´µÄÀûÓÃÂÊ£¬½µµÍÁË¿ª·¢Õßά»¤
Spark ÄÚ´æµÄÄѶȣ¬µ«²¢²»Òâζ×Å¿ª·¢Õß¿ÉÒÔ¸ßÕíÎÞÓÇ¡£Æ©È磬ËùÒÔÈç¹û´æ´¢ÄÚ´æµÄ¿Õ¼äÌ«´ó»òÕß˵»º´æµÄÊý¾Ý¹ý¶à£¬·´¶ø»áµ¼ÖÂÆµ·±µÄÈ«Á¿À¬»ø»ØÊÕ£¬½µµÍÈÎÎñÖ´ÐÐʱµÄÐÔÄÜ£¬ÒòΪ»º´æµÄ
RDD Êý¾Ýͨ³£¶¼Êdz¤ÆÚפÁôÄÚ´æµÄ [5] ¡£ËùÒÔÒªÏë³ä·Ö·¢»Ó Spark µÄÐÔÄÜ£¬ÐèÒª¿ª·¢Õß½øÒ»²½ÁË½â´æ´¢ÄÚ´æºÍÖ´ÐÐÄÚ´æ¸÷×ԵĹÜÀí·½Ê½ºÍʵÏÖÔÀí¡£
¡¡¡¡3. ´æ´¢ÄÚ´æ¹ÜÀí
¡¡¡¡3.1 RDD µÄ³Ö¾Ã»¯»úÖÆ
¡¡¡¡µ¯ÐÔ·Ö²¼Ê½Êý¾Ý¼¯(RDD)×÷Ϊ Spark ×î¸ù±¾µÄÊý¾Ý³éÏó£¬ÊÇÖ»¶ÁµÄ·ÖÇø¼Ç¼(Partition)µÄ¼¯ºÏ£¬Ö»ÄÜ»ùÓÚÔÚÎȶ¨ÎïÀí´æ´¢ÖеÄÊý¾Ý¼¯ÉÏ´´½¨£¬»òÕßÔÚÆäËûÒÑÓеÄ
RDD ÉÏÖ´ÐÐת»»(Transformation)²Ù×÷²úÉúÒ»¸öÐ嵀 RDD¡£×ª»»ºóµÄ RDD ÓëÔʼµÄ
RDD Ö®¼ä²úÉúµÄÒÀÀµ¹ØÏµ£¬¹¹³ÉÁËѪͳ(Lineage)¡£Æ¾½èѪͳ£¬Spark ±£Ö¤ÁËÿһ¸ö RDD
¶¼¿ÉÒÔ±»ÖØÐ»ָ´¡£µ« RDD µÄËùÓÐת»»¶¼ÊǶèÐԵ쬼´Ö»Óе±Ò»¸ö·µ»Ø½á¹û¸ø Driver µÄÐж¯(Action)·¢Éúʱ£¬Spark
²Å»á´´½¨ÈÎÎñ¶ÁÈ¡ RDD£¬È»ºóÕæÕý´¥·¢×ª»»µÄÖ´ÐС£
¡¡¡¡Task ÔÚÆô¶¯Ö®³õ¶Áȡһ¸ö·ÖÇøÊ±£¬»áÏÈÅжÏÕâ¸ö·ÖÇøÊÇ·ñÒѾ±»³Ö¾Ã»¯£¬Èç¹ûûÓÐÔòÐèÒª¼ì²é Checkpoint
»ò°´ÕÕÑªÍ³ÖØÐ¼ÆËã¡£ËùÒÔÈç¹ûÒ»¸ö RDD ÉÏÒªÖ´Ðжà´ÎÐж¯£¬¿ÉÒÔÔÚµÚÒ»´ÎÐж¯ÖÐʹÓà persist
»ò cache ·½·¨£¬ÔÚÄÚ´æ»ò´ÅÅÌÖг־û¯»ò»º´æÕâ¸ö RDD£¬´Ó¶øÔÚºóÃæµÄÐж¯Ê±ÌáÉý¼ÆËãËÙ¶È¡£ÊÂʵÉÏ£¬cache
·½·¨ÊÇʹÓÃĬÈ쵀 MEMORY_ONLY µÄ´æ´¢¼¶±ð½« RDD ³Ö¾Ã»¯µ½Äڴ棬¹Ê»º´æÊÇÒ»ÖÖÌØÊâµÄ³Ö¾Ã»¯¡£
¶ÑÄںͶÑÍâ´æ´¢ÄÚ´æµÄÉè¼Æ£¬±ã¿ÉÒÔ¶Ô»º´æ RDD ʱʹÓõÄÄÚ´æ×öͳһµÄ¹æ»®ºÍ¹Ü Àí (´æ´¢ÄÚ´æµÄÆäËûÓ¦Óó¡¾°£¬È绺´æ
broadcast Êý¾Ý£¬ÔÝʱ²»ÔÚ±¾ÎĵÄÌÖÂÛ·¶Î§Ö®ÄÚ)¡£
¡¡¡¡RDD µÄ³Ö¾Ã»¯ÓÉ Spark µÄ Storage Ä£¿é [7] ¸ºÔð£¬ÊµÏÖÁË RDD ÓëÎïÀí´æ´¢µÄ½âñîºÏ¡£Storage
Ä£¿é¸ºÔð¹ÜÀí Spark ÔÚ¼ÆËã¹ý³ÌÖвúÉúµÄÊý¾Ý£¬½«ÄÇЩÔÚÄÚ´æ»ò´ÅÅÌ¡¢ÔÚ±¾µØ»òÔ¶³Ì´æÈ¡Êý¾ÝµÄ¹¦ÄÜ·â×°ÁËÆðÀ´¡£ÔÚ¾ßÌåʵÏÖʱ
Driver ¶ËºÍ Executor ¶ËµÄ Storage Ä£¿é¹¹³ÉÁËÖ÷´ÓʽµÄ¼Ü¹¹£¬¼´ Driver
¶ËµÄ BlockManager Ϊ Master£¬Executor ¶ËµÄ BlockManager
Ϊ Slave¡£Storage Ä£¿éÔÚÂß¼ÉÏÒÔ Block Ϊ»ù±¾´æ´¢µ¥Î»£¬RDD µÄÿ¸ö Partition
¾¹ý´¦ÀíºóΨһ¶ÔÓ¦Ò»¸ö Block(BlockId µÄ¸ñʽΪ rdd_RDD-ID_PARTITION-ID
)¡£Master ¸ºÔðÕû¸ö Spark Ó¦ÓóÌÐòµÄ Block µÄÔªÊý¾ÝÐÅÏ¢µÄ¹ÜÀíºÍά»¤£¬¶ø Slave
ÐèÒª½« Block µÄ¸üеÈ״̬Éϱ¨µ½ Master£¬Í¬Ê±½ÓÊÕ Master µÄÃüÁÀýÈçÐÂÔö»òɾ³ýÒ»¸ö
RDD¡£

¡¡¡¡Í¼ 7 . Storage Ä£¿éʾÒâͼ
¡¡¡¡ÔÚ¶Ô RDD ³Ö¾Ã»¯Ê±£¬Spark ¹æ¶¨ÁË MEMORY_ONLY¡¢MEMORY_AND_DISK
µÈ 7 ÖÖ²»Í¬µÄ ´æ´¢¼¶±ð £¬¶ø´æ´¢¼¶±ðÊÇÒÔÏ 5 ¸ö±äÁ¿µÄ×éºÏ£º
¡¡¡¡Çåµ¥ 3 . ´æ´¢¼¶±ð
class StorageLevel private( private var _useDisk: Boolean, //´ÅÅÌ private var _useMemory: Boolean, //ÕâÀïÆäʵÊÇÖ¸¶ÑÄÚÄÚ´æ private var _useOffHeap: Boolean, //¶ÑÍâÄÚ´æ private var _deserialized: Boolean, //ÊÇ·ñΪ·ÇÐòÁл¯ private var _replication: Int = 1 //¸±±¾¸öÊý ) |
¡¡¡¡
ͨ¹ý¶ÔÊý¾Ý½á¹¹µÄ·ÖÎö£¬¿ÉÒÔ¿´³ö´æ´¢¼¶±ð´ÓÈý¸öά¶È¶¨ÒåÁË RDD µÄ Partition(ͬʱҲ¾ÍÊÇ
Block)µÄ´æ´¢·½Ê½£º
¡¡¡¡´æ´¢Î»Ö㺴ÅÅÌ/¶ÑÄÚÄÚ´æ/¶ÑÍâÄÚ´æ¡£Èç MEMORY_AND_DISK ÊÇͬʱÔÚ´ÅÅ̺ͶÑÄÚÄÚ´æÉÏ´æ´¢£¬ÊµÏÖÁËÈßÓ౸·Ý¡£OFF_HEAP
ÔòÊÇÖ»ÔÚ¶ÑÍâÄÚ´æ´æ´¢£¬Ä¿Ç°Ñ¡Ôñ¶ÑÍâÄÚ´æÊ±²»ÄÜͬʱ´æ´¢µ½ÆäËûλÖá£
¡¡¡¡´æ´¢ÐÎʽ£ºBlock »º´æµ½´æ´¢ÄÚ´æºó£¬ÊÇ·ñΪ·ÇÐòÁл¯µÄÐÎʽ¡£Èç MEMORY_ONLY ÊÇ·ÇÐòÁл¯·½Ê½´æ´¢£¬OFF_HEAP
ÊÇÐòÁл¯·½Ê½´æ´¢¡£
¡¡¡¡¸±±¾ÊýÁ¿£º´óÓÚ 1 ʱÐèÒªÔ¶³ÌÈßÓ౸·Ýµ½ÆäËû½Úµã¡£Èç DISK_ONLY_2 ÐèÒªÔ¶³Ì±¸·Ý 1
¸ö¸±±¾¡£
¡¡¡¡3.2 RDD »º´æµÄ¹ý³Ì
¡¡¡¡RDD ÔÚ»º´æµ½´æ´¢ÄÚ´æÖ®Ç°£¬Partition ÖеÄÊý¾ÝÒ»°ãÒÔµü´úÆ÷(Iterator)µÄÊý¾Ý½á¹¹À´·ÃÎÊ£¬ÕâÊÇ
Scala ÓïÑÔÖÐÒ»ÖÖ±éÀúÊý¾Ý¼¯ºÏµÄ·½·¨¡£Í¨¹ý Iterator ¿ÉÒÔ»ñÈ¡·ÖÇøÖÐÿһÌõÐòÁл¯»òÕß·ÇÐòÁл¯µÄÊý¾ÝÏî(Record)£¬ÕâЩ
Record µÄ¶ÔÏóʵÀýÔÚÂß¼ÉÏÕ¼ÓÃÁË JVM ¶ÑÄÚÄÚ´æµÄ other ²¿·ÖµÄ¿Õ¼ä£¬Í¬Ò» Partition
µÄ²»Í¬ Record µÄ¿Õ¼ä²¢²»Á¬Ðø¡£
¡¡¡¡RDD ÔÚ»º´æµ½´æ´¢ÄÚ´æÖ®ºó£¬Partition ±»×ª»»³É Block£¬Record ÔÚ¶ÑÄÚ»ò¶ÑÍâ´æ´¢ÄÚ´æÖÐÕ¼ÓÃÒ»¿éÁ¬ÐøµÄ¿Õ¼ä¡£½«PartitionÓɲ»Á¬ÐøµÄ´æ´¢¿Õ¼äת»»ÎªÁ¬Ðø´æ´¢¿Õ¼äµÄ¹ý³Ì£¬Spark³ÆÖ®Îª¡±Õ¹¿ª¡±(Unroll)¡£Block
ÓÐÐòÁл¯ºÍ·ÇÐòÁл¯Á½ÖÖ´æ´¢¸ñʽ£¬¾ßÌåÒÔÄÄÖÖ·½Ê½È¡¾öÓڸà RDD µÄ´æ´¢¼¶±ð¡£·Ç·´ÐòÁл¯µÄ Block
ÒÔÒ»ÖÖ DeserializedMemoryEntry µÄÊý¾Ý½á¹¹¶¨Ò壬ÓÃÒ»¸öÊý×é´æ´¢ËùÓÐµÄ Java
¶ÔÏ󣬷ÇÐòÁл¯µÄ Block ÔòÒÔ SerializedMemoryEntry µÄÊý¾Ý½á¹¹¶¨Ò壬ÓÃ×Ö½Ú»º³åÇø(ByteBuffer)À´´æ´¢¶þ½øÖÆÊý¾Ý¡£Ã¿¸ö
Executor µÄ Storage Ä£¿éÓÃÒ»¸öÁ´Ê½ Map ½á¹¹(LinkedHashMap)À´¹ÜÀí¶ÑÄںͶÑÍâ´æ´¢ÄÚ´æÖÐËùÓеÄ
Block ¶ÔÏóµÄʵÀý[6]£¬¶ÔÕâ¸ö LinkedHashMap ÐÂÔöºÍɾ³ý¼ä½Ó¼Ç¼ÁËÄÚ´æµÄÉêÇëºÍÊÍ·Å¡£
¡¡¡¡ÒòΪ²»Äܱ£Ö¤´æ´¢¿Õ¼ä¿ÉÒÔÒ»´ÎÈÝÄÉ Iterator ÖеÄËùÓÐÊý¾Ý£¬µ±Ç°µÄ¼ÆËãÈÎÎñÔÚ
Unroll ʱҪÏò MemoryManager ÉêÇë×ã¹»µÄ Unroll ¿Õ¼äÀ´ÁÙʱռ룬¿Õ¼ä²»×ãÔò
Unroll ʧ°Ü£¬¿Õ¼ä×㹻ʱ¿ÉÒÔ¼ÌÐø½øÐС£¶ÔÓÚÐòÁл¯µÄ Partition£¬ÆäËùÐèµÄ Unroll
¿Õ¼ä¿ÉÒÔÖ±½ÓÀÛ¼Ó¼ÆË㣬һ´ÎÉêÇë¡£¶ø·ÇÐòÁл¯µÄ Partition ÔòÒªÔÚ±éÀú Record µÄ¹ý³ÌÖÐÒÀ´ÎÉêÇ룬¼´Ã¿¶ÁȡһÌõ
Record£¬²ÉÑù¹ÀËãÆäËùÐèµÄ Unroll ¿Õ¼ä²¢½øÐÐÉêÇ룬¿Õ¼ä²»×ãʱ¿ÉÒÔÖжϣ¬ÊÍ·ÅÒÑÕ¼ÓÃµÄ Unroll
¿Õ¼ä¡£Èç¹û×îÖÕ Unroll ³É¹¦£¬µ±Ç° Partition ËùÕ¼ÓÃµÄ Unroll ¿Õ¼ä±»×ª»»ÎªÕý³£µÄ»º´æ
RDD µÄ´æ´¢¿Õ¼ä£¬ÈçÏÂͼ 8 Ëùʾ¡£
¡¡¡¡ÔÚͼ 3 ºÍͼ 5 ÖпÉÒÔ¿´µ½£¬ÔÚ¾²Ì¬ÄÚ´æ¹ÜÀíʱ£¬Spark ÔÚ´æ´¢ÄÚ´æÖÐרÃÅ»®·ÖÁËÒ»¿é Unroll
¿Õ¼ä£¬Æä´óСÊǹ̶¨µÄ£¬Í³Ò»ÄÚ´æ¹ÜÀíʱÔòûÓÐ¶Ô Unroll ¿Õ¼ä½øÐÐÌØ±ðÇø·Ö£¬µ±´æ´¢¿Õ¼ä²»×ãʱ»á¸ù¾Ý¶¯Ì¬Õ¼ÓûúÖÆ½øÐд¦Àí¡£
¡¡¡¡3.3 ÌÔ̺ÍÂäÅÌ
¡¡¡¡ÓÉÓÚͬһ¸ö Executor µÄËùÓеļÆËãÈÎÎñ¹²ÏíÓÐÏ޵Ĵ洢ÄÚ´æ¿Õ¼ä£¬µ±ÓÐÐ嵀 Block ÐèÒª»º´æµ«ÊÇÊ£Óà¿Õ¼ä²»×ãÇÒÎÞ·¨¶¯Ì¬Õ¼ÓÃʱ£¬¾ÍÒª¶Ô
LinkedHashMap ÖÐµÄ¾É Block ½øÐÐÌÔÌ(Eviction)£¬¶ø±»ÌÔÌµÄ Block
Èç¹ûÆä´æ´¢¼¶±ðÖÐͬʱ°üº¬´æ´¢µ½´ÅÅ̵ÄÒªÇó£¬ÔòÒª¶ÔÆä½øÐÐÂäÅÌ(Drop)£¬·ñÔòÖ±½Óɾ³ý¸Ã Block¡£
¡¡¡¡´æ´¢ÄÚ´æµÄÌÔ̹æÔòΪ£º
¡¡¡¡±»ÌÔÌµÄ¾É Block ÒªÓëРBlock µÄ MemoryMode Ïàͬ£¬¼´Í¬ÊôÓÚ¶ÑÍâ»ò¶ÑÄÚÄÚ´æ
¡¡¡¡ÐÂ¾É Block ²»ÄÜÊôÓÚͬһ¸ö RDD£¬±ÜÃâÑ»·ÌÔÌ
¡¡¡¡¾É Block ËùÊô RDD ²»ÄÜ´¦ÓÚ±»¶Á״̬£¬±ÜÃâÒý·¢Ò»ÖÂÐÔÎÊÌâ
¡¡¡¡±éÀú LinkedHashMap ÖÐ Block£¬°´ÕÕ×î½ü×îÉÙʹÓÃ(LRU)µÄ˳ÐòÌÔÌ£¬Ö±µ½Âú×ãÐÂ
Block ËùÐèµÄ¿Õ¼ä¡£ÆäÖÐ LRU ÊÇ LinkedHashMap µÄÌØÐÔ¡£
¡¡¡¡ÂäÅ̵ÄÁ÷³ÌÔò±È½Ï¼òµ¥£¬Èç¹ûÆä´æ´¢¼¶±ð·ûºÏ_useDisk Ϊ true µÄÌõ¼þ£¬ÔÙ¸ù¾ÝÆä_deserialized
ÅжÏÊÇ·ñÊÇ·ÇÐòÁл¯µÄÐÎʽ£¬ÈôÊÇÔò¶ÔÆä½øÐÐÐòÁл¯£¬×îºó½«Êý¾Ý´æ´¢µ½´ÅÅÌ£¬ÔÚ Storage Ä£¿éÖиüÐÂÆäÐÅÏ¢¡£
¡¡¡¡4. Ö´ÐÐÄÚ´æ¹ÜÀí
¡¡¡¡4.1 ¶àÈÎÎñ¼äÄÚ´æ·ÖÅä
¡¡¡¡Executor ÄÚÔËÐеÄÈÎÎñͬÑù¹²ÏíÖ´ÐÐÄڴ棬Spark ÓÃÒ»¸ö HashMap ½á¹¹±£´æÁËÈÎÎñµ½ÄÚ´æºÄ·ÑµÄÓ³É䡣ÿ¸öÈÎÎñ¿ÉÕ¼ÓõÄÖ´ÐÐÄÚ´æ´óСµÄ·¶Î§Îª
1/2N ~ 1/N£¬ÆäÖÐ N Ϊµ±Ç° Executor ÄÚÕýÔÚÔËÐеÄÈÎÎñµÄ¸öÊý¡£Ã¿¸öÈÎÎñÔÚÆô¶¯Ö®Ê±£¬ÒªÏò
MemoryManager ÇëÇóÉêÇë×îÉÙΪ 1/2N µÄÖ´ÐÐÄڴ棬Èç¹û²»Äܱ»Âú×ãÒªÇóÔò¸ÃÈÎÎñ±»×èÈû£¬Ö±µ½ÓÐÆäËûÈÎÎñÊÍ·ÅÁË×ã¹»µÄÖ´ÐÐÄڴ棬¸ÃÈÎÎñ²Å¿ÉÒÔ±»»½ÐÑ¡£
¡¡¡¡4.2 Shuffle µÄÄÚ´æÕ¼ÓÃ
¡¡¡¡Ö´ÐÐÄÚ´æÖ÷ÒªÓÃÀ´´æ´¢ÈÎÎñÔÚÖ´ÐÐ Shuffle ʱռÓõÄÄڴ棬Shuffle Êǰ´ÕÕÒ»¶¨¹æÔò¶Ô
RDD Êý¾ÝÖØÐ·ÖÇøµÄ¹ý³Ì£¬ÎÒÃÇÀ´¿´ Shuffle µÄ Write ºÍ Read Á½½×¶Î¶ÔÖ´ÐÐÄÚ´æµÄʹÓãº
¡¡¡¡Shuffle Write
¡¡¡¡ÈôÔÚ map ¶ËÑ¡ÔñÆÕͨµÄÅÅÐò·½Ê½£¬»á²ÉÓà ExternalSorter ½øÐÐÍâÅÅ£¬ÔÚÄÚ´æÖд洢Êý¾ÝʱÖ÷ÒªÕ¼ÓöÑÄÚÖ´Ðпռ䡣
¡¡¡¡ÈôÔÚ map ¶ËÑ¡Ôñ Tungsten µÄÅÅÐò·½Ê½£¬Ôò²ÉÓà ShuffleExternalSorter
Ö±½Ó¶ÔÒÔÐòÁл¯ÐÎʽ´æ´¢µÄÊý¾ÝÅÅÐò£¬ÔÚÄÚ´æÖд洢Êý¾Ýʱ¿ÉÒÔÕ¼ÓöÑÍâ»ò¶ÑÄÚÖ´Ðпռ䣬ȡ¾öÓÚÓû§ÊÇ·ñ¿ªÆôÁ˶ÑÍâÄÚ´æÒÔ¼°¶ÑÍâÖ´ÐÐÄÚ´æÊÇ·ñ×ã¹»¡£
¡¡¡¡Shuffle Read
¡¡¡¡ÔÚ¶Ô reduce ¶ËµÄÊý¾Ý½øÐоۺÏʱ£¬Òª½«Êý¾Ý½»¸ø Aggregator ´¦Àí£¬ÔÚÄÚ´æÖд洢Êý¾ÝʱռÓöÑÄÚÖ´Ðпռ䡣
¡¡¡¡Èç¹ûÐèÒª½øÐÐ×îÖÕ½á¹ûÅÅÐò£¬ÔòÒª½«Ôٴν«Êý¾Ý½»¸ø ExternalSorter ´¦Àí£¬Õ¼ÓöÑÄÚÖ´Ðпռ䡣
¡¡¡¡ÔÚ ExternalSorter ºÍ Aggregator ÖУ¬Spark »áʹÓÃÒ»ÖֽРAppendOnlyMap
µÄ¹þÏ£±íÔÚ¶ÑÄÚÖ´ÐÐÄÚ´æÖд洢Êý¾Ý£¬µ«ÔÚ Shuffle ¹ý³ÌÖÐËùÓÐÊý¾Ý²¢²»Äܶ¼±£´æµ½¸Ã¹þÏ£±íÖУ¬µ±Õâ¸ö¹þÏ£±íÕ¼ÓõÄÄÚ´æ»á½øÐÐÖÜÆÚÐԵزÉÑù¹ÀË㣬µ±Æä´óµ½Ò»¶¨³Ì¶È£¬ÎÞ·¨ÔÙ´Ó
MemoryManager ÉêÇ뵽еÄÖ´ÐÐÄÚ´æÊ±£¬Spark ¾Í»á½«ÆäÈ«²¿ÄÚÈÝ´æ´¢µ½´ÅÅÌÎļþÖУ¬Õâ¸ö¹ý³Ì±»³ÆÎªÒç´æ(Spill)£¬Òç´æµ½´ÅÅ̵ÄÎļþ×îºó»á±»¹é²¢(Merge)¡£
¡¡¡¡Shuffle Write ½×¶ÎÖÐÓõ½µÄ Tungsten ÊÇ Databricks ¹«Ë¾Ìá³öµÄ¶Ô
Spark ÓÅ»¯ÄÚ´æºÍ CPU ʹÓõļƻ®[9]£¬½â¾öÁËһЩ JVM ÔÚÐÔÄÜÉϵÄÏÞÖÆºÍ±×¶Ë¡£Spark
»á¸ù¾Ý Shuffle µÄÇé¿öÀ´×Ô¶¯Ñ¡ÔñÊÇ·ñ²ÉÓà Tungsten ÅÅÐò¡£Tungsten ²ÉÓõÄҳʽÄÚ´æ¹ÜÀí»úÖÆ½¨Á¢ÔÚ
MemoryManager Ö®ÉÏ£¬¼´ Tungsten ¶ÔÖ´ÐÐÄÚ´æµÄʹÓýøÐÐÁËÒ»²½µÄ³éÏó£¬ÕâÑùÔÚ Shuffle
¹ý³ÌÖÐÎÞÐè¹ØÐÄÊý¾Ý¾ßÌå´æ´¢ÔÚ¶ÑÄÚ»¹ÊǶÑÍ⡣ÿ¸öÄÚ´æÒ³ÓÃÒ»¸ö MemoryBlock À´¶¨Ò壬²¢Óà Object
obj ºÍ long offset ÕâÁ½¸ö±äÁ¿Í³Ò»±êʶһ¸öÄÚ´æÒ³ÔÚϵͳÄÚ´æÖеĵØÖ·¡£¶ÑÄÚµÄ MemoryBlock
ÊÇÒÔ long ÐÍÊý×éµÄÐÎʽ·ÖÅäµÄÄڴ棬Æä obj µÄֵΪÊÇÕâ¸öÊý×éµÄ¶ÔÏóÒýÓã¬offset ÊÇ long
ÐÍÊý×éµÄÔÚ JVM ÖеijõÊ¼Æ«ÒÆµØÖ·£¬Á½ÕßÅäºÏʹÓÿÉÒÔ¶¨Î»Õâ¸öÊý×éÔÚ¶ÑÄڵľø¶ÔµØÖ·;¶ÑÍâµÄ MemoryBlock
ÊÇÖ±½ÓÉêÇëµ½µÄÄÚ´æ¿é£¬Æä obj Ϊ null£¬offset ÊÇÕâ¸öÄÚ´æ¿éÔÚϵͳÄÚ´æÖÐµÄ 64 λ¾ø¶ÔµØÖ·¡£Spark
Óà MemoryBlock ÇÉÃîµØ½«¶ÑÄںͶÑÍâÄÚ´æÒ³Í³Ò»³éÏó·â×°£¬²¢ÓÃÒ³±í(pageTable)¹ÜÀíÿ¸ö
Task ÉêÇëµ½µÄÄÚ´æÒ³¡£
¡¡¡¡Tungsten ҳʽ¹ÜÀíϵÄËùÓÐÄÚ´æÓà 64 λµÄÂß¼µØÖ·±íʾ£¬ÓÉÒ³ºÅºÍÒ³ÄÚÆ«ÒÆÁ¿×é³É£º
¡¡¡¡Ò³ºÅ£ºÕ¼ 13 λ£¬Î¨Ò»±êʶһ¸öÄÚ´æÒ³£¬Spark ÔÚÉêÇëÄÚ´æÒ³Ö®Ç°ÒªÏÈÉêÇë¿ÕÏÐÒ³ºÅ¡£
¡¡¡¡Ò³ÄÚÆ«ÒÆÁ¿£ºÕ¼ 51 룬ÊÇÔÚʹÓÃÄÚ´æÒ³´æ´¢Êý¾Ýʱ£¬Êý¾ÝÔÚÒ³Ä򵀮«ÒƵØÖ·¡£
¡¡¡¡ÓÐÁËͳһµÄѰַ·½Ê½£¬Spark ¿ÉÒÔÓà 64 λÂß¼µØÖ·µÄÖ¸Õ붨λµ½¶ÑÄÚ»ò¶ÑÍâµÄÄڴ棬Õû¸ö Shuffle
Write ÅÅÐòµÄ¹ý³ÌÖ»ÐèÒª¶ÔÖ¸Õë½øÐÐÅÅÐò£¬²¢ÇÒÎÞÐè·´ÐòÁл¯£¬Õû¸ö¹ý³Ì·Ç³£¸ßЧ£¬¶ÔÓÚÄÚ´æ·ÃÎÊЧÂʺÍ
CPU ʹÓÃЧÂÊ´øÀ´ÁËÃ÷ÏÔµÄÌáÉý[10]¡£
¡¡¡¡Spark µÄ´æ´¢ÄÚ´æºÍÖ´ÐÐÄÚ´æÓÐ׎ØÈ»²»Í¬µÄ¹ÜÀí·½Ê½£º¶ÔÓÚ´æ´¢ÄÚ´æÀ´Ëµ£¬Spark ÓÃÒ»¸ö
LinkedHashMap À´¼¯ÖйÜÀíËùÓÐµÄ Block£¬Block ÓÉÐèÒª»º´æµÄ RDD µÄ Partition
ת»¯¶ø³É;¶ø¶ÔÓÚÖ´ÐÐÄڴ棬Spark Óà AppendOnlyMap À´´æ´¢ Shuffle ¹ý³ÌÖеÄÊý¾Ý£¬ÔÚ
Tungsten ÅÅÐòÖÐÉõÖÁ³éÏó³ÉΪҳʽÄÚ´æ¹ÜÀí£¬¿ª±ÙÁËÈ«Ð嵀 JVM ÄÚ´æ¹ÜÀí»úÖÆ¡£
¡¡¡¡½áÊøÓï
¡¡¡¡Spark µÄÄÚ´æ¹ÜÀíÊÇÒ»Ì׸´ÔӵĻúÖÆ£¬ÇÒ Spark µÄ°æ±¾¸üбȽϿ죬±ÊÕßˮƽÓÐÏÞ£¬ÄÑÃâÓÐÐðÊö²»Çå¡¢´íÎóµÄµØ·½£¬Èô¶ÁÕßÓкõĽ¨ÒéºÍ¸üÉîµÄÀí½â£¬»¹Íû²»Áߴͽ̡£ |