Äú¿ÉÒÔ¾èÖú£¬Ö§³ÖÎÒÃǵĹ«ÒæÊÂÒµ¡£

1Ôª 10Ôª 50Ôª





ÈÏÖ¤Â룺  ÑéÖ¤Âë,¿´²»Çå³þ?Çëµã»÷Ë¢ÐÂÑéÖ¤Âë ±ØÌî



  ÇóÖª ÎÄÕ ÎÄ¿â Lib ÊÓÆµ iPerson ¿Î³Ì ÈÏÖ¤ ×Éѯ ¹¤¾ß ½²×ù Model Center   Code  
»áÔ±   
   
 
     
   
 ¶©ÔÄ
  ¾èÖú
Apache SparkÄÚ´æ¹ÜÀíÏê½â
 
À´Ô´£ºÍøÂç ·¢²¼ÓÚ£º 2017-5-10
  1724  次浏览      27
 

¡¡¡¡Spark ×÷Ϊһ¸ö»ùÓÚÄÚ´æµÄ·Ö²¼Ê½¼ÆËãÒýÇæ£¬ÆäÄÚ´æ¹ÜÀíÄ£¿éÔÚÕû¸öϵͳÖаçÑÝ×ŷdz£ÖØÒªµÄ½ÇÉ«¡£Àí½â Spark ÄÚ´æ¹ÜÀíµÄ»ù±¾Ô­Àí£¬ÓÐÖúÓÚ¸üºÃµØ¿ª·¢ Spark Ó¦ÓóÌÐòºÍ½øÐÐÐÔÄܵ÷ÓÅ¡£±¾ÎÄÖ¼ÔÚÊáÀí³ö Spark ÄÚ´æ¹ÜÀíµÄÂöÂ磬Å×שÒýÓñ£¬Òý³ö¶ÁÕß¶ÔÕâ¸ö»°ÌâµÄÉîÈë̽ÌÖ¡£±¾ÎÄÖвûÊöµÄÔ­Àí»ùÓÚ Spark 2.1 °æ±¾£¬ÔĶÁ±¾ÎÄÐèÒª¶ÁÕßÓÐÒ»¶¨µÄ Spark ºÍ Java »ù´¡£¬Á˽â RDD¡¢Shuffle¡¢JVM µÈÏà¹Ø¸ÅÄî¡£

¡¡¡¡ÔÚÖ´ÐÐ Spark µÄÓ¦ÓóÌÐòʱ£¬Spark ¼¯Èº»áÆô¶¯ Driver ºÍ Executor Á½ÖÖ JVM ½ø³Ì£¬Ç°ÕßΪÖ÷¿Ø½ø³Ì£¬¸ºÔð´´½¨ Spark ÉÏÏÂÎÄ£¬Ìá½» Spark ×÷Òµ(Job)£¬²¢½«×÷ҵת»¯Îª¼ÆËãÈÎÎñ(Task)£¬ÔÚ¸÷¸ö Executor ½ø³Ì¼äЭµ÷ÈÎÎñµÄµ÷¶È£¬ºóÕ߸ºÔðÔÚ¹¤×÷½ÚµãÉÏÖ´ÐоßÌåµÄ¼ÆËãÈÎÎñ£¬²¢½«½á¹û·µ»Ø¸ø Driver£¬Í¬Ê±ÎªÐèÒª³Ö¾Ã»¯µÄ RDD Ìṩ´æ´¢¹¦ÄÜ[1]¡£ÓÉÓÚ Driver µÄÄÚ´æ¹ÜÀíÏà¶ÔÀ´Ëµ½ÏΪ¼òµ¥£¬±¾ÎÄÖ÷Òª¶Ô Executor µÄÄÚ´æ¹ÜÀí½øÐзÖÎö£¬ÏÂÎÄÖÐµÄ Spark ÄÚ´æ¾ùÌØÖ¸ Executor µÄÄÚ´æ¡£

¡¡¡¡1. ¶ÑÄںͶÑÍâÄÚ´æ¹æ»®

¡¡¡¡×÷Ϊһ¸ö JVM ½ø³Ì£¬Executor µÄÄÚ´æ¹ÜÀí½¨Á¢ÔÚ JVM µÄÄÚ´æ¹ÜÀíÖ®ÉÏ£¬Spark ¶Ô JVM µÄ¶ÑÄÚ(On-heap)¿Õ¼ä½øÐÐÁ˸üΪÏêϸµÄ·ÖÅ䣬ÒÔ³ä·ÖÀûÓÃÄڴ档ͬʱ£¬Spark ÒýÈëÁ˶ÑÍâ(Off-heap)Äڴ棬ʹ֮¿ÉÒÔÖ±½ÓÔÚ¹¤×÷½ÚµãµÄϵͳÄÚ´æÖпª±Ù¿Õ¼ä£¬½øÒ»²½ÓÅ»¯ÁËÄÚ´æµÄʹÓá£

¡¡¡¡Í¼ 1 . ¶ÑÄںͶÑÍâÄÚ´æÊ¾Òâͼ

¡¡¡¡1.1 ¶ÑÄÚÄÚ´æ

¡¡¡¡¶ÑÄÚÄÚ´æµÄ´óС£¬ÓÉ Spark Ó¦ÓóÌÐòÆô¶¯Ê±µÄ ¨Cexecutor-memory »ò spark.executor.memory ²ÎÊýÅäÖá£Executor ÄÚÔËÐеIJ¢·¢ÈÎÎñ¹²Ïí JVM ¶ÑÄÚÄڴ棬ÕâЩÈÎÎñÔÚ»º´æ RDD Êý¾ÝºÍ¹ã²¥(Broadcast)Êý¾ÝʱռÓõÄÄÚ´æ±»¹æ»®Îª´æ´¢(Storage)Äڴ棬¶øÕâЩÈÎÎñÔÚÖ´ÐÐ Shuffle ʱռÓõÄÄÚ´æ±»¹æ»®ÎªÖ´ÐÐ(Execution)Äڴ棬ʣÓàµÄ²¿·Ö²»×öÌØÊâ¹æ»®£¬ÄÇЩ Spark ÄÚ²¿µÄ¶ÔÏóʵÀý£¬»òÕßÓû§¶¨ÒåµÄ Spark Ó¦ÓóÌÐòÖеĶÔÏóʵÀý£¬¾ùÕ¼ÓÃÊ£ÓàµÄ¿Õ¼ä¡£²»Í¬µÄ¹ÜÀíģʽÏ£¬ÕâÈý²¿·ÖÕ¼ÓõĿռä´óС¸÷²»Ïàͬ(ÏÂÃæµÚ 2 С½Ú»á½øÐнéÉÜ)¡£

¡¡¡¡Spark ¶Ô¶ÑÄÚÄÚ´æµÄ¹ÜÀíÊÇÒ»ÖÖÂß¼­Éϵġ±¹æ»®Ê½¡±µÄ¹ÜÀí£¬ÒòΪ¶ÔÏóʵÀýÕ¼ÓÃÄÚ´æµÄÉêÇëºÍÊͷŶ¼ÓÉ JVM Íê³É£¬Spark Ö»ÄÜÔÚÉêÇëºóºÍÊÍ·Åǰ¼Ç¼ÕâЩÄڴ棬ÎÒÃÇÀ´¿´Æä¾ßÌåÁ÷³Ì£º

¡¡¡¡ÉêÇëÄڴ棺

¡¡¡¡Spark ÔÚ´úÂëÖÐ new Ò»¸ö¶ÔÏóʵÀý

¡¡¡¡JVM ´Ó¶ÑÄÚÄÚ´æ·ÖÅä¿Õ¼ä£¬´´½¨¶ÔÏó²¢·µ»Ø¶ÔÏóÒýÓÃ

¡¡¡¡Spark ±£´æ¸Ã¶ÔÏóµÄÒýÓ㬼Ǽ¸Ã¶ÔÏóÕ¼ÓõÄÄÚ´æ

¡¡¡¡ÊÍ·ÅÄڴ棺

¡¡¡¡Spark ¼Ç¼¸Ã¶ÔÏóÊͷŵÄÄڴ棬ɾ³ý¸Ã¶ÔÏóµÄÒýÓÃ

¡¡¡¡µÈ´ý JVM µÄÀ¬»ø»ØÊÕ»úÖÆÊͷŸöÔÏóÕ¼ÓõĶÑÄÚÄÚ´æ

¡¡¡¡ÎÒÃÇÖªµÀ£¬JVM µÄ¶ÔÏó¿ÉÒÔÒÔÐòÁл¯µÄ·½Ê½´æ´¢£¬ÐòÁл¯µÄ¹ý³ÌÊǽ«¶ÔÏóת»»Îª¶þ½øÖÆ×Ö½ÚÁ÷£¬±¾ÖÊÉÏ¿ÉÒÔÀí½âΪ½«·ÇÁ¬Ðø¿Õ¼äµÄÁ´Ê½´æ´¢×ª»¯ÎªÁ¬Ðø¿Õ¼ä»ò¿é´æ´¢£¬ÔÚ·ÃÎÊʱÔòÐèÒª½øÐÐÐòÁл¯µÄÄæ¹ý³Ì¡ª¡ª·´ÐòÁл¯£¬½«×Ö½ÚÁ÷ת»¯Îª¶ÔÏó£¬ÐòÁл¯µÄ·½Ê½¿ÉÒÔ½ÚÊ¡´æ´¢¿Õ¼ä£¬µ«Ôö¼ÓÁË´æ´¢ºÍ¶ÁȡʱºòµÄ¼ÆË㿪Ïú¡£

¡¡¡¡¶ÔÓÚ Spark ÖÐÐòÁл¯µÄ¶ÔÏó£¬ÓÉÓÚÊÇ×Ö½ÚÁ÷µÄÐÎʽ£¬ÆäÕ¼ÓõÄÄÚ´æ´óС¿ÉÖ±½Ó¼ÆË㣬¶ø¶ÔÓÚ·ÇÐòÁл¯µÄ¶ÔÏ󣬯äÕ¼ÓõÄÄÚ´æÊÇͨ¹ýÖÜÆÚÐԵزÉÑù½üËÆ¹ÀËã¶øµÃ£¬¼´²¢²»ÊÇÿ´ÎÐÂÔöµÄÊý¾ÝÏî¶¼»á¼ÆËãÒ»´ÎÕ¼ÓõÄÄÚ´æ´óС£¬ÕâÖÖ·½·¨½µµÍÁËʱ¼ä¿ªÏúµ«ÊÇÓпÉÄÜÎó²î½Ï´ó£¬µ¼ÖÂijһʱ¿ÌµÄʵ¼ÊÄÚ´æÓпÉÄÜÔ¶Ô¶³¬³öÔ¤ÆÚ[2]¡£´ËÍ⣬ÔÚ±» Spark ±ê¼ÇΪÊͷŵĶÔÏóʵÀý£¬ºÜÓпÉÄÜÔÚʵ¼ÊÉϲ¢Ã»Óб» JVM »ØÊÕ£¬µ¼ÖÂʵ¼Ê¿ÉÓõÄÄÚ´æÐ¡ÓÚ Spark ¼Ç¼µÄ¿ÉÓÃÄÚ´æ¡£ËùÒÔ Spark ²¢²»ÄÜ׼ȷ¼Ç¼ʵ¼Ê¿ÉÓõĶÑÄÚÄڴ棬´Ó¶øÒ²¾ÍÎÞ·¨ÍêÈ«±ÜÃâÄÚ´æÒç³ö(OOM, Out of Memory)µÄÒì³£¡£

¡¡¡¡ËäÈ»²»Äܾ«×¼¿ØÖƶÑÄÚÄÚ´æµÄÉêÇëºÍÊÍ·Å£¬µ« Spark ͨ¹ý¶Ô´æ´¢ÄÚ´æºÍÖ´ÐÐÄÚ´æ¸÷×Ô¶ÀÁ¢µÄ¹æ»®¹ÜÀí£¬¿ÉÒÔ¾ö¶¨ÊÇ·ñÒªÔÚ´æ´¢ÄÚ´æÀﻺ´æÐ嵀 RDD£¬ÒÔ¼°ÊÇ·ñΪеÄÈÎÎñ·ÖÅäÖ´ÐÐÄڴ棬ÔÚÒ»¶¨³Ì¶ÈÉÏ¿ÉÒÔÌáÉýÄÚ´æµÄÀûÓÃÂÊ£¬¼õÉÙÒì³£µÄ³öÏÖ¡£

¡¡¡¡1.2 ¶ÑÍâÄÚ´æ

¡¡¡¡ÎªÁ˽øÒ»²½ÓÅ»¯ÄÚ´æµÄʹÓÃÒÔ¼°Ìá¸ß Shuffle ʱÅÅÐòµÄЧÂÊ£¬Spark ÒýÈëÁ˶ÑÍâ(Off-heap)Äڴ棬ʹ֮¿ÉÒÔÖ±½ÓÔÚ¹¤×÷½ÚµãµÄϵͳÄÚ´æÖпª±Ù¿Õ¼ä£¬´æ´¢¾­¹ýÐòÁл¯µÄ¶þ½øÖÆÊý¾Ý¡£ÀûÓà JDK Unsafe API(´Ó Spark 2.0 ¿ªÊ¼£¬ÔÚ¹ÜÀí¶ÑÍâµÄ´æ´¢ÄÚ´æÊ±²»ÔÙ»ùÓÚ Tachyon£¬¶øÊÇÓë¶ÑÍâµÄÖ´ÐÐÄÚ´æÒ»Ñù£¬»ùÓÚ JDK Unsafe API ʵÏÖ[3])£¬Spark ¿ÉÒÔÖ±½Ó²Ù×÷ϵͳ¶ÑÍâÄڴ棬¼õÉÙÁ˲»±ØÒªµÄÄڴ濪Ïú£¬ÒÔ¼°Æµ·±µÄ GC ɨÃèºÍ»ØÊÕ£¬ÌáÉýÁË´¦ÀíÐÔÄÜ¡£¶ÑÍâÄÚ´æ¿ÉÒÔ±»¾«È·µØÉêÇëºÍÊÍ·Å£¬¶øÇÒÐòÁл¯µÄÊý¾ÝÕ¼ÓõĿռä¿ÉÒÔ±»¾«È·¼ÆË㣬ËùÒÔÏà±È¶ÑÄÚÄÚ´æÀ´Ëµ½µµÍÁ˹ÜÀíµÄÄѶȣ¬Ò²½µµÍÁËÎó²î¡£

¡¡¡¡ÔÚĬÈÏÇé¿ö϶ÑÍâÄÚ´æ²¢²»ÆôÓ㬿Éͨ¹ýÅäÖà spark.memory.offHeap.enabled ²ÎÊýÆôÓ㬲¢ÓÉ spark.memory.offHeap.size ²ÎÊýÉ趨¶ÑÍâ¿Õ¼äµÄ´óС¡£³ýÁËûÓÐ other ¿Õ¼ä£¬¶ÑÍâÄÚ´æÓë¶ÑÄÚÄÚ´æµÄ»®·Ö·½Ê½Ïàͬ£¬ËùÓÐÔËÐÐÖеIJ¢·¢ÈÎÎñ¹²Ïí´æ´¢ÄÚ´æºÍÖ´ÐÐÄÚ´æ¡£

¡¡¡¡1.3 ÄÚ´æ¹ÜÀí½Ó¿Ú

¡¡¡¡Spark Ϊ´æ´¢ÄÚ´æºÍÖ´ÐÐÄÚ´æµÄ¹ÜÀíÌṩÁËͳһµÄ½Ó¿Ú¡ª¡ªMemoryManager£¬Í¬Ò»¸ö Executor ÄÚµÄÈÎÎñ¶¼µ÷ÓÃÕâ¸ö½Ó¿ÚµÄ·½·¨À´ÉêÇë»òÊÍ·ÅÄÚ´æ:

¡¡¡¡Çåµ¥ 1 . ÄÚ´æ¹ÜÀí½Ó¿ÚµÄÖ÷Òª·½·¨

//ÉêÇë´æ´¢ÄÚ´æ
def acquireStorageMemory(blockId: BlockId, numBytes: Long, memoryMode: MemoryMode): Boolean
//ÉêÇëÕ¹¿ªÄÚ´æ
def acquireUnrollMemory(blockId: BlockId, numBytes: Long, memoryMode: MemoryMode): Boolean
//ÉêÇëÖ´ÐÐÄÚ´æ
def acquireExecutionMemory(numBytes: Long, taskAttemptId: Long, memoryMode: MemoryMode): Long
//ÊÍ·Å´æ´¢ÄÚ´æ
def releaseStorageMemory(numBytes: Long, memoryMode: MemoryMode): Unit
//ÊÍ·ÅÖ´ÐÐÄÚ´æ
def releaseExecutionMemory(numBytes: Long, taskAttemptId: Long, memoryMode: MemoryMode): Unit
//ÊÍ·ÅÕ¹¿ªÄÚ´æ
def releaseUnrollMemory(numBytes: Long, memoryMode: MemoryMode): Unit

ÎÒÃÇ¿´µ½£¬ÔÚµ÷ÓÃÕâЩ·½·¨Ê±¶¼ÐèÒªÖ¸¶¨ÆäÄÚ´æÄ£Ê½(MemoryMode)£¬Õâ¸ö²ÎÊý¾ö¶¨ÁËÊÇÔÚ¶ÑÄÚ»¹ÊǶÑÍâÍê³ÉÕâ´Î²Ù×÷¡£

¡¡¡¡MemoryManager µÄ¾ßÌåʵÏÖÉÏ£¬Spark 1.6 Ö®ºóĬÈÏΪͳһ¹ÜÀí(Unified Memory Manager)·½Ê½£¬1.6 ֮ǰ²ÉÓõľ²Ì¬¹ÜÀí(Static Memory Manager)·½Ê½ÈÔ±»±£Áô£¬¿Éͨ¹ýÅäÖà spark.memory.useLegacyMode ²ÎÊýÆôÓá£Á½ÖÖ·½Ê½µÄÇø±ðÔÚÓÚ¶Ô¿Õ¼ä·ÖÅäµÄ·½Ê½£¬ÏÂÃæµÄµÚ 2 С½Ú»á·Ö±ð¶ÔÕâÁ½ÖÖ·½Ê½½øÐнéÉÜ¡£

¡¡¡¡2 . ÄÚ´æ¿Õ¼ä·ÖÅä

¡¡¡¡2.1 ¾²Ì¬ÄÚ´æ¹ÜÀí

¡¡¡¡ÔÚ Spark ×î³õ²ÉÓõľ²Ì¬ÄÚ´æ¹ÜÀí»úÖÆÏ£¬´æ´¢ÄÚ´æ¡¢Ö´ÐÐÄÚ´æºÍÆäËûÄÚ´æµÄ´óСÔÚ Spark Ó¦ÓóÌÐòÔËÐÐÆÚ¼ä¾ùΪ¹Ì¶¨µÄ£¬µ«Óû§¿ÉÒÔÓ¦ÓóÌÐòÆô¶¯Ç°½øÐÐÅäÖ㬶ÑÄÚÄÚ´æµÄ·ÖÅäÈçͼ 2 Ëùʾ£º

ͼ 2 . ¾²Ì¬ÄÚ´æ¹ÜÀíͼʾ¡ª¡ª¶ÑÄÚ

¡¡¡¡¿ÉÒÔ¿´µ½£¬¿ÉÓõĶÑÄÚÄÚ´æµÄ´óСÐèÒª°´ÕÕÏÂÃæµÄ·½Ê½¼ÆË㣺

¡¡¡¡Çåµ¥ 2 . ¿ÉÓöÑÄÚÄÚ´æ¿Õ¼ä

¿ÉÓõĴ洢ÄÚ´æ = systemMaxMemory * spark.storage
.memoryFraction * spark.storage.safetyFraction
¿ÉÓõÄÖ´ÐÐÄÚ´æ = systemMaxMemory * spark.shuffle. memoryFraction * spark.shuffle.safetyFraction

ÆäÖÐ systemMaxMemory È¡¾öÓÚµ±Ç° JVM ¶ÑÄÚÄÚ´æµÄ´óС£¬×îºó¿ÉÓõÄÖ´ÐÐÄÚ´æ»òÕß´æ´¢ÄÚ´æÒªÔÚ´Ë»ù´¡ÉÏÓë¸÷×﵀ memoryFraction ²ÎÊýºÍ safetyFraction ²ÎÊýÏà³ËµÃ³ö¡£ÉÏÊö¼ÆË㹫ʽÖеÄÁ½¸ö safetyFraction ²ÎÊý£¬ÆäÒâÒåÔÚÓÚÔÚÂß¼­ÉÏÔ¤Áô³ö 1-safetyFraction Õâôһ¿é±£ÏÕÇøÓò£¬½µµÍÒòʵ¼ÊÄڴ泬³öµ±Ç°Ô¤É跶Χ¶øµ¼Ö OOM µÄ·çÏÕ(ÉÏÎÄÌáµ½£¬¶ÔÓÚ·ÇÐòÁл¯¶ÔÏóµÄÄÚ´æ²ÉÑù¹ÀËã»á²úÉúÎó²î)¡£ÖµµÃ×¢ÒâµÄÊÇ£¬Õâ¸öÔ¤ÁôµÄ±£ÏÕÇøÓò½ö½öÊÇÒ»ÖÖÂß¼­ÉϵĹ滮£¬ÔÚ¾ßÌåʹÓÃʱ Spark ²¢Ã»ÓÐÇø±ð¶Ô´ý£¬ºÍ¡±ÆäËüÄڴ桱һÑù½»¸øÁË JVM È¥¹ÜÀí¡£

¡¡¡¡¶ÑÍâµÄ¿Õ¼ä·ÖÅä½ÏΪ¼òµ¥£¬Ö»Óд洢ÄÚ´æºÍÖ´ÐÐÄڴ棬Èçͼ 3 Ëùʾ¡£¿ÉÓõÄÖ´ÐÐÄÚ´æºÍ´æ´¢ÄÚ´æÕ¼ÓõĿռä´óСֱ½ÓÓɲÎÊý spark.memory.storageFraction ¾ö¶¨£¬ÓÉÓÚ¶ÑÍâÄÚ´æÕ¼ÓõĿռä¿ÉÒÔ±»¾«È·¼ÆË㣬ËùÒÔÎÞÐèÔÙÉ趨±£ÏÕÇøÓò¡£

¡¡¡¡Í¼ 3 . ¾²Ì¬ÄÚ´æ¹ÜÀíͼʾ¡ª¡ª¶ÑÍâ

¡¡¡¡¾²Ì¬ÄÚ´æ¹ÜÀí»úÖÆÊµÏÖÆðÀ´½ÏΪ¼òµ¥£¬µ«Èç¹ûÓû§²»ÊìϤ Spark µÄ´æ´¢»úÖÆ£¬»òûÓиù¾Ý¾ßÌåµÄÊý¾Ý¹æÄ£ºÍ¼ÆËãÈÎÎñ»ò×öÏàÓ¦µÄÅäÖ㬺ÜÈÝÒ×Ôì³É¡±Ò»°ëº£Ë®£¬Ò»°ë»ðÑæ¡±µÄ¾ÖÃæ£¬¼´´æ´¢ÄÚ´æºÍÖ´ÐÐÄÚ´æÖеÄÒ»·½Ê£Óà´óÁ¿µÄ¿Õ¼ä£¬¶øÁíÒ»·½È´ÔçÔç±»Õ¼Âú£¬²»µÃ²»ÌÔÌ­»òÒÆ³ö¾ÉµÄÄÚÈÝÒԴ洢еÄÄÚÈÝ¡£ÓÉÓÚеÄÄÚ´æ¹ÜÀí»úÖÆµÄ³öÏÖ£¬ÕâÖÖ·½Ê½Ä¿Ç°ÒѾ­ºÜÉÙÓпª·¢ÕßʹÓ㬳öÓÚ¼æÈݾɰ汾µÄÓ¦ÓóÌÐòµÄÄ¿µÄ£¬Spark ÈÔÈ»±£ÁôÁËËüµÄʵÏÖ¡£

¡¡¡¡2.2 ͳһÄÚ´æ¹ÜÀí

¡¡¡¡Spark 1.6 Ö®ºóÒýÈëµÄͳһÄÚ´æ¹ÜÀí»úÖÆ£¬Ó뾲̬ÄÚ´æ¹ÜÀíµÄÇø±ðÔÚÓÚ´æ´¢ÄÚ´æºÍÖ´ÐÐÄÚ´æ¹²Ïíͬһ¿é¿Õ¼ä£¬¿ÉÒÔ¶¯Ì¬Õ¼ÓöԷ½µÄ¿ÕÏÐÇøÓò£¬Èçͼ 4 ºÍͼ 5 Ëùʾ

¡¡¡¡Í¼ 4 . ͳһÄÚ´æ¹ÜÀíͼʾ¡ª¡ª¶ÑÄÚ

¡¡¡¡Í¼ 5 . ͳһÄÚ´æ¹ÜÀíͼʾ¡ª¡ª¶ÑÍâ

¡¡¡¡ÆäÖÐ×îÖØÒªµÄÓÅ»¯ÔÚÓÚ¶¯Ì¬Õ¼ÓûúÖÆ£¬Æä¹æÔòÈçÏ£º

¡¡¡¡É趨»ù±¾µÄ´æ´¢ÄÚ´æºÍÖ´ÐÐÄÚ´æÇøÓò(spark.storage.storageFraction ²ÎÊý)£¬¸ÃÉ趨ȷ¶¨ÁËË«·½¸÷×ÔÓµÓеĿռäµÄ·¶Î§

¡¡¡¡Ë«·½µÄ¿Õ¼ä¶¼²»×ãʱ£¬Ôò´æ´¢µ½Ó²ÅÌ;Èô¼º·½¿Õ¼ä²»×ã¶ø¶Ô·½¿ÕÓàʱ£¬¿É½èÓöԷ½µÄ¿Õ¼ä;(´æ´¢¿Õ¼ä²»×ãÊÇÖ¸²»×ãÒÔ·ÅÏÂÒ»¸öÍêÕûµÄ Block)

¡¡¡¡Ö´ÐÐÄÚ´æµÄ¿Õ¼ä±»¶Ô·½Õ¼Óú󣬿ÉÈöԷ½½«Õ¼ÓõIJ¿·Öת´æµ½Ó²ÅÌ£¬È»ºó¡±¹é»¹¡±½èÓõĿռä

¡¡¡¡´æ´¢ÄÚ´æµÄ¿Õ¼ä±»¶Ô·½Õ¼Óúó£¬ÎÞ·¨ÈöԷ½¡±¹é»¹¡±£¬ÒòΪÐèÒª¿¼ÂÇ Shuffle ¹ý³ÌÖеĺܶàÒòËØ£¬ÊµÏÖÆðÀ´½ÏΪ¸´ÔÓ[4]

¡¡¡¡Í¼ 6 . ¶¯Ì¬Õ¼ÓûúÖÆÍ¼Ê¾

¡¡¡¡Æ¾½èͳһÄÚ´æ¹ÜÀí»úÖÆ£¬Spark ÔÚÒ»¶¨³Ì¶ÈÉÏÌá¸ßÁ˶ÑÄںͶÑÍâÄÚ´æ×ÊÔ´µÄÀûÓÃÂÊ£¬½µµÍÁË¿ª·¢Õßά»¤ Spark ÄÚ´æµÄÄѶȣ¬µ«²¢²»Òâζ×Å¿ª·¢Õß¿ÉÒÔ¸ßÕíÎÞÓÇ¡£Æ©È磬ËùÒÔÈç¹û´æ´¢ÄÚ´æµÄ¿Õ¼äÌ«´ó»òÕß˵»º´æµÄÊý¾Ý¹ý¶à£¬·´¶ø»áµ¼ÖÂÆµ·±µÄÈ«Á¿À¬»ø»ØÊÕ£¬½µµÍÈÎÎñÖ´ÐÐʱµÄÐÔÄÜ£¬ÒòΪ»º´æµÄ RDD Êý¾Ýͨ³£¶¼Êdz¤ÆÚפÁôÄÚ´æµÄ [5] ¡£ËùÒÔÒªÏë³ä·Ö·¢»Ó Spark µÄÐÔÄÜ£¬ÐèÒª¿ª·¢Õß½øÒ»²½ÁË½â´æ´¢ÄÚ´æºÍÖ´ÐÐÄÚ´æ¸÷×ԵĹÜÀí·½Ê½ºÍʵÏÖÔ­Àí¡£

¡¡¡¡3. ´æ´¢ÄÚ´æ¹ÜÀí

¡¡¡¡3.1 RDD µÄ³Ö¾Ã»¯»úÖÆ

¡¡¡¡µ¯ÐÔ·Ö²¼Ê½Êý¾Ý¼¯(RDD)×÷Ϊ Spark ×î¸ù±¾µÄÊý¾Ý³éÏó£¬ÊÇÖ»¶ÁµÄ·ÖÇø¼Ç¼(Partition)µÄ¼¯ºÏ£¬Ö»ÄÜ»ùÓÚÔÚÎȶ¨ÎïÀí´æ´¢ÖеÄÊý¾Ý¼¯ÉÏ´´½¨£¬»òÕßÔÚÆäËûÒÑÓÐµÄ RDD ÉÏÖ´ÐÐת»»(Transformation)²Ù×÷²úÉúÒ»¸öÐ嵀 RDD¡£×ª»»ºóµÄ RDD ÓëԭʼµÄ RDD Ö®¼ä²úÉúµÄÒÀÀµ¹ØÏµ£¬¹¹³ÉÁËѪͳ(Lineage)¡£Æ¾½èѪͳ£¬Spark ±£Ö¤ÁËÿһ¸ö RDD ¶¼¿ÉÒÔ±»ÖØÐ»ָ´¡£µ« RDD µÄËùÓÐת»»¶¼ÊǶèÐԵ쬼´Ö»Óе±Ò»¸ö·µ»Ø½á¹û¸ø Driver µÄÐж¯(Action)·¢Éúʱ£¬Spark ²Å»á´´½¨ÈÎÎñ¶ÁÈ¡ RDD£¬È»ºóÕæÕý´¥·¢×ª»»µÄÖ´ÐС£

¡¡¡¡Task ÔÚÆô¶¯Ö®³õ¶Áȡһ¸ö·ÖÇøÊ±£¬»áÏÈÅжÏÕâ¸ö·ÖÇøÊÇ·ñÒѾ­±»³Ö¾Ã»¯£¬Èç¹ûûÓÐÔòÐèÒª¼ì²é Checkpoint »ò°´ÕÕÑªÍ³ÖØÐ¼ÆËã¡£ËùÒÔÈç¹ûÒ»¸ö RDD ÉÏÒªÖ´Ðжà´ÎÐж¯£¬¿ÉÒÔÔÚµÚÒ»´ÎÐж¯ÖÐʹÓà persist »ò cache ·½·¨£¬ÔÚÄÚ´æ»ò´ÅÅÌÖг־û¯»ò»º´æÕâ¸ö RDD£¬´Ó¶øÔÚºóÃæµÄÐж¯Ê±ÌáÉý¼ÆËãËÙ¶È¡£ÊÂʵÉÏ£¬cache ·½·¨ÊÇʹÓÃĬÈ쵀 MEMORY_ONLY µÄ´æ´¢¼¶±ð½« RDD ³Ö¾Ã»¯µ½Äڴ棬¹Ê»º´æÊÇÒ»ÖÖÌØÊâµÄ³Ö¾Ã»¯¡£ ¶ÑÄںͶÑÍâ´æ´¢ÄÚ´æµÄÉè¼Æ£¬±ã¿ÉÒÔ¶Ô»º´æ RDD ʱʹÓõÄÄÚ´æ×öͳһµÄ¹æ»®ºÍ¹Ü Àí (´æ´¢ÄÚ´æµÄÆäËûÓ¦Óó¡¾°£¬È绺´æ broadcast Êý¾Ý£¬ÔÝʱ²»ÔÚ±¾ÎĵÄÌÖÂÛ·¶Î§Ö®ÄÚ)¡£

¡¡¡¡RDD µÄ³Ö¾Ã»¯ÓÉ Spark µÄ Storage Ä£¿é [7] ¸ºÔð£¬ÊµÏÖÁË RDD ÓëÎïÀí´æ´¢µÄ½âñîºÏ¡£Storage Ä£¿é¸ºÔð¹ÜÀí Spark ÔÚ¼ÆËã¹ý³ÌÖвúÉúµÄÊý¾Ý£¬½«ÄÇЩÔÚÄÚ´æ»ò´ÅÅÌ¡¢ÔÚ±¾µØ»òÔ¶³Ì´æÈ¡Êý¾ÝµÄ¹¦ÄÜ·â×°ÁËÆðÀ´¡£ÔÚ¾ßÌåʵÏÖʱ Driver ¶ËºÍ Executor ¶ËµÄ Storage Ä£¿é¹¹³ÉÁËÖ÷´ÓʽµÄ¼Ü¹¹£¬¼´ Driver ¶ËµÄ BlockManager Ϊ Master£¬Executor ¶ËµÄ BlockManager Ϊ Slave¡£Storage Ä£¿éÔÚÂß¼­ÉÏÒÔ Block Ϊ»ù±¾´æ´¢µ¥Î»£¬RDD µÄÿ¸ö Partition ¾­¹ý´¦ÀíºóΨһ¶ÔÓ¦Ò»¸ö Block(BlockId µÄ¸ñʽΪ rdd_RDD-ID_PARTITION-ID )¡£Master ¸ºÔðÕû¸ö Spark Ó¦ÓóÌÐòµÄ Block µÄÔªÊý¾ÝÐÅÏ¢µÄ¹ÜÀíºÍά»¤£¬¶ø Slave ÐèÒª½« Block µÄ¸üеÈ״̬Éϱ¨µ½ Master£¬Í¬Ê±½ÓÊÕ Master µÄÃüÁÀýÈçÐÂÔö»òɾ³ýÒ»¸ö RDD¡£

¡¡¡¡Í¼ 7 . Storage Ä£¿éʾÒâͼ

¡¡¡¡ÔÚ¶Ô RDD ³Ö¾Ã»¯Ê±£¬Spark ¹æ¶¨ÁË MEMORY_ONLY¡¢MEMORY_AND_DISK µÈ 7 ÖÖ²»Í¬µÄ ´æ´¢¼¶±ð £¬¶ø´æ´¢¼¶±ðÊÇÒÔÏ 5 ¸ö±äÁ¿µÄ×éºÏ£º

¡¡¡¡Çåµ¥ 3 . ´æ´¢¼¶±ð

class StorageLevel private(
private var _useDisk: Boolean, //´ÅÅÌ
private var _useMemory: Boolean, //ÕâÀïÆäʵÊÇÖ¸¶ÑÄÚÄÚ´æ
private var _useOffHeap: Boolean, //¶ÑÍâÄÚ´æ
private var _deserialized: Boolean, //ÊÇ·ñΪ·ÇÐòÁл¯
private var _replication: Int = 1 //¸±±¾¸öÊý
)

¡¡¡¡

ͨ¹ý¶ÔÊý¾Ý½á¹¹µÄ·ÖÎö£¬¿ÉÒÔ¿´³ö´æ´¢¼¶±ð´ÓÈý¸öά¶È¶¨ÒåÁË RDD µÄ Partition(ͬʱҲ¾ÍÊÇ Block)µÄ´æ´¢·½Ê½£º

¡¡¡¡´æ´¢Î»Ö㺴ÅÅÌ/¶ÑÄÚÄÚ´æ/¶ÑÍâÄÚ´æ¡£Èç MEMORY_AND_DISK ÊÇͬʱÔÚ´ÅÅ̺ͶÑÄÚÄÚ´æÉÏ´æ´¢£¬ÊµÏÖÁËÈßÓ౸·Ý¡£OFF_HEAP ÔòÊÇÖ»ÔÚ¶ÑÍâÄÚ´æ´æ´¢£¬Ä¿Ç°Ñ¡Ôñ¶ÑÍâÄÚ´æÊ±²»ÄÜͬʱ´æ´¢µ½ÆäËûλÖá£

¡¡¡¡´æ´¢ÐÎʽ£ºBlock »º´æµ½´æ´¢ÄÚ´æºó£¬ÊÇ·ñΪ·ÇÐòÁл¯µÄÐÎʽ¡£Èç MEMORY_ONLY ÊÇ·ÇÐòÁл¯·½Ê½´æ´¢£¬OFF_HEAP ÊÇÐòÁл¯·½Ê½´æ´¢¡£

¡¡¡¡¸±±¾ÊýÁ¿£º´óÓÚ 1 ʱÐèÒªÔ¶³ÌÈßÓ౸·Ýµ½ÆäËû½Úµã¡£Èç DISK_ONLY_2 ÐèÒªÔ¶³Ì±¸·Ý 1 ¸ö¸±±¾¡£

¡¡¡¡3.2 RDD »º´æµÄ¹ý³Ì

¡¡¡¡RDD ÔÚ»º´æµ½´æ´¢ÄÚ´æÖ®Ç°£¬Partition ÖеÄÊý¾ÝÒ»°ãÒÔµü´úÆ÷(Iterator)µÄÊý¾Ý½á¹¹À´·ÃÎÊ£¬ÕâÊÇ Scala ÓïÑÔÖÐÒ»ÖÖ±éÀúÊý¾Ý¼¯ºÏµÄ·½·¨¡£Í¨¹ý Iterator ¿ÉÒÔ»ñÈ¡·ÖÇøÖÐÿһÌõÐòÁл¯»òÕß·ÇÐòÁл¯µÄÊý¾ÝÏî(Record)£¬ÕâЩ Record µÄ¶ÔÏóʵÀýÔÚÂß¼­ÉÏÕ¼ÓÃÁË JVM ¶ÑÄÚÄÚ´æµÄ other ²¿·ÖµÄ¿Õ¼ä£¬Í¬Ò» Partition µÄ²»Í¬ Record µÄ¿Õ¼ä²¢²»Á¬Ðø¡£

¡¡¡¡RDD ÔÚ»º´æµ½´æ´¢ÄÚ´æÖ®ºó£¬Partition ±»×ª»»³É Block£¬Record ÔÚ¶ÑÄÚ»ò¶ÑÍâ´æ´¢ÄÚ´æÖÐÕ¼ÓÃÒ»¿éÁ¬ÐøµÄ¿Õ¼ä¡£½«PartitionÓɲ»Á¬ÐøµÄ´æ´¢¿Õ¼äת»»ÎªÁ¬Ðø´æ´¢¿Õ¼äµÄ¹ý³Ì£¬Spark³ÆÖ®Îª¡±Õ¹¿ª¡±(Unroll)¡£Block ÓÐÐòÁл¯ºÍ·ÇÐòÁл¯Á½ÖÖ´æ´¢¸ñʽ£¬¾ßÌåÒÔÄÄÖÖ·½Ê½È¡¾öÓڸà RDD µÄ´æ´¢¼¶±ð¡£·Ç·´ÐòÁл¯µÄ Block ÒÔÒ»ÖÖ DeserializedMemoryEntry µÄÊý¾Ý½á¹¹¶¨Ò壬ÓÃÒ»¸öÊý×é´æ´¢ËùÓÐµÄ Java ¶ÔÏ󣬷ÇÐòÁл¯µÄ Block ÔòÒÔ SerializedMemoryEntry µÄÊý¾Ý½á¹¹¶¨Ò壬ÓÃ×Ö½Ú»º³åÇø(ByteBuffer)À´´æ´¢¶þ½øÖÆÊý¾Ý¡£Ã¿¸ö Executor µÄ Storage Ä£¿éÓÃÒ»¸öÁ´Ê½ Map ½á¹¹(LinkedHashMap)À´¹ÜÀí¶ÑÄںͶÑÍâ´æ´¢ÄÚ´æÖÐËùÓÐµÄ Block ¶ÔÏóµÄʵÀý[6]£¬¶ÔÕâ¸ö LinkedHashMap ÐÂÔöºÍɾ³ý¼ä½Ó¼Ç¼ÁËÄÚ´æµÄÉêÇëºÍÊÍ·Å¡£

¡¡¡¡ÒòΪ²»Äܱ£Ö¤´æ´¢¿Õ¼ä¿ÉÒÔÒ»´ÎÈÝÄÉ Iterator ÖеÄËùÓÐÊý¾Ý£¬µ±Ç°µÄ¼ÆËãÈÎÎñÔÚ Unroll ʱҪÏò MemoryManager ÉêÇë×ã¹»µÄ Unroll ¿Õ¼äÀ´ÁÙʱռ룬¿Õ¼ä²»×ãÔò Unroll ʧ°Ü£¬¿Õ¼ä×㹻ʱ¿ÉÒÔ¼ÌÐø½øÐС£¶ÔÓÚÐòÁл¯µÄ Partition£¬ÆäËùÐèµÄ Unroll ¿Õ¼ä¿ÉÒÔÖ±½ÓÀÛ¼Ó¼ÆË㣬һ´ÎÉêÇë¡£¶ø·ÇÐòÁл¯µÄ Partition ÔòÒªÔÚ±éÀú Record µÄ¹ý³ÌÖÐÒÀ´ÎÉêÇ룬¼´Ã¿¶ÁȡһÌõ Record£¬²ÉÑù¹ÀËãÆäËùÐèµÄ Unroll ¿Õ¼ä²¢½øÐÐÉêÇ룬¿Õ¼ä²»×ãʱ¿ÉÒÔÖжϣ¬ÊÍ·ÅÒÑÕ¼ÓÃµÄ Unroll ¿Õ¼ä¡£Èç¹û×îÖÕ Unroll ³É¹¦£¬µ±Ç° Partition ËùÕ¼ÓÃµÄ Unroll ¿Õ¼ä±»×ª»»ÎªÕý³£µÄ»º´æ RDD µÄ´æ´¢¿Õ¼ä£¬ÈçÏÂͼ 8 Ëùʾ¡£

¡¡¡¡ÔÚͼ 3 ºÍͼ 5 ÖпÉÒÔ¿´µ½£¬ÔÚ¾²Ì¬ÄÚ´æ¹ÜÀíʱ£¬Spark ÔÚ´æ´¢ÄÚ´æÖÐרÃÅ»®·ÖÁËÒ»¿é Unroll ¿Õ¼ä£¬Æä´óСÊǹ̶¨µÄ£¬Í³Ò»ÄÚ´æ¹ÜÀíʱÔòûÓÐ¶Ô Unroll ¿Õ¼ä½øÐÐÌØ±ðÇø·Ö£¬µ±´æ´¢¿Õ¼ä²»×ãʱ»á¸ù¾Ý¶¯Ì¬Õ¼ÓûúÖÆ½øÐд¦Àí¡£

¡¡¡¡3.3 ÌÔÌ­ºÍÂäÅÌ

¡¡¡¡ÓÉÓÚͬһ¸ö Executor µÄËùÓеļÆËãÈÎÎñ¹²ÏíÓÐÏ޵Ĵ洢ÄÚ´æ¿Õ¼ä£¬µ±ÓÐÐ嵀 Block ÐèÒª»º´æµ«ÊÇÊ£Óà¿Õ¼ä²»×ãÇÒÎÞ·¨¶¯Ì¬Õ¼ÓÃʱ£¬¾ÍÒª¶Ô LinkedHashMap ÖÐµÄ¾É Block ½øÐÐÌÔÌ­(Eviction)£¬¶ø±»ÌÔÌ­µÄ Block Èç¹ûÆä´æ´¢¼¶±ðÖÐͬʱ°üº¬´æ´¢µ½´ÅÅ̵ÄÒªÇó£¬ÔòÒª¶ÔÆä½øÐÐÂäÅÌ(Drop)£¬·ñÔòÖ±½Óɾ³ý¸Ã Block¡£

¡¡¡¡´æ´¢ÄÚ´æµÄÌÔÌ­¹æÔòΪ£º

¡¡¡¡±»ÌÔÌ­µÄ¾É Block ÒªÓëРBlock µÄ MemoryMode Ïàͬ£¬¼´Í¬ÊôÓÚ¶ÑÍâ»ò¶ÑÄÚÄÚ´æ

¡¡¡¡ÐÂ¾É Block ²»ÄÜÊôÓÚͬһ¸ö RDD£¬±ÜÃâÑ­»·ÌÔÌ­

¡¡¡¡¾É Block ËùÊô RDD ²»ÄÜ´¦ÓÚ±»¶Á״̬£¬±ÜÃâÒý·¢Ò»ÖÂÐÔÎÊÌâ

¡¡¡¡±éÀú LinkedHashMap ÖÐ Block£¬°´ÕÕ×î½ü×îÉÙʹÓÃ(LRU)µÄ˳ÐòÌÔÌ­£¬Ö±µ½Âú×ãРBlock ËùÐèµÄ¿Õ¼ä¡£ÆäÖÐ LRU ÊÇ LinkedHashMap µÄÌØÐÔ¡£

¡¡¡¡ÂäÅ̵ÄÁ÷³ÌÔò±È½Ï¼òµ¥£¬Èç¹ûÆä´æ´¢¼¶±ð·ûºÏ_useDisk Ϊ true µÄÌõ¼þ£¬ÔÙ¸ù¾ÝÆä_deserialized ÅжÏÊÇ·ñÊÇ·ÇÐòÁл¯µÄÐÎʽ£¬ÈôÊÇÔò¶ÔÆä½øÐÐÐòÁл¯£¬×îºó½«Êý¾Ý´æ´¢µ½´ÅÅÌ£¬ÔÚ Storage Ä£¿éÖиüÐÂÆäÐÅÏ¢¡£

¡¡¡¡4. Ö´ÐÐÄÚ´æ¹ÜÀí

¡¡¡¡4.1 ¶àÈÎÎñ¼äÄÚ´æ·ÖÅä

¡¡¡¡Executor ÄÚÔËÐеÄÈÎÎñͬÑù¹²ÏíÖ´ÐÐÄڴ棬Spark ÓÃÒ»¸ö HashMap ½á¹¹±£´æÁËÈÎÎñµ½ÄÚ´æºÄ·ÑµÄÓ³É䡣ÿ¸öÈÎÎñ¿ÉÕ¼ÓõÄÖ´ÐÐÄÚ´æ´óСµÄ·¶Î§Îª 1/2N ~ 1/N£¬ÆäÖÐ N Ϊµ±Ç° Executor ÄÚÕýÔÚÔËÐеÄÈÎÎñµÄ¸öÊý¡£Ã¿¸öÈÎÎñÔÚÆô¶¯Ö®Ê±£¬ÒªÏò MemoryManager ÇëÇóÉêÇë×îÉÙΪ 1/2N µÄÖ´ÐÐÄڴ棬Èç¹û²»Äܱ»Âú×ãÒªÇóÔò¸ÃÈÎÎñ±»×èÈû£¬Ö±µ½ÓÐÆäËûÈÎÎñÊÍ·ÅÁË×ã¹»µÄÖ´ÐÐÄڴ棬¸ÃÈÎÎñ²Å¿ÉÒÔ±»»½ÐÑ¡£

¡¡¡¡4.2 Shuffle µÄÄÚ´æÕ¼ÓÃ

¡¡¡¡Ö´ÐÐÄÚ´æÖ÷ÒªÓÃÀ´´æ´¢ÈÎÎñÔÚÖ´ÐÐ Shuffle ʱռÓõÄÄڴ棬Shuffle Êǰ´ÕÕÒ»¶¨¹æÔò¶Ô RDD Êý¾ÝÖØÐ·ÖÇøµÄ¹ý³Ì£¬ÎÒÃÇÀ´¿´ Shuffle µÄ Write ºÍ Read Á½½×¶Î¶ÔÖ´ÐÐÄÚ´æµÄʹÓãº

¡¡¡¡Shuffle Write

¡¡¡¡ÈôÔÚ map ¶ËÑ¡ÔñÆÕͨµÄÅÅÐò·½Ê½£¬»á²ÉÓà ExternalSorter ½øÐÐÍâÅÅ£¬ÔÚÄÚ´æÖд洢Êý¾ÝʱÖ÷ÒªÕ¼ÓöÑÄÚÖ´Ðпռ䡣

¡¡¡¡ÈôÔÚ map ¶ËÑ¡Ôñ Tungsten µÄÅÅÐò·½Ê½£¬Ôò²ÉÓà ShuffleExternalSorter Ö±½Ó¶ÔÒÔÐòÁл¯ÐÎʽ´æ´¢µÄÊý¾ÝÅÅÐò£¬ÔÚÄÚ´æÖд洢Êý¾Ýʱ¿ÉÒÔÕ¼ÓöÑÍâ»ò¶ÑÄÚÖ´Ðпռ䣬ȡ¾öÓÚÓû§ÊÇ·ñ¿ªÆôÁ˶ÑÍâÄÚ´æÒÔ¼°¶ÑÍâÖ´ÐÐÄÚ´æÊÇ·ñ×ã¹»¡£

¡¡¡¡Shuffle Read

¡¡¡¡ÔÚ¶Ô reduce ¶ËµÄÊý¾Ý½øÐоۺÏʱ£¬Òª½«Êý¾Ý½»¸ø Aggregator ´¦Àí£¬ÔÚÄÚ´æÖд洢Êý¾ÝʱռÓöÑÄÚÖ´Ðпռ䡣

¡¡¡¡Èç¹ûÐèÒª½øÐÐ×îÖÕ½á¹ûÅÅÐò£¬ÔòÒª½«Ôٴν«Êý¾Ý½»¸ø ExternalSorter ´¦Àí£¬Õ¼ÓöÑÄÚÖ´Ðпռ䡣

¡¡¡¡ÔÚ ExternalSorter ºÍ Aggregator ÖУ¬Spark »áʹÓÃÒ»ÖֽРAppendOnlyMap µÄ¹þÏ£±íÔÚ¶ÑÄÚÖ´ÐÐÄÚ´æÖд洢Êý¾Ý£¬µ«ÔÚ Shuffle ¹ý³ÌÖÐËùÓÐÊý¾Ý²¢²»Äܶ¼±£´æµ½¸Ã¹þÏ£±íÖУ¬µ±Õâ¸ö¹þÏ£±íÕ¼ÓõÄÄÚ´æ»á½øÐÐÖÜÆÚÐԵزÉÑù¹ÀË㣬µ±Æä´óµ½Ò»¶¨³Ì¶È£¬ÎÞ·¨ÔÙ´Ó MemoryManager ÉêÇ뵽еÄÖ´ÐÐÄÚ´æÊ±£¬Spark ¾Í»á½«ÆäÈ«²¿ÄÚÈÝ´æ´¢µ½´ÅÅÌÎļþÖУ¬Õâ¸ö¹ý³Ì±»³ÆÎªÒç´æ(Spill)£¬Òç´æµ½´ÅÅ̵ÄÎļþ×îºó»á±»¹é²¢(Merge)¡£

¡¡¡¡Shuffle Write ½×¶ÎÖÐÓõ½µÄ Tungsten ÊÇ Databricks ¹«Ë¾Ìá³öµÄ¶Ô Spark ÓÅ»¯ÄÚ´æºÍ CPU ʹÓõļƻ®[9]£¬½â¾öÁËһЩ JVM ÔÚÐÔÄÜÉϵÄÏÞÖÆºÍ±×¶Ë¡£Spark »á¸ù¾Ý Shuffle µÄÇé¿öÀ´×Ô¶¯Ñ¡ÔñÊÇ·ñ²ÉÓà Tungsten ÅÅÐò¡£Tungsten ²ÉÓõÄҳʽÄÚ´æ¹ÜÀí»úÖÆ½¨Á¢ÔÚ MemoryManager Ö®ÉÏ£¬¼´ Tungsten ¶ÔÖ´ÐÐÄÚ´æµÄʹÓýøÐÐÁËÒ»²½µÄ³éÏó£¬ÕâÑùÔÚ Shuffle ¹ý³ÌÖÐÎÞÐè¹ØÐÄÊý¾Ý¾ßÌå´æ´¢ÔÚ¶ÑÄÚ»¹ÊǶÑÍ⡣ÿ¸öÄÚ´æÒ³ÓÃÒ»¸ö MemoryBlock À´¶¨Ò壬²¢Óà Object obj ºÍ long offset ÕâÁ½¸ö±äÁ¿Í³Ò»±êʶһ¸öÄÚ´æÒ³ÔÚϵͳÄÚ´æÖеĵØÖ·¡£¶ÑÄÚµÄ MemoryBlock ÊÇÒÔ long ÐÍÊý×éµÄÐÎʽ·ÖÅäµÄÄڴ棬Æä obj µÄֵΪÊÇÕâ¸öÊý×éµÄ¶ÔÏóÒýÓã¬offset ÊÇ long ÐÍÊý×éµÄÔÚ JVM ÖеijõÊ¼Æ«ÒÆµØÖ·£¬Á½ÕßÅäºÏʹÓÿÉÒÔ¶¨Î»Õâ¸öÊý×éÔÚ¶ÑÄڵľø¶ÔµØÖ·;¶ÑÍâµÄ MemoryBlock ÊÇÖ±½ÓÉêÇëµ½µÄÄÚ´æ¿é£¬Æä obj Ϊ null£¬offset ÊÇÕâ¸öÄÚ´æ¿éÔÚϵͳÄÚ´æÖÐµÄ 64 λ¾ø¶ÔµØÖ·¡£Spark Óà MemoryBlock ÇÉÃîµØ½«¶ÑÄںͶÑÍâÄÚ´æÒ³Í³Ò»³éÏó·â×°£¬²¢ÓÃÒ³±í(pageTable)¹ÜÀíÿ¸ö Task ÉêÇëµ½µÄÄÚ´æÒ³¡£

¡¡¡¡Tungsten ҳʽ¹ÜÀíϵÄËùÓÐÄÚ´æÓà 64 λµÄÂß¼­µØÖ·±íʾ£¬ÓÉÒ³ºÅºÍÒ³ÄÚÆ«ÒÆÁ¿×é³É£º

¡¡¡¡Ò³ºÅ£ºÕ¼ 13 λ£¬Î¨Ò»±êʶһ¸öÄÚ´æÒ³£¬Spark ÔÚÉêÇëÄÚ´æÒ³Ö®Ç°ÒªÏÈÉêÇë¿ÕÏÐÒ³ºÅ¡£

¡¡¡¡Ò³ÄÚÆ«ÒÆÁ¿£ºÕ¼ 51 룬ÊÇÔÚʹÓÃÄÚ´æÒ³´æ´¢Êý¾Ýʱ£¬Êý¾ÝÔÚÒ³Ä򵀮«ÒƵØÖ·¡£

¡¡¡¡ÓÐÁËͳһµÄѰַ·½Ê½£¬Spark ¿ÉÒÔÓà 64 λÂß¼­µØÖ·µÄÖ¸Õ붨λµ½¶ÑÄÚ»ò¶ÑÍâµÄÄڴ棬Õû¸ö Shuffle Write ÅÅÐòµÄ¹ý³ÌÖ»ÐèÒª¶ÔÖ¸Õë½øÐÐÅÅÐò£¬²¢ÇÒÎÞÐè·´ÐòÁл¯£¬Õû¸ö¹ý³Ì·Ç³£¸ßЧ£¬¶ÔÓÚÄÚ´æ·ÃÎÊЧÂÊºÍ CPU ʹÓÃЧÂÊ´øÀ´ÁËÃ÷ÏÔµÄÌáÉý[10]¡£

¡¡¡¡Spark µÄ´æ´¢ÄÚ´æºÍÖ´ÐÐÄÚ´æÓÐ׎ØÈ»²»Í¬µÄ¹ÜÀí·½Ê½£º¶ÔÓÚ´æ´¢ÄÚ´æÀ´Ëµ£¬Spark ÓÃÒ»¸ö LinkedHashMap À´¼¯ÖйÜÀíËùÓÐµÄ Block£¬Block ÓÉÐèÒª»º´æµÄ RDD µÄ Partition ת»¯¶ø³É;¶ø¶ÔÓÚÖ´ÐÐÄڴ棬Spark Óà AppendOnlyMap À´´æ´¢ Shuffle ¹ý³ÌÖеÄÊý¾Ý£¬ÔÚ Tungsten ÅÅÐòÖÐÉõÖÁ³éÏó³ÉΪҳʽÄÚ´æ¹ÜÀí£¬¿ª±ÙÁËÈ«Ð嵀 JVM ÄÚ´æ¹ÜÀí»úÖÆ¡£

¡¡¡¡½áÊøÓï

¡¡¡¡Spark µÄÄÚ´æ¹ÜÀíÊÇÒ»Ì׸´ÔӵĻúÖÆ£¬ÇÒ Spark µÄ°æ±¾¸üбȽϿ죬±ÊÕßˮƽÓÐÏÞ£¬ÄÑÃâÓÐÐðÊö²»Çå¡¢´íÎóµÄµØ·½£¬Èô¶ÁÕßÓкõĽ¨ÒéºÍ¸üÉîµÄÀí½â£¬»¹Íû²»Áߴͽ̡£

   
1724 ´Îä¯ÀÀ       27
Ïà¹ØÎÄÕÂ

»ùÓÚEAµÄÊý¾Ý¿â½¨Ä£
Êý¾ÝÁ÷½¨Ä££¨EAÖ¸ÄÏ£©
¡°Êý¾Ýºþ¡±£º¸ÅÄî¡¢ÌØÕ÷¡¢¼Ü¹¹Óë°¸Àý
ÔÚÏßÉ̳ÇÊý¾Ý¿âϵͳÉè¼Æ ˼·+Ч¹û
 
Ïà¹ØÎĵµ

GreenplumÊý¾Ý¿â»ù´¡Åàѵ
MySQL5.1ÐÔÄÜÓÅ»¯·½°¸
ijµçÉÌÊý¾ÝÖÐ̨¼Ü¹¹Êµ¼ù
MySQL¸ßÀ©Õ¹¼Ü¹¹Éè¼Æ
Ïà¹Ø¿Î³Ì

Êý¾ÝÖÎÀí¡¢Êý¾Ý¼Ü¹¹¼°Êý¾Ý±ê×¼
MongoDBʵս¿Î³Ì
²¢·¢¡¢´óÈÝÁ¿¡¢¸ßÐÔÄÜÊý¾Ý¿âÉè¼ÆÓëÓÅ»¯
PostgreSQLÊý¾Ý¿âʵսÅàѵ