Introduction
Life sciences are flourishing. Applications based on DNA analysis keep emerging, from bacterial culture identification in the food industry to rapid cancer diagnosis, yet genome analysis still faces significant challenges. Many new technologies and methods are being applied to gene sequence analysis, including Spark, FPGAs, and GPU acceleration. These technologies allow most life-science applications, both open source and ISV software, to be parallelized without complex MPI programming, while Spark's in-memory computing improves analysis efficiency, accelerates workflows, and shortens analysis time, leading to more new discoveries. This article describes how to run common gene sequence analysis applications with Spark, including how to run them under the different Spark modes, the run process and the analysis of the results, and a comparison of performance and speedup on different computing platforms and with different run parameters.
1. »ùÒòÐòÁзÖÎö¹¤×÷Á÷
»ùÒòÐòÁзÖÎö¹¤×÷Á÷ÒÔ GATK µÄ×î¼Ñʵ¼ùΪ±ê×¼¡£ËüÒÔ×î³õµÄ FASTQ ÎļþΪÊäÈ룬´Ó BWA-mem
²âÐòµ½ GATK µÄ HaploTyperCaller£¬Íê³É¶ÔÕû¸öÑù°åµÄ²âÐò·ÖÎö¡£

Figure 1. GATK Best Practices
In the first stage of the workflow, BWA-mem aligns the input FASTQ files and produces a sequence alignment/map (SAM) file, which SortSam then turns into a sorted BAM file. A BAM file is simply the binary form of a SAM file, and all subsequent processing works on this binary BAM file.
The BAM file is passed to the Picard tool MarkDuplicates, which removes duplicate fragments and produces a single merged, de-duplicated BAM file. The following steps, RealignerTargetCreator, IndelRealigner, BaseRecalibrator, PrintReads, and HaplotypeCaller, are all part of GATK, a software package for analyzing high-throughput sequencing data.
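To make these stages concrete, here is a minimal shell sketch of the classic (pre-Spark) pipeline. File names such as ref.fa and sample_R1.fastq are placeholders, thread counts are illustrative, and the option lists are trimmed to the essentials rather than reproducing the full Best Practices command lines.

# Alignment: BWA-mem produces a SAM file
bwa mem -t 16 ref.fa sample_R1.fastq sample_R2.fastq > sample.sam

# Sort and convert to the binary BAM form with Picard SortSam
java -jar picard.jar SortSam I=sample.sam O=sample.sorted.bam SORT_ORDER=coordinate

# Mark/remove duplicate fragments
java -jar picard.jar MarkDuplicates I=sample.sorted.bam O=sample.dedup.bam M=dup_metrics.txt

# Indel realignment (GATK 3 style)
java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R ref.fa -I sample.dedup.bam -o targets.intervals
java -jar GenomeAnalysisTK.jar -T IndelRealigner -R ref.fa -I sample.dedup.bam \
     -targetIntervals targets.intervals -o sample.realigned.bam

# Base quality score recalibration and variant calling
java -jar GenomeAnalysisTK.jar -T BaseRecalibrator -R ref.fa -I sample.realigned.bam \
     -knownSites dbsnp.vcf -o recal.table
java -jar GenomeAnalysisTK.jar -T PrintReads -R ref.fa -I sample.realigned.bam \
     -BQSR recal.table -o sample.recal.bam
java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ref.fa -I sample.recal.bam -o sample.vcf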

Figure 2. Decomposition of the genome sequencing workflow
The main work of sequence analysis is data pre-processing; the pre-processed data is then consumed by downstream analysis. In the pre-processing stage, alignment and sorting are compute-intensive and time-consuming. Although multiple processors or multiple threads can improve efficiency, in practice the complexity of the algorithms and the rapidly growing volume of data to analyze mean that a single analysis run can still take more than a day and cost anywhere from 200 to 600 US dollars. Spark can parallelize this serial analysis, partition the data, and balance the load dynamically to improve efficiency.
GATK4 is the Spark-based gene sequence analysis package released by the Broad Institute. Its data pre-processing flow is:

Figure 3. Spark-based data pre-processing
Merge the input files and the reference files
Split the data into blocks
The number of blocks depends on the cluster size and the available resources
The workflow is divided into multiple stages; the data is partitioned only once, before processing starts
The staging is similar to MapReduce
The TaskManager assigns tasks to executors
The BlockManager uses spark.broadcast.blockSize to set the size of each piece of a block; if the value is too large, parallelism during the broadcast drops (slowing the run down), but if it is too small, BlockManager performance suffers. The default is 4 MB. (A submit-time sketch follows this list.)
The remaining memory keeps shrinking; when it becomes too small, tasks are interrupted and evicted. Spark tries to restart them, and if it still fails after the configured number of retries, the job ends abnormally.
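As a hedged illustration of the block-size and memory points above, these Spark properties can be set when the job is submitted; the values (8m, 16g, 8 retries) and the job jar name are examples only, not recommendations from the original text.

# spark.broadcast.blockSize: size of each piece of a broadcast block (default 4m)
# spark.executor.memory: more executor memory reduces the risk of tasks being evicted
# spark.task.maxFailures: how many times Spark retries a failed task before aborting the job
spark-submit \
    --master spark://c712f6n10:7077 \
    --conf spark.broadcast.blockSize=8m \
    --conf spark.executor.memory=16g \
    --conf spark.task.maxFailures=8 \
    my-genomics-job.jar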
GATK4 is still under active development and refinement, and the tools and workflows it provides keep growing. The workflows provided by the latest GATK4 release include:
BQSRPipelineSpark
Runs the two steps of BQSR (BaseRecalibrator and ApplyBQSR) on Spark
BwaAndMarkDuplicatesPipelineSpark
Takes a name-sorted file as input and runs BWA and MarkDuplicates
ReadsPipelineSpark
Takes aligned reads from BWA as input, runs MarkDuplicates and BQSR, and produces output for downstream analysis
Figure 4 shows the ReadsPipelineSpark workflow.

Figure 4. The ReadsPipelineSpark workflow
2. Optimization methods for sequence analysis
The different applications in sequence analysis place different demands on system resources. As Figure 5 shows, some applications are CPU-heavy: BWA not only consumes a large amount of processor resources but also runs for a long time. Others, such as HaplotypeCaller, need a large amount of memory and also take a long time to process.

Figure 5. Resource demands of different applications
In general, there are four ways to optimize and accelerate genome processing:
-nt parallelizes at the engine level, processing different parts of the genome in parallel
-nct parallelizes at the walker level, accelerating the processing of each individual region of the genome
MapReduce spawns many instances at the same time, each processing a different (arbitrary) part of the genome
Optimized scientific libraries
In a GATK workflow, the -nt and -nct options can be set to improve job efficiency.
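For example, a pre-GATK4 invocation might use the two options as follows. This is a sketch only: the tool choices and thread counts are illustrative, and not every GATK walker accepts both options.

# -nt: data threads at the engine level (RealignerTargetCreator supports -nt)
java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R ref.fa -I sample.bam -nt 8 -o targets.intervals

# -nct: CPU threads per data thread at the walker level (BaseRecalibrator supports -nct)
java -jar GenomeAnalysisTK.jar -T BaseRecalibrator -R ref.fa -I sample.bam \
     -knownSites dbsnp.vcf -nct 8 -o recal.table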

GATK4 is the Spark-based version of GATK. It ships many tools and workflows that can run in a Spark environment, and it runs jobs in stages, in a process similar to MapReduce. There are three run modes:
Non-Spark standalone mode
Spark standalone mode
Spark cluster mode
With the input data and reference files set up correctly, most GATK4 tools run successfully in Spark cluster mode. Some applications, such as CountReadsSpark, see significantly better results in cluster mode. Others, especially workflows that need more system resources, cannot run in Spark standalone mode at all and report a "Not enough space to cache RDD in memory" error, yet run smoothly in Spark cluster mode; CollectInsertSizeMetricsSpark is one example.
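As a rough sketch of switching between the modes (the exact tool arguments may differ between GATK4 versions; the -I/-O and --sparkRunner/--sparkMaster pattern follows Listing 1 later in this article, and the file names are placeholders):

# Spark standalone mode: runs in a single JVM and may fail with
# "Not enough space to cache RDD in memory" for large inputs
./gatk-launch CollectInsertSizeMetricsSpark -I sample.bam -O insert_metrics.txt

# Spark cluster mode: submit the same tool to the Spark master instead
./gatk-launch CollectInsertSizeMetricsSpark -I sample.bam -O insert_metrics.txt \
    --sparkRunner SPARK --sparkMaster spark://c712f6n10:7077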

Figure 6. CollectInsertSizeMetricsSpark results
Another way to accelerate sequence analysis applications is to provide scientific libraries optimized for the POWER8 processor. HaplotypeCaller, for example, consumes a large amount of memory during analysis and has the longest run time. Different vendors are also developing acceleration libraries for their own software stacks, such as the Intel GKL genomics kernel library.

IBM provides a PairHMM algorithm optimized for POWER8 systems that takes full advantage of the new software and hardware features of POWER8. Currently this optimized library runs on POWER8 under Ubuntu 14 and RHEL 7.
The latest version of the library accelerates HaplotypeCaller using the same floating-point precision as Java on POWER8, and it fully exploits simultaneous multithreading (SMT) and vector instructions. Its acceleration of HaplotypeCaller exceeds that of earlier versions, especially in single-threaded mode (that is, with the -nct option unspecified). In single-threaded mode, with PairHMM acceleration HaplotypeCaller takes only about half the time, a speedup of 1.88x.
Calling PairHMM on a POWER8 system:
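The original listing is not reproduced here. As a hedged sketch, enabling a native PairHMM library for GATK's HaplotypeCaller generally amounts to making the shared library visible to the JVM and selecting the vectorized implementation; the library path below is a placeholder, and the option shown is the generic GATK 3 flag rather than anything specific to the IBM library.

# Make the optimized shared library visible to the JVM (path is a placeholder)
export LD_LIBRARY_PATH=/opt/ibm/pairhmm:$LD_LIBRARY_PATH

# Ask HaplotypeCaller to use the vectorized, native PairHMM implementation
java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ref.fa -I sample.recal.bam \
     --pair_hmm_implementation VECTOR_LOGLESS_CACHING -o sample.vcf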

Other ways to accelerate genome sequencing applications include FPGA and GPGPU computing as well as fully hardware-accelerated solutions such as the Edico Genome DRAGEN platform; these are outside the scope of this article.
3. Introduction to Spark
Spark is a cluster computing environment similar to Hadoop, but Spark uses in-memory distributed datasets and performs better on certain workloads. In addition to interactive queries, Spark is also well suited to iterative workloads.
Spark's main features include:
Speed
Spark has an advanced DAG execution engine that supports iterative data flow and in-memory computing; applications can run up to 100 times faster than Hadoop MapReduce in memory, or 10 times faster on disk.

Ease of use
Applications can be written quickly in Java, Scala, Python, and R. Spark provides more than 80 high-level operators, which makes building parallel applications easy, and it can be used interactively from Scala, Python, and R.
Generality
Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. These libraries can be combined seamlessly in the same application, mixing SQL, streaming, and complex analytics.

Runs everywhere
Spark runs on Hadoop, Mesos, standalone, or in the cloud. It can access diverse data sources, including HDFS, Cassandra, HBase, and S3. Spark can be run in different modes, including local mode, standalone mode, Mesos mode, and YARN mode.
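As a brief illustration, the deployment mode is selected with the --master option of spark-submit; app.jar and the host names below are placeholders.

spark-submit --master local[*] app.jar                    # local mode: all cores of one machine
spark-submit --master spark://master-host:7077 app.jar    # standalone cluster mode
spark-submit --master mesos://mesos-host:5050 app.jar     # Mesos mode
spark-submit --master yarn-client app.jar                 # YARN mode (yarn-client in Spark 1.6)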
IBM Spectrum Conductor with Spark simplifies the deployment of the open source big data analytics platform Apache Spark and can improve its analysis speed by nearly 60%. As an open source big data analytics framework, Apache Spark offers compelling performance advantages, but implementing it can be challenging: it requires investment in new skills, tools, and workflows, and setting up ad hoc Spark clusters can lead to inefficient resource use and create management and security challenges. IBM Spectrum Conductor with Spark helps address these problems. It integrates a Spark distribution with resource, infrastructure, and data lifecycle management to create an enterprise-grade, multi-tenant Spark environment in a streamlined way. To help manage the rapidly evolving Spark lifecycle, IBM Spectrum Conductor with Spark supports running multiple instances and versions of Spark at the same time.
The tests described in this article were run on an IBM Conductor with Spark cluster consisting of three Firestone servers: one driver node and two worker nodes, a 1+2 configuration, as shown in the figure below:

Figure. Conductor with Spark cluster architecture
The run environment is Conductor with Spark, with Spark version 1.6.1, using Spark's default DAGScheduler. gatk-launch is run with the options --sparkRunner SPARK --sparkMaster spark://c712f6n10:7077.
4. Gene sequence analysis with Spark
Take the ReadsPipelineSpark workflow as an example. It is a predefined GATK4 workflow that takes a BAM file as input, runs MarkDuplicates and BQSR, and produces output for the next stage of analysis.
Duplicates are groups of read fragments that have identical, unclipped start and end positions; MarkDuplicates selects the "best" copy among them, which mitigates the effect of errors.

BQSR recalibrates the base quality scores of the sequencing data in an already sorted BAM file. After recalibration, the QUAL field of each read in the output BAM is more accurate: the reported quality scores are closer to the actual probability of mismatching the reference genome.

Run command:
Listing 1.
./gatk/gatk-launch \
    ReadsPipelineSpark \                           # pipeline name
    -I $bam \                                      # input file
    -R $ref \                                      # reference file
    -O $bamout \                                   # output file
    --bamPartitionSize 134217728 \                 # maximum number of bytes to read from a file into each partition of reads
    --knownSites $dbsnp \                          # known sites (see notes)
    --shardedOutput true \                         # write output in multiple pieces
    --duplicates_scoring_strategy SUM_OF_BASE_QUALITIES \   # MarkDuplicatesScoringStrategy
    --sparkRunner SPARK \                          # run mode
    --sparkMaster spark://c712f6n10:7077 \         # Spark cluster
    --conf spark.driver.memory=5g --conf spark.executor.memory=16g
Input files:
-I CEUTrio.HiSeq.WEx.b37.NA12892.bam
-R human_g1k_v37.2bit
-knownSites dbsnp_138.b37.excluding_sites_after_129.vcf
With different resource managers you can also set --num-executors, --executor-memory, and --executor-cores to allocate and schedule resources appropriately for the size of the available compute resources.
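A sketch of passing these options through spark-submit (shown here for YARN, where --num-executors applies); the numbers are illustrative and should be sized to the actual cluster, and my-genomics-job.jar is a placeholder.

# 8 executors, each with 4 cores and 16 GB of heap, plus 5 GB for the driver
spark-submit --master yarn-client \
    --num-executors 8 \
    --executor-cores 4 \
    --executor-memory 16g \
    --driver-memory 5g \
    my-genomics-job.jar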
5. Performance and speedup analysis
This article uses CountReadsSpark as an example to compare the results of running in different modes.
Running CountReads on a single machine in non-Spark mode:
Listing 2.
# ./gatk-launch CountReads -I /home/dlspark/SRR034975.Sort_all.bam
Running:
    /home/dlspark/gatk/build/install/gatk/bin/gatk CountReads -I /home/dlspark/SRR034975.Sort_all.bam
[May 31, 2016 9:52:01 PM EDT] org.broadinstitute.hellbender.tools.CountReads
    --input /home/dlspark/SRR034975.Sort_all.bam --disable_all_read_filters false
    --interval_set_rule UNION --interval_padding 0 --readValidationStringency SILENT
    --secondsBetweenProgressUpdates 10.0 --disableSequenceDictionaryValidation false
    --createOutputBamIndex true --createOutputBamMD5 false --addOutputSAMProgramRecord true
    --help false --version false --verbosity INFO --QUIET false
Output: 34929382
Elapsed time: 12.14 minutes
Running CountReadsSpark in Spark standalone mode:
Listing 3.
# ./gatk-launch CountReadsSpark -I /home/dlspark/SRR034975.Sort_all.bam
Running:
    /home/dlspark/gatk/build/install/gatk/bin/gatk CountReadsSpark -I /home/dlspark/SRR034975.Sort_all.bam
[June 1, 2016 1:11:34 AM EDT] org.broadinstitute.hellbender.tools.spark.pipelines.CountReadsSpark
    --input /home/dlspark/SRR034975.Sort_all.bam --readValidationStringency SILENT
    --interval_set_rule UNION --interval_padding 0 --bamPartitionSize 0
    --disableSequenceDictionaryValidation false --shardedOutput false --numReducers 0
    --sparkMaster local[*] --help false --version false --verbosity INFO --QUIET false
Output: 34929382
Elapsed time: 9.50 minutes
Running CountReadsSpark in Spark cluster mode:
Listing 4.
# ./gatk-launch CountReadsSpark -I /gpfs1/yrx/SRR034975.Sort_all.bam -O /gpfs1/yrx/gatk4-test.output \
      --sparkRunner SPARK --sparkMaster spark://c712f6n10:7077
Running:
    spark-submit --master spark://c712f6n10:7077
    --conf spark.kryoserializer.buffer.max=512m --conf spark.driver.maxResultSize=0
    --conf spark.driver.userClassPathFirst=true --conf spark.io.compression.codec=lzf
    --conf spark.yarn.executor.memoryOverhead=600
    --conf spark.yarn.dist.files=/home/dlspark/gatk/build/libIntelDeflater.so
    --conf spark.driver.extraJavaOptions=-Dsamjdk.intel_deflater_so_path=libIntelDeflater.so
        -Dsamjdk.compression_level=1 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true
    --conf spark.executor.extraJavaOptions=-Dsamjdk.intel_deflater_so_path=libIntelDeflater.so
        -Dsamjdk.compression_level=1 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true
Output: 34929382
Elapsed time: 0.60 minutes
Figure 7 compares the results of the three run modes. Measured by elapsed time, CountReadsSpark in Spark cluster mode ran about 15 times faster than in Spark standalone mode.

Figure 7. Comparison of results for the three run modes
Summary
As the analysis above shows, Spark-based life-science solutions allow formerly serial applications to be parallelized and to use in-memory computing with little or no code change, or to process large amounts of data on a cluster with only a few lines of Java code, without complex MPI or OpenMP programming, so that scientists can focus on new methods and new discoveries. As Spark-based life-science solutions mature, more and more tools and workflows can run on the Spark platform, which is fault tolerant and increasingly scalable. Parallelized life-science applications are expected to shorten sequence analysis from today's ten-plus hours to under one hour, and distributed job scheduling on low-cost clusters will also greatly reduce the cost of analysis.