{"id":548,"date":"2020-08-26T23:21:39","date_gmt":"2020-08-26T15:21:39","guid":{"rendered":"http:\/\/www.wuchangsong.com\/?p=548"},"modified":"2020-09-01T16:37:33","modified_gmt":"2020-09-01T08:37:33","slug":"%e5%a4%9a%e7%89%a9%e7%a7%8d%e5%85%a8%e5%9f%ba%e5%9b%a0%e7%bb%84%e5%ba%8f%e5%88%97%e6%af%94%e5%af%b9%e5%be%97%e5%88%b0%e4%bf%9d%e5%ae%88%e7%9a%84dna%e5%ba%8f%e5%88%97","status":"publish","type":"post","link":"http:\/\/www.wuchangsong.com\/?p=548","title":{"rendered":"Identifying conserved DNA sequences by multi-species whole-genome alignment"},"content":{"rendered":"<p>After reading various papers and tutorials, I found that obtaining DNA sequences conserved across multiple species (both coding and non-coding regions) mainly involves the following steps:<br \/>\n1. Repeat masking: RepeatMasker and RepeatModeler<br \/>\n2. Pairwise alignment: the main tools are last, lastz and blastz<br \/>\n3. Chaining: axtChain<br \/>\n4. Netting: chainNet<br \/>\n5. Mafing: netToAxt, axtSort, axtToMaf<br \/>\n6. Combining multiple pairwise results: roast<br \/>\n7. PhastCons: PHAST<\/p>\n<p>The detailed steps are as follows.<\/p>\n<p>Step 1: Download repeat-masked genome FASTA files from a database. For a genome you assembled yourself, the masked sequence can be produced with <a href=\"https:\/\/github.com\/chenlianfu\/geta\">Geta<\/a>.<br \/>\nStep 2: I described how to use last in an earlier post, but the literature shows that lastz is used more often. On last versus lastz: the last aligner is considered faster and more memory-efficient. It writes MAF files, which can be converted to PSL files, and the same downstream steps can then be applied to the PSL files. Unlike lastz, last starts from FASTA files: an index must first be built for the target genome, and the query genome is then aligned against it. In practice last is simpler to use and has fewer parameters to choose. I do not yet know how the results of the two differ (the jobs are still running on the server; I will update this post when the results are in). See also the <a href=\"http:\/\/www.bx.psu.edu\/miller_lab\/dist\/README.lastz-1.02.00\/README.lastz-1.02.00a.html#blastz\">differences between lastz and blastz<\/a>.<br \/>\nlastz data preprocessing:<\/p>\n<pre>\r\n# reflect.txt: species IDs in column 1; reflect_no_darer.txt: the same list without the target species darer\r\n# Convert each genome FASTA file to 2bit\r\nfor i in `cat 00.initial_data\/reflect.txt | cut -f 1`\r\ndo\r\n    echo \"faToTwoBit 00.initial_data\/$i.genome.fasta 00.initial_data\/$i.genome.2bit\"\r\ndone > fa2bit.list\r\nParaFly -c fa2bit.list -CPU 19\r\n\r\n# Sequence sizes, sorted from longest to shortest\r\nfor i in `cat 00.initial_data\/reflect.txt | cut -f 1`\r\ndo\r\n    echo \"twoBitInfo 00.initial_data\/$i.genome.2bit stdout | sort -k2rn > $i.chrom.sizes\"\r\ndone > chrom.sizes.list\r\nParaFly -c chrom.sizes.list -CPU 19\r\nfor i in `cat 00.initial_data\/reflect.txt | cut -f 1`\r\ndo\r\n    mkdir -p ${i}PartList\r\ndone\r\n# Query genomes: 10 Mb chunks, no overlap\r\nfor i in `cat 00.initial_data\/reflect_no_darer.txt | cut -f 1`\r\ndo\r\n    echo \"\/opt\/biosoft\/userApps\/kent\/src\/hg\/utils\/automation\/partitionSequence.pl 10000000 0 00.initial_data\/$i.genome.2bit $i.chrom.sizes 1 -lstDir ${i}PartList > $i.part.list\"\r\ndone > query_partitionSequence.list\r\nParaFly -c query_partitionSequence.list -CPU 18\r\n# Target genome (darer): 20 Mb chunks with a 10 kb overlap\r\n\/opt\/biosoft\/userApps\/kent\/src\/hg\/utils\/automation\/partitionSequence.pl 20000000 10000 00.initial_data\/darer.genome.2bit darer.chrom.sizes 1 -lstDir darerPartList > darer.part.list\r\ngrep -v PartList darer.part.list > darer.list\r\n# Queries: replace the references to .lst files with the contents of those files\r\nfor i in `cat 00.initial_data\/reflect_no_darer.txt | cut -f 1`\r\ndo\r\n    echo \"grep -v PartList $i.part.list > $i.list\"\r\ndone > grep_part.list\r\nParaFly -c grep_part.list -CPU 18\r\nfor i in `cat 00.initial_data\/reflect_no_darer.txt | cut -f 1`\r\ndo\r\n    echo \"cat ${i}PartList\/*.lst >> $i.list\"\r\ndone > cat_Part.list\r\nParaFly -c cat_Part.list -CPU 18\r\n# Lift files map chunk coordinates back to whole-sequence coordinates\r\n\/opt\/biosoft\/userApps\/kent\/src\/hg\/utils\/automation\/constructLiftFile.pl darer.chrom.sizes darer.list > darer.lift\r\nfor i in `cat 00.initial_data\/reflect_no_darer.txt | cut -f 1`\r\ndo\r\n    echo \"\/opt\/biosoft\/userApps\/kent\/src\/hg\/utils\/automation\/constructLiftFile.pl $i.chrom.sizes $i.list > $i.lift\"\r\ndone > constructLiftFile.list\r\nParaFly -c constructLiftFile.list -CPU 18\r\n# Extract every chunk (genome.2bit:seq:start-end) as a FASTA file\r\nfor i in `cat 00.initial_data\/reflect.txt | cut -f 1`\r\ndo\r\n    mkdir -p $i\r\n    for x in `cat $i.list`\r\n    do\r\n        y=${x\/*2bit:\/}\r\n        echo \"twoBitToFa $x $i\/$y.fa\"\r\n    done >> twoBitToFa.list\r\ndone\r\n# Run the extraction first: the FASTA files must exist before they can be filtered\r\nParaFly -c twoBitToFa.list -CPU 80\r\n# Remove query sequences shorter than 1000 bp (length = end - start, taken from the file name)\r\nfor i in `cat 00.initial_data\/reflect_no_darer.txt | cut -f 1`\r\ndo\r\n    for x in $i\/*fa\r\n    do\r\n        range=${x##*:}\r\n        range=${range%.fa}\r\n        start=${range%-*}\r\n        end=${range#*-}\r\n        if [ $((end - start)) -lt 1000 ]\r\n        then\r\n            rm $x\r\n        fi\r\n    done\r\ndone\r\n\r\n# One lastz job per pair of target and query chunks; output files are named after the chunk basenames\r\nfor i in darer\/*fa\r\ndo\r\n    for x in `cat 00.initial_data\/reflect_no_darer.txt | cut -f 1`\r\n    do\r\n        mkdir -p ${x}_axt\r\n        for y in $x\/*fa\r\n        do\r\n            t=`basename $i .fa`\r\n            q=`basename $y .fa`\r\n            echo \"lastz $i $y --strand=both --seed=12of19 --notransition --chain --gapped --gap=400,30 --hspthresh=3000 --gappedthresh=3000 --inner=2000 --masking=50 --ydrop=9400 --scores=\/opt\/biosoft\/GenomeAlignmentTools\/HoxD55.q --format=axt > ${x}_axt\/$t.$q.axt\"\r\n        done >> lastz_all.list\r\n    done\r\ndone\r\nParaFly -c lastz_all.list -CPU 80\r\n\r\n# lastz is very slow: when many species are analysed, it is not recommended unless a supercomputer is available\r\n<\/pre>\n<p>Step 3: Chaining links adjacent alignment blocks into chains. The scoring matrix is the same as in blastz; only the gap scoring differs.<\/p>\n<pre>\r\n# $i.psl: alignments of species $i against darer in PSL format\r\nfor i in `cat 00.initial_data\/reflect_no_darer.txt | cut -f 1`\r\ndo\r\n    echo \"axtChain -linearGap=loose -psl $i.psl 00.initial_data\/darer.genome.2bit 00.initial_data\/$i.genome.2bit $i.Todarer.chain\"\r\ndone > chain_axtChain.list\r\nParaFly -c chain_axtChain.list -CPU 18\r\n<\/pre>\n<p>Step 4: Netting with chainNet, which picks the best alignment for each region of the target sequence.<br \/>\n1. First, mark every base of every chromosome or scaffold as unused.<br \/>\n2. Sort the chains by score from highest to lowest to form a list.<br \/>\n3. Iterate: take the next chain from the list, discard the parts that overlap chains already placed, and add the remaining parts. A part that falls within a gap of an earlier chain is recorded as its child; the hierarchy built this way is called a net. The overlapping regions are also recorded so that duplications can be identified in the next step.<\/p>\n<pre>\r\n# $output_dir: working directory; $tn and $qn: target and query species names\r\nchainMergeSort $output_dir\/3.chain\/*.chain > $output_dir\/4.prenet\/all.chain\r\nchainPreNet $output_dir\/4.prenet\/all.chain $output_dir\/$tn.sizes $output_dir\/$qn.sizes $output_dir\/4.prenet\/all_sort.chain\r\nchainNet $output_dir\/4.prenet\/all_sort.chain $output_dir\/$tn.sizes $output_dir\/$qn.sizes $output_dir\/5.net\/temp.tn $output_dir\/5.net\/temp.qn\r\nnetSyntenic $output_dir\/5.net\/temp.tn $output_dir\/5.net\/$tn.net\r\nnetSyntenic $output_dir\/5.net\/temp.qn $output_dir\/5.net\/$qn.net<\/pre>\n<p>Step 5: Mafing<\/p>\n<pre>\r\nnetToAxt $output_dir\/5.net\/$tn.net $output_dir\/4.prenet\/all_sort.chain $output_dir\/$tn.2bit $output_dir\/$qn.2bit $output_dir\/6.net_to_axt\/all.axt\r\naxtSort $output_dir\/6.net_to_axt\/all.axt $output_dir\/6.net_to_axt\/all_sort.axt\r\n# The prefixes end with a dot so that MAF sequence names take the form species.chrom used by multiz and roast\r\naxtToMaf -tPrefix=$tn. -qPrefix=$qn. $output_dir\/6.net_to_axt\/all_sort.axt $output_dir\/$tn.sizes $output_dir\/$qn.sizes $output_dir\/7.maf\/all.maf<\/pre>\n<p>Step 6: Combine multiple pairwise results:<\/p>\n<pre>\r\n# roast expects the species tree as a string such as \"((darer speciesA) speciesB)\" (example names); here it is read from tree.txt\r\nroast + E=darer \"$(cat tree.txt)\" .\/*Final.maf all.roast.maf<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>After reading various papers and tutorials, I found that obtaining DNA sequences conserved across multiple species (both coding and non-coding regions) mainly involves the following steps: 1. Repeat masking: RepeatMasker and RepeatModeler 2. Pairwise alignment: the main tools are last, lastz and blastz 3. Chaining: axtChain 4. Netting: chainNet 5. Mafing: netToAxt, axtSort, axtToMaf 6. Combining multiple pairwise results: roast 7. PhastCons: PHAST The detailed steps are as follows. Step 1: Download repeat-masked genome FASTA files from a database. For a genome you assembled yourself, the masked sequence can be produced with Geta. Step 2: I described how to use last in an earlier post, but the literature shows that lastz is used more often. On last versus lastz: the last aligner is considered faster and more memory-efficient. It writes MAF files, which can be converted to PSL files, and the same downstream steps can then be applied to the PSL files. Unlike lastz, last starts from [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4],"tags":[],"_links":{"self":[{"href":"http:\/\/www.wuchangsong.com\/index.php?rest_route=\/wp\/v2\/posts\/548"}],"collection":[{"href":"http:\/\/www.wuchangsong.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.wuchangsong.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.wuchangsong.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.wuchangsong.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=548"}],"version-history":[{"count":19,"href":"http:\/\/www.wuchangsong.com\/index.php?rest_route=\/wp\/v2\/posts\/548\/revisions"}],"predecessor-version":[{"id":575,"href":"http:\/\/www.wuchangsong.com\/index.php?rest_route=\/wp\/v2\/posts\/548\/revisions\/575"}],"wp:attachment":[{"href":"http:\/\/www.wuchangsong.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=548"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.wuchangsong.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=548"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.wuchangsong.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=548"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}