{"id":527,"date":"2020-08-05T17:21:49","date_gmt":"2020-08-05T09:21:49","guid":{"rendered":"http:\/\/www.wuchangsong.com\/?p=527"},"modified":"2020-08-11T22:47:14","modified_gmt":"2020-08-11T14:47:14","slug":"%e4%bd%bf%e7%94%a8last%e6%af%94%e5%af%b9%e5%9f%ba%e5%9b%a0%e7%bb%84dna%e5%ba%8f%e5%88%97","status":"publish","type":"post","link":"http:\/\/www.wuchangsong.com\/?p=527","title":{"rendered":"Aligning genomic DNA sequences with LAST"},"content":{"rendered":"<p>LAST can:<\/p>\n<p>Handle big sequence data, e.g.:<br \/>\nCompare two vertebrate genomes<br \/>\nAlign billions of DNA reads to a genome<br \/>\nIndicate the reliability of each aligned column.<br \/>\nUse sequence quality data properly.<br \/>\nCompare DNA to proteins, with frameshifts.<br \/>\nCompare PSSMs to sequences.<br \/>\nCalculate the likelihood of chance similarities between random sequences.<br \/>\nDo split and spliced alignment.<br \/>\nTrain alignment parameters for unusual kinds of sequence (e.g. nanopore).<\/p>\n<p>Installation<\/p>\n<pre>\r\nwget http:\/\/last.cbrc.jp\/last-1080.zip\r\nunzip last-1080.zip\r\ncd last-1080\r\nmake && make install\r\ncd .. && rm last-1080.zip<\/pre>\n<p>Workflow:<\/p>\n<pre>\r\nlastdb -P0 -uMAM4 -R01 darer-MAM4 ..\/01.proteomics\/darer.genome.fasta\r\n#-P sets the number of threads; 0 uses all threads available on the server\r\n#-u selects the seeding scheme; this choice is a critical step in a LAST alignment. Commonly used values:\r\n##MAM8: This DNA seeding scheme finds weak similarities with high sensitivity, but low speed and high memory usage (e.g. ~50 GB for mammal genomes).\r\n##MAM4: This DNA seeding scheme is like MAM8, but a bit less sensitive, and uses about half as much memory.\r\n##NEAR: This DNA seeding scheme is good for finding short-and-strong (near-identical) similarities. 
It is also good for similarities with many gaps (insertions and deletions), because it can find the short matches between the gaps. (Long-and-weak seeding schemes allow for mismatches but not gaps.) \r\n##YASS: This DNA seeding scheme is good for finding long-and-weak similarities. It is a good compromise for both protein-coding and non-protein-coding DNA.\r\n#-R01 tells lastdb to mark simple sequences (such as cacacacacacacacaca) in lowercase, but not to suppress them.\r\n# Train substitution and gap parameters for each query genome against the darer database; the commands are written to a list and run in parallel with ParaFly\r\nfor i in `cat ..\/00.initial_data\/reflect.txt | cut -f 1`\r\ndo\r\n    echo \"last-train -P0 --revsym --matsym --gapsym -E0.05 -C2 darer-MAM4 ..\/01.proteomics\/$i.genome.fasta > $i.mat\"\r\ndone > last_all_pairs_mat.list\r\nParaFly -c last_all_pairs_mat.list -CPU 25\r\n\r\n# Align each query genome using its trained parameters; last-split picks a unique best alignment for each part of each query sequence (many-to-one)\r\nfor i in `cat ..\/00.initial_data\/reflect.txt | cut -f 1`\r\ndo\r\n    echo \"lastal -P0 -m100 -E0.05 -C2 -p $i.mat darer-MAM4 ..\/01.proteomics\/$i.genome.fasta | last-split -m1 > $i.maf\"\r\ndone > last_all_pairs_maf.list\r\nParaFly -c last_all_pairs_maf.list -CPU 25\r\n\r\n# Swap query and reference and run last-split again to get one-to-one alignments (shown here for datra)\r\nmaf-swap datra.maf | awk '\/^s\/ {$2 = (++s % 2 ? 
\"datra.\" : \"darer.\") $2} 1' | last-split -m1 | maf-swap > datra-2.maf\r\n# Discard alignments that consist mostly of simple (lowercase) sequence, convert to tabular format, and keep alignments with an error probability (mismap) of at most 1e-5\r\nlast-postmask datra-2.maf | maf-convert -n tab | awk -F'=' '$2 <= 1e-5' > datra.tab\r\n#lastdb -P0 -uNEAR -cR11 darer.fa.db ..\/01.proteomics\/darer.genome.fasta\r\n#for i in `cat ..\/00.initial_data\/reflect.txt | cut -f 1`\r\n#do\r\n#    echo \"lastal -P20 -m100 -E0.05 darer.fa.db ..\/01.proteomics\/$i.genome.fasta | last-split -m1 > $i.maf\"\r\n#done > last_all_pairs_maf.list\r\n#ParaFly -c last_all_pairs_maf.list -CPU 8\r\n#for i in `cat ..\/00.initial_data\/reflect.txt | cut -f 1`\r\n#do\r\n#    echo \"maf-swap $i.maf | last-split | maf-swap | last-split | maf-sort > $i.LAST.maf\"\r\n#done > last_maf_swap.list\r\n#ParaFly -c last_maf_swap.list -CPU 25\r\n#for i in `cat ..\/00.initial_data\/reflect.txt | cut -f 1`\r\n#do\r\n#    echo \"maf-convert psl $i.LAST.maf > $i.psl\"\r\n#done > last_maf_convert.list\r\n#ParaFly -c last_maf_convert.list -CPU 25\r\n#for i in `cat ..\/00.initial_data\/reflect.txt | cut -f 1`\r\n#do\r\n#    echo \"perl maf.rename.species.S.pl $i.LAST.maf darer $i $i.Final.maf > $i.stat\"\r\n#done > last_maf_rename.list\r\n#ParaFly -c last_maf_rename.list -CPU 25\r\n\r\n#for i in `cat ..\/00.initial_data\/reflect.txt | cut -f 1`\r\n#do\r\n#    echo \"cp $i.Final.maf darer.$i.sing.maf\"\r\n#done > cp.list\r\n#ParaFly -c cp.list -CPU 25\r\n\r\nroast T=\/home\/wuchangsong\/gc_genome\/19.paml\/i.LAST\/ E=darer \"tree topology\" .\/*Final.maf all.roast.maf\r\n#The output file all.roast.maf will then contain the orthologous multiple alignment.\r\n<\/pre>\n<p>Reference: https:\/\/github.com\/mcfrith\/last-genome-alignments<\/p>\n","protected":false},"excerpt":{"rendered":"<p>LAST can: Handle big sequence data, e.g.: Compare two vertebrate genomes Align billions of DNA reads to a genome Indicate the reliability 
of each aligned column. Use sequence quality data properly. Compare DNA to proteins, with frameshifts. Compare PSSMs to sequences Calculate the likelihood of chance similarities between random sequences. Do split and spliced alignment. [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4],"tags":[],"_links":{"self":[{"href":"http:\/\/www.wuchangsong.com\/index.php?rest_route=\/wp\/v2\/posts\/527"}],"collection":[{"href":"http:\/\/www.wuchangsong.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.wuchangsong.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.wuchangsong.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.wuchangsong.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=527"}],"version-history":[{"count":8,"href":"http:\/\/www.wuchangsong.com\/index.php?rest_route=\/wp\/v2\/posts\/527\/revisions"}],"predecessor-version":[{"id":547,"href":"http:\/\/www.wuchangsong.com\/index.php?rest_route=\/wp\/v2\/posts\/527\/revisions\/547"}],"wp:attachment":[{"href":"http:\/\/www.wuchangsong.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=527"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.wuchangsong.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=527"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.wuchangsong.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=527"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}