Month: August 2020

  • Drawing line charts in R

    library(ggplot2)
    library(gcookbook)
    data

  • Obtaining conserved DNA sequences from multi-species whole-genome alignment

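    Across the papers and tutorials I consulted, obtaining DNA sequences conserved across multiple species (both coding and non-coding regions) involves these main steps:
    1. Repeat mask: obtained with RepeatMasker and RepeatModeler
    2. Pairwise alignment: the main tools are last, lastz, and blastz
    3. Chaining: axtChain
    4. Netting: chainNet
    5. Mafing:
    6. Combine multiple pairwise results:
    7. PhastCons: PHAST
    The detailed steps are as follows.
    Step 1: Download the repeat-masked genome FASTA file from a database; for a genome you assembled yourself, the masked sequence can be obtained with Geta.
    Step 2: An earlier post covered how to use last, but in the literature lastz appears to be used more often. On last versus lastz (last aligner is considered faster and memory efficient. It creates maf file, which can be converted to psl files. Then the same following processes can be used on psl files. Different from lastz, last aligner starts with…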

  • PSMC analysis workflow

    bowtie2-build ../genome.fasta genome
    bowtie2 -x genome -p 80 -1 reads.1.fastq -2 reads.2.fastq -S bowtie2.sam
    samtools sort -o bowtie2_sort.bam -O BAM -@ 40 -m 4G bowtie2.sam
    /opt/biosoft/samtools-0.1.18/samtools mpileup -C50 -uf ../genome.fasta bowtie2_sort.bam > gc_psmc.bcf
    /opt/biosoft/samtools-0.1.18/bcftools/bcftools view -c gc_psmc.bcf > Pb_2G.vcf
    vcfutils.pl vcf2fq -d 10 -D 100 Pb_2G.vcf | gzip > diploid.fq.gz
    /opt/biosoft/psmc-master/utils/fq2psmcfa -q20 diploid.fq.gz > diploid.psmcfa…

  • Computing D-statistics with AdmixTools

    Install the software and the missing library files
    git clone https://github.com/DReichLab/AdmixTools.git
    cd AdmixTools/src
    make clobber
    make all
    # If this fails with "/usr/bin/ld: cannot find -lopenblas", the libopenblas library is missing
    git clone https://github.com/xianyi/OpenBLAS.git
    cd OpenBLAS
    make
    make PREFIX=/path/to/your/installation install
    cd /usr/lib/
    ln -s /opt/biosoft/OpenBLAS/lib/libopenblas_nehalemp-r0.3.10.dev.a ./libopenblas.a
    ln -s /opt/biosoft/OpenBLAS/lib/libopenblas_nehalemp-r0.3.10.dev.so ./libopenblas.so
    cd /opt/biosoft/AdmixTools/src/
    make clean
    make all && make install

  • Plotting exon and intron length distributions from a GFF file

    Download the required scripts and install the Python modules
    wget https://github.com/irusri/Extract-intron-from-gff3/archive/master.zip
    unzip master.zip
    rm master.zip && cd Extract-intron-from-gff3-master/scripts/
    sudo chmod 755 *
    pip install misopy
    pip install gffutils
    Generate the exon and intron GFF files and extract the DNA sequences
    python /opt/biosoft/Extract-intron-from-gff3-master/scripts/extract_intron_gff3_from_gff3.py out.gff3 out_intron.gff3
    ## Result file: out_intron.gff3_introns.gff3
    awk '/intron\t/{print}' out_intron.gff3_introns.gff3 | sort -k1,1 -k4,4n > processed_intron.gff3
    awk '/exon\t/{print}' out_intron.gff3_introns.gff3 | sort -k1,1 -k4,4n > processed_exon.gff3
    perl /opt/biosoft/Extract-intron-from-gff3-master/scripts/extract_seq_from_gff3.pl -d out.tmp/genome.fasta -- processed_intron.gff3 > output_intron.fa
    perl…
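The length distribution itself can also be tabulated straight from the GFF3 coordinates, which may serve as a cross-check on the extracted sequences. The following is only a minimal sketch, not the scripts used in the post: the function names `feature_lengths` and `length_histogram` and the 100 bp bin size are hypothetical choices, and it assumes tab-separated GFF3 rows with 1-based inclusive start and end in columns 4 and 5.

```python
from collections import defaultdict


def feature_lengths(gff_lines, feature_type):
    """Return the length of every row whose type column (column 3) equals feature_type.

    Assumes GFF3 coordinates: 1-based and inclusive, so length = end - start + 1.
    """
    lengths = []
    for line in gff_lines:
        # Skip blank lines and comment/directive lines such as "##gff-version 3".
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # A valid GFF3 feature row has 9 tab-separated columns.
        if len(cols) < 9 or cols[2] != feature_type:
            continue
        start, end = int(cols[3]), int(cols[4])
        lengths.append(end - start + 1)
    return lengths


def length_histogram(lengths, bin_size=100):
    """Count lengths per bin; each key is the lower bound of its bin."""
    counts = defaultdict(int)
    for n in lengths:
        counts[(n // bin_size) * bin_size] += 1
    return dict(sorted(counts.items()))
```

Under these assumptions, passing the lines of processed_intron.gff3 to `feature_lengths(lines, "intron")` and the result to `length_histogram` would give per-bin counts that can then be plotted with any charting tool.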

  • Aligning genomic DNA sequences with LAST

    LAST can: handle big sequence data (e.g. compare two vertebrate genomes, or align billions of DNA reads to a genome); indicate the reliability of each aligned column; use sequence quality data properly; compare DNA to proteins, with frameshifts; compare PSSMs to sequences; calculate the likelihood of chance similarities between random sequences; do split and spliced alignment.…