Month: August 2020

Drawing line charts in R

library(ggplot2)
library(gcookbook)
data

August 29, 2020
Obtaining conserved DNA sequences from multi-species whole-genome alignments

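After going through various papers and tutorials, I found that obtaining DNA sequences conserved across multiple species (in both coding and non-coding regions) mainly involves the following steps:
1. Repeat masking: done with RepeatMasker and RepeatModeler
2. Pairwise alignment: the main tools are last, lastz, and blastz
3. Chaining: axtChain
4. Netting: chainNet
5. Mafing:
6. Combine multiple pairwise results:
7. PhastCons: PHAST
The detailed steps are as follows.
Step 1: download repeat-masked genome FASTA files from a database; for a genome you assembled yourself, a masked genome can be obtained with Geta.
Step 2: I introduced last in an earlier post, but the literature seems to use lastz more often. On how last and lastz compare (the last aligner is considered faster and more memory-efficient. It creates a MAF file, which can be converted to PSL files. The same downstream steps can then be applied to the PSL files. Unlike lastz, the last aligner starts with…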
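The Mafing and combining steps produce MAF files. As a rough illustration of what can be done with them, the sketch below parses MAF text, keeps only blocks in which every species of interest is aligned, and counts columns where all rows carry the same base. It assumes UCSC-style `species.chromosome` names in the `src` field, and `parse_maf`, `blocks_with_all_species`, and `identical_columns` are hypothetical helpers. Counting identical columns is only a naive stand-in for conservation; it is not PhastCons, which fits a phylogenetic hidden Markov model.

```python
from typing import Dict, Iterable, List

Block = Dict[str, str]  # MAF src name -> aligned sequence text


def parse_maf(text: str) -> List[Block]:
    """Split MAF text into alignment blocks (hypothetical helper)."""
    blocks: List[Block] = []
    current: Block = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line == "a" or line.startswith("a "):
            # An 'a' line starts a new alignment block.
            if current:
                blocks.append(current)
            current = {}
        elif line.startswith("s "):
            # s <src> <start> <size> <strand> <srcSize> <text>
            # A repeated src within one block would overwrite the earlier row.
            fields = line.split()
            current[fields[1]] = fields[6]
        # '#', 'i', 'e', 'q' and blank lines are ignored in this sketch.
    if current:
        blocks.append(current)
    return blocks


def species_of(src: str) -> str:
    """Assumes UCSC-style 'species.chrom' source names."""
    return src.split(".", 1)[0]


def blocks_with_all_species(blocks: Iterable[Block], species: Iterable[str]) -> List[Block]:
    """Keep only blocks in which every requested species is aligned."""
    wanted = set(species)
    return [b for b in blocks if wanted <= {species_of(s) for s in b}]


def identical_columns(block: Block) -> int:
    """Count ungapped columns where all rows carry the same base (case-insensitive)."""
    count = 0
    for column in zip(*block.values()):
        bases = {c.upper() for c in column}
        if len(bases) == 1 and "-" not in bases:
            count += 1
    return count
```

Upper-casing the bases means soft-masked (lowercase) repeat regions are compared like any other sequence; drop the `.upper()` call if masked bases should never count as identical.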

August 26, 2020
PSMC analysis workflow

bowtie2-build ../genome.fasta genome
bowtie2 -x genome -p 80 -1 reads.1.fastq -2 reads.2.fastq -S bowtie2.sam
samtools sort -o bowtie2_sort.bam -O BAM -@ 40 -m 4G bowtie2.sam
/opt/biosoft/samtools-0.1.18/samtools mpileup -C50 -uf ../genome.fasta bowtie2_sort.bam > gc_psmc.bcf
/opt/biosoft/samtools-0.1.18/bcftools/bcftools view -c gc_psmc.bcf > Pb_2G.vcf
vcfutils.pl vcf2fq -d 10 -D 100 Pb_2G.vcf | gzip > diploid.fq.gz
/opt/biosoft/psmc-master/utils/fq2psmcfa -q20 diploid.fq.gz > diploid.psmcfa…
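The fq2psmcfa step collapses the diploid consensus into the binned sequence that PSMC reads. The sketch below mimics that idea: each bin (an assumed default of 100 bp) becomes 'K' if it contains a heterozygous IUPAC code, 'T' if it looks homozygous, and 'N' if too many bases are 'N' or fall below the quality cutoff (20, matching `-q20` above). The 90% missing-data threshold and the name `bin_diploid_consensus` are assumptions for illustration; the real fq2psmcfa filtering rules may differ in detail.

```python
from typing import List, Sequence

# IUPAC codes for two-allele (heterozygous) diploid calls.
HET_CODES = set("RYSWKM")


def bin_diploid_consensus(
    seq: str,
    quals: Sequence[int],
    bin_size: int = 100,
    min_qual: int = 20,
    max_missing_frac: float = 0.9,  # assumed threshold, not taken from fq2psmcfa
) -> str:
    """Return one symbol per bin: 'K' (het), 'T' (hom) or 'N' (mostly missing)."""
    out: List[str] = []
    for start in range(0, len(seq), bin_size):
        window = seq[start:start + bin_size]
        window_quals = quals[start:start + bin_size]
        missing = 0
        het = False
        for base, q in zip(window, window_quals):
            if base.upper() == "N" or q < min_qual:
                missing += 1  # unusable site
            elif base.upper() in HET_CODES:
                het = True  # at least one heterozygous site in this bin
        if missing > max_missing_frac * len(window):
            out.append("N")
        elif het:
            out.append("K")
        else:
            out.append("T")
    return "".join(out)
```

Working in bins rather than single sites is what lets PSMC treat the genome as a sequence of coarse heterozygous/homozygous states.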

August 22, 2020
Computing D-statistics with AdmixTools

Installing the software and the missing library files:
git clone https://github.com/DReichLab/AdmixTools.git
cd AdmixTools/src
make clobber
make all
# If the build fails with "/usr/bin/ld: cannot find -lopenblas", the libopenblas library is missing
git clone https://github.com/xianyi/OpenBLAS.git
cd OpenBLAS
make
make PREFIX=/path/to/your/installation install
cd /usr/lib/
ln -s /opt/biosoft/OpenBLAS/lib/libopenblas_nehalemp-r0.3.10.dev.a ./libopenblas.a
ln -s /opt/biosoft/OpenBLAS/lib/libopenblas_nehalemp-r0.3.10.dev.so ./libopenblas.so
cd /opt/biosoft/AdmixTools/src/
make clean
make all && make install
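Once AdmixTools builds, qpDstat computes the D-statistics. As a reminder of what the statistic measures, the sketch below counts ABBA and BABA site patterns across four haploid sequences ordered (P1, P2, P3, Outgroup) and applies D = (ABBA - BABA) / (ABBA + BABA). It is a simplified illustration only: qpDstat works from allele frequencies, estimates standard errors with a block jackknife, and may order populations or sign D differently. `count_abba_baba` and `d_statistic` are hypothetical names.

```python
from typing import Iterable, Tuple

Site = Tuple[str, str, str, str]  # alleles for (P1, P2, P3, Outgroup)


def count_abba_baba(sites: Iterable[Site]) -> Tuple[int, int]:
    """Count ABBA and BABA patterns, taking the outgroup allele as ancestral ('A')."""
    abba = baba = 0
    for p1, p2, p3, out in sites:
        # Use only biallelic sites where P3 carries the derived allele.
        if len({p1, p2, p3, out}) != 2 or p3 == out:
            continue
        if p1 == out and p2 == p3:
            abba += 1  # P2 shares the derived allele with P3
        elif p2 == out and p1 == p3:
            baba += 1  # P1 shares the derived allele with P3
    return abba, baba


def d_statistic(abba: int, baba: int) -> float:
    """D = (ABBA - BABA) / (ABBA + BABA); positive values mean excess P2-P3 sharing."""
    total = abba + baba
    if total == 0:
        raise ValueError("no informative ABBA/BABA sites")
    return (abba - baba) / total
```

Under incomplete lineage sorting alone, ABBA and BABA are expected in equal numbers, so D near 0 is the null expectation; a significant deviation (judged with a jackknife standard error, which this sketch omits) suggests gene flow.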

August 11, 2020
Plotting exon and intron length distributions from a GFF file

Download the required scripts and install the Python modules:
wget https://github.com/irusri/Extract-intron-from-gff3/archive/master.zip
unzip master.zip
rm master.zip && cd Extract-intron-from-gff3-master/scripts/
sudo chmod 755 *
pip install misopy
pip install gffutils
Obtain the exon and intron GFF files and extract their DNA sequences:
python /opt/biosoft/Extract-intron-from-gff3-master/scripts/extract_intron_gff3_from_gff3.py out.gff3 out_intron.gff3
## Output file: out_intron.gff3_introns.gff3
awk '/intron\t/{print}' out_intron.gff3_introns.gff3 | sort -k 1,1 -k4,4n > processed_intron.gff3
awk '/exon\t/{print}' out_intron.gff3_introns.gff3 | sort -k 1,1 -k4,4n > processed_exon.gff3
perl /opt/biosoft/Extract-intron-from-gff3-master/scripts/extract_seq_from_gff3.pl -d out.tmp/genome.fasta - processed_intron.gff3 > output_intron.fa
perl…
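If only the length distributions are needed, exon and intron lengths can also be computed directly from GFF3 coordinates. The sketch below groups exons by their Parent attribute, measures each exon as end - start + 1 (GFF3 coordinates are 1-based and inclusive), and treats the gap between consecutive exons of the same transcript as an intron. `exon_intron_lengths` is a hypothetical helper; it assumes each exon has a single Parent and does not reproduce the behaviour of the scripts above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def exon_intron_lengths(gff_lines: Iterable[str]) -> Tuple[List[int], List[int]]:
    """Return (exon_lengths, intron_lengths) from GFF3 lines."""
    exons: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        start, end = int(cols[3]), int(cols[4])
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        # Assumes a single Parent; comma-separated parents stay one key here.
        exons[attrs.get("Parent", "")].append((start, end))

    exon_lengths: List[int] = []
    intron_lengths: List[int] = []
    for coords in exons.values():
        coords.sort()
        exon_lengths.extend(end - start + 1 for start, end in coords)
        # Intron spans prev_end+1 .. next_start-1 (inclusive).
        for (_, prev_end), (next_start, _) in zip(coords, coords[1:]):
            intron_lengths.append(next_start - prev_end - 1)
    return exon_lengths, intron_lengths
```

The two returned lists can be passed to any histogram plotting tool to draw the distributions.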

August 8, 2020
Aligning genomic DNA sequences with LAST

LAST can:
- Handle big sequence data, e.g.:
  - Compare two vertebrate genomes
  - Align billions of DNA reads to a genome
- Indicate the reliability of each aligned column.
- Use sequence quality data properly.
- Compare DNA to proteins, with frameshifts.
- Compare PSSMs to sequences.
- Calculate the likelihood of chance similarities between random sequences.
- Do split and spliced alignment.…
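The likelihood of chance similarities is usually expressed as an E-value. The textbook Karlin-Altschul form is E = K·m·n·e^(-λS) for a score S between sequences of lengths m and n. The sketch below evaluates it together with the equivalent bit score; the λ and K values in the test are made-up placeholders, because LAST estimates these parameters for its own scoring scheme, and its actual calculation may include corrections not shown here.

```python
import math


def evalue(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul expected number of chance hits scoring >= `score`."""
    return k * m * n * math.exp(-lam * score)


def bit_score(score: float, lam: float, k: float) -> float:
    """Normalized score S' = (lambda*S - ln K) / ln 2, so that E = m*n*2**(-S')."""
    return (lam * score - math.log(k)) / math.log(2)


def chance_probability(e: float) -> float:
    """Probability of at least one chance hit, assuming Poisson-distributed hits."""
    return 1.0 - math.exp(-e)
```

Because E grows with m·n, the same raw score is less significant when comparing two whole genomes than when comparing two short sequences.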

August 5, 2020
