PHAST安装使用

PHAST:Phylogenetic Analysis with Space/Time models (PHAST) is a freely available software package consisting of a collection of command-line programs and supporting libraries for comparative and evolutionary genomics. Best known as the search engine behind the Conservation tracks in the University of California, Santa Cruz (UCSC) Genome Browser, PHAST also includes several tools for phylogenetic modeling, functional element identification, as well as utilities for manipulating alignments, trees and genomic annotations.

Major PHAST programs:

phastCons: Conservation scoring and identification of conserved elements
phyloFit: Fitting of phylogenetic models to aligned DNA sequences
phyloP: Computation of p-values for conservation or acceleration, either lineage-specific or across all branches
phastOdds: Log-odds scoring for phylogenetic models or phylo-HMMs
exoniphy: Phylogenetic exon prediction
dless: Prediction of elements under lineage-specific selection
prequel: Probabilistic reconstruction of ancestral sequences
phastBias: Identification of GC-biased gene conversion using a phylo-HMM

安装PHAST:

wget -c http://www.netlib.org/clapack/clapack.tgz
tar -xvzf clapack.tgz
cd CLAPACK-3.2.1
cp make.inc.example make.inc && make f2clib && make blaslib && make lib
wget -c http://compgen.cshl.edu/phast/downloads/phast.v1_5.tgz
tar zxf phast.v1_5.tgz
cd phast/src/
make CLAPACKPATH=/opt/biosoft/CLAPACK-3.2.1
echo 'PATH=$PATH:/opt/biosoft/phast/bin' >> ~/.bashrc
source ~/.bashrc

使用MCScanX和JCVI共线性区块分析

首先从基因组数据库下载目标物种的全基因组文件和gff3文件
对gff3文件预处理得到对应格式的gff和bed文件

MCScanX:perl -e 'while (<>) { if (m/^(\S+)\t.*\tgene\t(\d+)\t(\d+).*ID=(darer\d+)/) { print "$1\t$4\t$2\t$3\n" } }' darer.genome.gff3 > darer.gff
jcvi:perl -e 'while (<>) { if (m/^(\S+)\t.*\tgene\t(\d+)\t(\d+)\t(\S+)\t(\S+).*ID=(darer\d+)/) { print "$1\t$2\t$3\t$6\t$4\t$5\n" } }' darer.genome.gff3 > zebr.bed

对NCBI的GFF3文件再进行提取,若基因含有可变剪接,则仅保留CDS长度最长的转录本并获得蛋白序列,该过程参考网上其他教程

cat darer.gff gc.gff > all.gff
cat darer.pep.fasta gc.pep.fasta > all.pep.fasta
makeblastdb -in all.pep.fasta -dbtype prot -title all -parse_seqids -out all -logfile all.log
blast.pl blastp all all.pep.fasta 1e-10 160 blast 6
mkdir MCScanx
mv ../blast.tab all.blast
mv ../all.gff ./
MCScanX all
cd ..
mkdir jcvi && cd jcvi
mv ../*.bed ./
mv ../darer.pep.fasta darer.pep
mv ../gc.pep.fasta gc.pep
python -m jcvi.compara.catalog ortholog --dbtype prot --no_strip_names darer gc
python -m jcvi.compara.synteny screen --minspan=30 --simple darer.gc.anchors darer.gc.anchors.new