BS-seq Analysis Pipeline (Part 1)

Quality control of the short reads can be done with Trimmomatic; it is not covered in detail here, and the resulting clean data is used for the analysis below.
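For reference, a minimal Trimmomatic sketch for paired-end BS-seq reads, assuming the same samples.txt/$i.1.fastq naming used in the Bismark step below (the directory name and adapter file are placeholders to adjust):

mkdir 0.trimmomatic && cd 0.trimmomatic
for i in `cat ../samples.txt`
do
    echo "java -jar /opt/biosoft/Trimmomatic-0.38/trimmomatic-0.38.jar PE -threads 20 ../$i.1.fastq ../$i.2.fastq $i.1.fastq $i.1.unpaired.fastq $i.2.fastq $i.2.unpaired.fastq ILLUMINACLIP:/opt/biosoft/Trimmomatic-0.38/adapters/TruSeq3-PE-2.fa:2:30:10 SLIDINGWINDOW:4:15 MINLEN:20 2> $i.trimmomatic.log"
done > trimmomatic.list
ParaFly -c trimmomatic.list -CPU 9
cd ..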
1. Alignment and methylation call extraction (Bismark)
Installing and running Bismark

git clone https://github.com/FelixKrueger/Bismark.git
#conda install -c bioconda bismark
#Genome Preparation
/opt/biosoft/Bismark-0.23.0/bismark_genome_preparation --parallel 40 --verbose ./
#bismark alignment
for i in `cat ../samples.txt`
do
    echo "/opt/biosoft/Bismark-0.23.0/bismark --parallel 40 --genome ./ --phred33-quals -1 ../$i.1.fastq -2 ../$i.2.fastq"
done > bismark.list
ParaFly -c bismark.list -CPU 4
#deduplicate 
for i in `cat ../samples.txt`
do
    echo "/opt/biosoft/Bismark-0.23.0/deduplicate_bismark --bam $i.1_bismark_bt2_pe.bam"
done > deduplicate_bismark.list
ParaFly -c deduplicate_bismark.list -CPU 9
#Extract methylation information with bismark_methylation_extractor
for i in `ls *deduplicated.bam`
do
    echo "/opt/biosoft/Bismark-0.23.0/bismark_methylation_extractor $i -p --genome ./ --gzip --bedGraph --cytosine_report --CX --buffer_size 20G --output ./"
done > methylation_extractor.list
ParaFly -c methylation_extractor.list -CPU 9
/opt/biosoft/Bismark-0.23.0/bismark2report
/opt/biosoft/Bismark-0.23.0/bismark2summary
for i in `cat ../samples.txt`
do
    echo "/opt/biosoft/Bismark-0.23.0/coverage2cytosine $i.1_bismark_bt2_pe.deduplicated.bismark.cov.gz --merge_CpG --genome ./ -o $i.CpG.output"
done > coverage2cytosine.list
ParaFly -c coverage2cytosine.list -CPU 9
/opt/biosoft/Bismark-0.23.0/bam2nuc --genome_folder ./ --genomic_composition_only
for i in `cat ../samples.txt`
do
    echo "/opt/biosoft/Bismark-0.23.0/bam2nuc --genome_folder ./ $i.1_bismark_bt2_pe.deduplicated.bam"
done > bam2nuc.list
ParaFly -c bam2nuc.list -CPU 9



/opt/biosoft/Bismark-0.23.0/bismark2bedGraph -o buffer_CpG.bedGraph --buffer 5G CpG_*

ATAC-seq Analysis Pipeline (Part 1)

First, an overview of the analysis workflow, based on a review published in Genome Biology this year:

The preprocessing workflow recommended by the authors is: FastQC → Trimmomatic → BWA-MEM → ATACseqQC. (The mapping step below uses bowtie2; a BWA-MEM sketch is given after that step.)
1. Sequencing quality statistics for the fastq files with FastQC

for i in `ls *fastq`; do echo "fastqc -t 4 -o ./ $i"; done > fastqc.sh
nohup ParaFly -c fastqc.sh -CPU 18 &

2. Quality trimming with Trimmomatic (cutadapt, AdapterRemoval v2, Skewer and Trimmomatic are all options)
The forward and reverse adapters are slightly different. We also trim low-quality bases at the ends of the reads (quality below 20) and keep only reads that are at least 20 bases long. Short reads (< 20 bp) are removed because they are not useful: they will either be thrown out by the mapping or may interfere with the results at the end.

mkdir 1.trimmomatic
cd 1.trimmomatic/
for i in `cat ../samples.txt`
do
    echo "java -jar /opt/biosoft/Trimmomatic-0.38/trimmomatic-0.38.jar PE -threads 20 ../$i.1.fastq ../$i.2.fastq $i.1.fastq $i.1.unpaired.fastq $i.2.fastq $i.2.unpaired.fastq ILLUMINACLIP:/opt/biosoft/Trimmomatic-0.38/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:20 TOPHRED33 2> $i.trimmomatic.log"
done > command.trimmomatic.list
ParaFly -c command.trimmomatic.list -CPU 9
cd ..

3. Mapping Reads to the Reference Genome

mkdir 2.bowtie2 && cd 2.bowtie2
bowtie2-build --threads 160 genome.fasta genome
for i in `cat ../samples.txt`
do
    echo "bowtie2 -x genome -1 ../1.trimmomatic/$i.1.fastq -2 ../1.trimmomatic/$i.2.fastq -p 160 -I 0 -X 1000 --very-sensitive -S $i.bowtie2.sam 2> $i.bowtie2.log"
done > bowtie2.list
ParaFly -c bowtie2.list -CPU 9
for i in `cat ../samples.txt`
do
    echo "samtools sort -@ 40 -m 8G -o $i.bowtie2.bam -O BAM $i.bowtie2.sam"
done > sam2bam.list
ParaFly -c sam2bam.list -CPU 9
cd ..
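The review recommends BWA-MEM for mapping; a minimal BWA-MEM alternative to the bowtie2 step above might look like this (hedged sketch; the directory name, thread counts and output names are placeholders):

mkdir 2.bwa && cd 2.bwa
bwa index genome.fasta
for i in `cat ../samples.txt`
do
    echo "bwa mem -t 20 genome.fasta ../1.trimmomatic/$i.1.fastq ../1.trimmomatic/$i.2.fastq | samtools sort -@ 8 -m 4G -O BAM -o $i.bwa.bam -"
done > bwa.list
ParaFly -c bwa.list -CPU 4
cd ..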

4. Filtering Mapped Reads
1. Filter Duplicate Reads
Duplicates can be removed with samtools, Picard or sambamba; sambamba is the fastest and is the recommended choice, and installing it through conda is the easiest option.

sambamba markdup -r --overflow-list-size 600000  --tmpdir='./' ../2.bowtie2/$i.bowtie2.bam $i.sambamba.rmdup.bam

--overflow-list-size=OVERFLOWLISTSIZE
size of the overflow list where reads, thrown away from the hash table, get a second chance to meet their pairs (default is 200000 reads); increasing the size reduces the number of temporary files created
-r, --remove-duplicates
remove duplicates instead of just marking them
-t, --nthreads=NTHREADS
number of threads to use
BUGS
External sort is not implemented. Thus, memory consumption grows by 2Gb per each 100M reads. Check that you have enough RAM before running the tool.
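Following the same echo/ParaFly pattern as the other steps, the per-sample command above can be batched; a small sketch (the directory name 3.filtering matches the path referenced in the peak-calling step below):

mkdir 3.filtering && cd 3.filtering
for i in `cat ../samples.txt`
do
    echo "sambamba markdup -r --overflow-list-size 600000 --tmpdir='./' ../2.bowtie2/$i.bowtie2.bam $i.sambamba.rmdup.bam"
done > sambamba.list
ParaFly -c sambamba.list -CPU 9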

2. Filter Uninformative Reads

# -f 2 keeps properly paired reads and -q 30 requires mapping quality >= 30; grep -v pilon373 drops alignments on the contig pilon373
samtools view -h -f 2 -q 30 $i.sambamba.rmdup.bam | grep -v pilon373 | samtools sort -O bam -@ 40 -o - > $i.last.bam
samtools index $i.last.bam
samtools index $i.sambamba.rmdup.bam
samtools flagstat $i.sambamba.rmdup.bam > $i.sambamba.rmdup.stat
samtools flagstat $i.last.bam > $i.last.stat
bedtools bamtobed -i $i.last.bam > $i.bed

3. Check Insert Sizes

for i in `cat ../samples.txt`
do
    echo "samtools view $i.last.bam |awk '{print $9}'  > $i.last_length.txt"
done > last_bam_length.list
ParaFly -c last_bam_length.list -CPU 9

The extracted lengths can then be imported into R for plotting; a command-line summary is sketched below.
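A quick sketch that bins the insert sizes from the $i.last_length.txt files above into 50-bp bins, giving a table that is easy to plot:

for i in `cat ../samples.txt`
do
    # keep positive insert sizes only (one mate per pair), count them in 50-bp bins
    awk '$1>0 {n[int($1/50)*50]++} END {for (b in n) print b"\t"n[b]}' $i.last_length.txt | sort -k1,1n > $i.insert_hist.txt
done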
5. Peak calling (see https://www.jianshu.com/p/21e8c51fca23 for detailed parameter explanations)

macs2 callpeak -t ../3.filtering/$i.bed -g 8.9e8 --nomodel --shift -100 --extsize 200 -n $i.peaks --outdir ./

UpSet plot: an enhanced Venn diagram for visualizing many sets

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("UpSetR")
library(UpSetR)
pathway=read.table("carp_combined_id1.xls",header=T,sep="\t")
upset(pathway,nsets = 9, nintersects = 50,mb.ratio = c(0.5, 0.5),
      order.by = "freq", decreasing = c(TRUE,FALSE),
      queries = list(list(query=intersects, params=list("datra", "caaur","meamb","onmac","ctide","cycar","sigra","sians","sirhi"), color="red", active=T)))
#See the official documentation for detailed parameters
#Transpose rows and columns: awk '{for(i=0;++i<=NF;)a[i]=a[i]?a[i] FS $i:$i}END{for(i=0;i++<NF;)print a[i]}' file > carp_combined_absentid_convert.txt

Line plots with R

library(ggplot2)
library(gcookbook)
data <- read.table("test.xls",header=T,sep="\t")
ggplot(data,aes(x=Species, y=CDS, group=1))+ geom_line(color="red") + geom_point(size=2, shape=16,color="red") + ylim(0,1750) + theme_classic() + scale_x_discrete(limits=c("darer","datra","ctide","meamb","onmac","cycar","caaur","sirhi","sians","sigra"))
#group=1 lets geom_line connect the points across a discrete x-axis
#scale_x_discrete reorders the x-axis categories without changing the plotted values
#theme_classic shows the X and Y axis lines

Obtaining conserved DNA sequences from multi-species whole-genome alignment

From various papers and tutorials, the main steps for obtaining DNA sequences that are conserved across multiple species (coding and non-coding) are:
1. Repeat mask: obtained with RepeatMasker and RepeatModeler
2. Pairwise alignment: the main tools are last, lastz and blastz
3. Chaining: axtChain
4. Netting: chainNet
5. Mafing: netToAxt and axtToMaf
6. Combine multiple pairwise results: roast (multiz)
7. PhastCons: PHAST

The detailed steps are as follows.

Step 1: Download repeat-masked genome fasta files from public databases; for your own assemblies the masked sequence can be obtained with Geta.
Step 2: The use of LAST was introduced earlier, but the literature mostly uses lastz. Comparing the two: the LAST aligner is considered faster and more memory-efficient; it produces MAF files, which can be converted to PSL, and the same downstream steps can then be applied. Unlike lastz, LAST starts from fasta files: the target genome has to be indexed first and is then aligned against the query genome. In practice LAST is simpler to run, with fewer parameters to choose, and I do not yet know how much the two results differ (the jobs are still running; this will be updated once they finish). lastz in turn differs from blastz.
lastz data preprocessing:

for i in `cat 00.initial_data/reflect.txt | cut -f 1`
do
    echo "faToTwoBit 00.initial_data/$i.genome.fasta 00.initial_data/$i.genome.2bit"
done > fa2bit.list
ParaFly -c fa2bit.list -CPU 19

for i in `cat 00.initial_data/reflect.txt | cut -f 1`
do
    echo "twoBitInfo 00.initial_data/$i.genome.2bit stdout | sort -k2rn > $i.chrom.sizes"
done > chrom.sizes.list
ParaFly -c chrom.sizes.list -CPU 19
for i in `cat 00.initial_data/reflect.txt | cut -f 1`
do
    mkdir ${i}PartList
done
for i in `cat 00.initial_data/reflect_no_darer.txt | cut -f 1`
do
    echo "/opt/biosoft/userApps/kent/src/hg/utils/automation/partitionSequence.pl 10000000 0 00.initial_data/$i.genome.2bit $i.chrom.sizes 1 -lstDir ${i}PartList > $i.part.list"
done > query_partitionSequence.list
ParaFly -c query_partitionSequence.list -CPU 18
/opt/biosoft/userApps/kent/src/hg/utils/automation/partitionSequence.pl 20000000 10000 00.initial_data/darer.genome.2bit darer.chrom.sizes 1 -lstDir darerPartList > darer.part.list
grep -v PartList darer.part.list > darer.list
for i in `cat 00.initial_data/reflect_no_darer.txt | cut -f 1`
do
    echo "grep -v PartList $i.part.list > $i.list"
done > 1111.list
ParaFly -c 1111.list -CPU 18
for i in `cat 00.initial_data/reflect_no_darer.txt | cut -f 1`
do
    echo "cat ${i}PartList/*.lst >> $i.list"
done > cat_Part.list
ParaFly -c cat_Part.list -CPU 18
/opt/biosoft/userApps/kent/src/hg/utils/automation/constructLiftFile.pl darer.chrom.sizes darer.list > darer.lift
for i in `cat 00.initial_data/reflect_no_darer.txt | cut -f 1`
do
    echo "/opt/biosoft/userApps/kent/src/hg/utils/automation/constructLiftFile.pl $i.chrom.sizes $i.list > $i.lift"
done > constructLiftFile.list
ParaFly -c constructLiftFile.list -CPU 18
for i in `cat 00.initial_data/reflect.txt | cut -f 1`
do
    mkdir $i
    for x in `cat $i.list`
    do
        y=${x/*2bit:/}
        echo "twoBitToFa $x $i/$y.fa"
    done >> twoBitToFa.list
done
ParaFly -c twoBitToFa.list -CPU 80
#Remove sequences shorter than 1000 bp (judged from the end coordinate in the chunk file name)
for i in `cat 00.initial_data/reflect_no_darer.txt | cut -f 1`
do
    for x in $i/*fa
    do
        y=${x/*-/}
        k=${y/.fa/}
        if [ $k -le 1000 ]
        then
            rm $x
        fi
    done
done
#Create the per-species output directories for the lastz results
for x in `cat 00.initial_data/reflect_no_darer.txt | cut -f 1`
do
    mkdir ${x}_axt
done
for i in darer/*fa
do
    for x in `cat 00.initial_data/reflect_no_darer.txt | cut -f 1`
    do
        for y in $x/*fa
        do
            echo "lastz $i $y --strand=both --seed=12of19 --notransition --chain --gapped --gap=400,30 --hspthresh=3000 --gappedthresh=3000 --inner=2000 --masking=50 --ydrop=9400 --scores=/opt/biosoft/GenomeAlignmentTools/HoxD55.q --format=axt > ${x}_axt/${i##*/}.${y##*/}.axt"
        done >> lastz_all.list
    done
done
ParaFly -c lastz_all.list -CPU 80

#lastz is slow; if many species are analysed and no computing cluster is available, it is not recommended

Step 3: Chaining. Adjacent alignment blocks are joined into chains; the scoring matrix is the same as blastz's, but the gap scoring differs.
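The axtChain command below expects PSL input, while the lastz step above wrote per-chunk axt files; one possible bridge is sketched here (hedged: if the genomes were partitioned, the chunk coordinates also need to be lifted back to whole-scaffold coordinates with the .lift files built earlier, e.g. using liftUp, which is omitted here):

for i in `cat 00.initial_data/reflect_no_darer.txt | cut -f 1`
do
    # concatenate the per-chunk alignments for each query species and convert axt -> psl
    cat ${i}_axt/*.axt > $i.all.axt
    axtToPsl $i.all.axt darer.chrom.sizes $i.chrom.sizes $i.psl
done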

for i in `cat 00.initial_data/reflect_no_darer.txt | cut -f 1`
do
    echo "axtChain -linearGap=loose -psl $i.psl darer.genome.2bit $i.genome.2bit $i.Todarer.chain"
done > chain_axtChain.list
ParaFly -c chain_axtChain.list -CPU 18

Step 4: Netting with chainNet, which determines the best alignment chains for the target sequence.
1. First, mark all bases of every chromosome/scaffold as unused.
2. Sort the chains from highest to lowest score into a list.
3. Iterate: take one chain at a time from the list, discard the parts that overlap chains already placed, and add the remainder; if it falls into a gap of an existing chain it is marked as a child. The hierarchy built this way is called the net. Overlapping regions are recorded and used to identify duplications in the next step.
In the commands below, $output_dir is the working directory and $tn / $qn are the target and query genome names.

chainMergeSort $output_dir/3.chain/*.chain > $output_dir/4.prenet/all.chain
chainPreNet $output_dir/4.prenet/all.chain $output_dir/$tn.sizes $output_dir/$qn.sizes $output_dir/4.prenet/all_sort.chain
chainNet $output_dir/4.prenet/all_sort.chain $output_dir/$tn.sizes $output_dir/$qn.sizes $output_dir/5.net/temp.tn $output_dir/5.net/temp.qn
netSyntenic $output_dir/5.net/temp.tn $output_dir/5.net/$tn.net
netSyntenic $output_dir/5.net/temp.qn $output_dir/5.net/$qn.net

Step 5: Mafing

netToAxt $output_dir/5.net/$tn.net $output_dir/4.prenet/all_sort.chain $output_dir/$tn.2bit $output_dir/$qn.2bit $output_dir/6.net_to_axt/all.axt
axtSort $output_dir/6.net_to_axt/all.axt $output_dir/6.net_to_axt/all_sort.axt
axtToMaf -tPrefix=$tn -qPrefix=$qn $output_dir/6.net_to_axt/all_sort.axt $output_dir/$tn.sizes $output_dir/$qn.sizes $output_dir/7.maf/all.maf

Step 6: Combine multiple pairwise results (roast):

roast + E=darer tree.txt ./*Final.maf all.roast.maf

PSMC Analysis Pipeline

bowtie2-build ../genome.fasta genome
bowtie2 -x genome -p 80 -1 reads.1.fastq -2 reads.2.fastq -S bowtie2.sam
samtools sort -o bowtie2_sort.bam -O BAM -@ 40 -m 4G bowtie2.sam
/opt/biosoft/samtools-0.1.18/samtools mpileup -C50 -uf ../genome.fasta bowtie2_sort.bam > gc_psmc.bcf
/opt/biosoft/samtools-0.1.18/bcftools/bcftools view -c gc_psmc.bcf > Pb_2G.vcf
vcfutils.pl vcf2fq -d 10 -D 100 Pb_2G.vcf | gzip > diploid.fq.gz
/opt/biosoft/psmc-master/utils/fq2psmcfa -q20 diploid.fq.gz > diploid.psmcfa
/opt/biosoft/psmc-master/utils/splitfa diploid.psmcfa > split.psmcfa
/opt/biosoft/psmc-master/psmc -N25 -t15 -r5 -p "4+25*2+4+6" -o diploid.psmc diploid.psmcfa
seq 100 | xargs -i echo /opt/biosoft/psmc-master/psmc -N25 -t15 -r5 -b -p "4+25*2+4+6" -o round-{}.psmc split.psmcfa | sh
cat diploid.psmc round-*.psmc > combined.psmc
/opt/biosoft/psmc-master/utils/psmc_plot.pl -g x -u y combined combined.psmc
# x is the generation time of the species in years (for humans the default is 25, i.e. -g 25)
# y is the per-site substitution rate, which can be obtained from the branch length in the phylogenetic tree divided by the divergence time of the species estimated with r8s

Running D-statistics with AdmixTools

Install the software and the missing libraries

git clone https://github.com/DReichLab/AdmixTools.git
cd AdmixTools/src
make clobber
make all
#If you get the error /usr/bin/ld: cannot find -lopenblas, the libopenblas library is missing
git clone https://github.com/xianyi/OpenBLAS.git
cd OpenBLAS
make
make PREFIX=/path/to/your/installation install
cd /usr/lib/
ln -s /opt/biosoft/OpenBLAS/lib/libopenblas_nehalemp-r0.3.10.dev.a ./libopenblas.a
ln -s /opt/biosoft/OpenBLAS/lib/libopenblas_nehalemp-r0.3.10.dev.so ./libopenblas.so
cd /opt/biosoft/AdmixTools/src/
make clean
make all && make install
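Installation only builds the binaries; a minimal qpDstat run could then look like the following (hedged sketch: the EIGENSTRAT input files and the population quadruple list are hypothetical placeholders):

cat > dstat.par <<EOF
genotypename: example.geno
snpname:      example.snp
indivname:    example.ind
popfilename:  quadruples.txt
EOF
# quadruples.txt lists one "W X Y Z" population quadruple per line
/opt/biosoft/AdmixTools/bin/qpDstat -p dstat.par > dstat.result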

Plotting exon and intron length distributions from a GFF file

Download the required scripts and install the Python modules

wget https://github.com/irusri/Extract-intron-from-gff3/archive/master.zip
unzip master.zip
rm master.zip && cd Extract-intron-from-gff3-master/scripts/
sudo chmod 755 *
pip install misopy
pip install gffutils

Generate exon/intron GFF files and extract the DNA sequences

python /opt/biosoft/Extract-intron-from-gff3-master/scripts/extract_intron_gff3_from_gff3.py out.gff3 out_intron.gff3
##The result file is out_intron.gff3_introns.gff3
awk '/intron\t/{print}' out_intron.gff3_introns.gff3 | sort -k1,1 -k4,4n > processed_intron.gff3
awk '/exon\t/{print}' out_intron.gff3_introns.gff3 | sort -k1,1 -k4,4n > processed_exon.gff3
perl /opt/biosoft/Extract-intron-from-gff3-master/scripts/extract_seq_from_gff3.pl -d out.tmp/genome.fasta - processed_intron.gff3 > output_intron.fa
perl /opt/biosoft/Extract-intron-from-gff3-master/scripts/extract_seq_from_gff3.pl -d out.tmp/genome.fasta - processed_exon.gff3 > output_exon.fa
##Rename the sequences
genome_seq_clear.pl output_intron.fa --seq_prefix intron --min_length 1 --line_length -1 > output_intron_rename.fa
genome_seq_clear.pl output_exon.fa --seq_prefix exon --min_length 1 --line_length -1 > output_exon_rename.fa
##Get the length of each sequence
samtools faidx output_intron_rename.fa
cut -f 1,2 output_intron_rename.fa.fai > output_intron_length.txt
samtools faidx output_exon_rename.fa
cut -f 1,2 output_exon_rename.fa.fai > output_exon_length.txt
##Plot in R
library(ggplot2)
library(patchwork)
data <- read.table("output_intron_length.txt",header=F,sep="\t",col.names=c("id","length"))   # use output_exon_length.txt for the exon plot
p1 <- ggplot(data,aes(length))+ geom_line(stat="density",color="red")+theme_classic()+theme(axis.title = element_text(size=24),axis.text = element_text(size = 22,color = "black"))
p1

Aligning genomic DNA sequences with LAST

LAST can:

Handle big sequence data, e.g:
Compare two vertebrate genomes
Align billions of DNA reads to a genome
Indicate the reliability of each aligned column.
Use sequence quality data properly.
Compare DNA to proteins, with frameshifts.
Compare PSSMs to sequences
Calculate the likelihood of chance similarities between random sequences.
Do split and spliced alignment.
Train alignment parameters for unusual kinds of sequence (e.g. nanopore).

Installation

wget http://last.cbrc.jp/last-1080.zip
unzip last-1080.zip
cd last-1080
make && make install
cd .. && rm last-1080.zip

Usage:

lastdb -P0 -uMAM4 -R01 darer-MAM4 ../01.proteomics/darer.genome.fasta
#-P sets the number of threads; 0 uses all threads on the server
#-u the choice of seeding scheme is a key step for LAST alignment; commonly used schemes:
##MAM8:This DNA seeding scheme finds weak similarities with high sensitivity, but low speed and high memory usage (e.g. ~50 GB for mammal genomes).
##MAM4:This DNA seeding scheme is like MAM8, but a bit less sensitive, and uses about half as much memory.
##NEAR:This DNA seeding scheme is good for finding short-and-strong (near-identical) similarities. It is also good for similarities with many gaps (insertions and deletions), because it can find the short matches between the gaps. (Long-and-weak seeding schemes allow for mismatches but not gaps.) 
##YASS:This DNA seeding scheme is good for finding long-and-weak similarities. It is a good compromise for both protein-coding and non protein-coding DNA
#-R01 tells it to mark simple sequences (such as cacacacacacacacaca) by lowercase, but not suppress them.
for i in `cat ../00.initial_data/reflect.txt | cut -f 1`
do
    echo "last-train -P0 --revsym --matsym --gapsym -E0.05 -C2 darer-MAM4 ../01.proteomics/$i.genome.fasta > $i.mat"
done > last_all_pairs_mat.list
ParaFly -c last_all_pairs_mat.list -CPU 25

for i in `cat ../00.initial_data/reflect.txt | cut -f 1`
do
    echo "lastal -P0 -m100 -E0.05 -C2 -p $i.mat darer-MAM4 ../01.proteomics/$i.genome.fasta | last-split -m1 > $i.maf"
done > last_all_pairs_maf.list
ParaFly -c last_all_pairs_maf.list -CPU 25

maf-swap datra.maf | awk '/^s/ {$2 = (++s % 2 ? "datra." : "darer.") $2} 1' | last-split -m1 | maf-swap > datra-2.maf
last-postmask datra-2.maf | maf-convert -n tab | awk -F'=' '$2 <= 1e-5' > datra.tab
#lastdb -P0 -uNEAR -cR11 darer.fa.db ../01.proteomics/darer.genome.fasta
#for i in `cat ../00.initial_data/reflect.txt | cut -f 1`
#do
#    echo "lastal -P20 -m100 -E0.05 darer.fa.db ../01.proteomics/$i.genome.fasta | last-split -m1 > $i.maf"
#done > last_all_pairs_maf.list
#ParaFly -c last_all_pairs_maf.list -CPU 8
#for i in `cat ../00.initial_data/reflect.txt | cut -f 1`
#do
#    echo "maf-swap $i.maf | last-split | maf-swap | last-split | maf-sort > $i.LAST.maf"
#done > last_maf_swap.list
#ParaFly -c last_maf_swap.list -CPU 25
#for i in `cat ../00.initial_data/reflect.txt | cut -f 1`
#do
#    echo "maf-convert psl $i.LAST.maf > $i.psl"
#done > last_maf_convert.list
#ParaFly -c last_maf_convert.list -CPU 25
#for i in `cat ../00.initial_data/reflect.txt | cut -f 1`
#do
#    echo "perl maf.rename.species.S.pl $i.LAST.maf darer $i $i.Final.maf > $i.stat"
#done > last_maf_rename.list
#ParaFly -c last_maf_rename.list -CPU 25

#for i in `cat ../00.initial_data/reflect.txt | cut -f 1`
#do
#    echo "cp $i.Final.maf darer.$i.sing.maf"
#done > cp.list
#ParaFly -c cp.list -CPU 25

roast T=/home/wuchangsong/gc_genome/19.paml/i.LAST/ E=darer "tree topology" ./*Final.maf all.roast.maf
#Then the output file all.roast.maf will contain the orthologous multiple alignment.

Reference: https://github.com/mcfrith/last-genome-alignments

Installing and using PHAST

PHAST: Phylogenetic Analysis with Space/Time models (PHAST) is a freely available software package consisting of a collection of command-line programs and supporting libraries for comparative and evolutionary genomics. Best known as the search engine behind the Conservation tracks in the University of California, Santa Cruz (UCSC) Genome Browser, PHAST also includes several tools for phylogenetic modeling and functional element identification, as well as utilities for manipulating alignments, trees and genomic annotations.

Major PHAST programs:

phastCons: Conservation scoring and identification of conserved elements
phyloFit: Fitting of phylogenetic models to aligned DNA sequences
phyloP: Computation of p-values for conservation or acceleration, either lineage-specific or across all branches
phastOdds: Log-odds scoring for phylogenetic models or phylo-HMMs
exoniphy: Phylogenetic exon prediction
dless: Prediction of elements under lineage-specific selection
prequel: Probabilistic reconstruction of ancestral sequences
phastBias: Identification of GC-biased gene conversion using a phylo-HMM

Installing PHAST:

wget -c http://www.netlib.org/clapack/clapack.tgz
tar -xvzf clapack.tgz
cd CLAPACK-3.2.1
cp make.inc.example make.inc && make f2clib && make blaslib && make lib
wget -c http://compgen.cshl.edu/phast/downloads/phast.v1_5.tgz
tar zxf phast.v1_5.tgz
cd phast/src/
make CLAPACKPATH=/opt/biosoft/CLAPACK-3.2.1
echo 'PATH=$PATH:/opt/biosoft/phast/bin' >> ~/.bashrc
source ~/.bashrc
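To connect this back to step 7 of the whole-genome alignment workflow above, a minimal phyloFit/phastCons sketch (hedged: the tree and tuning parameters are hypothetical placeholders, the tree must list exactly the species present in the alignment, and all.roast.maf from the roast step is assumed as input):

# estimate a neutral (non-conserved) model from the multiple alignment
phyloFit --tree "((datra,caaur),darer)" --msa-format MAF --out-root nonconserved all.roast.maf
# score conservation along the alignment and report conserved elements
phastCons --target-coverage 0.25 --expected-length 12 \
    --most-conserved conserved_elements.bed --msa-format MAF \
    all.roast.maf nonconserved.mod > conservation_scores.wig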