BS-seq分析流程(一)

短序列的质控都可以使用trimmomatic,这里不多做介绍,得到的clean data可做下面分析
一、比对和甲基化位点提取(Bismark)
Bismark安装及使用

git clone https://github.com/FelixKrueger/Bismark.git
#conda install -c bioconda bismark
#Genome Preparation
/opt/biosoft/Bismark-0.23.0/bismark_genome_preparation --parallel 40 --verbose ./
#bismark alignment
for i in `cat ../samples.txt`
do
    echo "/opt/biosoft/Bismark-0.23.0/bismark --parallel 40 --genome ./ --phred33-quals -1 ../$i.1.fastq -2 ../$i.2.fastq"
done > bismark.list
ParaFly -c bismark.list -CPU 4
#deduplicate 
for i in `cat ../samples.txt`
do
    echo "/opt/biosoft/Bismark-0.23.0/deduplicate_bismark --bam $i.1_bismark_bt2_pe.bam"
done > deduplicate_bismark.list
ParaFly -c deduplicate_bismark.list -CPU 9
#bismark_methylation_extractor提取甲基化信息
for i in `ls *deduplicated.bam`
do
    echo "/opt/biosoft/Bismark-0.23.0/bismark_methylation_extractor $i -p --genome ./ --gzip --bedGraph --cytosine_report --CX --buffer_size 20G --output ./"
done > methylation_extractor.list
ParaFly -c methylation_extractor.list -CPU 9
/opt/biosoft/Bismark-0.23.0/bismark2report
/opt/biosoft/Bismark-0.23.0/bismark2summary
for i in `cat ../samples.txt`
do
    echo "/opt/biosoft/Bismark-0.23.0/coverage2cytosine $i.1_bismark_bt2_pe.deduplicated.bismark.cov.gz --merge_CpG --genome ./ -o $i.CpG.output"
done > coverage2cytosine.list
ParaFly -c coverage2cytosine.list -CPU 9
/opt/biosoft/Bismark-0.23.0/bam2nuc --genome_folder ./ --genomic_composition_only
for i in `cat ../samples.txt`
do
    echo "/opt/biosoft/Bismark-0.23.0/bam2nuc --genome_folder ./ $i.1_bismark_bt2_pe.deduplicated.bam"
done > bam2nuc.list
ParaFly -c bam2nuc.list -CPU 9



/opt/biosoft/Bismark-0.23.0/bismark2bedGraph -o buffer_CpG.bedGraph --buffer 5G CpG_*

ATAC-seq分析流程(一)

首先根据今年发表在Genome Biology上一篇综述总览分析流程:

作者建议的预处理分析流程为:FastQC-trimmomatic-BWA-MEM-ATACseqQC。
一、使用Fastqc对fastq 文件的测序质量统计

for i in `ls *fastq`; do echo "fastqc -t 4 -o ./ $i"; done > fastqc.sh
nohup ParaFly -c fastqc.sh -CPU 18 &

二、Trimmomatic质控(可使用 cutadapt,AdapterRemoval v2,Skewer 和 trimmomatic 等软件)
The forward and reverse adapters are slightly different. We will also trim low quality bases at the ends of the reads (quality less than 20). We will only keep reads that are at least 20 bases long. We remove short reads (< 20bp) as they are not useful, they will either be thrown out by the mapping or may interfere with our results at the end.

mkdir 1.trimmomatic
cd 1.trimmomatic/
for i in `cat ../samples.txt`
do
    echo "java -jar /opt/biosoft/Trimmomatic-0.38/trimmomatic-0.38.jar PE -threads 20 ../$i.1.fastq ../$i.2.fastq $i.1.fastq $i.1.unpaired.fastq $i.2.fastq $i.2.unpaired.fastq ILLUMINACLIP:/opt/biosoft/Trimmomatic-0.38/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:20 TOPHRED33 2> $i.trimmomatic.log"
done > command.trimmomatic.list
ParaFly -c command.trimmomatic.list -CPU 9
cd ..

三、Mapping Reads to Reference Genome

mkdir 2.bowtie2 && cd 2.bowtie2
bowtie2-build --threads 160 genome.fasta genome
for i in `cat ../samples.txt`
do
    echo "bowtie2 -x genome -1 ../1.trimmomatic/$i.1.fastq -2 ../1.trimmomatic/$i.2.fastq -p 160 -I 0 -X 1000 --very-sensitive -S $i.bowtie2.sam 2> $i.bowtie2.log"
done > bowtie2.list
ParaFly -c bowtie2.list -CPU 9
for i in `cat ../samples.txt`
do
    echo "samtools sort -@ 40 -m 8G -o $i.bowtie2.bam -O BAM $i.bowtie2.sam"
done > sam2bam.list
ParaFly -c sam2bam.list -CPU 9
cd ..

四、Filtering Mapped Reads
1、Filter Duplicate Reads
去除重复可选择samtools、picard和sambamba,其中sambamba的操作速度最快,推荐使用,安装推荐使用conda

sambamba markdup -r --overflow-list-size 600000  --tmpdir='./' ../2.bowtie2/$i.bowtie2.bam $i.sambamba.rmdup.bam

–overflow-list-size=OVERFLOWLISTSIZE
size of the overflow list where reads, thrown away from the hash table, get a second chance to meet their pairs (default is 200000 reads); increasing the size reduces the number of temporary files created
-r, –remove-duplicates
remove duplicates instead of just marking them
-t, –nthreads=NTHREADS
number of threads to use
BUGS
External sort is not implemented. Thus, memory consumption grows by 2Gb per each 100M reads. Check that you have enough RAM before running the tool.

2、Filter Uninformative Reads

samtools view -h -f 2 -q 30 $i.sambamba.rmdup.bam |grep -v pilon373| samtools sort -O bam -@ 40 -o - > $i.last.bam
samtools index $i.last.bam
samtools index $i.sambamba.rmdup.bam
samtools flagstat $i.sambamba.rmdup.bam > $i.sambamba.rmdup.stat
samtools flagstat $i.last.bam > $i.last.stat
bedtools bamtobed -i $i.last.bam > $i.bed

3、Check Insert Sizes

for i in `cat ../samples.txt`
do
    echo "samtools view $i.last.bam |awk '{print $9}'  > $i.last_length.txt"
done > last_bam_length.list
ParaFly -c last_bam_length.list -CPU 9

数据可导入R绘图
五、Peak calling(详细参数参考https://www.jianshu.com/p/21e8c51fca23

macs2 callpeak -t ../3.filtering/$i.bed -g 8.9e8 --nomodel --shift -100 --extsize 200 -n $i.peaks --outdir ./