Author: Wuchangsong

Enrichment analysis with clusterProfiler

BiocManager::install('clusterProfiler')
BiocManager::install('org.Hs.eg.db')
BiocManager::install('DOSE')
library(clusterProfiler)
library(org.Hs.eg.db)
library(DOSE)
entrezID <- read.table("11.xls", header = F, sep = "\t")
entrezID <- entrezID$V1
BP <- enrichGO(entrezID, "org.Hs.eg.db", ont = "BP", keyType = "ENSEMBL", pAdjustMethod = "BH", pvalueCutoff = 0.05, qvalueCutoff = 0.1, readable = T)
dotplot(BP, x = "GeneRatio", color = "p.adjust", showCategory = 20, size = NULL, split = NULL, font.size = 12, title = "Dotplot for Gene Ontology Analysis")
write.table(BP, 'go_tmp.txt', sep = '\t', row.names…
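For readers wondering what the p-value and the "BH" adjustment in the call above correspond to, here is a minimal Python sketch of an over-representation test. It assumes the usual hypergeometric model (draw n input genes from N background genes, K of which carry the GO term, and observe k annotated ones). The function names `hypergeom_enrichment_p` and `benjamini_hochberg` are hypothetical, and this is not clusterProfiler's own code.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """Upper-tail hypergeometric probability P(X >= k).

    k: input genes annotated to the term
    n: size of the input gene list
    K: background genes annotated to the term
    N: size of the background
    """
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy numbers, purely illustrative.
    p = hypergeom_enrichment_p(k=2, n=4, K=3, N=10)
    print(p, benjamini_hochberg([p, 0.5, 0.01]))
```

In this picture, the GeneRatio shown on the dotplot x-axis would be k/n.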

January 14, 2020
Installing Anvi'o

Dependencies: DIAMOND or NCBI's blastp for search, MCL for clustering, and muscle for alignment.
Easy install through conda:
wget -c https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod 777 Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
# When asked whether to add conda to the environment variables, choose "no"
cd miniconda3/bin/
chmod 777 activate
. ./activate
# Add the channels
conda config --env --add channels conda-forge
conda config --env --add channels bioconda
conda create -n anvio-6 python=3.6
conda activate anvio-6…

November 10, 2019
Bacterial typing with MLST

Multilocus sequence typing (MLST) is a bacterial typing method based on nucleic acid sequencing. It PCR-amplifies internal fragments of several housekeeping genes, sequences them, and uses the sequences to analyze variation among strains.
Install with conda:
conda install -c conda-forge -c bioconda -c defaults mlst
mlst contigs.fa
contigs.fa neisseria 11149 abcZ(672) adk(3) aroE(4) fumC(3) gdh(8) pdhC(4) pgm(6)
mlst genomes/*
genomes/6008.fna saureus 239 arcc(2) aroe(3) glpf(1) gmk_(1) pta_(4) tpi_(4) yqil(3)
genomes/strep.fasta.gz ssuis 1 aroA(1) cpn60(1) dpr(1) gki(1) mutS(1) recA(1) thrA(1)
genomes/NC_002973.gbk lmonocytogenes 1 abcZ(3) bglA(1) cat(1) dapE(1) dat(3)…
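The result lines above are easy to post-process. A minimal Python sketch follows, assuming the output is tab-separated with the columns file, scheme, sequence type, and then one `locus(allele)` field per gene; the function name `parse_mlst_line` is hypothetical. Allele values are kept as strings because mlst may report non-numeric calls.

```python
def parse_mlst_line(line):
    """Split one mlst result line into file, scheme, ST and an allele dict."""
    fields = line.rstrip("\n").split("\t")
    filename, scheme, sequence_type = fields[0], fields[1], fields[2]
    alleles = {}
    for item in fields[3:]:
        # Each item looks like "abcZ(672)": locus name, then allele in parentheses.
        locus, _, rest = item.partition("(")
        alleles[locus] = rest.rstrip(")")
    return {
        "file": filename,
        "scheme": scheme,
        "st": sequence_type,
        "alleles": alleles,
    }


if __name__ == "__main__":
    example = "contigs.fa\tneisseria\t11149\tabcZ(672)\tadk(3)\taroE(4)"
    print(parse_mlst_line(example))
```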

November 5, 2019
Filtering hits after alignment against the ARDB and VFDB databases

python .py -i db.fasta -I blastresult.tab -o selected.txt -O filtered.txt
blastresult.tab is the BLAST result file after sorting by P-value.
from __future__ import division
import re
import sys, getopt
import operator
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.Alphabet import generic_nucleotide
opts, args = getopt.getopt(sys.argv[1:], "hI:i:o:O:")
input_info1 = ""
input_info2 = ""
out_file1 = ""
out_file2 = ""
for op,…
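The excerpt cuts off before the filtering logic, so the following is only an illustrative Python sketch of one common way to split a sorted BLAST table into "selected" and "filtered" hits; it is not the author's script. It assumes the standard 12-column tabular layout (query in column 1, percent identity in column 3, E-value in column 11), and the function name `split_hits` and both thresholds are hypothetical.

```python
def split_hits(lines, min_identity=40.0, max_evalue=1e-5):
    """Keep the first (best) hit per query if it passes the thresholds.

    Assumes the input is already sorted so that the best hit of each
    query comes first. Everything else goes to the filtered list.
    """
    selected, filtered = [], []
    seen_queries = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        query = cols[0]
        identity = float(cols[2])
        evalue = float(cols[10])
        is_first_hit = query not in seen_queries
        seen_queries.add(query)
        if is_first_hit and identity >= min_identity and evalue <= max_evalue:
            selected.append(line)
        else:
            filtered.append(line)
    return selected, filtered


if __name__ == "__main__":
    demo = [
        "q1\tARDB_1\t95.0\t200\t10\t0\t1\t200\t1\t200\t1e-50\t300",
        "q1\tARDB_2\t80.0\t200\t40\t0\t1\t200\t1\t200\t1e-20\t150",
    ]
    print(split_hits(demo))
```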

November 4, 2019
Annotating antibiotic resistance genes with ARDB

wget -c ftp://ftp.cbcb.umd.edu/pub/data/ARDB/ARDBflatFiles.tar.gz
tar -zxvf ARDBflatFiles.tar.gz
wget -c ftp://ftp.cbcb.umd.edu/pub/data/ARDB/ardbAnno1.0.tar.gz
tar -zxvf ardbAnno1.0.tar.gz
makeblastdb -in resisGenes.pfasta -dbtype prot -out ARDB
vim genomeList.tab # paths to the target protein sequences
perl ardbAnno.pl

November 4, 2019
Annotating prokaryotic genomes with Prodigal

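Download page: https://github.com/hyattpd/prodigal/releases/
Download the source package: Prodigal-2.6.3.tar.gz
tar -zxvf Prodigal-2.6.3.tar.gz
cd Prodigal-2.6.3
make install
# Add it to the PATH environment variable
prodigal -a UBA705.pep -d UBA705.cds -f gff -g 11 -o UBA705.gff -p single -s UBA705.stat -i UBA705.fasta > prodigal.log
-a  write the amino acid sequences to a file
-c  do not allow a gene to be cut off at one end, i.e. require complete ORFs with both a start and a stop
-d  write the nucleotide sequences of the predicted genes to a file
-f  output format: gbk, gff, or sco
-g  genetic code (translation table); table 11 for prokaryotes
-i  input file, i.e. the genome sequence to run the prediction on
-m  mask the N bases in the genome
-o  output file; defaults to the screen
-p  mode: a single genome or a meta (metagenomic) sample
-q  do not print error messages to the screen
-t  specify a training file
-s  write all potential genes and their scores to a file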
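As a quick check on the GFF written by the command above, the Python sketch below counts predicted genes and how many of them are complete. It assumes each CDS line carries a `partial=` attribute in column 9, with `00` meaning both ends are complete; the function name `summarize_prodigal_gff` and the demo lines are hypothetical.

```python
def summarize_prodigal_gff(lines):
    """Count CDS features and split them into complete and partial genes."""
    total = 0
    complete = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        # Column 9 holds key=value pairs separated by semicolons.
        attrs = dict(
            pair.split("=", 1) for pair in cols[8].split(";") if "=" in pair
        )
        total += 1
        if attrs.get("partial") == "00":
            complete += 1
    return {"genes": total, "complete": complete, "partial": total - complete}


if __name__ == "__main__":
    demo = [
        "##gff-version  3",
        "c1\tProdigal_v2.6.3\tCDS\t3\t98\t4.5\t+\t0\tID=1_1;partial=10;start_type=Edge;",
        "c1\tProdigal_v2.6.3\tCDS\t337\t2799\t338.7\t+\t0\tID=1_2;partial=00;start_type=ATG;",
    ]
    print(summarize_prodigal_gff(demo))
```

With a real run, the lines would come from `open("UBA705.gff")`.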

October 25, 2019
Installing Python 3.7

wget https://www.python.org/ftp/python/3.7.2/Python-3.7.2.tar.xz
tar -xvf Python-3.7.2.tar.xz
cd Python-3.7.2
./configure --enable-optimizations
make altinstall

October 24, 2019
Cell Ranger tutorial

Building the reference: the human and mouse reference databases can be downloaded directly. For species without a ready-made download, fetch the whole-genome sequence and the GTF file yourself, then build the reference database with cellranger mkref.
wget ftp://ftp.ensembl.org/pub/release-97/fasta/danio_rerio/dna/Danio_rerio.GRCz11.dna.primary_assembly.fa.gz
gunzip Danio_rerio.GRCz11.dna.primary_assembly.fa.gz
wget ftp://ftp.ensembl.org/pub/release-97/gtf/danio_rerio/Danio_rerio.GRCz11.97.gtf.gz
gunzip Danio_rerio.GRCz11.97.gtf.gz
cellranger mkgtf Danio_rerio.GRCz11.97.gtf Danio_rerio.GRCz11.97.filtered.gtf --attribute=gene_biotype:protein_coding \
--attribute=gene_biotype:lincRNA \
--attribute=gene_biotype:antisense \
--attribute=gene_biotype:IG_LV_gene \
--attribute=gene_biotype:IG_V_gene \
--attribute=gene_biotype:IG_V_pseudogene \
--attribute=gene_biotype:IG_D_gene \
--attribute=gene_biotype:IG_J_gene \
--attribute=gene_biotype:IG_J_pseudogene \
--attribute=gene_biotype:IG_C_gene \
--attribute=gene_biotype:IG_C_pseudogene \
--attribute=gene_biotype:TR_V_gene \
--attribute=gene_biotype:TR_V_pseudogene \
--attribute=gene_biotype:TR_D_gene \
--attribute=gene_biotype:TR_J_gene \
--attribute=gene_biotype:TR_J_pseudogene \
--attribute=gene_biotype:TR_C_gene
cellranger mkref --nthreads=80 --genome=ref_zebr_GRCz11 --fasta=Danio_rerio.GRCz11.dna.primary_assembly.fa --genes=Danio_rerio.GRCz11.97.filtered.gtf --ref-version=3.1.0…
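To make the effect of the `--attribute=gene_biotype:...` filters more concrete, here is a conceptual Python sketch that keeps only GTF records whose `gene_biotype` is in an allowed set. It illustrates the idea and is not Cell Ranger's implementation: the function name `filter_gtf` is hypothetical, and keeping the `#` header lines is a choice made for this sketch.

```python
import re

# GTF column 9 holds attributes such as: gene_id "X"; gene_biotype "protein_coding";
ATTRIBUTE_RE = re.compile(r'(\S+) "([^"]*)"')


def filter_gtf(lines, allowed_biotypes, key="gene_biotype"):
    """Return header lines plus records whose attribute `key` is allowed."""
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = dict(ATTRIBUTE_RE.findall(cols[8]))
        if attrs.get(key) in allowed_biotypes:
            kept.append(line)
    return kept


if __name__ == "__main__":
    demo = [
        "#!genome-build GRCz11",
        '4\tensembl\tgene\t1\t100\t.\t+\t.\tgene_id "G1"; gene_biotype "protein_coding";',
        '4\tensembl\tgene\t200\t300\t.\t-\t.\tgene_id "G2"; gene_biotype "rRNA";',
    ]
    for record in filter_gtf(demo, {"protein_coding", "lincRNA"}):
        print(record)
```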

August 16, 2019
Seurat workflow

Installing Seurat
Depends: R (>= 3.4.0), methods
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("Seurat")
When installing on CentOS, pay attention to the gcc version.
setwd("D:/Experiment_data/zxj/outs")
library(Seurat)
pbl.data <- Read10X(data.dir = "D:/Experiment_data/zxj/outs/filtered_feature_bc_matrix")
dim(pbl.data) # check the number of rows and columns
# Create the Seurat object and filter the data: keep genes expressed in >= 3 cells and cells with >= 200 detected genes.
pbl <- CreateSeuratObject(counts = pbl.data, project = "pbl1907", min.cells = 3, min.features = 200)
# Genes whose names start with mt- are mitochondrial genes; flag them here and compute their share per cell
pbl[["percent.mt"]] <- PercentageFeatureSet(pbl, pattern = "^mt-")
# Violin plots for the pbmc object: gene count, cell count, and mitochondrial percentage
VlnPlot(object =…
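The two thresholds and the mitochondrial percentage can be illustrated on a toy matrix. The Python sketch below uses plain lists (genes as rows, cells as columns) and assumes cells are filtered by `min_features` first and genes by `min_cells` afterwards, which is my understanding of the order Seurat uses. The names `filter_counts` and `percent_mt` are hypothetical, not Seurat functions.

```python
def filter_counts(counts, genes, cells, min_cells=3, min_features=200):
    """Filter a genes x cells count matrix given as a list of rows."""
    # Step 1: keep cells in which at least min_features genes are detected.
    keep_cells = [
        c for c in range(len(cells))
        if sum(1 for row in counts if row[c] > 0) >= min_features
    ]
    # Step 2: keep genes detected in at least min_cells of the remaining cells.
    keep_genes = [
        g for g in range(len(genes))
        if sum(1 for c in keep_cells if counts[g][c] > 0) >= min_cells
    ]
    filtered = [[counts[g][c] for c in keep_cells] for g in keep_genes]
    return filtered, [genes[g] for g in keep_genes], [cells[c] for c in keep_cells]


def percent_mt(counts, genes, prefix="mt-"):
    """Percentage of each cell's counts that come from genes starting with prefix."""
    n_cells = len(counts[0]) if counts else 0
    result = []
    for c in range(n_cells):
        total = sum(row[c] for row in counts)
        mito = sum(row[c] for row, g in zip(counts, genes) if g.startswith(prefix))
        result.append(100.0 * mito / total if total else 0.0)
    return result


if __name__ == "__main__":
    genes = ["mt-Nd1", "Actb", "Gapdh"]
    cells = ["c1", "c2", "c3"]
    counts = [[1, 0, 2], [3, 0, 6], [0, 5, 2]]
    # Tiny thresholds so the toy matrix shows both filters at work.
    kept, kept_genes, kept_cells = filter_counts(
        counts, genes, cells, min_cells=2, min_features=2
    )
    print(kept_genes, kept_cells, percent_mt(kept, kept_genes))
```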

August 6, 2019
Installing R-3.6.1 on CentOS 6.9

Following the errors reported by configure, download the matching versions of bzip2, curl, PCRE, xz-lzma, and zlib. On a 64-bit system, edit the Makefile before installing bzip2, as follows:
CC=gcc -fPIC
AR=ar
RANLIB=ranlib
LDFLAGS=
BIGFILES=-D_FILE_OFFSET_BITS=64
CFLAGS=-fPIC -Wall -Winline -O2 -g $(BIGFILES)
After installing the modules above, set the environment variables:
export PATH=/home/wuchangsong/packages/bin:$PATH
export LD_LIBRARY_PATH=/home/wuchangsong/packages/lib:$LD_LIBRARY_PATH
export CFLAGS="-I/home/wuchangsong/packages/include"
export LDFLAGS="-L/home/wuchangsong/packages/lib"
Then, depending on the errors reported, do the following:
sudo yum install texinfo
sudo yum install texlive
unzip inconsolata.zip
cp -Rfp inconsolata/* /usr/share/texmf/
sudo mktexlsr
./configure --prefix=/opt/sysoft/R-3.6.1 --enable-R-shlib --with-readline=yes --with-libpng=yes --with-x=no
make -j 80
make install

August 6, 2019