
The pipeline for NORM-Seq in tangfuchou lab for Epigenetic analysis.


huboqiang/NOMeSeq


A pipeline that processes raw FASTQ reads into NORM-seq results for both DNA methylation (WCG) and DNA accessibility (GCH).

First, before running this pipeline on a server, make sure the required modules are installed. If they are not, run the following commands to deploy them.

Remember to use exactly the software versions listed below; other versions may introduce bugs.

mkdir -p software/install_packages

### install python anaconda 2.2.0
cd software/
wget https://3230d63b5fc54e62148e-c95ac804525aac4b6dba79b00b39d1d3.ssl.cf1.rackcdn.com/Anaconda-2.2.0-Linux-x86_64.sh
bash Anaconda-2.2.0-Linux-x86_64.sh  # prefix=/path/for/anaconda
mv Anaconda-2.2.0-Linux-x86_64.sh install_packages

### install samtools 0.1.18
### use this old version because the latest one can cause trouble with
### other software such as tophat.
wget http://sourceforge.net/projects/samtools/files/samtools/0.1.18/samtools-0.1.18.tar.bz2
tar -jxvf samtools-0.1.18.tar.bz2
cd samtools-0.1.18
make
cd ..
mv samtools-0.1.18.tar.bz2 install_packages


### install bowtie1 1.0.0
wget https://sourceforge.net/projects/bowtie-bio/files/bowtie/1.0.0/bowtie-1.0.0-linux-x86_64.zip
unzip bowtie-1.0.0-linux-x86_64.zip
mv bowtie-1.0.0-linux-x86_64.zip install_packages

### install trim_galore
wget http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/trim_galore_v0.4.1.zip
unzip trim_galore_v0.4.1.zip
mv trim_galore_v0.4.1.zip install_packages

### install bismark 0.7.6
wget http://www.bioinformatics.babraham.ac.uk/projects/bismark/bismark_v0.7.6.tar.gz
tar -zxvf bismark_v0.7.6.tar.gz
mv bismark_v0.7.6.tar.gz install_packages

### install bedtools 2.24.0
wget https://github.com/arq5x/bedtools2/releases/download/v2.24.0/bedtools-2.24.0.tar.gz
tar -zxvf bedtools-2.24.0.tar.gz
cd bedtools2
make
cd ..
mv bedtools-2.24.0.tar.gz install_packages

### install homer
# please follow the instructions at http://homer.salk.edu/homer/

### install HTSeq
pip install HTSeq

### install tabix and pytabix
wget http://sourceforge.net/projects/samtools/files/tabix/tabix-0.2.6.tar.bz2
tar -jxvf tabix-0.2.6.tar.bz2
cd tabix-0.2.6
make
cd ..
mv tabix-0.2.6.tar.bz2 install_packages
pip install pytabix
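
Once the tools above are unpacked and built, it can help to confirm that each executable is actually reachable before editing the settings file. The following is a minimal sketch, not part of the pipeline: the helper names `find_executable` and `missing_tools` are hypothetical, and the executable names in `REQUIRED_TOOLS` are assumptions based on the packages listed above. It avoids Python 3 only calls so that it should also run under the Python 2.7 shipped with Anaconda 2.2.0.

```python
from __future__ import print_function
import os

# Executable names are assumptions based on the packages installed above.
REQUIRED_TOOLS = ["samtools", "bowtie", "trim_galore", "bismark",
                  "bedtools", "tabix", "bgzip"]


def find_executable(name, search_path=None):
    """Return the full path of an executable file called `name`, or None.

    `search_path` is a string of directories joined by os.pathsep;
    when it is None, the PATH environment variable is used.
    """
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    for directory in search_path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_tools(tools, search_path=None):
    """Return the names in `tools` that could not be found."""
    return [t for t in tools if find_executable(t, search_path) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found: " + ", ".join(missing))
    else:
        print("All listed tools were found.")
```

Because the tools are installed into local directories rather than system locations, pass those directories as `search_path` (joined with `os.pathsep`) if they are not on your `PATH`.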

After that, download the pipeline:

cd $PYTHONPATH  # directory for Python packages; path/to/anaconda/lib/python2.7/site-packages/ by default
git clone https://github.com/hubqoaing/NormSeq

Second, open the ./setting file and change the following values to your own paths:

self.Database       = "DIR/TO/DATABASE"          #line 43
self.sftw_py        = "DIR/TO/SOFTWARE_EXE_FILE" #line 65
self.sftw_pl        = "DIR/TO/SOFTWARE_EXE_FILE"
self.sftw_bgzip     = "DIR/TO/SOFTWARE_EXE_FILE"
self.sftw_tabix     = "DIR/TO/SOFTWARE_EXE_FILE"
self.sftw_ucsc_dir  = "DIR/TO/SOFTWARE_EXE_FILE"
self.sftw_igvtools  = "DIR/TO/SOFTWARE_EXE_FILE"
self.sftw_homer     = "DIR/TO/SOFTWARE_EXE_FILE"
self.sftw_trim      = "DIR/TO/SOFTWARE_EXE_FILE"
self.sftw_bismark   = "DIR/TO/SOFTWARE_EXE_FILE"
self.sftw_bowtie_dir= "DIR/TO/SOFTWARE_EXE_FILE"

Go to the analysis directory and copy the run script there.

cd PATH/FOR/ANALYSIS   # go to the analysis directory
cp $PYTHONPATH/NormSeq/run_meth.py ./

Next, make the input files. You can download these files from UCSC or a similar source, use your own scripts to merge in the ERCC information, and generate files in this format.

vim sample_input.xls
==> sample_input.xls <==
sample                                  stage   type        tissue  brief_name      merge_name
Sample_PD10_TFC_150713-mES-gWBS1-1      c       5mC_scBS    c       mESC_gWBS1_1    mESC_gWBS1

Note that only NAME_FOR_RAW_FQ is required, and this name must be the same as 00.0.raw_fq/NAME. NAME_FOR_PROCESSING will be the name used for the results of the rest of the analysis. NAME_FOR_READING will be the name used for files in statinfo. stage and sample_group can be written as anything; they are there only to make the downstream analysis easier.
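
To check a sample sheet before starting a run, a small parser along these lines may be useful. This is a sketch under stated assumptions rather than the pipeline's own reader: the function name `read_sample_sheet` is hypothetical, and the file is assumed to be whitespace-delimited with a single header line and no spaces inside values, which is how the example above is laid out.

```python
from __future__ import print_function
import io


def read_sample_sheet(path):
    """Parse a whitespace-delimited sample sheet into a list of dicts.

    The first non-blank line is treated as the header; every following
    non-blank line must have the same number of columns.
    """
    with io.open(path, encoding="utf-8") as handle:
        rows = [line.split() for line in handle if line.strip()]
    if not rows:
        raise ValueError("sample sheet is empty: %s" % path)
    header, records = rows[0], rows[1:]
    samples = []
    for row_number, fields in enumerate(records, start=1):
        if len(fields) != len(header):
            raise ValueError("record %d: expected %d columns, found %d"
                             % (row_number, len(header), len(fields)))
        samples.append(dict(zip(header, fields)))
    return samples


if __name__ == "__main__":
    for sample in read_sample_sheet("sample_input.xls"):
        print(sample["sample"], sample["merge_name"])
```

A row with a missing column raises an error here instead of silently shifting values, which is the main reason to validate the sheet up front.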

Before running this pipeline, put the fastq reads in the ./00.0.raw_data directory.

mkdir 00.0.raw_data
for i in `tail -n +2 sample_input.xls | awk '{print $1}'`
do
    mkdir 00.0.raw_data/$i && ln -s PATH/TO/RAW_DATA/$i/*gz 00.0.raw_data/$i
done
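
The same linking step can be sketched in Python, which makes it easier to re-run safely. The function name `link_raw_data` and its parameters are hypothetical; the directory layout (one sub-directory per sample under 00.0.raw_data, containing links to that sample's `*gz` files) follows the shell loop above.

```python
from __future__ import print_function
import glob
import os


def link_raw_data(sample_names, source_root, target_root="00.0.raw_data"):
    """Create target_root/<sample>/ and symlink each *gz file into it.

    Returns the list of link paths. Links that already exist are left
    untouched, so the function can be called repeatedly.
    """
    linked = []
    for name in sample_names:
        target_dir = os.path.join(target_root, name)
        if not os.path.isdir(target_dir):
            os.makedirs(target_dir)
        pattern = os.path.join(source_root, name, "*gz")
        for src in sorted(glob.glob(pattern)):
            dst = os.path.join(target_dir, os.path.basename(src))
            if not os.path.lexists(dst):
                # Absolute targets keep the links valid even when
                # source_root was given as a relative path.
                os.symlink(os.path.abspath(src), dst)
            linked.append(dst)
    return linked
```

For example, `link_raw_data(["Sample_PD10_TFC_150713-mES-gWBS1-1"], "PATH/TO/RAW_DATA")` mirrors one iteration of the loop above. The links point to absolute paths, so they do not break if the raw-data location was written relative to the current directory.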

After that, run the pipeline:

python run_meth.py --ref YOUR_REF --cutSites GCA.GCC.GCT,ACG.TCG sample_input
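
The format of --cutSites is not documented here. Judging from the example, the two comma-separated groups look like the trinucleotides covered by the IUPAC motifs GCH (H = A, C or T) and WCG (W = A or T) mentioned at the top. Under that assumption, the sketch below builds the argument string from motif names; `expand_motif` and `cut_sites_argument` are hypothetical helpers, not functions from the pipeline.

```python
from __future__ import print_function
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_motif(motif):
    """Expand an IUPAC motif such as 'GCH' into its concrete sequences."""
    choices = [IUPAC[base] for base in motif.upper()]
    return ["".join(combo) for combo in product(*choices)]


def cut_sites_argument(motifs):
    """Join expanded motifs: '.' within a motif, ',' between motifs.

    The separator convention is an assumption inferred from the example
    command above.
    """
    return ",".join(".".join(expand_motif(m)) for m in motifs)


if __name__ == "__main__":
    print(cut_sites_argument(["GCH", "WCG"]))  # GCA.GCC.GCT,ACG.TCG
```

With the motifs GCH and WCG, this reproduces the string used in the command above.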

Wait for the results. Note that if you have to run it on a cluster, you should not run this script directly. For example, if an SGE system is used:

Comment out this command

        my_job.running_multi(cpu=8, is_debug = self.is_debug)

and use this command instead in the modules in ./frame/*py

       my_job.running_SGE(vf="400m", maxjob=100, is_debug = self.is_debug)

Methods for submitting jobs on other systems are still under development.
