Python package for prepping 454 data for use with AmpliconNoise (Quince et al., BMC Bioinformatics 2011; Quince et al., Nature Methods 2009):
raw.sff -> anoisetools -> Processed Data
The source for AmpliconNoise is also included.
For flowgram data, we target the original .sff files.
Ensure that your computer meets the minimum requirements: currently just Python 2.7, plus the requirements of AmpliconNoise.
Install BioPython if you don't have it. Note that numpy is not required for this project. If you're not planning to use BioPython beyond this package, you can answer "No" when the BioPython installer prompts you about numpy.
Download and install:
    curl -L https://github.com/fhcrc/ampliconnoise/tarball/master | tar xzf -
    cd fhcrc-ampliconnoise-*
    python2.7 setup.py install  # may require sudo
See Installing Python Modules for more information and options.
Build the AmpliconNoise binaries, and ensure they're present in your PATH.
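A quick way to confirm the binaries are visible is to search PATH for them. The following is a minimal sketch, not part of anoisetools: the helper names are hypothetical, and the program names in `required` are assumptions that should be replaced with whatever your AmpliconNoise build actually produced.

```python
import os


def find_on_path(program, path=None):
    """Return the full path to `program` if it is an executable on PATH, else None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_programs(programs, path=None):
    """Return the entries of `programs` that cannot be found on PATH."""
    return [p for p in programs if find_on_path(p, path) is None]


# Assumed binary names; adjust to match your AmpliconNoise build.
required = ["PyroNoise", "SeqNoise"]
missing = missing_programs(required)
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("All required programs were found on PATH.")
```

If anything is reported missing, add the directory containing the AmpliconNoise binaries to PATH before running anoise.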
Running setup.py installs the anoisetools package, mostly accessible from the anoise script.
anoise is called with a subcommand:
    anoise [subcommand]
Help can be accessed via anoise -h or anoise <subcommand> -h.
For our analyses, initial preprocessing is two steps:
- Split the original .sff file into one .sff per barcoded sample
- Process each sample using wrappers for PyroNoise and SeqNoise
To split an .sff, use anoise split, providing a file with comma-delimited base_path_for_output,barcode,primer records, e.g.:
    sample1/sample1,ATAG,TAAATGGCAGTCTAGCAGAARAAG
will fill ./sample1/sample1.sff with all sequences starting with ATAG, followed by TAAATGGCAGTCTAGCAGAARAAG.
Degenerate primers should be specified as such, using ambiguity codes (as with the R in the primer above).
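To illustrate what a barcode map record means, here is a small standalone sketch that parses one record and checks reads against the barcode and degenerate primer. It does not use anoisetools and is not how anoise split is implemented; the function names are hypothetical, and it assumes the standard IUPAC ambiguity codes, where R stands for A or G.

```python
import csv
import re

# IUPAC nucleotide ambiguity codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_to_regex(barcode, primer):
    """Compile a pattern matching reads that start with barcode + primer."""
    parts = []
    for code in (barcode + primer).upper():
        bases = IUPAC[code]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return re.compile("^" + "".join(parts))


def read_barcode_map(lines):
    """Parse base_path_for_output,barcode,primer records into tuples."""
    return [tuple(field.strip() for field in row)
            for row in csv.reader(lines) if row]


records = read_barcode_map(["sample1/sample1,ATAG,TAAATGGCAGTCTAGCAGAARAAG"])
base_path, barcode, primer = records[0]
pattern = primer_to_regex(barcode, primer)

# Build test reads that differ only at the degenerate (R) position.
cut = primer.index("R")
prefix = barcode + primer[:cut]
suffix = primer[cut + 1:] + "TTT"
for base in "AGC":
    read = prefix + base + suffix
    print(base, bool(pattern.match(read)))
```

Reads with A or G at the degenerate position match the pattern, while a read with C there does not.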
If the barcode map is named barcodes.csv, and the full SFF is G0YK51K01.sff, one would call [1]:
    anoise split barcodes.csv G0YK51K01.sff
For each sample in our analyses, we follow a process along the lines of:
    #!/bin/sh
    MPIARGS="-np 12"
    TMP_DIR="."

    # Run PyroNoise
    # This cleans flowgrams prior
    anoise pyronoise \
        --mpi-args "$MPIARGS" \
        --temp-dir $TMP_DIR \
        sample1.sff

    anoise truncate "{barcode}" 400 < sample1-pnoise_cd.fa > sample1-pnoise_trunc.fa

    # Run SeqNoise
    anoise seqnoise \
        --mpi-args "$MPIARGS" \
        --stub sample1 \
        --temp-dir $TMP_DIR \
        sample1-pnoise_trunc.fa \
        sample1-pnoise.mapping
Both pyronoise and seqnoise create a temporary directory for processing. If running MPI jobs spanning multiple nodes, be sure to set --temp-dir to a location accessible from all nodes.
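When several samples need the same treatment, the per-sample commands can be generated rather than copied by hand. The sketch below only builds and prints the command lines, using the subcommands, options, and file names from the example script; `sample_steps` and `run_steps` are hypothetical helpers, and `/shared/scratch` is a placeholder for a directory visible from every node.

```python
import subprocess


def sample_steps(sample, mpi_args="-np 12", temp_dir=".", length=400):
    """Build the per-sample steps as (argv, stdin_path, stdout_path) tuples.

    Nothing is executed here; file names follow the example script.
    """
    pyronoise = ["anoise", "pyronoise",
                 "--mpi-args", mpi_args,
                 "--temp-dir", temp_dir,
                 sample + ".sff"]
    # "{barcode}" is copied verbatim from the example script.
    truncate = ["anoise", "truncate", "{barcode}", str(length)]
    seqnoise = ["anoise", "seqnoise",
                "--mpi-args", mpi_args,
                "--stub", sample,
                "--temp-dir", temp_dir,
                sample + "-pnoise_trunc.fa",
                sample + "-pnoise.mapping"]
    return [
        (pyronoise, None, None),
        (truncate, sample + "-pnoise_cd.fa", sample + "-pnoise_trunc.fa"),
        (seqnoise, None, None),
    ]


def run_steps(steps):
    """Run each step in order, stopping at the first failure."""
    for argv, stdin_path, stdout_path in steps:
        stdin = open(stdin_path, "rb") if stdin_path else None
        stdout = open(stdout_path, "wb") if stdout_path else None
        try:
            subprocess.check_call(argv, stdin=stdin, stdout=stdout)
        finally:
            for handle in (stdin, stdout):
                if handle is not None:
                    handle.close()


# Print the commands for review; call run_steps(...) to execute them.
# "/shared/scratch" is a placeholder for a directory shared by all nodes.
for sample in ["sample1"]:
    for argv, stdin_path, stdout_path in sample_steps(sample, temp_dir="/shared/scratch"):
        print(" ".join(argv))
```

Keeping command construction separate from execution makes it easy to inspect what will run before submitting a long MPI job.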
[1] Note: the split step creates a child process for each sample.