Physlr physical-map constructs a de novo physical map using linked reads from 10X Genomics or MGI stLFR. This physical map can then be used for various genomics analyses, including scaffolding. Physlr scaffolds uses the physical map generated in the first stage to scaffold an existing genome assembly to yield chromosome-level contiguity.
You can install Physlr either via Conda or by compiling from source. We recommend installing Physlr via the Conda package manager (Linux, macOS), which handles compilation and dependencies automatically.
In an active conda environment:
conda install -c bioconda physlr
physlr help
Physlr can generate complementary reports (included in the pipeline by default). You can install the dependencies for these optional features using conda:
conda install -c r r-rmarkdown
conda install -c r r-essentials
conda install -c conda-forge r-ggplot2
We recommend using pypy3 over regular python3 for speed. pypy v3 (pypy3) is the default python executable for Physlr. To switch to another executable, set the python_executable argument:
physlr [OPTION]... python_executable=python3
You can install pypy3 using conda:
conda install -c conda-forge pypy3.8 # Change specified version based on your conda environment's python version (3.6 to 3.9 are supported)
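A wrapper script can choose the interpreter before invoking Physlr. The following is a minimal sketch, assuming a hypothetical helper named pick_python (not part of Physlr) that prints the first candidate found on PATH and falls back to python3 otherwise. Only the python_executable argument comes from the text above. The command is printed rather than executed.

```shell
#!/bin/sh
# Hypothetical helper (not part of Physlr): print the first interpreter
# from the candidate list that is found on PATH; default to python3.
pick_python() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    printf '%s\n' python3
}

# Prefer pypy3 for speed, as recommended above.
PYTHON_EXE=$(pick_python pypy3 python3)

# Print the command for inspection; drop the echo to run it.
echo "physlr physical-map lr=linkedreads protocol=stlfr python_executable=$PYTHON_EXE"
```

Printing the command first makes it easy to confirm which interpreter was selected in the current conda environment.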
Compile Physlr using the following commands:
pip3 install --user git+https://github.com/bcgsc/physlr
git clone https://github.com/bcgsc/physlr
cd physlr/src && make install
Or, to install Physlr in a specified directory (like /opt/physlr):
pip3 install --user git+https://github.com/bcgsc/physlr
git clone https://github.com/bcgsc/physlr
cd physlr/src && make install PREFIX=/opt/physlr
After compiling, Physlr commands will be available through:
bin/physlr-make
bin/physlr-make help
- ntCard
- ntHits
- btllib
- GCC 5 or newer with OpenMP and boost
- Python 3.5 or newer and the following packages
There are additional functions in Physlr (especially in the Python version) that developers can use to generate more granular reports. The dependencies of these functions are listed below:
- pygraphviz for graph visualization purposes
To construct a physical map de novo, you need linked reads (from 10X Genomics or MGI stLFR).
In this example, the linked reads dataset is called linkedreads.fq.gz. The linked reads are from stLFR, so we specify protocol=stlfr to use the default value for stLFR reads.
cd experiment # Change to working directory
physlr physical-map lr=linkedreads protocol=stlfr # Constructs the physical map
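In this example, the value passed to lr= matches the read file name without its .fq.gz extension. The sketch below assumes that convention and uses a hypothetical helper named lr_prefix (not a Physlr command) to derive the prefix and confirm the read file is present in the working directory before starting a run.

```shell
#!/bin/sh
# Hypothetical helper (not part of Physlr): strip a trailing .fq.gz so the
# result can be passed as lr=<prefix>. Other names are returned unchanged.
lr_prefix() {
    printf '%s\n' "${1%.fq.gz}"
}

reads=linkedreads.fq.gz
prefix=$(lr_prefix "$reads")

# Check the input before launching the pipeline.
if [ -f "$reads" ]; then
    echo "found $reads; run: physlr physical-map lr=$prefix protocol=stlfr"
else
    echo "missing $reads in $(pwd)" >&2
fi
```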
You also have the option to provide a reference genome (with ref) for Physlr to evaluate the physical map. Assuming the reference is called reference.fa, you can run the following command for the previous example:
cd experiment
physlr physical-map lr=linkedreads ref=reference protocol=stlfr # Constructs the physical map and reference-based evaluations for it
If you provide a reference genome, Physlr first constructs a physical map and then maps it to the input reference. In this case, Physlr automatically outputs a *.map-quality.tsv file reporting assembly-like quality metrics for the physical map. In addition, Physlr visualizes the correctness and contiguity of the physical map.
You can also independently run the physical map construction and evaluation steps:
cd experiment
physlr physical-map lr=linkedreads protocol=stlfr
physlr map-quality lr=linkedreads ref=reference
To scaffold a draft assembly, you need linked reads from 10X Genomics or stLFR, and an existing assembly.
In this example, the linked reads and draft assembly are called linkedreads.fq.gz and draft.fa, respectively. The linked reads are from 10X Genomics, so we specify protocol=10x to use the default value for 10X Genomics reads.
cd experiment
bin/physlr-make scaffolds lr=linkedreads draft=draft protocol=10x
You can also include a reference genome ('reference.fa' in this example) for Physlr to calculate Quast summary metrics for the Physlr scaffolded assembly:
cd experiment
bin/physlr-make scaffolds lr=linkedreads ref=reference draft=draft protocol=10x
See the help page for further information.
bin/physlr-make help
- lr.physlr.physical-map.path: Paths of barcodes (backbones).
- lr.physlr.physical-map.ref.n10.paf.gz.*.pdf: Various graphs showing the contiguity and correctness of the backbones with respect to the reference.
- draft.physlr.fa: Physlr scaffolded assembly using the physical map.
- draft.physlr.quast.tsv: Quast metrics comparing the Physlr scaffolded assembly against the reference.
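After a run, it can help to confirm that the expected output files exist. The sketch below uses a hypothetical helper named report_missing (not part of Physlr). The file names are copied verbatim from the list above. That the lr and draft parts correspond to the values passed to lr= and draft= is an assumption, so adjust the names to match your run.

```shell
#!/bin/sh
# Hypothetical helper (not part of Physlr): print each argument that is not
# an existing regular file; return non-zero if any file is missing.
report_missing() {
    missing=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            printf 'missing: %s\n' "$f"
            missing=1
        fi
    done
    return "$missing"
}

# Names as listed above; the prefixes are assumptions, edit as needed.
report_missing lr.physlr.physical-map.path draft.physlr.fa draft.physlr.quast.tsv \
    || echo "some expected outputs were not found" >&2
```

The PDF reports are left out of the check because their names contain a wildcard component.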
If you use Physlr in your research, please cite:
Afshinfard A, Jackman SD, Wong J, Coombe L, Chu J, Nikolic V, Dilek G, Malkoç Y, Warren RL, Birol I. Physlr: Next-Generation Physical Maps. DNA. 2022 Jun 10;2(2):116-30. doi: https://doi.org/10.3390/dna2020009
This project uses:
- btl_bloomfilter BTL C/C++ Common bloom filters for bioinformatics projects implemented by Justin Chu
- nthash rolling hash implementation by Hamid Mohamadi
- readfq Fast multi-line FASTA/Q reader API implemented by Heng Li
- robin-map C++ implementation of a fast hash map and hash set using robin hood hashing by Thibaut G.