Merge pull request #415 from JoseEspinosa/updates
Get rid of oras from module images
JoseEspinosa committed Sep 30, 2024
2 parents 686c054 + 372e393 commit f47e8ed
Showing 13 changed files with 95 additions and 65 deletions.
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -3,7 +3,7 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## v2.1.0dev - [date]
## v2.1.0 - [date]

### Enhancements & fixes

@@ -31,6 +31,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [[#347](https://github.com/nf-core/chipseq/issues/347)] - Add read group tag to bam files processed by bowtie2.
- [[PR #406](https://github.com/nf-core/chipseq/pull/406)] - Update metro map to show macs3 instead of macs2.
- [[#409](https://github.com/nf-core/chipseq/issues/409)] - Bulk modules and subworkflows update.
- [[PR #415](https://github.com/nf-core/chipseq/pull/415)] - Get rid of `oras` in modules.

### Software dependencies

27 changes: 13 additions & 14 deletions README.md
@@ -74,17 +74,17 @@ You can find numerous talks on the [nf-core events page](https://nf-co.re/events
To run on your data, prepare a tab-separated samplesheet with your input data. Please follow the [documentation on samplesheets](https://nf-co.re/chipseq/usage#samplesheet-input) for more details. An example samplesheet for running the pipeline looks as follows:

```csv
sample,fastq_1,fastq_2,antibody,control
WT_BCATENIN_IP_REP1,BLA203A1_S27_L006_R1_001.fastq.gz,,BCATENIN,WT_INPUT_REP1
WT_BCATENIN_IP_REP2,BLA203A25_S16_L001_R1_001.fastq.gz,,BCATENIN,WT_INPUT_REP2
WT_BCATENIN_IP_REP2,BLA203A25_S16_L002_R1_001.fastq.gz,,BCATENIN,WT_INPUT_REP2
WT_BCATENIN_IP_REP2,BLA203A25_S16_L003_R1_001.fastq.gz,,BCATENIN,WT_INPUT_REP2
WT_BCATENIN_IP_REP3,BLA203A49_S40_L001_R1_001.fastq.gz,,BCATENIN,WT_INPUT_REP3
WT_INPUT_REP1,BLA203A6_S32_L006_R1_001.fastq.gz,,,
WT_INPUT_REP2,BLA203A30_S21_L001_R1_001.fastq.gz,,,
WT_INPUT_REP2,BLA203A30_S21_L002_R1_001.fastq.gz,,,
WT_INPUT_REP3,BLA203A31_S21_L003_R1_001.fastq.gz,,,
```csv title="samplesheet.csv"
sample,fastq_1,fastq_2,replicate,antibody,control,control_replicate
WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,1
WT_BCATENIN_IP,BLA203A25_S16_L001_R1_001.fastq.gz,,2,BCATENIN,WT_INPUT,2
WT_BCATENIN_IP,BLA203A25_S16_L002_R1_001.fastq.gz,,2,BCATENIN,WT_INPUT,2
WT_BCATENIN_IP,BLA203A25_S16_L003_R1_001.fastq.gz,,2,BCATENIN,WT_INPUT,2
WT_BCATENIN_IP,BLA203A49_S40_L001_R1_001.fastq.gz,,3,BCATENIN,WT_INPUT,3
WT_INPUT,BLA203A6_S32_L006_R1_001.fastq.gz,,1,,,
WT_INPUT,BLA203A30_S21_L001_R1_001.fastq.gz,,2,,,
WT_INPUT,BLA203A30_S21_L002_R1_001.fastq.gz,,2,,,
WT_INPUT,BLA203A31_S21_L003_R1_001.fastq.gz,,3,,,
```
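
The two header layouts above differ only in where the replicate number lives: the removed rows encode it as a `_REP<n>` suffix on `sample` and `control`, while the added rows carry it in the `replicate` and `control_replicate` columns. A minimal Python sketch of that conversion follows. It is not part of the pipeline; the function names are hypothetical, and it assumes every old-style name ends in `_REP<n>`, a convention inferred from the example rows rather than from a documented rule.

```python
import csv
import io
import re

# Assumption: old-style names end in "_REP<n>", as in the removed example rows.
_REP_SUFFIX = re.compile(r"^(?P<name>.+)_REP(?P<rep>\d+)$")

# Column order taken from the added samplesheet header.
NEW_COLUMNS = [
    "sample", "fastq_1", "fastq_2", "replicate",
    "antibody", "control", "control_replicate",
]


def split_replicate(name):
    """Split 'WT_INPUT_REP2' into ('WT_INPUT', '2'); empty names pass through."""
    if not name:
        return "", ""
    match = _REP_SUFFIX.match(name)
    if match is None:
        raise ValueError(f"no _REP<n> suffix in name: {name!r}")
    return match.group("name"), match.group("rep")


def convert_samplesheet(old_csv_text):
    """Convert old-layout CSV text to the new layout and return it as text."""
    reader = csv.DictReader(io.StringIO(old_csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=NEW_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        sample, replicate = split_replicate(row["sample"])
        control, control_replicate = split_replicate(row["control"])
        writer.writerow({
            "sample": sample,
            "fastq_1": row["fastq_1"],
            "fastq_2": row["fastq_2"],
            "replicate": replicate,
            "antibody": row["antibody"],
            "control": control,
            "control_replicate": control_replicate,
        })
    return out.getvalue()
```

Raising on a name without the suffix is a deliberate choice in this sketch: guessing a replicate number would silently produce a samplesheet that merges the wrong libraries.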

Now, you can run the pipeline using:
@@ -96,8 +96,7 @@ nextflow run nf-core/chipseq --input samplesheet.csv --outdir <OUTDIR> --genome
See [usage docs](https://nf-co.re/chipseq/usage) for all of the available options when running the pipeline.

> [!WARNING]
> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_;
> see [docs](https://nf-co.re/usage/configuration#custom-configuration-files).
> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see the [docs](https://nf-co.re/usage/configuration#custom-configuration-files) here.
For more details and further functionality, please refer to the [usage documentation](https://nf-co.re/chipseq/usage) and the [parameter documentation](https://nf-co.re/chipseq/parameters).
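
As a sketch of the `-params-file` route mentioned in the warning, the Python snippet below renders a set of parameters as JSON, one of the formats Nextflow accepts for that option. The helper names and the example values (such as the `results` directory) are hypothetical; the parameter keys `input`, `outdir`, `aligner` and `save_reference` are taken from options named in this pipeline's documentation.

```python
import json
from pathlib import Path


def render_params(params):
    """Serialise pipeline parameters as JSON text for Nextflow's -params-file."""
    return json.dumps(params, indent=2, sort_keys=True) + "\n"


def write_params_file(path, params):
    """Write the rendered parameters to `path` and return the path."""
    target = Path(path)
    target.write_text(render_params(params))
    return target


# Illustrative values only; replace them with your own paths and settings.
EXAMPLE_PARAMS = {
    "input": "samplesheet.csv",
    "outdir": "results",
    "aligner": "bwa",
    "save_reference": True,
}
```

Calling `write_params_file("params.json", EXAMPLE_PARAMS)` would produce a file that can then be passed as `nextflow run nf-core/chipseq -params-file params.json`, alongside whatever other options you normally use.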

@@ -113,7 +112,7 @@ These scripts were originally written by Chuan Wang ([@chuan-wang](https://githu

The pipeline workflow diagram was designed by Sarah Guinchard ([@G-Sarah](https://github.com/G-Sarah)).

Many thanks to others who have helped out and contributed along the way too, including (but not limited to): [@apeltzer](https://github.com/apeltzer), [@bc2zb](https://github.com/bc2zb), [@crickbabs](https://github.com/crickbabs), [@drejom](https://github.com/drejom), [@houghtos](https://github.com/houghtos), [@KevinMenden](https://github.com/KevinMenden), [@mashehu](https://github.com/mashehu), [@pditommaso](https://github.com/pditommaso), [@Rotholandus](https://github.com/Rotholandus), [@sofiahaglund](https://github.com/sofiahaglund), [@tiagochst](https://github.com/tiagochst) and [@winni2k](https://github.com/winni2k).
Many thanks to others who have helped out and contributed along the way too, including (but not limited to): [@apeltzer](https://github.com/apeltzer), [@bc2zb](https://github.com/bc2zb), [@bjlang](https://github.com/bjlang), [@crickbabs](https://github.com/crickbabs), [@drejom](https://github.com/drejom), [@houghtos](https://github.com/houghtos), [@KevinMenden](https://github.com/KevinMenden), [@mashehu](https://github.com/mashehu), [@pditommaso](https://github.com/pditommaso), [@Rotholandus](https://github.com/Rotholandus), [@sofiahaglund](https://github.com/sofiahaglund), [@tiagochst](https://github.com/tiagochst) and [@winni2k](https://github.com/winni2k).

## Contributions and Support

4 changes: 2 additions & 2 deletions assets/multiqc_config.yml
@@ -1,7 +1,7 @@
report_comment: >
This report has been generated by the <a href="https://github.com/nf-core/chipseq/tree/dev" target="_blank">nf-core/chipseq</a>
This report has been generated by the <a href="https://github.com/nf-core/chipseq/releases/tag/2.1.0" target="_blank">nf-core/chipseq</a>
analysis pipeline. For information about how to interpret these results, please see the
<a href="https://nf-co.re/chipseq/dev/docs/output" target="_blank">documentation</a>.
<a href="https://nf-co.re/chipseq/2.1.0/docs/output" target="_blank">documentation</a>.
data_format: "yaml"

23 changes: 21 additions & 2 deletions conf/modules.config
@@ -354,7 +354,8 @@ process {
publishDir = [
path: { "${params.outdir}/${params.aligner}/merged_library/samtools_stats" },
mode: params.publish_dir_mode,
pattern: '*.{stats,flagstat,idxstats}'
pattern: '*.{stats,flagstat,idxstats}',
enabled: params.save_align_intermeds
]
}

@@ -415,6 +416,24 @@ process {
]
}

withName: '.*:BAM_FILTER_BAMTOOLS:BAM_SORT_STATS_SAMTOOLS:.*' {
ext.prefix = { "${meta.id}.mLb.clN.sorted" }
publishDir = [
path: { "${params.outdir}/${params.aligner}/merged_library" },
mode: params.publish_dir_mode,
pattern: "*.{bam,bai}"
]
}

withName: '.*:BAM_FILTER_BAMTOOLS:BAM_SORT_STATS_SAMTOOLS:BAM_STATS_SAMTOOLS:.*' {
ext.prefix = { "${meta.id}.mLb.clN.sorted.bam" }
publishDir = [
path: { "${params.outdir}/${params.aligner}/merged_library/samtools_stats" },
mode: params.publish_dir_mode,
pattern: "*.{stats,flagstat,idxstats}"
]
}

withName: 'PHANTOMPEAKQUALTOOLS' {
ext.args = { "--max-ppsize=500000" }
ext.args2 = { "-p=$task.cpus" }
@@ -553,7 +572,7 @@ process {
params.save_macs_pileup ? '--bdg --SPMR' : '',
params.macs_fdr ? "--qvalue ${params.macs_fdr}" : '',
params.macs_pvalue ? "--pvalue ${params.macs_pvalue}" : '',
params.aligner == "chromap" ? "--format BAM" : '' //TODO check if not needed anymore with new chromap versions
params.aligner == "chromap" ? "--format BAM" : ''
].join(' ').trim()
publishDir = [
path: { "${params.outdir}/${params.aligner}/merged_library/macs3/${params.narrow_peak ? 'narrow_peak' : 'broad_peak'}" },
10 changes: 4 additions & 6 deletions docs/output.md
@@ -12,7 +12,7 @@ The directories listed below will be created in the output directory after the p

## Pipeline overview

The pipeline is built using [Nextflow](https://www.nextflow.io/). See [`main README.md`](../README.md) for a condensed overview of the steps in the pipeline, and the bioinformatics tools used at each step.
The pipeline is built using [Nextflow](https://www.nextflow.io/). See [`introduction`](../..) for a condensed overview of the steps in the pipeline, and the bioinformatics tools used at each step.

See [Illumina website](https://emea.illumina.com/techniques/sequencing/dna-sequencing/chip-seq.html) for more information regarding the ChIP-seq protocol, and for an extensive list of publications.

@@ -50,7 +50,7 @@ The initial QC and alignments are performed at the library-level e.g. if the sam

</details>

[Trim Galore!](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) is a wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files. By default, Trim Galore! will automatically detect and trim the appropriate adapter sequence. See [`usage.md`](usage.md) for more details about the trimming options.
[Trim Galore!](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) is a wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files. By default, Trim Galore! will automatically detect and trim the appropriate adapter sequence. See [`parameters`](../parameters/#adapter-trimming-options) for more details about the trimming options.

![MultiQC - Cutadapt trimmed sequence plot](images/mqc_cutadapt_plot.png)

@@ -70,12 +70,10 @@ The pipeline has been written in a way where all the files generated downstream
</details>

Adapter-trimmed reads are mapped to the reference assembly using the aligner set by the `--aligner` parameter. Available aligners are [BWA](http://bio-bwa.sourceforge.net/bwa.shtml) (default), [Bowtie 2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml), [Chromap](https://github.com/haowenz/chromap) and [STAR](https://github.com/alexdobin/STAR). A genome index is required to run any of this aligners so if this is not provided explicitly using the corresponding parameter (e.g. `--bwa_index`), then it will be created automatically from the genome fasta input. The index creation process can take a while for larger genomes so it is possible to use the `--save_reference` parameter to save the indices for future pipeline runs, reducing processing times.
Adapter-trimmed reads are mapped to the reference assembly using the aligner set by the `--aligner` parameter. Available aligners are [BWA](http://bio-bwa.sourceforge.net/bwa.shtml) (default), [Bowtie 2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml), [Chromap](https://github.com/haowenz/chromap) and [STAR](https://github.com/alexdobin/STAR). A genome index is required to run any of these aligners so if this is not provided explicitly using the corresponding parameter (e.g. `--bwa_index`), then it will be created automatically from the genome fasta input. The index creation process can be time-consuming for large genomes, so you can use the `--save_reference` parameter to save the indices for future pipeline runs, thereby reducing processing times.

![MultiQC - SAMtools stats plot](images/mqc_samtools_stats_plot.png)

> **NB:** Currently, paired-end files produced by `Chromap` are excluded from downstream analysis due to [this](https://github.com/nf-core/chipseq/issues/291) issue. Single-end files are processed normally.
#### Unmapped reads

The `--save_unaligned` parameter enables to obtain FastQ files containing unmapped reads (only available for STAR and Bowtie2).
@@ -202,7 +200,7 @@ The results from deepTools plotProfile gives you a quick visualisation for the g

[MACS3](https://github.com/macs3-project/MACS) is one of the most popular peak-calling algorithms for ChIP-seq data. By default, the peaks are called with the MACS3 `--broad` parameter. If, however, you would like to call narrow peaks then please provide the `--narrow_peak` parameter when running the pipeline. See [MACS3 outputs](https://github.com/macs3-project/MACS/blob/master/docs/callpeak.md#output-files) for a description of the output files generated by MACS3.

![MultiQC - MACS3 total peak count plot](images/mqc_macs2_peak_count_plot.png)
![MultiQC - MACS3 total peak count plot](images/mqc_macs3_peak_count_plot.png)

[HOMER annotatePeaks.pl](http://homer.ucsd.edu/homer/ngs/annotation.html) is used to annotate the peaks relative to known genomic features. HOMER is able to use the `--gtf` annotation file which is provided to the pipeline. Please note that some of the output columns will be blank because the annotation is not provided using HOMER's in-built database format. However, the more important fields required for downstream analysis will be populated i.e. _Annotation_, _Distance to TSS_ and _Nearest Promoter ID_.

6 changes: 3 additions & 3 deletions docs/usage.md
@@ -87,13 +87,13 @@ NAIVE_INPUT,BLA203A49_S1_L006_R1_001.fastq.gz,,3,,,
| `control` | Sample name for control sample. |
| `control_replicate` | Integer representing replicate number for control sample. |

Example design files have bee_n provided with the pipeline for [paired-end](../assets/samplesheet_pe.csv) and [single-end](../assets/samplesheet_se.csv) data.
Example design files have been provided with the pipeline for [paired-end](../assets/samplesheet_pe.csv) and [single-end](../assets/samplesheet_se.csv) data.

> **NB:** The `group` and `replicate` columns were replaced with a single `sample` column as of v2.0 of the pipeline. The `sample` column is essentially a concatenation of the `group` and `replicate` columns. If all values of `sample` have the same number of underscores, fields defined by these underscore-separated names may be used in the PCA plots produced by the pipeline, to regain the ability to represent different groupings.
## Reference genome files

The minimum reference genome requirements are a FASTA and GTF file, all other files required to run the pipeline can be generated from these files. However, it is more storage and compute friendly if you are able to re-use reference genome files as efficiently as possible. It is recommended to use the `--save_reference` parameter if you are using the pipeline to build new indices (e.g. those unavailable on [AWS iGenomes](https://nf-co.re/usage/reference_genomes)) so that you can save them somewhere locally. The index building step can be quite a time-consuming process and it permits their reuse for future runs of the pipeline to save disk space. You can then either provide the appropriate reference genome files on the command-line via the appropriate parameters (e.g. `--bwa_index '/path/to/bwa/index/'`) or via a custom config file.
The minimum reference genome requirements are a FASTA and a GTF file, all other files required to run the pipeline can be generated from these files. However, it is more storage and compute friendly if you are able to re-use reference genome files as efficiently as possible. It is recommended to use the `--save_reference` parameter if you are using the pipeline to build new indices (e.g. those unavailable on [AWS iGenomes](https://nf-co.re/usage/reference_genomes)) so that you can save them somewhere locally. The index building step can be quite a time-consuming process and it permits their reuse for future runs of the pipeline to save disk space. You can then either provide the appropriate reference genome files on the command-line via the appropriate parameters (e.g. `--bwa_index '/path/to/bwa/index/'`) or via a [custom config file](https://nf-co.re/usage/configuration#custom-configuration-files).

- If `--genome` is provided then the FASTA and GTF files (and existing indices) will be automatically obtained from AWS-iGenomes unless these have already been downloaded locally in the path specified by `--igenomes_base`.
- If `--gene_bed` is not provided then it will be generated from the GTF file.
@@ -126,7 +126,7 @@ cd v3.0
wget -L https://www.encodeproject.org/files/ENCFF356LFX/@@download/ENCFF356LFX.bed.gz && gunzip ENCFF356LFX.bed.gz && mv ENCFF356LFX.bed hg38-blacklist.v3.bed
```

> **NB:** A detailed description of the different versions of the files can be found [here](https://sites.google.com/site/anshulkundaje/projects/blacklists). Also, to see which blacklist bed files are assigned by default to the respective reference genome check the [igenomes.config](https://github.com/nf-core/chipseq/blob/master/conf/igenomes.config).
> **NB:** A detailed description of the different versions of the files can be found [here](https://github.com/Boyle-Lab/Blacklist/blob/master/README.md). Also, to see which blacklist bed files are assigned by default to the respective reference genome check the [igenomes.config](https://github.com/nf-core/chipseq/blob/master/conf/igenomes.config).
## Running the pipeline

2 changes: 1 addition & 1 deletion modules.json
@@ -98,7 +98,7 @@
},
"phantompeakqualtools": {
"branch": "master",
"git_sha": "2dfe9afa90fefc70e320140e5f41287f01f324b0",
"git_sha": "ec48f56f6e1571e23800aaaba41cceda13408e02",
"installed_by": ["modules"]
},
"picard/collectmultiplemetrics": {
2 changes: 1 addition & 1 deletion modules/local/multiqc_custom_phantompeakqualtools.nf
@@ -2,7 +2,7 @@ process MULTIQC_CUSTOM_PHANTOMPEAKQUALTOOLS {
tag "$meta.id"
conda "conda-forge::r-base=4.3.3"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'oras://community.wave.seqera.io/library/r-base:4.3.3--452dec8277637366':
'https://community-cr-prod.seqera.io/docker/registry/v2/blobs/sha256/45/4569ff9993578b8402d00230ab9dd75ce6e63529731eb24f21579845e6bd5cdb/data':
'community.wave.seqera.io/library/r-base:4.3.3--14bb33ac537aea22' }"

input:
2 changes: 0 additions & 2 deletions modules/nf-core/phantompeakqualtools/environment.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion modules/nf-core/phantompeakqualtools/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
