Merge pull request #415 from JoseEspinosa/updates

Get rid of oras from module images
nf-core · Sep 30, 2024 · f47e8ed
2 parents 686c054 + 372e393
commit f47e8ed

Showing 13 changed files with 95 additions and 65 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -3,7 +3,7 @@
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## v2.1.0dev - [date]
+## v2.1.0 - [date]
 
 ### Enhancements & fixes
 
@@ -31,6 +31,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - [[#347](https://github.com/nf-core/chipseq/issues/347)] - Add read group tag to bam files processed by bowtie2.
 - [[PR #406](https://github.com/nf-core/chipseq/pull/406)] - Update metro map to show macs3 instead of macs2.
 - [[#409](https://github.com/nf-core/chipseq/issues/409)] - Bulk modules and subworkflows update.
+- [[PR #415](https://github.com/nf-core/chipseq/pull/415)] - Get rid of `oras` in modules.
 
 ### Software dependencies
 

diff --git a/README.md b/README.md
@@ -74,17 +74,17 @@ You can find numerous talks on the [nf-core events page](https://nf-co.re/events
 
 To run on your data, prepare a tab-separated samplesheet with your input data. Please follow the [documentation on samplesheets](https://nf-co.re/chipseq/usage#samplesheet-input) for more details. An example samplesheet for running the pipeline looks as follows:
 
-```csv
-sample,fastq_1,fastq_2,antibody,control
-WT_BCATENIN_IP_REP1,BLA203A1_S27_L006_R1_001.fastq.gz,,BCATENIN,WT_INPUT_REP1
-WT_BCATENIN_IP_REP2,BLA203A25_S16_L001_R1_001.fastq.gz,,BCATENIN,WT_INPUT_REP2
-WT_BCATENIN_IP_REP2,BLA203A25_S16_L002_R1_001.fastq.gz,,BCATENIN,WT_INPUT_REP2
-WT_BCATENIN_IP_REP2,BLA203A25_S16_L003_R1_001.fastq.gz,,BCATENIN,WT_INPUT_REP2
-WT_BCATENIN_IP_REP3,BLA203A49_S40_L001_R1_001.fastq.gz,,BCATENIN,WT_INPUT_REP3
-WT_INPUT_REP1,BLA203A6_S32_L006_R1_001.fastq.gz,,,
-WT_INPUT_REP2,BLA203A30_S21_L001_R1_001.fastq.gz,,,
-WT_INPUT_REP2,BLA203A30_S21_L002_R1_001.fastq.gz,,,
-WT_INPUT_REP3,BLA203A31_S21_L003_R1_001.fastq.gz,,,
+```csv title="samplesheet.csv"
+sample,fastq_1,fastq_2,replicate,antibody,control,control_replicate
+WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,1
+WT_BCATENIN_IP,BLA203A25_S16_L001_R1_001.fastq.gz,,2,BCATENIN,WT_INPUT,2
+WT_BCATENIN_IP,BLA203A25_S16_L002_R1_001.fastq.gz,,2,BCATENIN,WT_INPUT,2
+WT_BCATENIN_IP,BLA203A25_S16_L003_R1_001.fastq.gz,,2,BCATENIN,WT_INPUT,2
+WT_BCATENIN_IP,BLA203A49_S40_L001_R1_001.fastq.gz,,3,BCATENIN,WT_INPUT,3
+WT_INPUT,BLA203A6_S32_L006_R1_001.fastq.gz,,1,,,
+WT_INPUT,BLA203A30_S21_L001_R1_001.fastq.gz,,2,,,
+WT_INPUT,BLA203A30_S21_L002_R1_001.fastq.gz,,2,,,
+WT_INPUT,BLA203A31_S21_L003_R1_001.fastq.gz,,3,,,
 ```
 
 Now, you can run the pipeline using:
@@ -96,8 +96,7 @@ nextflow run nf-core/chipseq --input samplesheet.csv --outdir <OUTDIR> --genome
 See [usage docs](https://nf-co.re/chipseq/usage) for all of the available options when running the pipeline.
 
 > [!WARNING]
-> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_;
-> see [docs](https://nf-co.re/usage/configuration#custom-configuration-files).
+> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see the [docs](https://nf-co.re/usage/configuration#custom-configuration-files) here.
 
 For more details and further functionality, please refer to the [usage documentation](https://nf-co.re/chipseq/usage) and the [parameter documentation](https://nf-co.re/chipseq/parameters).
 
@@ -113,7 +112,7 @@ These scripts were originally written by Chuan Wang ([@chuan-wang](https://githu
 
 The pipeline workflow diagram was designe by Sarah Guinchard ([@G-Sarah](https://github.com/G-Sarah)).
 
-Many thanks to others who have helped out and contributed along the way too, including (but not limited to): [@apeltzer](https://github.com/apeltzer), [@bc2zb](https://github.com/bc2zb), [@crickbabs](https://github.com/crickbabs), [@drejom](https://github.com/drejom), [@houghtos](https://github.com/houghtos), [@KevinMenden](https://github.com/KevinMenden), [@mashehu](https://github.com/mashehu), [@pditommaso](https://github.com/pditommaso), [@Rotholandus](https://github.com/Rotholandus), [@sofiahaglund](https://github.com/sofiahaglund), [@tiagochst](https://github.com/tiagochst) and [@winni2k](https://github.com/winni2k).
+Many thanks to others who have helped out and contributed along the way too, including (but not limited to): [@apeltzer](https://github.com/apeltzer), [@bc2zb](https://github.com/bc2zb), [@bjlang](https://github.com/bjlang), [@crickbabs](https://github.com/crickbabs), [@drejom](https://github.com/drejom), [@houghtos](https://github.com/houghtos), [@KevinMenden](https://github.com/KevinMenden), [@mashehu](https://github.com/mashehu), [@pditommaso](https://github.com/pditommaso), [@Rotholandus](https://github.com/Rotholandus), [@sofiahaglund](https://github.com/sofiahaglund), [@tiagochst](https://github.com/tiagochst) and [@winni2k](https://github.com/winni2k).
 
 ## Contributions and Support
 

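Note on the README hunk above: the example samplesheet moves to the seven-column layout (`sample,fastq_1,fastq_2,replicate,antibody,control,control_replicate`). The following is a minimal Python sketch of how that header and the replicate columns could be checked before a run. The function name `check_samplesheet` and its rules are illustrative assumptions, not the pipeline's own validation.

```python
import csv
import io

# Column order taken from the updated README example in this diff.
EXPECTED_COLUMNS = [
    "sample", "fastq_1", "fastq_2", "replicate",
    "antibody", "control", "control_replicate",
]


def check_samplesheet(text):
    """Parse samplesheet text and apply a few basic checks (hypothetical helper).

    Returns the parsed rows as dictionaries, or raises ValueError.
    """
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = list(reader)
    for line_no, row in enumerate(rows, start=2):
        # Every row needs an integer replicate number.
        if not (row.get("replicate") or "").isdigit():
            raise ValueError(f"line {line_no}: replicate must be an integer")
        # Rows that name a control also need an integer control replicate.
        if row.get("control") and not (row.get("control_replicate") or "").isdigit():
            raise ValueError(f"line {line_no}: control_replicate must be an integer")
    return rows


if __name__ == "__main__":
    example = (
        "sample,fastq_1,fastq_2,replicate,antibody,control,control_replicate\n"
        "WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,1\n"
        "WT_INPUT,BLA203A6_S32_L006_R1_001.fastq.gz,,1,,,\n"
    )
    for row in check_samplesheet(example):
        print(row["sample"], row["replicate"], row["control"] or "(no control)")
```

The old five-column header (`sample,fastq_1,fastq_2,antibody,control`) would be rejected by this check, which is the mismatch the README update removes.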
diff --git a/assets/multiqc_config.yml b/assets/multiqc_config.yml
@@ -1,7 +1,7 @@
 report_comment: >
-  This report has been generated by the <a href="https://github.com/nf-core/chipseq/tree/dev" target="_blank">nf-core/chipseq</a>
+  This report has been generated by the <a href="https://github.com/nf-core/chipseq/releases/tag/2.1.0" target="_blank">nf-core/chipseq</a>
   analysis pipeline. For information about how to interpret these results, please see the
-  <a href="https://nf-co.re/chipseq/dev/docs/output" target="_blank">documentation</a>.
+  <a href="https://nf-co.re/chipseq/2.1.0/docs/output" target="_blank">documentation</a>.
 
 data_format: "yaml"
 

diff --git a/conf/modules.config b/conf/modules.config
@@ -354,7 +354,8 @@ process {
         publishDir = [
             path: { "${params.outdir}/${params.aligner}/merged_library/samtools_stats" },
             mode: params.publish_dir_mode,
-            pattern: '*.{stats,flagstat,idxstats}'
+            pattern: '*.{stats,flagstat,idxstats}',
+            enabled: params.save_align_intermeds
         ]
     }
 
@@ -415,6 +416,24 @@ process {
         ]
     }
 
+    withName: '.*:BAM_FILTER_BAMTOOLS:BAM_SORT_STATS_SAMTOOLS:.*' {
+        ext.prefix = { "${meta.id}.mLb.clN.sorted" }
+        publishDir = [
+            path: { "${params.outdir}/${params.aligner}/merged_library" },
+            mode: params.publish_dir_mode,
+            pattern: "*.{bam,bai}"
+        ]
+    }
+
+    withName: '.*:BAM_FILTER_BAMTOOLS:BAM_SORT_STATS_SAMTOOLS:BAM_STATS_SAMTOOLS:.*' {
+        ext.prefix = { "${meta.id}.mLb.clN.sorted.bam" }
+        publishDir = [
+            path: { "${params.outdir}/${params.aligner}/merged_library/samtools_stats" },
+            mode: params.publish_dir_mode,
+            pattern: "*.{stats,flagstat,idxstats}"
+        ]
+    }
+
     withName: 'PHANTOMPEAKQUALTOOLS' {
         ext.args   = { "--max-ppsize=500000" }
         ext.args2  = { "-p=$task.cpus" }
@@ -553,7 +572,7 @@ process {
             params.save_macs_pileup     ? '--bdg --SPMR' : '',
             params.macs_fdr             ? "--qvalue ${params.macs_fdr}" : '',
             params.macs_pvalue          ? "--pvalue ${params.macs_pvalue}" : '',
-            params.aligner == "chromap" ? "--format BAM" : '' //TODO check if not needed anymore with new chromap versions
+            params.aligner == "chromap" ? "--format BAM" : ''
         ].join(' ').trim()
         publishDir = [
             path: { "${params.outdir}/${params.aligner}/merged_library/macs3/${params.narrow_peak ? 'narrow_peak' : 'broad_peak'}" },

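Note on the `conf/modules.config` hunk above: the two new `withName` selectors overlap, because the broader pattern also matches every process name that the narrower `BAM_STATS_SAMTOOLS` pattern matches. The Python sketch below illustrates this, assuming the selector is applied as a full-string regular expression match (an assumption about Nextflow's behaviour, not something this diff states). The fully qualified process names are hypothetical.

```python
import re

# The two selector patterns added in this diff, in file order.
SELECTORS = [
    ".*:BAM_FILTER_BAMTOOLS:BAM_SORT_STATS_SAMTOOLS:.*",
    ".*:BAM_FILTER_BAMTOOLS:BAM_SORT_STATS_SAMTOOLS:BAM_STATS_SAMTOOLS:.*",
]


def matching_selectors(process_name):
    """Return the patterns that fully match the given process name."""
    return [p for p in SELECTORS if re.fullmatch(p, process_name)]


if __name__ == "__main__":
    # Hypothetical fully qualified process names, for illustration only.
    names = [
        "NFCORE_CHIPSEQ:CHIPSEQ:BAM_FILTER_BAMTOOLS:BAM_SORT_STATS_SAMTOOLS:SAMTOOLS_SORT",
        "NFCORE_CHIPSEQ:CHIPSEQ:BAM_FILTER_BAMTOOLS:BAM_SORT_STATS_SAMTOOLS:BAM_STATS_SAMTOOLS:SAMTOOLS_STATS",
    ]
    for name in names:
        print(name.rsplit(":", 1)[-1], "->", len(matching_selectors(name)), "selector(s)")
```

Under that assumption, the stats processes match both blocks, so the relative order of the two blocks in the config file is worth keeping as written.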
diff --git a/docs/output.md b/docs/output.md
@@ -12,7 +12,7 @@ The directories listed below will be created in the output directory after the p
 
 ## Pipeline overview
 
-The pipeline is built using [Nextflow](https://www.nextflow.io/). See [`main README.md`](../README.md) for a condensed overview of the steps in the pipeline, and the bioinformatics tools used at each step.
+The pipeline is built using [Nextflow](https://www.nextflow.io/). See [`introduction`](../..) for a condensed overview of the steps in the pipeline, and the bioinformatics tools used at each step.
 
 See [Illumina website](https://emea.illumina.com/techniques/sequencing/dna-sequencing/chip-seq.html) for more information regarding the ChIP-seq protocol, and for an extensive list of publications.
 
@@ -50,7 +50,7 @@ The initial QC and alignments are performed at the library-level e.g. if the sam
 
 </details>
 
-[Trim Galore!](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) is a wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files. By default, Trim Galore! will automatically detect and trim the appropriate adapter sequence. See [`usage.md`](usage.md) for more details about the trimming options.
+[Trim Galore!](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) is a wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files. By default, Trim Galore! will automatically detect and trim the appropriate adapter sequence. See [`parameters`](../parameters/#adapter-trimming-options) for more details about the trimming options.
 
 ![MultiQC - Cutadapt trimmed sequence plot](images/mqc_cutadapt_plot.png)
 
@@ -70,12 +70,10 @@ The pipeline has been written in a way where all the files generated downstream
 
 </details>
 
-Adapter-trimmed reads are mapped to the reference assembly using the aligner set by the `--aligner` parameter. Available aligners are [BWA](http://bio-bwa.sourceforge.net/bwa.shtml) (default), [Bowtie 2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml), [Chromap](https://github.com/haowenz/chromap) and [STAR](https://github.com/alexdobin/STAR). A genome index is required to run any of this aligners so if this is not provided explicitly using the corresponding parameter (e.g. `--bwa_index`), then it will be created automatically from the genome fasta input. The index creation process can take a while for larger genomes so it is possible to use the `--save_reference` parameter to save the indices for future pipeline runs, reducing processing times.
+Adapter-trimmed reads are mapped to the reference assembly using the aligner set by the `--aligner` parameter. Available aligners are [BWA](http://bio-bwa.sourceforge.net/bwa.shtml) (default), [Bowtie 2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml), [Chromap](https://github.com/haowenz/chromap) and [STAR](https://github.com/alexdobin/STAR). A genome index is required to run any of these aligners so if this is not provided explicitly using the corresponding parameter (e.g. `--bwa_index`), then it will be created automatically from the genome fasta input. The index creation process can be time-consuming for large genomes, so you can use the `--save_reference` parameter to save the indices for future pipeline runs, thereby reducing processing times.
 
 ![MultiQC - SAMtools stats plot](images/mqc_samtools_stats_plot.png)
 
-> **NB:** Currently, paired-end files produced by `Chromap` are excluded from downstream analysis due to [this](https://github.com/nf-core/chipseq/issues/291) issue. Single-end files are processed normally.
-
 #### Unmapped reads
 
 The `--save_unaligned` parameter enables to obtain FastQ files containing unmapped reads (only available for STAR and Bowtie2).
@@ -202,7 +200,7 @@ The results from deepTools plotProfile gives you a quick visualisation for the g
 
 [MACS3](https://github.com/macs3-project/MACS) is one of the most popular peak-calling algorithms for ChIP-seq data. By default, the peaks are called with the MACS3 `--broad` parameter. If, however, you would like to call narrow peaks then please provide the `--narrow_peak` parameter when running the pipeline. See [MACS3 outputs](https://github.com/macs3-project/MACS/blob/master/docs/callpeak.md#output-files) for a description of the output files generated by MACS3.
 
-![MultiQC - MACS3 total peak count plot](images/mqc_macs2_peak_count_plot.png)
+![MultiQC - MACS3 total peak count plot](images/mqc_macs3_peak_count_plot.png)
 
 [HOMER annotatePeaks.pl](http://homer.ucsd.edu/homer/ngs/annotation.html) is used to annotate the peaks relative to known genomic features. HOMER is able to use the `--gtf` annotation file which is provided to the pipeline. Please note that some of the output columns will be blank because the annotation is not provided using HOMER's in-built database format. However, the more important fields required for downstream analysis will be populated i.e. _Annotation_, _Distance to TSS_ and _Nearest Promoter ID_.
 

diff --git a/docs/usage.md b/docs/usage.md
@@ -87,13 +87,13 @@ NAIVE_INPUT,BLA203A49_S1_L006_R1_001.fastq.gz,,3,,,
 | `control`           | Sample name for control sample.                                                                                                                                                        |
 | `control_replicate` | Integer representing replicate number for control sample.                                                                                                                              |
 
-Example design files have bee_n provided with the pipeline for [paired-end](../assets/samplesheet_pe.csv) and [single-end](../assets/samplesheet_se.csv) data.
+Example design files have been provided with the pipeline for [paired-end](../assets/samplesheet_pe.csv) and [single-end](../assets/samplesheet_se.csv) data.
 
 > **NB:** The `group` and `replicate` columns were replaced with a single `sample` column as of v2.0 of the pipeline. The `sample` column is essentially a concatenation of the `group` and `replicate` columns. If all values of `sample` have the same number of underscores, fields defined by these underscore-separated names may be used in the PCA plots produced by the pipeline, to regain the ability to represent different groupings.
 
 ## Reference genome files
 
-The minimum reference genome requirements are a FASTA and GTF file, all other files required to run the pipeline can be generated from these files. However, it is more storage and compute friendly if you are able to re-use reference genome files as efficiently as possible. It is recommended to use the `--save_reference` parameter if you are using the pipeline to build new indices (e.g. those unavailable on [AWS iGenomes](https://nf-co.re/usage/reference_genomes)) so that you can save them somewhere locally. The index building step can be quite a time-consuming process and it permits their reuse for future runs of the pipeline to save disk space. You can then either provide the appropriate reference genome files on the command-line via the appropriate parameters (e.g. `--bwa_index '/path/to/bwa/index/'`) or via a custom config file.
+The minimum reference genome requirements are a FASTA and a GTF file, all other files required to run the pipeline can be generated from these files. However, it is more storage and compute friendly if you are able to re-use reference genome files as efficiently as possible. It is recommended to use the `--save_reference` parameter if you are using the pipeline to build new indices (e.g. those unavailable on [AWS iGenomes](https://nf-co.re/usage/reference_genomes)) so that you can save them somewhere locally. The index building step can be quite a time-consuming process and it permits their reuse for future runs of the pipeline to save disk space. You can then either provide the appropriate reference genome files on the command-line via the appropriate parameters (e.g. `--bwa_index '/path/to/bwa/index/'`) or via a [custom config file](https://nf-co.re/usage/configuration#custom-configuration-files).
 
 - If `--genome` is provided then the FASTA and GTF files (and existing indices) will be automatically obtained from AWS-iGenomes unless these have already been downloaded locally in the path specified by `--igenomes_base`.
 - If `--gene_bed` is not provided then it will be generated from the GTF file.
@@ -126,7 +126,7 @@ cd v3.0
 wget -L https://www.encodeproject.org/files/ENCFF356LFX/@@download/ENCFF356LFX.bed.gz && gunzip ENCFF356LFX.bed.gz && mv ENCFF356LFX.bed hg38-blacklist.v3.bed
 ```
 
-> **NB:** A detailed description of the different versions of the files can be found [here](https://sites.google.com/site/anshulkundaje/projects/blacklists). Also, to to see which blacklist bed files are assigned by default to the respective reference genome check the [igenomes.config](https://github.com/nf-core/chipseq/blob/master/conf/igenomes.config).
+> **NB:** A detailed description of the different versions of the files can be found [here](https://github.com/Boyle-Lab/Blacklist/blob/master/README.md). Also, to to see which blacklist bed files are assigned by default to the respective reference genome check the [igenomes.config](https://github.com/nf-core/chipseq/blob/master/conf/igenomes.config).
 
 ## Running the pipeline
 

diff --git a/modules.json b/modules.json
@@ -98,7 +98,7 @@
                     },
                     "phantompeakqualtools": {
                         "branch": "master",
-                        "git_sha": "2dfe9afa90fefc70e320140e5f41287f01f324b0",
+                        "git_sha": "ec48f56f6e1571e23800aaaba41cceda13408e02",
                         "installed_by": ["modules"]
                     },
                     "picard/collectmultiplemetrics": {

diff --git a/modules/local/multiqc_custom_phantompeakqualtools.nf b/modules/local/multiqc_custom_phantompeakqualtools.nf
@@ -2,7 +2,7 @@ process MULTIQC_CUSTOM_PHANTOMPEAKQUALTOOLS {
     tag "$meta.id"
     conda "conda-forge::r-base=4.3.3"
     container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
-        'oras://community.wave.seqera.io/library/r-base:4.3.3--452dec8277637366':
+        'https://community-cr-prod.seqera.io/docker/registry/v2/blobs/sha256/45/4569ff9993578b8402d00230ab9dd75ce6e63529731eb24f21579845e6bd5cdb/data':
         'community.wave.seqera.io/library/r-base:4.3.3--14bb33ac537aea22' }"
 
     input:

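Note on the `container` change above: the directive is a ternary that picks the Singularity image only when the engine is `singularity` and `task.ext.singularity_pull_docker_container` is not set, and otherwise falls back to the Docker image. This PR only swaps the Singularity branch from an `oras://` URI to an `https://` blob URL. A minimal Python sketch of the same decision, with a hypothetical function name and the image strings copied from the diff:

```python
# Image references copied from the diff; the function below is illustrative only.
SINGULARITY_IMAGE = (
    "https://community-cr-prod.seqera.io/docker/registry/v2/blobs/sha256/45/"
    "4569ff9993578b8402d00230ab9dd75ce6e63529731eb24f21579845e6bd5cdb/data"
)
DOCKER_IMAGE = "community.wave.seqera.io/library/r-base:4.3.3--14bb33ac537aea22"


def select_container(container_engine, singularity_pull_docker_container=False):
    """Mirror the ternary in the module's `container` directive."""
    if container_engine == "singularity" and not singularity_pull_docker_container:
        return SINGULARITY_IMAGE
    return DOCKER_IMAGE


if __name__ == "__main__":
    for engine, pull_docker in [("singularity", False), ("singularity", True), ("docker", False)]:
        print(engine, pull_docker, "->", select_container(engine, pull_docker))
```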
diff --git a/modules/nf-core/phantompeakqualtools/environment.yml b/modules/nf-core/phantompeakqualtools/environment.yml
diff --git a/modules/nf-core/phantompeakqualtools/main.nf b/modules/nf-core/phantompeakqualtools/main.nf