Option to keep chimeric reads after UMI deduplication #1373

siddharthab · 2024-09-04T21:46:20Z

Description of feature

@MatthiasZepper Please take over this issue.

MatthiasZepper · 2024-09-06T16:55:30Z

While reviewing #1369, I noticed that we set the parameter --chimeric-pairs=discard for umi-tools and wondered whether that is actually a good default. I had planned to discuss it briefly in the #rnaseq_dev Slack channel, but since it is now an official issue, we can track it here as well :-)

From a purely biological point of view, the transcriptome alignments in particular may contain a significant number of chimeric read pairs, for example because of an unannotated splice variant or an antisense long non-coding RNA spanning several annotated transcripts. Also, many users run the pipeline on cancer data, where fusion genes or chromosomal rearrangements are to be expected.
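To get a feeling for how many pairs would be affected in a given sample, something like the following could be run on SAM text (for instance from `samtools view -h`). This is only a rough sketch: it assumes that "chimeric" means the two mates align to different reference sequences, it counts each pair once via the first-in-pair record, and the function name `count_chimeric_pairs` is made up for illustration.

```python
from collections import Counter


def count_chimeric_pairs(sam_lines):
    """Count pairs whose mates align to the same vs. different references.

    Assumption for this sketch: a pair is 'chimeric' when RNAME and RNEXT
    of the first-in-pair record name different reference sequences.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        rname, rnext = fields[2], fields[6]
        # Keep only paired (0x1), first-in-pair (0x40) records.
        if not (flag & 0x1 and flag & 0x40):
            continue
        # Skip unmapped reads/mates (0x4, 0x8) and secondary or
        # supplementary alignments (0x100, 0x800).
        if flag & (0x4 | 0x8 | 0x100 | 0x800):
            continue
        if rnext == "*":  # mate reference not available
            continue
        if rnext == "=" or rnext == rname:
            counts["same_reference"] += 1
        else:
            counts["chimeric"] += 1
    return counts
```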

However, I have since read in the UMI-tools FAQ that disabling the option significantly increases memory demands. The computational cost therefore clearly argues for disregarding chimeric pairs by default and leaving it to users of the pipeline to look at chimeric transcripts specifically, if they are of interest.
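If we do expose this as an opt-in, the logic could be as small as the sketch below. It is not the pipeline's actual configuration: the function name and the `keep_chimeric` switch are hypothetical, and it assumes that `umi_tools dedup` accepts `--chimeric-pairs=use` as the counterpart to `discard` and that the flag is only relevant together with `--paired`.

```python
def umitools_dedup_args(keep_chimeric=False, paired=True):
    """Build extra arguments for `umi_tools dedup`.

    `keep_chimeric` is a hypothetical pipeline switch; the default keeps
    the current behaviour of discarding chimeric pairs.
    """
    args = []
    if paired:
        args.append("--paired")
        # Assumption: 'use' keeps chimeric pairs, 'discard' drops them.
        mode = "use" if keep_chimeric else "discard"
        args.append(f"--chimeric-pairs={mode}")
    return args


# Default stays memory-friendly; users opt in explicitly.
print(" ".join(umitools_dedup_args()))
print(" ".join(umitools_dedup_args(keep_chimeric=True)))
```

Keeping `discard` as the default would leave memory usage unchanged for everyone who does not ask for chimeric pairs.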

siddharthab added the enhancement label Sep 4, 2024

siddharthab mentioned this issue Sep 4, 2024

Add umicollapse as an alternative to umi-tools #1369 (Open, 5 tasks)

MatthiasZepper added this to the 3.16.0 milestone Sep 6, 2024
