Option to keep chimeric reads after UMI deduplication #1373

Open
siddharthab opened this issue Sep 4, 2024 · 1 comment

@siddharthab

Description of feature

Following up on #1369 (comment).

@MatthiasZepper Please take over this issue.

@MatthiasZepper
Member

While reviewing #1369, I noticed that we set `--chimeric-pairs=discard` for UMI-tools and wondered whether that is actually a good default. I had planned to discuss it briefly in the #rnaseq_dev Slack channel, but since it is now an official issue, we can also track it here :-)

From a purely biological point of view, the transcriptome alignments in particular may contain a significant number of chimeric read pairs, for example because of an unannotated splice variant or an antisense long non-coding RNA spanning several annotated transcripts. Also, many users run the pipeline on cancer data, where fusion genes or chromosomal rearrangements are to be expected.
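To get a feeling for how many pairs would be affected, something like the following rough sketch could be used. It assumes that "chimeric" means both mates are mapped but to different reference sequences (different transcripts, in the case of a transcriptome BAM). The function names and the toy transcript IDs are made up for illustration; real input would come from the reference names of each mate in the BAM.

```python
def is_chimeric(read1_ref, read2_ref):
    """Return True when both mates are mapped but to different references.

    Assumption: a pair with an unmapped mate (None) is not counted as
    chimeric here, since it is an unpaired read rather than a chimeric pair.
    """
    if read1_ref is None or read2_ref is None:
        return False
    return read1_ref != read2_ref


def chimeric_fraction(pairs):
    """Fraction of read pairs whose mates map to different references."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(is_chimeric(r1, r2) for r1, r2 in pairs) / len(pairs)


# Toy example with hypothetical transcript IDs (not real data).
example_pairs = [
    ("ENST_A", "ENST_A"),  # proper pair on one transcript
    ("ENST_A", "ENST_B"),  # mates on different transcripts: chimeric
    ("ENST_C", None),      # mate unmapped: not chimeric
    ("ENST_D", "ENST_E"),  # chimeric
]
fraction = chimeric_fraction(example_pairs)
```

If that fraction turns out to be large on typical transcriptome alignments, discarding those pairs by default would be more costly than it looks at first glance.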

However, I have since read in the UMI-tools FAQ that disabling the option (that is, keeping chimeric read pairs instead of discarding them) significantly increases memory demands. The computational cost therefore clearly argues for discarding chimeric pairs by default and leaving it to users of the pipeline to look at chimeric transcripts specifically, if they are of interest.
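Leaving the choice to users could look roughly like the sketch below, which keeps `discard` as the default but lets the value be overridden. This is only an illustration and not the pipeline's actual code: `build_dedup_command` and the file names are hypothetical, and the accepted values (`discard`, `use`, `output`) are what I understand `umi_tools dedup` to support, so they should be checked against `umi_tools dedup --help`.

```python
# Values assumed to be accepted by `umi_tools dedup --chimeric-pairs`.
ALLOWED_CHIMERIC_MODES = ("discard", "use", "output")


def build_dedup_command(in_bam, out_bam, chimeric_pairs="discard", paired=True):
    """Assemble an argument list for `umi_tools dedup` (hypothetical helper).

    The chimeric-pairs setting only applies to paired-end data, so it is
    added only when `paired` is True.
    """
    if chimeric_pairs not in ALLOWED_CHIMERIC_MODES:
        raise ValueError(
            f"chimeric_pairs must be one of {ALLOWED_CHIMERIC_MODES}, "
            f"got {chimeric_pairs!r}"
        )
    cmd = ["umi_tools", "dedup", "-I", in_bam, "-S", out_bam]
    if paired:
        cmd += ["--paired", f"--chimeric-pairs={chimeric_pairs}"]
    return cmd


# Default behaviour: chimeric pairs are discarded.
default_cmd = build_dedup_command("sample.bam", "sample.dedup.bam")

# Opt-in for users interested in chimeric transcripts (higher memory use).
keep_cmd = build_dedup_command(
    "sample.bam", "sample.dedup.bam", chimeric_pairs="use"
)
```

The default stays cheap in memory, while users working on fusion genes or similar cases can opt in explicitly.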
