New bulk 5'-RACE supported protocol, rescue non-overlapping reads and IgBlast 19 columns format #342

Open
JustBioinfo opened this issue Aug 5, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@JustBioinfo

Description of feature

Hi,

I'm testing this pipeline with a view to using it routinely for the many bulk BCR targeted sequencing datasets produced by the research team where I work.
Our data come from 5' RACE with UMIs. Our R1 reads consist of a UMI + RACE linker preceded by a sequence of about 27 nt (slightly variable in length), and our R2 reads start directly with the cprimer. I considered removing the sequence upstream of the UMI with cutadapt before launching the pipeline, but I realized that this would introduce errors into the analysis: cutadapt would search for our 27 nt sequence without taking the UMI + RACE linker into account, which shifts the position of the RACE linker in some of our R1 reads. The same problem occurs if I cut a fixed 27 nt from the start of every R1.

First, I would like to add the option of removing the sequence upstream of the UMI by searching for the UMI + RACE linker pattern and cutting everything before the match. This is possible with MaskPrimers.py align in trim mode and a FASTA file containing the UMI + RACE linker pattern. I am new to the analysis of this type of data and have difficulty understanding the AIRR library_generation_method values, but as I understand it we would need to add a new supported library_generation_method and a new process, PRESTO_MASKPRIMERS_ALIGN_TRIM, specific to this protocol, which would run MaskPrimers.py align in trim mode just before the PRESTO_MASKPRIMERS_UMI step. Is that right?
I have already run a successful test: I gave the raw reads to the pipeline with the dt_5p_race_umi library_generation_method, added the MaskPrimers.py align (trim mode) command directly before the two MaskPrimers.py score commands in the .command.sh of the cached PRESTO_MASKPRIMERS_UMI files, ran the corresponding .command.run, and finally resumed the pipeline.
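
To make the offset problem concrete, here is a minimal Python sketch of the trimming idea, not of MaskPrimers.py itself. It assumes a hypothetical 8 nt UMI and a made-up linker sequence (`GTACGGG`), and it uses exact matching, whereas MaskPrimers.py align tolerates mismatches. The names `trim_upstream` and `PATTERN` are illustrative only.

```python
import re

# Only the codes needed for this illustration; N stands for any UMI base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGTN]"}

# Hypothetical layout: 8 nt UMI followed by a made-up RACE linker.
PATTERN = "NNNNNNNN" + "GTACGGG"


def trim_upstream(read, pattern=PATTERN):
    """Drop everything before the first UMI + linker match.

    The matched region itself is kept, which mirrors what "trim"
    means in the issue text. Returns None when no match is found.
    """
    regex = re.compile("".join(IUPAC[base] for base in pattern))
    match = regex.search(read)
    return read[match.start():] if match else None


# Two reads whose upstream sequence differs in length (27 nt and 25 nt).
read_a = "A" * 27 + "ACGTACGT" + "GTACGGG" + "TTTT"
read_b = "C" * 25 + "TTTTAAAA" + "GTACGGG" + "CC"

for read in (read_a, read_b):
    # Pattern-based trimming keeps the full UMI in both cases, while a
    # fixed 27 nt cut loses the first two UMI bases of read_b.
    print(trim_upstream(read), read[27:])
```

Because the match is anchored on the linker, the UMI always starts at position 0 of the trimmed read, whatever the length of the upstream sequence.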

I would also like to add an AssemblePairs.py join step to join, end to end, the reads that failed to assemble at the PRESTO_ASSEMBLIES_UMI step. Likewise, should this new step be reserved for the new supported protocol?
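
As an illustration of what end-to-end joining means for a pair with no overlap, here is a minimal Python sketch. It is not the AssemblePairs.py implementation: it assumes R2 has to be reverse-complemented before being appended, and the function names and the `gap` parameter (a number of N characters placed between the two reads) are illustrative.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def join_pair(head, tail, gap=0):
    """Join two non-overlapping reads end to end.

    head: read kept as-is (R1 in this sketch).
    tail: read that is reverse-complemented before joining (R2 here).
    gap:  number of N characters inserted between the two reads.
    """
    return head + "N" * gap + reverse_complement(tail)


# Toy example with very short reads and a 2 nt gap.
print(join_pair("ACGT", "AAGG", gap=2))
```

A joined sequence of this kind has an unknown stretch in the middle, so downstream steps have to tolerate N characters in that region.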

I would also like to know whether it would be easy to add an option for running AssignGenes.py with --format airr instead of --format blast, because we need some columns that are available in the IgBlast format 19 (AIRR) output, and I saw that the corresponding process expects its output file to be in .fmt7 format.
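
A minimal Python sketch of how such an option could switch the command and the expected output file is given below. The helper names are hypothetical, the output suffixes (`_igblast.fmt7` for blast, `_igblast.tsv` for airr) are an assumption to check against the installed AssignGenes.py version, and this is not how the pipeline currently builds its command.

```python
import os

# Assumed output suffix per format; verify against your Change-O version.
OUTPUT_SUFFIX = {"blast": "_igblast.fmt7", "airr": "_igblast.tsv"}


def assign_genes_command(fasta, igblast_dir, out_format="blast",
                         organism="human", loci="ig"):
    """Build the argument list for AssignGenes.py igblast."""
    if out_format not in OUTPUT_SUFFIX:
        raise ValueError(f"unsupported format: {out_format}")
    return [
        "AssignGenes.py", "igblast",
        "-s", fasta,
        "-b", igblast_dir,
        "--organism", organism,
        "--loci", loci,
        "--format", out_format,
    ]


def expected_output(fasta, out_format="blast"):
    """Guess the output file name a downstream step would look for."""
    stem = os.path.splitext(os.path.basename(fasta))[0]
    return stem + OUTPUT_SUFFIX[out_format]


# Placeholder file and directory names, for illustration only.
print(" ".join(assign_genes_command("sample.fasta", "igblast_base", "airr")))
print(expected_output("sample.fasta", "airr"))
```

The point of the sketch is that the output declaration of the process would have to follow the chosen format, since a step that only accepts .fmt7 would not find the .tsv file.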

I would be grateful for your advice, and I apologize for my poor English.

Justine
