New bulk 5'-RACE supported protocol, rescue non-overlapping reads and IgBlast 19 columns format #342

JustBioinfo · 2024-08-05T11:21:21Z

Description of feature

Hi,

I'm testing this pipeline with a view to using it routinely for the many bulk BCR targeted sequencing datasets produced by the research team where I work.
We have 5' RACE data with UMIs. Our R1 reads consist of the UMI + RACE linker preceded by a 27-nt sequence (slightly variable in length), and our R2 reads start directly with the cprimer. I considered cutting the sequence upstream of the UMI with cutadapt before launching the pipeline, but I realized that this would introduce errors in the analysis: cutadapt would search for the pattern of our 27-nt sequence without taking the UMI + RACE linker into account, which shifts the alignment of the RACE linker for some of our R1 reads (the same problem occurs if I cut a fixed 27 nt from the start of every R1).

First, I would like to add the option of removing the sequence upstream of the UMI by searching for the UMI + RACE linker pattern and cutting everything before the match. This is possible with MaskPrimers.py align in trim mode and a FASTA file containing the UMI + RACE linker pattern. I am new to the analysis of this type of data and therefore have difficulty understanding the AIRR library_generation_method values, but my understanding is that we need to add a new supported library_generation_method and a new process, PRESTO_MASKPRIMERS_ALIGN_TRIM, specifically for this protocol, which would run MaskPrimers.py align in trim mode just before the PRESTO_MASKPRIMERS_UMI step. Is that right?
I already ran a successful test: I gave the raw reads to the pipeline with the dt_5p_race_umi library_generation_method, added the MaskPrimers.py align (trim mode) command directly before the two MaskPrimers.py score commands in the .command.sh of the cached PRESTO_MASKPRIMERS_UMI files, ran the corresponding .command.run, and finally resumed the pipeline.
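
To illustrate why anchoring on the UMI + RACE linker pattern avoids the offset problem, here is a minimal Python sketch. It is not how MaskPrimers.py is implemented: it uses an exact regular-expression match instead of an alignment that tolerates errors, and the UMI length and linker sequence are hypothetical placeholders, not the values of our protocol.

```python
import re

# Hypothetical values for illustration only; replace with the real protocol's.
UMI_LENGTH = 12
RACE_LINKER = "GTACGGG"


def trim_upstream_of_umi(read, umi_length=UMI_LENGTH, linker=RACE_LINKER):
    """Return the read starting at the UMI, or None if no match is found.

    The UMI is taken to be the `umi_length` bases immediately before the
    first exact occurrence of the linker that has a full UMI upstream.
    """
    pattern = "[ACGTN]{%d}%s" % (umi_length, re.escape(linker))
    match = re.search(pattern, read)
    if match is None:
        return None
    return read[match.start():]


# Two toy R1 reads whose upstream sequence differs in length (27 nt vs 25 nt).
umi = "TTGCATTGCATT"
insert = "CAGGTGCAGCTGGTG"
read_a = ("AC" * 14)[:27] + umi + RACE_LINKER + insert
read_b = ("AC" * 14)[:25] + umi + RACE_LINKER + insert

# Pattern-anchored trimming gives the same result for both reads.
trimmed_a = trim_upstream_of_umi(read_a)
trimmed_b = trim_upstream_of_umi(read_b)

# A fixed 27-nt cut removes the first two UMI bases of the shorter read.
fixed_cut_b = read_b[27:]
```

With the pattern-anchored approach both reads start exactly at the UMI, whereas the fixed cut shifts the UMI and linker in `read_b` by two bases.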

I would also like to add an AssemblePairs.py join step to join, by their ends, the reads that failed at the PRESTO_ASSEMBLIES_UMI step. Likewise, should this new step be reserved for the new supported protocol?
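
As a rough picture of what joining by the ends means, the Python sketch below concatenates a read pair that does not overlap. It is only an illustration under assumptions: it assumes the second read is the one to be reverse-complemented and that an optional run of N characters fills the gap. The function names are hypothetical and are not part of pRESTO.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def join_pair(head, tail, gap=0):
    """Join two non-overlapping reads end to end.

    `head` is kept as-is, `tail` is reverse-complemented, and `gap`
    N characters are placed between them.
    """
    if gap < 0:
        raise ValueError("gap must be non-negative")
    return head + "N" * gap + reverse_complement(tail)


joined = join_pair("ACGT", "AAAC", gap=3)
```

Reads rescued this way carry no overlap evidence, so the true distance between the two ends is unknown and the junction needs to be handled with care downstream.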

I would also like to know whether it would be easy to add an option to run AssignGenes.py with --format airr instead of --format blast. We need some columns that are present in the IgBlast 19 columns format, and I saw that the corresponding process expects a file in .fmt7 format.
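
To make the request concrete, here is a small Python sketch of how the output format could be passed through as an option. The helper name, the default organism and loci values, and the example file names are hypothetical. The suffix mapping (.fmt7 for blast, .tsv for airr) reflects my understanding of the AssignGenes.py output naming and should be checked against the installed Change-O version.

```python
# Assumed mapping between --format value and the output file suffix.
EXPECTED_SUFFIX = {"blast": ".fmt7", "airr": ".tsv"}


def assigngenes_command(fasta, igdata, fmt="blast", organism="human", loci="ig"):
    """Build the argument list for an AssignGenes.py igblast call.

    Only the --format value changes between the two modes; the caller
    still has to adapt the expected output file name (see EXPECTED_SUFFIX).
    """
    if fmt not in EXPECTED_SUFFIX:
        raise ValueError("fmt must be 'blast' or 'airr'")
    return [
        "AssignGenes.py", "igblast",
        "-s", fasta,
        "-b", igdata,
        "--organism", organism,
        "--loci", loci,
        "--format", fmt,
    ]


# Hypothetical input file and IgBLAST database directory.
command = assigngenes_command("sample.fasta", "igblast_base", fmt="airr")
```

The list in `command` could be passed to `subprocess.run`. In the pipeline itself, the equivalent change would be a parameter that sets both the `--format` value and the output file pattern the process declares.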

I would be grateful for your advice and sorry for my poor English.

Justine

JustBioinfo added the enhancement New feature or request label Aug 5, 2024

JustBioinfo mentioned this issue Aug 8, 2024

New bulk 5'-RACE supported protocol, non-overlaping reads rescue #343

Open

9 tasks
