New stage describing how to model protein families in pathways #31

Open
khanspers opened this issue Apr 6, 2023 · 8 comments

@khanspers
Member

This came up in a curation report issue: how best to model protein families in pathways. Either use a Pfam ID on a single node representing the protein family, or list all members of the protein family (if feasible).

@khanspers khanspers self-assigned this Apr 6, 2023
@danidi
Contributor

danidi commented Jun 6, 2023

The Pfam database is retired and is now included in InterPro. Will InterPro identifiers work as well?

@Chris-Evelo

Chris-Evelo commented Jun 6, 2023

Having InterPro identifiers work would be great anyway. But what I read on the InterPro website is that they will host Pfam, which sounds different from using InterPro identifiers instead. Does anybody know how that really works?

@danidi
Contributor

danidi commented Jun 6, 2023

It seems that Pfam is still actively providing content, but it can now be found only via the InterPro website: https://xfam.wordpress.com/2022/08/04/pfam-website-decommission/
In the InterPro search, you can then see a list of results coming from the different sources.

@khanspers
Member Author

khanspers commented Aug 4, 2023

For this particular case, the best match I could find is this InterPro identifier, for Ribosomal protein S6 kinase:
https://www.ebi.ac.uk/interpro/entry/InterPro/IPR016238/
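
For anyone who wants to turn an InterPro accession into a link like the one above, here is a minimal Python sketch. It assumes the entry URL pattern seen in this comment and the usual `IPR` plus six digits accession format; the function name `interpro_entry_url` is hypothetical and not part of any WikiPathways or InterPro tool.

```python
import re

# Assumption: InterPro accessions look like "IPR" followed by six digits.
INTERPRO_ACCESSION = re.compile(r"IPR\d{6}")

# Assumption: URL pattern copied from the link in the comment above.
ENTRY_URL = "https://www.ebi.ac.uk/interpro/entry/InterPro/{accession}/"


def interpro_entry_url(accession: str) -> str:
    """Return the InterPro entry page URL for an accession such as IPR016238."""
    accession = accession.strip().upper()
    if not INTERPRO_ACCESSION.fullmatch(accession):
        raise ValueError(f"Not an InterPro accession: {accession!r}")
    return ENTRY_URL.format(accession=accession)


if __name__ == "__main__":
    print(interpro_entry_url("IPR016238"))
```

The check only validates the shape of the accession; it does not confirm that the entry exists in InterPro.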

I added a stage here, using the pathway from the curation report as the example: https://academy.wikipathways.org/stages/draw-protein-families/ (not yet integrated in the path). Please review.

One thing still to add is a comment about data mapping (i.e., it won't work for these nodes).

@danidi
Contributor

danidi commented Aug 22, 2023

Looks good! Only the upload doesn't work yet. Is that intended? I got the following error: "Oops! That doesn't look quite right. Please try again. Incorrect number of objects: 5 detected, 0 expected."
Are there plans to include the data mapping at some point? It would be great if the family could be connected to the actual proteins somehow.

@khanspers
Member Author

Thanks @danidi! There was a typo in the GPML validation; it is fixed now.

For the data mapping, there is no plan to make that work as far as I know. These instructions were only meant to solve the issue raised in the curation report, basically as the alternative to leaving the xref empty. I can add a comment to the task that data mapping from individual proteins that are part of the family won't work, and maybe also describe the alternate approach of adding individual proteins as a stack of nodes off to the side of the pathway (like we do with other groupings of genes/proteins)?
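
To illustrate why data mapping doesn't reach a family node, here is a rough Python sketch. It assumes that mapping is a plain exact match on (database, identifier) pairs, which is a simplification and not how PathVisio or Cytoscape actually implement it. The `Node` class, the `map_data` function, the `ExampleDB` database name, the member identifiers, and the data values are all hypothetical placeholders; only the InterPro accession comes from this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A pathway node with a label and one external reference (xref)."""
    label: str
    database: str
    identifier: str


def map_data(nodes, data):
    """Return {label: value} for nodes whose xref appears as a key in data.

    Assumption: exact matching on (database, identifier); no identifier
    mapping between databases is attempted.
    """
    return {
        node.label: data[(node.database, node.identifier)]
        for node in nodes
        if (node.database, node.identifier) in data
    }


# Option 1: a single node for the whole family, annotated with the InterPro entry.
family_node = [Node("Ribosomal protein S6 kinase", "InterPro", "IPR016238")]

# Option 2: one node per family member (placeholder identifiers, not real ones).
member_nodes = [
    Node("member A", "ExampleDB", "ID_A"),
    Node("member B", "ExampleDB", "ID_B"),
]

# A dataset keyed by individual gene/protein identifiers (made-up values).
dataset = {("ExampleDB", "ID_A"): 1.8, ("ExampleDB", "ID_B"): -0.4}

if __name__ == "__main__":
    print(map_data(family_node, dataset))   # no match: the family ID is not in the data
    print(map_data(member_nodes, dataset))  # both members receive a value
```

The family node gets no value because the dataset contains no measurement keyed by the family identifier, while the stack of individual member nodes maps as usual.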

@khanspers
Member Author

On second thought, I'm not sure this should be a stage in the Academy. Although the idea of using an InterPro ID instead of leaving the xref blank is still valid for individual cases (for example, the original question by Javi), it's potentially counterintuitive and confusing as a stage in the Academy, since it doesn't enable data mapping at all (at least in PathVisio, or in a straightforward way in Cytoscape). We can keep this issue open for discussion, but I'm not going to add the stage to the path for now.

@Chris-Evelo

I think that is fine for now. But it is one of the ideas that often comes up in discussions about how to functionally evaluate sequencing data from multi-species mixtures, e.g. microbiome samples. If we can assign motifs in sequences to functional protein motifs, and through those to pathways, we could in principle evaluate the functionality or the functional capacity of such a mixture without assigning the sequences to species or complete genes. Of course, we do not have complete methods for that yet.
