Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DO NOT MERGE] Berkeley refactor preparation #65

Draft
wants to merge 17 commits into
base: main
Choose a base branch
from
Draft

Conversation

kheal
Copy link
Collaborator

@kheal kheal commented Aug 1, 2024

Long running branch to poise notebooks for conversion to Berkeley schema.
This addresses: microbiomedata/issues#726

Leave "#TODO" notes in notebooks for where we will need to edit urls when berkeley schema is production

Workflows will check the validity of each notebook when a push is made to this branch.

R notebooks that have been updated and are passing checks in this branch:

  • NEON metadata
  • bioscales metadata
  • taxonomy by soil layer
  • NOM visualization notebook

python notebooks that have been updated and are passing checks in this branch:

  • NEON metadata
  • bioscales metadata
  • taxonomy by soil layer
  • NOM visualization notebook

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@kheal kheal changed the title Berkeley refactor [DO NOT MERGE] Berkeley refactor preparation Aug 1, 2024
@brynnz22
Copy link
Contributor

brynnz22 commented Aug 26, 2024

@kheal After doing my changes I noticed that we can't just query workflow_execution_set, we need to specify the type as nmdc:MetagenomeAnnoation because it messes with the results. You'll see even in your diff that you are getting records that aren't metagenome annotation records. So in order to count for that, I added a type variable to the get_id_results function and added this in all the calls. Its not really necessary for the material processing steps or the data generation step (although it is nice to specify type is nmdc:NucleotideSequencing). But I would recommend doing this to your notebooks as well. My counts were back on a similar track to the original notebooks (they were a little off, but I assume this had something to do with re-iding).

Also, I noticed you still have two mentions omics processing in your heading and description in step 6.

@brynnz22 brynnz22 self-requested a review August 26, 2024 22:57
@brynnz22
Copy link
Contributor

Tried to add you as a reviewer but looks like I can't cause its your PR

@kheal
Copy link
Collaborator Author

kheal commented Aug 26, 2024

Good catches @brynnz22. I'll fix those and review your changes this week.

Add filter for MetagenomeAnnotation records only
@kheal
Copy link
Collaborator Author

kheal commented Aug 27, 2024

@brynnz22 . I've updated the R taxonomy notebook.

The python taxonomy notebook looks good, but I think we lost step 8's narrative (code looks good, but there's no text describing it anymore).

We also have dill as an import and reference it in the first chunk of text, but don't use it anymore. I should have taken that out earlier (#47), but I missed it. It works fine for me because I have dill loaded in my venv, but it's not consistent with the requirements if someone else pulls it locally to use.

@samobermiller
Copy link
Collaborator

Created branch berkeley_nom off of main to update NOM visualizations folder with Berkeley API calls. Didn't branch off of this berkeley_refactor branch because the branch was created after our notebooks. #80

@brynnz22
Copy link
Contributor

brynnz22 commented Oct 1, 2024

Created branch berkeley_nom off of main to update NOM visualizations folder with Berkeley API calls. Didn't branch off of this berkeley_refactor branch because the branch was created after our notebooks. #80

@samobermiller this works too! In the future though, you can just merge main into the Berkeley branch to get main's updates into the Berkeley branch and then branch off the Berkeley. But this way is fine too - it just means we'll have to merge a couple of PRs in after Berkeley.

@brynnz22
Copy link
Contributor

brynnz22 commented Oct 1, 2024

@brynnz22 . I've updated the R taxonomy notebook.

The python taxonomy notebook looks good, but I think we lost step 8's narrative (code looks good, but there's no text describing it anymore).

We also have dill as an import and reference it in the first chunk of text, but don't use it anymore. I should have taken that out earlier (#47), but I missed it. It works fine for me because I have dill loaded in my venv, but it's not consistent with the requirements if someone else pulls it locally to use.

@kheal I added step 8 narrative back in and removed the dill import and reference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants