Add notebook with conversion examples #67
base: master
Typo fixes, expand a few comments
Check out this pull request on ReviewNB. See visual diffs & provide feedback on Jupyter Notebooks.
@erindiel @melissalinkert I started my review and made some minor improvements to simplify access to data from IDC. You can find my edits here: https://colab.research.google.com/drive/1gkJpKr1cL5R4uEQkQtFE0UPGxtiJHXk_?usp=sharing. Overall, the structure looks great! I have a few minor comments, but I first wanted to bring up what I think is a major issue. The cells that convert from DICOM to alternative representations are extremely slow. This one took 48 minutes for a single H&E slide on a default Google Colab CPU instance. The next cell has been running for about the same time and is still not finished. Can you comment on why this is so slow and what can be done about it? Is ome/bioformats#4190 going to remedy this?
Thanks, @fedorov. We're looking into the performance issue, as that is noticeably slower than what we saw when originally testing. ome/bioformats#4190 is not expected to affect conversion with bioformats2raw/raw2ometiff. That set of changes is about expanding access to "precompressed" tiles, which the bioformats2raw/raw2ometiff conversion workflow cannot currently make use of.
Thanks again @fedorov for noting the conversion time issue. We confirmed that when testing the notebook locally, the conversion took <10 minutes, even when lowering the max worker count using A couple of options to improve the situation:
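To make timing comparisons like the ones in this thread (48 minutes on Colab versus under 10 minutes locally) easier to reproduce, a notebook cell could wrap the conversion call in a small timing helper. This is only a sketch: `time_command` is a hypothetical helper name, and the command shown is a placeholder rather than the actual conversion call used in the notebook.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv to completion and return (elapsed_seconds, return_code)."""
    start = time.perf_counter()
    completed = subprocess.run(argv, check=False)
    elapsed = time.perf_counter() - start
    return elapsed, completed.returncode


if __name__ == "__main__":
    # Placeholder command (assumption): substitute the notebook's real conversion call.
    elapsed, code = time_command([sys.executable, "-c", "print('conversion placeholder')"])
    print(f"exit code {code}, elapsed {elapsed:.1f} s")
```

Reporting the elapsed time alongside the environment (Colab instance type or local machine) would make it easier to tell whether a slowdown comes from the tool or from the runtime.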
],
"source": [
"# IDC supports image download via s5cmd\n",
"!pip install s5cmd\n",
Instead, do this:
# idc-index is a convenience package to support access to IDC data
!pip install idc-index --upgrade
],
"source": [
"# Download sample data from IDC\n",
"!s5cmd --no-sign-request --endpoint-url https://s3.amazonaws.com cp \"s3://idc-open-data/6d7f4ec7-2c84-4a46-86ac-acde279195bb/*\" rgb-dicom\n",
Instead, do this:
!idc download-from-selection --series-instance-uid 1.3.6.1.4.1.5962.99.1.3140643155.174517037.1639523215699.2.0 --download-dir ./rgb-dicom --dir-template ""
This reports the total download size, checks that you have enough disk space, and shows download progress.
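If the download needs to be driven from Python rather than a `!` shell cell, the same command line can be assembled and run with `subprocess`. A minimal sketch, assuming the `idc` executable installed by `idc-index` is on `PATH`; `build_idc_download_cmd` and `download_series` are hypothetical helper names, and the flags are copied from the command suggested above.

```python
import os
import shlex
import shutil
import subprocess


def build_idc_download_cmd(series_uid, download_dir):
    """Return the argv list for the `idc download-from-selection` call quoted above."""
    return [
        "idc", "download-from-selection",
        "--series-instance-uid", series_uid,
        "--download-dir", download_dir,
        "--dir-template", "",  # empty template, same value as in the suggested command
    ]


def download_series(series_uid, download_dir):
    """Run the download if the `idc` CLI is available; return True on success."""
    if shutil.which("idc") is None:
        return False  # idc-index is not installed in this environment
    os.makedirs(download_dir, exist_ok=True)  # precaution: make sure the target exists
    cmd = build_idc_download_cmd(series_uid, download_dir)
    return subprocess.run(cmd, check=False).returncode == 0


if __name__ == "__main__":
    uid = "1.3.6.1.4.1.5962.99.1.3140643155.174517037.1639523215699.2.0"
    # Print the command only; call download_series(uid, "./rgb-dicom") to fetch the data.
    print(shlex.join(build_idc_download_cmd(uid, "./rgb-dicom")))
```

Passing the arguments as a list avoids shell quoting problems with the empty `--dir-template` value.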
"id": "BSgOKNYErcGz"
},
"source": [
"### Install required packages"
As in the other notebook, I would install all prerequisites in a dedicated cell at the beginning of the notebook.
Hi Erin, Andrey asked me to have a look at this notebook as well.
…xample files (#2)
* use idc-index and smaller files for conversion examples
* clarify compression option
* use bio-formats 8.0.0 for precompressed option
Thanks for the reviews here. The notebook has been updated to:
This notebook summarizes available conversion tools that rely on Bio-Formats for reading and writing various file formats. Specifically, it includes sample commands for converting using bfconvert and bioformats2raw, as well as a description of scenarios where one tool might be preferred over the other.
Sample data comes primarily from IDC. There are examples of both reading and writing DICOM. Are there other preferred datasets or methods for getting this data than what is used here?
This notebook can be run in Google Colab or locally; however, commands like `wget` will not work on Windows, so some sections will not be testable locally by Windows users. Is this acceptable?
cc @melissalinkert @dclunie @fedorov
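One possible way around the `wget` limitation on Windows would be to download files with the Python standard library instead of a shell command. The following is a sketch rather than what the notebook currently does; `fetch` is a hypothetical helper name, and the URL in the usage comment is a placeholder.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch(url, dest):
    """Stream url to dest using only the standard library; return the destination Path."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # copyfileobj streams in chunks, so large images are not held in memory at once.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


# Usage (placeholder URL, not one from the notebook):
# fetch("https://example.org/sample-image.tiff", "data/sample-image.tiff")
```

Because this relies only on `urllib.request`, `shutil`, and `pathlib`, the same cell should behave the same way in Colab and in a local Windows, macOS, or Linux environment.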