Add notebook with conversion examples #67

erindiel · 2024-06-17T20:04:36Z

This notebook summarizes available conversion tools that rely on Bio-Formats for reading and writing various file formats. Specifically, it includes sample commands for converting using bfconvert and bioformats2raw, as well as a description of scenarios where one tool might be preferred over the other.

Sample data comes primarily from IDC. There are examples of both reading and writing DICOM. Are there other preferred datasets or methods for getting this data than what is used here?

This notebook can be run in Google Colab or it can be run locally; however, commands like wget will not work on Windows, so some sections will not be testable locally by Windows users. Is this acceptable?

cc @melissalinkert @dclunie @fedorov

Typo fixes, expand a few comments

review-notebook-app · 2024-06-17T20:04:41Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

fedorov · 2024-08-08T21:02:07Z

@erindiel @melissalinkert I started my review, and did some minor improvements to simplify access to data from IDC. You can find my edits here: https://colab.research.google.com/drive/1gkJpKr1cL5R4uEQkQtFE0UPGxtiJHXk_?usp=sharing.

Overall, the structure looks great! I have a few minor comments, but I first wanted to bring up what I think is a major issue: the cells that convert from DICOM to alternative representations are extremely slow.

This one took 48 minutes for a single H&E slide on a default Google Colab CPU instance.

The next cell has been running for about the same time and has still not finished.

Can you comment on why this is so slow and what can be done about this? Is ome/bioformats#4190 going to remedy this?

fedorov · 2024-08-08T22:16:00Z

The following one took almost 2 hours!

melissalinkert · 2024-08-13T21:04:32Z

Thanks, @fedorov. We're looking into the performance issue, as that seems to be noticeably slower than what we saw when originally testing.

ome/bioformats#4190 is expected not to affect conversion with bioformats2raw/raw2ometiff - that set of changes is around expanding access to "precompressed" tiles, which the bioformats2raw/raw2ometiff conversion workflow cannot currently make use of.

erindiel · 2024-08-26T18:29:23Z

Thanks again @fedorov for noting the conversion time issue. We confirmed that when testing the notebook locally, the conversion took <10 minutes, even when lowering the max worker count using --max-workers. We therefore assume the I/O speeds on Google Colab are slower, increasing the conversion time dramatically.

A couple of options to improve the situation:

Use a smaller dataset for the example (do you have any recommendation?)
Include a comment to recommend running conversions in an appropriate environment, perhaps linking to https://github.com/glencoesoftware/bioformats2raw?tab=readme-ov-file#performance for further information.
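
A minimal sketch of how the local timing comparison could be scripted, assuming bioformats2raw is installed and on the PATH. The `--max-workers` and `--compression zlib` options are the ones mentioned in this thread; the helper names and any paths passed in are hypothetical.

```python
import subprocess
import time


def bioformats2raw_command(src, dst, max_workers=None, compression="zlib"):
    """Build the argument list for a bioformats2raw conversion."""
    cmd = ["bioformats2raw", str(src), str(dst), "--compression", compression]
    if max_workers is not None:
        # Limit parallelism, e.g. to compare run times across environments.
        cmd += ["--max-workers", str(max_workers)]
    return cmd


def timed_run(cmd):
    """Run a command and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start
```

Calling `timed_run(bioformats2raw_command(src, dst, max_workers=2))` with a real input and output path would give one number to compare between Colab and a local machine.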

fedorov

Trying to run this on Linux, I got this error... Maybe it is because of limited disk space, but I am not sure. I will try again a bit later.

fedorov · 2024-09-04T16:41:39Z

notebooks/advanced_topics/IDC_Recipes_Conversion.ipynb

+   ],
+   "source": [
+    "# IDC supports image download via s5cmd\n",
+    "!pip install s5cmd\n",


Instead, do this:

# idc-index is a convenience package to support access to IDC data
!pip install idc-index --upgrade

fedorov · 2024-09-04T16:42:38Z

notebooks/advanced_topics/IDC_Recipes_Conversion.ipynb

+   ],
+   "source": [
+    "# Download sample data from IDC\n",
+    "!s5cmd --no-sign-request --endpoint-url https://s3.amazonaws.com cp \"s3://idc-open-data/6d7f4ec7-2c84-4a46-86ac-acde279195bb/*\" rgb-dicom\n",


Instead, do this:

!idc download-from-selection --series-instance-uid 1.3.6.1.4.1.5962.99.1.3140643155.174517037.1639523215699.2.0 --download-dir ./rgb-dicom --dir-template ""

This reports the total download size, checks that you have enough disk space, and shows download progress.
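
For readers who want a similar guard around other downloads, the kind of free-space check described above can be sketched with the standard library. This is an illustration only, not how idc-index implements its check; the function name and the 10% safety margin are assumptions.

```python
import shutil


def has_enough_space(download_dir, required_bytes, margin=0.1):
    """Return True if the filesystem holding download_dir has room for
    required_bytes plus a safety margin (a fraction of required_bytes)."""
    free = shutil.disk_usage(download_dir).free
    return free >= required_bytes * (1 + margin)
```

A notebook cell could call `has_enough_space(".", expected_size)` before starting a large download and stop early if it returns `False`.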

fedorov · 2024-09-04T16:47:43Z

notebooks/advanced_topics/IDC_Recipes_Conversion.ipynb

+    "id": "BSgOKNYErcGz"
+   },
+   "source": [
+    "### Install required packages"


As in the other notebook, I would install all prerequisites in a dedicated cell at the beginning of the notebook.
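
One way such a setup cell could end is with a quick check that the command-line tools are actually available. This is a sketch under the assumption that the tools are installed as executables on the PATH; the list of names is taken from the tools discussed in this thread and may not match the notebook exactly.

```python
import shutil


def missing_tools(tools):
    """Return the names from tools that cannot be found on the PATH."""
    return [name for name in tools if shutil.which(name) is None]


# Tool names mentioned in this thread; adjust to match the notebook.
REQUIRED_TOOLS = ["bfconvert", "bioformats2raw", "raw2ometiff", "idc"]
```

Running `missing_tools(REQUIRED_TOOLS)` right after the install commands makes a failed installation visible before any long conversion starts.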

DanielaSchacherer · 2024-09-05T12:56:59Z

Hi Erin, Andrey asked me to have a look at this notebook as well.
I think it's a very useful notebook for anyone who has questions about conversion tools (I looked at the version where @fedorov had already made some edits). I can confirm the running times he experienced in Colab (mine were even a little longer). I would also suggest using a small slide as the example and adding a note in the text that converting a whole dataset is not something to run in Colab.
Apart from that, I would not push the notebook to the repository with its outputs included (except for the two images close to the end of the notebook).


erindiel · 2024-10-25T19:00:10Z

Thanks for the reviews here. The notebook has been updated to:

use idc-index for image download
use smaller example files from IDC
recommend local conversion for larger data
use the latest Bio-Formats 8.0.0, which improves the -precompressed option for bfconvert (compression type no longer needs to be specified): https://www.openmicroscopy.org/2024/10/24/bio-formats-8-0-0.html
bioformats2raw commands use --compression zlib to avoid installation issues with blosc
outputs of each block are removed
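
As a rough illustration of the bfconvert change listed above, a command using `-precompressed` without an explicit compression type could be assembled like this. The helper name and any file names passed to it are hypothetical, and the sketch assumes Bio-Formats 8.0.0 command-line tools are on the PATH.

```python
def bfconvert_command(src, dst, precompressed=True):
    """Build the argument list for a bfconvert call.

    With precompressed=True the -precompressed flag is added; per the
    update above, no compression type is passed alongside it.
    """
    cmd = ["bfconvert"]
    if precompressed:
        cmd.append("-precompressed")
    cmd += [str(src), str(dst)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from a notebook cell or a local script.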

erindiel and others added 3 commits June 6, 2024 16:49

add notebook with conversion recipes

420aa13

Typo fixes, expand a few comments

01f081d

Merge pull request #1 from melissalinkert/conversion-notebook

e2504a3

Typo fixes, expand a few comments

melissalinkert mentioned this pull request Jun 17, 2024

Add QuPath viewing notebook #68

Closed

erindiel mentioned this pull request Jun 21, 2024

Add analysis notebook using MCMICRO #69

Open

fedorov requested changes Sep 4, 2024


melissalinkert mentioned this pull request Oct 10, 2024

Add notebook describing conversion with supplemental metadata #75

Open

Conversion notebook updates for compression and download of smaller e…

83f784c

…xample files (#2) * use idc-index and smaller files for conversion examples * clarify compression option * use bio-formats 8.0.0 for precompressed option
