Seriesgrouping #684

Open: wants to merge 6 commits into master
Conversation

@neurorepro (Contributor) commented Jun 7, 2023

This is to address #624

@neurorepro (Contributor, Author)

You may want to consider #682 first (changes are included in this PR)

codecov bot commented Jun 7, 2023

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (e3c53b8) 81.94% compared to head (e9c9289) 81.98%.
Report is 43 commits behind head on master.

Files Patch % Lines
heudiconv/utils.py 73.68% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #684      +/-   ##
==========================================
+ Coverage   81.94%   81.98%   +0.04%     
==========================================
  Files          41       41              
  Lines        4132     4153      +21     
==========================================
+ Hits         3386     3405      +19     
- Misses        746      748       +2     


@neurorepro (Contributor, Author)

@yarikoptic As discussed in #624 I implemented a hash to shorten the name, but the full Series UID could be used too
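For illustration, shortening the Series UID to the fixed-length hex digest discussed in this thread could look like the sketch below. The helper name and the use of MD5 are assumptions for illustration, not necessarily what the PR implements:

```python
import hashlib

def seriesuid_hash(series_uid: str, length: int = 8) -> str:
    """Shorten a DICOM SeriesInstanceUID to a fixed-length hex digest.

    MD5 is an assumption here; any stable hash works, since the digest
    only disambiguates series ids and has no security role.
    """
    return hashlib.md5(series_uid.encode("ascii")).hexdigest()[:length]

# The digest would then be appended to the existing series id, e.g.
# "5-anat_T1w" -> f"5-anat_T1w-{seriesuid_hash(uid)}"
```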

@yarikoptic (Member)

In general I think the approach makes sense. Re why not before: my guess (@satra and @mih might correct me if they have any recollection of the motivations) is that the series id and name (instead of the UID) were chosen for human accessibility, since they make it easier to understand which particular series is in question while annotating .edit.txt and just debugging the operation. So far I do not see an immediate problem with adding the series UID since, AFAIK, it should indeed be the same for all files within a single acquisition. Might be worth checking, though, on more combined ones like T1PD etc.

TL;DR: we need to approach this with caution, and thus more checking/work is needed. Some reservations/concerns and thinking out loud:

  • the fact that we ran into it only now suggests that the situation is not typical, which questions changing the default behavior:
    • It seems it would apply ONLY in the case of -g all or -g accession, i.e. where we force multiple scanning sessions into a single conversion study. That would partially explain why we have not run into it, I guess. Such groupings were added only a few years back IIRC and are not the default mode of operation.
    • as you note in "Fieldmaps acquired during each part of a session splits into two not differentiated" #624 (comment), dcm2niix would still do "the right thing" and convert those incorrectly "grouped" files into different files, since it relies on the series UID. So it is just that neither a heuristic nor heudiconv would know how to tease them apart.
  • it would introduce a core change of behavior:
    • possibly making it impossible for heudiconv to operate on previously produced annotations in -c none mode for a subsequent conversion with -c dcm2niix, since all group names would change. Needs a check.
    • maybe heudiconv needs to become smart enough to detect this mode of work while reloading an already existing mapping file, and then not bother adding the UID? That might help in a transition period but is generally not needed and too cumbersome... or maybe we would just add a check while loading an existing mapping that all the series ids include that UID hash, and error out with an informative message if they don't. I think the latter is better.
    • but it also might break some heuristics which might have relied on those series ids (did any that we ship? needs to be checked).
    • it could also be made optional, either via a command line option (preferable) or automagically enabled only in -g all or -g accession (I don't like that). But then it adds differences in behavior based on CLI options, which is not good.

@neurorepro (Contributor, Author)

@yarikoptic thank you very much for the feedback

Regarding the scope of the use case: I think it is actually common to artificially split a single BIDS session into two when the scanning protocol is too long (e.g. AM: anat, resting state, ...; PM: diffusion MRI, ...) or the subject needs to exit. It is actually mentioned explicitly in the BIDS definition of session (definition number 5):

if a subject has to leave the scanner room and then be re-positioned on the scanner bed, the set of MRI acquisitions will still be considered as a session and match sessions acquired in other subjects. Similarly, in situations where different data types are obtained over several visits (for example fMRI on one day followed by DWI the day after) those can be grouped in one session.

The fact that not many people reported it, I think, is because users did not know what went wrong and would not suspect this may be due to heudiconv (who would think of blaming such a magnificent tool?)

Regarding the change of behavior, I totally agree that this should be done carefully. I think what seems to be your preferred option would be best:

may be we would just add a check while loading an existing mapping that all the series ids do include that UID hash and error out if they don't with informative message. I think the latter is better.

I will update the PR with that implemented.

If the error is too much for users (those with a justified use case for keeping previous data without the UID), then maybe we could later add a temporary option --with-deprecated-series-id?

@neurorepro (Contributor, Author) commented Jun 16, 2023

@yarikoptic I modified the code to check that the conversion table does not use deprecated series IDs. Otherwise it recomputes the conversion table with the non-deprecated series IDs.

I think in the end this may be less frustrating than an error since it identifies the deprecated IDs and recomputes the conversion table accordingly.

I also set the hash length to 8 as you suggested.

@neurorepro (Contributor, Author)

Hello @yarikoptic, does the code look ok now? Cheers!

@yarikoptic (Member) left a comment

@yarikoptic I modified the code to check that the conversion table does not use deprecated series IDs. Otherwise it recomputes the conversion table with the non-deprecated series IDs.

I worry that it is a bit too intrusive; in my comment above I thought it might be better to be more polite and not simply abolish the work people might have done already by saying "now we need another way". But OK, I guess I can swallow this; we would just announce some "major"-ish release.

But one major concern about the PR is the absence of "setting in stone" such new behaviors in tests: not a single test was changed or added, which points to the actual problem of us not bothering to test that behavior before. Could you please add/change a test so it does get duplicated seqinfo (I guess we could just copy some subfolder with dicoms into a new one, thus causing duplicates) and see it fail?

I also left some comments on how to possibly shorten the function that determines whether an id is outdated, using re -- it can become just a one-liner. But we would need to have a test either way.

"""Return True if any series_id in the info_file is in deprecated format."""
info = read_config(info_file)
series_ids = [series_id for sublist in info.values() for series_id in sublist]
if any(is_deprecated_seriesid(series_id) for series_id in series_ids):
@yarikoptic (Member):

Could be shorter

Suggested change
if any(is_deprecated_seriesid(series_id) for series_id in series_ids):
if any(map(is_deprecated_seriesid, series_ids)):

but it is ok either way.

Comment on lines +253 to +254
"Found deprecated series identifier in info file in the format"
"<series num>-<protocol name> instead of <series num>-<protocol name>-<UID hash>"
@yarikoptic (Member):

Suggested change
"Found deprecated series identifier in info file in the format"
"<series num>-<protocol name> instead of <series num>-<protocol name>-<UID hash>"
"Found deprecated series identifier in info file in the format "
"<series num>-<protocol name> instead of <series num>-<protocol name>-<UID hash>. "

Comment on lines +241 to +244
if len(hash_hex) != len_hash_hex or not all(c in hex_digits for c in hash_hex):
return True
# If all previous tests passed, then the series_id is not deprecated
return False
@yarikoptic (Member):

Suggested change
if len(hash_hex) != len_hash_hex or not all(c in hex_digits for c in hash_hex):
return True
# If all previous tests passed, then the series_id is not deprecated
return False
return len(hash_hex) != len_hash_hex or not all(c in hex_digits for c in hash_hex)

@yarikoptic (Member):

or maybe better to use re (might need an import)

Suggested change
if len(hash_hex) != len_hash_hex or not all(c in hex_digits for c in hash_hex):
return True
# If all previous tests passed, then the series_id is not deprecated
return False
return not re.match(f"[{string.hexdigits}]{{{len_hash_hex}}}$", hash_hex)

this function needs a unittest which would be trivial to do
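A minimal standalone test along those lines might look as follows; the function body mirrors the diff above, while the example ids and the hash length of 8 are illustrative assumptions:

```python
import string

LEN_HASH_HEX = 8  # hash length settled on earlier in the thread

def is_deprecated_seriesid(series_id: str, len_hash_hex: int = LEN_HASH_HEX) -> bool:
    """True if series_id lacks the trailing '<UID hash>' component."""
    if series_id.count("-") <= 1:
        return True
    # First part must be the series number
    if not series_id.split("-")[0].isdigit():
        return True
    # Last part must be a hex hash of the expected length
    hash_hex = series_id.split("-")[-1]
    hex_digits = set(string.hexdigits)
    return len(hash_hex) != len_hash_hex or not all(c in hex_digits for c in hash_hex)

def test_is_deprecated_seriesid() -> None:
    assert is_deprecated_seriesid("5-anat_T1w")               # old format, no hash
    assert not is_deprecated_seriesid("5-anat_T1w-3f2a9b1c")  # new format
    assert not is_deprecated_seriesid("5-anat-T1w-3f2a9b1c")  # protocol name may contain '-'
    assert is_deprecated_seriesid("5-anat_T1w-3f2a")          # hash too short
    assert is_deprecated_seriesid("5-anat_T1w-zzzzzzzz")      # not hexadecimal

test_is_deprecated_seriesid()
```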

Comment on lines +232 to +244
if series_id.count('-') <= 1:
return True
# Check the first part of the series_id is a number
series_number = series_id.split('-')[0]
if not series_number.isdigit():
return True
# Check the last part of the series_id is a hash in hexadecimal format of length len_hash_hex
hash_hex = series_id.split('-')[-1]
hex_digits = set(string.hexdigits)
if len(hash_hex) != len_hash_hex or not all(c in hex_digits for c in hash_hex):
return True
# If all previous tests passed, then the series_id is not deprecated
return False
@yarikoptic (Member):

overall can be

Suggested change
if series_id.count('-') <= 1:
return True
# Check the first part of the series_id is a number
series_number = series_id.split('-')[0]
if not series_number.isdigit():
return True
# Check the last part of the series_id is a hash in hexadecimal format of length len_hash_hex
hash_hex = series_id.split('-')[-1]
hex_digits = set(string.hexdigits)
if len(hash_hex) != len_hash_hex or not all(c in hex_digits for c in hash_hex):
return True
# If all previous tests passed, then the series_id is not deprecated
return False
return not re.match(f"[{string.digits}]+-.+-[{string.hexdigits}]{{{len_hash_hex}}}$", series_id)

to just match regex... again -- just needs a unittest
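For what it's worth, the regex variant can be exercised standalone; this sketch assumes the series_id argument is passed to re.match and the result is negated into a bool:

```python
import re
import string

LEN_HASH_HEX = 8  # hash length settled on in this thread

def is_deprecated_seriesid(series_id: str, len_hash_hex: int = LEN_HASH_HEX) -> bool:
    """Deprecated unless the id matches '<series num>-<protocol name>-<UID hash>'."""
    # re.match anchors at the start; '$' anchors the end, so the trailing
    # component must be exactly len_hash_hex hex characters.
    return not re.match(
        f"[{string.digits}]+-.+-[{string.hexdigits}]{{{len_hash_hex}}}$",
        series_id,
    )

print(is_deprecated_seriesid("5-anat_T1w"))           # True: old format
print(is_deprecated_seriesid("5-anat_T1w-3f2a9b1c"))  # False: has a UID hash
```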

series_number = series_id.split('-')[0]
if not series_number.isdigit():
return True
# Check the last part of the series_id is a hash in hexadecimal format of length len_hash_hex
@yarikoptic (Member):

Suggested change
# Check the last part of the series_id is a hash in hexadecimal format of length len_hash_hex
# Check the last part of the series_id is a hash in hexadecimal format of length len_hash_hex
# Note that protocol_name could contain -

is that correct, and is that why the [-1]?

@yarikoptic (Member)

@neurorepro ping on above comments.

@neurorepro (Contributor, Author)

Yes, totally agreed on the tests. I actually tested it on two datasets (one already having old data, and one newly converted) and it went fine. But indeed tests need to be included for continuous testing.

I'll get back to you!

@yarikoptic yarikoptic added the minor Increment the minor version when merged label Jan 16, 2024
@yarikoptic (Member)

a gentle ping
