draft validation functions #90

viktorpm · 2023-10-12T15:40:50Z

Description

What is this PR

Bug fix
Addition of a new feature
Other

Why is this PR needed?

We want to test if all the atlases fit the brainglobe framework

What does this PR do?

Adds a validation function that checks that the important files exist
Adds a validation function that checks that the mesh and annotation extents are roughly the same
Adds a script that loops over all atlases, updates and validates them
Adds initial tests for these validation functions (one positive and one negative unit test for each)
Drops Python 3.8 support for GH actions and tox, and ensures pytest is run in tox
Suggests an initial structure for validation functionality

References

gets started with brainglobe/brainglobe-atlasapi#217
also fixes #94
fixes #100
fixes #97

How has this PR been tested?

It was run on the HPC. We got the expected output: two lists with passed and failed atlases

Summary
['example_mouse_100um', 'allen_mouse_10um', 'allen_mouse_25um', 'allen_mouse_50um', 'allen_mouse_100um', 'kim_mouse_50um', 'osten_mouse_10um', 'osten_mouse_25um', 'osten_mouse_50um', 'osten_mouse_100um']
[('mpin_zfish_1um', AssertionError('Mesh coordinate 962.676331 and annotation coordinate 872.0 differ by more than 10 times pixel size 1.0')),
 ('allen_human_500um', AssertionError('Mesh coordinate 232583.5 and annotation coordinate 24500.0 differ by more than 10 times pixel size 500.0')),
 ('kim_mouse_10um', AssertionError('Mesh coordinate 175.0 and annotation coordinate 860.0 differ by more than 10 times pixel size 10.0')),
 ('kim_mouse_25um', AssertionError('Mesh coordinate 437.5 and annotation coordinate 875.0 differ by more than 10 times pixel size 25.0')),
 ('kim_mouse_100um', AssertionError('Mesh coordinate 25550.0 and annotation coordinate 13000.0 differ by more than 10 times pixel size 100.0')),
 ('allen_cord_20um', AssertionError('Mesh coordinate 390.0 and annotation coordinate 0.0 differ by more than 10 times pixel size 20.0')),
 ('azba_zfish_4um', AssertionError('Mesh coordinate 59.675 and annotation coordinate 244.0 differ by more than 10 times pixel size 4.0')),
 ('whs_sd_rat_39um', AssertionError('Mesh coordinate 955.5 and annotation coordinate 0.0 differ by more than 10 times pixel size 39.0')),
 ('perens_lsfm_mouse_20um', AssertionError('Mesh coordinate 120209.2 and annotation coordinate 120.0 differ by more than 10 times pixel size 20.0')),
 ('admba_3d_e11_5_mouse_16um', AssertionError('Mesh coordinate 2136.0 and annotation coordinate 5088.0 differ by more than 10 times pixel size 16.0')),
 ('admba_3d_e13_5_mouse_16um', AssertionError('Mesh coordinate 2648.0 and annotation coordinate 6160.0 differ by more than 10 times pixel size 16.0')),
 ('admba_3d_e15_5_mouse_16um', AssertionError('Mesh coordinate 3768.0 and annotation coordinate 7296.0 differ by more than 10 times pixel size 16.0')),
 ('admba_3d_e18_5_mouse_16um', AssertionError('Mesh coordinate 120.0 and annotation coordinate 400.0 differ by more than 10 times pixel size 16.0')),
 ('admba_3d_p4_mouse_16.752um', AssertionError('Mesh coordinate 6307.128 and annotation coordinate 11240.591999999999 differ by more than 10 times pixel size 16.752')),
 ('admba_3d_p14_mouse_16.752um', AssertionError('Mesh coordinate 6374.1359999999995 and annotation coordinate 13016.303999999998 differ by more than 10 times pixel size 16.752')),
 ('admba_3d_p28_mouse_16.752um', AssertionError('Mesh coordinate 6558.407999999999 and annotation coordinate 14205.696 differ by more than 10 times pixel size 16.752')),
 ('admba_3d_p56_mouse_25um', AssertionError('Mesh coordinate 437.5 and annotation coordinate 0.0 differ by more than 10 times pixel size 25.0')),
 ('princeton_mouse_20um', AssertionError('Mesh coordinate 618.5 and annotation coordinate 12360.0 differ by more than 10 times pixel size 20.0')),
 ('kim_dev_mouse_stp_10um', AssertionError('Mesh coordinate 515.0 and annotation coordinate 30.0 differ by more than 10 times pixel size 10.0')),
 ('kim_dev_mouse_idisco_10um', AssertionError('Mesh coordinate 515.0 and annotation coordinate 30.0 differ by more than 10 times pixel size 10.0')),
 ('kim_dev_mouse_mri_a0_10um', AssertionError('Mesh coordinate 515.0 and annotation coordinate 30.0 differ by more than 10 times pixel size 10.0')),
 ('kim_dev_mouse_mri_adc_10um', AssertionError('Mesh coordinate 515.0 and annotation coordinate 30.0 differ by more than 10 times pixel size 10.0')),
 ('kim_dev_mouse_mri_dwi_10um', AssertionError('Mesh coordinate 515.0 and annotation coordinate 30.0 differ by more than 10 times pixel size 10.0')),
 ('kim_dev_mouse_mri_fa_10um', AssertionError('Mesh coordinate 515.0 and annotation coordinate 30.0 differ by more than 10 times pixel size 10.0')),
 ('kim_dev_mouse_mri_mtr_10um', AssertionError('Mesh coordinate 515.0 and annotation coordinate 30.0 differ by more than 10 times pixel size 10.0')),
 ('kim_dev_mouse_mri_t2_10um', AssertionError('Mesh coordinate 515.0 and annotation coordinate 30.0 differ by more than 10 times pixel size 10.0'))]

Is this a breaking change?

No

Does this PR require an update to the documentation?

Not for now.

Checklist:

[NA] The code has been tested locally
[NA] Tests have been added to cover all new functionality (unit & integration)
[NA] The documentation has been updated to reflect any changes
The code has been formatted with pre-commit


codecov · 2023-11-07T16:17:47Z

Welcome to Codecov 🎉

Once merged to your default branch, Codecov will compare your coverage reports and display the results in this comment.

Thanks for integrating Codecov - We've got you covered ☂️

niksirbi

Great job @viktorpm and @alessandrofelder!

The validation functions appear to do what they claim and are very understandable.

I've sprinkled several pedantic comments throughout, but feel free to take them or leave them, as many of them are a matter of style.

Now, I am coming to my only substantive comment, which concerns the capacity of this design to accommodate multiple future validation functions.

In the validate_atlas() function, the current two validations are performed sequentially, i.e. first assert validate_atlas_files() and then assert validate_mesh_matches_image_extents().

In this case, it makes sense to have them run sequentially, since there is no point in looking at the image extents if the files are not there to begin with.

However, in future, we may add validators that are not dependent on each other, for example:

A: validate_mesh_normals
B: validate_annotation_image_orientation

In this case, if you assert A, then B, the first assertion will fail if there is a problem with the mesh normals, and the second assertion will never run. So you will not learn if there is also a problem with image orientation. At least that's my prediction of what will happen in that case, correct me if I have misread the code.

It would be nice to have a report of all the independent problems with an atlas, not just the first problem encountered.

I wonder if you guys have thought about such use cases - i.e. how to run multiple independent assertions. One solution could be catching each "AssertionError" and logging it into a list of errors instead of erroring out outright. But I'm sure there are also other ways. Since this problem concerns future validators, and not the currently existing ones, feel free to open an issue instead of fixing it here (if you prefer).
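
To make the suggestion above concrete, here is a minimal sketch of catching each AssertionError and collecting it in a list, so that one failing validator does not hide the results of the others. The helper name `run_validations` and the two stand-in validators are hypothetical illustrations, not code from this PR.

```python
from typing import Callable, List, Tuple


def run_validations(
    checks: List[Tuple[Callable[..., bool], tuple]],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Run every check, recording failures instead of stopping at the first."""
    passed, failed = [], []
    for check, args in checks:
        try:
            check(*args)
            passed.append(check.__name__)
        except AssertionError as error:
            # Log the problem and carry on with the remaining checks.
            failed.append((check.__name__, str(error)))
    return passed, failed


# Hypothetical stand-ins for two independent validators (A and B above).
def validate_mesh_normals(atlas_name: str) -> bool:
    raise AssertionError(f"{atlas_name}: mesh normals point inwards")


def validate_annotation_image_orientation(atlas_name: str) -> bool:
    assert atlas_name, "atlas name must not be empty"
    return True


passed, failed = run_validations(
    [
        (validate_mesh_normals, ("example_mouse_100um",)),
        (validate_annotation_image_orientation, ("example_mouse_100um",)),
    ]
)
print(passed)
print(failed)
```

Both validators run even though the first one fails, and the caller receives one list of passed checks and one list of (check name, message) pairs.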

P.S: This PR contains updates to Python tooling, like the dev dependencies and the gh actions. Normally, I'd say it's best practice to implement those in an independent PR, but I understand the practical considerations in this case, so I won't complain about it.

niksirbi · 2023-11-14T11:13:31Z

bg_atlasgen/validate_atlases.py

+def validate_atlas_files(atlas_path: Path):
+    """Checks if basic files exist in the atlas folder"""
+
+    assert atlas_path.exists(), f"Atlas path {atlas_path} not found"


The exists() method will return True whether the path in question is a directory or a file.
Since we explicitly expect atlas_path to be a directory, would it make sense to be stricter and check for atlas_path.is_dir()?
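
A small sketch of the difference, using a temporary folder and a made-up atlas name (`some_atlas_v1.0` is purely illustrative): a stray file with the expected name passes `exists()` but not `is_dir()`.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # A file sitting where an atlas folder is expected (hypothetical name).
    stray_file = folder / "some_atlas_v1.0"
    stray_file.touch()

    file_exists = stray_file.exists()  # True: exists() accepts files too
    file_is_dir = stray_file.is_dir()  # False: it is not a directory
    folder_is_dir = folder.is_dir()    # True: a real directory

print(file_exists, file_is_dir, folder_is_dir)
```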

niksirbi · 2023-11-14T11:16:11Z

bg_atlasgen/validate_atlases.py

+    for expected_file_name in expected_files:
+        expected_path = Path(atlas_path / expected_file_name)
+        assert (
+            expected_path.exists()


Similarly, here one could be stricter and test expected_path.is_file() (except for "meshes", which I presume is a directory).
I don't foresee a real case scenario in which a file would be created instead of a folder, or vice versa, but I also don't see a downside to being stricter.
Up to you to decide.

niksirbi · 2023-11-14T11:17:18Z

bg_atlasgen/validate_atlases.py

+
+
+def _assert_close(mesh_coord, annotation_coord, pixel_size):
+    """Helper function to check if the mesh and the annotation coordinate are closer to each other than 10 times the pixel size"""


Docstring exceeds usual line length limits, use multi-line strings?
For example:

"""Helper function to check if the mesh and the annotation coordinate
are closer to each other than 10 times the pixel size"""

niksirbi · 2023-11-14T11:19:44Z

bg_atlasgen/validate_atlases.py

+    return True
+
+
+def _assert_close(mesh_coord, annotation_coord, pixel_size):


type annotations are missing here. I presume mesh_coord and annotation_coord are numpy arrays?
They could be annotated as mesh_coord: np.ndarray

Would also be helpful to have a full docstring with expected parameters and returns here (since there are several arguments).
In the description of each argument, the expected array shape could also be indicated.

niksirbi · 2023-11-14T11:20:44Z

bg_atlasgen/validate_atlases.py

+    """Helper function to check if the mesh and the annotation coordinate are closer to each other than 10 times the pixel size"""
+    assert (
+        abs(mesh_coord - annotation_coord) <= 10 * pixel_size
+    ), f"Mesh coordinate {mesh_coord} and annotation coordinate {annotation_coord} differ by more than 10 times pixel size {pixel_size}"


Make this into a multi-line string to avoid violating line length limits

niksirbi · 2023-11-14T11:42:24Z

bg_atlasgen/validate_atlases.py

+    z_min_scaled, z_max_scaled = z_min * resolution[0], z_max * resolution[0]
+    y_min_scaled, y_max_scaled = y_min * resolution[1], y_max * resolution[1]
+    x_min_scaled, x_max_scaled = x_min * resolution[2], x_max * resolution[2]


Since this block operates on the annotation image, I would move it further above, just after x_min, x_max = ... and before mesh_points = ....
This is to visually split operations on the image vs operations on the mesh vs their comparisons. I would also add a one-line comment above each of these 3 blocks, something like # get min and max image extents along each axis etc., to improve readability at a glance, but this is a more subjective "aesthetic" point, so feel absolutely free to ignore.

If you decide to follow my advice on refactoring this function, the above points may be rendered moot anyway.
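
The three-block layout described above could look roughly like the sketch below. It is not the function from this PR: `extents_match` is a hypothetical name, the inputs are synthetic NumPy arrays rather than a real atlas, and it returns a boolean instead of asserting.

```python
import numpy as np


def extents_match(annotation, mesh_points, resolution, diff_tolerance=10):
    """Return True if mesh and annotation extents agree within tolerance."""
    resolution = np.asarray(resolution, dtype=float)

    # 1. Image: min and max extents of labelled voxels along each axis,
    #    scaled from voxel indices to physical units.
    labelled = np.nonzero(annotation)
    image_min = np.array([axis.min() for axis in labelled]) * resolution
    image_max = np.array([axis.max() for axis in labelled]) * resolution

    # 2. Mesh: min and max vertex coordinates along each axis.
    mesh_min = mesh_points.min(axis=0)
    mesh_max = mesh_points.max(axis=0)

    # 3. Comparison: every extent must agree to within the tolerance.
    tolerance = diff_tolerance * resolution
    return bool(
        np.all(np.abs(mesh_min - image_min) <= tolerance)
        and np.all(np.abs(mesh_max - image_max) <= tolerance)
    )


# Synthetic data: a labelled cube spanning voxels 2..7 on each axis.
annotation = np.zeros((10, 10, 10), dtype=int)
annotation[2:8, 2:8, 2:8] = 1
mesh_points = np.array([[20.0, 20.0, 20.0], [70.0, 70.0, 70.0]])

matching = extents_match(annotation, mesh_points, resolution=(10, 10, 10))
mismatched = extents_match(annotation, mesh_points * 100, resolution=(10, 10, 10))
print(matching, mismatched)
```

The sketch assumes the annotation contains at least one labelled voxel; an empty image would need its own check before the extents are computed.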

niksirbi · 2023-11-14T11:51:57Z

bg_atlasgen/validate_atlases.py

+def validate_atlas(atlas_name, version):
+    """Validates the latest version of a given atlas"""


If the atlas gets updated to latest during validation, why do you need to pass the version as an argument? Even if you pass an older version, it will be overridden by the update, and the function will actually validate the newer version. Or am I wrong?

niksirbi · 2023-11-14T11:52:35Z

bg_atlasgen/validate_atlases.py

+    updated = get_atlases_lastversions()[atlas_name]["updated"]
+    if not updated:
+        update_atlas(atlas_name)
+    atlas_path = Path(get_brainglobe_dir()) / f"{atlas_name}_v{version}"


Won't the version change after the update?

you would have to get the version after the update

niksirbi · 2023-11-14T11:54:19Z

bg_atlasgen/validate_atlases.py

+    print("Summary")
+    print("### Valid atlases ###")
+    print(valid_atlases)
+    print("### Invalid atlases ###")
+    print(invalid_atlases)


Would it be useful to also save the output in a .txt (or .md) file in addition to printing it?
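
A minimal sketch of what saving the summary could look like, reusing the headings printed above. The function name `write_summary` and the file name `validation_summary.txt` are hypothetical, and the example writes to a temporary folder so it can be run anywhere.

```python
import tempfile
from pathlib import Path


def write_summary(valid_atlases, invalid_atlases, output_path):
    """Write the validation summary to a plain-text file."""
    lines = ["Summary", "### Valid atlases ###"]
    lines.extend(valid_atlases)
    lines.append("### Invalid atlases ###")
    # Each invalid entry is an (atlas name, error) pair.
    lines.extend(f"{name}: {error}" for name, error in invalid_atlases)
    Path(output_path).write_text("\n".join(lines) + "\n", encoding="utf-8")


valid_atlases = ["example_mouse_100um", "allen_mouse_10um"]
invalid_atlases = [
    (
        "kim_mouse_25um",
        AssertionError(
            "Mesh coordinate 437.5 and annotation coordinate 875.0 "
            "differ by more than 10 times pixel size 25.0"
        ),
    )
]

with tempfile.TemporaryDirectory() as tmp:
    report_path = Path(tmp) / "validation_summary.txt"
    write_summary(valid_atlases, invalid_atlases, report_path)
    report = report_path.read_text(encoding="utf-8")

print(report)
```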

niksirbi · 2023-11-14T12:04:06Z

bg_atlasgen/validate_atlases.py

+    ), f"Atlas file {atlas_path} validation failed"
+    assert validate_mesh_matches_image_extents(
+        atlas
+    ), "Atlas object validation failed"


"Atlas object validation" is too abstract for this message. You didn't check the entire object, just the extents of the annotation image and the mesh, right?

Unless this message anticipates the addition of more checks in this group


niksirbi

I've looked at the changes you've already implemented, and they are fine, with the exception of the failing test that I've highlighted.

Let me know if you need help with tackling some of the other trickier suggestions.

niksirbi · 2023-12-06T14:30:54Z

bg_atlasgen/validate_atlases.py

+        assert (
+            expected_path.is_file()
+        ), f"Expected file not found at {expected_path}"


Hey @viktorpm, the test failure is caused by this assertion. The problem is that you are checking if all elements in expected_files are indeed existing files (as I suggested), but "meshes" is a folder not a file. I would check the meshes separately with is_dir(), for example:

    assert atlas_path.is_dir(), f"Atlas path {atlas_path} not found"
    expected_files = [
        "annotation.tiff",
        "reference.tiff",
        "metadata.json",
        "structures.json",
    ]
    for expected_file_name in expected_files:
        expected_path = Path(atlas_path / expected_file_name)
        assert (
            expected_path.is_file()
        ), f"Expected file not found at {expected_path}"

    meshes_path = atlas_path / "meshes"
    assert meshes_path.is_dir(), f"Meshes path {meshes_path} not found"
    return True

It's important that you check the meshes folder after you check individual files, otherwise the test_invalid_atlas_path() will fail.

Thank you! It's fixed now


niksirbi

I like the solution you guys came up with for running multiple validations and collecting all the results! Nice work!

I left only 4 tiny comments, 3 of which have to do with appeasing Sonar Lint, so that the CI checks won't fail.

Approved 🎉

niksirbi · 2024-01-18T17:49:42Z

bg_atlasgen/validate_atlases.py

+    updated = get_atlases_lastversions()[atlas_name]["updated"]
+    if not updated:
+        update_atlas(atlas_name)
+    Path(get_brainglobe_dir()) / f"{atlas_name}_v{version}"


I think this line can be removed, as far as I can see it's not being assigned to a variable or used in any other way.

In fact, the removal of this line is probably necessary for the Sonar Lint tests to pass.

niksirbi · 2024-01-18T17:50:44Z

bg_atlasgen/validate_atlases.py

+        # validate_atlas(atlas_name, version)
+        (atlas_name, version),


In total I count 5 validation functions being passed, what's this 6th set of parameters for?

It was left there by mistake. Removed now

niksirbi · 2024-01-18T17:53:21Z

bg_atlasgen/validate_atlases.py

+def open_for_visual_check():
+    pass
+
+
+def validate_checksum():
+    pass


Sonar lint complains because it requires you to add a comment to such "empty" functions, mentioning what you intend these for (like you have done for the following check_additional_references() function).
If you add a comment inside each, the checks should pass.
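
For illustration, the placeholders could carry a short comment each, as in the sketch below. The comment wording is invented here and only indicates the kind of note the linter expects.

```python
def open_for_visual_check():
    # Placeholder: intended to open the atlas for manual visual inspection.
    pass


def validate_checksum():
    # Placeholder: intended to compare atlas files against stored checksums.
    pass


print(open_for_visual_check(), validate_checksum())
```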

viktorpm · 2024-01-19T09:47:11Z

Thank you @niksirbi for reviewing it!

viktorpm and others added 8 commits October 12, 2023 16:39

draft validation functions

dec80c1

[pre-commit.ci] auto fixes from pre-commit.com hooks

566fc44

for more information, see https://pre-commit.ci

run on all atlases, don't crash on assertion error

8bb2150

fix merging

8aa73e7

fixing atlas path

9423015

Merge pull request #1 from viktorpm/fix-path

1fcafa4

fixing atlas path

Clearer output printing

319e721

Merge pull request #2 from viktorpm/minor-tweaks-for-validation

22d6c96

Clearer output printing

alessandrofelder assigned alessandrofelder and viktorpm Nov 3, 2023

alessandrofelder and others added 10 commits November 3, 2023 11:04

tidy up validation script, remove weird test_git

9402b66

add dev install, make test structure, initial tests

598a0e1

[pre-commit.ci] auto fixes from pre-commit.com hooks

96c8584

for more information, see https://pre-commit.ci

add tests and return for _assert_close()

ffef504

add test for validate mesh matches annotation

92e94e6

fix linting

29343dc

update version for actions

5995c43

drop py3.8 in tox, run pytest in tox

cb9cc02

[pre-commit.ci] auto fixes from pre-commit.com hooks

85419d8

for more information, see https://pre-commit.ci

fix copy-paste error in pytest command

dc4aebc

alessandrofelder and others added 2 commits November 7, 2023 16:17

drop py3.8 from gh action workflow file too

a59da92

Adding docstrings to validation script

f06c82f

viktorpm marked this pull request as ready for review November 8, 2023 10:47

viktorpm requested a review from niksirbi November 8, 2023 10:48

niksirbi approved these changes Nov 14, 2023


viktorpm and others added 4 commits November 29, 2023 10:48

Making path tests stricter, breaking up long strings, adding diff_tol…

dace6b3

…erance argument to _assert_close function

[pre-commit.ci] auto fixes from pre-commit.com hooks

9faf48f

for more information, see https://pre-commit.ci

restructuring validate_mesh_matches_image_extents function, adding co…

777d309

…mments

Merge branch 'validation' of github.com:viktorpm/bg-atlasgen into val…

b8377c5

…idation

viktorpm requested a review from niksirbi November 29, 2023 17:01

niksirbi suggested changes Dec 6, 2023


viktorpm and others added 6 commits January 3, 2024 13:31

testing expected files and meshes directory separately

8736701

looping through validation functions and parameters to catch individu…

b517507

…al errors

removing hard coded path, generalising to all atlases

26b9dcf

adding successful_validations list

f7fa093

tidying up duplications

af84ec3

fix recursive bug

bd1f185

viktorpm requested a review from niksirbi January 18, 2024 11:29

viktorpm mentioned this pull request Jan 18, 2024

Structure validation #110

Merged

7 tasks

niksirbi approved these changes Jan 18, 2024


addressing Niko's final comments, cleaning code

d0f81ab

alessandrofelder approved these changes Jan 22, 2024


alessandrofelder merged commit 93e95ab into brainglobe:main Jan 22, 2024
8 checks passed

alessandrofelder deleted the validation branch January 22, 2024 17:24
