Skip to content
This repository has been archived by the owner on Feb 23, 2024. It is now read-only.

draft validation functions #90

Merged
merged 31 commits into from
Jan 22, 2024
Merged

Conversation

viktorpm
Copy link
Collaborator

@viktorpm viktorpm commented Oct 12, 2023

Description

What is this PR

  • Bug fix
  • Addition of a new feature
  • Other

Why is this PR needed?

We want to test if all the atlases fit the brainglobe framework

What does this PR do?

  • Adds validation function for the existence of important files
  • Adds validation function for mesh and annotation size (roughly the same)
  • Adds a script that loops over all atlases, updates and validates them
  • adds initial tests for these validation functions (one positive and one negative unit test for each)
  • drops python 3.8 support for GH actions and tox, and ensures pytest is run in tox.
  • suggests an initial structure for validation functionality

References

gets started with brainglobe/brainglobe-atlasapi#217
also fixes #94
fixes #100
fixes #97

How has this PR been tested?

It was run on the HPC. We got the expected output: two lists with passed and failed atlases

Summary
['example_mouse_100um', 'allen_mouse_10um', 'allen_mouse_25um', 'allen_mouse_50um', 'allen_mouse_100um', 'kim_mouse_50um', 'osten_mouse_10um', 'osten_mouse_25um', 'osten_mouse_50um', 'osten_mouse_100um']
[('mpin_zfish_1um', AssertionError('Mesh coordinate 962.676331 and annotation coordinate 872.0 differ by more than 10 times pixel size 1.0')), ('allen_human_500um', AssertionError('Mesh coordinate 232583.5 and annotation coordinate 24500.0 differ by more than 10 times pixel size 500.0')), ('kim_mouse_10um', AssertionError('Mesh coordinate 175.0 and annotation coordinate 860.0 differ by more than 10 times pixel size 10.0')), ('kim_mouse_25um', AssertionError('Mesh coordinate 437.5 and annotation coordinate 875.0 differ by more than 10 times pixel size 25.0')), ('kim_mouse_100um', AssertionError('Mesh coordinate 25550.0 and annotation coordinate 13000.0 differ by more than 10 times pixel size 100.0')), ('allen_cord_20um', AssertionError('Mesh coordinate 390.0 and annotation coordinate 0.0 differ by more than 10 times pixel size 20.0')), ('azba_zfish_4um', AssertionError('Mesh coordinate 59.675 and annotation coordinate 244.0 differ by more than 10 times pixel size 4.0')), ('whs_sd_rat_39um', AssertionError('Mesh coordinate 955.5 and annotation coordinate 0.0 differ by more than 10 times pixel size 39.0')), ('perens_lsfm_mouse_20um', AssertionError('Mesh coordinate 120209.2 and annotation coordinate 120.0 differ by more than 10 times pixel size 20.0')), ('admba_3d_e11_5_mouse_16um', AssertionError('Mesh coordinate 2136.0 and annotation coordinate 5088.0 differ by more than 10 times pixel size 16.0')), ('admba_3d_e13_5_mouse_16um', AssertionError('Mesh coordinate 2648.0 and annotation coordinate 6160.0 differ by more than 10 times pixel size 16.0')), ('admba_3d_e15_5_mouse_16um', AssertionError('Mesh coordinate 3768.0 and annotation coordinate 7296.0 differ by more than 10 times pixel size 16.0')), ('admba_3d_e18_5_mouse_16um', AssertionError('Mesh coordinate 120.0 and annotation coordinate 400.0 differ by more than 10 times pixel size 16.0')), ('admba_3d_p4_mouse_16.752um', AssertionError('Mesh coordinate 6307.128 and annotation coordinate 11240.591999999999 differ by more than 10 times pixel size 16.752')), ('admba_3d_p14_mouse_16.752um', AssertionError('Mesh coordinate 6374.1359999999995 and annotation coordinate 13016.303999999998 differ by more than 10 times pixel size 16.752')), ('admba_3d_p28_mouse_16.752um', AssertionError('Mesh coordinate 6558.407999999999 and annotation coordinate 14205.696 differ by more than 10 times pixel size 16.752')), ('admba_3d_p56_mouse_25um', AssertionError('Mesh coordinate 437.5 and annotation coordinate 0.0 differ by more than 10 times pixel size 25.0')), ('princeton_mouse_20um', AssertionError('Mesh coordinate 618.5 and annotation coordinate 12360.0 differ by more than 10 times pixel size 20.0')), ('kim_dev_mouse_stp_10um', AssertionError('Mesh coordinate 515.0 and annotation coordinate 30.0 differ by more than 10 times pixel size 10.0')), ('kim_dev_mouse_idisco_10um', AssertionError('Mesh coordinate 515.0 and annotation coordinate 30.0 differ by more than 10 times pixel size 10.0')), ('kim_dev_mouse_mri_a0_10um', AssertionError('Mesh coordinate 515.0 and annotation coordinate 30.0 differ by more than 10 times pixel size 10.0')), ('kim_dev_mouse_mri_adc_10um', AssertionError('Mesh coordinate 515.0 and annotation coordinate 30.0 differ by more than 10 times pixel size 10.0')), ('kim_dev_mouse_mri_dwi_10um', AssertionError('Mesh coordinate 515.0 and annotation coordinate 30.0 differ by more than 10 times pixel size 10.0')), ('kim_dev_mouse_mri_fa_10um', AssertionError('Mesh coordinate 515.0 and annotation coordinate 30.0 differ by more than 10 times pixel size 10.0')), ('kim_dev_mouse_mri_mtr_10um', AssertionError('Mesh coordinate 515.0 and annotation coordinate 30.0 differ by more than 10 times pixel size 10.0')), ('kim_dev_mouse_mri_t2_10um', AssertionError('Mesh coordinate 515.0 and annotation coordinate 30.0 differ by more than 10 times pixel size 10.0'))]

Is this a breaking change?

No

Does this PR require an update to the documentation?

Not for now.

Checklist:

  • [NA] The code has been tested locally
  • [NA] Tests have been added to cover all new functionality (unit & integration)
  • [NA] The documentation has been updated to reflect any changes
  • The code has been formatted with pre-commit

Copy link

codecov bot commented Nov 7, 2023

Welcome to Codecov 🎉

Once merged to your default branch, Codecov will compare your coverage reports and display the results in this comment.

Thanks for integrating Codecov - We've got you covered ☂️

@viktorpm viktorpm marked this pull request as ready for review November 8, 2023 10:47
Copy link
Member

@niksirbi niksirbi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job @viktorpm and @alessandrofelder!

The validation functions appear to do what they claim and are very understandable.

I've sprinkled several pedantic comments throughout, but feel free to take them or leave them, as many of them are a matter of style.

Now, I am coming to my only substantive comment, which concerns the capacity of this design to accommodate multiple future validation functions.

In the validate_atlas() function, the current two validations are performed sequentially, i.e. first assert validate_atlas_files() and then assert validate_mesh_matches_image_extents().

In this case, it makes sense to have them run sequentially, since there is no point in looking at the image extents if the files are not there to begin with.

However, in future, we may add validators that are not dependent on each other, for example:

  • A: validate_mesh_normals
  • B: validate_annotation_image_orientation

In this case, if you assert A, then B, the first assertion will fail if there is a problem with the mesh normals, and the second assertion will never run. So you will not learn if there is also a problem with image orientation. At least that's my prediction of what will happen in that case, correct me if I have misread the code.

It would be nice to have a report of all the independent problems with an atlas, not just the first problem encountered.

I wonder if you guys have thought about such use cases - i.e. how to run multiple independent assertions. One solution could be catching each "AssertionError" and logging it into a list of errors instead of erroring out outright. But I'm sure there are also other ways. Since this problem concerns future validators, and not the currently existing ones, feel free to open an issue instead of fixing it here (if you prefer).

P.S: This PR contains updates to Python tooling, like the dev dependencies and the gh actions. Normally, I'd say it's best practice to implement those in an independent PR, but I understand the practical considerations in this case, so I won't complain about it.

def validate_atlas_files(atlas_path: Path):
"""Checks if basic files exist in the atlas folder"""

assert atlas_path.exists(), f"Atlas path {atlas_path} not found"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exists() method will return True whether the path in question is a directory or a file.
Since we explicitly expect atlas_path to be a directory, would it make sense to be stricter and check for atlas_path.is_dir()?

for expected_file_name in expected_files:
expected_path = Path(atlas_path / expected_file_name)
assert (
expected_path.exists()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, here one could be stricter and test expected_path.is_file() (except for "meshes", which I presume is a directory).
I don't foresee a real case scenario in which a file would be created instead of a folder, or vice versa, but I also don't see a downside to being stricter.
Up to you to decide.



def _assert_close(mesh_coord, annotation_coord, pixel_size):
"""Helper function to check if the mesh and the annotation coordinate are closer to each other than 10 times the pixel size"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstring exceeds usual line length limits, use multi-line strings?
For example:

"""Helper function to check if the mesh and the annotation coordinate are
closer to each other than 10 times the pixel size"""

return True


def _assert_close(mesh_coord, annotation_coord, pixel_size):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type annotations are missing here. I presume mesh_coord and annotation_coord are numpy arrays?
The could be marked as mesh_coord: np.ndarray

Would also be helpful to have a full docstring with expected parameters and returns here (since there are several arguments).
In the description of each argument, the expected array shape could also be indicated.

"""Helper function to check if the mesh and the annotation coordinate are closer to each other than 10 times the pixel size"""
assert (
abs(mesh_coord - annotation_coord) <= 10 * pixel_size
), f"Mesh coordinate {mesh_coord} and annotation coordinate {annotation_coord} differ by more than 10 times pixel size {pixel_size}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this into a multi-line string to avoid violating line length limits

Comment on lines 65 to 67
z_min_scaled, z_max_scaled = z_min * resolution[0], z_max * resolution[0]
y_min_scaled, y_max_scaled = y_min * resolution[1], y_max * resolution[1]
x_min_scaled, x_max_scaled = x_min * resolution[2], x_max * resolution[2]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this block operates on the annotation image, I would move it further above, just after x_min, x_max = ... and before mesh_points = ....
This is to visually split operations on the image vs operations on the mesh vs their comparisons. I would also add a one-line comment above each of these 3 blocks, sth like # get min and max image extents along each axis etc., to improve readability at a glance, but this a more subjective "aesthetic" point, so feel absolutely free to ignore.

If you decide to follow my advice on refactoring this function, the above points may be rendered mute anyway.

Comment on lines 92 to 93
def validate_atlas(atlas_name, version):
"""Validates the latest version of a given atlas"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the atlas get updated to latest during validation, why do you need to pass the version as an argument? Even if you pass an older version, it will be overriden by the update, and the function will actually validate the newer version. Or am I wrong?

updated = get_atlases_lastversions()[atlas_name]["updated"]
if not updated:
update_atlas(atlas_name)
atlas_path = Path(get_brainglobe_dir()) / f"{atlas_name}_v{version}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't the version change after the update?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you would have to get the version after the update

Comment on lines +120 to +124
print("Summary")
print("### Valid atlases ###")
print(valid_atlases)
print("### Invalid atlases ###")
print(invalid_atlases)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be useful to also save the output in a .txt (or .md) file in addition to printing it?

), f"Atlas file {atlas_path} validation failed"
assert validate_mesh_matches_image_extents(
atlas
), "Atlas object validation failed"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Atlas object validation" is too abstract for this message. You didn't check the entire object, just the extents of the annotation image and the mesh, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless this message anticipates the addition of more checks in this group

Copy link
Member

@niksirbi niksirbi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've looked at the changes you've already implemented, and they are fine, with the exception of the failing test that I've highlighted.

Let me know if you need help with tackling some of the other trickier suggestions.

Comment on lines +29 to +31
assert (
expected_path.is_file()
), f"Expected file not found at {expected_path}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @viktorpm, the test failure is caused by this assertion. The problem is that you are checking if all elements in expected_files are indeed existing files (as I suggested), but "meshes" is a folder not a file. I would check the meshes separately with is_dir(), for example:

assert atlas_path.is_dir(), f"Atlas path {atlas_path} not found"
expected_files = [
    "annotation.tiff",
    "reference.tiff",
    "metadata.json",
    "structures.json",
]
for expected_file_name in expected_files:
    expected_path = Path(atlas_path / expected_file_name)
    assert (
        expected_path.is_file()
    ), f"Expected file not found at {expected_path}"
meshes_path = atlas_path / "meshes"
assert meshes_path.is_dir(), f"Meshes path {meshes_path} not found"
return True

It's important that you check the meshes folder after you check individual files, otherwise the test_invalid_atlas_path() will fail.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! It's fixed now

@viktorpm viktorpm mentioned this pull request Jan 18, 2024
7 tasks
Copy link
Member

@niksirbi niksirbi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the solution you guys came up with for running multiple validations and collecting all the results! Nice work!

I left only 4 tiny comments, 3 of which have to do with appeasing Solar Lint, so that the CI checks won't fail.

Approved 🎉

updated = get_atlases_lastversions()[atlas_name]["updated"]
if not updated:
update_atlas(atlas_name)
Path(get_brainglobe_dir()) / f"{atlas_name}_v{version}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line can be removed, as far as I can see it's not being assigned to avariable to used in any other way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, the removal of this line is probably necessary for the Sonar Lint tests to pass.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 126 to 127
# validate_atlas(atlas_name, version)
(atlas_name, version),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In total I count 5 validation functions being passed, what's this 6th set of parameters for?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was left there by mistake. Removed now

Comment on lines 92 to 97
def open_for_visual_check():
pass


def validate_checksum():
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sonar lint complains because it requires you to add a comment to such "empty" functions, mentioning what you intend these for (like you have done for the following check_additional_references() function).
If you add a comment inside each, the checks should pass.

@viktorpm
Copy link
Collaborator Author

Thank you @niksirbi for reviewing it!

@alessandrofelder alessandrofelder merged commit 93e95ab into brainglobe:main Jan 22, 2024
8 checks passed
@alessandrofelder alessandrofelder deleted the validation branch January 22, 2024 17:24
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
3 participants