API for auditing physical files and file metadata #11016

Open
wants to merge 16 commits into
base: develop

stevenwinship
Contributor

What this PR does / why we need it: Adds an API to find Datasets with missing files so that admins can either delete the file reference or work with authors to re-upload the files.
See: IQSS/dataverse.harvard.edu#220

Which issue(s) this PR closes:

  • Closes #

Special notes for your reviewer:

Suggestions on how to test this: Create multiple Datasets with multiple files. If running in Docker locally, delete a file from docker-dev-volumes/app/data/store...
Call the API and check that the missing file is listed in the JSON response.
Other tests could include deleting a FileMetadata row from the DB.
Request specific Datasets as well as firstId and lastId.
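
For anyone scripting these test calls, a minimal Python sketch of the request URLs might look like the following. The endpoint path and the `DatasetIdentifierList` parameter are taken from the curl example in the documentation diff reviewed in this PR; treating `firstId` and `lastId` as query parameters is an assumption, and `audit_url` is a hypothetical helper, not part of Dataverse.

```python
from urllib.parse import urlencode


def audit_url(server_url, first_id=None, last_id=None, dataset_ids=None):
    """Build an auditFiles URL. Hypothetical helper; every filter is optional."""
    params = {}
    if first_id is not None:
        params["firstId"] = first_id  # assumed query parameter name
    if last_id is not None:
        params["lastId"] = last_id  # assumed query parameter name
    if dataset_ids:
        # Comma separated list, as in the reviewed curl example.
        params["DatasetIdentifierList"] = ",".join(dataset_ids)
    base = server_url.rstrip("/") + "/api/admin/datafiles/auditFiles"
    # safe=":/," keeps PIDs readable instead of percent-encoding them.
    query = urlencode(params, safe=":/,")
    return base + ("?" + query if query else "")


print(audit_url("http://localhost:8080", first_id=1, last_id=100))
print(audit_url("http://localhost:8080", dataset_ids=["doi:10.5072/FK2/JXYBJS"]))
```

The host `http://localhost:8080` and the id range are placeholders for a local test instance.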

Does this PR introduce a user interface change? If mockups are available, please link/include them here: No

Is there a release notes update needed for this change?: Included

Additional documentation:

@stevenwinship stevenwinship self-assigned this Nov 13, 2024

coveralls commented Nov 13, 2024

Coverage: 21.825% (-0.03%) from 21.856% when pulling 3eec366 on 220-audit-physical-files into 61b8046 on develop.

Member

@pdurbin pdurbin left a comment

I took a quick pass through the docs and code. @stevenwinship please let me know what you think.

doc/release-notes/220-harvard-edu-audit-files.md (outdated, resolved)
doc/sphinx-guides/source/api/native-api.rst (outdated, resolved)
doc/sphinx-guides/source/api/native-api.rst (outdated, resolved)
doc/sphinx-guides/source/api/native-api.rst (outdated, resolved)
Auditing specific Datasets (comma separated list)::
curl "$SERVER_URL/api/admin/datafiles/auditFiles?DatasetIdentifierList=doi.org/10.5072/FK2/JXYBJS,doi.org/10.7910/DVN/MPU019"
Member

Do we use this pattern of passing in the URL form of a PID minus "https://" anywhere else? It seems ok. Can we pass in the normal PIDs (the non-URL form) instead?

Contributor Author

@stevenwinship stevenwinship Nov 19, 2024

Member

No, it's different... "doi.org/10..." vs. "doi:10...".

In this PR we should use the pattern from reExportDataset.
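
To make the difference between the two forms concrete, here is a small Python sketch that rewrites the URL-style form into the `doi:` form. `to_pid` is a hypothetical helper written for illustration; it only handles DOIs and is not how Dataverse itself parses identifiers.

```python
def to_pid(identifier):
    """Convert 'doi.org/10.x/...' (with or without a scheme) to 'doi:10.x/...'.

    Hypothetical helper; anything that is not a doi.org URL is returned unchanged.
    """
    for scheme in ("https://", "http://"):
        if identifier.startswith(scheme):
            identifier = identifier[len(scheme):]
    host = "doi.org/"
    if identifier.startswith(host):
        return "doi:" + identifier[len(host):]
    return identifier


print(to_pid("doi.org/10.5072/FK2/JXYBJS"))
print(to_pid("https://doi.org/10.7910/DVN/MPU019"))
```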

Contributor Author

updated the doc

"identifier": "DVN/MPU019",
"persistentURL": "https://doi.org/10.7910/DVN/MPU019",
"missingFiles": [
"s3://dvn-cloud:298910, jihad_metadata_edited.csv"
Member

Same. Easier parsing would be nice.

Contributor Author

re-formatted the json output:

"missingFiles": [
  {
    "StorageIdentifier": "s3://dvn-cloud:298910",
    "label": "jihad_metadata_edited.csv"
  }
]
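
As a rough illustration of why the object form is easier to consume, the Python sketch below parses both shapes. The field names come from the JSON above; `parse_old` and `parse_new` are hypothetical helper names, and the surrounding response envelope is not shown in this thread, so only a `missingFiles` fragment is used.

```python
import json


def parse_old(entry):
    """Old shape: a single 'storageIdentifier, label' string.

    Splitting on the first ', ' relies on that separator never
    appearing inside the storage identifier.
    """
    storage_id, label = entry.split(", ", 1)
    return storage_id, label


def parse_new(entry):
    """New shape: each value is addressed by key, no string splitting needed."""
    return entry["StorageIdentifier"], entry["label"]


fragment = json.loads("""
{
  "missingFiles": [
    {"StorageIdentifier": "s3://dvn-cloud:298910",
     "label": "jihad_metadata_edited.csv"}
  ]
}
""")

for missing in fragment["missingFiles"]:
    print(parse_new(missing))
```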

Member

Great! Thanks. Do we need the directoryLabel too?

Contributor Author

added directoryLabel
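
With `directoryLabel` available, a client can rebuild the path a file had inside its Dataset. The Python sketch below assumes the key is named `directoryLabel` as in this thread and that it may be absent or empty for files at the top level; that behavior, the helper name `display_path`, and the sample folder `data/raw` are assumptions for illustration.

```python
def display_path(missing_file):
    """Join directoryLabel and label; fall back to the label alone.

    Hypothetical helper. Assumes directoryLabel may be missing, None,
    or empty for files that are not inside a folder.
    """
    directory = missing_file.get("directoryLabel")
    label = missing_file["label"]
    return f"{directory}/{label}" if directory else label


in_folder = {
    "StorageIdentifier": "s3://dvn-cloud:298910",
    "directoryLabel": "data/raw",  # hypothetical folder name
    "label": "jihad_metadata_edited.csv",
}
top_level = {
    "StorageIdentifier": "s3://dvn-cloud:298910",
    "label": "jihad_metadata_edited.csv",
}

print(display_path(in_folder))
print(display_path(top_level))
```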

doc/release-notes/220-harvard-edu-audit-files.md (outdated, resolved)
doc/release-notes/220-harvard-edu-audit-files.md (outdated, resolved)
doc/release-notes/220-harvard-edu-audit-files.md (outdated, resolved)
src/main/java/edu/harvard/iq/dataverse/api/Admin.java (outdated, resolved)
@pdurbin pdurbin changed the title from "audit physical files" to "API for auditing physical files and file metadata" Nov 19, 2024

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:220-audit-physical-files
ghcr.io/gdcc/configbaker:220-audit-physical-files

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

Labels
FY25 Sprint 10 (2024-11-06 - 2024-11-20)
Original size: 30
Type: Feature (a feature request)
Projects
Status: In Review 🔎
3 participants