Allow for physdesc elements that are not extent #1448

Closed
marlo-longley opened this issue Dec 8, 2023 · 4 comments
@marlo-longley

For example:
- physfacet
- dimensions

This ticket covers indexing, display, and formatting.

This ticket is broken out from #898

@seanaery

seanaery commented Dec 8, 2023

This likely warrants some community input on what approach we should take in core.

At Duke, we would be happy to PR our solution into core if the community finds this applicable. Our current strategy is to capture all values from within <physdesc>, whether the text sits directly in <physdesc> or within child elements, e.g., <extent>, <dimensions>, or <physfacet>. Because <extent> has its own special rules, e.g., for display in extent badges, we omit any extent values from Physical Description on the display end, and then display the rest under a Physical Description header.

Collection-level indexing:

# DUL CUSTOMIZATION: Capture text in physical description; separate values for text directly
# in physdesc vs. in individual child elements e.g., <extent>, <dimensions>, or <physfacet>
# NOTE arclight 1.0.x currently only captures extent from physdesc & ignores the rest.
to_field 'physdesc_tesim', extract_xpath('/ead/archdesc/did/physdesc/child::*')
to_field 'physdesc_tesim', extract_xpath('/ead/archdesc/did/physdesc[not(child::*)]')

Component-level indexing:

to_field 'physdesc_tesim', extract_xpath('./did/physdesc/child::*')
to_field 'physdesc_tesim', extract_xpath('./did/physdesc[not(child::*)]')

solr_document.rb

# DUL CUSTOMIZATION: we want the physdesc accessor to return the physdesc values
# that are not extent values.
def physdesc
  all_physdesc = fetch('physdesc_tesim', [])
  all_extent = fetch('extent_ssm', [])
  all_physdesc - all_extent
end
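
To make the subtraction concrete, here is a minimal sketch of that accessor run against a hash-backed stand-in. `SketchDocument` and the sample values are hypothetical and exist only for illustration; only the body of `physdesc` is taken from the snippet above. It is not the real SolrDocument class.

```ruby
# Hypothetical stand-in for a Solr document; only `fetch` is modeled.
class SketchDocument
  def initialize(fields)
    @fields = fields
  end

  # Returns the stored values for a field, or the default when absent.
  def fetch(key, default)
    @fields.fetch(key, default)
  end

  # Same logic as the accessor quoted above: physdesc values minus extents.
  def physdesc
    all_physdesc = fetch('physdesc_tesim', [])
    all_extent = fetch('extent_ssm', [])
    all_physdesc - all_extent
  end
end

# Sample values are invented for this sketch.
doc = SketchDocument.new({
  'physdesc_tesim' => ['5 boxes', '12 x 18 inches', 'Black-and-white prints'],
  'extent_ssm' => ['5 boxes']
})
puts doc.physdesc.inspect
```

One property worth keeping in mind: `Array#-` removes only elements that are equal as strings, so an extent value that differs from its `physdesc_tesim` counterpart (for example in whitespace) would not be subtracted.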

Then display it via the catalog_controller.rb...

config.add_summary_field ...
config.add_component_field ...

Search in our codebase for physdesc

@randalldfloyd

randalldfloyd commented Dec 8, 2023

Our customizations at IU assume that all child elements are excluded while indexing physdesc, and that the indexing/display of those children is handled individually elsewhere. So our indexing for collection and component skips the individual elements like:

to_field 'physdesc_ssm', extract_xpath('/ead/archdesc/did/physdesc', to_text: false) do |_record, accumulator|
  accumulator.map! do |element|
    physdesc = []
    element.children.map do |child|
      next if child.class == Nokogiri::XML::Element
...

And then lots of logic scattered about to pick up the child elements...

(collection)
to_field 'physfacet_ssm', extract_xpath('/ead/archdesc/did/physdesc/physfacet')
to_field 'dimensions_ssm', extract_xpath('/ead/archdesc/did/physdesc/dimensions')
to_field 'extent_ssm', extract_xpath('/ead/archdesc/did/physdesc', to_text: false) do
    (logic to pick out just the extent elements...)
...
(component)
to_field 'physfacet_ssm', extract_xpath('./did/physdesc/physfacet')
to_field 'dimensions_ssm', extract_xpath('./did/physdesc/dimensions')
to_field 'extent_ssm', extract_xpath('./did/physdesc', to_text: false) do
    (logic to pick out just the extent elements...)

And then the individual fields above are used directly in the catalog controller to add to summary and component fields.

See:
Collection indexing:
https://github.com/IUBLibTech/ngao/blob/main/lib/ngao/traject/ead2_config.rb#L213-L269

Component indexing:
https://github.com/IUBLibTech/ngao/blob/main/lib/ngao/traject/ead2_config.rb#L472-L528

Collection catalog field definitions:
https://github.com/IUBLibTech/ngao/blob/main/app/controllers/catalog_controller.rb#L285
https://github.com/IUBLibTech/ngao/blob/main/app/controllers/catalog_controller.rb#L301-L302
https://github.com/IUBLibTech/ngao/blob/main/app/controllers/catalog_controller.rb#L307

Component catalog field definitions:
https://github.com/IUBLibTech/ngao/blob/main/app/controllers/catalog_controller.rb#L349
https://github.com/IUBLibTech/ngao/blob/main/app/controllers/catalog_controller.rb#L361-L362
https://github.com/IUBLibTech/ngao/blob/main/app/controllers/catalog_controller.rb#L372

@randalldfloyd randalldfloyd self-assigned this Dec 8, 2023
@randalldfloyd

@seanaery Just making sure I understand your implementation correctly...

You are essentially just getting the text only out of physdesc directly and then also the text out of its child elements, but then the accessor subtracts any values found in the separate extent field. The catalog controller receives the resulting array of text values and applies formatting per the defined summary/component field definitions (the separator etc.)

So in the end you have a single field in the display, and all values are contained there with no inner labeling of where those values originally came from (i.e. physfacet vs. dimensions etc.)

The difference with IU is direct access of those child elements, which are labeled separately. Just wondering how to resolve questions of granularity like this for core default behaviors.
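
The granularity difference can be sketched in plain Ruby. This is an illustration only: the field names come from the snippets in this thread, while the sample values, the display labels, and the helper names `merged_rows` and `granular_rows` are hypothetical and are not taken from either codebase.

```ruby
# Hypothetical display labels for the per-element fields.
PER_ELEMENT_LABELS = {
  'physfacet_ssm' => 'Physical Facet',
  'dimensions_ssm' => 'Dimensions'
}.freeze

# Merged style: one display row; the source element of each value is not kept.
def merged_rows(doc)
  values = doc.fetch('physdesc_tesim', []) - doc.fetch('extent_ssm', [])
  values.empty? ? [] : [['Physical Description', values]]
end

# Granular style: one labeled display row per child element field.
def granular_rows(doc)
  PER_ELEMENT_LABELS.filter_map do |field, label|
    values = doc.fetch(field, [])
    [label, values] unless values.empty?
  end
end

# Invented sample record carrying both field layouts side by side.
sample = {
  'physdesc_tesim' => ['5 boxes', 'Black-and-white prints', '12 x 18 inches'],
  'extent_ssm' => ['5 boxes'],
  'physfacet_ssm' => ['Black-and-white prints'],
  'dimensions_ssm' => ['12 x 18 inches']
}

puts merged_rows(sample).inspect
puts granular_rows(sample).inspect
```

With the merged shape a single field definition covers everything, whereas the granular shape needs one field definition per element but lets each be labeled, ordered, or hidden on its own.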

@seanaery

That's exactly correct @randalldfloyd -- that's how ours currently works (this was a change we made in our recent upgrade). The way you have physdesc & descendants set up at IU seems more appropriate for the core engine than our setup is, I think.

I'm tagging @mmmmcode for her thoughts on this as well. Your IU code might even be preferable for Duke's site. It does give you more granular control over those different elements...
