Data Import log investigation #302

Open
ahafele opened this issue Apr 17, 2024 · 3 comments

ahafele commented Apr 17, 2024

Jeanette and Yael have reported discrepancies in data import loads. The record counts in the logs are inconsistent, and some records error on one load but load fine on a subsequent one. Other records are skipped altogether, which you only discover by scanning the data import log, and those same records also load fine on a subsequent attempt. I'm including information here about one particular file in the hope that @shelleydoljack can look at the logs for clues. Other problem loads are being tracked here.

Springer file troubleshooting
811 records in file
Profile should create new or update if match is found

Production loads (Nolana)
Initial load
https://folio.stanford.edu/data-import/job-summary/51fe7eb9-1536-4501-b775-de1ddae6864f

Resent from VMA to prod
Main log - 500
389 records found
https://folio.stanford.edu/data-import/job-summary/5796cbe6-aef0-4a76-97b9-9933a2123270
Main log - 311
264 records found
https://folio.stanford.edu/data-import/job-summary/4477b856-9c7c-4faf-badd-84f18d53c169
653 total
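
For reference, a quick sketch of the arithmetic behind the split load, comparing what was sent with what the job summaries report as found. The function name `unaccounted` is hypothetical and only for illustration; the figures are the ones listed above.

```python
def unaccounted(expected, found_per_job):
    """Return how many records the job summaries do not account for."""
    return expected - sum(found_per_job)


# Figures from the two split jobs above (main log 500 and 311).
sent = [500, 311]
found = [389, 264]

print(sum(sent))                      # 811, matches the records in the file
print(sum(found))                     # 653
print(unaccounted(sum(sent), found))  # 158 records not accounted for
```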

Resent direct to data import
Main log - 811
665 records found
Scanning the log, the record numbers jump from 356 to 359, indicating that 357 and 358 were skipped. Any clues in the backend logs as to why? Record 357 loaded fine in the previous load.
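
Rather than scanning the log by eye, a gap like the jump from 356 to 359 could be found with a small script. This is only a sketch: it assumes the record sequence numbers have already been copied out of the job log into a list of integers (how to export them is not covered here), and `find_skipped` is a hypothetical helper name.

```python
def find_skipped(sequence_numbers):
    """Return sequence numbers missing between the lowest and highest seen."""
    seen = set(sequence_numbers)
    if not seen:
        return []
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]


# Illustrative input shaped like the jump described above.
print(find_skipped([355, 356, 359, 360]))  # [357, 358]
```

Note that this only reports gaps inside the observed range. Records missing from the very start or end of the log would need a comparison against the expected total (811 for this file).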

Errors show as io.vertx.core.impl.NoStackTraceThrowable: Timeout in the DI log. Based on a Slack conversation I saw, I checked whether an errored/discarded record had actually been updated, and it had: #292 Practice and theory of automated timetabling V… https://folio.stanford.edu/data-import/job-summary/f5a82d0b-ebe3-435f-8dae-947ebc5c882b?errorsOnly=true. The instance was last updated at 12:48 pm, which corresponds to the time of this load.
When we tried to view the source, the record "broke" and is now disconnected from the MARC. This corresponds to what I saw in this Slack thread from 2 years ago.

Test loads (Poppy)
Initial load https://folio-test.stanford.edu/data-import/job-summary/39ced836-cbbd-41a9-b6d1-a15be0f2856d
811 records found
Resent no issues - https://folio-test.stanford.edu/data-import/job-summary/adf6e4d5-7e5a-4788-9ff7-c2d4bd75ba50
811 records found

@shelleydoljack
Contributor

The query select * from sul_mod_source_record_storage.records_lb where external_id = '11bf23c4-82a8-4466-96cf-993a81310a3c'; returns 0 results. I then looked in MetaDB for the instance ID in the folio_source_record.marc__t table, which also returned 0 results. I don't think this one has a source record. Oh yeah, it's in my list of instances missing SRS on ticket sul-dlss/libsys-airflow#838.

@shelleydoljack shelleydoljack removed their assignment Apr 24, 2024
@shelleydoljack
Contributor

As for the skipping in the log, where the record numbers go from 356 to 359: what happened to 357 and 358? https://folio.stanford.edu/data-import/job-summary/f5a82d0b-ebe3-435f-8dae-947ebc5c882b Is this maybe where a duplicate record was in the input file? 🤷

@ahafele
Author

ahafele commented Apr 24, 2024

No duplicates in the file from what I can tell.
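
To double-check for duplicates, something like the following could be run over the record identifiers in the file. This is a sketch under assumptions: it expects the identifiers (for example, each record's control number) to have already been extracted into a list, and both `find_duplicates` and the sample values are hypothetical.

```python
from collections import Counter


def find_duplicates(identifiers):
    """Return a dict of identifiers that occur more than once, with counts."""
    counts = Counter(identifiers)
    return {ident: n for ident, n in counts.items() if n > 1}


# Made-up identifiers, only to show the shape of the result.
print(find_duplicates(["a1", "b2", "a1", "c3"]))  # {'a1': 2}
```

An empty result would support the conclusion that the skipped records are not explained by duplicates in the input.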
