Add support for searching within/across IIIF annotations for Japanese language transcription #201

caaster · 2019-07-31T23:15:14Z

Use cases are drawn from the SUL Text Search Study Report (July 2019). Please note: these use cases likely overlap in their requirements, which need further investigation and specification.

Use case 1:
The Magario diaries include 40 years of handwritten pages in Japanese by donor Steven Yoba, representing a rare instance of trans-Japanese history (Japan + US). The Japanese diary pages have been accessioned as individual images in the SDR. OCR does not work well for Japanese, so Japanese transcriptions for each page were created by hand; these currently exist as individual MS Word documents that have not been accessioned. This is a high-profile collection with broad faculty support. The content should be searchable and, ideally, available for text mining. Curator: Regan Murphy Kao

Use case 2:
The NDC collection comprises Japanese books cataloged by Hoover using the Nippon Decimal Classification system and housed at SAL1/2. The collection, which was transferred to EAL in the early 2000s, contains many rare books related to 20th-century history and was digitized by Google Books years ago. Curator: Regan Murphy Kao

@anarchivist comment:
Our implementation of the IIIF Content Search API does not currently analyze CJK query terms at the level that SearchWorks does. However, the Content Search API itself supports CJK text, and examples of CJK transcription via annotation do exist.
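
To make "analysis for CJK query terms" concrete: Japanese text has no spaces between words, so a search index that splits only on whitespace tends to miss matches. One common technique is to index overlapping two-character tokens (bigrams). The sketch below is illustrative only. It is not a description of the SearchWorks configuration or of our Content Search implementation, and the function names `is_cjk`, `cjk_bigrams`, and `page_matches` are hypothetical.

```python
def is_cjk(ch: str) -> bool:
    """Rough check covering Hiragana, Katakana, and CJK Unified Ideographs."""
    code = ord(ch)
    return (
        0x3040 <= code <= 0x30FF      # Hiragana and Katakana blocks
        or 0x4E00 <= code <= 0x9FFF   # CJK Unified Ideographs block
    )


def cjk_bigrams(text: str) -> list[str]:
    """Split each run of CJK characters into overlapping two-character tokens.

    A run of length one is kept as a single-character token.
    Non-CJK characters only act as run separators in this sketch.
    """
    tokens: list[str] = []
    run: list[str] = []
    for ch in text + " ":  # trailing space flushes the final run
        if is_cjk(ch):
            run.append(ch)
            continue
        if len(run) == 1:
            tokens.append(run[0])
        else:
            tokens.extend(a + b for a, b in zip(run, run[1:]))
        run = []
    return tokens


def page_matches(query: str, page_text: str) -> bool:
    """True if every bigram of the query also occurs in the page text."""
    return set(cjk_bigrams(query)) <= set(cjk_bigrams(page_text))
```

For example, `cjk_bigrams("日本語の日記")` yields the tokens 日本, 本語, 語の, の日, and 日記, so a query for 日記 ("diary") matches that page even though the text contains no word boundaries. A production analyzer would also need to handle script normalization and variant characters, which this sketch ignores.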

Quinn Dombrowski comment:
Dombrowski has experimented with creating page-level Japanese-language OCR files (TXT) for the Magario Family Diaries (described in Use case 1 above). Note that Japanese-language support in Content Search is not the only requirement: remediating the accessioned diary page images with the OCR files and enabling text search for the collection would also require infrastructure development.
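
As a rough illustration of what "remediating" page images with page-level text could involve, the sketch below wraps one page's plain-text transcription in an annotation that targets the page's canvas. It assumes the W3C Web Annotation shape used by IIIF Presentation 3, with the IIIF `supplementing` motivation for transcriptions. The issue does not state which annotation format the project would use, and the function name and canvas URI are hypothetical.

```python
import json


def transcription_annotation(canvas_uri: str, text: str) -> dict:
    """Wrap a page-level Japanese transcription as a Web Annotation dict.

    Assumption: IIIF Presentation 3 conventions, where transcribed text is
    attached to a canvas with the "supplementing" motivation.
    """
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "type": "Annotation",
        "motivation": "supplementing",
        "body": {
            "type": "TextualBody",
            "value": text,
            "language": "ja",
            "format": "text/plain",
        },
        "target": canvas_uri,
    }


# Hypothetical canvas URI and sample text, for illustration only.
annotation = transcription_annotation(
    "https://example.org/iiif/diary/canvas/1",
    "今日は晴れ。",
)

# ensure_ascii=False keeps the Japanese text readable instead of \u escapes.
serialized = json.dumps(annotation, ensure_ascii=False, indent=2)
```

In practice each TXT file would be read as UTF-8 and paired with the canvas for its page image; how that pairing is recorded is part of the infrastructure work noted above.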
