TextDiscovery

A linear-progressive text discovery engine written in C#. Functionality is exposed through simple service APIs. Plain text is broken into a sequence of slices that can be reconstituted as annotated text. A search expression is turned into meta-rich tokens, which are then used to annotate matches in the source text; noise-word detection, tokenization, and matching options are all configurable. A common adapter interface works with interchangeable DOM libraries (Html Agility Pack, AngleSharp, etc.) to mark search hits in the DOM, create HTML excerpts of a given word count with configurable element-breaking rules, and extract text content with selectively preserved formatting indicators. The engine is highly extensible through dependency injection. Regex can be used in advanced configurations but is not required.
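To make the slice idea concrete, here is a minimal sketch of breaking text into slices and reconstituting it with annotations. It is not the TextDiscovery API: `Slice`, `SliceSketch`, `Split`, and `Annotate` are hypothetical names chosen for illustration, and the matching is a plain case-insensitive substring search for a single term, with no tokenization or noise-word handling.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical type for illustration only; not part of TextDiscovery.
public sealed record Slice(string Text, bool IsHit);

public static class SliceSketch
{
    // Walk the source once, left to right, and split it into
    // alternating non-hit and hit slices for a single search term.
    // Assumption: simple case-insensitive substring matching.
    public static List<Slice> Split(string source, string term)
    {
        var slices = new List<Slice>();
        if (string.IsNullOrEmpty(term))
        {
            // Nothing to search for: the whole text is one non-hit slice.
            if (source.Length > 0) slices.Add(new Slice(source, false));
            return slices;
        }

        int pos = 0;
        while (pos < source.Length)
        {
            int hit = source.IndexOf(term, pos, StringComparison.OrdinalIgnoreCase);
            if (hit < 0)
            {
                // No further matches: the remainder is one non-hit slice.
                slices.Add(new Slice(source.Substring(pos), false));
                break;
            }
            if (hit > pos)
            {
                // Text between the previous position and this match.
                slices.Add(new Slice(source.Substring(pos, hit - pos), false));
            }
            // Keep the original casing of the matched text.
            slices.Add(new Slice(source.Substring(hit, term.Length), true));
            pos = hit + term.Length;
        }
        return slices;
    }

    // Reconstitute the slices as annotated text by wrapping each hit
    // in the given markers; non-hit slices are emitted unchanged.
    public static string Annotate(IEnumerable<Slice> slices, string open = "[", string close = "]")
    {
        var sb = new StringBuilder();
        foreach (var s in slices)
        {
            sb.Append(s.IsHit ? open + s.Text + close : s.Text);
        }
        return sb.ToString();
    }
}
```

With this sketch, `SliceSketch.Annotate(SliceSketch.Split("The quick fox. the end.", "the"))` returns `[The] quick fox. [the] end.`. Because the slices cover the source without gaps or overlaps, concatenating their `Text` values always reproduces the original string, which is what allows annotation to be applied at reconstitution time.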