Extracting and Tagging Unstructured Citation of a Hebrew Religious Document

Dror Mughaz, Yaakov HaCohen-Kerner, Dov Gabbay
InSITE 2019  •  2019  •  pp. 461-473
Aim/Purpose: To find and tag citations in ancient Hebrew religious documents. These documents have no structured citations and no bibliography.

Background: We look for common patterns within Hebrew religious texts.

Methodology: We developed a method that scans the texts and extracts sentences containing the names of three famous authors. Within these sentences we identify the common ways of addressing those three authors, and we then use these patterns to find references to various other authors.
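
The two steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the seed names, the transliterated toy text, the two-word context window, and the function names `learn_patterns` and `find_candidates` are all assumptions chosen for illustration. A real system would work on Hebrew text and would have to handle prefixes, abbreviations, and multi-word names.

```python
import re
from collections import Counter

# Hypothetical seed authors; the abstract does not name the three authors used.
SEED_AUTHORS = {"Rambam", "Rashi", "Rashba"}

# Toy transliterated text, invented purely for illustration.
TEXT = (
    "and so wrote Rambam in his laws. and so wrote Rashi there. "
    "as ruled Rashba on this. and so wrote Ramban in his commentary. "
    "as ruled Rosh in a responsum. but the custom is otherwise."
)


def split_sentences(text):
    """Split on periods; a crude stand-in for real sentence segmentation."""
    return [s.strip() for s in re.split(r"\.\s*", text) if s.strip()]


def learn_patterns(sentences, seeds, window=2):
    """Count the `window`-word phrases that directly precede a seed name."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            if word in seeds and i >= window:
                counts[" ".join(words[i - window:i])] += 1
    return counts


def find_candidates(sentences, patterns, seeds):
    """Collect words that follow a learned phrase and are not seed names."""
    found = Counter()
    for sentence in sentences:
        words = sentence.split()
        for pattern in patterns:
            phrase = pattern.split()
            n = len(phrase)
            for i in range(len(words) - n):
                if words[i:i + n] == phrase and words[i + n] not in seeds:
                    found[words[i + n]] += 1
    return found


sentences = split_sentences(TEXT)
patterns = learn_patterns(sentences, SEED_AUTHORS)
candidates = find_candidates(sentences, patterns, SEED_AUTHORS)
print(sorted(patterns))    # phrases learned from the seed authors
print(sorted(candidates))  # new author candidates found with those phrases
```

In this toy run the phrases learned around the seed names are reused to surface two names that were never in the seed list. Sentences that contain no learned phrase contribute nothing, which mirrors the limitation that some types of citations are missed.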

Contribution: This type of text is rich in citations and references to authors, but because the references have no structure it is very difficult for a computer to identify them automatically. We hope that the method we have developed will make it easier for a computer to identify references and even turn them into hyperlinks.

Findings: We have provided an algorithm to solve the problem of non-structured citations in an old Hebrew plain text. The algorithm found many citations, but it missed some types of citations.

Impact on Society: When the computer recognizes references, it will be able to build (at least partially) a bibliography that currently does not exist in such texts at all. Over time, more and more ancient texts are being scanned with OCR. This method can make access to these texts, and understanding of them, much easier.

Future Research: After we identify the references, we plan to automatically create a bibliography for these texts and even transform those references into hyperlinks.
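
As a rough illustration of this planned step, the sketch below wraps already-identified author names in HTML links and collects them into a simple bibliography list. It is only a sketch under stated assumptions: the function name `link_references`, the name-to-URL mapping, and the `example.org` address are hypothetical, and the names are assumed to be plain single words.

```python
import html
import re


def link_references(text, author_urls):
    """Wrap known author names in <a> tags and return (linked_text, bibliography).

    `author_urls` is a hypothetical mapping from author name to target URL.
    """
    # Try longer names first so that a short name never shadows a longer one.
    names = sorted(author_urls, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    cited = set()

    def replace(match):
        name = match.group(1)
        cited.add(name)
        url = html.escape(author_urls[name], quote=True)
        return '<a href="{}">{}</a>'.format(url, html.escape(name))

    # Escape the plain text first so the output is safe to embed in HTML.
    linked = pattern.sub(replace, html.escape(text))
    return linked, sorted(cited)


# Toy input; the URL is a placeholder, not a real resource.
urls = {"Ramban": "https://example.org/authors/ramban"}
linked, bibliography = link_references("and so wrote Ramban in his commentary", urls)
print(linked)
print(bibliography)
```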
Keywords: citations, text-mining, information extraction, Hebrew
