EXPERIENCE OF USING ARTIFICIAL INTELLIGENCE IN WORKING WITH MEDIEVAL MANUSCRIPTS

Maksym Voloshchuk; Bohdana Zarembovska

doi:10.15330/gal.39.171-179

Authors

Maksym Voloshchuk https://orcid.org/0009-0005-4950-6234
Bohdana Zarembovska https://orcid.org/0009-0002-7673-5970

Keywords:

paleography, artificial intelligence, computer vision, handwritten text recognition, machine learning.

Abstract

The article presents principles and approaches for applying artificial intelligence (AI) to handwritten historical documents. As machine learning and computer vision have developed, automated systems for analyzing, structuring, and recognizing text in scanned documents have become increasingly widespread, as a substantial body of recent scholarly research shows, and such tools are becoming standard practice in large-scale archival projects. Despite the number of tools available, however, most existing solutions are oriented primarily toward documents of the modern period. The automated processing of medieval manuscripts remains insufficiently studied because of the variability of handwriting and the physical deterioration of the materials.

In this article, we analyze current approaches to applying machine learning to the recognition of handwritten historical texts, in particular methods for detecting and segmenting the structural elements of documents. We also propose our own computer system for processing Latin-language documents of the Carolingian and Ottonian periods (9th–11th centuries). Documents of this period are especially difficult because of the features of Carolingian minuscule: numerous ligatures, ascenders and descenders, and characteristic medieval abbreviations, all of which pose serious challenges for standard OCR algorithms. At the same time, these documents have high source value for historical and paleographic research: they record early forms of administrative, legal, and written practice in Western Europe and reflect key stages in the formation of the medieval documentary tradition.

The proposed system is modular and consists of four interconnected machine learning models, each with a specific role in the document processing pipeline. It detects text lines and then words step by step, and recognizes the words by combining different models designed to identify visually and syntactically similar words. This approach makes recognition more robust when training data are limited and adapts better to the specific characteristics of medieval handwriting.
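The modular design described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the models, so the split into four stages (line detection, word detection, and two recognizers), the names `Pipeline`, `recognize_visual`, and `recognize_lexical`, the box format, and the rule of keeping the more confident candidate are all assumptions made for illustration. The stages in the demo are stubs that return fixed values in place of trained models.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical conventions, not taken from the article.
Box = Tuple[int, int, int, int]   # (x, y, width, height) in page pixels
Candidate = Tuple[str, float]     # (transcription, confidence in [0, 1])


@dataclass
class Pipeline:
    """Four pluggable stages; each one stands in for a trained model."""
    detect_lines: Callable[[object], List[Box]]
    detect_words: Callable[[object, Box], List[Box]]
    recognize_visual: Callable[[object, Box], Candidate]
    recognize_lexical: Callable[[object, Box], Candidate]

    def transcribe(self, page: object) -> List[List[str]]:
        """Return the page as a list of lines, each a list of words."""
        lines: List[List[str]] = []
        for line_box in self.detect_lines(page):
            words: List[str] = []
            for word_box in self.detect_words(page, line_box):
                visual = self.recognize_visual(page, word_box)
                lexical = self.recognize_lexical(page, word_box)
                # Assumed combination rule: keep the more confident
                # candidate; on a tie the visual one wins.
                best = max((visual, lexical), key=lambda c: c[1])
                words.append(best[0])
            lines.append(words)
        return lines


def demo_pipeline() -> Pipeline:
    """Build a pipeline from stub stages with hard-coded outputs."""
    return Pipeline(
        detect_lines=lambda page: [(0, 0, 100, 20)],
        detect_words=lambda page, line: [(0, 0, 40, 20), (50, 0, 50, 20)],
        recognize_visual=lambda page, box: (
            ("in", 0.9) if box[0] == 0 else ("nornine", 0.4)
        ),
        recognize_lexical=lambda page, box: (
            ("m", 0.3) if box[0] == 0 else ("nomine", 0.8)
        ),
    )


if __name__ == "__main__":
    print(demo_pipeline().transcribe(page=None))
```

Keeping each stage behind a plain callable means that one model can be retrained or replaced without touching the others, which is one plausible reading of the modularity the abstract describes.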

References

Aguilar S. (2025). From Codicology to Code: A Comparative Study of Transformer and YOLO-based Detectors for Layout Analysis in Historical Documents. DOI: 10.48550/arXiv.2506.20326 (in English).

Copeland B. J. Artificial intelligence. URL: https://www.britannica.com/technology/artificial-intelligence (in English).

Boillet M., Kermorvant Ch., Paquet Th. (2022). Robust text line detection in historical documents: learning and evaluation methods. International Journal on Document Analysis and Recognition (IJDAR). Vol. 25. P. 1–20. DOI: 10.1007/s10032-022-00395-7 (in English).

Stryker C., Kavlakoglu E. What is AI? URL: https://www.ibm.com/think/topics/artificial-intelligence (in English).

Simistira F., Seuret M., Eichenberger N., Garz A., Liwicki M., Ingold R. (2016). DIVA-HisDB: A Precisely Annotated Large Dataset of Challenging Medieval Manuscripts. International Conference on Frontiers in Handwriting Recognition. P. 471–476 (in English).

NASA. What is Artificial Intelligence? URL: https://www.nasa.gov/what-is-artificial-intelligence (in English).

Aguilar S. T., Jolivet V. (2022). Handwritten Text Recognition for Documentary Medieval Manuscripts. HAL: hal-03892163v1 (in English).

Clérice Th., Pinche A., Vlachou-Efstathiou M., Chagué A., Camps J.-B. et al. (2024). CATMuS Medieval: A multilingual large-scale cross-century dataset in Latin script for handwritten text recognition and beyond. HAL: hal-04453952 (in English).

Voloshchuk M., Zarembovska B. (2024). Vykorystannia shtuchnoho intelektu (mashynnoho navchannia) dlia rozchytky serednovichnykh istorychnykh dokumentiv. Students’ki istorychni zoshyty. Vol. 16. P. 116–125 (in Ukrainian).
