The Open Islamicate Texts Initiative (OpenITI) is a multi-institutional effort to develop the digital infrastructure for the study of Islamicate cultures. It is led by researchers at the Aga Khan University’s Institute for the Study of Muslim Civilisations in London, the Roshan Institute for Persian Studies at the University of Maryland, College Park, and Universität Hamburg.
Since its founding in 2016, OpenITI's work has focused on the tasks necessary to build digital capacity in Islamicate studies, including improving Arabic-script optical character recognition (OCR) and handwritten text recognition (HTR), developing robust Arabic-script standards for OCR and HTR output and text encoding, and creating platforms for the collaborative creation of Islamicate text corpora and digital editions.
Our primary work on OCR and HTR (done in collaboration with researchers from Northeastern University, Université Paris Sciences et Lettres, and the University of California, San Diego, among others) has produced the most accurate results to date on Arabic-script texts (see relevant publications). Most importantly, these results were achieved with the open-source OCR and HTR engine Kraken, which is retrainable and can be adapted for precise scholarly needs. This work has been funded by The Andrew W. Mellon Foundation, the National Endowment for the Humanities, and the National Science Foundation. For more information, please see the project pages for the OpenITI Arabic-script OCR Catalyst Project, Phase One; the OpenITI Arabic-script OCR Catalyst Project, Phase Two; Automatic Collation for Diversifying Corpora; and the Textual Lacunae Reconstruction Tool.
OpenITI's secondary focus comes out of our OCR and HTR work: we want to create a machine-actionable and standards-compliant scholarly corpus of Islamicate texts, covering an ever-increasing number of Persian, Arabic, Ottoman Turkish, and Urdu works. We will make these works available in a variety of formats (plaintext, OpenITI mARkdown, TEI XML) and enrich them with as much verified metadata as possible. Please see the OpenITI corpus project page for more information.