*Open Islamicate Texts Initiative Arabic-script OCR Catalyst Project* (*OpenITI AOCP*) Phase Two

Funded through two grants from The Andrew W. Mellon Foundation, Phase One of the Open Islamicate Texts Initiative Arabic-script OCR Catalyst Project (OpenITI AOCP) was the first undertaking of its kind to tackle the technical and organizational barriers that have historically stymied the development of Arabic-script OCR and digital text production for Islamicate Studies.

For full detail on the first phase of OpenITI AOCP, please see this article.

Phase II of OpenITI AOCP brings together a highly interdisciplinary team of experts in Islamicate studies, digital humanities, and computer science to build textual resources for the digital study of the Islamicate world and transform open-source optical character recognition (OCR)/handwritten text recognition (HTR) technology for all languages. This project is led by investigators from the Roshan Institute for Persian Studies and Maryland Institute for Technology in the Humanities at the University of Maryland; Northeastern University’s (NU) NULab for Texts, Maps, and Networks; University of California, San Diego; and the Centre for Digital Humanities at the Aga Khan University’s Institute for the Study of Muslim Civilisations (London).

OpenITI AOCP Phase II will build on the considerable successes of the Phase I project in piloting corpus production for Persian and Arabic and advancing OCR character accuracy rates (CARs) on the most common typefaces in Persian and Arabic print history. In Phase II, the OpenITI AOCP team will dramatically expand the size of OpenITI’s Persian and Arabic corpus through large-scale OCR work; extend the linguistic capabilities of OpenITI’s OCR tools into Ottoman Turkish and Urdu; transform its open-source OCR/HTR pipeline by incorporating newly developed unsupervised machine learning tools into its workflow (including into its user-friendly interface, eScriptorium); build Islamicate manuscript HTR workflows for individual scholars and institutions; and convene an expert workshop to critically assess the ethical and technological issues for next-generation digital text dissemination.