*Open Islamicate Texts Initiative Arabic-script OCR Catalyst Project* (*OpenITI AOCP*) Phase Two

Funded through two grants from The Andrew W. Mellon Foundation, Phase One of the Open Islamicate Texts Initiative Arabic-script OCR Catalyst Project (OpenITI AOCP) was the first undertaking of its kind to tackle the technical and organizational barriers that historically have stymied the development of Arabic-script OCR and digital text production for Islamicate Studies.

For full detail on the first phase of OpenITI AOCP, please see this article.

Phase II of OpenITI AOCP brings together a highly interdisciplinary team of experts in Islamicate studies, digital humanities, and computer science to build textual resources for the digital study of the Islamicate world and transform open-source optical character recognition (OCR)/handwritten text recognition (HTR) technology for all languages. This project is led by investigators from the Roshan Institute for Persian Studies and Maryland Institute for Technology in the Humanities at the University of Maryland; Northeastern University’s (NU) NULab for Texts, Maps, and Networks; University of California, San Diego; and the Centre for Digital Humanities at the Aga Khan University’s Institute for the Study of Muslim Civilisations (London).

OpenITI AOCP Phase II will build on the considerable successes of the Phase I project in piloting corpus production for Persian and Arabic and advancing OCR character accuracy rates (CARs) on the most common typefaces in Persian and Arabic print history. In Phase II, the OpenITI AOCP team will dramatically expand the size of OpenITI’s Persian and Arabic corpus through large-scale OCR work in both languages; extend the linguistic capabilities of OpenITI’s OCR tools to Ottoman Turkish and Urdu; transform its open-source optical character recognition (OCR)/handwritten text recognition (HTR) pipeline by incorporating newly developed unsupervised machine learning tools into its workflow (including its user-friendly interface, eScriptorium); build HTR workflows for Islamicate manuscripts tailored to individual scholars and institutions; and convene an expert workshop to critically assess the ethical and technological issues raised by next-generation digital text dissemination.

Funding and Project Duration: $1,749,722.00 from July 2022 to June 2025.

Workshops: OpenITI AOCP Phase II convened one workshop at the University of Maryland, College Park. For more information, see the program for the workshop here.

Primary Project Personnel

Anjum Alam

Programme Manager, Institute for the Study of Muslim Civilisations, Aga Khan University, London

Jonathan Parkes Allen

Postdoctoral Research Associate, Roshan Institute for Persian Studies, University of Maryland, College Park

Mathew Barber

Research and Data Visualization Specialist, Institute for the Study of Muslim Civilisations, Aga Khan University, London; Data Visualization, KITAB project

Taylor Berg-Kirkpatrick

Associate Professor of Computer Science, Department of Computer Science and Engineering, University of California, San Diego

Osama Eshera

Assistant Research Professor, Roshan Institute for Persian Studies, University of Maryland, College Park

Mary Hoppe

Digital Projects Assistant, Hill Museum and Manuscript Library

Matthew Thomas Miller

Assistant Professor of Persian Literature & Digital Humanities, Roshan Institute for Persian Studies, University of Maryland, College Park; Director, Roshan Initiative in Persian Digital Humanities; Affiliate, Maryland Institute for Technology in the Humanities

John Mullan

Faculty Assistant, Roshan Institute for Persian Studies, University of Maryland, College Park; Digital Specialist, Roshan Initiative in Persian Digital Humanities

Jacob Murel

Postdoctoral Research Associate, Khoury College of Computer Sciences, Northeastern University

Ryan Muther

Doctoral Candidate, Khoury College of Computer Sciences, Northeastern University

Lorenz Nigst

Assistant Professor of Digital Humanities, Institute for the Study of Muslim Civilisations, Aga Khan University, London; Corpus Management, KITAB project

Janny Peng

Assistant Director for Finance and Administration, School of Languages, Literatures, and Cultures, University of Maryland, College Park

Sarah Bowen Savant

Professor of History, Institute for the Study of Muslim Civilisations, Aga Khan University, London; Principal Investigator, KITAB project

Masoumeh Seydi

Digital Lead, KITAB project

Taimoor Shahid

Mellon Islamicate Digital Humanities Postdoctoral Associate, Roshan Institute for Persian Studies, University of Maryland, College Park

Farrukh Shahzad

Chief Librarian, Forman Christian College University

David Smith

Associate Professor, Khoury College of Computer Sciences, Northeastern University; Founding Member, NULab for Texts, Maps, and Networks

Raffaele Viglianti

Senior Research Software Developer, Maryland Institute for Technology in the Humanities, University of Maryland, College Park

Nikolai Vogler

Doctoral Candidate, Department of Computer Science and Engineering, University of California, San Diego

OpenITI AOCP Phase II Partner Projects

Carl Ernst

Professor Emeritus & William R. Kenan, Jr. Distinguished Professor of Islamic Studies, Department of Religious Studies, University of North Carolina at Chapel Hill; Principal Investigator, Omar ibn Said Digitization Project

Wayne Graham

Chief Information Officer and Director of Informatics, Cultural Networks, and Knowledge Systems, Council on Library and Information Resources; Technical Lead, Digital Library of the Middle East

Intisar A. Rabb

Professor of Law, Harvard Law School; Professor of History, Harvard University; Director, Program in Islamic Law

Marina Rustow

Khedouri A. Zilkha Professor of Jewish Civilization in the Near East, Princeton University; Director, Princeton Geniza Lab