*Open Islamicate Texts Initiative Arabic-script OCR Catalyst Project* (*OpenITI AOCP*) Phase One

Funded through two grants from The Andrew W. Mellon Foundation, Phase One of the Open Islamicate Texts Initiative Arabic-script OCR Catalyst Project (OpenITI AOCP) is the first undertaking of its kind to tackle the technical and organizational barriers that have historically stymied the development of Arabic-script OCR and digital text production for Islamicate Studies.

OpenITI AOCP is led by an interdisciplinary team of co-principal investigators in the humanities, computer science, and digital humanities, drawn from the Roshan Institute for Persian Studies at the University of Maryland, College Park; Northeastern University’s NULab for Texts, Maps, and Networks; the Aga Khan University’s Institute for the Study of Muslim Civilisations in London; and the Maryland Institute for Technology in the Humanities at the University of Maryland, College Park. For the technical development portion of the project, we are proud to partner with the SHARIAsource project of the Program in Islamic Law at Harvard Law School and the eScripta project of Université Paris Sciences et Lettres.

The primary technical goal of the first phase of OpenITI AOCP is to achieve ≥97% character accuracy rates (CARs) for OCR on the most used Persian and Arabic print typefaces. We are well on our way to achieving this goal, and we have begun preliminary testing and training data production for Ottoman Turkish and Urdu typefaces as well. Our latest OCR CAR test results can be found here.
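
For readers unfamiliar with the metric, the sketch below shows one common way a character accuracy rate is computed: one minus the character-level edit distance between the OCR output and a ground-truth transcription, divided by the length of the ground truth. This is an illustration under stated assumptions, not a description of the project's evaluation code; the function names, the NFC normalization step, and the sample strings are all hypothetical.

```python
import unicodedata


def levenshtein(reference: str, hypothesis: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_accuracy_rate(reference: str, hypothesis: str) -> float:
    """CAR = 1 - (edit distance / reference length).

    Assumption: both strings are NFC-normalized first so that visually
    identical Arabic-script sequences compare as equal. The project's
    actual normalization choices are not stated in the text above.
    """
    reference = unicodedata.normalize("NFC", reference)
    hypothesis = unicodedata.normalize("NFC", hypothesis)
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return 1.0 - levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    ground_truth = "کتاب"   # hypothetical ground-truth line
    ocr_output = "کتاب"     # hypothetical OCR output for the same line
    print(f"CAR: {character_accuracy_rate(ground_truth, ocr_output):.2%}")
```

Under this definition, a CAR of 97% or higher means at most three character errors per hundred characters of ground truth. Note that the value can fall below zero when the OCR output is much longer than the reference.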

The second major deliverable of OpenITI AOCP is an open-source and user-friendly digital text production pipeline for Persian and Arabic texts. After initially partnering with SHARIAsource to develop our own pipeline, CorpusBuilder, we decided to join the eScripta team in expanding eScriptorium.

OpenITI’s beta testing instance of eScriptorium can be found here. Please contact John Mullan if you are interested in being a part of our user testing groups.

For full detail on the first phase of OpenITI AOCP, please see this article.

Funding and Project Duration: $799,533.00 from July 2019 to June 2021, with an additional $100,000.00 in bridge funding from July 2021 to June 2022.

Workshops: Phase One of OpenITI AOCP convened two workshops at the University of Maryland, College Park. For more information, see the program for the first workshop and the program for the second workshop.

Primary Project Personnel

Şaban Ağalar

Doctoral Candidate, Department of History, University of Maryland, College Park; Graduate Fellow, Roshan Institute for Persian Studies

Jonathan Parkes Allen

Mellon Post-Doctoral Fellow, Roshan Institute for Persian Studies, University of Maryland, College Park; Acting Assistant Director, OpenITI AOCP project

Matthew Thomas Miller

Assistant Professor of Persian Literature & Digital Humanities, Roshan Institute for Persian Studies, University of Maryland, College Park; Director, Roshan Initiative in Persian Digital Humanities; Affiliate, Maryland Institute for Technology in the Humanities

John Mullan

Faculty Assistant, Roshan Institute for Persian Studies, University of Maryland, College Park; Digital Specialist, Roshan Initiative in Persian Digital Humanities

Ryan Muther

Doctoral Candidate, Khoury College of Computer Sciences, Northeastern University

Mehdy Sedaghat Payam

Doctoral Candidate, Department of English, University of Maryland, College Park; Graduate Fellow, Roshan Institute for Persian Studies

Janny Peng

Assistant Director for Finance and Administration, School of Languages, Literatures, and Cultures, University of Maryland, College Park

Sarah Bowen Savant

Professor of History, Institute for the Study of Muslim Civilisations, Aga Khan University, London; Principal Investigator, KITAB project

Masoumeh Seydi

Digital Lead, KITAB project

David Smith

Associate Professor, Khoury College of Computer Sciences, Northeastern University; Founding Member, NULab for Texts, Maps, and Networks

Alejandro Toselli

Associate Research Scientist, Khoury College of Computer Sciences, Northeastern University

Raffaele Viglianti

Research Programmer, Maryland Institute for Technology in the Humanities, University of Maryland, College Park

Advisory Board

Bridget Almas

Director of Data Innovation Strategy, State University of New York

Arezou Azad

Senior Research Fellow, Oriental Institute, University of Oxford; Programme Director, Invisible East

Gregory Crane

Alexander von Humboldt Professor of Digital Humanities, Humboldt Chair of Digital Humanities, Universität Leipzig; Professor of Classics, Tufts University

Ahmet T. Karamustafa

Professor of History, University of Maryland, College Park

Fatemeh Keshavarz

Roshan Institute Chair in Persian Language & Literature, Roshan Institute for Persian Studies, University of Maryland, College Park

Intisar A. Rabb

Professor of Law, Harvard Law School; Professor of History, Harvard University; Director, Program in Islamic Law

Marina Rustow

Khedouri A. Zilkha Professor of Jewish Civilization in the Near East, Princeton University; Director, Princeton Geniza Lab