Funded through two grants from The Andrew W. Mellon Foundation, Phase One of the Open Islamicate Texts Initiative Arabic-script OCR Catalyst Project (OpenITI AOCP) is the first undertaking of its kind to tackle the technical and organizational barriers that have historically stymied the development of Arabic-script OCR and digital text production for Islamicate Studies.
OpenITI AOCP is led by an interdisciplinary team of humanities, computer science, and digital humanities co-principal investigators from the Roshan Institute for Persian Studies at the University of Maryland, College Park; Northeastern University’s NULab for Texts, Maps, and Networks; the Aga Khan University’s Institute for the Study of Muslim Civilisations in London; and the Maryland Institute for Technology in the Humanities at the University of Maryland, College Park. We are proud to partner with the SHARIAsource project of the Program in Islamic Law at Harvard Law School and the eScripta project of Université Paris Sciences et Lettres on the technical development portion of the project.
The primary technical goal of the first phase of OpenITI AOCP is to achieve character accuracy rates (CARs) of ≥97% for OCR on the most widely used Persian and Arabic print typefaces. We are well on our way to achieving this goal, and we have also begun preliminary testing and training data production for Ottoman Turkish and Urdu typefaces. Our latest OCR CAR test results can be found here.
The second major deliverable of OpenITI AOCP is an open-source and user-friendly digital text production pipeline for Persian and Arabic texts. After initially partnering with SHARIAsource to develop our own pipeline, CorpusBuilder, we decided to join the eScripta team in expanding eScriptorium.
OpenITI’s beta testing instance of eScriptorium can be found here. Please contact John Mullan if you are interested in being a part of our user testing groups.
For full detail on the first phase of OpenITI AOCP, please see this article.
Funding and Project Duration: $799,533.00 from July 2019 to June 2021, with an additional $100,000.00 in bridge funding from July 2021 to June 2022.
Workshops: Phase I of OpenITI AOCP convened two workshops at the University of Maryland, College Park. For more information, see the program for the first workshop and the program for the second workshop.
Primary Project Personnel
Şaban Ağalar
Doctoral Candidate, Department of History, University of Maryland, College Park; Graduate Fellow, Roshan Institute for Persian Studies
Jonathan Parkes Allen
Mellon Post-Doctoral Fellow, Roshan Institute for Persian Studies, University of Maryland, College Park; Acting Assistant Director, OpenITI AOCP project
Matthew Thomas Miller
Assistant Professor of Persian Literature & Digital Humanities, Roshan Institute for Persian Studies, University of Maryland, College Park; Director, Roshan Initiative in Persian Digital Humanities; Affiliate, Maryland Institute for Technology in the Humanities
John Mullan
Faculty Assistant, Roshan Institute for Persian Studies, University of Maryland, College Park; Digital Specialist, Roshan Initiative in Persian Digital Humanities
Ryan Muther
Doctoral Candidate, Khoury College of Computer Sciences, Northeastern University
Mehdy Sedaghat Payam
Doctoral Candidate, Department of English, University of Maryland, College Park; Graduate Fellow, Roshan Institute for Persian Studies
Janny Peng
Assistant Director for Finance and Administration, School of Languages, Literatures, and Cultures, University of Maryland, College Park
Sarah Bowen Savant
Professor of History, Institute for the Study of Muslim Civilisations, Aga Khan University, London; Principal Investigator, KITAB project
Masoumeh Seydi
Digital Lead, KITAB project
David Smith
Associate Professor, Khoury College of Computer Sciences, Northeastern University; Founding Member, NULab for Texts, Maps, and Networks
Alejandro Toselli
Associate Research Scientist, Khoury College of Computer Sciences, Northeastern University
Raffaele Viglianti
Research Programmer, Maryland Institute for Technology in the Humanities, University of Maryland, College Park
Advisory Board
Bridget Almas
Director of Data Innovation Strategy, State University of New York
Arezou Azad
Senior Research Fellow, Oriental Institute, University of Oxford; Programme Director, Invisible East
Gregory Crane
Alexander von Humboldt Professor of Digital Humanities, Humboldt Chair of Digital Humanities, Universität Leipzig; Professor of Classics, Tufts University
Ahmet T. Karamustafa
Professor of History, University of Maryland, College Park
Fatemeh Keshavarz
Roshan Institute Chair in Persian Language & Literature, Roshan Institute for Persian Studies, University of Maryland, College Park
Intisar A. Rabb
Professor of Law, Harvard Law School; Professor of History, Harvard University; Director, Program in Islamic Law
Marina Rustow
Khedouri A. Zilkha Professor of Jewish Civilization in the Near East, Princeton University; Director, Princeton Geniza Lab