Funded through two grants from The Andrew W. Mellon Foundation, Phase I of the Open Islamicate Texts Initiative Arabic-script OCR Catalyst Project (OpenITI AOCP) was the first undertaking of its kind to tackle the technical and organizational barriers that have historically stymied the development of Arabic-script OCR and digital text production for Islamicate Studies.
For full detail on the first phase of OpenITI AOCP, please see this article.
Phase II of OpenITI AOCP brings together a highly interdisciplinary team of experts in Islamicate studies, digital humanities, and computer science to build textual resources for the digital study of the Islamicate world and transform open-source optical character recognition (OCR)/handwritten text recognition (HTR) technology for all languages. This project is led by investigators from the Roshan Institute for Persian Studies and Maryland Institute for Technology in the Humanities at the University of Maryland; Northeastern University’s (NU) NULab for Texts, Maps, and Networks; University of California, San Diego; and the Centre for Digital Humanities at the Aga Khan University’s Institute for the Study of Muslim Civilisations (London).
OpenITI AOCP Phase II will build on the considerable successes of the Phase I project in piloting corpus production for Persian and Arabic and advancing OCR character accuracy rates (CARs) on the most common typefaces in Persian and Arabic print history. In Phase II, the OpenITI AOCP team will dramatically expand the size of OpenITI’s Persian and Arabic corpus through large-scale OCR work; extend the linguistic capabilities of OpenITI’s OCR tools into Ottoman Turkish and Urdu; transform its open-source OCR/HTR pipeline by incorporating newly developed unsupervised machine learning tools into its workflow (including into its user-friendly interface, eScriptorium); build individual scholarly and institutional Islamicate manuscript HTR workflows; and convene an experts’ workshop to critically assess the ethical and technological issues for next-generation digital text dissemination.
Funding and Project Duration: $1,749,722.00 from July 2022 to June 2025.
Workshops: OpenITI AOCP Phase II convened one workshop at the University of Maryland, College Park. For more information, see the program for the workshop here.
Primary Project Personnel
Anjum Alam
Programme Manager, Institute for the Study of Muslim Civilisations, Aga Khan University, London
Jonathan Parkes Allen
Postdoctoral Research Associate, Roshan Institute for Persian Studies, University of Maryland, College Park
Mathew Barber
Research and Data Visualization Specialist, Institute for the Study of Muslim Civilisations, Aga Khan University, London; Data Visualization, KITAB project
Taylor Berg-Kirkpatrick
Associate Professor of Computer Science, Department of Computer Science and Engineering, University of California, San Diego
Osama Eshera
Assistant Research Professor, Roshan Institute for Persian Studies, University of Maryland, College Park
Mary Hoppe
Digital Projects Assistant, Hill Museum and Manuscript Library
Matthew Thomas Miller
Assistant Professor of Persian Literature & Digital Humanities, Roshan Institute for Persian Studies, University of Maryland, College Park; Director, Roshan Initiative in Persian Digital Humanities; Affiliate, Maryland Institute for Technology in the Humanities
John Mullan
Faculty Assistant, Roshan Institute for Persian Studies, University of Maryland, College Park; Digital Specialist, Roshan Initiative in Persian Digital Humanities
Jacob Murel
Postdoctoral Research Associate, Khoury College of Computer Sciences, Northeastern University
Ryan Muther
Doctoral Candidate, Khoury College of Computer Sciences, Northeastern University
Lorenz Nigst
Assistant Professor of Digital Humanities, Institute for the Study of Muslim Civilisations, Aga Khan University, London; Corpus Management, KITAB project
Janny Peng
Assistant Director for Finance and Administration, School of Languages, Literatures, and Cultures, University of Maryland, College Park
Sarah Bowen Savant
Professor of History, Institute for the Study of Muslim Civilisations, Aga Khan University, London; Principal Investigator, KITAB project
Masoumeh Seydi
Digital Lead, KITAB project
Taimoor Shahid
Mellon Islamicate Digital Humanities Postdoctoral Associate, Roshan Institute for Persian Studies, University of Maryland, College Park
Farrukh Shahzad
Chief Librarian, Forman Christian College University
David Smith
Associate Professor, Khoury College of Computer Sciences, Northeastern University; Founding Member, NULab for Texts, Maps, and Networks
Raffaele Viglianti
Senior Research Software Developer, Maryland Institute for Technology in the Humanities, University of Maryland, College Park
Nikolai Vogler
Doctoral Candidate, Department of Computer Science and Engineering, University of California, San Diego
OpenITI AOCP Phase II Partner Projects
Carl Ernst
Professor Emeritus & William R. Kenan, Jr. Distinguished Professor of Islamic Studies, Department of Religious Studies, University of North Carolina at Chapel Hill; Principal Investigator, Omar ibn Said Digitization Project
Wayne Graham
Chief Information Officer and Director of Informatics, Cultural Networks, and Knowledge Systems, Council on Library and Information Resources; Technical Lead, Digital Library of the Middle East
Intisar A. Rabb
Professor of Law, Harvard Law School; Professor of History, Harvard University; Director, Program in Islamic Law
Marina Rustow
Khedouri A. Zilkha Professor of Jewish Civilization in the Near East, Princeton University; Director, Princeton Geniza Lab