The Automatic Collation for Diversifying Corpora (ACDC) project, funded by a Level III Digital Humanities Advanced Grant from the National Endowment for the Humanities, aims to significantly improve the accuracy of handwritten text recognition (HTR) for Arabic-script manuscripts. Our team will develop a collation tool to automatically create large amounts of training data from existing digital texts and manuscript images without time-consuming human annotation of individual manuscripts.
The ACDC project will accomplish this by extending the capabilities of the text alignment tool passim and the OCR/HTR engine Kraken. The extended tools will align poor initial HTR transcriptions of diverse manuscript exemplars with existing digital texts, automatically producing training data in a “distantly supervised” manner.
By accelerating the production of training data, the ACDC tool will mark an important step towards the generalizable Arabic and Persian HTR models required to transcribe large-scale Persian and Arabic manuscript collections.
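The idea of distantly supervised alignment described above can be illustrated with a small sketch. It is not the passim or Kraken API and not the project's implementation: it uses Python's standard `difflib` as a stand-in aligner, and the function names `align_line` and `build_training_pairs`, the thresholds, and the sample strings are all hypothetical. Each noisy HTR line is matched against an existing digital text, and the matched span of clean text is kept as a training label only when the match is confident enough.

```python
from difflib import SequenceMatcher


def align_line(noisy_line, reference, min_block=3, min_ratio=0.7):
    """Find the span of `reference` that best matches a noisy HTR line.

    Returns (span, ratio), or None when no confident match exists.
    Illustrative stand-in only; thresholds are arbitrary assumptions.
    """
    matcher = SequenceMatcher(None, noisy_line, reference, autojunk=False)
    # Keep only reasonably long matching blocks to ignore chance matches.
    blocks = [b for b in matcher.get_matching_blocks() if b.size >= min_block]
    if not blocks:
        return None
    first, last = blocks[0], blocks[-1]
    # Pad the span by the unmatched characters at either end of the noisy line.
    start = max(0, first.b - first.a)
    tail = len(noisy_line) - (last.a + last.size)
    end = min(len(reference), last.b + last.size + tail)
    span = reference[start:end]
    ratio = SequenceMatcher(None, noisy_line, span, autojunk=False).ratio()
    if ratio < min_ratio:
        return None
    return span, ratio


def build_training_pairs(htr_lines, reference, **kwargs):
    """Pair each line id with clean text when the alignment is confident."""
    pairs = []
    for line_id, noisy in htr_lines:
        result = align_line(noisy, reference, **kwargs)
        if result is not None:
            pairs.append((line_id, result[0]))
    return pairs


if __name__ == "__main__":
    reference = "the quick brown fox jumps over the lazy dog"
    htr_lines = [("line-1", "quick brovn fox jumqs"), ("line-2", "zzzzzz")]
    for line_id, text in build_training_pairs(htr_lines, reference):
        print(line_id, text)
```

In this sketch the line image identified by each retained id would be paired with the clean span as ground truth, while lines that fail the threshold are simply dropped. A production aligner would need to handle whole-corpus search, reordering, and script-specific normalization, which this example does not attempt.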
Funding and Project Duration: $282,905.00 from January 2021 to June 2023 (see the National Endowment for the Humanities’ website for more information).
Primary Project Personnel
Jonathan Parkes Allen
Mellon Post-Doctoral Fellow, Roshan Institute for Persian Studies, University of Maryland, College Park; Acting Assistant Director, OpenITI AOCP project
Matthew Thomas Miller
Assistant Professor of Persian Literature & Digital Humanities, Roshan Institute for Persian Studies, University of Maryland, College Park; Director, Roshan Initiative in Persian Digital Humanities; Affiliate, Maryland Institute for Technology in the Humanities
David Smith
Associate Professor, Khoury College of Computer Sciences, Northeastern University; Founding Member, NULab for Texts, Maps, and Networks
Alejandro Toselli
Associate Research Scientist, Khoury College of Computer Sciences, Northeastern University
Si Wu
Doctoral Candidate, Khoury College of Computer Sciences, Northeastern University
Advisory Board
Carl Ernst
William R. Kenan, Jr. Distinguished University Professor, University of North Carolina, Chapel Hill; Co-Director, UNC Center for Middle East and Islamic Studies
Adi Keinan-Schoonbaert
Digital Curator, Asian and African Collections, British Library
Evyn Kropf
Librarian for Middle Eastern & North African Studies and Religious Studies, University of Michigan; Curator, Islamic Manuscripts Collection, University of Michigan
Sarah Bowen Savant
Professor of History, Institute for the Study of Muslim Civilisations, Aga Khan University, London; Principal Investigator, KITAB project
Sabine Schmidtke
Professor of Islamic Intellectual History, School of Historical Studies, Institute for Advanced Study; Principal Investigator, The Zaydi Manuscript Tradition (ZMT) project
Columba Stewart
Executive Director, Hill Museum & Manuscript Library; Professor of Theology, Saint John’s University
Daniel Stoekl Ben Ezra
Directeur d’Études, École Pratique des Hautes Études (EPHE), Paris, Section des Sciences historiques et philologiques; Principal Investigator, eScripta project