OpenITI is involved in a number of exciting projects. See a list of our projects (past and present) below:
Automatic Collation for Diversifying Corpora (ACDC)
The Automatic Collation for Diversifying Corpora (ACDC) project, funded by a Level III Digital Humanities Advanced Grant from the National Endowment for the Humanities, aims to significantly improve the accuracy of handwritten text recognition (HTR) for Arabic-script manuscripts. Our team will develop a collation tool to automatically create large amounts of training data from existing digital texts and manuscript images without time-consuming human annotation of individual manuscripts.
CorpusBuilder
In 2017, OpenITI joined forces with the SHARIAsource project of the Program in Islamic Law at Harvard Law School to develop a robust and user-friendly OCR pipeline called CorpusBuilder. This project was funded by the Program in Islamic Law at Harvard Law School.
Digital Publications
OpenITI has begun piloting the production of the first digital publications of Persian and Arabic works, taken straight from their original manuscript form into a digital publication without a print intermediary. We are developing two projects in collaboration with Carl Ernst for our digital publication pipeline.
Textual Lacunae Reconstruction Tool (TLR)
The Textual Lacunae Reconstruction Tool (TLR), funded by the National Science Foundation, will leverage new techniques for unsupervised transcription to automatically transcribe vast quantities of handwritten Arabic-script text.
Open Islamicate Texts Initiative Arabic-script OCR Catalyst Project (OpenITI AOCP) Phase One
Funded through two grants from The Andrew W. Mellon Foundation, Phase One of the Open Islamicate Texts Initiative Arabic-script OCR Catalyst Project (OpenITI AOCP) is the first undertaking of its kind to tackle the technical and organizational barriers that historically have stymied the development of Arabic-script OCR and digital text production for Islamicate Studies.
Open Islamicate Texts Initiative Arabic-script OCR Catalyst Project (OpenITI AOCP) Phase Two
OpenITI AOCP Phase Two will build on the considerable successes of the Phase One project in piloting corpus production for Persian and Arabic and advancing OCR character accuracy rates (CARs) on the most common typefaces in Persian and Arabic print history.
OpenITI Corpus
The OpenITI corpus is an open-access and machine-actionable collection of Persian and Arabic texts.
Projects Affiliated with OpenITI
Descriptions of many projects that are (or have been) affiliated with OpenITI.