Posts

Using eScriptorium for Manuscript Transcription

  • 7 min read

There are any number of reasons a researcher might want to transcribe a digitized manuscript, and there are now a number of tools available to help. Transcribing Arabic-script manuscripts poses certain challenges right from the get-go, regardless of whether one is working in a digital environment or not. Deciphering difficult handwriting, reconstructing lacunae caused by loss of material, navigating manuscripts with text running…


Challenges of Layout Analysis across Arabic-Script Training Data

  • 3 min read

Layout analysis is the process of identifying regions (e.g., title, body text, footnotes) on a page of text before sending it through the OCR engine. Preparing documents to train our OCR models involves several distinct steps, including semantic annotation, fixing segmentation errors, and editing faulty transcriptions. eScriptorium allows users to associate specific labels with regions…


The Challenge of an Unknown Typeface

  • 3 min read

When I came to the University of Maryland, I wanted to do a computational literary study of contemporary Persian novels. To do that, I started collecting PDFs of all the Persian novels that I could find, around 600 titles at present. I was particularly interested in finding digital copies of the works of my favorite writer, Houshang…
