Shorthand Secrets: Deciphering Astrid Lindgren's Stenographed Drafts with HTR Methods

Astrid Lindgren, the Swedish author of children’s books, is known to have both composed and edited her literary work in Melin shorthand, a Swedish system based on Gabelsberger. Her original drafts and manuscripts are preserved in 670 stenographed notepads kept at the National Library of Sweden and the Swedish Institute of Children’s Books. These notepads were long considered undecipherable and, until recently, had not been studied by researchers. This paper introduces handwritten text recognition (HTR) and document image analysis (DIA) approaches to address the challenges inherent in Lindgren’s original drafts and manuscripts. It covers preprocessing and word extraction, the alignment of transcriptions, and the rapid transcription of large numbers of words. This is the first work to apply HTR and DIA to Gabelsberger-based shorthand material. In particular, it presents early-stage results demonstrating that these stenographed manuscripts can indeed be transcribed, both manually by experts and with computerised approaches.
