Please use this identifier to cite or link to this item: https://hdl.handle.net/11108/220
Title: 
Multi-oriented Text Extraction from Information Graphics
Authors: 
Böschen, Falk
Scherp, Ansgar
Year of Publication: 
2015
Citation: 
[Title:] Proceedings of the 2015 ACM Symposium on Document Engineering, DocEng 2015, Lausanne, Switzerland, September 8-11, 2015. ACM 2015
Abstract: 
Existing research on analyzing information graphics assumes that perfect text detection and extraction are available. However, text extraction from information graphics is far from solved. To fill this gap, we propose a novel processing pipeline for multi-oriented text extraction from infographics. The pipeline applies a combination of data mining and computer vision techniques to identify text elements, cluster them into text lines, and compute their orientation, and then uses a state-of-the-art open-source OCR engine to perform the text recognition. We evaluate our method on 121 infographics extracted from an open-access corpus of scientific publications. The results show that our approach is effective and significantly outperforms a state-of-the-art baseline.
Persistent Identifier of the first edition: 
ISBN: 
978-1-4503-3307-8

Items in ZBWPub are protected by copyright, with all rights reserved, unless otherwise indicated.