Please use this link to cite this publication or to refer to it as an internet source: https://hdl.handle.net/11108/220
Title:

Multi-oriented Text Extraction from Information Graphics

Authors:
Böschen, Falk
Scherp, Ansgar
Date:
2015
Source:
[Title:] Proceedings of the 2015 ACM Symposium on Document Engineering, DocEng 2015, Lausanne, Switzerland, September 8-11, 2015. ACM 2015
Abstract:
Existing research on analyzing information graphics assumes that perfect text detection and extraction are available. However, text extraction from information graphics is far from solved. To fill this gap, we propose a novel processing pipeline for multi-oriented text extraction from infographics. The pipeline applies a combination of data mining and computer vision techniques to identify text elements, cluster them into text lines, and compute their orientation, and then uses a state-of-the-art open-source OCR engine to perform the text recognition. We evaluate our method on 121 infographics extracted from an open access corpus of scientific publications. The results show that our approach is effective and significantly outperforms a state-of-the-art baseline.
Persistent identifier of the original publication:
ISBN: 
978-1-4503-3307-8

File(s):
No files are associated with this publication.

Publications in ZBWPub are protected by copyright.