Visual Content Analysis to Improve Document Accessibility
Supervisors: Rainer Stiefelhagen (KIT), Matthias Wölfel (HKA)
When working with digital lecture material, such as textbooks, slides, and scientific publications, visually impaired users can access the written content of these documents, provided that some basic accessibility guidelines were followed when the documents were produced. Using screen-reader software, the written content can then be accessed via text-to-speech synthesis or displayed on braille output devices. This is, however, not the case for visual content in lecture material, such as images, tables, diagrams, mathematical plots, and charts.
In recent years, fueled by progress in deep-learning techniques (and the increasing availability of computing power and annotated data), impressive advances have been made in computer vision, including improved techniques for image classification and segmentation, image captioning (i.e., automatically describing image content in natural language), and even visual question answering (i.e., building systems that can answer questions about visual content). This opens up promising possibilities for using such computer vision techniques to make visual content more accessible to visually impaired users, e.g., by automatically extracting the relevant content from the visual material and delivering it to the user in some textual format. In fact, making visual document content available in a structured, descriptive format is expected to improve the accessibility of documents for all user groups.
The focus of the project will be to investigate novel methods for visual content analysis of lecture material in order to improve its accessibility for visually impaired users. More specifically, this thesis will focus on automatically extracting relevant information from a selected number of highly relevant visual content classes, such as tables, flow charts, and block diagrams. The goal will be to develop methods that first detect such visual representations in documents, then extract their structural layout and written content (e.g., the text elements in table cells or along the edges of flow charts), and finally deliver this information to the user in different formats.
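To make the intended final step concrete, the sketch below shows how a detected table might be delivered to a screen reader as linear text. This is purely illustrative: the `ExtractedTable` structure, the `describe_table` function, and the example data are assumptions for the sake of the example, not part of the project specification.

```python
from dataclasses import dataclass

@dataclass
class ExtractedTable:
    """Hypothetical structured output of the table-detection stage."""
    caption: str
    header: list   # column names
    rows: list     # list of rows, each a list of cell values

def describe_table(table: ExtractedTable) -> str:
    """Linearize the table into text suitable for speech or braille output."""
    lines = [f"Table: {table.caption}. "
             f"{len(table.rows)} rows, {len(table.header)} columns."]
    for i, row in enumerate(table.rows, start=1):
        cells = ", ".join(f"{h}: {v}" for h, v in zip(table.header, row))
        lines.append(f"Row {i}. {cells}.")
    return "\n".join(lines)

# Example: a small table as it might come out of the detection stage.
t = ExtractedTable(
    caption="Exam results",
    header=["Course", "Grade"],
    rows=[["Algorithms", "1.3"], ["Databases", "2.0"]],
)
print(describe_table(t))
```

A structured representation like this could equally be rendered as HTML with proper header markup, or queried interactively ("what is the grade for Databases?"), which is what makes the extraction step valuable beyond a flat caption.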