Visual Content Analysis to Improve Document Accessibility

Supervisors: Rainer Stiefelhagen (KIT), Matthias Wölfel (HKA)

Faculty: Informatics

When working with digital lecture material such as textbooks, slides and scientific publications, visually impaired users can access the written content of these documents, provided that basic accessibility guidelines were followed when the documents were produced. Using screen reader software, the written content can then be accessed via text-to-speech synthesis or displayed on braille output devices. This is, however, not the case for visual content in lecture material, such as images, tables, diagrams, mathematical plots and charts.

In recent years, fueled by advances in deep learning techniques (and the increasing availability of computing power and annotated data), impressive progress has been made in the field of computer vision, including improved techniques for image classification and segmentation, image captioning (i.e. automatically describing image content in natural language), and even visual question answering (i.e. building systems that can answer questions about visual content). This opens up promising new possibilities for using such computer vision techniques to make visual content more accessible to visually impaired users, e.g. by automatically extracting the relevant content from the visual material and delivering it to the user in a textual format. In fact, making visual document content available in a structured descriptive format is expected to improve the accessibility of documents for all user groups.

The focus of the project will be to investigate novel methods for visual content analysis of lecture material in order to improve the accessibility of such material for visually impaired users. More specifically, the thesis will focus on automatically extracting relevant information from a small number of highly relevant visual content classes, such as tables, flowcharts and block diagrams. The goal is to develop methods that first detect such visual representations in documents, then extract their structural layout and written content (e.g. the text elements in table cells or along flowcharts), and finally deliver this information to the user in different formats.

The project will thus advance the current state of the art in deep learning methods for extracting metadata from graphical teaching and learning content for the selected classes. As part of this thesis, in-depth experimental evaluations of the investigated deep learning models will be conducted, as well as user studies to measure the effectiveness and usability of the overall system for visually impaired users. We also aim to integrate the developed methods into a tool used by our university's accessibility center to further automate the process of making lecture material more accessible to visually impaired students.

Desired qualifications of the PhD student:

University degree (M.Sc.) with very good grades in Computer Science or related fields
Strong programming skills in at least one programming language (preferably Python, with experience in TensorFlow, PyTorch or similar)
Good English language skills (your responsibilities include writing publications and giving international presentations); although not required, German language skills are a plus, as they make it easier to work with German teaching and learning materials
An interest in accessibility and in working with people with visual impairments