Vision-Language Navigation: Visual-Question-Answering to support Orientation and Navigation of Blind Users

Supervisors: Rainer Stiefelhagen (KIT), Kathrin Gerling (KIT)

Faculty: Informatics

Because built environments are often poorly accessible, moving from one place to another in unfamiliar surroundings can be challenging for people with visual impairments: taking a new route requires extensive planning and information gathering about the course and possible barriers, including those that appear dynamically. Recently, there has been considerable progress in vision-language models, image captioning, and visual question answering (VQA). These methods are well suited to providing intelligent interaction that assists people with visual impairments in orientation and mobility through a navigation system, i.e., Vision-Language Navigation (VLN). The focus of this project is to develop a navigation system and a VQA user interface that align with the needs of visually impaired and blind people, and to examine their preferences, needs and concerns with respect to the technology. Given an interactive natural-language question from a user, a mobile agent either answers based on scene understanding or suggests a series of instructions to support the user's orientation and navigation. At KIT, we have developed the Vision4blind system, a mobility assistance system that blind users can leverage to obtain additional information on their surroundings. The system can perform object recognition, walking path suggestion and obstacle avoidance. The proposed PhD is intended to build on this work.

In this PhD project, the research tasks include:

  • To improve the existing mobility assistance system and extend it with vision-language models.

  • To implement the user interface using visual question answering methods and apply it to both navigation and scene understanding applications.

  • To investigate specific domain adaptation methods to transfer the model from the virtual to the real environment.

  • To follow a user-centric design approach throughout the research: blind and visually impaired users will be involved from the beginning, their preferences and needs will be assessed, and user studies will be performed to assess relevant aspects, including but not limited to safety, accessibility, and the experience of the real world mediated by the technology.

Aiming to create a general mobility assistance system, this PhD project will focus on designing a VLN system and a VQA user interface. The navigation system will provide mapping, visual localization, path planning, and instruction generation. The VLN system will first be developed in a virtual environment and then transferred to the real environment. The goal is to provide several prototypes for daily use and to conduct evaluations and user studies with end users.

Desired qualifications of the PhD student:

  • University degree (M.Sc.) with excellent grades in Computer Science or related fields

  • Strong programming skills in at least one programming language (preferably Python, with experience in TensorFlow, PyTorch or similar frameworks)

  • An interest in Human-Computer Interaction and associated methods to carry out work that involves end users

  • Good English language skills (your responsibilities include writing publications and giving international presentations) and interpersonal skills (working with different stakeholders)