Empowering Computer Vision Models with Deep Learning for Robotic Perception and Drone Geolocation

Robotic vision is an application domain that draws on computer vision, machine learning, and robotics. Motivated by the dramatic performance gains that deep neural networks have brought to many computer vision challenges, this thesis is dedicated to novel research on combining deep learning with existing robotic vision models to design more powerful solutions for real-world robotics, including autonomous vehicles and drones.

For autonomous vehicles, this thesis focuses mainly on 3D perception with LiDAR. Two basic questions are explored: how the many computer vision models that operate on 2D images can exploit LiDAR range measurements, and how information about the surroundings can be extracted directly from the LiDAR sensor alone. To answer the first question, the LiDAR depth completion task is investigated. This task predicts a depth value for every pixel of the camera image, based on the sparse depth map generated by projecting the LiDAR point cloud onto the image. A non-learning geometric model is first proposed to exploit the local geometric relationships between nearby points. A deep learning extension then further improves performance and offers additional advantages, such as incorporating the RGB image and removing uncertain predictions.

To answer the second question, this thesis develops a unique hybrid solution for LiDAR panoptic segmentation. The point cloud panoptic segmentation task predicts a semantic label for each point and further assigns a common instance ID to the points that belong to the same object. The proposed hybrid solution consists of a semantic segmentation neural network followed by a traditional point cloud clustering algorithm as a post-processing step that groups points into instances. This hybrid solution achieves state-of-the-art performance compared with all recently published end-to-end neural network methods.

For drones, and UAVs (Unmanned Aerial Vehicles) more generally, this thesis presents a generic solution for aligning multimodal image pairs captured by different spectral sensors. The solution extends the classic Lucas-Kanade homography estimation model with neural networks. Trained with a specially designed loss function, the neural network learns to extract invariant features that help the traditional Lucas-Kanade iteration converge better. This deep Lucas-Kanade model offers a potential solution for many related applications, such as geolocation, sensor calibration, remote sensing, and navigation.

Although deep learning has become the prevailing technique, this thesis shows that a hybrid approach combining deep learning with traditional computer vision models can achieve state-of-the-art performance on many challenging robotics tasks, with the added advantages of lower complexity and less computation.
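
As an illustration of the depth completion setup described in the abstract, the following is a minimal sketch of how a LiDAR point cloud is projected onto the camera image plane to form the sparse depth map that completion methods start from. The intrinsic matrix K and the 4x4 extrinsic transform T are assumed calibration inputs, not values from the thesis.

    import numpy as np

    def lidar_to_sparse_depth(points, K, T, height, width):
        """Project Nx3 LiDAR points into a sparse HxW depth map (0 = no data).

        K is the 3x3 camera intrinsic matrix; T is an assumed 4x4
        LiDAR-to-camera extrinsic transform from sensor calibration.
        """
        # Homogeneous coordinates, then transform into the camera frame.
        pts_h = np.hstack([points, np.ones((points.shape[0], 1))])
        cam = (T @ pts_h.T).T[:, :3]
        cam = cam[cam[:, 2] > 0]            # keep points in front of the camera
        # Perspective projection with the intrinsic matrix.
        uv = (K @ cam.T).T
        u = (uv[:, 0] / uv[:, 2]).astype(int)
        v = (uv[:, 1] / uv[:, 2]).astype(int)
        ok = (u >= 0) & (u < width) & (v >= 0) & (v < height)
        depth = np.zeros((height, width), dtype=np.float32)
        # Pixels hit by several points keep the last write in this sketch;
        # a real pipeline would keep the nearest return.
        depth[v[ok], u[ok]] = cam[ok, 2]
        return depth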
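
For the hybrid panoptic segmentation solution, the sketch below shows the general pattern rather than the thesis's specific algorithm: per-point semantic labels (assumed to come from any trained segmentation network) are grouped into instances by a classic clustering step. DBSCAN, the class IDs, and the eps value are illustrative assumptions standing in for the thesis's clustering method.

    import numpy as np
    from sklearn.cluster import DBSCAN

    THING_CLASSES = {1, 2, 3}   # hypothetical IDs for car, pedestrian, cyclist

    def hybrid_panoptic(points, semantic_labels):
        """Return per-point instance IDs; 0 means 'stuff' (no instance)."""
        instance_ids = np.zeros(len(points), dtype=int)
        next_id = 1
        for cls in THING_CLASSES:
            mask = semantic_labels == cls
            if not mask.any():
                continue
            # Cluster the xyz coordinates of this class into separate objects.
            clusters = DBSCAN(eps=0.5, min_samples=5).fit_predict(points[mask])
            for c in np.unique(clusters):
                if c == -1:                 # DBSCAN noise: leave unassigned
                    continue
                idx = np.where(mask)[0][clusters == c]
                instance_ids[idx] = next_id
                next_id += 1
        return instance_ids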
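
The deep Lucas-Kanade model builds on the classic iterative least-squares alignment. As a sketch of that core idea only, the following estimates a pure 2D translation on a single channel; the thesis model instead operates on learned multi-channel feature maps and a full 8-parameter homography, which this sketch does not implement.

    import numpy as np
    from scipy.ndimage import shift as warp_shift

    def lk_translation(template, image, iters=20):
        """Estimate p = (dy, dx) so that shifting `image` by p matches `template`."""
        template = template.astype(np.float64)
        p = np.zeros(2)
        for _ in range(iters):
            warped = warp_shift(image.astype(np.float64), p)
            # Image gradients form the Jacobian of the warp wrt the shift.
            gy, gx = np.gradient(warped)
            J = np.stack([gy.ravel(), gx.ravel()], axis=1)
            r = (template - warped).ravel()          # photometric residual
            dp = -np.linalg.solve(J.T @ J, J.T @ r)  # Gauss-Newton step
            p += dp
            if np.linalg.norm(dp) < 1e-4:            # converged
                break
        return p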

Creator
Contributors
Degree
Unit
Publisher
Identifier
  • etd-29846
Keyword
Advisor
ORCID
Committee
Defense date
Year
  • 2021
Date created
  • 2021-08-27
Resource type
Rights statement
Last modified
  • 2023-09-19

Permanent link to this page: https://digital.wpi.edu/show/1544bs234