The vision of smart autonomous robots operating in indoor environments is becoming a reality in the current decade, driven by emerging technologies in sensor fusion and artificial intelligence. Sensor fusion aggregates informative features from disparate hardware sources. Much like the autonomous-vehicle industry, the robotics industry is moving quickly toward smart autonomous robots for handling indoor tasks.
This raises a central question: when does fusing two or more sensors actually outperform a single sensor? There are many applications where multiple sensors can be fused to obtain better predictions, including sentiment analysis, activity detection, autonomous vehicles, and robotics. This blog focuses primarily on fusing visual frames captured by a camera with the point cloud generated by a Lidar.
What is LIDAR?
Lidar is a technology mostly used for measuring the distance to objects by illuminating them with a laser beam and capturing the reflected beam with a sensor. Lidar sensors use ultraviolet, near-infrared, or visible light to illuminate targets, and can cover a wide range of materials and objects, including metals, non-metals, liquids, and chemicals. A Lidar sensor builds a 3D representation of an object from the time the light beam takes to reflect back and the wavelength difference of the bounced beam.
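As a back-of-the-envelope illustration of the ranging principle (not code from the paper), the distance follows directly from the round-trip time of a reflected pulse:

```python
# Sketch of basic time-of-flight Lidar ranging: the pulse travels to the
# target and back, so the one-way distance is c * t / 2.

C = 299_792_458.0  # speed of light in m/s

def range_from_time_of_flight(round_trip_s: float) -> float:
    """Return the one-way distance in meters for a given round-trip time."""
    return C * round_trip_s / 2.0

# A pulse returning after roughly 33.4 nanoseconds corresponds to about 5 m.
print(range_from_time_of_flight(33.356e-9))
```

Real scanners refine this with phase-shift or wavelength measurements, but the time-of-flight term dominates the range estimate.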
There has been a plethora of work on fusing different sensors for indoor localization tasks. We will discuss a paper titled “Indoor Layout Estimation by 2D LiDAR and Camera Fusion,” presented by J. Li. This paper provides an algorithm for estimating the indoor layout from frames captured by a visual sensor combined with information from a Lidar sensor. Semantic segmentation of the indoor layout is achieved by combining Lidar points with line segments extracted from the image.
Indoor localization can also be achieved with an RGB-D sensor, but these sensors provide depth information only up to a limited range (typically about 5 m), and their depth maps are often noisy. Given this, combining camera frames with a Lidar point cloud emerges as a promising solution. 2D points from the Lidar help define the outlines of a room's walls and assist in finding corner points. The fusion process starts with a simple calibration of the Lidar by manually placing targets.
Proposed Technique for Fusion
A 2D Lidar scanner is mounted on a movable platform alongside a visual sensor (camera) at a fixed distance, with both sensors aligned parallel to the floor. Indoor layout estimation then proceeds in four steps:
- Lidar-based segmentation
- Image line detection
- Floor-wall boundary detection
- 3D reconstruction
Lidar Point Clustering
Data points captured by the Lidar are grouped into line representations using the split-and-merge method. The points are first divided into clusters based on the distance between consecutive points: if the distance exceeds a threshold, the points are placed in separate clusters. A line-fitting method is then applied to each cluster. If the maximum point-to-line distance within a cluster exceeds a set threshold, the cluster is split further at that point. The process repeats until the maximum deviation of all lines is below the threshold. The following algorithm summarizes the overall steps for clustering the Lidar points.
- Initial: set S1 consists of the N points. Put S1 in a list L.
- Fit a line to the next set Si in L.
- Detect the point P with the maximum distance dP to the line.
- If dP is less than a threshold, continue (go to step 2).
- Otherwise, split Si at P into Si1 and Si2, and replace Si in L by Si1 and Si2.
- Continue (go to step 2).
- When all sets (segments) in L have been checked, merge collinear segments.
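The split step above can be sketched in Python with NumPy. This is a minimal illustration only: the distance-based pre-clustering and the final merge of collinear segments are omitted, and the threshold value is arbitrary.

```python
import numpy as np

def point_line_distances(points, a, b):
    """Perpendicular distance of each point to the line through a and b."""
    d = b - a
    n = np.array([-d[1], d[0]]) / np.linalg.norm(d)  # unit normal to the line
    return np.abs((points - a) @ n)

def split(points, threshold):
    """Recursively split an ordered point set until every segment's
    maximum deviation from its endpoint line is below the threshold."""
    dists = point_line_distances(points, points[0], points[-1])
    i = int(np.argmax(dists))
    if len(points) <= 2 or dists[i] < threshold:
        return [points]  # segment is straight enough
    # Split at the farthest point P and recurse on both halves
    return split(points[: i + 1], threshold) + split(points[i:], threshold)

# Example: an L-shaped wall corner sampled by a 2D scan
scan = np.array([[0, 0], [1, 0], [2, 0], [3, 0],
                 [3, 1], [3, 2], [3, 3]], dtype=float)
segments = split(scan, threshold=0.1)
print(len(segments))  # two line segments meeting at the corner
```

Fitting the line through a segment's endpoints (rather than a least-squares fit) keeps the sketch short; the recursion structure is the same either way.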
Line and Vanishing Point Detection
To detect lines in the layout, the visual stream from the camera is used. A standard Canny edge detector finds the edges. Hough-transform-based methods have traditionally been used to group edges into straight lines, but the proposed system instead uses octave functions for the grouping. The process works much like the split-and-merge method, except that the initial grouping is obtained by concatenating edge points up to a junction point.
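For reference, the classical Hough-transform baseline mentioned above can be sketched as a minimal accumulator over (theta, rho). This is a generic illustration of the baseline, not the paper's octave-function grouping; the bin counts are arbitrary.

```python
import numpy as np

def hough_lines(edge_points, img_diag, n_theta=180, n_rho=200):
    """Minimal Hough accumulator over (theta, rho) for a set of edge points.
    Each point votes for every line (theta, rho) passing through it."""
    thetas = np.linspace(0, np.pi, n_theta, endpoint=False)
    rhos = np.linspace(-img_diag, img_diag, n_rho)
    acc = np.zeros((n_theta, n_rho), dtype=int)
    cos_t, sin_t = np.cos(thetas), np.sin(thetas)
    for x, y in edge_points:
        r = x * cos_t + y * sin_t               # rho for every theta
        idx = np.clip(np.searchsorted(rhos, r), 0, n_rho - 1)
        np.add.at(acc, (np.arange(n_theta), idx), 1)
    return acc, thetas, rhos

# Points on the horizontal line y = 3 should vote into a single peak
pts = [(x, 3) for x in range(20)]
acc, thetas, rhos = hough_lines(pts, img_diag=30)
t, r = np.unravel_index(acc.argmax(), acc.shape)
print(round(np.degrees(thetas[t])), round(rhos[r]))  # peak near theta=90 deg, rho=3
```

The accumulator peak identifies the dominant line; in practice libraries such as OpenCV provide optimized versions of this transform.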
After line detection, vanishing points are estimated as a post-processing step. The detected edge lines are assigned to these vanishing points to remove false positives and to combine lines into a consolidated set. The J-linkage algorithm is used for assigning the edges to vanishing points.
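J-linkage itself is beyond the scope of a short snippet, but once a group of edge lines is hypothesized to share a vanishing point, that point can be estimated as the least-squares intersection of the lines in homogeneous coordinates. The following sketch is a simplification of that one sub-step, not the paper's exact pipeline.

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line coefficients (a, b, c) through two image points."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def vanishing_point(lines):
    """Least-squares intersection of homogeneous lines via SVD: the right
    singular vector for the smallest singular value minimizes sum((l.v)^2)."""
    _, _, vt = np.linalg.svd(np.array(lines, dtype=float))
    v = vt[-1]
    return v[:2] / v[2]  # back to inhomogeneous image coordinates

# Three lines constructed to pass through the common point (2, 3)
l1 = line_through((0, 0), (2, 3))
l2 = line_through((0, 6), (2, 3))
l3 = line_through((4, 0), (2, 3))
print(vanishing_point([l1, l2, l3]))  # approximately [2. 3.]
```

With noisy real edges the same SVD solution gives the best-fit intersection rather than an exact one, which is why a robust grouping step such as J-linkage is needed first.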
Ground-Wall Boundary Detection
This part focuses on detecting the boundary line where the walls meet the ground. The system estimates the boundary in two parts: computing a homography matrix, and computing a 2D similarity transform to align the Lidar output with the ground map. The top-down homography is obtained using the vertical vanishing point. A vanishing point in the image is the location where parallel 3D lines appear to converge under perspective projection. We will not go into the mathematical modeling, which can be found in the original article. The final layout estimation is shown in the figure below:
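Without reproducing the paper's derivation, the mechanics of applying a planar homography can be sketched as follows. The matrix H here is a made-up example with a projective term, not the homography the paper estimates from the vertical vanishing point.

```python
import numpy as np

def warp_points(H, pts):
    """Apply a 3x3 homography to Nx2 points and renormalize."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # lift to homogeneous
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]             # divide out scale

# Illustrative homography: identity plus a projective row term (not from the paper)
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.1, 1.0]])
pts = np.array([[2.0, 5.0]])
print(warp_points(H, pts))
```

The renormalization by the third coordinate is what makes the mapping projective rather than affine: points farther down the image (larger y here) are compressed, as in a top-down ground-plane warp.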
The combination of these two sensors is heavily used in the autonomous-vehicle and robotics industries. Sensor fusion, on top of advances in machine learning and cloud computing, opens up new opportunities in diverse fields. For example, combining video and Lidar data can help assess real-time traffic conditions on roads, and fusing the two sensors can significantly help indoor autonomous robots with pathfinding and navigation.