Elon Musk has been a critic of lidar technology, pushing instead for camera-only self-driving powered by AI
Tesla’s July software update underlines the company’s commitment to a big gamble: that it can deliver autonomous vehicles using only cameras. Despite advances in vision-based self-driving technology, experts warn that significant challenges remain.
In July, Tesla released version 9 of its “Full Self-Driving” (FSD) software, which allows its vehicles to drive themselves to a limited extent. A restricted group of drivers has been beta testing the package, already available as a $10,000 add-on, since last October. The latest upgrade marks a big shift, however: it drops radar sensors entirely and relies on the car’s cameras alone.
The move follows Tesla’s announcement in May that it would eliminate radar from all Model 3 and Model Y cars made in the United States, doubling down on a strategy at odds with most other self-driving programs. Autonomous vehicles developed by Alphabet subsidiary Waymo and GM-owned Cruise fuse data from cameras, radar, and ultra-precise lidar, and drive only on streets that have been pre-mapped with high-resolution 3D laser scans.
Tesla CEO Elon Musk has been an outspoken opponent of lidar because of its high cost, advocating instead for a “pure vision” approach. That stance is contentious, because relying on a single type of sensor leaves no redundancy. But according to Kilian Weinberger, an associate professor at Cornell University who works on computer vision for autonomous vehicles, the rationale is simple.
“Cameras are dirt cheap compared with lidar,” he says. “That means they can put this technology into all of their vehicles. If they sell 500,000 cars, every single one of them is collecting data for them.”
The machine learning systems at the heart of self-driving technology run on data. Tesla’s key bet, according to Weinberger, is that the mountain of video collected by its fleet will get it to full autonomy sooner than competitors relying on a small number of more sensor-laden cars driven by staff.
Speaking last month at the Conference on Computer Vision and Pattern Recognition, Andrej Karpathy, Tesla’s director of AI, said the company has built a supercomputer to crunch all this data, which he claimed is the fifth most powerful in the world. He also explained why Tesla dropped radar: after training on 1.5 petabytes of video augmented with both radar data and human labelling, he said, the vision-only system now significantly outperforms the old approach.
According to Weinberger, the case for eliminating radar is compelling, and the gap between lidar and cameras has narrowed in recent years. Lidar’s main selling point is its highly accurate depth perception, achieved by bouncing lasers off objects. But vision-based systems can also estimate depth, and their capabilities have improved substantially.
In 2019, Weinberger and colleagues achieved a breakthrough by converting camera-based depth estimates into the same kind of 3D point clouds that lidar produces, dramatically improving accuracy. At the Scaled Machine Learning Conference last year, Karpathy said the company was adopting this “pseudo-lidar” approach.
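The core trick of pseudo-lidar is to back-project each pixel’s estimated depth into 3D space using the camera’s geometry, producing the same kind of point cloud a lidar unit would. Below is a minimal sketch of that conversion under a standard pinhole camera model; the function name and the intrinsics are illustrative assumptions, not code from the 2019 paper or from Tesla.

```python
import numpy as np

def depth_map_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a per-pixel depth map (in meters) into an Nx3 point cloud.

    Standard pinhole model: a pixel (u, v) at depth Z maps to
    X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]            # per-pixel row/column coordinates
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]      # keep only pixels with valid depth

# Toy example: a flat 4x4 depth map, everything estimated at 10 m
depth = np.full((4, 4), 10.0)
cloud = depth_map_to_point_cloud(depth, fx=500.0, fy=500.0, cx=2.0, cy=2.0)
print(cloud.shape)  # (16, 3): one lidar-style 3D point per pixel
```

Representing camera depth in this lidar-style form is what let existing 3D detection pipelines consume it, which is where much of the accuracy gain came from.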
How you estimate depth matters, though. One method compares images from two cameras spaced far enough apart to triangulate the distance to objects. The other is to train AI on huge numbers of images until it picks up on depth cues. Tesla uses the latter, according to Weinberger, because its front-facing cameras sit too close together for triangulation.
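A back-of-the-envelope calculation shows why the spacing, or baseline, between the two cameras matters so much. For an idealized rectified stereo pair, depth follows Z = f·B/d, where f is the focal length in pixels, B the baseline, and d the disparity between where an object appears in each image. The numbers below are illustrative assumptions, not Tesla’s actual camera geometry.

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Depth from a rectified stereo pair: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px

def disparity_at(focal_px, baseline_m, depth_m):
    """Inverse relation: disparity (pixels) of an object at a given depth."""
    return focal_px * baseline_m / depth_m

f = 1000.0                     # focal length in pixels (assumed)
for baseline in (0.1, 1.0):    # narrow vs. wide camera spacing, in meters
    d = disparity_at(f, baseline, 50.0)       # object 50 m away
    z_err = stereo_depth(f, baseline, d - 1)  # matching off by one pixel
    print(f"baseline {baseline} m: {d:.0f} px disparity; "
          f"a 1 px error makes 50 m look like {z_err:.1f} m")
```

With the narrow baseline, a single-pixel matching error doubles the estimated distance (100 m instead of 50 m), while the wide baseline barely moves it, which is consistent with Weinberger’s point that closely spaced cameras push Tesla toward learned depth instead.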
The advantage of triangulation is that its measurements are grounded in physics, much like lidar’s, says Leaf Jiang, CEO of start-up NODAR, which develops camera-based 3D vision technology built on that approach. Inferring distance instead is inherently more error-prone in ambiguous situations, he claims, such as telling an adult at 50 meters apart from a child at 25 meters. “It tries to figure out distance based on perspective cues, or shading, or whatever,” he says, and that isn’t always accurate.
How you perceive depth is only part of the problem, though. Machine learning in its current state essentially recognizes patterns, which means it struggles with situations it hasn’t seen. Unlike a human driver, it has no ability to reason about what to do in an unfamiliar circumstance. “Any AI system has no comprehension of what’s actually going on,” says Weinberger.
The idea behind gathering ever more data is that you will capture more of the rare scenarios that can stump an AI, but this approach has a fundamental flaw. “At some point you’ll run into one-of-a-kind cases, and there are some circumstances that you can’t prepare for,” Weinberger says. “At some point the benefits of adding more and more data start to wane.”
This is the so-called “long tail problem,” says Marc Pollefeys, a professor at ETH Zurich who has worked on camera-based self-driving, and it is a major barrier to going from the driver-assistance systems now widespread in modern cars to fully autonomous vehicles. The underlying technology is comparable, he says, but the stakes are not: an automatic braking system that supplements a driver’s reactions can afford to miss the rare pedestrian, whereas a system in complete control of the vehicle has a margin for error of fractions of a percent.
Other self-driving companies try to sidestep this by reducing the uncertainty their systems face. Pre-mapping roads means the software only has to focus on the small portion of its input that doesn’t match the map, says Pollefeys, and fusing several sensor types adds redundancy: the chance of three different sensors making the same mistake at the same time is vanishingly small.
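The value of that redundancy is easy to see in a toy model. Assuming, unrealistically but illustratively, that each sensor misses an obstacle independently and with the invented failure rates below, the chance of all three failing at once is the product of the individual rates:

```python
# Invented, illustrative per-sensor miss rates; real rates vary by condition,
# and real sensor failures are rarely fully independent.
p_camera, p_radar, p_lidar = 0.01, 0.02, 0.005

p_all_fail = p_camera * p_radar * p_lidar    # independence assumption
print(f"camera alone misses:     1 in {1 / p_camera:,.0f}")
print(f"all three miss together: 1 in {1 / p_all_fail:,.0f}")
# 1 in 100 vs. 1 in 1,000,000 -- the safety margin sensor fusion buys
```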
An approach built on pre-mapping and extra sensors clearly scales less easily. But simply pushing more data through a machine learning pipeline to get from a system that mostly works to one that virtually never makes mistakes is “doomed to fail,” says Pollefeys.