Since ARKit and ARCore, the mobile augmented reality platforms, made Google’s previously groundbreaking Project Tango (the AR platform that gave us the first depth-sensing smartphones) obsolete in 2018, we’ve seen a bit of a resurgence of what was then a niche component for flagship devices: the depth sensor.
Samsung revived the time-of-flight sensor with its Galaxy Note 10 and Galaxy S10 5G, although it has ditched the sensor in its current-generation models. Radar made a brief cameo through Project Soli in the Google Pixel 4. More recently, Apple implemented LiDAR sensors in the iPhone 12 Pro and iPad Pro series, after breakthroughs with the TrueDepth front camera that ushered in the era of The Notch.
This week, Google added TensorFlow 3D (TF 3D), a library of 3D deep learning models covering 3D semantic segmentation, 3D object detection, and 3D instance segmentation, to the TensorFlow repository for use in autonomous cars and robots, as well as in mobile AR experiences on devices capable of sensing depth.
“The field of computer vision has recently begun to make good progress in understanding 3D scenes, including models for mobile 3D object detection, transparent object detection and more, but access to the field can be challenging due to limited availability of tools and resources that can be applied to 3D data,” said Alireza Fathi (a research scientist) and Rui Huang (an AI resident) of Google Research in an official blog post. “TF 3D offers a range of popular operations, loss functions, data processing tools, models and metrics that enable the wider research community to develop, train and implement advanced 3D models for scene understanding.”
The 3D semantic segmentation model allows apps to distinguish between the object or objects in the foreground and the background of a scene, much like the virtual backgrounds in Zoom. Google has implemented similar technology with virtual video backgrounds for YouTube.
The 3D instance segmentation model, on the other hand, identifies a group of objects as individual objects, as with Snapchat lenses that can place virtual masks on more than one person in the camera view.
Finally, the 3D object detection model takes instance segmentation one step further by also classifying objects in view. The TF 3D library is available through GitHub.
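The relationship between the three tasks can be made concrete with a toy example. The sketch below is purely illustrative and does not use the TF 3D API: the point cloud, class names, and label values are all hypothetical, chosen only to show how semantic labels, instance IDs, and detection boxes differ for the same scene.

```python
import numpy as np

# Hypothetical toy point cloud: six 3D points covering two chairs and floor.
points = np.array([
    [0.0, 0.0, 0.0],   # floor
    [1.0, 0.1, 0.5],   # chair A
    [1.1, 0.2, 0.6],   # chair A
    [3.0, 0.1, 0.5],   # chair B
    [3.1, 0.0, 0.6],   # chair B
    [5.0, 0.0, 0.0],   # floor
])

# Semantic segmentation: one class label per point (0 = background, 1 = chair).
# Both chairs share the same label; they are not told apart.
semantic = np.array([0, 1, 1, 1, 1, 0])

# Instance segmentation: points of the same class are split into distinct
# objects (chair A = 1, chair B = 2, background = 0).
instance = np.array([0, 1, 1, 2, 2, 0])

# Object detection: goes one step further, producing a classified 3D box
# per instance. Here we derive a simple axis-aligned box from each group.
for inst_id in np.unique(instance[instance > 0]):
    obj = points[instance == inst_id]
    box_min, box_max = obj.min(axis=0), obj.max(axis=0)
    print(f"instance {inst_id}: class=chair, box min={box_min}, max={box_max}")
```

Semantic segmentation answers “which points are chair?”, instance segmentation answers “which chair?”, and detection adds a classified bounding volume per object.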
While these capabilities have been demonstrated with standard smartphone cameras, the availability of depth data from LiDAR and other time-of-flight sensors opens up new possibilities for advanced AR experiences.
Even without the 3D repository, TensorFlow has contributed to some useful AR experiences. Wannaby used TensorFlow to let users virtually try on nail polish, and it also helped Capital One build a mobile app feature that can identify cars and display information about them in AR. In the more weird and wild category, an independent developer used TensorFlow to turn a rolled-up piece of paper into a lightsaber with InstaSaber.
In recent years, Google has used machine learning through TensorFlow for other AR purposes as well. In 2017, the company released its MobileNets repository for image recognition a la Google Lens. TensorFlow is also the technology behind its Augmented Faces API (which also works on iOS), which brings Snapchat-like selfie filters to other mobile apps.
It’s also not the first time Google has used depth sensor data for AR experiences. While ARCore’s Depth API enables occlusion (the ability for virtual content to appear in front of and behind real-world objects) for mobile apps using standard smartphone cameras, the technology works better with depth sensors.
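The core idea behind depth-based occlusion is a simple per-pixel test: virtual content is drawn only where it is closer to the camera than the real world. The sketch below is a minimal illustration of that compositing rule, not ARCore’s actual implementation; the depth maps and distances are invented for the example.

```python
import numpy as np

# Hypothetical 4x4 depth maps in meters. A real device would get real_depth
# from a time-of-flight/LiDAR sensor or depth estimation.
real_depth = np.full((4, 4), 2.0)        # a flat wall 2 m from the camera
virtual_depth = np.full((4, 4), np.inf)  # inf = no virtual content here
virtual_depth[1:3, 1:3] = 1.5            # virtual cube 1.5 m away (in front)
virtual_depth[1:3, 3] = 2.5              # part of the cube behind the wall

camera_frame = np.zeros((4, 4, 3), dtype=np.uint8)       # placeholder RGB
virtual_frame = np.full((4, 4, 3), 255, dtype=np.uint8)  # white virtual cube

# Occlusion test: show a virtual pixel only where it is closer than reality.
visible = virtual_depth < real_depth
composited = np.where(visible[..., None], virtual_frame, camera_frame)
```

With only a camera image, `real_depth` must be estimated, which is why dedicated depth sensors make the effect more robust.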
Machine learning has proven to be indispensable in creating advanced AR experiences. Based on its focus on AI research alone, Google plays as critical a role in the future of AR as Apple, Facebook, Snap and Microsoft.