Deep Image Generation Based on Optics and Physics

Abstract

Image generative AI has advanced remarkably in recent years and can now create images so realistic that humans cannot distinguish them from real photographs. This advance has been driven by progress in deep learning; however, typical deep learning models are built from black-box neural networks and do not always generate images that are optically or physically natural. To overcome this limitation, deep learning models that incorporate principles of optics and physics have recently attracted attention. Motivated by this background, we are conducting research on image generative AI based on optics and physics.

Contributions

【Research 1: Geometry-Agnostic System Identification from Limited-View Videos】
Geometry-agnostic system identification is a technique for estimating an object’s geometry and physical properties from multi-view videos. Achieving high accuracy in this task typically requires many cameras, but multi-camera setups are costly and cumbersome. To address this challenge, we propose a new optimization method called Lagrangian Particle Optimization. It estimates an object’s geometry and physical properties with high precision even when the available data are limited, for example when only a few cameras are available.
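
As a rough illustration of what "system identification" means here, the following toy sketch recovers a single physical parameter (gravity) from an observed trajectory by minimizing the gap between simulated and observed motion. It is not Lagrangian Particle Optimization and does not involve neural radiance fields or video; the point-mass setup and the names simulate, loss, and identify_gravity are all assumptions made for illustration only.

```python
def simulate(g, y0=1.0, v0=2.0, dt=0.01, steps=100):
    """Roll out the height of a point mass under gravity g (semi-implicit Euler)."""
    y, v, heights = y0, v0, []
    for _ in range(steps):
        v -= g * dt          # update velocity from the (unknown) physical parameter
        y += v * dt          # update position from the new velocity
        heights.append(y)
    return heights


def loss(g, observed):
    """Mean squared error between simulated and observed heights."""
    simulated = simulate(g, steps=len(observed))
    return sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed)


def identify_gravity(observed, g_init=5.0, lr=5.0, iters=200, eps=1e-4):
    """Estimate g by gradient descent, using a central-difference gradient."""
    g = g_init
    for _ in range(iters):
        grad = (loss(g + eps, observed) - loss(g - eps, observed)) / (2 * eps)
        g -= lr * grad
    return g


# Hypothetical "observation": a trajectory generated with a known ground truth.
observed = simulate(9.8)
g_est = identify_gravity(observed)
print(f"estimated g = {g_est:.4f}")
```

In the real task the observations are multi-view videos rather than a clean trajectory, and the geometry must be estimated jointly with the physical properties, which is what makes limited-view settings difficult.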

【Research 2: Structure from Collision】
Recent advances in neural 3D representations have enabled high-accuracy estimation of 3D shapes from multi-view images. However, these techniques primarily focus on object surfaces, and estimating unobservable internal structures remains a challenging problem. To address this, we propose a new task called Structure from Collision. For this task, we introduce a novel model, SfC-NeRF, which uses the appearance changes that occur during collisions as cues to estimate both the exterior and the internal structure of objects.

Future work

The proposed methods can estimate an object’s shape, physical properties, and internal structure from video. We expect this to enable precise prediction of shape changes and to improve the accuracy and reliability of robotic manipulation and computer interaction.

Publications

  1. T. Kaneko, “Improving Physics-Augmented Continuum Neural Radiance Field-Based Geometry-Agnostic System Identification with Lagrangian Particle Optimization,” in Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
    https://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/lpo/
  2. T. Kaneko, “Structure from Collision,” in Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025.
    https://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/sfc/

Contact

Takuhiro Kaneko
Recognition Research Group, Media Information Laboratory, NTT Communication Science Laboratories
