Our proposed architecture, SDFNet, successfully reconstructs 3D shape from a single image, both for object categories seen during training and for novel, unseen categories. SDFNet is trained to predict SDF values in the same pose as the input image, without requiring knowledge of camera parameters or object pose at test time. Qualitative results show its superiority over the baselines GenRe [1] and OccNet [2].
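At test time, the predicted SDF can be queried at arbitrary 3D points and the surface recovered as its zero level set. Below is a minimal sketch of this inference procedure, assuming a trained network sdf_net that maps an image and query points to SDF values (negative inside the surface); the names, grid resolution, and cube extent are illustrative assumptions, not the actual SDFNet interface.

    # Minimal inference sketch (illustrative names, not the SDFNet API).
    import torch
    from skimage.measure import marching_cubes

    def reconstruct(sdf_net, image, res=64, device="cpu"):
        # Query points on a res^3 grid spanning the cube [-0.5, 0.5]^3.
        lin = torch.linspace(-0.5, 0.5, res)
        grid = torch.stack(torch.meshgrid(lin, lin, lin, indexing="ij"), dim=-1)
        points = grid.reshape(1, -1, 3).to(device)  # (1, res^3, 3)
        with torch.no_grad():
            # Assumed signature: sdf_net(image, points) -> (1, res^3) SDF values.
            sdf = sdf_net(image.unsqueeze(0).to(device), points)
        sdf = sdf.reshape(res, res, res).cpu().numpy()
        # The object surface is the zero level set of the predicted SDF.
        verts, faces, _, _ = marching_cubes(sdf, level=0.0)
        return verts, faces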
Qualitative results of SDFNet, OccNet VC and GenRe on seen and novel classes of 2 DOF ShapeNetCore.v2 (figure; panels show Input, SDFNet, OccNet, GenRe, and Ground Truth for one seen class and three novel classes).
Generalizing to Seen and Unseen Shapes
SDFNet captures fine-grained surface detail on seen classes (airplane, below) and infers occluded surfaces of unseen categories (bathtub and camera). By explicitly estimating 2.5D sketches (depth and surface normals), SDFNet can capture concave surfaces (bathtub) and protruding structures (camera lens).
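A minimal sketch of this two-stage design follows; the module names and channel layout are hypothetical stand-ins for the paper's components. An image is first mapped to 2.5D sketches (depth and surface normals), which then condition the SDF predictor.

    # Two-stage sketch: image -> 2.5D sketches -> conditional SDF
    # (hypothetical module names, not the released SDFNet code).
    import torch
    import torch.nn as nn

    class TwoStageSDF(nn.Module):
        def __init__(self, sketch_net, encoder, sdf_decoder):
            super().__init__()
            self.sketch_net = sketch_net    # RGB image -> depth (1 ch) + normals (3 ch)
            self.encoder = encoder          # 2.5D sketches -> latent shape code (B, D)
            self.sdf_decoder = sdf_decoder  # (code, 3D point) -> SDF value

        def forward(self, image, points):
            sketches = self.sketch_net(image)         # (B, 4, H, W)
            code = self.encoder(sketches)             # (B, D)
            B, N, _ = points.shape
            code = code.unsqueeze(1).expand(B, N, -1)
            # Predict an SDF value for every query point, conditioned on the sketches.
            return self.sdf_decoder(torch.cat([code, points], dim=-1)).squeeze(-1)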
SDFNet reconstruction performance on classes seen during training and on novel classes not seen during training (figure; panels show Predicted and Ground Truth shapes for one seen class and two unseen classes).
Viewer Centered Training Affects Generalization
When evaluated in the 3 Degree-of-Freedom Viewer Centered setting (3 DOF VC), where object pose varies in azimuth, elevation, and tilt, our experiments show only a marginal drop in performance between seen and unseen classes for the 3 DOF VC model. This is new evidence that a general shape representation can be learned given accurate depth estimation and 3 DOF VC training.
Quantitative evaluation in the 3 DOF VC setting shows high performance on both seen and unseen categories. The model is trained on ground-truth depth and normal images.
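The key property of this setting is that training and test views cover all three rotational degrees of freedom, not just azimuth and elevation. A minimal sketch of such pose sampling is below; the angle ranges and axis conventions are assumptions for illustration, not the paper's exact rendering setup.

    # Sampling a 3 DOF viewer-centered pose (illustrative conventions).
    import numpy as np

    def rot_x(a):  # elevation
        c, s = np.cos(a), np.sin(a)
        return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

    def rot_y(a):  # azimuth
        c, s = np.cos(a), np.sin(a)
        return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

    def rot_z(a):  # in-plane tilt (camera roll)
        c, s = np.cos(a), np.sin(a)
        return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

    def sample_3dof_pose(rng):
        az = rng.uniform(0.0, 2.0 * np.pi)           # azimuth: full circle
        el = rng.uniform(-0.5 * np.pi, 0.5 * np.pi)  # elevation
        ti = rng.uniform(0.0, 2.0 * np.pi)           # tilt: full circle
        # Compose into a single rotation applied to the object before rendering.
        return rot_z(ti) @ rot_x(el) @ rot_y(az)

    R = sample_3dof_pose(np.random.default_rng(0))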
Generalization Across Different Datasets
Sample images of our renders of the four most common ShapeNet categories and of objects from ABC. The two datasets clearly differ in their shape properties.
To further test the generalization ability of SDFNet, we train it on one shape dataset and test it on a significantly different one. When trained on ABC and tested on the 42 unseen ShapeNet categories, 3 DOF VC SDFNet performs comparably to SDFNet trained on the 13 ShapeNet categories. SDFNet trained on ShapeNet performs somewhat worse when tested on ABC.
Qualitative comparison of models trained on ABC and tested on ShapeNet, and vice versa. Note the good reconstruction quality on the occluded part of the object.
Citation
Bibliography information of this work:
Thai, A., Stojanov, S., Upadhya, V., & Rehg, J. M. (2020). 3D Reconstruction of Novel Object Shapes from Single Images. arXiv preprint arXiv:2006.07752.
References
[1] Zhang, X., Zhang, Z., Zhang, C., Tenenbaum, J., Freeman, B., & Wu, J. (2018). Learning to reconstruct shapes from unseen classes. In Advances in Neural Information Processing Systems (pp. 2257-2268).
[2] Mescheder, L., Oechsle, M., Niemeyer, M., Nowozin, S., & Geiger, A. (2019). Occupancy networks: Learning 3D reconstruction in function space. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4460-4470).