What is: Voxel RoI Pooling?
Source | Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection |
Year | 2020 |
Data Source | CC BY-SA - https://paperswithcode.com |
Voxel RoI Pooling is an RoI feature extractor that extracts RoI features directly from voxel features for further refinement. It starts by dividing a region proposal into $G \times G \times G$ regular sub-voxels; the center point of each sub-voxel is taken as its grid point. Since 3D feature volumes are extremely sparse (non-empty voxels account for only a small fraction of the space), we cannot directly apply max pooling over the features of each sub-voxel. Instead, features from neighboring voxels are integrated into the grid points for feature extraction. Specifically, given a grid point $\mathbf{g}_i$, we first exploit voxel query to group a set of neighboring voxels $\Gamma_i = \{\mathbf{v}_i^1, \mathbf{v}_i^2, \cdots, \mathbf{v}_i^K\}$. Then, we aggregate the neighboring voxel features with a PointNet module as:
$$\boldsymbol{\eta}_i = \max_{k=1,2,\cdots,K} \left\{ \Psi\left(\left[\mathbf{v}_i^k - \mathbf{g}_i ; \boldsymbol{\phi}_i^k\right]\right) \right\}$$
where $\mathbf{v}_i^k - \mathbf{g}_i$ represents the relative coordinates, $\boldsymbol{\phi}_i^k$ is the voxel feature of $\mathbf{v}_i^k$, and $\Psi(\cdot)$ indicates an MLP. The max pooling operation $\max(\cdot)$ is performed along the channels to obtain the aggregated feature vector $\boldsymbol{\eta}_i$. In particular, Voxel RoI pooling is exploited to extract voxel features from the 3D feature volumes out of the last two stages of the backbone network. For each stage, two Manhattan distance thresholds are set to group voxels at multiple scales. The aggregated features pooled from different stages and scales are then concatenated to obtain the RoI features.
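The steps described above (grid points, voxel query, PointNet-style aggregation, multi-scale concatenation) can be sketched in NumPy. This is a minimal illustration, not the paper's implementation. It rests on several assumptions: the RoI is axis-aligned (box rotation is ignored), the MLP is a single linear layer with ReLU whose weights `W` and `b` are placeholders, the neighbor search is brute force, and grid points with no neighbors receive a zero vector. All function names and the distance thresholds used here are hypothetical.

```python
import numpy as np

def roi_grid_points(center, size, G):
    """Centers of the G x G x G sub-voxels of an axis-aligned box (assumption: no rotation)."""
    center = np.asarray(center, dtype=float)
    size = np.asarray(size, dtype=float)
    # Sub-voxel centers along one axis, in normalized box coordinates within (-0.5, 0.5).
    ticks = (np.arange(G) + 0.5) / G - 0.5
    gx, gy, gz = np.meshgrid(ticks, ticks, ticks, indexing="ij")
    offsets = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)
    return center + offsets * size  # (G**3, 3)

def voxel_query(grid_idx, voxel_idx, max_dist, K):
    """Row indices of up to K non-empty voxels within a Manhattan distance of a grid point.

    Distances are measured between integer voxel indices (brute force over all voxels).
    """
    dist = np.abs(voxel_idx - grid_idx).sum(axis=1)
    return np.flatnonzero(dist <= max_dist)[:K]

def aggregate(grid_point, nbr_centers, nbr_feats, W, b):
    """eta = max over neighbors of MLP([v - g ; phi]), with a one-layer ReLU MLP (assumption)."""
    if len(nbr_centers) == 0:
        return np.zeros(W.shape[1])  # assumption: empty neighborhoods yield zeros
    rel = nbr_centers - grid_point                  # relative coordinates, (K, 3)
    x = np.concatenate([rel, nbr_feats], axis=1)    # (K, 3 + C)
    h = np.maximum(x @ W + b, 0.0)                  # (K, C_out)
    return h.max(axis=0)                            # channel-wise max over the K neighbors

def voxel_roi_pool(center, size, G, voxel_idx, voxel_feats,
                   voxel_size, origin, max_dist, K, W, b):
    """Pool one feature vector per grid point of a single RoI at a single scale."""
    origin = np.asarray(origin, dtype=float)
    grid = roi_grid_points(center, size, G)
    # Quantize each grid point to the index of the voxel that contains it.
    grid_idx = np.floor((grid - origin) / voxel_size).astype(int)
    voxel_centers = (voxel_idx + 0.5) * voxel_size + origin
    out = np.zeros((len(grid), W.shape[1]))
    for i, (g, gi) in enumerate(zip(grid, grid_idx)):
        nbrs = voxel_query(gi, voxel_idx, max_dist, K)
        out[i] = aggregate(g, voxel_centers[nbrs], voxel_feats[nbrs], W, b)
    return out

# Toy usage: three non-empty voxels with 2-channel features, one RoI, G = 2.
voxel_idx = np.array([[0, 0, 0], [1, 1, 1], [5, 5, 5]])
voxel_feats = np.array([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]])
W = np.eye(5)[:, :4]   # placeholder MLP weights: (3 + C) inputs, 4 output channels
b = np.zeros(4)

# Two Manhattan thresholds (hypothetical values), concatenated along the channel axis.
roi_feats = np.concatenate(
    [voxel_roi_pool((1, 1, 1), (2, 2, 2), 2, voxel_idx, voxel_feats,
                    1.0, 0.0, d, 16, W, b) for d in (1, 2)],
    axis=1,
)  # shape (8, 8): 8 grid points, 4 channels per scale
```

Repeating the same pooling on the feature volume of a second backbone stage and concatenating again would give the multi-stage part of the description. A practical implementation would replace the Python loop and the brute-force search with a hashed or indexed lookup over the sparse voxel grid.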