Abstract

Range sensors have drawn much interest in research on human activity because they provide explicit 3D information about shape that is invariant to clothing, skin color and illumination changes. However, triangulation-based systems such as structured-light sensors produce occlusions in the image wherever part of the scene is hidden from either the projector or the camera. These occlusions, along with missing data points and measurement noise, depend on the design of the structured-light system. These artifacts make human body segmentation harder, a difficulty the literature typically does not address. In this work, we design a segmentation model that reasons about 3D spatial information, identifies the different body parts in motion, and is robust to artifacts inherent to the structured-light system, such as triangulation occlusions, noise and missing data. First, we build the first realistic sensor-specific training set by closely simulating the actual acquisition scenario, reproducing both our sensor's intrinsic parameters and the artifacts it generates. Second, we adapt a state-of-the-art fully convolutional network to range images of the human body so that it transfers its learning to 3D spatial information rather than light intensities. Third, we quantitatively demonstrate that simulating sensor-specific artifacts in the training set improves the robustness of segmentation on real range images. Finally, we show that the model accurately segments human body parts in real range image sequences acquired by our structured-light sensor, in real time and with high inter-frame consistency.
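Triangulation occlusions arise because a structured-light depth value exists only where the camera sees a point that the projector also illuminates. The sketch below shows one way to inject such artifacts, plus noise and missing data, into a clean rendered depth map. It is an illustration under stated assumptions, not the authors' simulator: it assumes a rectified camera-projector pair sharing the focal length `f` (in pixels) and the image width, a projector displaced by `baseline` metres along +x, zero-valued pixels as missing data, the first-order triangulation noise model sigma_Z ≈ Z² · sigma_d / (f · b), and uniform random dropouts. The function names and the parameters `sigma_d`, `dropout` and `tol` are hypothetical.

```python
import numpy as np


def projector_shadow_mask(depth, f, baseline, tol=1e-3):
    """Return True where a (hypothetical) rectified projector cannot light the point.

    Assumes the projector shares focal length f (pixels) and image width with the
    camera and is offset by `baseline` along +x. depth == 0 means "no surface".
    """
    h, w = depth.shape
    rows, cols = np.nonzero(depth > 0)
    z = depth[rows, cols]
    # Projector column of each 3D point in the rectified pair: u_p = u - f * b / Z.
    up = np.round(cols - f * baseline / z).astype(int)
    in_fov = (up >= 0) & (up < w)

    # Z-buffer in projector coordinates: nearest depth along each projector ray.
    zbuf = np.full((h, w), np.inf)
    np.minimum.at(zbuf, (rows[in_fov], up[in_fov]), z[in_fov])

    shadow = np.zeros((h, w), dtype=bool)
    # Points outside the projector's field of view are never illuminated.
    shadow[rows[~in_fov], cols[~in_fov]] = True
    # Points inside it are shadowed if a closer surface lies on the same ray.
    r, c, u_p, zz = rows[in_fov], cols[in_fov], up[in_fov], z[in_fov]
    hidden = zz > zbuf[r, u_p] + tol
    shadow[r[hidden], c[hidden]] = True
    return shadow


def simulate_artifacts(depth, f, baseline, sigma_d=0.0, dropout=0.0, rng=None):
    """Degrade a clean rendered depth map with structured-light-style artifacts."""
    rng = np.random.default_rng() if rng is None else rng
    out = depth.astype(float)  # astype returns a copy, so `depth` is untouched
    valid = out > 0
    # Assumed noise model: a disparity error of sigma_d pixels maps to a depth
    # error of about Z^2 * sigma_d / (f * b), so noise grows quadratically with Z.
    sigma_z = out[valid] ** 2 * sigma_d / (f * baseline)
    noisy = out[valid] + rng.standard_normal(sigma_z.shape) * sigma_z
    out[valid] = np.maximum(noisy, 0.0)  # non-positive depth becomes "missing"
    # Triangulation occlusions (projector shadows) become missing data.
    out[projector_shadow_mask(depth, f, baseline)] = 0.0
    # Uniform random dropouts stand in for other missing measurements.
    out[rng.random(out.shape) < dropout] = 0.0
    return out


if __name__ == "__main__":
    clean = np.full((4, 200), 3.0)  # wall at 3 m
    clean[:, 100:120] = 1.0         # floating plate at 1 m
    sim = simulate_artifacts(clean, f=500.0, baseline=0.1, sigma_d=0.2,
                             dropout=0.02, rng=np.random.default_rng(0))
    print("fraction of missing pixels:", (sim == 0).mean())
```

With f = 500 px, b = 0.1 m and no noise or dropout, the wall pixels at columns 67 to 86 fall in the plate's projector shadow and are zeroed, as are columns 0 to 16, which map outside the projector's field of view at 3 m. A projector-space z-buffer keeps the sketch vectorized; with rounding to integer projector columns it can miss thin occluders, so a finer sampling of the projector plane would be needed for more faithful shadows.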