Fine-Grained Human-Centric Tracklet Segmentation with Single Frame Supervision
In this paper, we target the Fine-grAined human-Centric Tracklet Segmentation (FACTS) problem, where 12 human parts, e.g., face, pants, left-leg, are segmented. To reduce the heavy and tedious labeling effort, FACTS requires only one labeled frame per video during training. The small size of human parts and the scarcity of labels make FACTS very challenging. Because adjacent video frames are continuous and humans usually do not change clothes within a short time, we explicitly model both pixel-level and frame-level context in the proposed Temporal Context segmentation Network (TCNet). On the one hand, optical flow is computed online to propagate the pixel-level segmentation results to neighboring frames. On the other hand, frame-level classification likelihood vectors are also propagated to nearby frames. By fully exploiting the pixel-level and frame-level context, TCNet indirectly uses the large number of unlabeled frames during training and produces smooth segmentation results during inference. Experimental results on four video datasets show the superiority of TCNet over state-of-the-art methods. The newly annotated datasets can be downloaded from http://liusi-group.com/projects/FACTS for further study.
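The pixel-level propagation described in the abstract can be illustrated with a minimal sketch. This is not the TCNet implementation: the function names `warp_with_flow` and `fuse`, the nearest-neighbor sampling, and the fixed blending weight are all assumptions made for illustration, and the flow field is taken as given rather than estimated.

```python
import numpy as np


def warp_with_flow(probs, flow):
    """Backward-warp per-pixel class scores from a neighboring frame.

    probs: (H, W, C) class scores predicted on the neighboring frame.
    flow:  (H, W, 2) flow from the current frame to the neighbor, as (dx, dy).
    Hypothetical helper; nearest-neighbor sampling is an assumption.
    """
    h, w, _ = probs.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # For each current-frame pixel, look up where it lands in the neighbor.
    # Coordinates that leave the image are clipped to the border.
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return probs[src_y, src_x]


def fuse(current, warped, weight=0.5):
    """Blend current-frame scores with scores propagated from a neighbor.

    The fixed weight is an illustrative assumption, not a value from the paper.
    """
    return (1.0 - weight) * current + weight * warped
```

Taking the arg-max over the last axis of the fused scores gives a label map that is informed by the neighboring frame, which is the intuition behind the smoother results mentioned above.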
Barcode | Collection Type | Call Number | Location | Status |
---|---|---|---|---|
art141694 | null | Article | Gdg9-Lt3 | Available but not for loan - No Loan |
No other versions available