Commonly Used Datasets in Pedestrian Detection
Column: Industry News  Published: 2024-07-16

Human parsing has been extensively studied recently (Yamaguchi et al. 2012; Xia et al. 2017) due to its wide applications in many important scenarios. Mainstream fashion parsing models (i.e., parsers) focus on parsing high-resolution and clean images. However, directly applying parsers trained on benchmarks of high-quality samples to a particular application scenario in the wild, e.g., a canteen, airport or workplace, often gives unsatisfactory performance due to domain shift. In this paper, we explore a new and challenging cross-domain human parsing problem: taking the benchmark dataset with extensive pixel-wise labeling as the source domain, how can a satisfactory parser be obtained on a new target domain without any additional manual labeling? To this end, we propose a novel and efficient cross-domain human parsing model to bridge the cross-domain differences in visual appearance and environment conditions and fully exploit commonalities across domains. Our proposed model explicitly learns a feature compensation network, which is specialized for mitigating the cross-domain differences. A discriminative feature adversarial network is introduced to supervise the feature compensation and effectively reduce the discrepancy between the feature distributions of the two domains. Besides, our proposed model also introduces a structured label adversarial network to guide the parsing results of the target domain to follow the high-order relationships of the structured labels shared across domains. The proposed framework is end-to-end trainable, practical and scalable in real applications. Extensive experiments are conducted with the LIP dataset as the source domain and four different unannotated datasets, including surveillance videos, movies and runway shows, as target domains. The results consistently confirm the data efficiency and performance advantages of the proposed method on the challenging cross-domain human parsing problem.
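
The abstract describes the adversarial scheme only at a high level. Below is a minimal PyTorch-style sketch of what one training step of such a method could look like; all module names (compensate, feat_disc, label_disc), shapes, loss weights and optimizer settings are assumptions made for illustration, not the authors' released implementation.

```python
# Illustrative sketch of a cross-domain adversarial parsing training step.
# All names, shapes and hyperparameters are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

C, K = 64, 20                                    # assumed feature channels / parsing classes

backbone   = nn.Conv2d(3, C, 3, padding=1)       # stand-in for a real parsing backbone
parse_head = nn.Conv2d(C, K, 1)                  # pixel-wise classifier
compensate = nn.Sequential(                      # feature compensation network (target branch)
    nn.Conv2d(C, C, 3, padding=1), nn.ReLU(), nn.Conv2d(C, C, 3, padding=1))
feat_disc  = nn.Conv2d(C, 1, 3, padding=1)       # discriminative feature adversarial network
label_disc = nn.Conv2d(K, 1, 3, padding=1)       # structured label adversarial network

opt_main = torch.optim.Adam(
    list(backbone.parameters()) + list(parse_head.parameters()) + list(compensate.parameters()),
    lr=1e-4)
opt_disc = torch.optim.Adam(
    list(feat_disc.parameters()) + list(label_disc.parameters()), lr=1e-4)

def bce(logits, real):
    target = torch.ones_like(logits) if real else torch.zeros_like(logits)
    return F.binary_cross_entropy_with_logits(logits, target)

def train_step(src_img, src_label, tgt_img):
    # Source branch: ordinary supervised parsing loss on labeled benchmark images.
    f_src = backbone(src_img)
    loss_parse = F.cross_entropy(parse_head(f_src), src_label)

    # Target branch: compensated features should fool the feature discriminator,
    # and target parsing maps should fool the structured label discriminator.
    b_tgt = backbone(tgt_img)
    f_tgt = b_tgt + compensate(b_tgt)
    loss_adv = bce(feat_disc(f_tgt), real=True) + \
               bce(label_disc(F.softmax(parse_head(f_tgt), dim=1)), real=True)

    opt_main.zero_grad()
    (loss_parse + loss_adv).backward()
    opt_main.step()

    # Discriminators: distinguish source (real) from compensated target (fake).
    loss_d = bce(feat_disc(f_src.detach()), real=True) \
           + bce(feat_disc(f_tgt.detach()), real=False) \
           + bce(label_disc(F.softmax(parse_head(f_src), dim=1).detach()), real=True) \
           + bce(label_disc(F.softmax(parse_head(f_tgt), dim=1).detach()), real=False)
    opt_disc.zero_grad()
    loss_d.backward()
    opt_disc.step()

if __name__ == "__main__":
    # Toy tensors just to show the step runs end to end.
    train_step(torch.randn(2, 3, 64, 64),
               torch.randint(0, K, (2, 64, 64)),
               torch.randn(2, 3, 64, 64))
```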

Abstract—This paper presents a robust Joint Discriminative appearance model based Tracking method using online random forests and mid-level features (superpixels). To achieve superpixel-wise discriminative ability, we propose a joint appearance model that consists of two random-forest-based models, i.e., the Background-Target discriminative Model (BTM) and the Distractor-Target discriminative Model (DTM). More specifically, the BTM effectively learns discriminative information between the target object and the background. In contrast, the DTM is used to suppress distracting superpixels, which significantly improves the tracker’s robustness and alleviates the drifting problem. A novel online random forest regression algorithm is proposed to build the two models. The BTM and DTM are linearly combined into a joint model to compute a confidence map. Tracking results are estimated from the confidence map, where the position and scale of the target are estimated in order. Furthermore, we design a model updating strategy to adapt to appearance changes over time by discarding degraded trees of the BTM and DTM and initializing new trees as replacements. We test the proposed tracking method on two large tracking benchmarks, the CVPR2013 tracking benchmark and the VOT2014 tracking challenge. Experimental results show that the tracker runs at real-time speed and achieves favorable tracking performance compared with state-of-the-art methods. The results also suggest that the DTM improves tracking performance significantly and plays an important role in robust tracking.
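
The abstract only sketches how the two models are fused and how the target state is read off the confidence map. The NumPy sketch below illustrates one plausible realization of that final step, assuming the per-superpixel BTM/DTM scores have already been painted onto a pixel-level map; the equal-weight combination, the sliding-window search and the scale set are assumptions, not the authors' implementation.

```python
# Illustrative sketch: fuse BTM/DTM scores and estimate position, then scale.
# Weights, step sizes and scale set are assumptions made for this example.
import numpy as np

def joint_confidence(btm_score, dtm_score, alpha=0.5):
    """Linearly combine the two model scores into one joint confidence value.

    btm_score: target probability per pixel/superpixel from the BTM.
    dtm_score: target probability per pixel/superpixel from the DTM.
    """
    return alpha * btm_score + (1.0 - alpha) * dtm_score

def estimate_state(conf_map, prev_box, scales=(0.95, 1.0, 1.05)):
    """Estimate the target position first, then its scale, from a confidence map."""
    x, y, w, h = prev_box
    H, W = conf_map.shape
    integral = conf_map.cumsum(0).cumsum(1)       # integral image for fast window sums

    def window_sum(x0, y0, ww, hh):
        x1, y1 = min(x0 + ww, W) - 1, min(y0 + hh, H) - 1
        s = integral[y1, x1]
        if x0 > 0: s -= integral[y1, x0 - 1]
        if y0 > 0: s -= integral[y0 - 1, x1]
        if x0 > 0 and y0 > 0: s += integral[y0 - 1, x0 - 1]
        return s

    # 1) Position: slide a fixed-size window and keep the highest total confidence.
    best_pos, best_val = (x, y), -np.inf
    for yy in range(0, H - h + 1, 4):
        for xx in range(0, W - w + 1, 4):
            v = window_sum(xx, yy, w, h)
            if v > best_val:
                best_val, best_pos = v, (xx, yy)

    # 2) Scale: at the chosen position, pick the scale with the best mean confidence.
    x, y = best_pos
    best_scale, best_val = 1.0, -np.inf
    for s in scales:
        ww, hh = int(round(w * s)), int(round(h * s))
        v = window_sum(x, y, ww, hh) / max(ww * hh, 1)
        if v > best_val:
            best_val, best_scale = v, s
    return (x, y, int(round(w * best_scale)), int(round(h * best_scale)))

if __name__ == "__main__":
    # Toy example with random score maps, just to show the flow.
    conf = joint_confidence(np.random.rand(120, 160), np.random.rand(120, 160))
    print(estimate_state(conf, prev_box=(40, 30, 32, 48)))
```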