SOFTWARE: ASP.NET | VB.NET | C#.NET | RAZOR MVC 4 ASP.NET | RESTful Web services
Human faces in surveillance videos often suffer from severe image blur, dramatic pose variations, and occlusion. In this paper, we propose a comprehensive framework based on Convolutional Neural Networks (CNN) to overcome challenges in video-based face recognition (VFR). First, to learn blur-robust face representations, we artificially blur training data composed of clear still images to account for a shortfall in real-world video training data. Using training data composed of both still images and artificially blurred data, the CNN is encouraged to learn blur-insensitive features automatically. Second, to enhance robustness of CNN features to pose variations and occlusion, we propose a Trunk-Branch Ensemble CNN model (TBE-CNN), which extracts complementary information from holistic face images and patches cropped around facial components. TBE-CNN is an end-to-end model that extracts features efficiently by sharing the low- and middle-level convolutional layers between the trunk and branch networks. Third, to further promote the discriminative power of the representations learnt by TBE-CNN, we propose an improved triplet loss function. Systematic experiments justify the effectiveness of the proposed techniques. Most impressively, TBE-CNN achieves state-of-the-art performance on three popular video face databases: PaSC, COX Face, and YouTube Faces. With the proposed techniques, we also obtained first place in the BTAS 2016 Video Person Recognition Evaluation.
As described above, most available video face databases are rather small and lack diversity in facial variations compared to still face image databases. We propose artificially generating video-like face data from existing large-scale still face image databases. Specifically, we simulate two major challenges during surveillance or mobile camera imaging: motion blur and out-of-focus blur.
For simplicity, we assume the blur kernels are uniform since faces usually occupy a small area in video frames. Due to face movement or mobile device camera shake during exposure, motion blur often appears in video frames.
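The two blur types above can be illustrated with a minimal NumPy sketch. It assumes a uniform line kernel for motion blur and a uniform disk kernel for out-of-focus blur; the function names, the disk model, and the parameter ranges in `random_blur` are illustrative assumptions, not values taken from the text.

```python
import numpy as np

def motion_blur_kernel(length, angle_deg):
    """Uniform line kernel of roughly `length` pixels at `angle_deg` (assumed model)."""
    size = length if length % 2 == 1 else length + 1  # force an odd size
    kernel = np.zeros((size, size))
    c = size // 2
    theta = np.deg2rad(angle_deg)
    # Mark the pixels lying on a line through the kernel center.
    for t in np.linspace(-c, c, 4 * size):
        x = int(round(c + t * np.cos(theta)))
        y = int(round(c - t * np.sin(theta)))
        kernel[y, x] = 1.0
    return kernel / kernel.sum()

def disk_kernel(radius):
    """Uniform disk kernel, a common stand-in for out-of-focus blur (assumption)."""
    r = int(np.ceil(radius))
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    kernel = (xx ** 2 + yy ** 2 <= radius ** 2).astype(float)
    return kernel / kernel.sum()

def apply_kernel(image, kernel):
    """Sliding-window weighted sum over a 2-D grayscale image, same output size."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros(image.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + image.shape[0], j:j + image.shape[1]]
    return out

def random_blur(image, rng):
    """Apply one randomly chosen blur; the parameter ranges are hypothetical."""
    if rng.random() < 0.5:
        kernel = motion_blur_kernel(int(rng.integers(3, 12)), rng.uniform(0, 180))
    else:
        kernel = disk_kernel(rng.uniform(1.0, 4.0))
    return apply_kernel(image, kernel)
```

Because the same kernel is applied at every pixel, the sketch matches the uniform-kernel assumption stated above; a spatially varying blur would need a different kernel per region.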
We propose the Trunk-Branch Ensemble CNN (TBE-CNN) model to efficiently learn pose- and occlusion-robust face representations. TBE-CNN incorporates one trunk network and several branch networks. The trunk network is trained to learn face representations for holistic face images, and each branch network is trained to learn face representations for image patches cropped from one facial component.
In this paper, the trunk network implementation is based on GoogLeNet and results of experiments based on VGG Net are available in the supplementary appendix. The most important parameters of the trunk network are tabulated in Table 1. For the other model parameters, we directly follow its original configuration.
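The trunk-branch layout can be sketched structurally as follows. This is a toy NumPy illustration, not the actual network: `shared_layers` (a 2x2 average pooling) and `head` (a linear map with ReLU) are hypothetical stand-ins for convolutional stages, and cropping the component patches from the shared feature map and concatenating the outputs are assumptions made for the sketch.

```python
import numpy as np

def shared_layers(image):
    """Stand-in for the shared low/middle-level layers: 2x2 average pooling."""
    h, w = image.shape
    return image.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def head(features, weights):
    """Stand-in for a trunk or branch head: flatten, linear map, ReLU."""
    return np.maximum(weights @ features.ravel(), 0.0)

def crop(feature_map, center, size):
    """Crop a size x size patch around `center`, clamped to the map borders."""
    h, w = feature_map.shape
    top = min(max(center[0] - size // 2, 0), h - size)
    left = min(max(center[1] - size // 2, 0), w - size)
    return feature_map[top:top + size, left:left + size]

def tbe_features(image, landmarks, trunk_w, branch_ws, patch=8):
    """Concatenate one holistic (trunk) feature with one feature per component."""
    shared = shared_layers(image)                # computed once, reused by every head
    parts = [head(shared, trunk_w)]              # trunk: whole face
    for (r, c), w in zip(landmarks, branch_ws):  # branches: one per facial component
        # Landmarks are given in image coordinates; halve them for the pooled map.
        parts.append(head(crop(shared, (r // 2, c // 2), patch), w))
    return np.concatenate(parts)

# Usage with random weights and two hypothetical landmark positions.
rng = np.random.default_rng(0)
image = rng.random((32, 32))
trunk_w = rng.standard_normal((16, 16 * 16))
branch_ws = [rng.standard_normal((8, 8 * 8)) for _ in range(2)]
feature = tbe_features(image, [(10, 10), (30, 30)], trunk_w, branch_ws)
```

The point of the layout is that the shared computation runs once per face, so adding a branch costs only its own head.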
Compared to still image-based face recognition (SIFR), VFR is significantly more challenging. Images in standard SIFR datasets are usually captured under good conditions or even framed by professional photographers, e.g., in the Labeled Faces in the Wild (LFW) database. In comparison, the image quality of video frames tends to be significantly lower and faces exhibit much richer variations because video acquisition is much less constrained. In particular, subjects in videos are usually mobile, resulting in serious motion blur, out-of-focus blur, and a large range of pose variations. Surveillance and mobile cameras are often low-cost (and therefore low-quality) devices, which further exacerbates problems with video frame clarity.
Here we approach the blur-robust representation learning problem from the perspective of training data. Since the volume of real-world video training data is small, we propose simulating large amounts of video frames from existing still face image databases. During training, we provide CNN with two training data streams: one composed of still face images, and the other composed of simulated video frames created by applying random artificial blur to the first stream. The network aims to classify each still image and its artificially blurred version into the same class; therefore, the learnt face representations must be blur-insensitive. To the best of our knowledge, this is the first CNN-based approach to solve the image blur problem in VFR.
To learn pose- and occlusion-robust representations for VFR efficiently, we propose a novel end-to-end ensemble CNN model called Trunk-Branch Ensemble CNN (TBE-CNN). TBE-CNN includes one trunk network and several branch networks.
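The two-stream training data described above can be sketched as a batch-construction step. This is a minimal illustration under stated assumptions: `box_blur` is only a placeholder degradation, the kernel sizes are hypothetical, and a plain softmax cross-entropy stands in for whatever classification loss the full model uses.

```python
import numpy as np

def box_blur(image, k):
    """k x k box blur (k odd), used here only as a placeholder degradation."""
    pad = k // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.zeros(image.shape, dtype=float)
    for i in range(k):
        for j in range(k):
            out += padded[i:i + image.shape[0], j:j + image.shape[1]]
    return out / (k * k)

def two_stream_batch(stills, labels, rng):
    """Return images and labels; the second half is a blurred copy of the first."""
    blurred = np.stack([box_blur(img, int(rng.choice([3, 5, 7]))) for img in stills])
    # Each blurred frame inherits the identity label of its still source.
    return np.concatenate([stills, blurred]), np.concatenate([labels, labels])

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy over the whole batch, covering both streams."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()
```

Since a still image and its blurred copy share one label and one classifier, the loss is low only when both map to the same class, which is what pushes the features toward blur insensitivity.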
In this work, three new location privacy metrics that can capture the influence on privacy of the ambient environment are proposed. In addition, a stochastic model based on reflected random walk is developed to characterize the spatial variation of the location privacy along the user’s route. Based on this modeling, a new optimal stopping based privacy-aware LBS access algorithm that allows the mobile users to fully leverage the spatial diversity of location privacy is developed.
Corresponding analysis shows that the optimal stopping decision and values of the privacy-aware LBS access problem can be obtained through iterated computations. Results of both numerical and real-world examples show that the proposed scheme can achieve a significantly better performance as compared to the baseline approach.
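The idea of obtaining stopping decisions through iterated computations can be illustrated with a generic backward-induction sketch. Everything specific here is an assumption made for illustration: the privacy level is modeled as an integer state on a random walk reflected at both ends, stopping at level x pays x, and the user must access the service within a fixed number of steps. It is not the algorithm or the metrics from the text.

```python
import numpy as np

def optimal_stopping_values(levels, horizon, p_up):
    """Backward induction for a finite-horizon stopping problem (toy model).

    State x in {0, ..., levels - 1} follows a random walk reflected at both
    ends. Stopping in state x pays x; stopping is forced at step `horizon`.
    Returns the step-0 values and a (horizon + 1, levels) table of decisions.
    """
    states = np.arange(levels)
    reward = states.astype(float)
    value = reward.copy()                        # at the deadline, stop is forced
    stop = np.ones((horizon + 1, levels), dtype=bool)
    for t in range(horizon - 1, -1, -1):
        up = value[np.minimum(states + 1, levels - 1)]
        down = value[np.maximum(states - 1, 0)]
        cont = p_up * up + (1 - p_up) * down     # expected value of waiting a step
        stop[t] = reward >= cont                 # stop when waiting cannot help
        value = np.maximum(reward, cont)
    return value, stop

values, stop = optimal_stopping_values(levels=3, horizon=1, p_up=0.5)
```

In this toy model the rule is to wait while the expected value of continuing exceeds the current level; with three levels, one remaining step, and `p_up=0.5`, only the lowest level is worth waiting in.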