Finding Achilles' Heel: Adversarial Attack on Multi-modal Action Recognition



Neural network-based models are notorious for their adversarial vulnerability. Recent adversarial machine learning has mainly focused on images, where a small perturbation added to the input can easily fool the learning model. Very recently, this practice has been extended to attacks on human action videos by adding perturbations to key frames. Unfortunately, frame selection is usually computationally expensive at run-time, and adding noise to all frames is also unrealistic. In this paper, we present a novel yet efficient approach to address this issue. Multi-modal video data such as RGB, depth and skeleton data have been widely used for human action modeling, and have been demonstrated to outperform any single modality. Interestingly, we observed that skeleton data is more "vulnerable" under adversarial attack, and we propose to leverage this "Achilles' heel" to attack multi-modal video data. In particular, first, an adversarial learning paradigm is designed to perturb skeleton data for a specific action under a black-box setting, which highlights how body joints and key segments in videos are subject to attack. Second, we propose a graph attention model to explore the semantics between segments from different modalities and within a modality. Third, the attack is launched at run-time on all modalities through the learned semantics. The proposed method has been extensively evaluated on multi-modal visual action datasets, including PKU-MMD and NTU-RGB+D, to validate its effectiveness. [Paper]
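The black-box skeleton perturbation above can be sketched with zeroth-order (finite-difference) gradient estimation, which only queries the victim model's score. Everything here is illustrative: a fixed linear scorer stands in for the queried action classifier, and the clip size, query count, and perturbation budget are assumed values, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the black-box victim: a fixed linear scorer over
# flattened skeleton coordinates (frames x joints x xyz). The real attack
# would query an action-recognition model instead.
W = rng.standard_normal(10 * 25 * 3)

def score(skeleton):
    """Confidence of the true action class (higher = more confident)."""
    return float(W @ skeleton.ravel())

def black_box_attack(skeleton, eps=0.05, queries=200, sigma=1e-3):
    """Estimate the score gradient with two-sided finite differences along
    random directions, then step the joints to lower the true-class score."""
    x = skeleton.copy()
    grad = np.zeros_like(x)
    for _ in range(queries):
        u = rng.standard_normal(x.shape)
        g = (score(x + sigma * u) - score(x - sigma * u)) / (2 * sigma)
        grad += g * u
    grad /= queries
    # One signed step, kept inside an L-infinity budget around the original pose
    x = x - eps * np.sign(grad)
    return np.clip(x, skeleton - eps, skeleton + eps)

skeleton = rng.standard_normal((10, 25, 3))   # 10 frames, 25 joints, xyz
adv = black_box_attack(skeleton)
```

The joint-wise gradient estimate also makes visible which joints receive the largest perturbations, loosely mirroring the paper's observation that certain body joints are most attackable.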

Adversary for Social Good: Protecting Familial Privacy through Joint Adversarial Attacks



Social media is used by billions of people, with dramatic growth in new users every day. Among these platforms, social networks capture basic social relationships and host huge amounts of personal data. While the need to protect sensitive user data is obvious and pressing, information leakage due to adversarial attacks is somewhat unavoidable, yet hard to detect. For example, implicit social relations such as family ties can be exposed simply through network structure and hosted face images using off-the-shelf graph neural networks (GNNs), as we empirically demonstrate in this paper. To address this issue, we propose a novel adversarial attack algorithm for social good. First, we start from the conventional visual family understanding problem, and demonstrate that familial information can be easily exposed to attackers by connecting sneak shots to social networks. Second, to protect family privacy on social networks, we propose a novel adversarial attack algorithm that produces both adversarial features and an adversarial graph under a given budget. Specifically, both node features and the edges between nodes are perturbed gradually so that probe images and their family information cannot be correctly identified by a conventional GNN. Extensive experiments on a popular visual social dataset demonstrate that our defense strategy can significantly mitigate the impact of family information leakage. [Paper]
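A minimal sketch of the joint feature-and-graph perturbation idea, assuming a toy one-step propagation model in place of a real kinship GNN. The greedy edge-flip loop, the probe node, and the budget values are hypothetical illustrations, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy surrogate for the kinship GNN: one propagation step A_hat @ X @ w,
# scoring how strongly a probe node is tied to its family cluster.
n, d = 8, 4
X = rng.standard_normal((n, d))            # node features (e.g. face embeddings)
A = (rng.random((n, n)) < 0.3).astype(float)
A = np.triu(A, 1); A = A + A.T             # undirected social graph
w = rng.standard_normal(d)

def kinship_score(A, X, probe=0):
    deg = A.sum(1) + 1.0                   # degree with self-loop
    A_hat = (A + np.eye(n)) / deg[:, None]
    return float((A_hat @ X @ w)[probe])

def attack(A, X, edge_budget=3, feat_eps=0.1, probe=0):
    """Greedy joint attack: flip the edge whose change most reduces the
    probe's kinship score, then nudge the probe's features."""
    A, X = A.copy(), X.copy()
    for _ in range(edge_budget):
        best, best_j = kinship_score(A, X, probe), None
        for j in range(n):
            if j == probe:
                continue
            A2 = A.copy()
            A2[probe, j] = A2[j, probe] = 1 - A2[probe, j]   # flip edge
            s = kinship_score(A2, X, probe)
            if s < best:
                best, best_j = s, j
        if best_j is None:                 # no flip helps any more
            break
        A[probe, best_j] = A[best_j, probe] = 1 - A[probe, best_j]
    # Feature perturbation: signed step against the linear readout
    X[probe] -= feat_eps * np.sign(w)
    return A, X

A_adv, X_adv = attack(A, X)
```

Edge flips are only accepted when they strictly lower the score, and each flip touches two symmetric entries, so the total graph change stays within twice the edge budget.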

Cross-database mammographic image analysis through unsupervised domain adaptation

IEEE BigData Conference 2017


A World Health Organization report shows 519,000 deaths due to breast cancer in 2014, and the figure was even higher in 2008. Early detection and diagnosis of breast cancer are therefore essential to reduce the associated death rate. Computer-Aided Diagnosis (CAD) is useful for mass screening of breast cancer datasets. Data mining and machine learning technologies have already achieved significant success in many knowledge engineering areas, including classification, regression and clustering, and most recently have been employed to assist in the diagnosis of cancers with promising outcomes. Traditional machine learning models assume that training and testing data share the same input feature space and data distribution. When the distribution changes, most machine learning models must be modified or rebuilt from scratch to work on newly collected data. In many real-world applications, it is expensive or impossible to recollect the needed data and rebuild the models. There is therefore a need to create high-performance learners trained with more easily obtained data from different domains; this methodology is referred to as transfer learning. In this paper, we explore the use of transfer learning, specifically unsupervised domain adaptation, for breast cancer diagnosis to address the scarcity of training data in the target image dataset. Building on recently developed deep descriptors, we adapt recent transfer learning methodologies, e.g., TCA (Transfer Component Analysis), CORAL (Correlation Alignment), and BDA (Balanced Distribution Adaptation), to breast cancer diagnosis across multiple mammographic image databases, including CBIS-DDSM, InBreast, MIAS, etc., and evaluate their performance. Experiments demonstrate that, without any labels in the target database, transfer learning helps improve classification accuracy. [Paper]
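Of the methods listed, CORAL is the simplest to illustrate: it aligns second-order statistics by whitening the source features with the source covariance and re-coloring them with the target covariance, needing no target labels. This is a generic sketch on synthetic features, not the paper's pipeline; the dataset names in the comment are only for orientation.

```python
import numpy as np

rng = np.random.default_rng(0)

def coral(Xs, Xt, eps=1e-6):
    """CORAL (Correlation Alignment): whiten source features, then re-color
    them with the target covariance, matching both domains' second-order
    statistics (and here, the means as well)."""
    Xs = Xs - Xs.mean(0)
    Xt_c = Xt - Xt.mean(0)
    Cs = np.cov(Xs, rowvar=False) + eps * np.eye(Xs.shape[1])
    Ct = np.cov(Xt_c, rowvar=False) + eps * np.eye(Xt.shape[1])

    def mat_pow(C, p):
        # Matrix power of a symmetric PSD matrix via eigendecomposition
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(vals ** p) @ vecs.T

    Xs_aligned = Xs @ mat_pow(Cs, -0.5) @ mat_pow(Ct, 0.5)
    return Xs_aligned + Xt.mean(0)

# Source: deep descriptors from a labeled database (e.g. CBIS-DDSM);
# target: unlabeled descriptors from another database (e.g. InBreast)
Xs = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))
Xt = rng.standard_normal((150, 5)) @ rng.standard_normal((5, 5)) + 2.0

Xs_new = coral(Xs, Xt)
```

After alignment, a classifier trained on the transformed source descriptors and source labels can be applied to the target database directly.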

Skeleton Based Action Recognition using Convolutional Neural Network

Data Science Practicum


Human action recognition from 3D skeleton representations is becoming popular due to its speed and robustness. Multiple skeleton databases have been collected under controlled environments, and various methods have been proposed for action recognition using these databases. However, different databases use different sensor settings, which detect different numbers of joints and produce different data formats for those joints. In this work, we make a comprehensive study of the differences between these databases, and we also collect our own skeleton database using a Kinect V2 outside any controlled or laboratory environment, in order to distinguish between synthetic and non-synthetic databases. We also study how these databases behave under different conditions, such as when they are trained and tested alone versus jointly trained and tested with our database. For this study, we use a two-stream Convolutional Neural Network (CNN) model that integrates raw skeleton coordinates with the temporal differences between consecutive frames of a video sequence. Furthermore, we have built a live working model for skeleton-based action recognition using the Kinect. [Link]
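The two input streams described above (raw joint coordinates plus temporal differences) can be sketched as follows. The clip length, joint count, and function name are illustrative assumptions; only the 25-joint layout is fixed by the Kinect V2.

```python
import numpy as np

def two_stream_inputs(skeleton):
    """Build the two CNN input streams: raw joint coordinates and
    frame-to-frame temporal differences, each arranged as a
    (frames x joints x 3) image-like tensor."""
    raw = skeleton.astype(np.float32)
    # Temporal-difference stream; prepend the first frame so the
    # sequence length is preserved (first difference is all zeros)
    diff = np.diff(raw, axis=0, prepend=raw[:1])
    return raw, diff

# Kinect V2 reports 25 joints; assume a 30-frame clip here
clip = np.random.default_rng(0).standard_normal((30, 25, 3))
raw, diff = two_stream_inputs(clip)
```

Each stream can then be fed to its own convolutional branch, with the two branches fused before the classification layer.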