Title: Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination
Authors: Zhirong Wu, Yuanjun Xiong, Stella Yu, Dahua Lin
Published: 5th May 2018 (Saturday) @ 00:47:01
Link: http://arxiv.org/abs/1805.01978v1

Abstract

Neural net classifiers trained on data with annotated class labels can also capture apparent visual similarity among categories without being directed to do so. We study whether this observation can be extended beyond the conventional domain of supervised learning: Can we learn a good feature representation that captures apparent similarity among instances, instead of classes, by merely asking the feature to be discriminative of individual instances? We formulate this intuition as a non-parametric classification problem at the instance-level, and use noise-contrastive estimation to tackle the computational challenges imposed by the large number of instance classes. Our experimental results demonstrate that, under unsupervised learning settings, our method surpasses the state-of-the-art on ImageNet classification by a large margin. Our method is also remarkable for consistently improving test performance with more training data and better network architectures. By fine-tuning the learned feature, we further obtain competitive results for semi-supervised learning and object detection tasks. Our non-parametric model is highly compact: With 128 features per image, our method requires only 600MB storage for a million images, enabling fast nearest neighbour retrieval at the run time.


Introduces the non-parametric softmax, better described as the self-supervised pretext task of instance discrimination: the CNN backbone is trained to recognise each image as its own class and to distinguish it from all other samples (no augmentations). The cost of computing the softmax denominator over every instance is circumvented via Noise-Contrastive Estimation, and an additional regularization is used (refer back to the paper for details on the regularization and its role in performance).
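
To make the non-parametric softmax concrete for myself, here is a minimal NumPy sketch of the probability that a feature `v` is recognised as instance `i`. It replaces the usual class weight vectors with stored per-instance features. The names `bank`, `instance_probs` and `l2_normalize` are my own, the random features stand in for CNN outputs, and `tau=0.07` is the temperature as I recall it from the paper, so treat it as illustrative. This computes the full denominator over all instances, which is exactly the cost that NCE is meant to avoid; NCE and the regularization are not implemented here.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit L2 norm along `axis`."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def instance_probs(v, bank, tau=0.07):
    """Non-parametric softmax over instances.

    P(i | v) = exp(v_i . v / tau) / sum_j exp(v_j . v / tau)

    v    : (d,) unit-norm query feature
    bank : (n, d) unit-norm stored features, one row per training instance
    tau  : temperature (assumed value, see the note above)
    """
    logits = bank @ v / tau
    logits = logits - logits.max()  # subtract max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


# Toy data: random vectors stand in for 128-d CNN features.
rng = np.random.default_rng(0)
n, d = 1000, 128
bank = l2_normalize(rng.normal(size=(n, d)))

# Query: a slightly perturbed copy of instance 42's stored feature.
target = 42
v = l2_normalize(bank[target] + 0.05 * rng.normal(size=d))

probs = instance_probs(v, bank)
loss = -np.log(probs[target])  # negative log-likelihood for this instance
predicted = int(np.argmax(probs))
```

The point to notice is that the `bank @ v` product touches all `n` stored features, so with one class per image the denominator grows with the dataset size.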

Seminal paper. I believe this introduces the instance discrimination pretext task, which many subsequent papers essentially vary, e.g. SimCLR, which iterates on it by introducing augmentations and other improvements.