Revisiting Feature Prediction for Learning Visual Representations from Video | Research - AI at Meta
Excerpt
This paper explores feature prediction as a stand-alone objective for unsupervised learning from video and introduces V-JEPA, a collection of vision…
Related Publications
RANKING AND RECOMMENDATIONS
CORE MACHINE LEARNING
TASER: Temporal Adaptive Sampling for Fast and Accurate Dynamic Graph Representation Learning
Temporal Graph Neural Networks (TGNNs) have demonstrated state-of-the-art performance in various high-impact applications, including fraud detection and content recommendation. Despite their success, TGNNs are prone to the noise prevalent in real-world dynamic graphs, such as time-deprecated links and skewed interaction distributions. This noise causes two critical issues that significantly compromise the accuracy of TGNNs: (1) models are supervised by inferior interactions, and (2) noisy input induces high variance in the aggregated messages. However, current TGNN denoising techniques do not consider the diverse and dynamic noise patterns of each node. In addition, they suffer from excessive mini-batch generation overheads caused by traversing more neighbors. We believe the remedy for fast and accurate TGNNs lies in temporal adaptive sampling. In this work, we propose TASER, the first adaptive sampling method for TGNNs optimized for accuracy, efficiency, and scalability. TASER adapts its mini-batch selection based on training dynamics, and its temporal neighbor selection based on the contextual, structural, and temporal properties of past interactions. To alleviate the bottleneck in mini-batch generation, TASER implements a pure GPU-based temporal neighbor finder and a dedicated GPU feature cache. We evaluate TASER using two state-of-the-art backbone TGNNs. On five popular datasets, TASER outperforms the corresponding baselines by an average of 2.3% in Mean Reciprocal Rank (MRR) while achieving an average 5.1× speedup in training time.
Danny Deng, Hongkuan Zhou, Hanqing Zeng, Yinglong Xia, Chris Leung (AI), Jianbo Li, Rajgopal Kannan, Viktor Prasanna
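The core idea above — scoring each node's past interactions by temporal and contextual properties and keeping only the most useful neighbors — can be sketched in a few lines. Everything here (the exponential recency decay, the `affinity` field, the additive scoring rule) is an illustrative assumption, not TASER's actual code.

```python
# Hypothetical sketch of temporal adaptive neighbor sampling in the spirit of
# TASER: rank a node's past interactions and aggregate only the top-k, instead
# of sampling neighbors uniformly. Scoring rule and field names are assumptions.
import math

def score(interaction, query_time, recency_weight=1.0, context_weight=1.0):
    """Score a past interaction: fresher and contextually closer is better."""
    age = query_time - interaction["time"]
    recency = math.exp(-recency_weight * age)           # decays with age
    context = context_weight * interaction["affinity"]  # e.g., learned similarity
    return recency + context

def adaptive_sample(history, query_time, k=2):
    """Pick the top-k temporal neighbors by score."""
    ranked = sorted(history, key=lambda it: score(it, query_time), reverse=True)
    return [it["neighbor"] for it in ranked[:k]]

history = [
    {"neighbor": "a", "time": 1.0, "affinity": 0.1},  # old, low affinity
    {"neighbor": "b", "time": 9.0, "affinity": 0.2},  # recent
    {"neighbor": "c", "time": 2.0, "affinity": 0.9},  # old, high affinity
]
print(adaptive_sample(history, query_time=10.0, k=2))  # prefers c and b over a
```

A uniform sampler would pick any of the three with equal probability; the adaptive sampler drops the stale, low-affinity interaction, which is the kind of "inferior interaction" the abstract identifies as a noise source.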
CORE MACHINE LEARNING
Accelerating a Triton Fused Kernel for W4A16 Quantized Inference with SplitK Work Decomposition
We propose an implementation of an efficient fused matrix multiplication kernel for W4A16 quantized inference, where we perform dequantization and GEMM in a single fused kernel using a SplitK work decomposition. Our implementation shows improvement for the skinny matrix-matrix multiplications found in foundation model inference workloads. In particular, this paper studies the multiplication of a skinny activation matrix with a square weight matrix. Our results show an average 65% speed improvement on A100, and an average 124% speed improvement on H100 (with a peak of 295%), for a range of matrix dimensions including those found in llama-style models, where m < n = k.
Less Wright, Adnan Hoque
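The two ingredients of the kernel above — dequantizing 4-bit weights on the fly, and splitting the K dimension into slices whose partial products are reduced at the end — can be shown in a plain-Python reference (not the Triton kernel itself; the symmetric zero point of 8 and per-column scales are illustrative assumptions).

```python
# Illustrative reference for a W4A16-style fused GEMM with SplitK decomposition.
# Each K-slice plays the role of an independent thread block computing a partial
# product; the `C[i][j] +=` accumulation mirrors the atomic-add reduction that
# combines the slices. Quantization scheme here is an assumption, not the paper's.

def dequant(q, scale):
    """Map an unsigned 4-bit value in [0, 15] to a float (zero point = 8)."""
    return (q - 8) * scale

def splitk_gemm(A, Wq, scales, split_k=2):
    """C = A @ dequant(Wq); K is partitioned into split_k slices of partial sums."""
    m, k = len(A), len(A[0])
    n = len(Wq[0])
    C = [[0.0] * n for _ in range(m)]
    step = k // split_k
    for s in range(split_k):  # each slice would be a separate thread block
        k0 = s * step
        k1 = (s + 1) * step if s < split_k - 1 else k
        for i in range(m):
            for j in range(n):
                partial = sum(A[i][p] * dequant(Wq[p][j], scales[j])
                              for p in range(k0, k1))
                C[i][j] += partial  # reduction across slices ("atomic add")
    return C
```

SplitK helps precisely in the skinny m < n = k regime the abstract describes: with few output rows, splitting K exposes extra parallelism that a plain tiling over M and N cannot.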
RANKING AND RECOMMENDATIONS
REINFORCEMENT LEARNING
Learning to bid and rank together in recommendation systems
Many Internet applications adopt real-time bidding mechanisms to ensure different services (types of content) are shown to users through fair competition. The service offering the highest bid price wins the content slot and presents a list of items from its candidate pool. Through user interactions with the recommended items, the service obtains the desired engagement activities. We propose a contextual-bandit framework to jointly optimize the price to bid for the slot and the order in which to rank the candidates for a given service in this type of recommendation system. Our method can take as input any feature that describes the user and the candidates, including the outputs of other machine learning models. We train reinforcement learning policies using deep neural networks, and compute top-K Gaussian propensity scores to exclude the variance in the gradients caused by randomness unrelated to the reward. This setup further allows us to automatically find accurate reward functions that trade off between budget spending and user engagement. In online A/B experiments on two major services of Facebook Home Feed, Groups You Should Join and Friend Requests, our method statistically significantly boosted the number of groups joined by 14.7%, the number of friend requests accepted by 7.0%, and the number of daily active Facebook users by about 1 million, against strong hand-tuned baselines that have been iterated on in production over years.
Geng Ji, Wentao Jiang, Jiang Li, Fahmid Morshed Fahid, Zhengxing Chen, Yinghua Li, Jun Xiao, Chongxi Bao, Zheqing (Bill) Zhu
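The joint decision described above — one contextual policy that both orders a service's candidates and derives a bid for the slot — can be caricatured as follows. The linear scorer, the feature layout, and the bid-equals-best-value rule are all assumptions for illustration; the paper uses deep networks and propensity-corrected RL training.

```python
# Hedged sketch of bidding and ranking from one shared policy: score candidates
# from user + item features, rank by score, and bid the value of the best slate.
# All names and the linear form are illustrative, not the production model.

def policy(user_features, candidates, weights):
    """Return (bid, ranking) for one service's slot auction."""
    scored = []
    for item in candidates:
        features = user_features + item["features"]
        value = sum(w * f for w, f in zip(weights, features))
        scored.append((value, item["id"]))
    scored.sort(reverse=True)                 # best candidate first
    ranking = [item_id for _, item_id in scored]
    bid = max(0.0, scored[0][0])              # bid what the top candidate is worth
    return bid, ranking

bid, ranking = policy(
    user_features=[1.0],
    candidates=[{"id": "x", "features": [2.0]}, {"id": "y", "features": [1.0]}],
    weights=[0.5, 1.0],
)
```

Tying the bid to the same scores used for ranking is what makes the optimization joint: a service with weak candidates bids less, and budget is spent where engagement is likeliest.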
REINFORCEMENT LEARNING
CORE MACHINE LEARNING
TaskMet: Task-driven Metric Learning for Model Learning
Deep learning models are often deployed in service of a downstream task. Models trained solely for prediction accuracy may struggle to perform well on the desired downstream tasks. We propose using the task loss to learn a metric, which in turn parameterizes the loss used to train the model. This approach does not alter the optimal prediction model itself, but rather changes the model learning to emphasize the information important for the downstream task. This enables us to achieve the best of both worlds: a prediction model trained in the original prediction space that is also valuable for the desired downstream task. We validate our approach through experiments in two main settings: 1) decision-focused model learning scenarios involving portfolio optimization and budget allocation, and 2) reinforcement learning in noisy environments with distracting states.
Dishank Bansal, Ricky Chen, Mustafa Mukadam, Brandon Amos
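The metric-parameterized loss at the heart of the approach above can be written down directly: instead of a plain squared error, prediction errors are weighted by a learned matrix M, so dimensions that matter for the downstream task cost more to get wrong. This is a minimal sketch under that reading; in the paper M would itself be tuned by differentiating through the task loss, which is omitted here.

```python
# Minimal sketch of a metric-parameterized prediction loss (TaskMet-style idea):
# loss = e^T M e for error e = y_pred - y_true, with M a learned metric.
# With M = I this reduces to ordinary squared error; a larger diagonal entry
# makes errors on that dimension more expensive. M is fixed here for clarity.

def metric_loss(y_pred, y_true, M):
    """Mahalanobis-style loss e^T M e, with e = y_pred - y_true."""
    e = [p - t for p, t in zip(y_pred, y_true)]
    Me = [sum(M[i][j] * e[j] for j in range(len(e))) for i in range(len(e))]
    return sum(ei * mei for ei, mei in zip(e, Me))

identity = [[1.0, 0.0], [0.0, 1.0]]
task_metric = [[4.0, 0.0], [0.0, 1.0]]  # dimension 0 matters more downstream

plain = metric_loss([1.0, 2.0], [0.0, 0.0], identity)      # ordinary MSE: 5.0
weighted = metric_loss([1.0, 2.0], [0.0, 0.0], task_metric)  # 4*1 + 1*4 = 8.0
```

Because only the loss weighting changes, the model still predicts in its original output space — the "best of both worlds" property the abstract claims.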