• Information Theory

    Sketches of some concepts from Information Theory. Readers are referred to Shannon’s original 1948 paper A Mathematical Theory of Communication.
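
    As a flavour of what is sketched there (a standard definition, with notation assumed rather than taken from the post), the entropy of a discrete random variable $X$ is its average surprise:

    $$H(X) = -\sum_{x} p(x) \log_2 p(x)$$

    For a fair coin, $H = -2 \cdot \tfrac{1}{2} \log_2 \tfrac{1}{2} = 1$ bit.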

  • The Hierarchical Softmax

    The Hierarchical Softmax enables efficient classification because its time complexity is logarithmic in the number of output classes: $O(\log N)$ for $N$ classes. This is especially valuable in language modelling, where a decoder generating a sentence must predict a word at every time step from a vocabulary that may be on the order of $\vert \mathbf{V} \vert = 30{,}000$.
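
    A sketch of the standard tree-structured formulation (following Morin and Bengio; notation mine, not necessarily the post's): each word $w$ sits at a leaf of a binary tree over the vocabulary, and its probability is a product of binary decisions along the path from the root,

    $$P(w \mid \mathbf{h}) = \prod_{n \in \mathrm{path}(w)} \sigma\!\left( s_n \, \mathbf{v}_n^\top \mathbf{h} \right),$$

    where $\sigma$ is the logistic sigmoid, $\mathbf{v}_n$ is the vector at internal node $n$ and $s_n \in \{+1, -1\}$ encodes the branch taken. For $\vert \mathbf{V} \vert = 30{,}000$ this requires only $\lceil \log_2 30{,}000 \rceil = 15$ binary decisions per word instead of a 30,000-way normalisation.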

  • CPC: Representation Learning with Contrastive Predictive Coding

    Notes on Representation Learning with Contrastive Predictive Coding (CPC) by Aaron van den Oord, Yazhe Li and Oriol Vinyals.

  • Self-Supervised Visual Representation Learning

    This post consolidates several literature summaries from the field of self-supervised visual representation learning.

  • Four Early Lessons from Working on Machine Learning Projects

    Some high-level reflections from working on a Computer Vision project in PyTorch.

  • The Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

    I discussed the Swin Transformer: Hierarchical Vision Transformer using Shifted Windows by Ze Liu and colleagues, published at ICCV ’21, at the PINLab Reading Group on 3rd November 2021.

  • LSTMs + Grammar as a Foreign Language

    A short explanation of long short-term memory networks (LSTMs), a form of recurrent neural network (RNN), and a breakdown of Vinyals et al. (2015) Grammar as a Foreign Language, which uses LSTMs with attention to perform syntactic constituency parsing.
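
    For reference, the standard LSTM cell equations (the common formulation, not necessarily the post's exact notation): with input $x_t$, previous hidden state $h_{t-1}$ and cell state $c_{t-1}$,

    $$\begin{aligned}
    f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
    i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
    o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
    \tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
    c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
    h_t &= o_t \odot \tanh(c_t)
    \end{aligned}$$

    The additive update to the cell state $c_t$ is what lets gradients flow across long spans, mitigating the vanishing gradients that afflict plain RNNs.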

  • Generalized Linear Models and the Exponential Family

    An introduction to the Exponential Family of probability distributions. Familiarity with the exponential family is the basis for understanding the Generalized Linear Model (GLM) framework, which includes logistic regression for binary (Binomial) data and log-linear (Poisson) regression for count data.
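
    In its canonical form (standard notation, assumed rather than taken from the post), a one-parameter exponential family density is

    $$p(y \mid \theta) = h(y) \exp\!\left( \eta(\theta)\, T(y) - A(\theta) \right)$$

    The Bernoulli distribution, for example, fits this form with natural parameter $\eta = \log\frac{\mu}{1-\mu}$, since $\mu^y (1-\mu)^{1-y} = \exp\!\left( y \log\frac{\mu}{1-\mu} + \log(1-\mu) \right)$; inverting $\eta$ gives the sigmoid, which is why logistic regression is the GLM for binary data.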

  • Mean, Median and Mode as Representatives

    Prompted by a passing comment from my Bayesian statistics professor, this brief post discusses the underlying basis on which the three usual measures of central tendency, the mean, median and mode, come to represent a distribution, sample or population.
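
    One standard way to make "representative" precise (presumably close to the post's framing, though that is my assumption) is that each measure minimises the expectation of a different loss:

    $$\text{mean} = \arg\min_c \mathbb{E}\!\left[ (X - c)^2 \right], \qquad \text{median} = \arg\min_c \mathbb{E}\!\left[ \vert X - c \vert \right], \qquad \text{mode} = \arg\min_c \Pr(X \neq c)$$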

  • Bayes: Conjugate Inference

    Bayesian inference for cases in which the data generating process allows for a conjugate setup. Conjugacy in Bayesian statistics is the scenario in which the prior and posterior distributions belong to the same family, for example the Beta distribution for binary outcome data (Binomial likelihood) or the Gamma distribution for count data (Poisson likelihood).
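
    As a worked Beta-Binomial example (numbers invented for illustration): with a uniform prior $\theta \sim \mathrm{Beta}(\alpha = 1, \beta = 1)$ and $k = 7$ successes observed in $n = 10$ trials, the posterior is

    $$\theta \mid y \sim \mathrm{Beta}(\alpha + k,\; \beta + n - k) = \mathrm{Beta}(8, 4),$$

    so the update amounts to adding the observed successes and failures to the prior's pseudo-counts.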

  • Graphs: Community Structure

    Graph communities are sets of tightly-connected nodes, which can usefully represent groups of entities characterised by proximity or direct interaction. These may be people grouped according to social relationships or proteins interacting for a given metabolic process. In this post we look at a fast heuristic used to extract non-overlapping communities (the Louvain algorithm) and a method for overlapping community detection, which relies on maximising the likelihood of observing a graph given relaxed generative models of node-community membership.
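
    A minimal sketch of running Louvain in practice (my example, assuming networkx ≥ 3.0, which ships an implementation; the post itself may use different tooling):

    ```python
    # Hypothetical example: Louvain community detection on Zachary's karate
    # club graph, a classic small social network. Requires networkx >= 3.0.
    import networkx as nx

    G = nx.karate_club_graph()

    # Greedy modularity maximisation via the Louvain heuristic.
    communities = nx.community.louvain_communities(G, seed=0)

    # Modularity Q measures how much denser the communities are than
    # expected under a degree-preserving random rewiring of the graph.
    Q = nx.community.modularity(G, communities)
    print(f"{len(communities)} communities, modularity Q = {Q:.3f}")
    ```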

  • Graphs: Motifs, Graphlets and Structural Roles in Networks

    Networks and nodes can be characterised and compared by finding and profiling their network motifs (subgraphs), specifically induced subgraphs and graphlets (connected subgraphs). The structural roles of nodes can be learned in an unsupervised way via recursive node feature extraction and clustering, and these techniques can be combined with subgraph-based analysis.
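
    As a toy illustration (mine, not the post's), the triangle is the simplest non-trivial motif, and per-node triangle counts already give a crude structural feature:

    ```python
    # Count triangles as the simplest motif; per-node counts and clustering
    # coefficients are basic inputs to structural-role feature extraction.
    import networkx as nx

    G = nx.karate_club_graph()
    triangles = nx.triangles(G)    # node -> number of triangles through it
    clustering = nx.clustering(G)  # node -> fraction of closed neighbour pairs
    print(max(triangles, key=triangles.get))  # most triangle-dense node
    ```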

  • Jabri, Owens and Efros (2020) Space-Time Correspondence as a Contrastive Random Walk

    I discussed Jabri, Owens and Efros (2020) Space-Time Correspondence as a Contrastive Random Walk, published at NeurIPS, at the 31st March session of the PINLab Reading Group. It proposes a self-supervised method, built on palindromic video cycles, for learning representations of visual correspondence across time, which can then be used for label propagation, e.g. of object masks, semantic labels or pose keypoints.

  • The Probability Distributions

    This post introduces a few of the most commonly encountered families of probability distributions: the Poisson, which models counts of events occurring at a given rate; the Geometric, which models waiting times in discrete stochastic processes; and the Exponential, a continuous analogue of the Geometric that relates closely to the Poisson family.
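
    For concreteness, the three families side by side (standard parameterisations):

    $$P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}, \qquad P(T = k) = (1 - p)^{k-1} p, \qquad f(t) = \lambda e^{-\lambda t}$$

    The Poisson counts events in a window given rate $\lambda$, the Geometric is the number of trials until the first success, and the Exponential gives the inter-event waiting times of the Poisson process with the same rate $\lambda$.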

  • An Evolutionary Perspective on Language

    In this post, I look at some aspects of human language that are claimed to be unique to our linguistic communication, as opposed to existing in some analogous or homologous form in the systems that other animals use to communicate. I’ll review some of the evidence that leads researchers in linguistics and behavioural ecology to make such claims, starting with a brief summary of animal signals and communication and then moving on to human language to discuss its purportedly unique hallmarks.

  • Animal Navigation Systems

    Animals display an amazingly reliable, accurate and sometimes mysterious capacity to navigate. When they do so over large distances, as during migration, without the goal in sight or the cognitive capacity to encode all of the relevant geography, we must ask how they achieve such a feat, especially given the need to correct for wind or ocean currents, often in the absence of discernible landmarks.