• Information Theory

    Sketches of some concepts from Information Theory. Readers are referred to Shannon’s original 1948 paper A Mathematical Theory of Communication.
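
    As a flavour of what is sketched there (a standard definition, with notation assumed rather than taken from the post), the entropy of a discrete random variable $X$ is its average surprise:

    $$H(X) = -\sum_{x} p(x) \log_2 p(x)$$

    For a fair coin, $H = -2 \cdot \tfrac{1}{2} \log_2 \tfrac{1}{2} = 1$ bit.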

  • The Hierarchical Softmax

    The Hierarchical Softmax enables efficient classification because its time complexity is logarithmic in the number of output classes: $O(\log N)$ for $N$ classes. This is especially valuable in language modelling, where a decoder generating a sentence must predict a word at every time step from a vocabulary that may be on the order of $\vert \mathbf{V} \vert = 30{,}000$.
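
    A sketch of the standard tree-structured formulation (following Morin and Bengio; notation mine, not necessarily the post's): each word $w$ sits at a leaf of a binary tree over the vocabulary, and its probability is a product of binary decisions along the path from the root,

    $$P(w \mid \mathbf{h}) = \prod_{n \in \mathrm{path}(w)} \sigma\!\left( s_n \, \mathbf{v}_n^\top \mathbf{h} \right),$$

    where $\sigma$ is the logistic sigmoid, $\mathbf{v}_n$ is the vector at internal node $n$ and $s_n \in \{+1, -1\}$ encodes the branch taken. For $\vert \mathbf{V} \vert = 30{,}000$ this requires only $\lceil \log_2 30{,}000 \rceil = 15$ binary decisions per word instead of a 30,000-way normalisation.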

  • CPC: Representation Learning with Contrastive Predictive Coding

    Notes on Representation Learning with Contrastive Predictive Coding (CPC) by Aaron van den Oord, Yazhe Li and Oriol Vinyals.

  • Self-Supervised Visual Representation Learning

    This post consolidates several literature summaries from the field of self-supervised visual representation learning.

  • Four Early Lessons from Working on Machine Learning Projects

    Some high-level reflections from working on a Computer Vision project in PyTorch.

  • The Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

    I discussed the Swin Transformer: Hierarchical Vision Transformer using Shifted Windows by Ze Liu and colleagues, published at ICCV ’21, at the PINLab Reading Group on 3rd November 2021.

  • LSTMs + Grammar as a Foreign Language

    A short explanation of long short-term memory networks (LSTMs), a form of recurrent neural network (RNN), and a breakdown of Vinyals et al. (2015) Grammar as a Foreign Language, which uses LSTMs with attention to perform syntactic constituency parsing.
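
    For reference, the standard LSTM cell equations (the common formulation, not necessarily the post's exact notation): with input $x_t$, previous hidden state $h_{t-1}$ and cell state $c_{t-1}$,

    $$\begin{aligned}
    f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
    i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
    o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
    \tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
    c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
    h_t &= o_t \odot \tanh(c_t)
    \end{aligned}$$

    The additive update to the cell state $c_t$ is what lets gradients flow across long spans, mitigating the vanishing gradients that afflict plain RNNs.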

  • Generalized Linear Models and the Exponential Family

    An introduction to the Exponential Family of probability distributions. Familiarity with the exponential family is the basis for understanding the Generalized Linear Model (GLM) framework, which includes logistic regression for binary (Binomial) data and log-linear (Poisson) regression for count data.
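
    In its canonical form (standard notation, assumed rather than taken from the post), a one-parameter exponential family density is

    $$p(y \mid \theta) = h(y) \exp\!\left( \eta(\theta)\, T(y) - A(\theta) \right)$$

    The Bernoulli distribution, for example, fits this form with natural parameter $\eta = \log\frac{\mu}{1-\mu}$, since $\mu^y (1-\mu)^{1-y} = \exp\!\left( y \log\frac{\mu}{1-\mu} + \log(1-\mu) \right)$; inverting $\eta$ gives the sigmoid, which is why logistic regression is the GLM for binary data.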

  • Mean, Median and Mode as Representatives

    Prompted by a passing comment from my Bayesian statistics professor, this brief post discusses the underlying basis on which the three usual measures of central tendency, the mean, median and mode, come to represent a distribution, sample or population.
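
    One standard way to make "representative" precise (presumably close to the post's framing, though that is my assumption) is that each measure minimises the expectation of a different loss:

    $$\text{mean} = \arg\min_c \mathbb{E}\!\left[ (X - c)^2 \right], \qquad \text{median} = \arg\min_c \mathbb{E}\!\left[ \vert X - c \vert \right], \qquad \text{mode} = \arg\min_c \Pr(X \neq c)$$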

  • Bayes: Conjugate Inference

    Bayesian inference for cases in which the data generating process allows for a conjugate setup. Conjugacy in Bayesian statistics is the scenario in which the prior and posterior distributions belong to the same family, for example the Beta distribution for binary outcome data (Binomial likelihood) or the Gamma distribution for count data (Poisson likelihood).
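
    As a worked Beta-Binomial example (numbers invented for illustration): with a uniform prior $\theta \sim \mathrm{Beta}(\alpha = 1, \beta = 1)$ and $k = 7$ successes observed in $n = 10$ trials, the posterior is

    $$\theta \mid y \sim \mathrm{Beta}(\alpha + k,\; \beta + n - k) = \mathrm{Beta}(8, 4),$$

    so the update amounts to adding the observed successes and failures to the prior's pseudo-counts.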

  • Graphs: Community Structure

    Graph communities are sets of tightly-connected nodes, which can usefully represent groups of entities characterised by proximity or direct interaction. These may be people grouped according to social relationships or proteins interacting for a given metabolic process. In this post we look at a fast heuristic used to extract non-overlapping communities (the Louvain algorithm) and a method for overlapping community detection, which relies on maximising the likelihood of observing a graph given relaxed generative models of node-community membership.
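
    A minimal sketch of running Louvain in practice (my example, assuming networkx ≥ 3.0, which ships an implementation; the post itself may use different tooling):

    ```python
    # Hypothetical example: Louvain community detection on Zachary's karate
    # club graph, a classic small social network. Requires networkx >= 3.0.
    import networkx as nx

    G = nx.karate_club_graph()

    # Greedy modularity maximisation via the Louvain heuristic.
    communities = nx.community.louvain_communities(G, seed=0)

    # Modularity Q measures how much denser the communities are than
    # expected under a degree-preserving random rewiring of the graph.
    Q = nx.community.modularity(G, communities)
    print(f"{len(communities)} communities, modularity Q = {Q:.3f}")
    ```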

  • Graphs: Motifs, Graphlets and Structural Roles in Networks

    Networks and nodes can be characterised and compared by finding and profiling their network motifs (subgraphs), specifically induced subgraphs and graphlets (connected subgraphs). The structural roles of nodes can be learned in an unsupervised way via recursive node feature extraction and clustering, and these techniques can be combined with subgraph-based analysis.
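
    As a toy illustration (mine, not the post's), the triangle is the simplest non-trivial motif, and per-node triangle counts already give a crude structural feature:

    ```python
    # Count triangles as the simplest motif; per-node counts and clustering
    # coefficients are basic inputs to structural-role feature extraction.
    import networkx as nx

    G = nx.karate_club_graph()
    triangles = nx.triangles(G)    # node -> number of triangles through it
    clustering = nx.clustering(G)  # node -> fraction of closed neighbour pairs
    print(max(triangles, key=triangles.get))  # most triangle-dense node
    ```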

  • Jabri, Owens and Efros (2020) Space-Time Correspondence as a Contrastive Random Walk

    I discussed Jabri, Owens and Efros (2020) Space-Time Correspondence as a Contrastive Random Walk, published at NeurIPS, at the 31st March session of the PINLab Reading Group. It proposes a self-supervised method, built on palindromic video cycles, for learning representations of visual correspondence across time, which can then be used for label propagation, e.g. of object masks, semantic labels or pose keypoints.

  • The Probability Distributions

    This post introduces a few of the most commonly encountered families of probability distributions: the Poisson, which models counts of events occurring at a given rate; the Geometric, which models waiting times in discrete stochastic processes; and the Exponential, a continuous analogue of the Geometric that relates closely to the Poisson family.
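
    For concreteness, the three families side by side (standard parameterisations):

    $$P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}, \qquad P(T = k) = (1 - p)^{k-1} p, \qquad f(t) = \lambda e^{-\lambda t}$$

    The Poisson counts events in a window given rate $\lambda$, the Geometric is the number of trials until the first success, and the Exponential gives the inter-event waiting times of the Poisson process with the same rate $\lambda$.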

  • An Evolutionary Perspective on Language

    In this post, I look at some aspects of human language that are claimed to be unique to our linguistic communication, as opposed to existing in some analogous or homologous form in the systems that other animals use to communicate. I’ll review some of the evidence that leads researchers in linguistics and behavioural ecology to make such claims, starting with a brief summary of animal signals and communication and then moving on to human language to discuss its purportedly unique hallmarks.

  • Animal Navigation Systems

    Animals display an amazingly reliable, accurate and sometimes mysterious capacity to navigate. When they do so over large distances, as during migration, without the goal in sight or the cognitive capacity to encode all of the relevant geography, we must ask how they achieve such a feat, especially given the need to correct for wind or ocean currents, often in the absence of discernible landmarks.