Notes / Remember Remember / Q & A

  • How do I find the location of my Python site-packages directory?
  • f.write(string) writes the contents of string to the file, returning the number of characters written: f.write('This is a test\n') evaluates to 15
  • Python Closures Common Use Cases and Examples
    • To create a Python closure, you need the following components:
      1. **An outer or enclosing function
      2. **Variables that are local to the outer function
      3. **An inner or nested function
    • update the value of a variable that points to an immutable object, you need to use the nonlocal statement
    • Use cases of closures:
      • factory functions (i.e. function factories!)
      • functions with state e.g. running average / exponential moving average (EMA)
        • for a lot of these (i.e. any non-trivial ones?) I suppose a class might be better suited
      • callbacks: “Closures are commonly used in event-driven programming when you need to create callback functions that carry additional context or state information.”
      • decorators: Python has two types of decorators: function-based and class-based. “A function-based decorator is a function that takes a function object as an argument and returns another function object with extended functionality. This latter function object is also a closure. So, to create function-based decorators, you use closures.”
      • Memoization e.g. LRU cache (via a decorator; @lru_cache)
    • You can replace a closure with a class that produces callable instances by implementing the .__call__() special method 👈 alternative to closures

Getting the Size of Objects (in Bytes)

From the Python docs:

sys.getsizeof(object[, default])

Return the size of an object in bytes. The object can be any type of object. All built-in objects will return correct results, but this does not have to hold true for third-party extensions as it is implementation specific.

Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to.

If given, default will be returned if the object does not provide means to retrieve the size. Otherwise a TypeError will be raised.

getsizeof() calls the object’s __sizeof__ method and adds an additional garbage collector overhead if the object is managed by the garbage collector.

See recursive sizeof recipe for an example of using getsizeof() recursively to find the size of containers and all their contents.

— Source: https://docs.python.org/3/library/sys.html#sys.getsizeof

Seeking in Files

f.tell() returns an integer giving the file object’s current position in the file represented as number of bytes from the beginning of the file when in binary mode and an opaque number when in text mode.

To change the file object’s position, use f.seek(offset, whence). The position is computed from adding offset to a reference point; the reference point is selected by the whence argument. A whence value of 0 measures from the beginning of the file, 1 uses the current file position, and 2 uses the end of the file as the reference point. whence can be omitted and defaults to 0, using the beginning of the file as the reference point.

>>>
 
>>> f = open('workfile', 'rb+')
>>> f.write(b'0123456789abcdef')
16
>>> f.seek(5)      # Go to the 6th byte in the file
5
>>> f.read(1)
b'5'
>>> f.seek(-3, 2)  # Go to the 3rd byte before the end
13
>>> f.read(1)
b'd'

In text files (those opened without a b in the mode string), only seeks relative to the beginning of the file are allowed (the exception being seeking to the very file end with seek(0, 2)) and the only valid offset values are those returned from the f.tell(), or zero. Any other offset value produces undefined behaviour.

File objects have some additional methods, such as isatty() and truncate() which are less frequently used; consult the Library Reference for a complete guide to file objects.

— Source: https://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects

open(..., newline='') when opening files for use with csv.reader from std lib

Per the docs for csv from the standard library, you should open files with the newline argument set: newline=''

Return a reader object that will process lines from the given csvfile. A csvfile must be an iterable of strings, each in the reader’s defined csv format. A csvfile is most commonly a file-like object or list. If csvfile is a file object, it should be opened with newline=''. [1]

The footnote reads:

If newline='' is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use \r\n linendings on write an extra \r will be added. It should always be safe to specify newline='', since the csv module does its own (universal) newline handling.

Learnt this thanks to ChatGPT when editing the ‘Dotfiles Symlink’ canvas.

Asterisk (*) and forward slash in function arguments /

Example:

def func(self, param1, param2, /, param3, *, param4, param5):
     print(param1, param2, param3, param4, param5)
  1. param1, param2 must be specified positionally.
  2. param3 can be called either with positional or keyword argument.
  3. param4 and param5 must be called with keyword argument.

See What do * (single star) and / (slash) do as independent parameters? [duplicate]

What is the @property decorator in Python used for? What are some best practices around its usage?

The @property decorator in Python is used to define “getter” methods for class attributes, which allow you to access a class attribute like a property (i.e., without using parentheses like a method call) and also enables you to add extra functionality or validation when accessing it. This is a part of the concept of “encapsulation” in object-oriented programming, where the internal representation of an object is hidden from the outside world, and access to it is controlled through methods.

Here are some best practices around its usage:

  1. Use @property when you want to compute a value or perform some action when an attribute is accessed. For example, you may want to lazily compute a value, or ensure that an attribute is always within a certain range.
  2. Use @property to provide a consistent API for your class, even if the internal representation changes. By using getters and setters, you can change the internal representation of an object without affecting the code that uses it.
  3. Make sure the getter method does not have any side effects. The @property getter should be idempotent, meaning that calling it multiple times should not change the state of the object or have any side effects.
  4. Keep the logic inside the getter simple. If you find yourself adding complex logic to the getter, it might be better to move that logic to a separate method and call that method from the getter.
  5. Use the @<attribute>.setter decorator to define a “setter” method for a property. This allows you to add extra functionality or validation when setting the value of the attribute, such as type checking or ensuring that the value is within a certain range.

Here’s an example of using @property and @<attribute>.setter decorators:

class Circle:
    def __init__(self, radius):
        self._radius = radius
 
    @property
    def radius(self):
        return self._radius
 
    @radius.setter
    def radius(self, value):
        if value < 0:
            raise ValueError("Radius cannot be negative")
        self._radius = value
 
    @property
    def diameter(self):
        return self._radius * 2
 
    @property
    def area(self):
        return 3.14159 * (self._radius ** 2)

In this example, we have a Circle class with radius, diameter, and area properties. The radius property has a getter and a setter, while diameter and area have only getters. The radius setter also includes validation to ensure that the radius is never set to a negative value.

try
except
else
finally

try:
       # Some Code.... 
 
except:
       # optional block
       # Handling of exception (if required)
 
else:
       # execute if no exception
 
finally:
      # Some code .....(always executed, _even if you return in the except block_)

Note: The the utility of the finally block is that it always executes, even when you return from the except block. Source: geeksforgeeks.

Use and Definition/Structure of __main__.py

Example from nvitop.

(main) anilkeshwani@Anils-MacBook-Air:~/miniconda3/envs/main/lib/python3.12/site-packages/nvitop$ dog __main__.py:

# This file is part of nvitop, the interactive NVIDIA-GPU process viewer.
# License: GNU GPL version 3.
 
"""The interactive NVIDIA-GPU process viewer."""
 
import sys
 
from nvitop.cli import main
 
 
if __name__ == '__main__':
    sys.exit(main())

Aquestion on this is that it seemingly conflicts with the official Python 3 advice: [[main — Top-level code environment#idiomatic-usage|Idiomatic Usage]]. They recommend a __main__.py like the one in the venv package from the standard library, without a if __name__=="__main__" guard:

import sys
from . import main
 
rc = 1
try:
    main()
    rc = 0
except Exception as e:
    print('Error: %s' % e, file=sys.stderr)
sys.exit(rc)

(main) anilkeshwani@Anils-MacBook-Air:~/miniconda3/envs/main/lib/python3.12/site-packages/nvitop$ dog $(which nvitop):

#!/Users/anilkeshwani/miniconda3/envs/main/bin/python
# -*- coding: utf-8 -*-
import re
import sys
from nvitop.cli import main
if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    sys.exit(main())

Hydra

Hydra Changes the Logging Level for You
 How to Stop It!

See Logging from the Hydra v1.3 docs or clipped version: Logging Hydra

People often do not use Python logging due to the setup cost. Hydra solves this by configuring the Python logging for you.

By default, Hydra logs at the INFO level to both the console and a log file in the automatic working directory.

– Source: Logging from the Hydra v1.3 docs

How to stop it:

You can also set hydra/job_logging=none and hydra/hydra_logging=none if you do not want Hydra to configure the logging.

Testing in Python

  • -The doctest module provides a tool for scanning a module and validating tests embedded in a program’s docstrings. Test construction is as simple as cutting-and-pasting a typical call along with its results into the docstring. — Source: Python Tutorial §10.11. Quality Control

Unit Testing in Python

The unittest module allows a comprehensive set of tests to be maintained in a separate file:

import unittest
 
class TestStatisticalFunctions(unittest.TestCase):
 
    def test_average(self):
        self.assertEqual(average([20, 30, 70]), 40.0)
        self.assertEqual(round(average([1, 5, 7]), 1), 4.3)
        with self.assertRaises(ZeroDivisionError):
            average([])
        with self.assertRaises(TypeError):
            average(20, 30, 70)
 
unittest.main()  # Calling from the command line invokes all tests

Debugging

Debugging Segmentation Faults (segfaults)

Segmentation faults in Python code occur when an underlying C library crashes. Sadly, a helpful error message is not always (never?) shown, nor the offending library.

One way to find out which program is causing the segfault is to capture the stacktrace of the C program by using gdb (GNU Debugger).

# ensure you have `gdb` installed
sudo apt install -y gdb
 
# run the program to debug with gdb, in most of our cases `python`
gdb python
 
# this will put you inside the gdb console, type `run` to start the program, potentially passing args
run -c "import pytorch_lightning"
 
# then, when the segfault occurs, you can type `bt` to print the stacktrace
bt
 
# this will print something like this:
# 0x00007fff6391e1e1 in google::protobuf::internal::ReflectionOps::FindInitializationErrors(google::protobuf::Message const&, std::string const&, std::vector<std::string, std::allocator<std::string> >*) () from /home/michael/miniconda3/envs/nemo/lib/python3.8/site-packages/google/protobuf/pyext/_message.cpython-38-x86_64-linux-gnu.so

This indicates the google-protobuf library is the culprit. Now you can google “${library} segfault” to see if anyone has a similar problem (and ideally solution), though usually the solution is to up/downgrade the library.

Another way to run gdb is to use gdb --args to pass the arguments to the program.

gdb --args python <your_script>.py ARG1 ARG2 ...

As before, type run and then bt when the segfault occurs.

This way you can run a complicated training command and debug it (the full command that caused the segfault is often printed above the error message).

Garbage Collection

Garbage collection in Python: things you need to know

gc — Garbage Collector interface

To debug a leaking program call gc.set_debug(gc.DEBUG_LEAK). Notice that this includes gc.DEBUG_SAVEALL, causing garbage-collected objects to be saved in gc.garbage for inspection.

Introduction

Objects, Types and Reference Counts

Most Python/C API functions have one or more arguments as well as a return value of type PyObject. This type is a pointer to an opaque data type representing an arbitrary Python object. Since all Python object types are treated the same way by the Python language in most situations (e.g., assignments, scope rules, and argument passing), it is only fitting that they should be represented by a single C type. Almost all Python objects live on the heap: you never declare an automatic or static variable of type [PyObject](https://docs.python.org/3.11/c-api/structures.html#c.PyObject), only pointer variables of type PyObject can be declared. The sole exception are the type objects; since these must never be deallocated, they are typically static [PyTypeObject](https://docs.python.org/3.11/c-api/type.html#c.PyTypeObject) objects.

All Python objects (even Python integers) have a type and a reference count. An object’s type determines what kind of object it is (e.g., an integer, a list, or a user-defined function; there are many more as explained in The standard type hierarchy). For each of the well-known types there is a macro to check whether an object is of that type; for instance, PyList_Check(a) is true if (and only if) the object pointed to by a is a Python list.

GIL

Memory

Performance Optimisation

Performance optimisations from Talk - Anthony Shaw: Write faster Python! Common performance anti patterns - YouTube:

  1. Loop-invariant statements - Python’s compiler does not do loop invariant code motion (aka “hoisting”) for you
  2. Use comprehensions - including set, dictionary and other comprehensions (creating generators)
  3. Select the right data structures - custom classes > dataclasses > named tuple
  4. Avoid tiny functions in hot areas of code - especially hot loops - Python has big overhead to function calls even as of 3.11

How to go about optimising

  • Sample correctly and track regressions - track regressions / speedups on a commit-by-commit basis
  • Catch issues in code review - bring the team in
  • Use perflint to highlight performance anti-patterns

Profiling

  • Austin
  • Scalene
  • cProfile

See also CPU + GPU profiling.

Type Hints (Type Annotations) and mypy

What is TypeVar for and can you give me an example

TypeVar is a generic type that is used in Python to define type variables. It allows you to create generic functions or classes that can operate on different types without specifying the exact type explicitly. This enables you to write more flexible and reusable code.

Here’s a simple example to illustrate the usage of TypeVar:

from typing import TypeVar, List
 
T = TypeVar('T')  # Define a type variable T
 
def reverse_list(lst: List[T]) -> List[T]:
    return lst[::-1]
 
# Usage
int_list = [1, 2, 3, 4, 5]
reversed_int_list = reverse_list(int_list)
print(reversed_int_list)  # Output: [5, 4, 3, 2, 1]
 
str_list = ['apple', 'banana', 'cherry']
reversed_str_list = reverse_list(str_list)
print(reversed_str_list)  # Output: ['cherry', 'banana', 'apple']

In this example, we define a generic function reverse_list that takes a list of any type T and returns a reversed list of the same type. We use TypeVar('T') to create the type variable T, which represents an arbitrary type. When the function is called, the actual type of the list is inferred based on the argument passed, and the reversed list is returned with the same type.

By using TypeVar, you can create generic functions or classes that can operate on different types without explicitly specifying the type, making your code more flexible and reusable.

Linting & Linters

Modules

train.svg

  • Example: pydeps --show-cycles utils[.py](http://train.py) running on sushi/sushi/utils.py

utils.svg

Python Source

typing accelerator C extension: _typing module @ cpython/Modules/_typingmodule.c: Example of a module importable into Python not written in or present in native Python code

See reference to this module (classes defined therein being imported) at

Work seems to be done here:

static struct PyModuleDef typingmodule = {
        PyModuleDef_HEAD_INIT,
        "_typing",
        typing_doc,
        0,
        typing_methods,
        _typingmodule_slots,
        NULL,
        NULL,
        NULL
};
 
PyMODINIT_FUNC
PyInit__typing(void)
{
    return PyModuleDef_Init(&typingmodule);
}