Notes / Remember Remember / Q & A

Asterisk (*) and forward slash in function arguments /

Example:

def func(self, param1, param2, /, param3, *, param4, param5):
     print(param1, param2, param3, param4, param5)
  1. param1, param2 must be specified positionally.
  2. param3 can be called either with positional or keyword argument.
  3. param4 and param5 must be called with keyword argument.

See What do * (single star) and / (slash) do as independent parameters? [duplicate]

What is the @property decorator in Python used for? What are some best practices around its usage?

The @property decorator in Python is used to define “getter” methods for class attributes, which allow you to access a class attribute like a property (i.e., without using parentheses like a method call) and also enables you to add extra functionality or validation when accessing it. This is a part of the concept of “encapsulation” in object-oriented programming, where the internal representation of an object is hidden from the outside world, and access to it is controlled through methods.

Here are some best practices around its usage:

  1. Use @property when you want to compute a value or perform some action when an attribute is accessed. For example, you may want to lazily compute a value, or ensure that an attribute is always within a certain range.
  2. Use @property to provide a consistent API for your class, even if the internal representation changes. By using getters and setters, you can change the internal representation of an object without affecting the code that uses it.
  3. Make sure the getter method does not have any side effects. The @property getter should be idempotent, meaning that calling it multiple times should not change the state of the object or have any side effects.
  4. Keep the logic inside the getter simple. If you find yourself adding complex logic to the getter, it might be better to move that logic to a separate method and call that method from the getter.
  5. Use the @<attribute>.setter decorator to define a “setter” method for a property. This allows you to add extra functionality or validation when setting the value of the attribute, such as type checking or ensuring that the value is within a certain range.

Here’s an example of using @property and @<attribute>.setter decorators:

class Circle:
    def __init__(self, radius):
        self._radius = radius
 
    @property
    def radius(self):
        return self._radius
 
    @radius.setter
    def radius(self, value):
        if value < 0:
            raise ValueError("Radius cannot be negative")
        self._radius = value
 
    @property
    def diameter(self):
        return self._radius * 2
 
    @property
    def area(self):
        return 3.14159 * (self._radius ** 2)

In this example, we have a Circle class with radius, diameter, and area properties. The radius property has a getter and a setter, while diameter and area have only getters. The radius setter also includes validation to ensure that the radius is never set to a negative value.

try
except
else
finally

try:
       # Some Code.... 
 
except:
       # optional block
       # Handling of exception (if required)
 
else:
       # execute if no exception
 
finally:
      # Some code .....(always executed, _even if you return in the except block_)

Note: The the utility of the finally block is that it always executes, even when you return from the except block. Source: geeksforgeeks.

Debugging

Debugging Segmentation Faults (segfaults)

Segmentation faults in Python code occur when an underlying C library crashes. Sadly, a helpful error message is not always (never?) shown, nor the offending library.

One way to find out which program is causing the segfault is to capture the stacktrace of the C program by using gdb (GNU Debugger).

# ensure you have `gdb` installed
sudo apt install -y gdb
 
# run the program to debug with gdb, in most of our cases `python`
gdb python
 
# this will put you inside the gdb console, type `run` to start the program, potentially passing args
run -c "import pytorch_lightning"
 
# then, when the segfault occurs, you can type `bt` to print the stacktrace
bt
 
# this will print something like this:
# 0x00007fff6391e1e1 in google::protobuf::internal::ReflectionOps::FindInitializationErrors(google::protobuf::Message const&, std::string const&, std::vector<std::string, std::allocator<std::string> >*) () from /home/michael/miniconda3/envs/nemo/lib/python3.8/site-packages/google/protobuf/pyext/_message.cpython-38-x86_64-linux-gnu.so

This indicates the google-protobuf library is the culprit. Now you can google “${library} segfault” to see if anyone has a similar problem (and ideally solution), though usually the solution is to up/downgrade the library.

Another way to run gdb is to use gdb --args to pass the arguments to the program.

gdb --args python <your_script>.py ARG1 ARG2 ...

As before, type run and then bt when the segfault occurs.

This way you can run a complicated training command and debug it (the full command that caused the segfault is often printed above the error message).

Garbage Collection

Garbage collection in Python: things you need to know

gc — Garbage Collector interface

To debug a leaking program call gc.set_debug(gc.DEBUG_LEAK). Notice that this includes gc.DEBUG_SAVEALL, causing garbage-collected objects to be saved in gc.garbage for inspection.

Introduction

Objects, Types and Reference Counts

Most Python/C API functions have one or more arguments as well as a return value of type PyObject. This type is a pointer to an opaque data type representing an arbitrary Python object. Since all Python object types are treated the same way by the Python language in most situations (e.g., assignments, scope rules, and argument passing), it is only fitting that they should be represented by a single C type. Almost all Python objects live on the heap: you never declare an automatic or static variable of type [PyObject](https://docs.python.org/3.11/c-api/structures.html#c.PyObject), only pointer variables of type PyObject can be declared. The sole exception are the type objects; since these must never be deallocated, they are typically static [PyTypeObject](https://docs.python.org/3.11/c-api/type.html#c.PyTypeObject) objects.

All Python objects (even Python integers) have a type and a reference count. An object’s type determines what kind of object it is (e.g., an integer, a list, or a user-defined function; there are many more as explained in The standard type hierarchy). For each of the well-known types there is a macro to check whether an object is of that type; for instance, PyList_Check(a) is true if (and only if) the object pointed to by a is a Python list.

GIL

Memory

Memory management in Python

Python Internals

Performance Optimisation

Performance optimisations from Talk - Anthony Shaw: Write faster Python! Common performance anti patterns - YouTube:

  1. Loop-invariant statements - Python’s compiler does not do loop invariant code motion (aka “hoisting”) for you
  2. Use comprehensions - including set, dictionary and other comprehensions (creating generators)
  3. Select the right data structures - custom classes > dataclasses > named tuple
  4. Avoid tiny functions in hot areas of code - especially hot loops - Python has big overhead to function calls even as of 3.11

How to go about optimising

  • Sample correctly and track regressions - track regressions / speedups on a commit-by-commit basis
  • Catch issues in code review - bring the team in
  • Use perflint to highlight performance anti-patterns

Profiling

  • Austin
  • Scalene
  • cProfile

See also CPU + GPU profiling.

Typing

What is TypeVar for and can you give me an example

TypeVar is a generic type that is used in Python to define type variables. It allows you to create generic functions or classes that can operate on different types without specifying the exact type explicitly. This enables you to write more flexible and reusable code.

Here’s a simple example to illustrate the usage of TypeVar:

from typing import TypeVar, List
 
T = TypeVar('T')  # Define a type variable T
 
def reverse_list(lst: List[T]) -> List[T]:
    return lst[::-1]
 
# Usage
int_list = [1, 2, 3, 4, 5]
reversed_int_list = reverse_list(int_list)
print(reversed_int_list)  # Output: [5, 4, 3, 2, 1]
 
str_list = ['apple', 'banana', 'cherry']
reversed_str_list = reverse_list(str_list)
print(reversed_str_list)  # Output: ['cherry', 'banana', 'apple']

In this example, we define a generic function reverse_list that takes a list of any type T and returns a reversed list of the same type. We use TypeVar('T') to create the type variable T, which represents an arbitrary type. When the function is called, the actual type of the list is inferred based on the argument passed, and the reversed list is returned with the same type.

By using TypeVar, you can create generic functions or classes that can operate on different types without explicitly specifying the type, making your code more flexible and reusable.

Modules

train.svg

  • Example: pydeps --show-cycles utils[.py](http://train.py) running on sushi/sushi/utils.py

utils.svg

Python Source

typing accelerator C extension: _typing module @ cpython/Modules/_typingmodule.c: Example of a module importable into Python not written in or present in native Python code

See reference to this module (classes defined therein being imported) at

Work seems to be done here:

static struct PyModuleDef typingmodule = {
        PyModuleDef_HEAD_INIT,
        "_typing",
        typing_doc,
        0,
        typing_methods,
        _typingmodule_slots,
        NULL,
        NULL,
        NULL
};
 
PyMODINIT_FUNC
PyInit__typing(void)
{
    return PyModuleDef_Init(&typingmodule);
}