Notes / Remember Remember / Q & A
- How do I find the location of my Python site-packages directory?
f.write(string)
 writes the contents of string to the file, returning the number of characters written:f.write('This is a test\n')
evaluates to15
- Python Closures Common Use Cases and Examples
- To create a Python closure, you need the following components:
- **An outer or enclosing function
- **Variables that are local to the outer function
- **An inner or nested function
- update the value of a variable that points to an immutable object, you need to use theÂ
nonlocal
 statement - Use cases of closures:
- factory functions (i.e. function factories!)
- functions with state e.g. running average / exponential moving average (EMA)
- for a lot of these (i.e. any non-trivial ones?) I suppose a class might be better suited
- callbacks: âClosures are commonly used in event-driven programming when you need to create callback functions that carry additional context or state information.â
- decorators: Python has two types of decorators: function-based and class-based. âA function-based decorator is a function that takes a function object as an argument and returns another function object with extended functionality. This latter function object is also a closure. So, to create function-based decorators, you use closures.â
- Memoization e.g. LRU cache (via a decorator;
@lru_cache
)
- You can replace a closure with a class that produces callable instances by implementing theÂ
.__call__()
 special method đ alternative to closures
- To create a Python closure, you need the following components:
Getting the Size of Objects (in Bytes)
See in particular this post on How do I determine the size of an object in Python?
From the Python docs:
sys.getsizeof(object[, default])
Return the size of an object in bytes. The object can be any type of object. All built-in objects will return correct results, but this does not have to hold true for third-party extensions as it is implementation specific.
Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to.
If given, default will be returned if the object does not provide means to retrieve the size. Otherwise aÂ
TypeError
 will be raised.
getsizeof()
 calls the objectâsÂ__sizeof__
 method and adds an additional garbage collector overhead if the object is managed by the garbage collector.See recursive sizeof recipe for an example of usingÂ
getsizeof()
 recursively to find the size of containers and all their contents.
â Source: https://docs.python.org/3/library/sys.html#sys.getsizeof
Seeking in Files
f.tell()
 returns an integer giving the file objectâs current position in the file represented as number of bytes from the beginning of the file when in binary mode and an opaque number when in text mode.
To change the file objectâs position, use f.seek(offset, whence)
. The position is computed from adding offset to a reference point; the reference point is selected by the whence argument. A whence value of 0 measures from the beginning of the file, 1 uses the current file position, and 2 uses the end of the file as the reference point. whence can be omitted and defaults to 0, using the beginning of the file as the reference point.
>>>
>>> f = open('workfile', 'rb+')
>>> f.write(b'0123456789abcdef')
16
>>> f.seek(5) # Go to the 6th byte in the file
5
>>> f.read(1)
b'5'
>>> f.seek(-3, 2) # Go to the 3rd byte before the end
13
>>> f.read(1)
b'd'
In text files (those opened without a b
 in the mode string), only seeks relative to the beginning of the file are allowed (the exception being seeking to the very file end with seek(0, 2)
) and the only valid offset values are those returned from the f.tell()
, or zero. Any other offset value produces undefined behaviour.
File objects have some additional methods, such as isatty()
 and truncate()
 which are less frequently used; consult the Library Reference for a complete guide to file objects.
â Source: https://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects
open(..., newline='')
when opening files for use with csv.reader
from std lib
Per the docs for csv from the standard library, you should open files with the newline
argument set: newline=''
Return a reader object that will process lines from the given csvfile. A csvfile must be an iterable of strings, each in the readerâs defined csv format. A csvfile is most commonly a file-like object or list. If csvfile is a file object, it should be opened withÂ
newline=''
. [1]
The footnote reads:
IfÂ
newline=''
 is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that useÂ\r\n
 linendings on write an extraÂ\r
 will be added. It should always be safe to specifyÂnewline=''
, since the csv module does its own (universal) newline handling.
Learnt this thanks to ChatGPT when editing the âDotfiles Symlinkâ canvas.
Asterisk (*
) and forward slash in function arguments /
- The function parameter syntax(
/
) is to indicate that some function parameters must be specified positionally and cannot be used as keyword arguments. (This is new in Python 3.8) - Documentation specifies some of the use cases/benefits of positional-only parameters
- AsteriskÂ
*
 is used to force the caller to use named arguments
Example:
def func(self, param1, param2, /, param3, *, param4, param5):
print(param1, param2, param3, param4, param5)
param1
,Âparam2
 must be specified positionally.param3
 can be called either with positional or keyword argument.param4
 andÂparam5
 must be called with keyword argument.
See What do * (single star) and / (slash) do as independent parameters? [duplicate]
What is the @property decorator in Python used for? What are some best practices around its usage?
The @property
decorator in Python is used to define âgetterâ methods for class attributes, which allow you to access a class attribute like a property (i.e., without using parentheses like a method call) and also enables you to add extra functionality or validation when accessing it. This is a part of the concept of âencapsulationâ in object-oriented programming, where the internal representation of an object is hidden from the outside world, and access to it is controlled through methods.
Here are some best practices around its usage:
- Use
@property
when you want to compute a value or perform some action when an attribute is accessed. For example, you may want to lazily compute a value, or ensure that an attribute is always within a certain range. - Use
@property
to provide a consistent API for your class, even if the internal representation changes. By using getters and setters, you can change the internal representation of an object without affecting the code that uses it. - Make sure the getter method does not have any side effects. The
@property
getter should be idempotent, meaning that calling it multiple times should not change the state of the object or have any side effects. - Keep the logic inside the getter simple. If you find yourself adding complex logic to the getter, it might be better to move that logic to a separate method and call that method from the getter.
- Use the
@<attribute>.setter
decorator to define a âsetterâ method for a property. This allows you to add extra functionality or validation when setting the value of the attribute, such as type checking or ensuring that the value is within a certain range.
Hereâs an example of using @property
and @<attribute>.setter
decorators:
class Circle:
def __init__(self, radius):
self._radius = radius
@property
def radius(self):
return self._radius
@radius.setter
def radius(self, value):
if value < 0:
raise ValueError("Radius cannot be negative")
self._radius = value
@property
def diameter(self):
return self._radius * 2
@property
def area(self):
return 3.14159 * (self._radius ** 2)
In this example, we have a Circle
class with radius
, diameter
, and area
properties. The radius
property has a getter and a setter, while diameter
and area
have only getters. The radius
setter also includes validation to ensure that the radius is never set to a negative value.
tryâŠexceptâŠelseâŠfinally
try:
# Some Code....
except:
# optional block
# Handling of exception (if required)
else:
# execute if no exception
finally:
# Some code .....(always executed, _even if you return in the except block_)
Note: The the utility of the finally
block is that it always executes, even when you return from the except
block. Source: geeksforgeeks.
Use and Definition/Structure of __main__.py
Example from nvitop
.
(main) anilkeshwani@Anils-MacBook-Air:~/miniconda3/envs/main/lib/python3.12/site-packages/nvitop$ dog __main__.py
:
# This file is part of nvitop, the interactive NVIDIA-GPU process viewer.
# License: GNU GPL version 3.
"""The interactive NVIDIA-GPU process viewer."""
import sys
from nvitop.cli import main
if __name__ == '__main__':
sys.exit(main())
Aquestion on this is that it seemingly conflicts with the official Python 3 advice: [[main â Top-level code environment#idiomatic-usage|Idiomatic Usage]]. They recommend a __main__.py
like the one in the venv
package from the standard library, without a if __name__=="__main__"
guard:
import sys
from . import main
rc = 1
try:
main()
rc = 0
except Exception as e:
print('Error: %s' % e, file=sys.stderr)
sys.exit(rc)
(main) anilkeshwani@Anils-MacBook-Air:~/miniconda3/envs/main/lib/python3.12/site-packages/nvitop$ dog $(which nvitop)
:
#!/Users/anilkeshwani/miniconda3/envs/main/bin/python
# -*- coding: utf-8 -*-
import re
import sys
from nvitop.cli import main
if __name__ == '__main__':
sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
sys.exit(main())
Hydra
Hydra Changes the Logging Level for You⊠How to Stop It!
See Logging from the Hydra v1.3 docs or clipped version: Logging Hydra
People often do not use Python logging due to the setup cost. Hydra solves this by configuring the Python logging for you.
By default, Hydra logs at the INFO level to both the console and a log file in the automatic working directory.
â Source: Logging from the Hydra v1.3 docs
How to stop it:
You can also setÂ
hydra/job_logging=none
 andÂhydra/hydra_logging=none
 if you do not want Hydra to configure the logging.
Testing in Python
- -TheÂ
doctest
 module provides a tool for scanning a module and validating tests embedded in a programâs docstrings. Test construction is as simple as cutting-and-pasting a typical call along with its results into the docstring. â Source: Python Tutorial §10.11. Quality Control
Unit Testing in Python
The unittest
 module allows a comprehensive set of tests to be maintained in a separate file:
import unittest
class TestStatisticalFunctions(unittest.TestCase):
def test_average(self):
self.assertEqual(average([20, 30, 70]), 40.0)
self.assertEqual(round(average([1, 5, 7]), 1), 4.3)
with self.assertRaises(ZeroDivisionError):
average([])
with self.assertRaises(TypeError):
average(20, 30, 70)
unittest.main() # Calling from the command line invokes all tests
Debugging
Debugging Segmentation Faults (segfaults)
Segmentation faults in Python code occur when an underlying C library crashes. Sadly, a helpful error message is not always (never?) shown, nor the offending library.
One way to find out which program is causing the segfault is to capture the stacktrace of the C program by using gdb
(GNU Debugger).
# ensure you have `gdb` installed
sudo apt install -y gdb
# run the program to debug with gdb, in most of our cases `python`
gdb python
# this will put you inside the gdb console, type `run` to start the program, potentially passing args
run -c "import pytorch_lightning"
# then, when the segfault occurs, you can type `bt` to print the stacktrace
bt
# this will print something like this:
# 0x00007fff6391e1e1 in google::protobuf::internal::ReflectionOps::FindInitializationErrors(google::protobuf::Message const&, std::string const&, std::vector<std::string, std::allocator<std::string> >*) () from /home/michael/miniconda3/envs/nemo/lib/python3.8/site-packages/google/protobuf/pyext/_message.cpython-38-x86_64-linux-gnu.so
This indicates the google-protobuf library is the culprit. Now you can google â${library} segfaultâ to see if anyone has a similar problem (and ideally solution), though usually the solution is to up/downgrade the library.
Another way to run gdb
is to use gdb --args
to pass the arguments to the program.
gdb --args python <your_script>.py ARG1 ARG2 ...
As before, type run
and then bt
when the segfault occurs.
This way you can run a complicated training command and debug it (the full command that caused the segfault is often printed above the error message).
Garbage Collection
Garbage collection in Python: things you need to know
gc â Garbage Collector interface
To debug a leaking program call gc.set_debug(gc.DEBUG_LEAK). Notice that this includes gc.DEBUG_SAVEALL, causing garbage-collected objects to be saved in gc.garbage for inspection.
Objects, Types and Reference Counts
Most Python/C API functions have one or more arguments as well as a return value of type PyObject. This type is a pointer to an opaque data type representing an arbitrary Python object. Since all Python object types are treated the same way by the Python language in most situations (e.g., assignments, scope rules, and argument passing), it is only fitting that they should be represented by a single C type. Almost all Python objects live on the heap: you never declare an automatic or static variable of typeÂ
[PyObject](https://docs.python.org/3.11/c-api/structures.html#c.PyObject)
, only pointer variables of type PyObject can be declared. The sole exception are the type objects; since these must never be deallocated, they are typically staticÂ[PyTypeObject](https://docs.python.org/3.11/c-api/type.html#c.PyTypeObject)
 objects.All Python objects (even Python integers) have a type and a reference count. An objectâs type determines what kind of object it is (e.g., an integer, a list, or a user-defined function; there are many more as explained in The standard type hierarchy). For each of the well-known types there is a macro to check whether an object is of that type; for instance,Â
PyList_Check(a)
 is true if (and only if) the object pointed to by a is a Python list.
GIL
- A Steering Council notice about PEP 703 (Making the Global Interpreter Lock Optional in CPython)
- GlobalInterpreterLock - Python Wiki
Memory
Performance Optimisation
Performance optimisations from Talk - Anthony Shaw: Write faster Python! Common performance anti patterns - YouTube:
- Loop-invariant statements - Pythonâs compiler does not do loop invariant code motion (aka âhoistingâ) for you
- Use comprehensions - including set, dictionary and other comprehensions (creating generators)
- Select the right data structures - custom classes > dataclasses > named tuple
- Avoid tiny functions in hot areas of code - especially hot loops - Python has big overhead to function calls even as of 3.11
How to go about optimising
- Sample correctly and track regressions - track regressions / speedups on a commit-by-commit basis
- Catch issues in code review - bring the team in
- Use perflint to highlight performance anti-patterns
Profiling
- Austin
- Scalene
- cProfile
See also CPU + GPU profiling.
Type Hints (Type Annotations) and mypy
- Type hints cheat sheet - mypy 1.15.0 documentation
- Since Python 3.11, you can type class methods with
Self
- import it like:from typing import Self
- Pythonâs Self Type How to Annotate Methods That Return self - What does typing.cast do in Python? - This returns the value unchanged. To the type checker this signals that the return value has the designated type, but at runtime we intentionally donât check anything (we want this to be as fast as possible).
- Python type hints how to use typing.cast() - Adam Johnson
- Python type hints manage âtype ignoreâ comments with Mypy - Adam Johnson
- Protocols â typing documentation
What is TypeVar for and can you give me an example
TypeVar
is a generic type that is used in Python to define type variables. It allows you to create generic functions or classes that can operate on different types without specifying the exact type explicitly. This enables you to write more flexible and reusable code.
Hereâs a simple example to illustrate the usage of TypeVar
:
from typing import TypeVar, List
T = TypeVar('T') # Define a type variable T
def reverse_list(lst: List[T]) -> List[T]:
return lst[::-1]
# Usage
int_list = [1, 2, 3, 4, 5]
reversed_int_list = reverse_list(int_list)
print(reversed_int_list) # Output: [5, 4, 3, 2, 1]
str_list = ['apple', 'banana', 'cherry']
reversed_str_list = reverse_list(str_list)
print(reversed_str_list) # Output: ['cherry', 'banana', 'apple']
In this example, we define a generic function reverse_list
that takes a list of any type T
and returns a reversed list of the same type. We use TypeVar('T')
to create the type variable T
, which represents an arbitrary type. When the function is called, the actual type of the list is inferred based on the argument passed, and the reversed list is returned with the same type.
By using TypeVar
, you can create generic functions or classes that can operate on different types without explicitly specifying the type, making your code more flexible and reusable.
Linting & Linters
Modules
- https://github.com/thebjorn/pydeps
- Example:
pydeps [train.py](http://train.py)
running on sushi/sushi/train.py
- Example:
pydeps --show-cycles utils[.py](http://train.py)
running on sushi/sushi/utils.py
Python Source
typing accelerator C extension: _typing module @ cpython/Modules/_typingmodule.c: Example of a module importable into Python not written in or present in native Python code
-
a C extension?
See reference to this module (classes defined therein being imported) at
Work seems to be done here:
static struct PyModuleDef typingmodule = {
PyModuleDef_HEAD_INIT,
"_typing",
typing_doc,
0,
typing_methods,
_typingmodule_slots,
NULL,
NULL,
NULL
};
PyMODINIT_FUNC
PyInit__typing(void)
{
return PyModuleDef_Init(&typingmodule);
}