Write Pythonic and Clean Code With namedtuple – Real Python

Excerpt

In this step-by-step tutorial, you’ll learn what Python’s namedtuple is and how to use it in your code. You’ll also learn about the main differences between named tuples and other data structures, such as dictionaries, data classes, and typed named tuples.


Watch Now This tutorial has a related video course created by the Real Python team. Watch it together with the written tutorial to deepen your understanding: Writing Clean, Pythonic Code With namedtuple

Python’s collections module provides a factory function called namedtuple(), which is specially designed to make your code more Pythonic when you’re working with tuples. With namedtuple(), you can create immutable sequence types that allow you to access their values using descriptive field names and the dot notation instead of unclear integer indices.

If you have some experience using Python, then you know that writing Pythonic code is a core skill for Python developers. In this tutorial, you’ll level up that skill using namedtuple.

In this tutorial, you’ll learn how to:

  • Create namedtuple classes using namedtuple()
  • Identify and take advantage of cool features of namedtuple
  • Use namedtuple instances to write Pythonic code
  • Decide whether to use a namedtuple or a similar data structure
  • Subclass a namedtuple to provide new features

To get the most out of this tutorial, you need to have a general understanding of Python’s philosophy related to writing Pythonic and readable code. You also need to know the basics of working with:

If you don’t have all the required knowledge before starting this tutorial, then that’s okay! You can stop and review the above resources as needed.

Using namedtuple to Write Pythonic Code

Python’s namedtuple() is a factory function available in collections. It allows you to create tuple subclasses with named fields. You can access the values in a given named tuple using the dot notation and the field names, like in obj.attr.

Python’s namedtuple was created to improve code readability by providing a way to access values using descriptive field names instead of integer indices, which most of the time don’t provide any context on what the values are. This feature also makes the code cleaner and more maintainable.

In contrast, using indices to values in a regular tuple can be annoying, difficult to read, and error-prone. This is especially true if the tuple has a lot of fields and is constructed far away from where you’re using it.

Besides this main feature of named tuples, you’ll find out that they:

  • Are immutable data structures
  • Have a consistent hash value
  • Can work as dictionary keys
  • Can be stored in sets
  • Have a helpful docstring based on the type and field names
  • Provide a helpful string representation that prints the tuple content in a name=value format
  • Support indexing
  • Provide additional methods and attributes, such as ._make(), _asdict(), ._fields, and so on
  • Are backward compatible with regular tuples
  • Have similar memory consumption to regular tuples

In general, you can use namedtuple instances wherever you need a tuple-like object. Named tuples have the advantage that they provide a way to access their values using field names and the dot notation. This will make your code more Pythonic.

With this brief introduction to namedtuple and its general features, you can dive deeper into creating and using them in your code.

Creating Tuple-Like Classes With namedtuple()

You use a namedtuple() to create an immutable and tuple-like data structure with field names. A popular example that you’ll find in tutorials about namedtuple is to create a class to represent a mathematical point.

Depending on the problem, you probably want to use an immutable data structure to represent a given point. Here’s how you can create a two-dimensional point using a regular tuple:

Here, you create an immutable two-dimensional point using a regular tuple. This code works: You have a point with two coordinates, and you can’t modify any of those coordinates. However, is this code readable? Can you tell up front what the 0 and 1 indices mean? To prevent these ambiguities, you can use a namedtuple like this:

Now you have a point with two appropriately named fields, x and y. Your point provides a user-friendly and descriptive string representation (Point(x=2, y=4)) by default. It allows you to access the coordinates using the dot notation, which is convenient, readable, and explicit. You can also use indices to access the value of each coordinate.

Finally, since namedtuple classes are subclasses of tuple, they’re immutable as well. So if you try to change a value of a coordinate, then you’ll get an AttributeError.

Providing Required Arguments to namedtuple()

As you learned before, namedtuple() is a factory function rather than a typical data structure. To create a new namedtuple, you need to provide two positional arguments to the function:

  1. typename provides the class name for the namedtuple returned by namedtuple(). You need to pass a string with a valid Python identifier to this argument.
  2. field_names provides the field names that you’ll use to access the values in the tuple. You can provide the field names using:
    • An iterable of strings, such as ["field1", "field2", ..., "fieldN"]
    • A string with each field name separated by whitespace, such as "field1 field2 ... fieldN"
    • A string with each field name separated by commas, such as "field1, field2, ..., fieldN"

To illustrate how to provide field_names, here are different ways to create points:

In these examples, you first create Point using a list of field names. Then you use a string with comma-separated field names. Finally, you use a generator expression. This last option might look like overkill in this example. However, it’s intended to illustrate the flexibility of the process.

You can use any valid Python identifier for the field names, except for:

  • Names starting with an underscore (_)
  • Python keywords

If you provide field names that violate either of these conditions, then you get a ValueError:

In this example, the second field name starts with and underscore, so you get a ValueError telling you that field names can’t start with that character. This is intended to avoid name conflicts with the namedtuple methods and attributes.

In the case of typename, a question can arise when you look at the examples above: Why do I need to provide the typename argument? The answer is that you need a name for the class returned by namedtuple(). This is like creating an alias for an existing class:

In the first example, you create Point using namedtuple(). Then you assign this new type to the global variable Point1. In the second example, you create a regular Python class also named Point, and then you assign the class to Point2. In both cases, the class name is Point. Point1 and Point2 are aliases for the class at hand.

Finally, you can also create a named tuple using keyword arguments or providing an existing dictionary, like this:

In the first example, you use keyword arguments to create a Point object. In the second example, you use a dictionary whose keys match the fields of Point. In this case, you need to perform a dictionary unpacking.

Using Optional Arguments With namedtuple()

Besides the two required arguments, the namedtuple() factory function also takes the following optional arguments:

  • rename
  • defaults
  • module

If you set rename to True, then all the invalid field names are automatically replaced with positional names.

Say your company has an old database application, written in Python, to manage the data about the passengers that travel with the company. You’re asked to update the system, and you start creating named tuples to store the data you read from the database.

The application provides a function called get_column_names() that returns a list of strings with the column names, and you think you can use that function to create a namedtuple class. You end up with the following code:

However, when you run the code, you get an exception traceback like the following:

This tells you that the class column name isn’t a valid field name for your namedtuple class. To prevent this situation, you decide to use rename:

This causes namedtuple() to automatically replace invalid names with positional names. Now suppose you retrieve one row from the database and create your first Passenger instance, like this:

In this case, get_passenger_by_id() is another function available in your hypothetical application. It retrieves the data for a given passenger in a tuple. The final result is that your newly created passenger has three positional field names, and only name reflects the original column name. When you dig into the database, you realize that the passengers table has the following columns:

ColumnStoresReplaced?Reason
_idA unique identifier for each passengerYesIt starts with an underscore.
nameA short name for each passengerNoIt’s a valid Python identifier.
classThe class in which the passenger travelsYesIt’s a Python keyword.
nameThe passenger’s full nameYesIt’s repeated.

In situations where you create named tuples based on values outside your control, the rename option should be set to True so the invalid fields get renamed with valid positional names.

The second optional argument to namedtuple() is defaults. This argument defaults to None, which means that the fields won’t have default values. You can set defaults to an iterable of values. In this case, namedtuple() assigns the values in the defaults iterable to the rightmost fields:

In this example, the level and language fields have default values. This makes them optional arguments. Since you don’t define a default value for name, you need to provide a value when you create the namedtuple instance. So, arguments without a default value are required. Note that the default values are applied to the rightmost fields.

The last argument to namedtuple() is module. If you provide a valid module name to this argument, then the .__module__ attribute of the resulting namedtuple is set to that value. This attribute holds the name of the module in which a given function or callable is defined:

In this example, when you access .__module__ on Point, you get 'custom' as a result. This indicates that your Point class is defined in your custom module.

The motivation to add the module argument to namedtuple() in Python 3.6 was to make it possible for named tuples to support pickling through different Python implementations.

Exploring Additional Features of namedtuple Classes

Besides the methods inherited from tuple, such as .count() and .index(), namedtuple classes also provide three additional methods and two attributes. To prevent name conflicts with custom fields, the names of these attributes and methods start with an underscore. In this section, you’ll learn about these methods and attributes and how they work.

Creating namedtuple Instances From Iterables

You can use ._make() to create named tuple instances. The method takes an iterable of values and returns a new named tuple:

Here, you first create a Person class using namedtuple(). Then you call ._make() with a list of values for each field in the namedtuple. Note that ._make() is a class method that works as an alternative class constructor and returns a new named tuple instance.

Finally, ._make() expects a single iterable as an argument, a list in the above example. On the other hand, the namedtuple constructor can take positional or keyword arguments, as you already learned.

Converting namedtuple Instances Into Dictionaries

You can convert existing named tuple instances into dictionaries using ._asdict(). This method returns a new dictionary that uses the field names as keys. The keys of the resulting dictionary are in the same order as the fields in the original namedtuple:

When you call ._asdict() on a named tuple, you get a new dict object that maps field names to their corresponding values in the original named tuple.

Since Python 3.8, ._asdict() has returned a regular dictionary. Before that, it returned an OrderedDict object:

Python 3.8 updated ._asdict() to return a regular dictionary because dictionaries remember the insertion order of their keys in Python 3.6 and up. Note that the order of keys in the resulting dictionary is equivalent to the order of fields in the original named tuple.

Replacing Fields in Existing namedtuple Instances

The last method you’ll learn about is ._replace(). This method takes keyword arguments of the form field=value and returns a new namedtuple instance updating the values of the selected fields:

In this example, you update Jane’s age after her birthday. Although the name of ._replace() might suggest that the method modifies the existing named tuple, that’s not what happens in practice. This is because namedtuple instances are immutable, so ._replace() doesn’t update jane in place.

Exploring Additional namedtuple Attributes

Named tuples also have two additional attributes: ._fields and ._field_defaults. The first attribute holds a tuple of strings listing the field names. The second attribute holds a dictionary that maps field names to their respective default values, if any.

In the case of ._fields, you can use it to introspect your namedtuple classes and instances. You can also create new classes from existing ones:

In this example, you create a new namedtuple called ExtendedPerson with a new field, weight. This new type extends your old Person. To do that, you access ._fields on Person and unpack it to a new list along with an additional field, weight.

You can also use ._fields to iterate over the fields and the values in a given namedtuple instance using Python’s zip():

In this example, zip() yields tuples of the form (field, value). This way, you can access both elements of the field-value pair in the underlying named tuple. Another way to iterate over fields and values at the same time could be using ._asdict().items(). Go ahead and give it a try!

With ._field_defaults, you can introspect namedtuple classes and instances to find out what fields provide default values. Having default values makes your fields optional. For example, say your Person class should include an additional field to hold the country in which the person lives. Since you’re mostly working with people from Canada, you set the appropriate default value for the country field like this:

With a quick query to ._field_defaults, you can figure out which fields in a given namedtuple provide default values. In this example, any other programmer on your team can see that your Person class provides "Canada" as a handy default value for country.

If your namedtuple doesn’t provide default values, then .field_defaults holds an empty dictionary:

If you don’t provide a list of default values to namedtuple(), then it relies on the default value to defaults, which is None. In this case, ._field_defaults holds an empty dictionary.

Writing Pythonic Code With namedtuple

Arguably, the fundamental use case of named tuples is to help you write more Pythonic code. The namedtuple() factory function was created to allow you to write readable, explicit, clean, and maintainable code.

In this section, you’ll write a bunch of practical examples that will help you spot good opportunities for using named tuples instead of regular tuples so you can make your code more Pythonic.

Using Field Names Instead of Indices

Say you’re creating a painting application and you need to define the pen properties to use according to the user’s choice. You’ve coded the pen’s properties in a tuple:

This line of code defines a tuple with three values. Can you tell what the meaning of each value is? Maybe you can guess that the second value is related to the line style, but what’s the meaning of 2 and True?

You could add a nice comment to provide some context for pen, in which case you would end up with something like this:

Cool! Now you know the meaning of each value in the tuple. However, what if you or another programmer is using pen far away from this definition? They’d have to go back to the definition just to remember what each value means.

Here’s an alternative implementation of pen using a namedtuple:

Now your code makes it clear that 2 represents the pen’s width, "Solid" is the line style, and so on. Anyone reading your code can see and understand that. Your new implementation of pen has two additional lines of code. That’s a small amount of work that produces a big win in terms of readability and maintainability.

Returning Multiple Named Values From Functions

Another situation in which you can use a named tuple is when you need to return multiple values from a given function. In this case, using a named tuple can make your code more readable because the returned values will also provide some context for their content.

For example, Python provides a built-in function called divmod() that takes two numbers as arguments and returns a tuple with the quotient and remainder that result from the integer division of the input numbers:

To remember the meaning of each number, you might need to read the documentation of divmod() because the numbers themselves don’t provide much information on their individual meaning. The function’s name doesn’t help very much either.

Here’s a function that uses a namedtuple to clarify the meaning of each number that divmod() returns:

In this example, you add context to each returned value, so any programmer reading your code can immediately understand what each number means.

Reducing the Number of Arguments to Functions

Reducing the number of arguments a function can take is considered a best programming practice. This makes your function’s signature more concise and optimizes your testing process because of the reduced number of arguments and possible combinations between them.

Again, you should consider using named tuples to approach this use case. Say you’re coding an application to manage your clients’ information. The application uses a database to store clients’ data. To process the data and update the database, you’ve created several functions. One of your high-level functions is create_user(), which looks like this:

This function takes four arguments. The first argument, db, represents the database you’re working with. The rest of the arguments are closely related to a given client. This is a great opportunity to reduce the number of arguments to create_user() using a named tuple:

Now create_user() takes only two arguments: db and user. Inside the function, you use convenient and descriptive field names to provide the arguments to db.add_user() and db.complete_user_profile(). Your high-level function, create_user(), is more focused on the user. It’s also easier to test because you just need to provide two arguments to each test.

Reading Tabular Data From Files and Databases

A quite common use case for named tuples is to use them to store database records. You can define namedtuple classes using the column names as field names and retrieve the data from the rows in the database to named tuples. You can also do something similar with CSV files.

For example, say you have a CSV file with data regarding the employees of your company, and you want to read that data into a suitable data structure for further processing. Your CSV file looks like this:

You’re thinking of using Python’s csv module and its DictReader to process the file, but you have an additional requirement—you need to store the data into an immutable and lightweight data structure. In this case, a namedtuple might be a good choice:

In this example, you first open the employees.csv file in a with statement. Then you use csv.reader() to get an iterator over the lines in the CSV file. With namedtuple(), you create a new Employee class. The call to next() retrieves the first row of data from reader, which contains the CSV file header. This header provides the field names for your namedtuple.

Finally, the for loop creates an Employee instance from each row in the CSV file and prints the list of employees to the screen.

Using namedtuple vs Other Data Structures

So far, you’ve learned how to create named tuples to make your code more readable, explicit, and Pythonic. You’ve also written some examples that help you spot opportunities for using named tuples in your code.

In this section, you’ll take a general look at the similarities and differences between namedtuple classes and other Python data structures, such as dictionaries, data classes, and typed named tuples. You’ll compare named tuples with other data structures regarding the following characteristics:

  • Readability
  • Mutability
  • Memory usage
  • Performance

This way, you’ll be better prepared to choose the right data structure for your specific use case.

namedtuple vs Dictionary

The dictionary is a fundamental data structure in Python. The language itself is built around dictionaries, so they’re everywhere. Since they’re so common and useful, you probably use them a lot in your code. But how different are dictionaries and named tuples?

In terms of readability, you can probably say that dictionaries are as readable as named tuples. Even though they don’t provide a way to access attributes through the dot notation, the dictionary-style key lookup is quite readable and straightforward:

In both examples, you have a total understanding of the code and its intention. The named tuple definition requires two additional lines of code, though: one line to import the namedtuple() factory function and another to define your namedtuple class, Person.

A big difference between both data structures is that dictionaries are mutable and named tuples are immutable. This means that you can modify dictionaries in place, but you can’t modify named tuples:

You can update the value of an existing key in a dictionary, but you can’t do something similar in a named tuple. You can add new key-value pairs to existing dictionaries, but you can’t add field-value pairs to existing named tuples.

In general, if you need an immutable data structure to properly solve a given problem, then consider using a named tuple instead of a dictionary so you can meet your requirements.

Regarding memory usage, named tuples are a quite lightweight data structure. Fire up your code editor or IDE and create the following script:

This small script uses asizeof.asizeof() from Pympler to get the memory footprint of a named tuple and its equivalent dictionary.

If you run the script from your command line, then you’ll get the following output:

This output confirms that named tuples consume less memory than equivalent dictionaries. So if memory consumption is a restriction for you, then you should consider using a named tuple instead of a dictionary.

Finally, you need to have an idea of how different named tuples and dictionaries are in terms of operations performance. To do that, you’ll test membership and attribute access operations. Get back to your code editor and create the following script:

This script times operations common to both dictionaries and named tuples, such as membership tests and attribute access. Running the script on your current system displays an output similar to the following:

This output shows that operations on named tuples are slightly faster than similar operations on dictionaries.

namedtuple vs Data Class

Python 3.7 came with a new cool feature: data classes. According to PEP 557, data classes are similar to named tuples, but they’re mutable:

Data Classes can be thought of as “mutable namedtuples with defaults.” (Source)

However, it’d be more accurate to say that data classes are like mutable named tuples with type hints. The “defaults” part isn’t a difference at all because named tuples can also have default values for their fields. So, at first glance, the main differences are mutability and type hints.

To create a data class, you need to import the dataclass() decorator from dataclasses. Then you can define your data classes using the regular class definition syntax:

In terms of readability, there are no significant differences between data classes and named tuples. They provide similar string representations, and you can access their attributes using the dot notation.

Mutability-wise, data classes are mutable by definition, so you can change the value of their attributes when needed. However, they have an ace up their sleeve. You can set the dataclass() decorator’s frozen argument to True and make them immutable:

If you set frozen to True in the call to dataclass(), then you make the data class immutable. In this case, when you try to update Jane’s name, you get a FrozenInstanceError.

Another subtle difference between named tuples and data classes is that the latter aren’t iterable by default. Stick to the Jane example and try to iterate over her data:

If you try to iterate over a bare-bones data class, then you get a TypeError. This is common to regular classes. Fortunately, there are ways to work around it. For example, you can add a special method, .__iter__(), to Person like this:

Here, you first import astuple() from dataclasses. This function converts the data class into a tuple. Then you pass the resulting tuple to iter() so you can build and return an iterator from .__iter__(). With this addition, you can start iterating over Jane’s data.

Regarding memory consumption, named tuples are more lightweight than data classes. You can confirm this by creating and running a small script similar to the one you saw in the above section. To view the complete script, expand the box below.

Here’s a script that compares memory usage between a namedtuple and its equivalent data class:

In this script, you create a named tuple and a data class containing similar data. Then you compare their memory footprint.

Here are the results of running your script:

Unlike namedtuple classes, data classes keep a per-instance .__dict__ to store writable instance attributes. This contributes to a bigger memory footprint.

Next, you can expand the section below to see an example of code that compares namedtuple classes and data classes in terms of their performance on attribute access.

The following script compares the performance of attribute access on a named tuple and its equivalent data class:

Here, you time the attribute access operation because that’s almost the only common operation between a named tuple and a data class. You can also time membership operations, but you would have to access the data class’ .__dict__ attribute to do it.

In terms of performance, here are the results:

The performance difference is minimal, so you can say that both data structures have equivalent performance when it comes to attribute access operations.

namedtuple vs typing.NamedTuple

Python 3.5 introduced a provisional module called typing to support function type annotations or type hints. This module provides NamedTuple, which is a typed version of namedtuple. With NamedTuple, you can create namedtuple classes with type hints. Following with the Person example, you can create an equivalent typed named tuple like this:

With NamedTuple, you can create tuple subclasses that support type hints and attribute access through the dot notation. Since the resulting class is a tuple subclass, it’s immutable as well.

A subtle detail to notice in the above example is that NamedTuple subclasses look even more similar to data classes than named tuples.

When it comes to memory consumption, both namedtuple and NamedTuple instances use the same amount of memory. You can expand the box below to view a script that compares memory usage between the two.

Here’s a script that compares the memory usage of a namedtuple and its equivalent typing.NamedTuple:

In this script, you create a named tuple and an equivalent typed NamedTuple instance. Then you compare the memory usage of both instances.

This time, the script that compares memory usage produces the following output:

In this case, both instances consume the same amount of memory, so there’s no winner this time.

Since namedtuple classes and NamedTuple subclasses are both subclasses of tuple, they have a lot in common. In this case, you can time membership tests for fields and values. You can also time attribute access with the dot notation. Expand the box below to view a script that compares the performance of both namedtuple and NamedTuple.

The following script compares namedtuple and typing.NamedTuple performance-wise:

In this script, you first create a named tuple and then a typed named tuple with similar content. Then you compare the performance of common operations over both data structures.

Here are the results:

In this case, you can say that both data structures behave almost the same in terms of performance. Other than that, using NamedTuple to create your named tuples can make your code even more explicit because you can add type information to the fields. You can also provide default values, add new functionality, and write docstrings for your typed named tuples.

In this section, you’ve learned a lot about namedtuple and other similar data structures and classes. Here’s a table that summarizes how namedtuple compares to the data structures covered in this section:

dictData ClassNamedTuple
ReadabilitySimilarEqualEqual
ImmutabilityNoNo by default, yes if using @dataclass(frozen=True)Yes
Memory usageHigherHigherEqual
PerformanceSlowerSimilarSimilar
IterabilityYesNo by default, yes if providing .__iter__()Yes

With this summary, you’ll be able to choose the data structure that best fits your current needs. Additionally, you should consider that data classes and NamedTuple allow you to add type hints, which is currently quite a desirable feature in Python code.

Subclassing namedtuple Classes

Since namedtuple classes are regular Python classes, you can subclass them if you need to provide additional functionalities, a docstring, a user-friendly string representation, and so on.

For example, storing the age of a person in an object isn’t considered a best practice. So you probably want to store the birth date and compute the age when needed:

Person inherits from BasePerson, which is a namedtuple class. In the subclass definition, you first add a docstring to describe what the class does. Then you set __slots__ to an empty tuple, which prevents the automatic creation of a per-instance .__dict__. This keeps your BasePerson subclass memory efficient.

You also add a custom .__repr__() to provide a nice string representation for the class. Finally, you add a property to compute the person’s age using datetime.

Measuring Creation Time: tuple vs namedtuple

So far, you’ve compared namedtuple classes with other data structures according to several features. In this section, you’ll take a general look at how regular tuples and named tuples compare in terms of creation time.

Say you have an application that creates a ton of tuples dynamically. You decide to make your code more Pythonic and maintainable using named tuples. Once you’ve updated all your codebase to use named tuples, you run the application and notice some performance issues. After some tests, you conclude that the issues could be related to creating named tuples dynamically.

Here’s a script that measures the average time required to create several tuples and named tuples dynamically:

In this script, you calculate the average time it takes to create several tuples and their equivalent named tuples. If you run the script from your command line, then you’ll get an output similar to the following:

When you look at this output, you can see that creating tuple objects dynamically is a lot faster than creating similar named tuples. In some situations, such as working with large databases, the additional time required to create a named tuple can seriously affect your application’s performance, so keep an eye on this if your code creates a lot of tuples dynamically.

Conclusion

Writing Pythonic code is an in-demand skill in the Python development space. Pythonic code is readable, explicit, clean, maintainable, and takes advantage of Python idioms and best practices. In this tutorial, you learned about creating namedtuple classes and instances and how they can help you improve the quality of your Python code.

In this tutorial, you learned:

  • How to create and use namedtuple classes and instances
  • How to take advantage of cool namedtuple features
  • When to use namedtuple instances to write Pythonic code
  • When to use a namedtuple instead of a similar data structure
  • How to subclass a namedtuple to add new features

With this knowledge, you can deeply improve the quality of your existing and future code. If you frequently use tuples, then consider turning them into named tuples whenever it makes sense. Doing so will make your code much more readable and Pythonic.

Watch Now This tutorial has a related video course created by the Real Python team. Watch it together with the written tutorial to deepen your understanding: Writing Clean, Pythonic Code With namedtuple