setdefault() vs defaultdict — Which Should You Use?

Both of these Python features are designed to handle the same situation: when a key is missing from a dictionary, they automatically create it with a specified default value.

If d is a dictionary, calling d.setdefault(k, default) evaluates the default argument. If the key k is not already in the dictionary, the resulting object becomes the value associated with k, and the key–value pair is inserted into d. The return value of setdefault() is always the current value associated with the key.
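As a minimal sketch of this behavior, consider a hypothetical dictionary named inventory (the name and data are illustrative assumptions, not part of the example later in this article):

```python
# Hypothetical data, for illustration only.
inventory = {'apples': 3}

# Key exists: the stored value is returned and the dictionary is unchanged.
existing = inventory.setdefault('apples', 0)

# Key is missing: the default is inserted and then returned.
added = inventory.setdefault('pears', 0)

print(existing, added, inventory)
```

In both calls the return value is whatever is stored under the key once the call has finished.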

A defaultdict, on the other hand, is a dictionary subclass from the collections module in the standard library. When creating it, you normally pass a callable that takes no arguments as the first argument (the default factory). If a key looked up with d[k] is missing from the dictionary, the callable is executed and its return value becomes the value associated with that key. The new key–value pair is then automatically added to the defaultdict.
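A short sketch of how the factory is triggered, using a hypothetical groups dictionary (the names are assumptions made for illustration). It also shows that only indexing with square brackets invokes the factory; get() and the in operator leave the dictionary untouched:

```python
from collections import defaultdict

# `list` can be called with no arguments and returns a new empty list.
groups = defaultdict(list)

# Indexing a missing key calls the factory and stores the result.
groups['fruit'].append('apple')

# get() and `in` do not trigger the factory, so no key is created here.
missing = groups.get('vegetables')
has_vegetables = 'vegetables' in groups

print(dict(groups), missing, has_vegetables)
```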

The main difference between the two approaches is that Python evaluates the expression passed as default before every call to d.setdefault(k, default), so the default object is created each time, regardless of whether the key already exists in the dictionary. In contrast, the callable provided to a defaultdict is executed only when the key is actually missing.
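The difference can be made visible by counting how often a default is built. The following is only a sketch: make_default and the calls counter are hypothetical helpers introduced for this illustration.

```python
from collections import defaultdict

# Counts how many defaults each approach builds.
calls = {'setdefault': 0, 'defaultdict': 0}

def make_default(label):
    """Hypothetical factory that records how often it runs."""
    calls[label] += 1
    return []

keys = ['a', 'a', 'a']

d = {}
for key in keys:
    # The argument expression is evaluated before setdefault() is called,
    # so make_default() runs on every iteration.
    d.setdefault(key, make_default('setdefault')).append(1)

dd = defaultdict(lambda: make_default('defaultdict'))
for key in keys:
    # The factory runs only on the first lookup of 'a'.
    dd[key].append(1)

print(calls)
# Output:
# {'setdefault': 3, 'defaultdict': 1}
```

Both dictionaries end up with the same contents; only the number of discarded default objects differs.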

When the default value is expensive to create and the same keys recur often, this can noticeably increase running time.

To illustrate this, consider a simple example. Suppose we have a database that stores only names and ages. The task is to create an instance of a Person class based on a given name and collect the name–instance pairs in a dictionary.

An additional requirement is that each Person instance should store the name in uppercase. The input names arrive as a sequence, and the same name may appear multiple times.

In the following program snippets, the task is solved using both setdefault() and defaultdict. In the Person class, the age attribute is set during initialization either from the constructor argument or, if none is provided, from the name–age database, which we model here with a dictionary.

Whenever a Person instance is created, an informational message is printed. The names are provided in a list where each name appears twice. This makes it easy to see when the dictionaries actually need to generate a default value and when they do not.


from collections import defaultdict

# Dictionary modeling a database.
database = {'Adam': 45, 'John': 68, 'Eve': 18}

class Person:
    def __init__(self, name: str, age=None):
        self.name = name
        print('Instantiating...')
        self.age = age if age is not None else database.get(name)

    def __repr__(self):
        return f'{type(self).__name__}({self.name},{self.age})'

# Sequence of names to be processed.
names = ['Adam', 'John', 'Eve'] * 2

# Using dict.setdefault()
d = {}
for name in names:
    # Person(name) is constructed on every iteration, even if the key exists.
    person = d.setdefault(name, Person(name))
    person.name = person.name.upper()

print(dict(d))
# Output:
# Instantiating...
# Instantiating...
# Instantiating...
# Instantiating...
# Instantiating...
# Instantiating...
# {'Adam': Person(ADAM,45), 'John': Person(JOHN,68), 'Eve': Person(EVE,18)}

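# Using defaultdict()
# The factory takes no arguments, so the lambda reads the loop variable
# `name` at the moment a missing key is looked up.
dd = defaultdict(lambda: Person(name))
for name in names:
    person = dd[name]
    person.name = person.name.upper()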

print(dict(dd))
# Output:
# Instantiating...
# Instantiating...
# Instantiating...
# {'Adam': Person(ADAM,45), 'John': Person(JOHN,68), 'Eve': Person(EVE,18)}

Both solutions produce the same final result. However, the output shows that the setdefault() version creates six Person instances, twice as many as the three created by the defaultdict version.

In real applications, creating an instance may involve a database query. If many objects are created unnecessarily, the time cost of those queries may become significant.

This does not mean that setdefault() should be avoided. But in use cases like the one shown above, defaultdict is often the better choice.
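As an illustration of a case where setdefault() remains a reasonable fit, here is a sketch that groups words by their first letter. The words list and the by_letter name are assumptions made for this example. The default is an empty list, which is cheap to build, so creating one on every call costs little:

```python
# Hypothetical input data, for illustration only.
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

by_letter = {}
for word in words:
    # An empty list is cheap to create, so building one per call is harmless.
    by_letter.setdefault(word[0], []).append(word)

print(by_letter)
```

The result is a plain dict, so looking up a letter that never occurred still raises KeyError instead of silently adding a new key.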

You can find detailed descriptions of setdefault() and defaultdict, along with more examples, in the e-book Python Knowledge Building Step by Step, especially in the chapter “Public methods of built-in types” and in the section “Dictionary producing default value for a given key – defaultdict” in the chapter “Using the standard library modules”.
