Published on 2025-04-16
Python is known for its simplicity and readability, but this often comes at the cost of execution speed. However, with thoughtful code optimization, you can significantly reduce execution time without sacrificing clarity. In this guide, we will discuss effective strategies to optimize Python code, helping you achieve better performance in your applications.
When optimizing Python code, one of the simplest yet most effective strategies is to leverage Python's built-in functions and libraries. These functions are implemented in C, making them much faster than equivalent Python code. They also allow you to write more concise and readable code.
For example, Python's sum() and min() functions are optimized for performance and can often replace custom loops for accumulating values. Similarly, libraries like itertools and functools provide a range of utility functions that can help you avoid writing inefficient code from scratch.
Let's take a real-world example of calculating the sum of squares of numbers in a list.
We might start with a custom implementation using a loop to accumulate the sum of squares.
def sum_of_squares(nums):
    result = 0
    for num in nums:
        result += num ** 2
    return result

nums = list(range(1, 1000000))
print(sum_of_squares(nums))
While this approach works, it's not the most efficient. The explicit loop introduces overhead, and if you're processing large datasets, this can add up quickly.
Now, let’s optimize the same code by using Python’s built-in sum() function along with a generator expression.
def sum_of_squares(nums):
    return sum(num ** 2 for num in nums)

nums = range(1, 1000000)  # Using range for lazy evaluation
print("sum_of_squares", sum_of_squares(nums))
sum() Function: The sum() function is optimized in C, providing a faster way to accumulate values compared to a manual loop.
Generator Expression: Using a generator expression instead of a list reduces memory usage since values are generated lazily and not stored in memory.
The optimized version using sum() and a generator expression is faster because it reduces overhead by using built-in functions optimized in lower-level code, while also reducing memory consumption through lazy evaluation.
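If you want to verify the difference on your own machine, a quick timing comparison with the standard timeit module is a reasonable sanity check. The sketch below is illustrative only; the exact numbers will vary by interpreter and hardware, and the gap is modest because the generator expression still runs Python-level code for each element.

import timeit

def sum_of_squares_loop(nums):
    result = 0
    for num in nums:
        result += num ** 2
    return result

def sum_of_squares_builtin(nums):
    return sum(num ** 2 for num in nums)

nums = range(1, 100000)  # A smaller range so the timing finishes quickly

loop_time = timeit.timeit(lambda: sum_of_squares_loop(nums), number=10)
builtin_time = timeit.timeit(lambda: sum_of_squares_builtin(nums), number=10)

print(f"loop:  {loop_time:.3f}s")
print(f"sum(): {builtin_time:.3f}s")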
Global variables might seem convenient, but they can introduce measurable performance penalties and complicate your code. Inside a function, local variables are resolved with a fast, index-based lookup, whereas reading a global requires a dictionary lookup in the module’s global namespace (and, if the name isn’t found there, in the builtins namespace). This extra work can slow down your code, especially if you frequently access global variables in performance-critical sections.
In addition to performance concerns, global variables can also lead to hard-to-debug side effects. Since they can be modified from anywhere in your program, it becomes difficult to track down where changes are made, leading to subtle bugs that can be challenging to fix.
Let’s explore an example to understand the impact of global variables on performance and see how we can avoid them.
Here’s a function that calculates the sum of squares of numbers, but it uses a global variable to store the result.
result = 0

def sum_of_squares(nums):
    global result  # Declare result as global
    for num in nums:
        result += num ** 2
    return result

nums = list(range(1, 1000000))
print(sum_of_squares(nums))
This approach works but has several downsides:
Performance Overhead: Accessing the global variable result adds extra overhead due to the global lookup.
Risk of Side Effects: Since result is global, it can be modified anywhere in the code, leading to potential bugs and unintended behavior.
Now, let’s rewrite the code to avoid using global variables by keeping the variable local to the function.
def sum_of_squares(nums):
    result = 0
    for num in nums:
        result += num ** 2
    return result

nums = range(1, 1000000)
print(sum_of_squares(nums))
Local Variable: By using a local variable result, the function avoids the global lookup overhead, leading to faster execution.
Reduced Side Effects: Local variables are isolated within the function, reducing the risk of unintended modifications from other parts of the code.
Avoiding global variables results in faster execution due to the elimination of global lookup overhead. It also makes your code more predictable and easier to debug, as the state of the program is contained within function scopes.
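If you're curious where the lookup overhead comes from, the standard dis module makes it visible: reading a global compiles to a LOAD_GLOBAL dictionary lookup, while reading a local compiles to an index-based LOAD_FAST. A minimal sketch follows; the exact bytecode output differs between Python versions.

import dis

counter = 0  # Module-level (global) variable

def use_global():
    return counter + 1  # Compiles to LOAD_GLOBAL

def use_local():
    local_counter = 0
    return local_counter + 1  # Compiles to LOAD_FAST

dis.dis(use_global)  # Look for LOAD_GLOBAL in the output
dis.dis(use_local)   # Look for LOAD_FAST in the output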
Loops can be time-consuming, especially if they are nested. Each iteration adds overhead, which can accumulate and significantly slow down your code, particularly when dealing with large datasets or complex operations. Python offers several alternatives that are often more efficient than traditional loops, including list comprehensions, generator expressions, and the map() and filter() functions. These alternatives are optimized for performance and often result in cleaner, more readable code.
List Comprehensions: A list comprehension provides a concise way to create lists. It’s often faster than a traditional loop because it eliminates the need to append elements one by one and runs on an optimized bytecode path inside the interpreter.
Generator Expressions: Similar to list comprehensions, generator expressions allow you to generate elements on the fly. They are memory-efficient because they yield items one at a time rather than storing them all in memory at once.
map() and filter() Functions: These built-in functions apply a function to all elements of a list (or other iterable). map() transforms each element, while filter() selects elements based on a condition. Both are implemented in C and provide a faster alternative to manually iterating over elements.
# Traditional loop
squares = []
for num in range(1, 11):
    squares.append(num ** 2)

# List comprehension
squares = [num ** 2 for num in range(1, 11)]
The list comprehension is more concise and faster since it eliminates the overhead of repeatedly calling the append() method.
# Traditional loop that builds the full list in memory
squares = []
for num in range(1, 1000000):
    squares.append(num ** 2)

# Generator expression that yields squares lazily
squares = (num ** 2 for num in range(1, 1000000))
The generator expression is more memory-efficient because it generates each square on demand instead of storing all the squares in memory at once.
map() Function:
# Traditional loop
doubled = []
for num in range(1, 11):
    doubled.append(num * 2)

# map() with a lambda
doubled = list(map(lambda num: num * 2, range(1, 11)))
The map() function is implemented in C and can perform the transformation more efficiently than a manual loop, particularly when the function being applied is itself a built-in rather than a lambda.
filter() Function:
# Traditional loop with a condition
evens = []
for num in range(1, 11):
    if num % 2 == 0:
        evens.append(num)

# filter() with a lambda
evens = list(filter(lambda num: num % 2 == 0, range(1, 11)))
The filter() function is more efficient than manually filtering elements in a loop because it is optimized in C and directly returns an iterator, avoiding the need to manually append elements.
By minimizing the use of traditional loops and leveraging list comprehensions, generator expressions, and functions like map() and filter(), you can significantly improve the performance and readability of your Python code. These alternatives are designed to be faster and more efficient, especially when working with large datasets or complex operations.
Choosing the right data structure is crucial for writing optimized Python code. The efficiency of your code can greatly depend on how you store and access your data. Python offers a variety of built-in data structures like lists, dictionaries, sets, and tuples, each with different time complexities for common operations such as insertion, deletion, and lookup. By selecting the appropriate data structure for your task, you can significantly reduce execution time and improve overall performance.
Lists: Lists are versatile and allow for dynamic resizing, but operations like searching or deleting an element can be slow (O(n) time complexity). If you need fast lookups, a list might not be the best choice.
Dictionaries: Python dictionaries (hash maps) provide average O(1) time complexity for lookups, insertions, and deletions, making them an excellent choice when you need to quickly access elements by a key.
Sets: Like dictionaries, sets also offer O(1) time complexity for membership tests, making them useful when you need to check for the existence of an element without concern for duplicates.
Tuples: Tuples are immutable and can be more memory-efficient than lists. They are a good choice when you have fixed-size data that doesn’t need to be modified.
By using the most appropriate data structure for your specific use case, you can reduce the time complexity of your code and improve performance.
Suppose you want to check if a large number of elements exist in a dataset. Using a list for membership testing can be inefficient because it requires a linear search (O(n) time complexity).
nums = list(range(1, 1000000))
print(999999 in nums)
In this case, checking for membership in a list requires iterating through each element, which can be slow for large datasets.
Sets, on the other hand, provide O(1) average time complexity for membership testing. This can significantly reduce the execution time when checking for the existence of an element.
nums_set = set(range(1, 1000000))
print(999999 in nums_set)
By using a set instead of a list, you drastically improve the performance of membership testing, especially for large datasets. While creating the set incurs an initial cost, the benefit of O(1) lookups far outweighs this when performing multiple checks.
List: Membership testing in a list has O(n) time complexity, meaning the operation slows down as the list grows larger.
Set: Membership testing in a set has O(1) time complexity, providing consistent and fast lookups regardless of the set size.
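A rough timing sketch with timeit illustrates the gap. The numbers depend on your machine and are illustrative only, but the list lookup slows down as the collection grows while the set lookup stays essentially flat.

import timeit

nums_list = list(range(1, 1000000))
nums_set = set(nums_list)

# Search for a value near the end of the collection (worst case for the list)
list_time = timeit.timeit(lambda: 999999 in nums_list, number=100)
set_time = timeit.timeit(lambda: 999999 in nums_set, number=100)

print(f"list membership: {list_time:.4f}s")
print(f"set membership:  {set_time:.6f}s")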
Dictionaries are one of Python’s most powerful data structures due to their O(1) average time complexity for key lookups. Let’s compare using a list versus a dictionary for fast data retrieval.
Here, we use a list of tuples to store key-value pairs and manually search for a value based on a key.
# Using a list of tuples to store key-value pairs
data = [(1, 'a'), (2, 'b'), (3, 'c')]

# Searching for a value based on a key
def find_value(key, data):
    for k, v in data:
        if k == key:
            return v
    return None

print(find_value(2, data))  # O(n) time complexity
This code requires iterating through the list to find the key, which can be slow for large datasets.
Now, let’s use a dictionary for fast lookups.
# Using a dictionary for fast lookups
data_dict = {1: 'a', 2: 'b', 3: 'c'}
# Retrieving a value based on a key
print(data_dict.get(2)) # O(1) time complexity
The dictionary allows for constant time lookups (O(1)), making it far more efficient than searching through a list, especially as the dataset grows larger.
Tuples are immutable, which means they cannot be changed after creation. This immutability can lead to performance benefits, especially when working with fixed-size data. Tuples also tend to be slightly cheaper to create and iterate over than lists, making them a good choice when you need to store a collection of items that won’t change.
Here, we use a list to store a fixed collection of items. However, lists are mutable and take up more memory, which might be unnecessary if the data doesn’t need to change.
# Using a list to store fixed-size data
coordinates = [10.0, 20.0, 30.0]
# Accessing elements
x, y, z = coordinates
Using a tuple instead reduces memory usage and provides a slight performance improvement when iterating through or accessing elements.
# Using a tuple to store fixed-size data
coordinates = (10.0, 20.0, 30.0)
# Accessing elements
x, y, z = coordinates
Tuples are more memory-efficient than lists and provide faster access and iteration times. For immutable collections of data, tuples are a better choice.
Dictionaries: Provide O(1) time complexity for key-based lookups, making them highly efficient for tasks that involve frequent data retrieval.
Tuples: More memory-efficient and faster than lists for fixed-size data, offering slight performance benefits when immutability is acceptable.
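You can get a rough sense of the memory difference with sys.getsizeof. Note that it reports only the size of the container itself, not the floats it references, and the exact byte counts vary across Python versions, so treat the output as a ballpark comparison.

import sys

coordinates_list = [10.0, 20.0, 30.0]
coordinates_tuple = (10.0, 20.0, 30.0)

# Sizes of the container objects only, in bytes
print("list size: ", sys.getsizeof(coordinates_list))
print("tuple size:", sys.getsizeof(coordinates_tuple))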
Generators allow you to iterate over data without creating large data structures in memory, which is extremely useful when working with large datasets. Instead of generating all the data at once and storing it in memory (as you would with lists or list comprehensions), generators produce data on-the-fly using the yield keyword. This technique saves memory because it only generates one item at a time as you need it.
In addition to using the yield keyword to create generators, you can also use generator expressions as a memory-efficient alternative to list comprehensions. Generator expressions have a similar syntax to list comprehensions but return an iterator instead of a list, meaning the values are generated lazily.
Let's compare traditional approaches using lists or list comprehensions with generator-based approaches to see the benefits of using generators for large datasets.
Suppose you want to generate a list of squares for a large range of numbers. Using a list comprehension will compute all the squares and store them in memory at once, which can be memory-intensive for large datasets.
# Generating a list of squares (memory-intensive)
squares = [x ** 2 for x in range(1000000)]

# Process squares
for square in squares:
    pass  # Do something with each square
In this code, the entire list of squares is created and stored in memory. This approach can consume a large amount of memory, especially when working with a large dataset like 1,000,000 elements.
By using a generator with the yield keyword, you can produce each square on-the-fly, which avoids storing the entire list in memory. This is particularly useful for handling large datasets.
# Generating squares using a generator (memory-efficient)
def generate_squares(n):
    for x in range(n):
        yield x ** 2

# Process squares one at a time
for square in generate_squares(1000000):
    pass  # Do something with each square
This approach generates each square as needed and does not store all values in memory at once. This makes it highly memory-efficient, especially for large datasets.
You can achieve similar memory efficiency by using a generator expression instead of a list comprehension. The syntax is almost identical, but it uses parentheses () instead of square brackets [].
# Using a generator expression for memory-efficient square generation
squares_gen = (x ** 2 for x in range(1000000))

# Process squares one at a time
for square in squares_gen:
    pass  # Do something with each square
Like the yield-based generator, the generator expression produces values lazily without storing them all in memory. It offers the same memory efficiency as the generator function but with more concise syntax.
List Comprehension: Generates and stores the entire list in memory, which can be inefficient for large datasets.
Generator with yield: Produces values on-the-fly, significantly reducing memory usage.
Generator Expression: Similar to list comprehension but memory-efficient, as it returns an iterator instead of storing the list in memory.
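To see the memory gap in concrete terms, you can compare the two containers with sys.getsizeof. The list stores every square up front, while the generator object holds only its execution state, so its size stays tiny regardless of the range. The byte counts are illustrative and vary between Python versions; getsizeof also ignores the integers the list references, so the true gap is even larger.

import sys

squares_list = [x ** 2 for x in range(1000000)]
squares_gen = (x ** 2 for x in range(1000000))

print("list comprehension:  ", sys.getsizeof(squares_list), "bytes")  # Several megabytes
print("generator expression:", sys.getsizeof(squares_gen), "bytes")   # Around a couple hundred bytes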
The itertools module in Python provides a collection of fast, memory-efficient tools for creating and working with iterators. It offers a variety of functions that can help you handle iterables more efficiently, reduce code complexity, and improve performance by allowing you to work with data lazily (i.e., without generating all values at once).
Some of the most commonly used functions in itertools include:
count(): Infinite counter, similar to a range() but with no end.
cycle(): Repeats an iterable indefinitely.
repeat(): Repeats an object multiple times.
chain(): Combines multiple iterables into one.
islice(): Slices an iterator without consuming memory to store the intermediate values.
combinations() and permutations(): Generate combinations and permutations of elements, useful in combinatorial problems.
These functions are designed to work efficiently with large datasets by generating results lazily, meaning they do not create large in-memory data structures but instead produce values as needed.
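The short sketch below shows typical usage of a few of these helpers (chain() and islice() get fuller examples next). Because count() and cycle() are infinite, they are always paired with something that stops the iteration.

import itertools

# count(): an endless counter; islice() stops it after five values
print(list(itertools.islice(itertools.count(start=10, step=2), 5)))  # [10, 12, 14, 16, 18]

# cycle(): repeat an iterable forever; zip() stops after the shorter input
labels = itertools.cycle(["red", "green", "blue"])
print(list(zip(range(5), labels)))  # [(0, 'red'), (1, 'green'), (2, 'blue'), (3, 'red'), (4, 'green')]

# repeat(): repeat a single object a fixed number of times
print(list(itertools.repeat("ha", 3)))  # ['ha', 'ha', 'ha']

# combinations(): all unordered pairs from an iterable
print(list(itertools.combinations([1, 2, 3], 2)))  # [(1, 2), (1, 3), (2, 3)]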
Suppose you have multiple lists and want to iterate over all their elements. A common approach would be to concatenate the lists and then iterate over the combined list.
list1 = [1, 2, 3]
list2 = [4, 5, 6]
combined = list1 + list2

for item in combined:
    print(item)
Here, we create a new list by concatenating list1 and list2. This approach can consume a lot of memory, especially when working with large datasets.
Using itertools.chain(), you can combine multiple iterables without creating a new list in memory. chain() creates an iterator that produces elements from the input iterables one by one.
import itertools

# Combining iterables using itertools.chain
list1 = [1, 2, 3]
list2 = [4, 5, 6]

for item in itertools.chain(list1, list2):
    print(item)
itertools.chain() produces items lazily without creating a new list in memory. This approach is more memory-efficient, especially when dealing with large datasets.
You may want to take a subset of elements from a large iterator. If you convert the iterator to a list and then slice it, this will create a large list in memory.
# Slicing a large list (memory-intensive)
large_list = list(range(1000000))
subset = large_list[:10]

for item in subset:
    print(item)
This approach requires generating and storing the entire list in memory, which can be very inefficient for large datasets.
With itertools.islice(), you can take a slice of an iterator without generating the entire list in memory. This is much more efficient for large datasets.
import itertools

large_range = range(1000000)
subset = itertools.islice(large_range, 10)

for item in subset:
    print(item)
itertools.islice() returns an iterator that produces items from the input iterable on the fly, without creating a large intermediate list in memory.
List Concatenation vs. itertools.chain(): Concatenating lists creates a new list in memory, while itertools.chain() produces items lazily, reducing memory usage.
List Slicing vs. itertools.islice(): Slicing a list creates a new list in memory, while itertools.islice() slices the iterator without generating the entire list, which is more memory-efficient.
Another easy win is to avoid recomputing values that never change inside a loop. In this example, the square root of 144 is calculated inside the loop. Since 144 does not change, recalculating its square root on every iteration is unnecessary.
import math

def calculate():
    result = 0
    for i in range(1, 1000000):
        result += math.sqrt(144)  # Recomputing the square root each time
    return result

print(calculate())
The math.sqrt(144) calculation is performed every time the loop iterates.
This results in nearly one million calls to math.sqrt(), which is inefficient because the result is always the same.
To optimize this, we compute the square root of 144 once, before the loop begins, and store it in a variable. We then use this precomputed value inside the loop.
import math

def calculate():
    result = 0
    sqrt_144 = math.sqrt(144)  # Calculate once before the loop
    for i in range(1, 1000000):
        result += sqrt_144  # Use the precomputed result
    return result

print(calculate())
The square root of 144 is calculated only once, and the result is stored in the sqrt_144 variable.
The loop then adds the precomputed value to result on each iteration, avoiding redundant calculations.
Repeatedly calling the same function within a loop can be inefficient, especially if the function performs expensive computations or always returns the same result. By avoiding unnecessary function calls, you can optimize your code and improve performance.
Efficiency: Repeated function calls can significantly slow down your program, particularly if the function involves heavy computations or I/O operations.
Simplification: Storing the result of a function in a variable, instead of calling the function multiple times, reduces redundancy and improves code clarity.
In this example, we call the len() function in a while-loop condition, so it is re-evaluated on every iteration. Since the length of the list does not change, recalculating it each time is unnecessary.
def repeated_len_calls(lst):
    i = 0
    while i < len(lst):  # len(lst) is re-evaluated on every iteration
        print(lst[i])
        i += 1

my_list = [1, 2, 3, 4, 5]
repeated_len_calls(my_list)
The len(lst) call in the loop condition is evaluated on every iteration, even though the length of the list remains constant. This results in redundant function calls, which can degrade performance when working with large datasets.
To optimize this, we store the result of the len() function in a variable before the loop starts. This eliminates the need to repeatedly call the function.
def optimized_len_calls(lst):
    length = len(lst)  # Precompute the length of the list
    i = 0
    while i < length:
        print(lst[i])
        i += 1

my_list = [1, 2, 3, 4, 5]
optimized_len_calls(my_list)
The length of the list is calculated once and stored in the length variable.
The loop uses this precomputed value instead of repeatedly calling len(lst).
Avoiding repeated function calls is a key optimization strategy in Python. By storing the result of a function and reusing it instead of recalculating it, you can reduce the overhead of function calls and improve the performance of your code. This is especially important in scenarios where functions are called inside loops or the function involves expensive operations.
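When the repeated calls cannot simply be hoisted out of the loop, for example because a function is invoked with a handful of recurring arguments, caching the results is a closely related technique. The functools.lru_cache decorator memoizes return values, so repeated calls with the same argument skip the computation. A minimal sketch, using math.sqrt as a stand-in for a more expensive computation:

import functools
import math

@functools.lru_cache(maxsize=None)
def slow_root(n):
    # Stand-in for an expensive, deterministic computation
    return math.sqrt(n)

def total(values):
    return sum(slow_root(v) for v in values)

# Only a few distinct inputs, so most calls are served from the cache
values = [144, 256, 144, 625, 256, 144] * 100000
print(total(values))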
When checking if an item exists within a collection, the choice of data structure can have a significant impact on performance. Using a list or tuple for membership testing (in operation) has a time complexity of O(n) because the search is linear. On the other hand, using a set for membership testing reduces the time complexity to O(1) due to the underlying hash table implementation. Therefore, if you need to perform frequent membership tests, converting your collection to a set can greatly improve performance.
Performance: Membership testing in lists and tuples is slow for large datasets because the time taken grows with the size of the collection.
Efficiency: Sets provide O(1) average-time complexity for membership tests, making them ideal for scenarios where performance is critical.
In this example, we are using a list to check if a value exists in the collection. As the size of the list grows, the time taken to check membership increases.
def check_membership(lst, value):
    if value in lst:  # O(n) time complexity
        return True
    return False

my_list = [i for i in range(1000000)]
print(check_membership(my_list, 999999))  # True
The in operation on a list has O(n) time complexity because it checks each element one by one.
For large lists, this becomes inefficient and slow.
To optimize this, we convert the list to a set, which provides O(1) average-time complexity for membership tests.
def check_membership(s, value):
    if value in s:  # O(1) average time complexity
        return True
    return False

my_set = set(range(1000000))  # Build a set for fast membership tests
print(check_membership(my_set, 999999))  # True
The set data structure uses a hash table, which allows for O(1) average-time complexity for membership tests.
By converting the list to a set, the membership check becomes much faster, especially for large datasets.
If you frequently need to check if an item exists within a collection, using a set instead of a list can greatly improve performance. This simple change can transform a slow O(n) membership test into a fast O(1) operation, which is particularly beneficial when dealing with large datasets.
Using try/except blocks in Python is a common practice for handling exceptions and errors. However, excessive use of these blocks, especially in performance-critical code, can negatively impact performance and lead to less readable code. It’s generally better to use try/except blocks for exceptional cases and to avoid them for regular control flow.
Performance: Exception handling can be costly in terms of performance, especially if used frequently within loops or performance-critical sections.
Readability: Overusing try/except blocks can make code harder to understand and maintain.
Error Handling: It’s important to use exceptions for truly exceptional conditions rather than for normal control flow.
In this example, try/except blocks are used inside a loop to handle cases where division by zero might occur. This approach can be inefficient if division by zero is a frequent possibility.
def process_numbers(numbers):
    results = []
    for number in numbers:
        try:
            result = 10 / number  # Attempt division
        except ZeroDivisionError:
            result = 'undefined'  # Handle the error
        results.append(result)
    return results

numbers = [1, 2, 0, 4, 0, 6]
print(process_numbers(numbers))
The try/except block is used to handle division by zero, which can be inefficient if 0 appears frequently in the numbers list.
Handling exceptions inside a loop can degrade performance, especially if exceptions are common.
To optimize, you can check for conditions that lead to exceptions before performing operations. This avoids unnecessary use of try/except blocks for control flow.
def process_numbers_optimized(numbers):
    results = []
    for number in numbers:
        if number == 0:
            result = 'undefined'  # Preemptively handle zero
        else:
            result = 10 / number  # Perform the division
        results.append(result)
    return results

numbers = [1, 2, 0, 4, 0, 6]
print(process_numbers_optimized(numbers))
The code checks if number is 0 before attempting division, avoiding the use of a try/except block for this common condition.
This approach is more efficient because it prevents the need to handle exceptions frequently.
Limiting the use of try/except blocks can enhance performance and readability. Reserve try/except for truly exceptional cases rather than using them for regular control flow or anticipated conditions. By preemptively checking for conditions that might lead to exceptions, you can write more efficient and maintainable code.
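Another frequent case where a precondition check, or a built-in default, beats exception handling is dictionary access for keys that are often missing. The sketch below is illustrative; the data and key names are made up for the example.

ages = {"alice": 30, "bob": 25}

# Using try/except for an anticipated condition
def get_age_with_exception(name):
    try:
        return ages[name]
    except KeyError:
        return None

# Using dict.get() with a default instead
def get_age_with_default(name):
    return ages.get(name)  # Returns None when the key is missing

print(get_age_with_exception("carol"))  # None
print(get_age_with_default("carol"))    # None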
String concatenation is the process of combining multiple strings into one. In Python, you can concatenate strings using several methods, including the + operator, join(), and formatted strings. The choice of method can affect performance, especially when dealing with a large number of strings or frequent concatenation operations.
Performance: Inefficient string concatenation methods can lead to performance issues, particularly in loops or large data processing tasks.
Readability: Choosing the right method for concatenation can make your code more readable and maintainable.
In this example, the + operator is used to concatenate strings within a loop. This method can be inefficient because each concatenation creates a new string object.
def concatenate_strings(strings):
    result = ""
    for s in strings:
        result += s  # Inefficient concatenation in a loop
    return result

strings = ["hello", "world", "this", "is", "a", "test"]
print(concatenate_strings(strings))
The + operator creates a new string each time concatenation occurs.
In a loop, this results in multiple intermediate string objects being created, which can be inefficient for large datasets.
To optimize the concatenation, use the join() method, which is more efficient as it performs the concatenation in a single operation.
def concatenate_strings_optimized(strings):
    return "".join(strings)  # Efficient concatenation

strings = ["hello", "world", "this", "is", "a", "test"]
print(concatenate_strings_optimized(strings))
The join() method concatenates all strings in the list in a single operation.
This method is more efficient than using the + operator in a loop because it minimizes the creation of intermediate string objects.
Choosing the right method for string concatenation can significantly impact performance. For concatenating a large number of strings, use the join() method instead of the + operator within loops. This approach reduces the overhead of creating multiple intermediate string objects and improves the efficiency of your code.
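For combining a small, fixed number of values, formatted string literals (f-strings), mentioned above as a third option, are both readable and efficient; join() remains the better tool when you are assembling many pieces collected in a list. A brief sketch with made-up values:

name = "Python"
version = 3.12

# f-string: clear and efficient for a few known values
message = f"{name} {version} makes string formatting easy"

# join(): better when combining many pieces collected in a list
words = ["hello", "world", "this", "is", "a", "test"]
sentence = " ".join(words)

print(message)
print(sentence)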
Optimizing Python code is a vital practice for enhancing the performance and efficiency of your applications. By employing strategies such as minimizing loops, avoiding global variables, utilizing built-in functions, and choosing efficient data structures, you can significantly reduce execution time and improve overall application performance. Profiling your code helps identify bottlenecks and focus optimization efforts where they are most needed. Limiting the use of try/except blocks and using sets for membership testing further contribute to more efficient and readable code. By incorporating these optimization techniques, you will not only speed up your code but also create a more scalable and maintainable Python application.