Skip to main content

Making Python Programs Blazingly Fast

Let’s look at the performance of our Python programs and see how to make them up to 30% faster!

Python haters always say, that one of the reasons they don’t want to use it, is that it’s slow. Well, whether specific program — regardless of the programming language used — is fast or slow is very much dependent on the developer who wrote it and their skill and ability to write optimized and fast programs.
So, let’s prove some people wrong and let’s see how we can improve performance of our Python programs and make them really fast!

Timing and Profiling

Before we start optimizing anything, we first need to find out which parts of our code actually slow down the whole program. Sometimes the bottleneck of the program might be obvious, but in case you don’t know where it is, then here are options you have for finding out:
Note: This is the program I will be using for demonstration purposes, it computes e to power of X (taken from Python docs):



# slow_program.py
from decimal import *
def exp(x):
getcontext().prec += 2
i, lasts, s, fact, num = 0, 0, 1, 1, 1
while s != lasts:
lasts = s
i += 1
fact *= i
num *= x
s += num / fact
getcontext().prec -= 2
return +s
exp(Decimal(150))
exp(Decimal(400))
exp(Decimal(3000)) 

The Laziest “Profiling”

First off, the simplest and honestly very lazy solution — Unix time command:
~ $ time python3.8 slow_program.py
real 0m11,058s
user 0m11,050s
sys 0m0,008s
This could work if you just want to time your whole program, which is usually not enough…

The Most Detailed Profiling

On the other end of the spectrum is cProfile, which will give you too much information:
~ $ python3.8 -m cProfile -s time slow_program.py
1297 function calls (1272 primitive calls) in 11.081 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
3 11.079 3.693 11.079 3.693 slow_program.py:4(exp)
1 0.000 0.000 0.002 0.002 {built-in method _imp.create_dynamic}
4/1 0.000 0.000 11.081 11.081 {built-in method builtins.exec}
6 0.000 0.000 0.000 0.000 {built-in method __new__ of type object at 0x9d12c0}
6 0.000 0.000 0.000 0.000 abc.py:132(__new__)
23 0.000 0.000 0.000 0.000 _weakrefset.py:36(__init__)
245 0.000 0.000 0.000 0.000 {built-in method builtins.getattr}
2 0.000 0.000 0.000 0.000 {built-in method marshal.loads}
10 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:1233(find_spec)
8/4 0.000 0.000 0.000 0.000 abc.py:196(__subclasscheck__)
15 0.000 0.000 0.000 0.000 {built-in method posix.stat}
6 0.000 0.000 0.000 0.000 {built-in method builtins.__build_class__}
1 0.000 0.000 0.000 0.000 __init__.py:357(namedtuple)
48 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:57(_path_join)
48 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:59(<listcomp>)
1 0.000 0.000 11.081 11.081 slow_program.py:1(<module>)
...
Here, we ran the testing script with cProfile module and time argument, so that lines are ordered by internal time ( cumtime). This gives us a lot of information, the lines you can see above are about 10% of the actual output. From this, we can see that exp function is the culprit ( surprise, surprise) and now we can get little more specific with timing and profiling...

Timing Specific Functions

Now that we know where to direct our attention, we might want to time the slow function, without measuring the rest of the code. For that we can use simple decorator:


def timeit_wrapper(func):
@wraps(func)
def wrapper(*args, **kwargs):
start = time.perf_counter() # Alternatively, you can use time.process_time()
func_return_val = func(*args, **kwargs)
end = time.perf_counter()
print('{0:<10}.{1:<8} : {2:<8}'.format(func.__module__, func.__name__, end - start))
return func_return_val
return wrapper
This decorator can be then applied to function under test like so:
@timeit_wrapper
def exp(x):
...
print('{0:<10} {1:<8} {2:^8}'.format('module', 'function', 'time'))
exp(Decimal(150))
exp(Decimal(400))
exp(Decimal(3000))
This gives us output like this:
~ $ python3.8 slow_program.py
module function time
__main__ .exp : 0.003267502994276583
__main__ .exp : 0.038535295985639095
__main__ .exp : 11.728486061969306
One thing to consider is what kind of time we actually (want to) measure. Time package provides time.perf_counter and time.process_time. The difference here is that perf_counter returns absolute value, which includes time when your Python program process is not running, therefore it might be impacted by machine load. On the other hand process_time returns only user time (excluding system time), which is only the time of your process.

Making It Faster

Now, for the fun part. Let’s make your Python programs run faster. I’m (mostly) not going to show you some hacks, tricks and code snippets that will magically solve your performance issues. This is more about general ideas and strategies, which when used, can make a huge impact on performance, in some cases up to 30% speed-up.

Use Built-in Data Types

This one is pretty obvious. Built-in data types are very fast, especially in comparison to our custom types like trees or linked lists. That’s mainly because the built-ins are implemented in C, which we can’t really match in speed when coding in Python.

Caching/Memoization with lru_cache


import functools
import time
# caching up to 12 different results
@functools.lru_cache(maxsize=12)
def slow_func(x):
time.sleep(2) # Simulate long computation
return x
slow_func(1) # ... waiting for 2 sec before getting result
slow_func(1) # already cached - result returned instantaneously!
slow_func(3) # ... waiting for 2 sec before getting result
The function above simulates heavy computation using time.sleep. When called first time with parameter 1, it waits for 2 seconds and only then returns the result. When called again, the result is already cached so it skips the body of the function and returns the result immediately. 

Use Local Variables

This has to do with the speed of lookup of variables in each scope. I’m writing each scope, because it’s not just about using local vs. global variables. There’s actually a difference in speed of lookup even between — let’s say — local variable in function (fastest), class-level attribute (e.g. self.name - slower) and global for example imported function like time.time (slowest).
You can improve performance, by using seemingly unnecessary (straight-up useless) assignments like this:

# Example #1
class FastClass:
def do_stuff(self):
temp = self.value # this speeds up lookup in loop
for i in range(10000):
... # Do something with `temp` here
# Example #2
import random
def fast_function():
r = random.random
for i in range(10000):
print(r()) # calling `r()` here, is faster than global random.random()

Use Functions

This might seem counter-intuitive, as calling function will put more stuff onto the stack and create overhead from function returns, but it relates to the previous point. If you just put your whole code into one file without putting it into function, it will be much slower because of global variables. Therefore you can speed up your code just by wrapping whole code in main function and calling it once, like so:
def main():
... # All your previously global code
main()

Don’t Access Attributes

Another thing that might slow down your programs is dot operator (.) which is used when accessing object attributes. This operator triggers dictionary lookup using __getattribute__, which creates extra overhead in your code. So, how can we actually avoid (limit) using it?
# Slow:
import re
def slow_func():
for i in range(10000):
re.findall(regex, line) # Slow!
# Fast:
from re import findall
def fast_func():
for i in range(10000):
findall(regex, line) # Faster!

Beware of Strings

Operations on strings can get quite slow when ran in loop using for example modulus (%s) or .format(). What better options do we have? Based on recent , the only thing we should be using is f-string, it's most readable, concise AND the fastest method. So, based on that tweet, this is the list of methods you can use - fastest to slowest:


f'{s} {t}' # Fast!
s + ' ' + t
' '.join((s, t))
'%s %s' % (s, t)
'{} {}'.format(s, t)
Template('$s $t').substitute(s=s, t=t) # Slow!
Generators are not inherently faster as they were made to allow for lazy computation, which saves memory rather than time. However, the saved memory can be cause for your program to actually run faster. How? Well, if you have a large dataset and you don’t use generators (iterators), then the data might overflow CPUs L1 cache, which will slow down lookup of values in memory significantly.
When it comes to performance, it’s very import that CPU can save all the data it’s working on, as close as possible, which is in the cache. You can watch , where he mentions these issues.

Conclusion

The first rule of optimization is to not do itBut, if you really have to, then I hope these few tips help you with that. However, be mindful when optimizing your code as it might end up making your code hard to read and therefore hard to maintain, which might outweigh benefits of optimization.
Note: This was originally posted at 



Comments

Popular posts from this blog

What Why How SDN..???????

What is SDN?   If you follow any number of news feeds or vendor accounts on Twitter, you've no doubt noticed the term "software-defined networking" or SDN popping up more and more lately. Depending on whom you believe, SDN is either the most important industry revolution since Ethernet or merely the latest marketing buzzword (the truth, of course, probably falls somewhere in between). Few people from either camp, however, take the time to explain what SDN actually means. This is chiefly because the term is so new and different parties have been stretching it to encompass varying definitions which serve their own agendas. The phrase "software-defined networking" only became popular over roughly the past eighteen months or so. So what the hell is it? Before we can appreciate the concept of SDN, we must first examine how current networks function. Each of the many processes of a router or switch can be assigned to one of three conceptual planes of operatio...

NETWORKING BASICS

This article is referred from windowsnetworking.com In this article series, I will start with the absolute basics, and work toward building a functional network. In this article I will begin by discussing some of the various networking components and what they do. If you would like to read the other parts in this article series please go to: Networking Basics: Part 2 - Routers Networking Basics: Part 3 - DNS Servers Networking Basics: Part 4 - Workstations and Servers Networking Basics: Part 5 - Domain Controllers Networking Basics: Part 6 - Windows Domain Networking Basics: Part 7 - Introduction to FSMO Roles Networking Basics: Part 8 - FSMO Roles continued Networking Basics: Part 9 – Active Directory Information Networking Basics: Part 10 - Distinguished Names Networking Basics, Part 11: The Active Directory Users and Computers Console Networking Basics: Part 12 - User Account Management Networking Basics: Part 13 - Creating ...

How to install and setup Docker on RHEL 7/CentOS 7

H ow do I install and setup Docker container on an RHEL 7 (Red Hat Enterprise Linux) server? How can I setup Docker on a CentOS 7? How to install and use Docker CE on a CentOS Linux 7 server? Docker is free and open-source software. It automates the deployment of any application as a lightweight, portable, self-sufficient container that will run virtually anywhere. Typically you develop software on your laptop/desktop. You can build a container with your app, and it can test run on your computer. It will scale in cloud, VM, VPS, bare-metal and more. There are two versions of docker. The first one bundled with RHEL/CentOS 7 distro and can be installed with the yum. The second version distributed by the Docker project called docker-ce (community free version) and can be installed by the official Docker project repo. The third version distributed by the Docker project called docker-ee (Enterprise paid version) and can be installed by the official Docker project repo.  This page shows...