Efficient Programming in Python
Profiling
Profiling - a technique that allows you to pinpoint the most resource-intensive parts of an application.
Profiler - a program that runs an application and monitors how long each function takes to execute, so it detects the functions on which the application spends most of its time.
pytest-benchmark is used for testing the running time of specific functions. See the pytest-benchmark documentation for details.
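A minimal sketch of a pytest-benchmark test (the simulate() function is a hypothetical example):
def simulate():
    return sum(i * i for i in range(10000))

def test_simulate(benchmark):     # the benchmark fixture is provided by pytest-benchmark
    result = benchmark(simulate)  # runs simulate() repeatedly and records the timings
    assert result > 0
The test is run as usual with pytest; the timing statistics are printed at the end of the test session.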
Another profiling tool is cProfile:
from simul import benchmark  # the function we want to profile
import cProfile
pr = cProfile.Profile()
pr.enable()       # start collecting profiling data
benchmark()
pr.disable()      # stop collecting
pr.print_stats()  # print per-function statistics
KCachegrind - a graphical user interface for visualizing profiling output.
How to do profiling:
- Check the modules of the app with cProfile or another profiler in order to understand which module is time-consuming
- Use line_profiler to check functions line by line, as sketched below. See the documentation
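A minimal sketch, assuming a hypothetical script simulation.py containing the function we want to inspect:
@profile                     # injected by kernprof at run time, no import needed
def run_simulation(steps):
    total = 0.0
    for i in range(steps):   # per-line timings are reported for this loop body
        total += i ** 0.5
    return total

run_simulation(100000)
Run it with kernprof -l -v simulation.py to get the line-by-line report.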
Profiling memory usage
memory_profiler summarizes information about the memory usage of a process, line by line.
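A minimal sketch, assuming a hypothetical script big_list.py:
from memory_profiler import profile

@profile
def build_list():
    data = [i for i in range(1_000_000)]  # memory growth is reported line by line
    return sum(data)

if __name__ == "__main__":
    build_list()
Running python big_list.py prints a per-line memory usage report for build_list().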
Optimizing code
Methods of optimization:
- improve algorithms used
- minimize the number of instructions
Using the right data structures
Estimated time complexity of common operations per data structure (a small timing sketch follows the list):
- list of 10,000 elements: index access, append and pop at the end are \(O(1)\); insert and pop at the beginning, and membership tests with in, are \(O(N)\)
- deque (double-ended queue) of 10,000 elements: append and pop at both ends are \(O(1)\); access by index in the middle is \(O(N)\)
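A minimal timing sketch of the difference for front insertions (the exact numbers depend on the machine):
from collections import deque
from timeit import timeit

lst = list(range(10_000))
dq = deque(range(10_000))

# inserting at the front is O(N) for a list but O(1) for a deque
timeit("lst.insert(0, 0)", globals=globals(), number=1000)
timeit("dq.appendleft(0)", globals=globals(), number=1000)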
Tip
A simple way to speed up searches in lists is to keep the array sorted and perform a binary search using the bisect module. bisect allows fast searches on sorted arrays.
The bisect functions run in \(O(\log(N))\) time.
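A minimal sketch of searching and inserting with bisect:
import bisect

collection = [1, 2, 4, 5, 6]
bisect.bisect_left(collection, 5) #(1)
bisect.insort(collection, 3)      #(2)
- returns 3, the index at which 5 is found
- inserts 3 while keeping the list sorted: [1, 2, 3, 4, 5, 6]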
dictionaries
Dictionaries are hash maps with \(O(1)\) average cost for insertion and lookup. Counting the elements of a collection with the Counter class takes \(O(N)\) time overall, with an average \(O(1)\) cost per element (Counter is a dict subclass).
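A minimal example of counting elements with collections.Counter:
from collections import Counter

counter = Counter(["cat", "dog", "cat", "rose", "cat"])
counter.most_common(1) #(1)
- returns: [('cat', 3)]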
Building an in-memory search index using a hash map
docs = ["the cat is under the table", "the dog is under the table", "cats and dogs smell roses", "Carla eats an apple"]
index = {} #(1)
for i, doc in enumerate(docs):
    for word in doc.split(): #(2)
        if word not in index: #(3)
            index[word] = [i]
        else:
            index[word].append(i)
- Building an index
- We iterate over each term in the document
- We build a list containing the indices where the term appears
Now, if we want to search (for example, to retrieve all the documents containing the term table), we can simply query the index:
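For example, a lookup in the index built above:
index["table"] #(1)
- returns: [0, 1], the indices of the documents containing "table"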
This lookup takes \(O(1)\) time, so the index can be queried in constant time regardless of the number of documents.
sets
Unordered collections of unique elements.
Time complexity of operations on two sets s and t: union is \(O(len(s) + len(t))\), intersection is \(O(min(len(s), len(t)))\) on average, and difference is \(O(len(s))\).
Application of sets in practice: boolean search, i.e. querying all documents that contain multiple terms. For example, we may want to find all the documents that contain both the words cat and table. This kind of query can be computed efficiently by taking the intersection between the set of documents containing cat and the set of documents containing table.
index = {} #(1)
for i, doc in enumerate(docs):
    for word in doc.split(): #(2)
        if word not in index:
            index[word] = {i} #(3)
        else:
            index[word].add(i)
index['cat'].intersection(index['table']) #(4)
- Building an index using sets
- We iterate over each term in the document
- We build a set containing the indices where the term appears
- Querying the documents containing both "cat" and "table"
heaps
Heaps are data structures designed to quickly find and extract the maximum (or minimum) value in a collection.
Application of heaps in practice: processing incoming tasks in order of maximum priority. Insertion or extraction of the maximum value takes \(O(\log(N))\) time.
import heapq
collection = [10, 3, 3, 4, 5, 6]
heapq.heapify(collection)
heapq.heappop(collection) #(1)
- extract the minimum value, returns: 3
Another way of performing insertions and extractions is the PriorityQueue class:
from queue import PriorityQueue
queue = PriorityQueue()
for element in collection:
    queue.put(element)
queue.get() #(1)
- returns: 3
If the maximum element is required, we can simply multiply each element of the list by -1: negating the values inverts the ordering, so the minimum of the negated heap corresponds to the maximum of the original collection.
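A minimal sketch of this negation trick (reusing the collection from above):
import heapq
collection = [10, 3, 3, 4, 5, 6]
max_heap = [-x for x in collection]
heapq.heapify(max_heap)
-heapq.heappop(max_heap) #(1)
- extract the maximum value, returns: 10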
If we need to associate an object (for example, a task) with each number (and keep the priority order), we can insert tuples of the form (number, object):
queue = PriorityQueue()
queue.put((3, "priority 3"))
queue.put((2, "priority 2"))
queue.put((1, "priority 1"))
queue.get() #(1)
- returns: (1, "priority 1")
tries
Tries are extremely fast at matching a list of strings against a prefix. This is especially useful when implementing features such as search as you type and autocompletion, where the list of available completions is very large and short response times are required.
from patricia import trie
from random import choice
from string import ascii_uppercase

def random_string(length):
    """Produce a random string made of *length* uppercase ascii characters"""
    return ''.join(choice(ascii_uppercase) for i in range(length))

strings = [random_string(32) for i in range(10000)]
strings_dict = {s: 0 for s in strings} #(1)
strings_trie = trie(**strings_dict)
matches = list(strings_trie.iter('AA'))
- a dictionary where all values are 0
Algorithmic optimization
Using caching and memoization to improve efficiency
The idea behind caching is to store expensive results in a temporary location (cache) that can be located in memory, on disk, or in a remote location.
The cache functions are located in the functools Python module. See the documentation.
Example of the lru_cache() function:
from functools import lru_cache
@lru_cache()
def sum2(a, b):
print("Calculating {} + {}".format(a, b))
return a + b
print(sum2(1, 2)) #(1)
print(sum2(1, 2)) #(2)
- First call: the function body runs. Output: Calculating 1 + 2, then 3
- Second call: the function is not executed again, the result was cached. Output: 3
We can restrict the maximum size of the cache with @lru_cache(maxsize=16). In this case, new values will replace old ones based on a least recently used (LRU) strategy.
Another third-party module that implements caching is Joblib. It works much like lru_cache(), but the results are stored on disk in the directory specified by the cachedir argument (called location in recent versions of Joblib):
from joblib import Memory
memory = Memory(cachedir='/path/to/cachedir')  # recent Joblib versions use location= instead
@memory.cache
def sum2(a, b):
    return a + b
Using iteration with comprehensions and generators
To speed up looping we can use comprehensions and generators. Time comparison:
def loop():
    res = []
    for i in range(100000):
        res.append(i * i)
    return sum(res)

def dict_loop():
    res = {}
    for i in range(100000):
        res[i] = i
    return res

def comprehension():
    return sum([i * i for i in range(100000)])

def generator():
    return sum(i * i for i in range(100000))

def dict_comprehension():
    return {i: i for i in range(100000)}
%timeit loop()
100 loops, best of 3: 16.1 ms per loop
%timeit comprehension()
100 loops, best of 3: 10.1 ms per loop
%timeit generator()
100 loops, best of 3: 12.4 ms per loop
%timeit dict_loop()
100 loops, best of 3: 13.2 ms per loop
%timeit dict_comprehension()
100 loops, best of 3: 12.8 ms per loop
Example of efficient looping using the filter and map functions in combination with generators:
def map_comprehension(numbers):
    a = [n * 2 for n in numbers]
    b = [n ** 2 for n in a]
    c = [n ** 0.33 for n in b]
    return max(c)
The problem here is that every list comprehension allocates a new list, which increases memory usage. We can solve this by using generators (map returns a lazy iterator):
def map_normal(numbers):
    a = map(lambda n: n * 2, numbers)
    b = map(lambda n: n ** 2, a)
    c = map(lambda n: n ** 0.33, b)
    return max(c)
%load_ext memory_profiler
numbers = range(1000000)
%memit map_comprehension(numbers)
peak memory: 166.33 MiB, increment: 102.54 MiB
%memit map_normal(numbers)
peak memory: 71.04 MiB, increment: 0.00 MiB
More about generators
Using Numpy for high-speed calculations on big arrays
NumPy allows you to manipulate multidimensional numerical data and perform highly optimized mathematical computations.
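A minimal sketch of a vectorized computation (the array size is arbitrary):
import numpy as np

a = np.arange(1_000_000)
b = a * 2 + 1   # elementwise arithmetic, no explicit Python loop
b.sum()         # reduction implemented in optimized compiled code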
Using Pandas with database-style data
Pandas allows you to work with labeled, categorical data (conceptually similar to dictionaries in Python).
Create pd.Series
object:
import pandas as pd
patients = [0, 1, 2, 3]
effective = [True, True, False, False]
effective_series = pd.Series(effective, index=patients)
In pandas, unlike NumPy, the keys of the index are not limited to integers:
patients = ["a", "b", "c", "d"]
effective = [True, True, False, False]
effective_series = pd.Series(effective, index=patients)
We can create a dataframe to hold key-value data:
patients = ["a", "b", "c", "d"]
columns = {
"systolic_blood_pressure_initial": [120, 126, 130, 115],
"diastolic_blood_pressure_initial": [75, 85, 90, 87],
"systolic_blood_pressure_final": [115, 123, 130, 118],
"diastolic_blood_pressure_final": [70, 82, 92, 87]
}
df = pd.DataFrame(columns, index=patients)
Indexing Series and DataFrame objects (see the sketch below):
- Using the key (label) of the element, via the loc attribute
- Using the position of the element, via the iloc attribute
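A minimal sketch using the effective_series and df objects defined above:
effective_series.loc["a"]  #(1)
effective_series.iloc[0]   #(2)
df.loc["b"]                #(3)
df.iloc[1]                 #(4)
- access by key (label), returns: True
- access by position, returns: True
- row of the DataFrame with label "b"
- row of the DataFrame at position 1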
pandas also supports database-style operations such as join, as well as filtering and the extraction of specific rows.
Using xarray
xarray combines the best of NumPy and Pandas: working with labeled multidimensional data. For details, see the xarray documentation
Example:
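A minimal sketch (the dimension names and values are invented for illustration):
import numpy as np
import xarray as xr

# a labeled 2D array: one value per patient and per time step
data = xr.DataArray(
    np.random.rand(4, 3),
    dims=("patient", "time"),
    coords={"patient": ["a", "b", "c", "d"], "time": [0, 1, 2]},
)
data.sel(patient="a")   # select by label along a named dimension
data.mean(dim="time")   # reduce along a named dimension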
Using Cython to increase performance
See the documentation for a detailed overview of its functionality.