Notable Python Libraries - Examples#
ndarray vs list#
Let’s evaluate the performance difference when working with ndarray and list objects.
Filling ndarray#
Due to the memory management involved, it is better to create the ndarray once all its elements are defined, rather than filling it one event at a time.
We will use the time library to measure both the elapsed wall-clock time and the CPU time of each operation.
First, import the required libraries:
import numpy as np
import time
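As a brief aside (not part of the original timings), the snippet below illustrates the difference between the two clocks used on this page: time.time() returns wall-clock time, while time.process_time() counts only the CPU time of the current process and does not advance while the process sleeps.
# Sketch: wall-clock vs CPU time.
t0, c0 = time.time(), time.process_time()
time.sleep(0.5)       # the wall clock advances, the CPU clock (almost) does not
t1, c1 = time.time(), time.process_time()
print(f'wall: {1000*(t1-t0):.1f}ms, CPU: {1000*(c1-c0):.1f}ms')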
Then calculate the timing of filling the ndarray event-by-event:
start_time, start_time_cpu = time.time(), time.process_time()
X = np.array([])
for i in range(10000):
    X = np.append(X, i)   # grow the array one event at a time (each call copies the whole array)
end_time, end_time_cpu = time.time(), time.process_time()
print(f'It took {1000*(end_time-start_time):.3f}ms (CPU: {1000*(end_time_cpu-start_time_cpu):.3f}ms)')
It took 3.891ms (CPU: 7.779ms)
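If the number of entries is known in advance, a common alternative to growing the array is to preallocate it and assign by index; the snippet below is a brief sketch of that pattern (not part of the timed comparison).
# Sketch: preallocate the full array once, then fill it in place.
X = np.zeros(10000)
for i in range(10000):
    X[i] = i          # in-place assignment, no reallocation or copies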
Now fill the ndarray at once:
start_time, start_time_cpu = time.time(), time.process_time()
x = list(range(10000))
X = np.array(x)
end_time, end_time_cpu = time.time(), time.process_time()
print(f'It took {1000*(end_time-start_time):.3f}ms (CPU: {1000*(end_time_cpu-start_time_cpu):.3f}ms)')
It took 1.252ms (CPU: 2.501ms)
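For this particular sequence there is an even more direct construction, sketched below: np.arange builds the array without going through a Python list at all.
# Sketch: build the same array of consecutive integers directly in NumPy.
X = np.arange(10000)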
Operations with arrays#
Numeric operations with NumPy arrays are faster (and easier to write) than those involving Python lists. Let’s compare the timing and syntax of a term-by-term product of two arrays.
x, y = list(range(10000)), list(range(10000))
X, Y = np.array(x), np.array(y)
start_time, start_time_cpu = time.time(), time.process_time()
z = [ x[i]*y[i] for i in range(len(x)) ]   # element-by-element product with Python lists
end_time, end_time_cpu = time.time(), time.process_time()
print(f'lists multiplication: {1000*(end_time-start_time):.3f}ms (CPU: {1000*(end_time_cpu-start_time_cpu):.3f}ms)')
start_time, start_time_cpu = time.time(), time.process_time()
Z = X*Y   # vectorized element-wise product with NumPy
end_time, end_time_cpu = time.time(), time.process_time()
print(f'ndarray multiplication: {1000*(end_time-start_time):.3f}ms (CPU: {1000*(end_time_cpu-start_time_cpu):.3f}ms)')
lists multiplication: 1.128ms (CPU: 1.128ms)
ndarray multiplication: 0.252ms (CPU: 0.292ms)
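The same vectorized syntax carries over to other element-wise operations; the lines below are a short illustrative sketch (not timed above) using standard NumPy operations.
# Sketch: other element-wise operations use the same vectorized syntax.
S = X + Y          # element-wise sum
P = X**2           # element-wise square
R = np.sqrt(X)     # element-wise square root (NumPy ufunc)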
Already with arrays of 10k events, NumPy is about 4-5x faster than the plain Python list comprehension. Let’s see the performance with arrays of 1M events.
x, y = list(range(1000000)), list(range(1000000))
X, Y = np.array(x), np.array(y)
start_time, start_time_cpu = time.time(), time.process_time()
z = [ x[i]*y[i] for i in range(len(x)) ]
end_time, end_time_cpu = time.time(), time.process_time()
print(f'lists multiplication: {1000*(end_time-start_time):.3f}ms (CPU: {1000*(end_time_cpu-start_time_cpu):.3f}ms)')
start_time, start_time_cpu = time.time(), time.process_time()
Z = X*Y
end_time, end_time_cpu = time.time(), time.process_time()
print(f'ndarray multiplication: {1000*(end_time-start_time):.3f}ms (CPU: {1000*(end_time_cpu-start_time_cpu):.3f}ms)')
lists multiplication: 105.891ms (CPU: 105.538ms)
ndarray multiplication: 2.216ms (CPU: 2.445ms)
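For more stable numbers than a single pass with time, the standard-library timeit module repeats each measurement; the snippet below is a sketch of how the same comparison could be set up (the repetition count of 10 is an arbitrary choice).
import timeit

# Sketch: repeat each measurement with timeit for more stable timings.
setup = ('import numpy as np; '
         'x = list(range(1000000)); y = list(range(1000000)); '
         'X = np.array(x); Y = np.array(y)')
t_list  = timeit.timeit('[x[i]*y[i] for i in range(len(x))]', setup=setup, number=10)
t_numpy = timeit.timeit('X*Y', setup=setup, number=10)
print(f'lists: {100*t_list:.3f}ms per run, ndarray: {100*t_numpy:.3f}ms per run')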