Notable Python Libraries - Examples#

ndarray vs list#

Let’s evaluate the performance difference when working with ndarray and list objects.

Filling ndarray#

Because every per-element assignment goes through the Python interpreter, it is better to create the ndarray from a complete sequence of elements rather than filling it one element at a time.

We will use the time library to measure the elapsed wall-clock and CPU time of each operation.

First, import the libraries:

import numpy as np
import time
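time.time() returns wall-clock time, while time.process_time() returns the CPU time consumed by the current process; a quick sketch of the difference, using a sleep (which costs wall-clock time but almost no CPU time):

```python
import time

# Sleeping advances the wall clock but uses essentially no CPU,
# so the two measurements diverge.
start, start_cpu = time.time(), time.process_time()
time.sleep(0.1)                       # the process is idle here
wall = time.time() - start
cpu = time.process_time() - start_cpu
print(f'wall: {1000*wall:.1f}ms, CPU: {1000*cpu:.1f}ms')
```

In the cells below, CPU time can exceed wall-clock time because process_time sums the CPU time of all threads in the process.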

Then time filling the ndarray element by element:

start_time, start_time_cpu = time.time(), time.process_time()
X = np.empty(10000)            # pre-allocate space for 10000 elements
for i in range(10000):
    X[i] = i                   # one interpreted assignment per element
end_time, end_time_cpu = time.time(), time.process_time()
print(f'It took {1000*(end_time-start_time):.3f}ms (CPU: {1000*(end_time_cpu-start_time_cpu):.3f}ms)')
It took 3.891ms (CPU: 7.779ms)

Now create the ndarray all at once from a complete list:

start_time, start_time_cpu = time.time(), time.process_time()
x = list(range(10000))
X = np.array(x)
end_time, end_time_cpu = time.time(), time.process_time()
print(f'It took {1000*(end_time-start_time):.3f}ms (CPU: {1000*(end_time_cpu-start_time_cpu):.3f}ms)')
It took 1.252ms (CPU: 2.501ms)
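For a simple integer sequence like this one, NumPy can also build the array directly, skipping the intermediate Python list entirely; a minimal sketch:

```python
import numpy as np

# np.arange allocates and fills the array in compiled code,
# with no intermediate Python list.
X = np.arange(10000)
print(X[:5], X.shape)
```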

Operations with arrays#

Numeric operations on NumPy arrays are faster (and easier to write) than the equivalent operations on Python lists. Let’s compare the timing and syntax of an element-wise product of two arrays.

x, y = list(range(10000)), list(range(10000))
X, Y = np.array(x), np.array(y)

start_time, start_time_cpu = time.time(), time.process_time()
z = [ x[i]*y[i] for i in range(len(x)) ]
end_time, end_time_cpu = time.time(), time.process_time()
print(f'lists multiplication: {1000*(end_time-start_time):.3f}ms (CPU: {1000*(end_time_cpu-start_time_cpu):.3f}ms)')

start_time, start_time_cpu = time.time(), time.process_time()
Z = X*Y
end_time, end_time_cpu = time.time(), time.process_time()
print(f'ndarray multiplication: {1000*(end_time-start_time):.3f}ms (CPU: {1000*(end_time_cpu-start_time_cpu):.3f}ms)')
lists multiplication: 1.128ms (CPU: 1.128ms)
ndarray multiplication: 0.252ms (CPU: 0.292ms)

Already with 10k-element arrays, NumPy is several times faster than built-in Python. Let’s see the performance with 1M-element arrays.

x, y = list(range(1000000)), list(range(1000000))
X, Y = np.array(x), np.array(y)

start_time, start_time_cpu = time.time(), time.process_time()
z = [ x[i]*y[i] for i in range(len(x)) ]
end_time, end_time_cpu = time.time(), time.process_time()
print(f'lists multiplication: {1000*(end_time-start_time):.3f}ms (CPU: {1000*(end_time_cpu-start_time_cpu):.3f}ms)')

start_time, start_time_cpu = time.time(), time.process_time()
Z = X*Y
end_time, end_time_cpu = time.time(), time.process_time()
print(f'ndarray multiplication: {1000*(end_time-start_time):.3f}ms (CPU: {1000*(end_time_cpu-start_time_cpu):.3f}ms)')
lists multiplication: 105.891ms (CPU: 105.538ms)
ndarray multiplication: 2.216ms (CPU: 2.445ms)
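The `X*Y` syntax above is not specific to multiplication: the same vectorized, element-wise style extends to the other arithmetic operators, including operations that mix an array with a scalar (broadcasting). A small sketch:

```python
import numpy as np

X = np.arange(5)    # [0 1 2 3 4]
Y = np.arange(5)

print(X + Y)        # element-wise sum     -> [0 2 4 6 8]
print(X ** 2)       # element-wise power   -> [0 1 4 9 16]
print(2 * X + 1)    # scalar broadcasting  -> [1 3 5 7 9]
```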