Initial data:
import lzma
import pickle

with lzma.open(file_name, 'rb') as file:
    data = pickle.load(file)
- 40 MB file
- ~90,000 objects inside the file
- file created by code running under CPython
Problem:
This code needs only 6.43s when launched under CPython, but it needs more than 5 min to complete under PyPy3 (in fact, I did not even wait for the end: I just shut it down after 5 minutes of running)!
Solution:
with lzma.open(file_name, 'rb') as file:
    raw_data = file.read()
data = pickle.loads(raw_data)
This code needs 6.35s when launched under CPython and 39s under PyPy (of which 36s is spent in pickle.loads()).
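The two loading strategies can be compared side by side with a small self-contained benchmark; the file name and sample objects below are hypothetical stand-ins for the real 40 MB dataset:

```python
import lzma
import pickle
import time

# Build a small compressed pickle so both strategies can be timed.
file_name = "demo.pkl.xz"
objects = [{"id": i, "payload": "x" * 20} for i in range(1000)]
with lzma.open(file_name, "wb") as f:
    pickle.dump(objects, f)

# Strategy 1: pickle pulls from the LZMA stream in many small reads.
t0 = time.perf_counter()
with lzma.open(file_name, "rb") as f:
    data_stream = pickle.load(f)
t1 = time.perf_counter()

# Strategy 2: decompress everything first, then unpickle from bytes.
t2 = time.perf_counter()
with lzma.open(file_name, "rb") as f:
    raw_data = f.read()
data_bytes = pickle.loads(raw_data)
t3 = time.perf_counter()

print(f"pickle.load from stream: {t1 - t0:.4f}s")
print(f"read + pickle.loads:     {t3 - t2:.4f}s")
assert data_stream == data_bytes
```

On a toy input the difference is negligible; the gap only becomes dramatic under PyPy with a large file, where the many small reads through the decompressor dominate.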
Also, re-creating the file under PyPy is not a viable workaround:
- it is a very slow operation (82s for pickle.dumps() and 74s for file.write());
- it leads to noticeably slower reading (54s for pickle.loads()) under PyPy.
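For reference, the re-creation path measured above corresponds to something like the following sketch (file name and sample data are hypothetical):

```python
import lzma
import pickle

# Stand-in for the real ~90,000 objects.
data = [{"id": i} for i in range(1000)]

# Serialize in memory first (the 82s step under PyPy), then
# compress-and-write in one call (the 74s step).
blob = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
with lzma.open("recreated.pkl.xz", "wb") as f:
    f.write(blob)

# Reading the re-created file back with the read-then-loads pattern.
with lzma.open("recreated.pkl.xz", "rb") as f:
    restored = pickle.loads(f.read())
assert restored == data
```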
Conclusions:
Prepare pickled data under CPython; when reading it under PyPy, read the whole compressed file into memory first and use pickle.loads() instead of streaming pickle.load().