PyPy3 + pickle + lzma = slowness. Solution.

Initial data:

        with lzma.open(file_name, 'rb') as file:
            data = pickle.load(file)
  • 40 MB file
  • 90,000 objects inside the file
  • File created by CPython-optimized code

Problem:

This code takes only 6.43 s under CPython, but more than 5 minutes under PyPy3 (in fact, I did not wait for it to finish: I killed it after 5 minutes of running)!

Solution:

        with lzma.open(file_name, 'rb') as file:
            raw_data = file.read()
            data = pickle.loads(raw_data)

This code takes 6.35 s under CPython and 39 s under PyPy3 (of which 36 s is spent in pickle.loads()).
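To reproduce the comparison on your own data, a minimal sketch along these lines may help. The names load_streaming, load_buffered and timed are hypothetical helpers introduced only for illustration, and the timings you get will depend on your interpreter versions and on the file contents.

```python
import lzma
import pickle
import time


def load_streaming(file_name):
    """Unpickle directly from the LZMA file object (the slow variant under PyPy3)."""
    with lzma.open(file_name, 'rb') as file:
        return pickle.load(file)


def load_buffered(file_name):
    """Decompress the whole file into memory first, then unpickle the bytes."""
    with lzma.open(file_name, 'rb') as file:
        raw_data = file.read()
    return pickle.loads(raw_data)


def timed(func, *args):
    """Call func(*args) once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

Calling timed(load_streaming, file_name) and timed(load_buffered, file_name) under each interpreter gives numbers comparable to the ones above. Note that the buffered variant holds the whole decompressed pickle in memory at once.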

Recreating the file under PyPy3 does not help either:

  • it is a very slow operation (82 s for pickle.dumps() and 74 s for file.write());
  • it leads to slower reading under PyPy3 (54 s for pickle.loads() instead of 36 s).
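For reference, the write side measured above (pickle.dumps() followed by file.write()) can be sketched as follows. The function name save_compressed is hypothetical, and the default pickle protocol is an assumption; the original file may have been written with different settings.

```python
import lzma
import pickle


def save_compressed(data, file_name):
    """Pickle data in memory, then write the bytes through LZMA in one call."""
    raw_data = pickle.dumps(data)  # default protocol: an assumption
    with lzma.open(file_name, 'wb') as file:
        file.write(raw_data)
```

Running this step under CPython and only the reading step under PyPy3 matches the setup that gave the best timings here.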

Conclusions:

Prepare pickled data under CPython, and under PyPy3 read the whole decompressed file into memory before calling pickle.loads().