
Reading Multiple Objects From a Pickle File Python

If you want to serialize and deserialize Python objects, you might have considered using the Python pickle module.

The Python pickle module lets you serialize and deserialize a Python object structure. Pickle provides two functions to write/read to/from file objects (dump() and load()). It also provides two functions to write/read to/from bytes objects (dumps() and loads()).

We will go through a few examples to show how pickle works with both file objects and bytes objects. We will also test it with multiple data types.

It's time to pickle!

Python Pickle Example

The Python Pickle module is used to perform serialization and deserialization of Python objects.

Serializing a Python object means converting it into a byte stream that can be stored in a file or in a string. Pickled data can then be read back using a process called deserialization.

To store a pickled object in a string, use the dumps() function. To read an object from a string that contains its pickled representation, use the loads() function. In Python 3 this "string" is a bytes object.

Let's see an example of how you can use the pickle module to serialize a Python list.

    >>> import pickle
    >>> animals = ['tiger', 'lion', 'giraffe']
    >>> pickle.dumps(animals)
    b'\x80\x04\x95\x1e\x00\x00\x00\x00\x00\x00\x00]\x94(\x8c\x05tiger\x94\x8c\x04lion\x94\x8c\x07giraffe\x94e.'

After importing the pickle module, we define a list and then use the pickle dumps() function to generate a bytes representation of our list.

Now we store the pickled bytes in a variable and use the loads() function to convert them back to our original list.

    >>> pickled_animals = pickle.dumps(animals)
    >>> unpickled_animals = pickle.loads(pickled_animals)
    >>> print(unpickled_animals)
    ['tiger', 'lion', 'giraffe']

The letter s at the end of the dumps() and loads() pickle functions stands for string. The pickle module also provides two functions that use files to store and read pickled data: dump() and load().
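To make the relationship between the two pairs of functions concrete, here is a minimal sketch that uses an in-memory io.BytesIO buffer as a stand-in for a real file (an assumption made only so the example doesn't touch the filesystem). It shows that dump() writes the same bytes that dumps() returns, and that load() reads them back.

```python
import io
import pickle

animals = ['tiger', 'lion', 'giraffe']

# dumps()/loads() work with bytes objects in memory
as_bytes = pickle.dumps(animals)
assert pickle.loads(as_bytes) == animals

# dump()/load() work with any binary file-like object; BytesIO stands in for a file here
buffer = io.BytesIO()
pickle.dump(animals, buffer)

# With the same (default) protocol, dump() writes the same bytes that dumps() returns
same_bytes = buffer.getvalue() == as_bytes

# Rewind the buffer before reading, like reopening a file for reading
buffer.seek(0)
restored = pickle.load(buffer)
print(same_bytes, restored)
```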

Save a Python Dictionary Using Pickle

With the pickle module you can save different types of Python objects.

Let's use the dumps() function to pickle a Python dictionary.

    >>> animals = {'tiger': 23, 'lion': 45, 'giraffe': 67}
    >>> pickled_animals = pickle.dumps(animals)
    >>> print(pickled_animals)
    b'\x80\x04\x95$\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x05tiger\x94K\x17\x8c\x04lion\x94K-\x8c\x07giraffe\x94KCu.'

Then use the loads() function to get the dictionary back from its pickled representation.

    >>> new_animals = pickle.loads(pickled_animals)
    >>> print(new_animals)
    {'tiger': 23, 'lion': 45, 'giraffe': 67}

So, this confirms that we can also save dictionary objects in a string of bytes using pickle.

Write Pickled Python Dictionary to a File

The pickle module also lets you store the pickled representation of a Python object in a file.

To store a pickled object in a file, use the dump() function. To read an object from its pickled representation stored in a file, use the load() function.

First, we open a file in binary mode using the Python open() function, store the pickled dictionary in the file and close the file.

    >>> import pickle
    >>> animals = {'tiger': 23, 'lion': 45, 'giraffe': 67}
    >>> f = open('data.pickle', 'wb')
    >>> pickle.dump(animals, f)
    >>> f.close()

The data.pickle file will be created in the current working directory, which is usually the directory you run your Python program from.

Note: remember to close the file when you are done with it.

If you look at the content of the data.pickle file with a text editor, you will see data in binary format.

    €•$       }"(Œtiger"KŒlion"K-Œgiraffe"KCu.

Now read the bytes from the file and get back the original dictionary object using the load() function.

    >>> f = open('data.pickle', 'rb')
    >>> unpickled_animals = pickle.load(f)
    >>> f.close()
    >>> print(unpickled_animals)
    {'tiger': 23, 'lion': 45, 'giraffe': 67}

This time we have opened the file in read binary mode because we only want to read its content.

In the next section we will see if the pickle module can also serialize nested objects.

Pickle a Nested Dictionary Object

Let's find out if a nested Python dictionary can be serialized and deserialized using the pickle module.

Update the dictionary used in the previous section to include dictionaries as values mapped to each key.

          >>> animals = {'tiger': {'count': 23}, 'lion': {'count': 45}, 'giraffe': {'count': 67}}                  

Write the pickled nested dictionary to a file. The code is identical to the one we have seen before to pickle a basic dictionary.

    >>> f = open('data.pickle', 'wb')
    >>> pickle.dump(animals, f)
    >>> f.close()

No errors so far…

Now, convert the pickled data back to the nested dictionary:

    >>> f = open('data.pickle', 'rb')
    >>> unpickled_animals = pickle.load(f)
    >>> f.close()
    >>> print(unpickled_animals)
    {'tiger': {'count': 23}, 'lion': {'count': 45}, 'giraffe': {'count': 67}}

The nested dictionary looks good.
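A quick way to check a round trip like this is to compare the result with the original. The minimal sketch below (variable names are illustrative) shows that the unpickled dictionary is equal to the original, inner dictionaries included, but is a brand-new object, so changing it does not affect the original.

```python
import pickle

animals = {'tiger': {'count': 23}, 'lion': {'count': 45}, 'giraffe': {'count': 67}}

unpickled_animals = pickle.loads(pickle.dumps(animals))

# Same content: == compares the nested dictionaries recursively
print(unpickled_animals == animals)

# Different objects: both the outer and the inner dictionaries are new copies
print(unpickled_animals is animals, unpickled_animals['tiger'] is animals['tiger'])

# Modifying the copy leaves the original untouched
unpickled_animals['tiger']['count'] = 0
print(animals['tiger']['count'])
```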

Using Pickle With a Custom Class

I want to find out if I can pickle an instance of a custom Python class…

Let's create a class called Animal that contains 2 attributes.

    class Animal:
        def __init__(self, name, group):
            self.name = name
            self.group = group

Then create one object and pickle it into a file.

    tiger = Animal('tiger', 'mammals')
    f = open('data.pickle', 'wb')
    pickle.dump(tiger, f)
    f.close()

And finally, read the data using the pickle load() function.

    f = open('data.pickle', 'rb')
    data = pickle.load(f)
    print(data)
    f.close()

This is the content of the data object:

    <__main__.Animal object at 0x0353BF58>

And here are the attributes of our object… as you can see, they are correct.

    >>> print(data.__dict__)
    {'name': 'tiger', 'group': 'mammals'}

You can customise this output by adding the __str__ method to the class.
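For example, a minimal sketch of the Animal class with a __str__ method could look like this (the message format is just an illustration). Since pickle only stores the instance attributes, the method comes from the class again when the object is unpickled.

```python
import pickle


class Animal:
    def __init__(self, name, group):
        self.name = name
        self.group = group

    def __str__(self):
        # Used by print() and str() instead of the default <__main__.Animal object at ...>
        return "Animal(name={}, group={})".format(self.name, self.group)


tiger = Animal('tiger', 'mammals')
data = pickle.loads(pickle.dumps(tiger))
print(data)  # Animal(name=tiger, group=mammals)
```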

Save Multiple Objects with Pickle

Using the same class defined in the previous section, we will save two objects in a file using the pickle module.

Create two objects of type Animal and pickle them into a file as a list of objects:

    tiger = Animal('tiger', 'mammals')
    crocodile = Animal('crocodile', 'reptiles')
    f = open('data.pickle', 'wb')
    pickle.dump([tiger, crocodile], f)
    f.close()

You can access each object using a for loop.

    f = open('data.pickle', 'rb')
    data = pickle.load(f)
    f.close()

    for animal in data:
        print(animal.__dict__)

    [output]
    {'name': 'tiger', 'group': 'mammals'}
    {'name': 'crocodile', 'group': 'reptiles'}
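Storing a list is not the only option. Each call to pickle.dump() appends one complete pickle to the file, and each call to pickle.load() reads exactly one back, so you can also write objects one at a time and keep reading until the end of the file, where load() raises EOFError. The sketch below assumes the same two-attribute Animal class and a file name chosen for illustration (animals.pickle), and uses the with statement covered in the next section.

```python
import pickle


class Animal:
    def __init__(self, name, group):
        self.name = name
        self.group = group


animals = [Animal('tiger', 'mammals'), Animal('crocodile', 'reptiles')]

# Write each object as a separate pickle, one after the other, in the same file
with open('animals.pickle', 'wb') as f:
    for animal in animals:
        pickle.dump(animal, f)

# Read the pickles back one by one until the end of the file
loaded = []
with open('animals.pickle', 'rb') as f:
    while True:
        try:
            loaded.append(pickle.load(f))
        except EOFError:
            # load() raises EOFError when there are no more pickles to read
            break

for animal in loaded:
    print(animal.__dict__)
```

This approach is handy when objects are produced over time (for example, appended to a log file), since you don't need to hold the whole list in memory before writing it.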

Pickle and Python With Statement

So far we had to remember to close the file object every time we finished working with it.

Instead of doing that, we can use the with open statement, which takes care of closing the file automatically.

Here is how our code to write multiple objects becomes:

    tiger = Animal('tiger', 'mammals')
    crocodile = Animal('crocodile', 'reptiles')

    with open('data.pickle', 'wb') as f:
        pickle.dump([tiger, crocodile], f)

And now use the with open statement to read the pickled data as well…

    with open('data.pickle', 'rb') as f:
        data = pickle.load(f)

    print(data)

    [output]
    [<__main__.Animal object at 0x7f98a015d2b0>, <__main__.Animal object at 0x7f98a01a4fd0>]

Nice, it's a lot more concise.

No more f.close() every time we read or write a file.

Using Python Pickle with Lambdas

So far we have used the pickle module with variables, but what happens if we use it with a function?

Define a simple lambda function that returns the sum of two numbers:

    >>> import pickle
    >>> pickle.dumps(lambda x,y : x+y)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    _pickle.PicklingError: Can't pickle <function <lambda> at 0x7fbc60296c10>: attribute lookup <lambda> on __main__ failed

The pickle module doesn't allow you to serialize a lambda function. Pickle saves a function by reference, storing its module and name so that it can look the function up again when unpickling, and a lambda has no name that can be looked up this way.

As an alternative, we can use the dill module, which extends the functionality of the pickle module.

You might get the following error when you try to import the dill module…

    >>> import dill
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ModuleNotFoundError: No module named 'dill'

In that case you have to install the dill module using pip:

    $ pip install dill
    Collecting dill
      Downloading dill-0.3.3-py2.py3-none-any.whl (81 kB)
         |████████████████████████████████| 81 kB 4.4 MB/s
    Installing collected packages: dill
    Successfully installed dill-0.3.3
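The difference between a named function and a lambda can be seen in a minimal sketch like the one below. It uses operator.add from the standard library as an example of a named function (any function defined with def at module level behaves the same way). It only illustrates why the lambda fails; it is not a workaround.

```python
import operator
import pickle

# A named function is pickled as a reference to its module and name
pickled_add = pickle.dumps(operator.add)
# Unpickling looks the name up again and returns the very same function object
print(pickle.loads(pickled_add) is operator.add)

# A lambda has no name that can be looked up, so pickling it fails
try:
    pickle.dumps(lambda x, y: x + y)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError) as error:
    lambda_pickled = False
    print(type(error).__name__, error)
```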

The dill module provides the dumps and loads functions in the same manner the pickle module does.

Let's first create a bytes object from the lambda using the dumps function:

    >>> import dill
    >>> pickled_lambda = dill.dumps(lambda x,y : x+y)
    >>> print(pickled_lambda)
    b'\x80\x04\x95\x9e\x00\x00\x00\x00\x00\x00\x00\x8c\ndill._dill\x94\x8c\x10_create_function\x94\x93\x94(h\x00\x8c\x0c_create_code\x94\x93\x94(K\x02K\x00K\x00K\x02K\x02KCC\x08|\x00|\x01\x17\x00S\x00\x94N\x85\x94)\x8c\x01x\x94\x8c\x01y\x94\x86\x94\x8c\x07<stdin>\x94\x8c\x08<lambda>\x94K\x01C\x00\x94))t\x94R\x94c__builtin__\n__main__\nh\x0bNN}\x94Nt\x94R\x94.'

Then unpickle the data using the loads function:

    >>> print(dill.loads(pickled_lambda))
    <function <lambda> at 0x7f9558408280>
    >>> unpickled_lambda = dill.loads(pickled_lambda)
    >>> unpickled_lambda(1,3)
    4

It works!

The lambda function returns the result we expect.

Error When Pickling a Class with a Lambda Attribute

Let's get back to the custom class we defined before…

We have already seen how to serialize and deserialize it. Now let's add a new attribute and set its value to a lambda function.

    class Animal:
        def __init__(self, name, group):
            self.name = name
            self.group = group
            self.description = lambda: print("The {} belongs to {}".format(self.name, self.group))

Note: this lambda attribute doesn't take any input arguments. It just prints a string based on the values of the other two instance attributes.

Firstly, confirm that the class works fine:

    tiger = Animal('tiger', 'mammals')
    tiger.description()
    crocodile = Animal('crocodile', 'reptiles')
    crocodile.description()

And here you can see the output of the lambda function:

    $ python3 exclude_class_attribute.py
    The tiger belongs to mammals
    The crocodile belongs to reptiles

You know that the pickle module cannot serialize a lambda function. And here is what happens when we serialize our two objects created from the custom class.

    Traceback (most recent call last):
      File "multiple_objects.py", line 16, in <module>
        pickle.dump([tiger, crocodile], f)
    AttributeError: Can't pickle local object 'Animal.__init__.<locals>.<lambda>'

This is caused by the lambda attribute inside our two objects.
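If the description doesn't need to be a per-instance lambda, one alternative to the approach in the next sections is to turn it into a regular method, as in the sketch below. Methods live on the class, not in the instance's __dict__, so pickle only has to store name and group. Here description() returns the string instead of printing it, a small change made so the result is easy to check.

```python
import pickle


class Animal:
    def __init__(self, name, group):
        self.name = name
        self.group = group

    def description(self):
        # Defined on the class, so it is not part of the pickled instance state
        return "The {} belongs to {}".format(self.name, self.group)


tiger = Animal('tiger', 'mammals')
crocodile = Animal('crocodile', 'reptiles')

# No lambda in the instance __dict__, so this pickles without errors
unpickled = pickle.loads(pickle.dumps([tiger, crocodile]))

for animal in unpickled:
    print(animal.description())
```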

Exclude a Python Class Attribute from Pickling

Is there a way to exclude the lambda attribute from the serialization process of our custom object?

Yes, to do that we can use the __getstate__() method of the class.

Python Pickle __getstate__

To understand what the __getstate__ method does, let's start by looking at the content of __dict__ for one of our class instances.

    tiger = Animal('tiger', 'mammals')
    print(tiger.__dict__)

    [output]
    {'name': 'tiger', 'group': 'mammals', 'description': <function Animal.__init__.<locals>.<lambda> at 0x7fbc9028ca60>}

To be able to serialize this object using pickle we want to exclude the lambda attribute from the serialization process.

To avoid serializing the lambda attribute, in __getstate__() we first copy the state of our object from self.__dict__ and then remove the attribute that cannot be pickled.

    class Animal:
        def __init__(self, name, group):
            self.name = name
            self.group = group
            self.description = lambda: print("The {} is a {}".format(self.name, self.group))

        def __getstate__(self):
            state = self.__dict__.copy()
            del state['description']
            return state

Note: we are using the dict.copy() method to make sure we don't modify the original state of the object.
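To see why the copy matters, the sketch below (a trimmed-down version of the class above whose lambda returns the string instead of printing it) pickles a tiger and then checks that the original object still has its description attribute, while the unpickled one does not. If __getstate__() deleted the key from self.__dict__ directly, pickling would silently remove the attribute from the live object.

```python
import pickle


class Animal:
    def __init__(self, name, group):
        self.name = name
        self.group = group
        self.description = lambda: "The {} is a {}".format(self.name, self.group)

    def __getstate__(self):
        # Work on a copy so that pickling doesn't change the live object
        state = self.__dict__.copy()
        del state['description']
        return state


tiger = Animal('tiger', 'mammals')
unpickled_tiger = pickle.loads(pickle.dumps(tiger))

print(tiger.description())                        # the original object still works
print('description' in unpickled_tiger.__dict__)  # the lambda was not pickled
```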

Let's see if we can pickle this object now…

    tiger = Animal('tiger', 'mammals')
    pickled_tiger = pickle.dumps(tiger)

Before continuing, confirm that the Python interpreter doesn't raise any exception when pickling the object.

Now, unpickle the data and verify the value of __dict__.

    unpickled_tiger = pickle.loads(pickled_tiger)
    print(unpickled_tiger.__dict__)

    [output]
    {'name': 'tiger', 'group': 'mammals'}

It worked! And the unpickled object doesn't contain the lambda attribute anymore.

Restore the Original Structure of a Python Object Using Pickle

We have seen how to exclude from serialization an attribute of a Python object that cannot be pickled.

But what if we want to preserve the original structure of an object as part of pickling / unpickling?

How can we get our lambda attribute back after unpickling the bytes representation of our object?

We can use the __setstate__ method. As explained in the official documentation, it is called with the unpickled state as part of the unpickling process.

Python Pickle __setstate__

Update our class to implement the __setstate__() method. This method will restore the instance attributes and then add the lambda attribute that wasn't part of the pickled object.

    class Animal:
        def __init__(self, name, group):
            self.name = name
            self.group = group
            self.description = lambda: print("The {} is a {}".format(self.name, self.group))

        def __getstate__(self):
            state = self.__dict__.copy()
            del state['description']
            return state

        def __setstate__(self, state):
            self.__dict__.update(state)
            self.description = lambda: print("The {} is a {}".format(self.name, self.group))

Let's pickle and unpickle an object to confirm that we get back the lambda attribute.

    tiger = Animal('tiger', 'mammals')
    pickled_tiger = pickle.dumps(tiger)

    unpickled_tiger = pickle.loads(pickled_tiger)
    print(unpickled_tiger.__dict__)

    [output]
    {'name': 'tiger', 'group': 'mammals', 'description': <function Animal.__setstate__.<locals>.<lambda> at 0x7f9380253e50>}

All good, the unpickled object also contains the lambda attribute.
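One thing to note is that the lambda is now written twice, in __init__() and in __setstate__(). A possible refactoring, sketched below with a hypothetical helper method named _add_description(), keeps a single definition and calls it from both places. Again, the lambda returns the string instead of printing it only to make the result easy to check.

```python
import pickle


class Animal:
    def __init__(self, name, group):
        self.name = name
        self.group = group
        self._add_description()

    def _add_description(self):
        # Hypothetical helper: the only place where the lambda attribute is defined
        self.description = lambda: "The {} is a {}".format(self.name, self.group)

    def __getstate__(self):
        # Exclude the lambda, which pickle cannot serialize
        state = self.__dict__.copy()
        del state['description']
        return state

    def __setstate__(self, state):
        # Restore the pickled attributes, then recreate the lambda
        self.__dict__.update(state)
        self._add_description()


tiger = Animal('tiger', 'mammals')
unpickled_tiger = pickle.loads(pickle.dumps(tiger))
print(unpickled_tiger.description())  # The tiger is a mammals
```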

Pickling and Unpickling Between Python 2 and Python 3

I want to find out if there are any limitations when it comes to pickling data with one version of Python and unpickling it with a different version of Python.

Is there backward compatibility with the pickle module between Python 2 and 3?

In this test I will use Python 3.8.5 to serialize a list of tuples and Python 2.7.16 to deserialize it.

    Python 3.8.5 (default, Sep  4 2020, 02:22:02)
    [Clang 10.0.0 ] :: Anaconda, Inc. on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import pickle
    >>> animals = [('tiger', 'mammals'), ('crocodile', 'reptiles')]
    >>> with open('data.pickle', 'wb') as f:
    ...     pickle.dump(animals, f)
    ...
    >>> exit()

Exit the Python shell and confirm that the file data.pickle has been created.

    $ ls -al data.pickle
    -rw-r--r--  1 myuser  mygroup  61  3 May 12:01 data.pickle

Now use Python 2 to unpickle the data:

    Python 2.7.16 (default, Dec 21 2020, 23:00:36)
    [GCC Apple LLVM 12.0.0 (clang-1200.0.30.4) [+internal-os, ptrauth-isa=sign+stri on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import pickle
    >>> with open('data.pickle', 'rb') as f:
    ...     data = pickle.load(f)
    ...
    Traceback (most recent call last):
      File "<stdin>", line 2, in <module>
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1384, in load
        return Unpickler(file).load()
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 864, in load
        dispatch[key](self)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 892, in load_proto
        raise ValueError, "unsupported pickle protocol: %d" % proto
    ValueError: unsupported pickle protocol: 4

It didn't work: the Python interpreter raises a ValueError exception complaining that the pickle protocol is unsupported.

Let's find out why, and which protocol the interpreter is referring to…

Default Protocol for Python Pickle

According to the documentation of the pickle module, your Python interpreter uses a default protocol version for pickling.

The DEFAULT_PROTOCOL value depends on the version of Python you use: in Python 3.0 to 3.7 the default is protocol 3, and starting with Python 3.8 it is protocol 4…

…ok, we are getting somewhere…

Default protocol for Python Pickle module

It looks like the default protocol for Python 3.8 is 4. This matches the error we have seen, because the Python 2 interpreter complains with the error "unsupported pickle protocol: 4".

Using the Python shell, we can confirm the value of the pickle DEFAULT_PROTOCOL for our Python 3 interpreter.

    Python 3.8.5 (default, Sep  4 2020, 02:22:02)
    [Clang 10.0.0 ] :: Anaconda, Inc. on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import pickle
    >>> print(pickle.DEFAULT_PROTOCOL)
    4

I wonder if I can use the Python 3.8.5 interpreter to generate pickled data and specify a protocol version supported by Python 2.7.16.

Protocol version 3 was added in Python 3.0 and protocol version 2 was implemented in Python 2.3.

So we should be able to use version 2 when pickling our list of tuples…

We can pass the protocol as the third argument of the pickle dump() function, whose signature in Python 3.8 is pickle.dump(obj, file, protocol=None, *, fix_imports=True, buffer_callback=None):

Dump protocol for Python Pickle

Let's try it…

    >>> import pickle
    >>> animals = [('tiger', 'mammals'), ('crocodile', 'reptiles')]
    >>> with open('data.pickle', 'wb') as f:
    ...     pickle.dump(animals, f, 2)
    ...
    >>>

And now let's unpickle it with Python 2:

    Python 2.7.16 (default, Dec 21 2020, 23:00:36)
    [GCC Apple LLVM 12.0.0 (clang-1200.0.30.4) [+internal-os, ptrauth-isa=sign+stri on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import pickle
    >>> with open('data.pickle', 'rb') as f:
    ...     data = pickle.load(f)
    ...
    >>> print(data)
    [(u'tiger', u'mammals'), (u'crocodile', u'reptiles')]

It worked!

So now you know how to save data with pickle if you need it to be exchanged between applications that use different versions of Python.

You can get the highest protocol available to the pickle module of your Python interpreter from the value of pickle.HIGHEST_PROTOCOL. You can pass this value to the dump() and dumps() functions.
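A small sketch of the protocol check: the loop below pickles the same list with every protocol from 0 up to pickle.HIGHEST_PROTOCOL and loads it back. For protocol 2 and later, a pickle starts with the PROTO opcode (b'\x80') followed by the protocol number, which is where the \x80\x04 at the start of the outputs earlier in this article comes from, and it is the byte the Python 2 load_proto function rejected above. Note that load() and loads() detect the protocol from the data, so you only pass it when pickling.

```python
import pickle

animals = [('tiger', 'mammals'), ('crocodile', 'reptiles')]

for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    data = pickle.dumps(animals, protocol=protocol)
    # No protocol argument is needed when loading: it is detected from the data
    assert pickle.loads(data) == animals
    if protocol >= 2:
        # Protocols 2+ start with the PROTO opcode b'\x80' followed by the version
        assert data[:2] == b'\x80' + bytes([protocol])
    print(protocol, data[:2])

print('highest:', pickle.HIGHEST_PROTOCOL, 'default:', pickle.DEFAULT_PROTOCOL)
```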

Compression for Data Generated with Python Pickle

If you have a huge amount of data to save using pickle, you can reduce its size by applying bzip2 compression to it. To do that you can use the Python bz2 module.

The bz2 module provides the class bz2.BZ2File, which lets you open a bzip2-compressed file in binary mode.

Here is how we can use it together with pickle on a list of tuples:

    >>> import pickle
    >>> import bz2
    >>> animals = [('tiger', 'mammals'), ('crocodile', 'reptiles')]
    >>> with bz2.BZ2File('data.pickle.compressed', 'w') as f:
    ...     pickle.dump(animals, f)
    ...
    >>>

We can use the built-in Python type() function to confirm the type of our file object.

Nosotros can apply the built-in Python type() function to confirm the blazon of our file object.

          >>> type(f) <class 'bz2.BZ2File'>                  

And now let's unpickle the compressed data…

    >>> with bz2.BZ2File('data.pickle.compressed', 'r') as f:
    ...     print(pickle.load(f))
    ...
    [('tiger', 'mammals'), ('crocodile', 'reptiles')]

Nice one 🙂
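If the pickled data doesn't need to go to a file, the same idea works in memory with bz2.compress() and bz2.decompress(). The sketch below uses a deliberately repetitive list (an assumption chosen so the effect is visible) and compares the size of the pickled bytes before and after compression; how much you save depends on your data.

```python
import bz2
import pickle

# Repetitive data compresses well; real-world gains depend on the data
animals = [('tiger', 'mammals'), ('crocodile', 'reptiles')] * 5000

pickled = pickle.dumps(animals)
compressed = bz2.compress(pickled)
print(len(pickled), len(compressed))

# Decompress first, then unpickle
restored = pickle.loads(bz2.decompress(compressed))
print(restored == animals)
```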

Python Pickle and Pandas DataFrames

Let's find out if we can use the pickle module to serialize and deserialize a Pandas dataframe.

First of all, create a new dataframe:

    >>> import pandas as pd
    >>> df = pd.DataFrame({"Animals": ["Tiger", "Crocodile"], "Group": ["Mammals", "Reptiles"]})
    >>> print(df)
         Animals     Group
    0      Tiger   Mammals
    1  Crocodile  Reptiles

Can we serialize this object?

    >>> import pickle
    >>> pickled_dataframe = pickle.dumps(df)

Yes, we can!

Let's see if we get back the original dataframe using the pickle loads() function.

    >>> unpickled_dataframe = pickle.loads(pickled_dataframe)
    >>> print(unpickled_dataframe)
         Animals     Group
    0      Tiger   Mammals
    1  Crocodile  Reptiles

Yes, we do!

The Pandas library also provides its own functions to pickle and unpickle a dataframe.

You can use the function to_pickle() to serialize the dataframe to a file:

          >>> df.to_pickle('./dataframe.pickle')                  

This is the file that contains the pickled dataframe:

    $ ls -al dataframe.pickle
    -rw-r--r--  1 myuser  mygroup  706  3 May 14:42 dataframe.pickle

To get the dataframe back, you can use the read_pickle() function.

    >>> import pandas as pd
    >>> unpickled_dataframe = pd.read_pickle('./dataframe.pickle')
    >>> print(unpickled_dataframe)
         Animals     Group
    0      Tiger   Mammals
    1  Crocodile  Reptiles

Exactly what we were expecting.

Python Pickle Security

Everything we have seen so far about the pickle module is great, but at the same time the pickle module is not secure.

It's important to only unpickle data that you trust: data whose source you definitely know.

Why?

The pickle deserialization process is insecure.

Pickled data can be constructed in such a way as to execute arbitrary code when it gets unpickled.

Pickled data can act as an exploit by abusing the __setstate__() method, which we used in one of the previous sections to add an attribute to our deserialized object.

Here is a basic class that explains how this would work:

    import pickle, os


    class InsecurePickle:
        def __init__(self, name):
            self.name = name

        def __getstate__(self):
            return self.__dict__

        def __setstate__(self, state):
            os.system('echo Executing malicious command')

As you can see in the implementation of the __setstate__ method, we can call any arbitrary command that can harm the system that unpickles the data.

Let's see what happens when we pickle and unpickle this object…

    insecure1 = InsecurePickle('insecure1')
    pickled_insecure1 = pickle.dumps(insecure1)
    unpickled_insecure1 = pickle.loads(pickled_insecure1)

Here is the output of this code:

    $ python3 pickle_security.py
    Executing malicious command

For example, an attacker could use the os.system call to create a reverse shell and gain access to the target system.

Protecting Pickled Data with HMAC

One way to protect pickled data from tampering is to use a secure connection between the two parties exchanging pickled data.

It's also possible to increase the security of data shared between multiple systems by using a cryptographic signature.

The idea backside it is that:

  1. Pickled data is signed before being stored on the filesystem or before being transmitted to another party.
  2. Its signature can then be verified before the data is unpickled.

This process can help detect whether pickled data has been tampered with and might therefore be unsafe to read.

We will apply a cryptographic signature to the Pandas dataframe defined before, using the Python hmac module:

    >>> import pandas as pd
    >>> import pickle
    >>> df = pd.DataFrame({"Animals": ["Tiger", "Crocodile"], "Group": ["Mammals", "Reptiles"]})
    >>> pickled_dataframe = pickle.dumps(df)

Assume that sender and receiver share the following secret key:

          secret_key = '25345-abc456'        

The sender generates a digest for the data using the hmac.new() function.

    >>> import hmac, hashlib
    >>> digest = hmac.new(secret_key.encode(), pickled_dataframe, hashlib.sha256).hexdigest()
    >>> print(digest)
    022396764cea8a60a492b391798e4155daedd99d794d15a4d574caa182bab6ba

The receiver knows the secret key, so it can calculate the digest itself and confirm that its value is the same as the value received with the pickled data.

If the two digest values are the same, the receiver knows that the pickled data has not been tampered with and it's safe to read.
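The sign-and-verify flow described above can be sketched as two small functions. This is a minimal sketch under a few assumptions: the helper names sign_pickle() and load_verified() are hypothetical, it uses a plain list instead of the dataframe so it only needs the standard library, and it reuses the example key from above (in practice the key should be a long random secret kept out of the code). The receiver compares digests with hmac.compare_digest(), which is designed to avoid timing attacks, and only calls pickle.loads() after the check passes.

```python
import hashlib
import hmac
import pickle

secret_key = '25345-abc456'  # example shared key from the article


def sign_pickle(obj, key):
    # Sender: pickle the object and compute an HMAC-SHA256 digest of the bytes
    data = pickle.dumps(obj)
    digest = hmac.new(key.encode(), data, hashlib.sha256).hexdigest()
    return data, digest


def load_verified(data, digest, key):
    # Receiver: recompute the digest and compare it before unpickling anything
    expected = hmac.new(key.encode(), data, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, digest):
        raise ValueError('pickled data failed HMAC verification')
    return pickle.loads(data)


animals = [('tiger', 'mammals'), ('crocodile', 'reptiles')]
data, digest = sign_pickle(animals, secret_key)
print(load_verified(data, digest, secret_key))

# A tampered payload is rejected without being unpickled
tampered = data.replace(b'tiger', b'tigre')
try:
    load_verified(tampered, digest, secret_key)
    rejected = False
except ValueError as error:
    rejected = True
    print(error)
```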

Conclusion

If you didn't get the chance to use the pickle module before going through this tutorial, now you should have a pretty good idea of how pickle works.

We have seen how to use pickle to serialize lists, dictionaries, nested dictionaries, lists of tuples, custom classes and Pandas dataframes.

You have also learned how to exclude from the serialization process certain attributes that pickle does not support.

Finally, we have covered security issues that can occur when exchanging data serialized with pickle.

Now it's your turn…

…how are you planning to use the pickle module in your application?


Source: https://codefather.tech/blog/python-pickle/
