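A Brief Look at Using Python's pickle Module to Serialize Data and Tidy Up Code
Author: 小斌哥ge
Serializing data with the pickle module
pickle is a binary serialization and deserialization module in the Python standard library.
It can persist data to a file on disk in binary form. This lets you separate data from code, which makes the code more readable and tidier.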
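I. Introduction to the pickle module
The pickle module can serialize and deserialize many kinds of Python objects. Serialization is called pickling; deserialization is called unpickling.
Serialization converts a Python object into binary data. Combined with file operations, the result can be saved to a file (or, combined with database operations, to a database).
Deserialization is the reverse: first read the saved binary data back from the file (or database), then restore it to a Python object.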
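The database route mentioned above can be sketched as follows. This is only an illustration under assumptions: an in-memory SQLite database stands in for a real one, and the table name `cache`, its columns, and the sample `record` are all made up for the example.

```python
import pickle
import sqlite3

# Hypothetical sample object; any picklable value works here.
record = {"name": "demo", "scores": [(3, 2, 4), (2, 0, 4)]}

# An in-memory SQLite database stands in for a real database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache (key TEXT PRIMARY KEY, payload BLOB)")

# Serialize to bytes and store them in a BLOB column.
conn.execute(
    "INSERT INTO cache (key, payload) VALUES (?, ?)",
    ("demo", pickle.dumps(record)),
)
conn.commit()

# Read the bytes back and deserialize them into a Python object.
(payload,) = conn.execute(
    "SELECT payload FROM cache WHERE key = ?", ("demo",)
).fetchone()
restored = pickle.loads(payload)
conn.close()
```

The only pickle-specific part is the `dumps()`/`loads()` pair; the rest is ordinary database code.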
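The commonly used functions of the pickle module are:
- dump(obj, file): serializes a Python object and writes the result to an open file.
- load(file): reads the saved data from an open file and deserializes it into a Python object.
- dumps(obj): serializes a Python object and returns the serialized binary data directly (as bytes) instead of writing it to a file.
- loads(data): deserializes binary data into a Python object; the argument must be a bytes-like object.
dump() and load() are inverses of each other, as are dumps() and loads(). Which pair to use depends on whether you are reading from or writing to a file.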
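A minimal sketch of the two pairs side by side. The sample `obj` is hypothetical, and `io.BytesIO` is used here as a stand-in for a file opened in binary mode so that the example does not touch the disk.

```python
import io
import pickle

# Hypothetical sample object.
obj = {"team": "DWG", "kda": (3, 2, 4)}

# dumps()/loads(): work with bytes in memory, no file involved.
blob = pickle.dumps(obj)
from_bytes = pickle.loads(blob)

# dump()/load(): work with a binary file object.
# io.BytesIO stands in for a file opened in 'wb'/'rb' mode (assumption).
buffer = io.BytesIO()
pickle.dump(obj, buffer)
buffer.seek(0)  # rewind to the start before reading
from_file = pickle.load(buffer)
```

Both round trips return an object equal to the original, including the tuple inside the dictionary.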
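II. Which Python objects pickle can serialize
Compared with json: JSON has strict format requirements and can only store data that fits that format, whereas pickle can serialize almost any Python data object.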
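As a rough illustration of that difference, the sketch below tries to serialize one value with both modules. The `value` dictionary is a made-up example chosen because it contains a set, a complex number, and a tuple.

```python
import json
import pickle

# Hypothetical value mixing types that JSON has no native representation for.
value = {"ids": {1, 2, 3}, "z": 1 + 2j, "pair": (1, 2)}

try:
    json.dumps(value)
    json_ok = True
except TypeError:
    # json refuses set and complex objects
    json_ok = False

# pickle round-trips the same value, preserving the set and the tuple.
restored = pickle.loads(pickle.dumps(value))
```

The flexibility comes with trade-offs: a pickle file can only be read by Python, and the pickle documentation warns against unpickling data from untrusted sources.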
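pickle can serialize the following Python objects:
- None, True, and False
- integers, floating-point numbers, complex numbers
- str, bytes, bytearray
- collections containing only picklable objects, including tuple, list, set, and dict
- functions defined at the top level of a module (with def; lambda functions cannot be pickled)
- built-in functions defined at the top level of a module
- classes defined at the top level of a module
- instances of such classes whose __dict__, or the result of calling __getstate__(), is picklable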
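A few items from the list above can be checked directly. The sketch uses standard-library names (`len`, `math.sqrt`, `fractions.Fraction`) purely as convenient examples of a built-in function, a module-level function, and an instance of a module-level class.

```python
import math
import pickle
from fractions import Fraction

# Built-in and module-level functions are pickled by reference (by name),
# so unpickling gives back the very same function object.
restored_len = pickle.loads(pickle.dumps(len))
restored_sqrt = pickle.loads(pickle.dumps(math.sqrt))

# An instance of a class defined at the top level of a module.
restored_fraction = pickle.loads(pickle.dumps(Fraction(3, 4)))

# A lambda has no importable name, so pickling it fails.
try:
    pickle.dumps(lambda x: x * 2)
    lambda_picklable = True
except (pickle.PicklingError, AttributeError):
    lambda_picklable = False
```

Because functions and classes are stored by name rather than by value, the code that defines them must be importable wherever the data is unpickled.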
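III. A worked example
I previously wrote an article on drawing bar charts with matplotlib; see: 使用Python的matplotlib库绘制柱状图 (Drawing bar charts with Python's matplotlib library).
That article contains a 56-line dictionary. Here, the pickle module is used to serialize the dictionary to a file; the plotting script then reads the file and deserializes the data, which separates the data from the code.
1. Serialize and save the data
Create a Python script named pickling.py and serialize the 56-line dictionary to a file.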
# coding=utf-8
import pickle

# Each key is one game. Each value is [winning side, losing side]; each side
# lists one (kills, deaths, assists) tuple per position.
data = {
    "DWG-DRX1": [[(3, 2, 4), (2, 0, 4), (1, 0, 1), (3, 1, 4), (0, 0, 4)],
        [(2, 3, 1), (0, 2, 1), (1, 0, 0), (0, 2, 1), (0, 2, 2)]],
    "DWG-DRX2": [[(1, 2, 8), (6, 1, 5), (2, 1, 8), (3, 1, 7), (0, 2, 7)],
        [(3, 3, 1), (0, 2, 5), (1, 3, 4), (2, 2, 4), (1, 2, 4)]],
    "DWG-DRX3": [[(2, 2, 10), (7, 0, 6), (5, 0, 8), (3, 1, 6), (4, 4, 4)],
        [(3, 4, 0), (2, 6, 2), (1, 3, 0), (1, 3, 3), (0, 5, 3)]],
    "SN-JDG1": [[(4, 2, 9), (3, 1, 9), (5, 1, 11), (7, 3, 10), (1, 6, 7)],
        [(3, 5, 8), (1, 5, 7), (2, 5, 7), (7, 2, 6), (0, 3, 10)]],
    "SN-JDG2": [[(7, 2, 12), (7, 2, 14), (2, 0, 16), (9, 0, 12), (1, 4, 13)],
        [(2, 6, 2), (2, 6, 4), (0, 4, 7), (4, 4, 1), (0, 6, 7)]],
    "SN-JDG3": [[(5, 1, 5), (5, 1, 9), (3, 1, 8), (3, 1, 7), (1, 3, 11)],
        [(0, 4, 2), (1, 2, 4), (0, 4, 3), (3, 1, 4), (3, 6, 3)]],
    "SN-JDG4": [[(2, 2, 4), (3, 2, 5), (1, 0, 10), (7, 1, 5), (0, 2, 12)],
        [(2, 3, 1), (2, 3, 3), (1, 3, 4), (0, 2, 6), (2, 2, 3)]],
    "TES-FNC1": [[(2, 3, 8), (4, 2, 6), (2, 0, 8), (6, 0, 8), (1, 0, 10)],
        [(0, 3, 3), (1, 3, 3), (4, 0, 0), (0, 6, 2), (0, 3, 3)]],
    "TES-FNC2": [[(0, 2, 10), (8, 1, 4), (4, 0, 6), (4, 1, 5), (1, 2, 13)],
        [(3, 2, 3), (1, 4, 5), (1, 2, 3), (0, 2, 6), (1, 7, 1)]],
    "TES-FNC3": [[(3, 1, 4), (3, 1, 9), (3, 1, 7), (7, 1, 2), (0, 2, 12)],
        [(0, 4, 3), (2, 6, 4), (2, 3, 2), (2, 0, 4), (0, 3, 3)]],
    "TES-FNC4": [[(1, 2, 7), (10, 1, 7), (6, 2, 5), (0, 4, 16), (1, 4, 12)],
        [(2, 3, 3), (3, 1, 5), (1, 4, 8), (4, 3, 5), (3, 7, 5)]],
    "TES-FNC5": [[(1, 2, 1), (4, 1, 6), (4, 0, 6), (4, 1, 5), (0, 1, 6)],
        [(2, 2, 1), (2, 3, 1), (0, 4, 1), (0, 1, 2), (0, 3, 2)]],
    "G2-GEN1": [[(4, 0, 7), (2, 2, 11), (4, 1, 11), (6, 1, 6), (3, 0, 10)],
        [(0, 5, 2), (3, 4, 1), (1, 3, 2), (0, 4, 1), (0, 3, 2)]],
    "G2-GEN2": [[(3, 3, 14), (4, 3, 12), (11, 0, 11), (9, 2, 13), (1, 3, 15)],
        [(3, 8, 1), (2, 5, 3), (2, 6, 5), (4, 4, 2), (0, 5, 7)]],
    "G2-GEN3": [[(2, 5, 11), (7, 2, 10), (6, 3, 13), (7, 3, 11), (1, 1, 18)],
        [(4, 5, 8), (2, 6, 7), (5, 4, 6), (3, 2, 6), (0, 6, 7)]],
    "DWG-G21": [[(4, 0, 12), (7, 2, 9), (4, 2, 11), (6, 0, 9), (1, 2, 8)],
        [(1, 5, 1), (3, 5, 2), (2, 5, 3), (0, 2, 3), (0, 5, 4)]],
    "DWG-G22": [[(4, 2, 7), (5, 1, 9), (6, 2, 11), (7, 3, 9), (3, 1, 11)],
        [(0, 7, 1), (0, 4, 4), (4, 4, 2), (3, 4, 1), (1, 6, 2)]],
    "DWG-G23": [[(3, 1, 9), (6, 2, 5), (5, 2, 6), (8, 2, 7), (0, 3, 13)],
        [(1, 3, 3), (3, 3, 4), (1, 4, 3), (2, 3, 3), (3, 9, 4)]],
    "DWG-G24": [[(5, 0, 3), (2, 0, 7), (2, 0, 10), (2, 1, 3), (4, 1, 4)],
        [(0, 5, 1), (1, 3, 0), (0, 3, 1), (1, 2, 1), (0, 2, 1)]],
    "SN-TES1": [[(5, 1, 5), (3, 1, 6), (1, 0, 4), (2, 3, 3), (0, 2, 3)],
        [(2, 4, 0), (0, 1, 4), (1, 2, 2), (4, 2, 0), (0, 2, 4)]],
    "SN-TES2": [[(5, 1, 4), (1, 2, 5), (3, 1, 7), (3, 3, 4), (0, 0, 7)],
        [(2, 1, 2), (1, 3, 5), (2, 5, 4), (2, 2, 0), (0, 1, 5)]],
    "SN-TES3": [[(3, 0, 7), (2, 2, 4), (2, 1, 4), (5, 2, 4), (1, 2, 7)],
        [(0, 3, 3), (2, 3, 3), (3, 1, 1), (0, 4, 4), (2, 2, 2)]],
    "SN-TES4": [[(5, 2, 4), (1, 3, 16), (8, 1, 8), (6, 4, 9), (1, 8, 13)],
        [(1, 2, 10), (9, 5, 4), (1, 4, 9), (5, 6, 10), (2, 4, 12)]],
    "DWG-SN1": [[(2, 2, 11), (5, 3, 9), (8, 1, 11), (4, 2, 12), (2, 4, 7)],
        [(1, 5, 5), (5, 4, 4), (3, 3, 2), (2, 3, 3), (1, 6, 3)]],
    "DWG-SN2": [[(10, 1, 4), (2, 1, 10), (3, 3, 11), (3, 3, 10), (2, 4, 7)],
        [(0, 4, 8), (5, 4, 2), (5, 6, 2), (2, 3, 5), (0, 3, 9)]],
    "DWG-SN3": [[(3, 3, 10), (5, 2, 8), (3, 3, 3), (5, 1, 6), (0, 2, 8)],
        [(3, 6, 5), (1, 2, 2), (4, 3, 2), (2, 3, 3), (1, 2, 6)]],
    "DWG-SN4": [[(2, 0, 12), (8, 0, 7), (1, 3, 5), (9, 1, 5), (4, 3, 4)],
        [(2, 9, 1), (1, 5, 2), (2, 2, 0), (2, 4, 2), (0, 4, 3)]],
}

# Open the target file in binary write mode and serialize the dictionary into it.
with open('S10.pkl', 'wb') as pkl_file:
    pickle.dump(data, pkl_file)
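Serialization takes only two lines of code: open a file object, then call dump() to serialize the dictionary into S10.pkl. Because a relative path is used, the file is created in the current working directory (the script's own directory when the script is run from there); an absolute path can be given instead. Note that the file must be opened in wb mode.
The suffix of S10.pkl is arbitrary (it does not affect the format of the saved file), but since the data was serialized with pickle, .pkl is the usual choice and makes the file easy to recognize.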
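Both points can be tried out with a small sketch. The temporary directory, the file name `S10_demo.data`, and the shortened `sample` dictionary are assumptions made for the example, not part of the original script.

```python
import pickle
import tempfile
from pathlib import Path

# Shortened, hypothetical sample in the spirit of the article's data.
sample = {"DWG-DRX1": [(3, 2, 4), (2, 0, 4)]}

with tempfile.TemporaryDirectory() as tmp_dir:
    # An absolute path with a non-.pkl suffix (both chosen for illustration).
    target = Path(tmp_dir).resolve() / "S10_demo.data"
    was_absolute = target.is_absolute()

    with open(target, "wb") as f:
        pickle.dump(sample, f)

    # The suffix plays no role when loading; only the file content matters.
    with open(target, "rb") as f:
        loaded = pickle.load(f)
```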
Opening S10.pkl directly in an IDE shows garbled characters, because the content is binary data rather than text.
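To look at the binary content in a readable way, one option is the standard-library `pickletools` module. The sketch below works on an in-memory pickle of a hypothetical `sample` rather than on S10.pkl itself.

```python
import io
import pickle
import pickletools

# Shortened, hypothetical sample.
sample = {"DWG-DRX1": [(3, 2, 4)]}
blob = pickle.dumps(sample)

# The raw bytes are not text; printing them shows escape sequences.
print(blob[:16])

# pickletools.dis() writes a human-readable listing of the pickle opcodes.
listing = io.StringIO()
pickletools.dis(blob, out=listing)
print(listing.getvalue())
```

The listing shows the opcodes the unpickler will execute, which is usually more informative than viewing the file in a text editor.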
2. Read the data and deserialize it
# coding=utf-8
import matplotlib.pyplot as plt
from matplotlib import ticker
from numpy import mean
import pickle

# Open the pickle file in binary read mode and restore the dictionary.
with open('S10.pkl', 'rb') as pkl_file:
    data = pickle.load(pkl_file)

# Positions: top, jungle, mid, bot, support.
# Note: the Chinese labels in this script need a CJK-capable font configured
# in matplotlib to render correctly.
location = ["上单", "打野", "中单", "下路", "辅助"]
win_loc_kill, win_loc_die, win_loc_assists = [[list() for _ in range(5)] for _ in range(3)]
lose_loc_kill, lose_loc_die, lose_loc_assists = [[list() for _ in range(5)] for _ in range(3)]

# value[0] is the winning side, value[1] the losing side; i is the position,
# and the last index selects kills (0), deaths (1), or assists (2).
for i in range(5):
    win_loc_kill[i] = [value[0][i][0] for value in data.values()]
    win_loc_die[i] = [value[0][i][1] for value in data.values()]
    win_loc_assists[i] = [value[0][i][2] for value in data.values()]
    lose_loc_kill[i] = [value[1][i][0] for value in data.values()]
    lose_loc_die[i] = [value[1][i][1] for value in data.values()]
    lose_loc_assists[i] = [value[1][i][2] for value in data.values()]

# Per-position averages across all games, rounded to two decimals.
# noinspection PyTypeChecker
win_avg_kill = [round(mean(kill), 2) for kill in win_loc_kill]
# noinspection PyTypeChecker
win_avg_die = [round(mean(die), 2) for die in win_loc_die]
# noinspection PyTypeChecker
win_avg_assists = [round(mean(assists), 2) for assists in win_loc_assists]
# noinspection PyTypeChecker
lose_avg_kill = [round(mean(kill), 2) for kill in lose_loc_kill]
# noinspection PyTypeChecker
lose_avg_die = [round(mean(die), 2) for die in lose_loc_die]
# noinspection PyTypeChecker
lose_avg_assists = [round(mean(assists), 2) for assists in lose_loc_assists]

# Two stacked charts: winning side on top, losing side below.
fig, axs = plt.subplots(nrows=2, ncols=1, figsize=(20, 16), dpi=100)
x = range(len(location))
axs[0].bar([i-0.2 for i in x], win_avg_kill, width=0.2, color='b')
axs[0].bar(x, win_avg_die, width=0.2, color='r')
axs[0].bar([i+0.2 for i in x], win_avg_assists, width=0.2, color='g')
axs[1].bar([i-0.2 for i in x], lose_avg_kill, width=0.2, color='b')
axs[1].bar(x, lose_avg_die, width=0.2, color='r')
axs[1].bar([i+0.2 for i in x], lose_avg_assists, width=0.2, color='g')

# Write each bar's value just above it.
for a, b in zip(x, win_avg_kill):
    axs[0].text(a-0.2, b+0.1, '%.02f' % b, ha='center', va='bottom', fontsize=14)
for a, b in zip(x, win_avg_die):
    axs[0].text(a, b+0.1, '%.02f' % b, ha='center', va='bottom', fontsize=14)
for a, b in zip(x, win_avg_assists):
    axs[0].text(a+0.2, b+0.1, '%.02f' % b, ha='center', va='bottom', fontsize=14)
for a, b in zip(x, lose_avg_kill):
    axs[1].text(a-0.2, b+0.1, '%.02f' % b, ha='center', va='bottom', fontsize=14)
for a, b in zip(x, lose_avg_die):
    axs[1].text(a, b+0.1, '%.02f' % b, ha='center', va='bottom', fontsize=14)
for a, b in zip(x, lose_avg_assists):
    axs[1].text(a+0.2, b+0.1, '%.02f' % b, ha='center', va='bottom', fontsize=14)

# Shared axis settings. Legend: kills, deaths, assists;
# x label: position; y label: per-game average.
for i in range(2):
    axs[i].xaxis.set_major_locator(ticker.FixedLocator(x))
    axs[i].xaxis.set_major_formatter(ticker.FixedFormatter(location))
    axs[i].set_yticks(range(0, 11, 2))
    axs[i].grid(linestyle="--", alpha=0.5)
    axs[i].legend(['击杀', '死亡', '助攻'], loc='upper left', fontsize=16, markerscale=0.5)
    axs[i].set_xlabel("位置", fontsize=18)
    axs[i].set_ylabel("场均数据", fontsize=18, rotation=0)

# Titles: per-position per-game averages in the S10 finals, winning and losing sides.
axs[0].set_title("S10总决赛胜方各位置场均数据", fontsize=18)
axs[1].set_title("S10总决赛负方各位置场均数据", fontsize=18)
plt.show()
Deserialization also takes only two lines: open S10.pkl, then call load() on the file object to get the data back. The file must be opened in rb mode.
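Once the dictionary is loaded, the plotting script relies on its nested layout: the first index picks the side, the second the position, and the third the stat. The sketch below illustrates that indexing on two entries taken from the data. The helper `position_average` and the index constants are hypothetical names, and `statistics.mean` is swapped in for numpy's `mean` only to keep the example standard-library-only.

```python
from statistics import mean

# Two entries copied from the article's dictionary; same layout as the full data.
sample = {
    "DWG-DRX1": [[(3, 2, 4), (2, 0, 4), (1, 0, 1), (3, 1, 4), (0, 0, 4)],
                 [(2, 3, 1), (0, 2, 1), (1, 0, 0), (0, 2, 1), (0, 2, 2)]],
    "DWG-DRX2": [[(1, 2, 8), (6, 1, 5), (2, 1, 8), (3, 1, 7), (0, 2, 7)],
                 [(3, 3, 1), (0, 2, 5), (1, 3, 4), (2, 2, 4), (1, 2, 4)]],
}

WIN, LOSE = 0, 1             # index of the side within each value
KILL, DIE, ASSIST = 0, 1, 2  # index within each (kills, deaths, assists) tuple


def position_average(data, side, stat):
    """Average one stat per position (0..4) across all games (hypothetical helper)."""
    return [
        round(mean(game[side][pos][stat] for game in data.values()), 2)
        for pos in range(5)
    ]


win_avg_kill = position_average(sample, WIN, KILL)
lose_avg_die = position_average(sample, LOSE, DIE)
```

With only these two games, the winning side's average kills per position come out as 2, 4, 1.5, 3, and 0.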
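Run the code; the chart is drawn correctly.
Through pickling and unpickling, the data is persisted in the file S10.pkl. The data is separated from the code, which avoids writing a very long dictionary literal in the source and keeps the code tidier.
In the example above, moving a 56-line dictionary out of the code already helps noticeably. In real projects the data is usually larger and would take up much more space in the source, so the benefit is even clearer.
If several scripts use the same data, they can all read the same serialized file, so the data does not have to be pasted into each script.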
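One way to picture several scripts sharing one data file is sketched below. Everything here is assumed for illustration: `load_data`, `count_games`, and `total_win_kills` are hypothetical helpers standing in for separate scripts, and the `sample` dictionary is shortened to two positions per side and written to a temporary file.

```python
import pickle
import tempfile
from pathlib import Path


def load_data(path):
    """Load the shared pickled dataset (hypothetical helper)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def count_games(path):
    """Stand-in for one script: how many games are recorded."""
    return len(load_data(path))


def total_win_kills(path):
    """Stand-in for another script: total kills by the winning sides."""
    return sum(
        kills
        for game in load_data(path).values()
        for kills, _deaths, _assists in game[0]
    )


with tempfile.TemporaryDirectory() as tmp_dir:
    shared = Path(tmp_dir) / "S10_demo.pkl"
    # Shortened sample: only two positions per side.
    sample = {
        "DWG-DRX1": [[(3, 2, 4), (2, 0, 4)], [(2, 3, 1), (0, 2, 1)]],
        "DWG-DRX2": [[(1, 2, 8), (6, 1, 5)], [(3, 3, 1), (0, 2, 5)]],
    }
    with open(shared, "wb") as f:
        pickle.dump(sample, f)

    # Both "scripts" read the same file instead of embedding the data.
    games = count_games(shared)
    kills = total_win_kills(shared)
```

Updating the data then means regenerating one file rather than editing every script that uses it.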