Python Bytecode Deobfuscation - 安全KER

Preface

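This is a challenge left over from NISCCTF2019, about reversing a pyc file. Using it as a starting point, I'll work through the knowledge and tools for Python bytecode deobfuscation, and then build on them to get some results of my own. This is part one, which lays the groundwork and surveys the existing tools.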

pyc file structure

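A pyc file is the bytecode file produced by compiling Python source code. Although Python is an interpreted language, the interpreter does not run the source directly: it first compiles it to bytecode and then interprets the bytecode. Python 2 and Python 3 bytecode differ and are not interchangeable, and CPython is the de facto standard implementation today and for the foreseeable future.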

A pyc file (in the Python 2 format discussed here) consists of 3 parts:

The first 4 bytes are the magic number, which identifies the Python version that produced the pyc. In Python 2.7 the values are defined in Python/import.c:

/* Magic word to reject .pyc files generated by other Python versions.
 It should change for each incompatible change to the bytecode.

 The value of CR and LF is incorporated so if you ever read or write
 a .pyc file in text mode the magic number will be wrong; also, the
 Apple MPW compiler swaps their values, botching string constants.

 The magic numbers must be spaced apart atleast 2 values, as the
 -U interpeter flag will cause MAGIC+1 being used. They have been
 odd numbers for some time now.

 There were a variety of old schemes for setting the magic number.
 The current working scheme is to increment the previous value by
 10.

 Known values:
     Python 1.5:   20121
     Python 1.5.1: 20121
     Python 1.5.2: 20121
     Python 1.6:   50428
     Python 2.0:   50823
     Python 2.0.1: 50823
     Python 2.1:   60202
     Python 2.1.1: 60202
     Python 2.1.2: 60202
     Python 2.2:   60717
     Python 2.3a0: 62011
     Python 2.3a0: 62021
     Python 2.3a0: 62011 (!)
     Python 2.4a0: 62041
     Python 2.4a3: 62051
     Python 2.4b1: 62061
     Python 2.5a0: 62071
     Python 2.5a0: 62081 (ast-branch)
     Python 2.5a0: 62091 (with)
     Python 2.5a0: 62092 (changed WITH_CLEANUP opcode)
     Python 2.5b3: 62101 (fix wrong code: for x, in ...)
     Python 2.5b3: 62111 (fix wrong code: x += yield)
     Python 2.5c1: 62121 (fix wrong lnotab with for loops and
                          storing constants that should have been removed)
     Python 2.5c2: 62131 (fix wrong code: for x, in ... in listcomp/genexp)
     Python 2.6a0: 62151 (peephole optimizations and STORE_MAP opcode)
     Python 2.6a1: 62161 (WITH_CLEANUP optimization)
     Python 2.7a0: 62171 (optimize list comprehensions/change LIST_APPEND)
     Python 2.7a0: 62181 (optimize conditional branches:
              introduce POP_JUMP_IF_FALSE and POP_JUMP_IF_TRUE)
     Python 2.7a0  62191 (introduce SETUP_WITH)
     Python 2.7a0  62201 (introduce BUILD_SET)
     Python 2.7a0  62211 (introduce MAP_ADD and SET_ADD)
.
*/
#define MAGIC (62211 | ((long)'\r'<<16) | ((long)'\n'<<24))

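The next four bytes are a timestamp (TIMESTAMP, seconds since 1970-01-01): the modification time of the .py source the pyc was compiled from, which the importer uses to detect a stale pyc.
After that comes the serialized (marshaled) PyCodeObject. It is the top-level code object of the whole file, named <module>, and it is the first code executed when the pyc is loaded; all the other functions are organized under it and get created when it runs.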
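A minimal Python 3 sketch of reading this 8-byte Python 2 header, assuming exactly the layout just described. It also checks the MAGIC macro's arithmetic (version number in the low 16 bits, CR and LF above it, stored little-endian). parse_py2_pyc_header is a hypothetical helper, and the sample input is the 8 header bytes of py交易.pyc as read by f.read(8) in the marshal session (b'\x03\xf3\r\nLiT\\'). The code-object bytes it returns should be loaded with a Python 2.7 marshal, since the serialized code-object layout differs between versions.

```python
import struct

# Python 2.7's MAGIC macro: version number in the low 16 bits, '\r' and '\n' above it
MAGIC_27 = 62211 | (ord("\r") << 16) | (ord("\n") << 24)


def parse_py2_pyc_header(data: bytes):
    """Split Python 2 .pyc bytes into (version number, source mtime, marshaled code bytes)."""
    if len(data) < 8:
        raise ValueError("too short for a Python 2 .pyc header")
    magic, mtime = struct.unpack("<II", data[:8])  # two little-endian 32-bit words
    if (magic >> 16) != 0x0A0D:  # high half must be b"\r\n"; text-mode I/O breaks this
        raise ValueError("bad magic: CR/LF bytes missing or altered")
    return magic & 0xFFFF, mtime, data[8:]


# First 8 bytes of py交易.pyc, as returned by f.read(8) in the marshal session
version, mtime, rest = parse_py2_pyc_header(b"\x03\xf3\r\nLiT\\")
print(version, mtime, struct.pack("<I", MAGIC_27))  # 62211 1549035852 b'\x03\xf3\r\n'
```

Reading the file in binary mode matters: text-mode I/O on some platforms rewrites b"\r\n", and that is exactly what the CR/LF bytes in the magic are there to catch.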

PyCodeObject

The marshal module can deserialize the PyCodeObject stored in the pyc so we can look at its fields.

>>> import marshal
>>> f = open('py交易.pyc','rb')
>>> f.read(8)
'\x03\xf3\r\nLiT\\'
>>> code = marshal.load(f)
>>> for x in list(filter(lambda x: x[:2] != '__',dir(code))):
...     x
... 
'co_argcount'      # number of positional arguments, not counting *args and **kwargs
'co_cellvars'      # names of the cell variables the code uses; tuple of PyStringObject ('s'/'t'/'R')
'co_code'          # PyStringObject ('s'): the bytecode of this code object
'co_consts'        # tuple of all constants
'co_filename'      # PyStringObject ('s'): name of the .py file this code came from
'co_firstlineno'   # line number of this code's first line in the .py file
'co_flags'         # flag bits, defined in code.h (well commented there); e.g. CO_NOFREE (64) means the code has no freevars or cellvars
'co_freevars'      # names of the free variables the code uses; tuple of PyStringObject ('s'/'t'/'R')
'co_lnotab'        # PyStringObject ('s'): table mapping bytecode offsets to line numbers
'co_name'          # name of this code object
'co_names'         # symbol table: names used by the code, as a tuple of strings
'co_nlocals'       # number of local variables, including all arguments
'co_stacksize'     # maximum stack depth needed while this code runs
'co_varnames'      # names of the local variables; tuple of PyStringObject ('s'/'t'/'R')

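The most important of these are:

co_name: the name of this PyCodeObject, e.g. <module>, str2hex
co_names: the symbols (functions, variables) this PyCodeObject refers to by name, e.g. ('sys', 'str2hex', 'hex2str', 'p_s', 'p_f', 'count', 'stdout', 'write', 'stdin', 'read', 'flag')
co_varnames: the local variable names of this PyCodeObject, e.g. ('DIVIDER',)
co_code: the actual bytecode of this PyCodeObject
co_consts: every constant used by this PyCodeObject. This one matters a lot: it also stores the PyCodeObjects of all the functions defined inside it, which creates the nesting.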

For example:

>>> code.co_consts
(-1, None, <code object str2hex at 0x7f620dc613b0, file "enc.py", line 3>, <code object hex2str at 0x7f620dc61730, file "enc.py", line 9>, <code object p_s at 0x7f620dc61ab0, file "enc.py", line 14>, <code object p_f at 0x7f620dc2f230, file "enc.py", line 17>, <code object count at 0x7f620dc2f7b0, file "enc.py", line 20>, 102, 108, 97, 103, 58, 38, 4130330538L, 1627830889, 3168701571L, 4084147187L, 3521152606L, 651787064, 1860581437, 2730391645L, 2694209818L, 3715947653L, 3816944324L, 394367122, None)

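So the structure is: PyCodeObject.co_consts holds the PyCodeObjects of the functions defined in it, and nesting these level by level builds up the whole bytecode object.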
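The session above is Python 2, where the header is 8 bytes. In CPython 3.7+ the header is 16 bytes (magic, a PEP 552 flags word, then either mtime and source size or a source hash), and marshal only reads code objects written by the same interpreter version. The sketch below, for Python 3, compiles a throwaway module with py_compile, loads its code object back the same way, and walks co_consts recursively to show the nesting just described. The module source and the helper names load_pyc3 and walk_code are made up for illustration; the function names only loosely echo the challenge.

```python
import importlib.util
import marshal
import os
import py_compile
import tempfile
import types

SOURCE = '''import sys

def str2hex(s):
    def one(c):
        return "%02x" % ord(c)
    return "".join(map(one, s))

def hex2str(h):
    return bytes.fromhex(h).decode()
'''


def load_pyc3(path):
    """Return (16-byte header, module code object) for a .pyc written by this interpreter."""
    with open(path, "rb") as f:  # binary mode, or the CR/LF in the magic gets mangled
        header = f.read(16)
        if header[:4] != importlib.util.MAGIC_NUMBER:
            raise ValueError("pyc was written by a different Python version")
        return header, marshal.load(f)


def walk_code(code, depth=0):
    """Yield (depth, code object) for code and every code object nested in co_consts."""
    yield depth, code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from walk_code(const, depth + 1)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "demo.py")
    with open(src, "w") as fh:
        fh.write(SOURCE)
    pyc = py_compile.compile(src, cfile=os.path.join(tmp, "demo.pyc"), doraise=True)
    header, module_code = load_pyc3(pyc)

for depth, c in walk_code(module_code):
    print("    " * depth + c.co_name, "names:", c.co_names, "locals:", c.co_varnames)
```

The indentation of the output mirrors the co_consts nesting: <module> at the top, str2hex and hex2str below it, and the inner function one below str2hex.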

co_code

The co_code obtained from marshal is a str holding the raw bytecode; normally it can be disassembled with the dis module:

>>> def add(a,b):
...     return a+b
...
>>> add.__code__.co_code
'|\x00\x00|\x01\x00\x17S'
>>> import dis
>>> dis.dis(add.__code__.co_code)
          0 LOAD_FAST           0 (0)
          3 LOAD_FAST           1 (1)
          6 BINARY_ADD
          7 RETURN_VALUE

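When disassembling, dis resolves each instruction's operand into the object it refers to (a constant, a name, a local variable). That is convenient, but it also causes problems. What each opcode does is described in the official documentation.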
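To see operand resolution concretely: Python 3's dis.get_instructions (3.4+, a newer API than the 2.7 dis used in this article) reports both the raw operand taken from co_code (arg) and the value dis resolved it to through co_varnames, co_consts and the other tables (argval). Opcode names vary between Python 3 versions (3.13, for example, fuses the two loads into LOAD_FAST_LOAD_FAST), so treat the printed listing as illustrative. show_operands is a hypothetical helper name.

```python
import dis


def add(a, b):
    return a + b


def show_operands(func):
    """Print each instruction's raw operand next to what dis resolved it to."""
    for ins in dis.get_instructions(func):
        # arg: the integer stored in the bytecode; argval: the object it was resolved to
        print(f"{ins.offset:4} {ins.opname:<24} arg={ins.arg!r:<6} argval={ins.argval!r}")


show_operands(add)
```

Resolution is where a bogus index bites: an arg that points past the end of the relevant table has nothing to resolve to.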

In this challenge, though, it breaks:

>>> dis.dis(code.co_code)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dis.py", line 45, in dis
    disassemble_string(x)
  File "/usr/lib/python2.7/dis.py", line 112, in disassemble_string
    labels = findlabels(code)
  File "/usr/lib/python2.7/dis.py", line 166, in findlabels
    oparg = ord(code[i]) + ord(code[i+1])*256
IndexError: string index out of range

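Reading the traceback and the dis source shows that the failure happens while decoding instruction arguments. Instructions fall into two groups: those without an argument and those with one. Python bytecode deliberately assigns the low opcode numbers to instructions without an argument and the high numbers to those with one, split at HAVE_ARGUMENT in Include/opcode.h. In Python 2 (and Python 3 before 3.6, which switched to fixed 2-byte "wordcode") they are laid out in binary like this: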

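[opcode]  an instruction without an argument takes one byte
[opcode] [arg low byte] [arg high byte]  an instruction with an argument takes three bytes: one opcode byte and a two-byte little-endian argument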
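These two layouts are enough to write a decoder. A minimal Python 3 sketch that walks raw pre-3.6 bytecode and yields (offset, opcode, argument) triples; HAVE_ARGUMENT = 90 is the Python 2.7 value, and decode_py2 is a hypothetical name. Instead of raising, it reports a missing argument as None when the code string ends in the middle of an instruction.

```python
from typing import Iterator, Optional, Tuple

HAVE_ARGUMENT = 90  # Python 2.7's value from Include/opcode.h


def decode_py2(code: bytes) -> Iterator[Tuple[int, int, Optional[int]]]:
    """Yield (offset, opcode, arg) for pre-3.6 bytecode (1-byte or 3-byte instructions).

    arg is None for opcodes below HAVE_ARGUMENT, and also for an opcode that
    needs an argument when the code string ends before its two argument bytes.
    """
    i = 0
    while i < len(code):
        op = code[i]
        if op < HAVE_ARGUMENT:
            yield i, op, None
            i += 1
        elif i + 2 < len(code):
            yield i, op, code[i + 1] | (code[i + 2] << 8)  # little-endian 16-bit argument
            i += 3
        else:
            yield i, op, None  # truncated instruction: stop instead of raising IndexError
            break


# The first 9 bytes of the challenge's co_code, which the session below decodes by hand
head = bytes([113, 158, 2, 136, 104, 110, 126, 58, 140])
print(list(decode_py2(head)))  # [(0, 113, 670), (3, 136, 28264), (6, 126, 35898)]
```

Fed the add example's co_code (b"|\x00\x00|\x01\x00\x17S"), it gives offsets 0, 3, 6, 7, the same as dis.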

Now decode the bytecode that crashes dis according to this format:

>>> list(map(ord,code.co_code[:9]))
[113, 158, 2, 136, 104, 110, 126, 58, 140]
>>> dis.opname[113]
'JUMP_ABSOLUTE'
>>> 2*256+158
670
>>> dis.opname[136]
'LOAD_DEREF'
>>> 110*256+104
28264

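From the output above, the first instruction is JUMP_ABSOLUTE 670, and offset 670 holds a real instruction, so it is legal. The second decodes as LOAD_DEREF 28264, but there is no cell or free variable with that index, so any disassembler that resolves operands against co_cellvars + co_freevars (as 2.7's dis.disassemble does for a code object) fails on it. The traceback above, from dis.dis(code.co_code), fails even earlier, in findlabels: it decodes every argument before printing anything, and the linear sweep ends on an opcode that takes an argument (LOAD_CLOSURE at offset 712 in the patched listing) but has no argument bytes after it, so code[i] runs past the end of the string.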

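Because of the preceding jump, the illegal second instruction is never actually executed. The author of the pyc deliberately inserted illegal instructions that do not affect execution in order to crash analysis tools and hinder analysis of the file.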

Getting past the malicious instructions

Patching the dis module

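Since the crash is triggered while dis decodes and resolves operands, the simplest fix is to switch that off: patch the dis module by hand so it ignores these errors and disassembles as much of the pyc as it can.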

For example, the original:

         print opname[op].ljust(20),
         i = i+1
         if op >= HAVE_ARGUMENT:
             oparg = ord(code[i]) + ord(code[i+1])*256 + extended_arg
             extended_arg = 0
             i = i+2
             if op == EXTENDED_ARG:
                 extended_arg = oparg*65536L
             print repr(oparg).rjust(5),

The modified version wraps the argument read in try/except to swallow the exception:

         print opname[op].ljust(20),
         i = i+1
         if op >= HAVE_ARGUMENT:
             try:
                 oparg = ord(code[i]) + ord(code[i+1])*256 + extended_arg
             except IndexError:
                 oparg = 0   # argument bytes missing: the code string ends mid-instruction
             extended_arg = 0
             i = i+2
             if op == EXTENDED_ARG:
                 extended_arg = oparg*65536L
             print repr(oparg).rjust(5),

That gives the complete patch; with it applied, the bytecode disassembles normally (the patched module is imported as diss here):

>>> diss.dis(code.co_code)
          0 JUMP_ABSOLUTE     670
          3 LOAD_DEREF      28264
          6 DELETE_FAST     35898 (35898)
          9 JUMP_ABSOLUTE   23887
    >>   12 LOAD_FAST           0 (0)
         15 COMPARE_OP          2 (==)
         18 POP_JUMP_IF_TRUE    39

         ...

    >>  691 LOAD_CONST         15 (15)
        694 STORE_FAST          0 (0)
        697 JUMP_ABSOLUTE     121
        700 JUMP_ABSOLUTE     188
    >>  703 JUMP_ABSOLUTE     511
    >>  706 JUMP_ABSOLUTE     511
        709 RETURN_VALUE   
        710 STORE_SLICE+2  
        711 BINARY_TRUE_DIVIDE
        712 LOAD_CLOSURE

Live code analysis

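Patching dis gets us past the malicious instructions, but other tools such as pycdc still cannot open this malicious pyc. So the next step is to NOP out all the malicious instructions to make analysis easier.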

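The idea: starting at the beginning of co_code, simulate execution, following both sides of every branch. Record every offset visited along the way, and finally replace every offset that was not collected with NOP.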

import marshal, sys, opcode, types, dis

NOP = 9
HAVE_ARGUMENT = 90
JUMP_FORWARD = 110
RETURN_VALUE = 83

used_set = set()

def deconf_inner(code, now):
    global used_set

    while code[now] != RETURN_VALUE:
        if now in used_set:
            break
        used_set.add(now)
        if code[now] >= HAVE_ARGUMENT:
            used_set.add(now+1)
            used_set.add(now+2)
        op = code[now]

        if op == JUMP_FORWARD:
            arg = code[now+2] << 8 | code[now+1]
            now += arg + 3
            continue

        ...

    used_set.add(now)
    if code[now] >= HAVE_ARGUMENT:
        used_set.add(now+1)
        used_set.add(now+2)

def deconf(code):
    global used_set

    used_set = set() #Remember to clean up used_set for every target function

    cod = list(map(ord, code))
    deconf_inner(cod, 0)

    for i in range(len(cod)):
        if i not in used_set:
            cod[i] = NOP

    return "".join(list(map(chr, cod)))

with open(sys.argv[1], 'rb') as f:
    header = f.read(8)
    code = marshal.load(f)

    ...

mode = types.CodeType(code.co_argcount,
    # code.co_kwonlyargcount,  # add this argument on Python 3
    code.co_nlocals,
    code.co_stacksize,
    code.co_flags,
    deconf(code.co_code),
    tuple(consts),
    code.co_names,
    code.co_varnames,
    code.co_filename,
    code.co_name,
    code.co_firstlineno,
    code.co_lnotab,   # in general you should adjust this
    code.co_freevars,
    code.co_cellvars)

with open(sys.argv[1] + ".mod", 'wb') as f:
    f.write(header)
    marshal.dump(mode, f)

It's a simple DFS, so I won't go into detail; the full code is given here. This is the result after NOPing:

    [Disassembly]
        0       JUMP_ABSOLUTE           670
        3       NOP
        4       NOP
        5       NOP
        6       NOP
        7       NOP
        8       NOP
        9       NOP
        10      NOP
        11      NOP
        12      LOAD_FAST               0: DIVIDER
        15      COMPARE_OP              2 (==)
        18      POP_JUMP_IF_TRUE        39
        21      LOAD_CONST              19: 1860581437
        24      LOAD_FAST               0: DIVIDER
        27      COMPARE_OP              2 (==)
        30      POP_JUMP_IF_TRUE        42
        33      LOAD_CONST              23: 0xE381F2C4L
        36      JUMP_FORWARD            425 (to 464)
        39      JUMP_FORWARD            472 (to 514)
        42      JUMP_FORWARD            382 (to 427)
        45      NOP
        46      NOP
        47      NOP
        48      NOP

        ...
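The body of deconf_inner is elided above. For reference, here is a self-contained Python 3 sketch of the same reachability idea, written as an iterative worklist instead of recursion and operating on the raw bytes of a Python 2.7 co_code. Assumptions: the opcode numbers are hard-coded from Python 2.7's Lib/opcode.py, EXTENDED_ARG is not handled, and instructions such as BREAK_LOOP and END_FINALLY are treated as falling through, which errs on the side of keeping bytes. reachable_bytes and nop_unreachable are hypothetical names.

```python
# Opcode numbers below are Python 2.7's (Lib/opcode.py); other versions differ.
NOP = 9
HAVE_ARGUMENT = 90
RETURN_VALUE, RAISE_VARARGS = 83, 130
JUMP_FORWARD = 110
UNCOND_ABS = {113, 119}                # JUMP_ABSOLUTE, CONTINUE_LOOP
COND_ABS = {111, 112, 114, 115}        # JUMP_IF_{FALSE,TRUE}_OR_POP, POP_JUMP_IF_{FALSE,TRUE}
BRANCH_REL = {93, 120, 121, 122, 143}  # FOR_ITER, SETUP_LOOP/EXCEPT/FINALLY/WITH


def reachable_bytes(code):
    """Return the set of byte offsets covered by instructions reachable from offset 0."""
    used, seen, work = set(), set(), [0]
    while work:
        pc = work.pop()
        # Follow one straight-line path until it ends or reaches an already-visited offset.
        while 0 <= pc < len(code) and pc not in seen:
            seen.add(pc)
            op = code[pc]
            size = 3 if op >= HAVE_ARGUMENT else 1
            if pc + size > len(code):
                break  # truncated instruction at the end: treat as a dead end
            used.update(range(pc, pc + size))
            arg = (code[pc + 1] | (code[pc + 2] << 8)) if size == 3 else 0
            nxt = pc + size
            if op in (RETURN_VALUE, RAISE_VARARGS):
                break
            if op == JUMP_FORWARD:
                pc = nxt + arg          # relative jump, no fall-through
            elif op in UNCOND_ABS:
                pc = arg                # absolute jump, no fall-through
            else:
                if op in COND_ABS:
                    work.append(arg)        # remember the taken side of the branch
                elif op in BRANCH_REL:
                    work.append(nxt + arg)  # loop exit / handler / block end
                pc = nxt                # fall through to the next instruction
    return used


def nop_unreachable(code):
    """Overwrite every byte not covered by a reachable instruction with NOP."""
    keep = reachable_bytes(code)
    return bytes(b if i in keep else NOP for i, b in enumerate(code))


# Toy input shaped like the challenge: a jump over junk, real code, then a truncated opcode.
demo = bytes([113, 6, 0,      # 0: JUMP_ABSOLUTE 6
              136, 104, 110,  # 3: junk that decodes as LOAD_DEREF 28264
              100, 0, 0,      # 6: LOAD_CONST 0
              83,             # 9: RETURN_VALUE
              135])           # 10: LOAD_CLOSURE with no argument bytes
print(list(nop_unreachable(demo)))  # [113, 6, 0, 9, 9, 9, 100, 0, 0, 83, 9]
```

Since bytes are overwritten in place rather than removed, every jump target and offset stays valid.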

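For debugging, I'd suggest a CPython build of your own, modified to trace the offsets executed in the pyc. There are still almost no debugging tools for pyc files, so dynamic debugging is quite limited.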
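Without a modified CPython, Python 3.7+ offers a rougher alternative: with sys.settrace and frame.f_trace_opcodes = True, the tracer receives an "opcode" event before each instruction, and frame.f_lasti gives that instruction's offset. This is a different technique from the one suggested above, and it only works for code the running interpreter can execute, so a Python 2 pyc like this one would still need a Python 2 build. A minimal sketch; trace_offsets and sample are hypothetical names.

```python
import sys


def trace_offsets(func, *args, **kwargs):
    """Call func and return the sorted offsets (f_lasti) of the instructions it executed.

    Only func's own frame is traced; nested calls are skipped.
    Requires Python 3.7+ for per-opcode trace events.
    """
    target = func.__code__
    executed = set()

    def local_tracer(frame, event, arg):
        frame.f_trace_opcodes = True  # (re)request opcode events for this frame
        if event == "opcode":
            executed.add(frame.f_lasti)
        return local_tracer

    def global_tracer(frame, event, arg):
        if frame.f_code is target:
            frame.f_trace_opcodes = True
            return local_tracer
        return None  # do not trace any other frame

    old = sys.gettrace()
    sys.settrace(global_tracer)
    try:
        func(*args, **kwargs)
    finally:
        sys.settrace(old)  # restore whatever tracer was active before
    return sorted(executed)


def sample(x):  # hypothetical function with two branches
    if x > 0:
        return "pos"
    return "neg"


print("x=1 :", trace_offsets(sample, 1))
print("x=-1:", trace_offsets(sample, -1))
```

Comparing the two offset lists shows which side of the branch each input took, which is the kind of information the modified-CPython approach would give for a pyc.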