Defeating Antivirus Real-time Protection From The Inside

Hello again! In this post I'd like to talk about the research I did some time ago on antivirus real-time protection mechanisms and the effective ways I found to evade them. These methods may even work for evading analysis in sandbox environments, but I haven't tested that yet.

The specific AV I tested these methods against was BitDefender. It performs real-time protection for every process in user-mode and detects suspicious behaviour patterns by monitoring calls to the Windows API.

Without further ado, let's jump right to it.

What is Real-time Protection?

Signature-based detection is still used, but it is not very effective. More and more malware uses polymorphism, metamorphism, encryption or code obfuscation to make itself extremely hard to detect with the old detection methods. Most new-generation AV software implements behavioral detection. It monitors every running process on the PC and looks for suspicious activity patterns that may indicate the computer has been infected with malware.

As an example, let's imagine a program that doesn't create any user interface (dialogs, windows etc.) and, as soon as it starts, wants to connect to an external server in Romania and download files from it. This kind of behaviour is extremely suspicious, and most AV software with real-time protection will stop such a process and flag it as dangerous, even though it may have been seen for the first time.

Now you may ask - how does such protection work and how does the AV know what the monitored process is doing? In the majority of cases, the AV injects its own code into the running process, which then hooks the specific Windows API functions that are of interest to the protection software. API hooking allows the AV to see exactly which function is called, when, and with what parameters. Cuckoo Sandbox, for example, does the same thing to generate its detailed report on how the running program interacts with the operating system.

Let's take a look at what the hook would look like for the CreateFileW API exported by the kernel32.dll library.

This is what the function code looks like in its original form:

76B73EFC > 8BFF                         MOV EDI,EDI
76B73EFE   55                           PUSH EBP
76B73EFF   8BEC                         MOV EBP,ESP
76B73F01   51                           PUSH ECX
76B73F02   51                           PUSH ECX
76B73F03   FF75 08                      PUSH DWORD PTR SS:[EBP+8]
76B73F06   8D45 F8                      LEA EAX,DWORD PTR SS:[EBP-8]
...
76B73F41   E8 35D7FFFF                  CALL <JMP.&API-MS-Win-Core-File-L1-1-0.C>
76B73F46   C9                           LEAVE
76B73F47   C2 1C00                      RETN 1C

Now if an AV were to hook this function, it would replace the first few bytes with a JMP instruction that redirects the execution flow to its own hook handler function. That way the AV would register the execution of this API together with all the parameters lying on the stack at that moment. When the AV hook handler finishes, it executes the original bytes that were replaced by the JMP instruction and jumps back into the API function so that the process can continue its execution.

This is what the function code would look like with the injected JMP instruction:

Hook handler:
1D001000     < main hook handler code - logging and monitoring >
...
1D001020     8BFF                       MOV EDI,EDI              ; original code that was replaced with the JMP is executed
1D001022     55                         PUSH EBP
1D001023     8BEC                       MOV EBP,ESP
1D001025    -E9 D72EB759                JMP kernel32.76B73F01    ; jump back to CreateFileW to instruction right after the hook jump

CreateFileW:
76B73EFC >-E9 FFD048A6                  JMP handler.1D001000     ; jump to hook handler
76B73F01   51                           PUSH ECX                 ; execution returns here after hook handler has done its job
76B73F02   51                           PUSH ECX
76B73F03   FF75 08                      PUSH DWORD PTR SS:[EBP+8]
76B73F06   8D45 F8                      LEA EAX,DWORD PTR SS:[EBP-8]
...
76B73F46   C9                           LEAVE
76B73F47   C2 1C00                      RETN 1C

There are multiple ways of hooking code, but this one is the fastest and adds very little overhead to code execution. Other hooking techniques involve injecting INT3 instructions or setting up the debug registers and handling the resulting exceptions with your own exception handlers, which then redirect execution to the hook handlers.
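The two relative jumps in the hooked listing above can be checked by hand. The following is a minimal sketch in Python, with a helper name of my own choosing (jmp_rel32_target), that decodes a 5-byte E9 jump and computes where it lands. It only does arithmetic on the bytes shown in the listing and does not touch any process memory.

```python
import struct

def jmp_rel32_target(address: int, encoded: bytes) -> int:
    """Return the destination of a 5-byte `E9 rel32` jump located at `address`."""
    if len(encoded) != 5 or encoded[0] != 0xE9:
        raise ValueError("expected a 5-byte E9 rel32 jump")
    # rel32 is a signed little-endian displacement, measured from the
    # address of the instruction that follows the jump (address + 5).
    (displacement,) = struct.unpack("<i", encoded[1:])
    return (address + 5 + displacement) & 0xFFFFFFFF

# Bytes copied from the listing above.
hook_jump = bytes.fromhex("E9FFD048A6")    # at CreateFileW, 0x76B73EFC
return_jump = bytes.fromhex("E9D72EB759")  # at the end of the handler, 0x1D001025

print(hex(jmp_rel32_target(0x76B73EFC, hook_jump)))    # 0x1d001000
print(hex(jmp_rel32_target(0x1D001025, return_jump)))  # 0x76b73f01
```

The first jump lands on the hook handler and the second one returns to the instruction right after the overwritten five bytes, which matches the comments in the listing.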

Now that you know how real-time protection works and how exactly it involves API hooking, I can proceed to explain the methods of bypassing it.

There are AV products on the market that perform real-time monitoring in kernel-mode (Ring0), but that is outside the scope of this post. I will focus only on bypassing the protections of AV products that perform monitoring in user-mode (Ring3).

The Unhooking Flashbang

As you already know, the real-time protection relies solely on its API hook handlers being executed. Only when the AV hook handler is executed can the protection software register the API call, monitor the parameters and continue mapping the process activity.

It is obvious that in order to completely disable the protection, we need to remove API hooks and as a result the protection software will become blind to everything we do.

In our own application, we control the whole process memory space. AV, with its injected code, is just an intruder trying to tamper with our software's functionality, but we are the king of our land.

Steps to take should be as follows:

  1. Enumerate all loaded DLL libraries in current process.
  2. Find entry-point address of every imported API function of each DLL library.
  3. Remove the injected hook JMP instruction by replacing it with the API's original bytes.

It all seems fairly simple until the point of restoring the API function's original code, as it was before the hook JMP was injected. Getting the original bytes from the hook handlers is out of the question, as there is no way to find out which part of the handler's code is the original API function prologue. So, how do we find the original bytes?

The answer is: Manually retrieve them by reading the respective DLL library file stored on disk. The DLL files contain all the original code.
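To illustrate why the on-disk copy is useful as a reference, here is a minimal sketch that compares two byte strings and reports where they differ. The function names (changed_offsets, starts_with_jmp_rel32) are hypothetical, and the inputs are simply the first eight bytes of CreateFileW taken from the two listings earlier in this post. The same comparison is what a hook scanner would do to report patched functions.

```python
def changed_offsets(reference: bytes, current: bytes) -> list:
    """Return the offsets at which `current` differs from `reference`."""
    return [i for i, (a, b) in enumerate(zip(reference, current)) if a != b]

def starts_with_jmp_rel32(code: bytes) -> bool:
    """True if the code begins with the opcode of a 5-byte relative JMP."""
    return len(code) >= 5 and code[0] == 0xE9

# First eight bytes of CreateFileW, copied from the listings above.
original = bytes.fromhex("8BFF558BEC5151FF")  # unhooked prologue
hooked = bytes.fromhex("E9FFD048A65151FF")    # prologue with the hook JMP

print(changed_offsets(original, hooked))  # [0, 2, 3, 4]
print(starts_with_jmp_rel32(hooked))      # True
print(starts_with_jmp_rel32(original))    # False
```

Note that offset 1 is not reported: the second byte of the injected JMP happens to be 0xFF, the same as the second byte of MOV EDI,EDI. A byte-by-byte diff can therefore show fewer changed bytes than the hook actually wrote.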

In order to find the original first 16 bytes (which is more than enough) of CreateFileW API, the process is as follows:

  1. Read the contents of kernel32.dll file from Windows system folder into memory. I will call this module raw_module.
  2. Get the base address of the imported kernel32.dll module in our current process. I will call the imported module imported_module.
  3. Fix the relocations of the manually loaded raw_module with base address of imported_module (retrieved in step 2). This will make all fixed address memory references look the same as they would in the current imported_module (complying with ASLR).
  4. Parse the raw_module export table and find the address of CreateFileW API.
  5. Copy the original 16 bytes from the found exported API address to the address of the currently imported API where the JMP hook resides.

This will effectively overwrite the current JMP with the original bytes of any API.
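Step 3 is the least obvious one, so here is a small sketch of the arithmetic behind a single relocation fix-up. It assumes a 32-bit image and the common IMAGE_REL_BASED_HIGHLOW relocation type, where a 32-bit absolute address is adjusted by the difference between the actual and the preferred base. The function name and all addresses below are made up for illustration; a real loader would walk every entry in the base relocation table.

```python
import struct

def apply_highlow_fixup(image: bytearray, offset: int,
                        preferred_base: int, actual_base: int) -> None:
    """Rebase one 32-bit absolute address stored at `offset` in `image`."""
    # The delta is the same for every fix-up in the module.
    delta = (actual_base - preferred_base) & 0xFFFFFFFF
    (value,) = struct.unpack_from("<I", image, offset)
    struct.pack_into("<I", image, offset, (value + delta) & 0xFFFFFFFF)

# Hypothetical example: a 4-byte pointer stored at offset 4 of a tiny buffer.
image = bytearray(8)
struct.pack_into("<I", image, 4, 0x10001234)  # address built for base 0x10000000

apply_highlow_fixup(image, 4, preferred_base=0x10000000, actual_base=0x76B30000)
print(hex(struct.unpack_from("<I", image, 4)[0]))  # 0x76b31234
```

After every fix-up has been applied with the base address of imported_module, absolute addresses embedded in the code of raw_module match the ones in the loaded copy.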

If you want to read more on parsing Portable Executable files, the best tutorial was written by Iczelion (the website has a great 90's feel too!). Among many subjects, you can learn about parsing the import table and export table of PE files.

When parsing the import table, you need to keep in mind that Microsoft, with the release of Windows 7, introduced a strange creature called the API Set Schema. It is very important to properly parse the imports pointing to these DLLs. There is a very good explanation of this entity by Geoff Chappell in his article The API Set Schema.

Stealth Calling

The API unhooking method may fool most of the AV products that perform their behavioral analysis in user-mode. It is not enough, however, to fool automated sandbox analysis tools like Cuckoo Sandbox. Cuckoo is apparently able to detect when the API hooks it put in place have been removed. That makes the previous method ineffective in the long run.

I thought of another method of bypassing AV/sandbox monitoring. I am positive it would work, even though I have yet to put it into practice. There is almost certainly malware out there that already implements this technique.

First of all, I must mention that the ntdll.dll library serves as the direct passage between user-mode and kernel-mode. Its exported APIs communicate directly with the Windows kernel by using syscalls. Most of the other Windows libraries eventually call APIs from ntdll.dll.

Let's take a look at the code of ZwCreateFile API from ntdll.dll on Windows 7 in WOW64 mode:

77D200F4 > B8 52000000                  MOV EAX,52
77D200F9   33C9                         XOR ECX,ECX
77D200FB   8D5424 04                    LEA EDX,DWORD PTR SS:[ESP+4]
77D200FF   64:FF15 C0000000             CALL DWORD PTR FS:[C0]
77D20106   83C4 04                      ADD ESP,4
77D20109   C2 2C00                      RETN 2C

Basically, what it does is pass EAX = 0x52, with a pointer to the stack arguments in EDX, to the function whose address is stored in the TIB at offset 0xC0. The call switches the CPU mode from 32-bit to 64-bit and executes the NtCreateFile syscall in Ring0. 0x52 is the syscall number for NtCreateFile on my Windows 7 system, but syscall numbers differ between Windows versions and even between Service Packs, so it is never a good idea to rely on these numbers. You can find more information about syscalls on Simone Margaritelli's blog.

Most protection software will hook the ntdll.dll APIs, as this is the lowest level you can get to, right in front of the kernel's doorstep. For example, if you only hook CreateFileW in kernel32.dll, which eventually calls ZwCreateFile in ntdll.dll, you will never catch direct API calls to ZwCreateFile. A hook in ZwCreateFile, on the other hand, will be triggered every time CreateFileW or CreateFileA is called, as they both must eventually call the lowest-level API that communicates directly with the kernel.

There is always one loaded instance of any imported DLL module. That means that if an AV or sandbox solution wants to hook the API of a chosen DLL, it will find that module in the current process's list of loaded modules. By following the DLL module's export table, it will find and hook the exported API function of interest.
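To make the role of the first instruction concrete, here is a minimal sketch that reads the syscall number out of the stub bytes shown in the listing above. The function name is hypothetical, and the sketch assumes the stub starts with MOV EAX, imm32 (opcode B8), as it does in this Windows 7 WOW64 listing. Other Windows versions may lay the stub out differently.

```python
import struct

def syscall_number_from_stub(stub: bytes) -> int:
    """Return the immediate loaded into EAX by a stub starting with B8 imm32."""
    if len(stub) < 5 or stub[0] != 0xB8:
        raise ValueError("stub does not start with MOV EAX, imm32")
    # The 32-bit immediate follows the opcode in little-endian order.
    (number,) = struct.unpack_from("<I", stub, 1)
    return number

# First three instructions of ZwCreateFile, copied from the listing above.
stub = bytes.fromhex("B852000000" "33C9" "8D542404")
print(hex(syscall_number_from_stub(stub)))  # 0x52
```

Because the number is read from the DLL itself rather than hard-coded, the result always corresponds to the Windows build the file came from.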

Now, to the interesting part. What if we copied the code snippet I pasted above from ntdll.dll and implemented it in our own application's code? This would be an identical copy of the ntdll.dll code, executing the 0x52 syscall from our own code section. No user-mode protection software would ever find out about it.
It is an ideal method of bypassing any API hooks without actually detecting and unhooking them!

The thing is, as I mentioned before, we cannot trust the syscall numbers, as they differ between Windows versions. What we can do, though, is read the whole ntdll.dll library file from disk and manually map it into the current process's address space. That way we will be able to execute code that was prepared exclusively for our version of Windows, while having an exact copy of ntdll.dll outside of the AV's reach.

I have only mentioned ntdll.dll so far because this DLL doesn't have any other dependencies. That means it doesn't have to load any other DLLs and call their APIs. Its exported system-call stubs pass execution directly to the kernel and not to other user-mode DLLs. This shouldn't stop you from manually importing any other DLL (like kernel32.dll or user32.dll) in the same way, as long as you walk through the DLL's import table and populate it manually, recursively importing all the DLLs the library depends on.
That way the manually mapped modules will use only other manually mapped dependencies and they will never have to touch the modules that were loaded into the address space when the program was started.

Afterwards, it is only a matter of calling the API functions from your manually mapped DLL files in memory and you can be sure that no AV or sandbox software will ever be able to detect such calls or hook them in user-mode.

Conclusion

There is certainly nothing that user-mode AV or sandbox software can do about the evasion methods I described above, other than going deeper into Ring0 and monitoring the process activity from the kernel.

The unhooking method can be countered by protection software re-hooking the API functions, but then the APIs can be unhooked again and the cat and mouse game will never end. In my opinion the stealth calling method is much more professional, as it is completely unintrusive, but it is a bit harder to implement. I may spend some time on an implementation that I will test against all the popular sandbox analysis software, and publish the results.

As always, I'm happy to hear your feedback or ideas.

You can hit me up on Twitter @mrgretzky or send an email to kuba@breakdev.org.

Enjoy and see you next time!