
X86 Shellcode Obfuscation - Part 1

Kuba Gretzky

May 11, 2016 • 8 min read

I decided to research shellcode obfuscation in order to see how hard it would be to develop a tool that can take any binary x86 shellcode and generate a completely unique version of it.

Please note that such a tool is most useful in scenarios where the shellcode has to be saved on disk (e.g. embedded inside an executable file). Obfuscated shellcode will grow considerably in size, which may not be desirable, especially if it is to be embedded inside an exploit payload, where small shellcode size is often the highest priority.

The main reason for having a shellcode obfuscator is to bypass static or run-time signature detection implemented by IDS or AV products. Take Metasploit as an example. Its shellcode payloads have been public for many years, and by now most major IDS/AV solutions are able to detect them by searching their vast databases of malware signatures.

Manually modifying shellcodes or writing new ones from scratch in order to avoid detection is a never-ending job and a waste of time in the long run. My focus will be to write a tool that will take any shellcode in binary form and modify the instructions in such a way that will preserve its functionality, but make the code unique and thus harder to analyze.

I need to state here that this article won't cover the subject of preventing detection through behavioral analysis or code emulation implemented by various security products.

Some people may say that a good way to make shellcode bypass signature detection is to use the Shikata Ga Nai encoder. The encoded shellcode decrypts itself at run time using XOR. This means the memory block where the shellcode resides needs to be writable, which is often not the case. This method is also not immune to run-time signature detection during code emulation by security software.

Approach

Shellcode is normally written so that it executes successfully from any memory location. That means all jumps and memory references are relative to its current position. There are no fixed memory locations.

There are exceptions, though, like this one, which comes directly from one of Metasploit's payloads:

66   0000008d: 5d                               POP EBP
67   0000008e: 6a01                             PUSH 0x1
68   00000090: 8d85b2000000                     LEA EAX, [EBP+0xb2] ;<-- fixed offset that will break things when we insert code after this line!
69   00000096: 50                               PUSH EAX
70   00000097: 68318b6f87                       PUSH DWORD 0x876f8b31
71   0000009c: ffd5                             CALL EBP

In order to generate obfuscated code, the tool needs to:

Be able to insert new instructions before, after and between existing ones
Be able to fix relative jump offsets when instructions are inserted or deleted
Be able to replace existing instructions with alternative ones, making sure that no static data is left behind that may aid in creation of detection signatures
Be able to insert jump instructions that will change the execution flow and randomly divide the shellcode into separate blocks

The first requirement is not that hard to satisfy, but it can be a little tricky. All call/jump instructions refer to a relative memory location, which means something like "jump X bytes forward from this instruction" or "jump X bytes backward". So if we insert any bytes between the jump instruction and its destination, the relative jump offset needs to be corrected by the number of bytes we have inserted.

The second requirement involves keeping track of all relative jumps in the shellcode and adjusting the jump offsets whenever we increase or decrease the amount of code or data between the jump's starting offset and its destination.

The third requirement calls for proper fingerprinting of the disassembled instructions and recreating them as different instructions, while preserving the original functionality.

The fourth requirement makes it possible to change the execution flow of the shellcode by dividing it into many separate blocks of code and completely changing the order in which the instructions were originally placed.

Fixing jumps

You may be thinking at this point that fixing the jump instructions shouldn't be that hard: we just adjust the relative memory offset and that's it. The issue is that jumps may be short or far. A short jump stores its relative offset in 1 byte, whereas a far jump stores it in 4 bytes. (Intel's manuals call the 4-byte relative form a near jump; this article keeps calling it far.) Take a look at the jnz 401020 instruction placed at memory location 401000:

Short jump: 00401000: 75 1E
75 - JNZ opcode

1E - jump 0x1E bytes forward (0x401020 - 0x401000 - instruction length [2 bytes])

Far jump: 00401000: 0F 85 1A 00 00 00
0F 85 - JNZ opcode

1A 00 00 00 - jump 0x1A bytes forward (0x401020 - 0x401000 - instruction length [6 bytes])

As you can see, both instructions perform the same operation, but are written differently. Also the relative memory offset used with the instruction is affected by the instruction length itself. This is important to keep in mind.
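The two encodings above can be reproduced with a few lines of Python. This is only a sketch: encode_jnz is a hypothetical helper, not part of the tool, and it handles JNZ alone. The displacement is measured from the end of the instruction, so the instruction length (2 bytes for the short form, 6 bytes for the far form) is subtracted from the distance.

```python
import struct


def encode_jnz(src: int, dst: int, short: bool) -> bytes:
    """Encode a JNZ located at address src that lands on address dst.

    Hypothetical helper for illustration; it only knows about JNZ.
    """
    if short:
        length = 2                       # opcode 75 + rel8
        rel = dst - (src + length)
        if not -128 <= rel <= 127:
            raise ValueError("destination out of range for a short jump")
        return b"\x75" + struct.pack("<b", rel)
    length = 6                           # opcode 0F 85 + rel32
    rel = dst - (src + length)
    return b"\x0f\x85" + struct.pack("<i", rel)


print(encode_jnz(0x401000, 0x401020, short=True).hex(" "))   # 75 1e
print(encode_jnz(0x401000, 0x401020, short=False).hex(" "))  # 0f 85 1a 00 00 00
```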

Now let's take a look at the following code:

00401000: 75 7E : jnz 00401080
...
00401080: 90 : nop

The JNZ instruction would be 75 7E, but what will happen if we insert 4 bytes between 00401000 and 00401080 addresses? It would make sense to increase the relative memory offset by 4, which would become 75 82. See what happens now to that instruction after it is fixed like this:

00401000: 75 82 : jnz 00400F84
...
-- added 4 bytes --
...
00401084: nop

The instruction jumps backwards now? Yes, that's right. A short jump's relative memory offset is a signed 1-byte value, which means its range is -128 to 127. 0x82 in this example is in fact treated as -126.
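The sign interpretation is easy to check in Python. The helper names below are made up for this illustration; the sketch decodes a rel8 operand the way the CPU does and recomputes the destinations from the example.

```python
import struct


def rel8(operand: int) -> int:
    """Interpret one operand byte as a signed 8-bit displacement."""
    return struct.unpack("<b", bytes([operand]))[0]


def short_jump_target(address: int, operand: int) -> int:
    """Destination of a 2-byte short jump located at address."""
    return address + 2 + rel8(operand)


print(rel8(0x7E))                                # 126
print(rel8(0x82))                                # -126
print(hex(short_jump_target(0x401000, 0x7E)))    # 0x401080
print(hex(short_jump_target(0x401000, 0x82)))    # 0x400f84
```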

We need to be very careful when inserting bytes. The tool also needs to detect if the instruction needs to be converted from short to far. The conversion itself makes the instruction 4 bytes longer, which pushes the destination another 4 bytes forward, so the proper fix for the previous example looks as follows:

00401000: 0F 85 82 00 00 00 : jnz 00401088
...
-- added 4 bytes --
...
00401088: nop

Every time bytes are inserted, the tool needs to look for affected jump instructions and detect if fixing the relative address also involves replacing the affected jump instruction with the longer alternative.
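A minimal sketch of that bookkeeping follows. It is not the tool's actual code: the Jump class and both functions are hypothetical, only conditional jumps are modelled (2 bytes short, 6 bytes far), and inserted bytes are assumed to land just before whatever currently sits at the insertion address. Because promoting a jump adds 4 more bytes, the check is repeated until nothing changes.

```python
from dataclasses import dataclass


@dataclass
class Jump:
    addr: int     # address of the jump instruction itself
    target: int   # absolute address the jump lands on
    short: bool   # True: 2-byte rel8 form, False: 6-byte rel32 form

    @property
    def size(self) -> int:
        return 2 if self.short else 6

    @property
    def rel(self) -> int:
        # The displacement is counted from the end of the instruction.
        return self.target - (self.addr + self.size)


def shift(jumps, at, count):
    """Move every address at or beyond `at` forward by `count` bytes."""
    for j in jumps:
        if j.addr >= at:
            j.addr += count
        if j.target >= at:
            j.target += count


def insert_bytes(jumps, at, count):
    """Record an insertion and promote short jumps that no longer fit."""
    shift(jumps, at, count)
    changed = True
    while changed:  # a promotion adds bytes too, so repeat until stable
        changed = False
        for j in jumps:
            if j.short and not -128 <= j.rel <= 127:
                j.short = False
                shift(jumps, j.addr + 2, 4)  # the jump grew from 2 to 6 bytes
                changed = True


jumps = [Jump(addr=0x00, target=0x80, short=True)]
insert_bytes(jumps, at=0x40, count=4)
print(jumps[0].short, hex(jumps[0].target), hex(jumps[0].rel))
# False 0x88 0x82
```

Note that the destination ends up 8 bytes further than before: 4 for the inserted bytes and 4 for the jump that had to grow.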

Unfortunately there is also another complication that needs handling.

Problematic jumps

Several instructions that operate on short relative memory offsets don't have far alternatives, so we cannot simply swap one instruction for another. Instead, to handle such instructions properly, we need to replace each of them with several other instructions while retaining the original functionality.

Problematic instructions are: LOOP, JECXZ, LOOPNZ, LOOPZ

Fortunately I found a very helpful excerpt from the book The Art of Assembly Language Programming by Randall Hyde that covers the replacement of aforementioned instructions:

JECXZ

jecxz Target

becomes:

test ecx, ecx		;Sets the zero flag if ecx=0
jz Target

LOOP

loop Target

becomes:

dec ecx
jnz Target

LOOPNZ/LOOPZ

loopnz/loopz Target

becomes:

  jz/jnz quit		;loopnz stops looping when ZF=1, loopz when ZF=0
  dec ecx
  jz quit2
  jmp Target
quit:
  dec ecx
quit2:

In the obfuscation tool, I decided to replace the problematic instructions with their longer alternatives at the beginning in order to avoid issues later.
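The replacements above could be emitted as raw bytes roughly like this. It is a sketch under assumptions: the expand function is hypothetical, the byte values are for 32-bit mode (49 is DEC ECX, 85 C9 is TEST ECX, ECX), and the final jump to Target always uses the 4-byte form so it never needs promoting later. Unlike the original instructions, the replacement sequences modify the flags through DEC and TEST.

```python
import struct

JZ_SHORT, JNZ_SHORT = 0x74, 0x75
DEC_ECX = b"\x49"
TEST_ECX_ECX = b"\x85\xc9"


def _rel32_branch(opcode: bytes, src: int, dst: int) -> bytes:
    """Encode a rel32 branch whose first byte sits at address src."""
    return opcode + struct.pack("<i", dst - (src + len(opcode) + 4))


def expand(opcode: int, src: int, dst: int) -> bytes:
    """Replace JECXZ/LOOP/LOOPNZ/LOOPZ at src (destination dst)."""
    if opcode == 0xE3:                       # JECXZ
        return TEST_ECX_ECX + _rel32_branch(b"\x0f\x84", src + 2, dst)
    if opcode == 0xE2:                       # LOOP
        return DEC_ECX + _rel32_branch(b"\x0f\x85", src + 1, dst)
    if opcode in (0xE0, 0xE1):               # LOOPNZ, LOOPZ
        bail = JZ_SHORT if opcode == 0xE0 else JNZ_SHORT
        return (
            bytes([bail, 8])                 # jz/jnz quit
            + DEC_ECX                        # dec ecx
            + bytes([JZ_SHORT, 6])           # jz quit2
            + _rel32_branch(b"\xe9", src + 5, dst)  # jmp Target
            + DEC_ECX                        # quit: dec ecx
        )                                    # quit2: falls through here
    raise ValueError("not one of the problematic instructions")


print(expand(0xE2, 0x100, 0x100).hex(" "))   # 49 0f 85 f9 ff ff ff
print(expand(0xE3, 0x100, 0x120).hex(" "))   # 85 c9 0f 84 18 00 00 00
```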

Data blocks

Most shellcodes embed some form of binary data that is not code. This may be a command to execute, an IP address for a reverse shell to connect back to, or anything else that shouldn't be treated as code. The disassembler cannot make the distinction between code and data on its own. Any binary data can and will be interpreted as code.

Before we let the tool perform any code obfuscation, we need the ability for the user to specify where real code instructions are and make it treat the rest as data that can be skipped during the obfuscation process.
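One simple way to represent that input, shown here purely as an assumption about how it could be done, is a list of (start, end) offsets marking code. The hypothetical split_regions function below turns such a list into an ordered sequence of code and data regions, so that only the code regions are handed to the obfuscator.

```python
def split_regions(size, code_ranges):
    """Return (kind, start, end) tuples covering the whole [0, size) buffer.

    code_ranges holds user-supplied (start, end) offsets, end exclusive.
    Everything not listed is treated as data and left untouched.
    """
    regions = []
    pos = 0
    for start, end in sorted(code_ranges):
        if start < pos or end > size or start >= end:
            raise ValueError("code ranges must be in bounds and not overlap")
        if start > pos:
            regions.append(("data", pos, start))
        regions.append(("code", start, end))
        pos = end
    if pos < size:
        regions.append(("data", pos, size))
    return regions


for region in split_regions(0x20, [(0x00, 0x10), (0x18, 0x20)]):
    print(region)
# ('code', 0, 16)
# ('data', 16, 24)
# ('code', 24, 32)
```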

Obfuscation

Now, this is where the fun part starts.

The tool is able to properly parse the disassembled code. It is able to insert and delete instructions while fixing any relative jumps that may become affected in the process. It is now time to make the tool do the real work and start replacing instructions with random and unique equivalents.

The more support we add for different instructions, the more unique the output shellcode will become.

I will start slowly with one instruction and add support for more in future parts of this article series.

MOV REG, IMM

(moves immediate static value to register)

Examples:

B8 EF BE AD DE   : MOV EAX, 0xDEADBEEF
66 BA FE CA      : MOV DX, 0xCAFE
B1 77            : MOV CL, 0x77

This is one of the simplest instructions, but if we don't replace these, they may become a great source of information for writing effective signatures to detect our shellcode.

I won't go into much detail here about how x86 instructions are encoded. If you want to learn more (and you should), you can find useful information in the following links:

x86 Instruction Encoding Revealed: Bit Twiddling for Fun and Profit

Encoding Real x86 Instructions

X86 Opcode and Instruction Reference

The MOV R16/32, IMM16/32 instruction starts with the 0xB8 opcode. The register value that is used with the instruction is saved in the lowest 3 bits of this opcode.

The register values are as follows:

EAX/AX/AL : 000b / 00h
ECX/CX/CL : 001b / 01h
EDX/DX/DL : 010b / 02h
EBX/BX/BL : 011b / 03h
ESP/SP/AH : 100b / 04h
EBP/BP/CH : 101b / 05h
ESI/SI/DH : 110b / 06h
EDI/DI/BH : 111b / 07h

To form the opcode value with the register we want, we need to add the register value to the opcode starting value (0xB8).

B8 44 33 22 11 : MOV EAX, 11223344h
BA 44 33 22 11 : MOV EDX, 11223344h

If we want to use 16-bit registers and values instead of 32-bit, the trick is to prefix the opcode with 0x66 byte.

66 B8 22 11 : MOV AX, 1122h
66 BA 22 11 : MOV DX, 1122h

The MOV R8, IMM8 is very similar to the previous one. The only difference is that the opcode starting value is 0xB0.

B0 11 : MOV AL, 11h
B2 11 : MOV DL, 11h
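The encoding rules above fit into a short Python function. The name encode_mov_imm is made up for this sketch; the register order in the three lists follows the table of register values.

```python
REG32 = "EAX ECX EDX EBX ESP EBP ESI EDI".split()
REG16 = "AX CX DX BX SP BP SI DI".split()
REG8 = "AL CL DL BL AH CH DH BH".split()


def encode_mov_imm(reg: str, imm: int) -> bytes:
    """Encode MOV reg, imm by adding the register value to the base opcode."""
    reg = reg.upper()
    if reg in REG32:
        opcode = 0xB8 + REG32.index(reg)
        return bytes([opcode]) + (imm & 0xFFFFFFFF).to_bytes(4, "little")
    if reg in REG16:
        opcode = 0xB8 + REG16.index(reg)
        # The 0x66 prefix switches the instruction to 16-bit operands.
        return b"\x66" + bytes([opcode]) + (imm & 0xFFFF).to_bytes(2, "little")
    if reg in REG8:
        opcode = 0xB0 + REG8.index(reg)
        return bytes([opcode, imm & 0xFF])
    raise ValueError("unknown register: " + reg)


print(encode_mov_imm("EDX", 0x11223344).hex(" "))  # ba 44 33 22 11
print(encode_mov_imm("AX", 0x1122).hex(" "))       # 66 b8 22 11
print(encode_mov_imm("DL", 0x11).hex(" "))         # b2 11
```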

To summarize, in order to properly detect the MOV REG, IMM instructions we need to look for an opcode byte in the range of 0xB0 to 0xBF (0xB0-0xB7 for 8-bit registers, 0xB8-0xBF for 16/32-bit registers) and also keep in mind that the opcode may be prefixed with 0x66, putting the instruction in 16-bit mode.
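The detection rule can be sketched as a small parser. This is an illustration only, with a hypothetical function name; it looks at the bytes of a single instruction and assumes the only prefix that can appear is 0x66 (which has no effect on the 8-bit form).

```python
REG32 = "EAX ECX EDX EBX ESP EBP ESI EDI".split()
REG16 = "AX CX DX BX SP BP SI DI".split()
REG8 = "AL CL DL BL AH CH DH BH".split()


def parse_mov_reg_imm(code: bytes):
    """Return (register, immediate, length) or None if not MOV REG, IMM."""
    pos = 0
    has_66 = len(code) > 0 and code[0] == 0x66
    if has_66:
        pos = 1
    if pos >= len(code):
        return None
    opcode = code[pos]
    if not 0xB0 <= opcode <= 0xBF:
        return None
    reg = opcode & 0x07                  # register lives in the low 3 bits
    if opcode < 0xB8:
        width, names = 1, REG8
    elif has_66:
        width, names = 2, REG16
    else:
        width, names = 4, REG32
    imm_bytes = code[pos + 1:pos + 1 + width]
    if len(imm_bytes) != width:
        return None                      # truncated instruction
    return names[reg], int.from_bytes(imm_bytes, "little"), pos + 1 + width


print(parse_mov_reg_imm(bytes.fromhex("b8efbeadde")))  # ('EAX', 3735928559, 5)
print(parse_mov_reg_imm(bytes.fromhex("66bafeca")))    # ('DX', 51966, 4)
print(parse_mov_reg_imm(bytes.fromhex("b177")))        # ('CL', 119, 2)
```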

We will convert each instruction into several ADD, SUB or XOR instructions that will perform computation of the original immediate value.

As an example, this is what the obfuscation will look like:

Before:

BA 44 33 22 11 : MOV EDX, 0x11223344

After:

BA 38 2C 30 A2    : MOV EDX, 0xA2302C38
81 C2 BD 85 4F D8 : ADD EDX, 0xD84F85BD
81 EA E0 5C 59 BF : SUB EDX, 0xBF595CE0
81 F2 1C 9A 82 23 : XOR EDX, 0x23829A1C
81 F2 4D FC 86 89 : XOR EDX, 0x8986FC4D

which gives:

EDX = ((0xA2302C38 + 0xD84F85BD - 0xBF595CE0) ^ 0x23829A1C) ^ 0x8986FC4D = 0x11223344 (all arithmetic modulo 2^32)

As you can see, the code is now completely different, yet after the last instruction is executed, EDX holds the original value of 0x11223344. We can generate as many computation instructions as we want.
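One way to generate such a chain, sketched here under my own assumptions rather than taken from the tool, is to work backwards from the wanted value: pick a random operation and constant, apply the inverse operation to the current value, and repeat. The value left at the end becomes the immediate of the initial MOV. The function names and the steps and rng parameters are hypothetical.

```python
import random

OPS = {
    "ADD": lambda a, b, mask: (a + b) & mask,
    "SUB": lambda a, b, mask: (a - b) & mask,
    "XOR": lambda a, b, mask: a ^ b,
}
INVERSE = {"ADD": "SUB", "SUB": "ADD", "XOR": "XOR"}


def obfuscate_mov(value, bits=32, steps=4, rng=random):
    """Return (start, chain) so that applying chain to start yields value."""
    mask = (1 << bits) - 1
    current = value & mask
    chain = []
    for _ in range(steps):
        op = rng.choice(list(OPS))
        constant = rng.randrange(1, mask + 1)
        # Undo the operation now so that running it later restores the value.
        current = OPS[INVERSE[op]](current, constant, mask)
        chain.append((op, constant))
    chain.reverse()  # the chain was built last-to-first
    return current, chain


def evaluate(start, chain, bits=32):
    """Run the chain the way the CPU would, wrapping at the register width."""
    mask = (1 << bits) - 1
    current = start & mask
    for op, constant in chain:
        current = OPS[op](current, constant, mask)
    return current


start, chain = obfuscate_mov(0x11223344)
print(f"MOV EDX, {start:#010x}")
for op, constant in chain:
    print(f"{op} EDX, {constant:#010x}")
assert evaluate(start, chain) == 0x11223344
```

The same evaluate function confirms the hand-written example: starting from 0xA2302C38 and applying ADD 0xD84F85BD, SUB 0xBF595CE0, XOR 0x23829A1C and XOR 0x8986FC4D gives 0x11223344.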

Wrapping up

I decided to write this tool in Python, considering it would be more approachable for a wider audience. Although I performed a bit of simple disassembly myself to fingerprint several instruction types, I needed a full-blown disassembler library that would detect the length of each disassembled instruction.

For that purpose I used the diStorm3 disassembler library by Gil Dabah. It is fast and easy to use. I thought using Capstone was a bit of an overkill for my needs.

The current version of the tool obfuscates only MOV REG, IMM instructions, but support for more instructions is coming soon.

Here is a dud example shellcode that I've prepared for testing purposes to demonstrate the obfuscation:

Input:

0    00000000: fc                               CLD
1    00000001: b105                             MOV CL, 0x5
2    00000003: 80fc02                           CMP AH, 0x2
3    00000006: 7504                             JNZ 0xc
4    00000008: 66b88888                         MOV AX, 0x8888
5    0000000c: ba44332211                       MOV EDX, 0x11223344
6    00000011: e2f9                             LOOP 0xc

Output:

0    00000000: fc                               CLD
1    00000001: b1a2                             MOV CL, 0xa2
2    00000003: 82c1c4                           ADD CL, 0xc4
3    00000006: 82e961                           SUB CL, 0x61
4    00000009: 80fc02                           CMP AH, 0x2
5    0000000c: 7518                             JNZ 0x26
6    0000000e: 66b80596                         MOV AX, 0x9605
7    00000012: 6681c0ce9a                       ADD AX, 0x9ace
8    00000017: 6681c0a039                       ADD AX, 0x39a0
9    0000001c: 6681f04371                       XOR AX, 0x7143
10   00000021: 6681c0886d                       ADD AX, 0x6d88
11   00000026: ba44351577                       MOV EDX, 0x77153544
12   0000002b: 81f2fbd13236                     XOR EDX, 0x3632d1fb
13   00000031: 81f299f2d3ed                     XOR EDX, 0xedd3f299
14   00000037: 81c22c9ffb09                     ADD EDX, 0x9fb9f2c
15   0000003d: 81eae2468266                     SUB EDX, 0x668246e2
16   00000043: 81f2345d4f41                     XOR EDX, 0x414f5d34
17   00000049: 49                               DEC ECX
18   0000004a: 75da                             JNZ 0x26

The source code is fully available on GitHub.

DOWNLOAD

Source code

To be continued...

This is the end of Part 1 of this blog series. The next part will cover obfuscation of other instructions.

Stay tuned!

You can follow me on Twitter and Google+.

If you have any questions or suggestions, please post them in the comments below.

Update:

Part 2 is out!

X86 Shellcode Obfuscation - Part 2

EOF
