X86 Shellcode Obfuscation - Part 1
I decided to research shellcode obfuscation to see how hard it would be to develop a tool that can take any binary x86 shellcode and generate a completely unique version of it.
Please note that such a tool is most useful in scenarios where the shellcode has to be saved on disk (e.g. embedded inside an executable file). Obfuscated shellcode greatly increases in size, which may not be desirable, especially if it is to be embedded inside an exploit payload, where small shellcode size is often the highest priority.
The main reason for having a shellcode obfuscator is to bypass static or run-time signature detection implemented by IDS or AV products. As an example, take Metasploit. Its shellcode payloads have been public for many years, and by now most major IDS/AV solutions are able to detect them by searching their vast databases of malware signatures.
Manually modifying shellcode or writing new shellcode from scratch in order to avoid detection is a never-ending job and a waste of time in the long run. My focus will be to write a tool that takes any shellcode in binary form and modifies its instructions in a way that preserves functionality but makes the code unique and thus harder to analyze.
I need to state here that this article won't cover the subject of preventing detection through behavioral analysis or code emulation implemented by various security products.
Some people may say that a good way to make shellcode bypass signature detection is to use the Shikata Ga Nai encoder. Shellcode encoded this way decrypts itself at run time using XOR. This means the memory block where the shellcode resides needs to be writable, which is often not the case. This method is also not immune to run-time signature detection during code emulation by security software.
Approach
Shellcode is written so that it executes successfully from any memory location. That means all jumps and memory references are relative to its current position; there are no fixed memory addresses.
There are exceptions though, like this one that comes directly from one of Metasploit's payloads:
0000008d: 5d POP EBP
0000008e: 6a01 PUSH 0x1
00000090: 8d85b2000000 LEA EAX, [EBP+0xb2] ;<-- fixed offset that will break things when we insert code after this line!
00000096: 50 PUSH EAX
00000097: 68318b6f87 PUSH DWORD 0x876f8b31
0000009c: ffd5 CALL EBP
In order to generate obfuscated code, the tool needs to:
- Be able to insert new instructions before, after and between existing ones
- Be able to fix relative jump offsets when instructions are inserted or deleted
- Be able to replace existing instructions with alternative ones, making sure that no static data is left behind that may aid in creation of detection signatures
- Be able to insert jump instructions that will change the execution flow and randomly divide the shellcode into separate blocks
The first requirement is not that hard to satisfy, but it can be a little bit tricky. All call/jump instructions refer to relative memory locations, which means something like "jump X bytes forward from this instruction" or "jump X bytes backward". That is, if we insert any bytes between a jump instruction and its destination, the relative jump offset needs to be corrected by the number of bytes we have inserted.
The second requirement involves keeping track of all relative jumps in the shellcode and adjusting the jump offsets whenever we increase or decrease the amount of code between a jump's starting offset and its destination.
The third requirement calls for proper fingerprinting of the disassembled instructions and recreating them as different instructions while preserving the original functionality.
The fourth requirement lets the tool change the execution flow of the shellcode, dividing it into many separate blocks of code and completely changing the order in which the instructions were originally placed.
Fixing jumps
You may be thinking at this moment that fixing the jump instructions shouldn't be that hard. We just need to adjust the relative memory offset and that's it. The issue is that jumps may be short or far. A short jump stores its offset in 1 byte, whereas a far jump stores it in 4 bytes. (Intel's manuals call the 4-byte form a near jump with a 32-bit displacement; this article keeps the term far for it.) Take a look at the example of a jnz 401020 instruction that resides at memory location 401000:
Short jump: 00401000: 75 1E
75 - JNZ opcode
1E - jump 0x1E bytes forward (0x401020 - 0x401000 - instruction length [2 bytes])
Far jump: 00401000: 0F 85 1A 00 00 00
0F 85 - JNZ opcode
1A 00 00 00 - jump 0x1A bytes forward (0x401020 - 0x401000 - instruction length [6 bytes])
As you can see, both instructions perform the same operation, but are written differently. Also the relative memory offset used with the instruction is affected by the instruction length itself. This is important to keep in mind.
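The offset arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and are not taken from the tool's source. The point it shows is that the offset is always measured from the end of the jump instruction, so the same destination yields a different offset in each form.

```python
import struct

def encode_jnz_short(src, dst):
    """Encode `jnz dst` at address `src` in the 2-byte form (75 rel8)."""
    rel = dst - (src + 2)                 # offset counts from the end of the instruction
    if not -128 <= rel <= 127:
        raise ValueError("destination out of range for a short jump")
    return b"\x75" + struct.pack("<b", rel)

def encode_jnz_long(src, dst):
    """Encode `jnz dst` at address `src` in the 6-byte form (0F 85 rel32)."""
    rel = dst - (src + 6)
    return b"\x0f\x85" + struct.pack("<i", rel)

print(encode_jnz_short(0x401000, 0x401020).hex())   # 751e
print(encode_jnz_long(0x401000, 0x401020).hex())    # 0f851a000000
```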
Now let's take a look at the following code:
00401000: 75 7E : jnz 00401080
...
00401080: 90 : nop
The JNZ instruction is encoded as 75 7E, but what will happen if we insert 4 bytes between addresses 00401000 and 00401080? It would make sense to increase the relative memory offset by 4, which would give 75 82. See what happens to the instruction after it is fixed like this:
00401000: 75 82 : jnz 00400F84
...
-- added 4 bytes --
...
00401084: nop
The instruction jumps backwards now? Yes, that's right. A short jump's relative memory offset is a signed byte, which means its value range is -128 to 127. 0x82 in this example is in fact treated as -126.
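To see the sign problem concretely, here is a small Python sketch (helper names are hypothetical) that interprets the offset byte the way the CPU does and computes where a short jump lands.

```python
import struct

def rel8_to_int(byte):
    """Interpret one byte as a signed 8-bit offset."""
    return struct.unpack("<b", bytes([byte]))[0]

def short_jump_target(addr, rel8_byte):
    """Destination of a 2-byte short jump located at `addr`."""
    return addr + 2 + rel8_to_int(rel8_byte)

print(rel8_to_int(0x82))                         # -126
print(hex(short_jump_target(0x401000, 0x7E)))    # 0x401080
print(hex(short_jump_target(0x401000, 0x82)))    # 0x400f84
```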
We need to be very careful when inserting bytes. The tool also needs to detect when an instruction has to be converted from short to far. Keep in mind that the far form is 4 bytes longer than the short one, so the conversion itself pushes the destination another 4 bytes away. The proper jump fix for the previous example should look as follows:
00401000: 0F 85 82 00 00 00 : jnz 00401088
...
-- added 4 bytes --
...
00401088: nop
Every time bytes are inserted, the tool needs to look for affected jump instructions and check whether fixing the relative offset also requires replacing the affected jump instruction with its longer alternative.
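One way to handle this is to re-lay-out the code until nothing changes any more. The sketch below is a simplified model and not the tool's actual implementation: the `Insn` class and `assemble` function are hypothetical, only conditional jumps (short opcodes 0x70 to 0x7F and their 0F 8x long forms) are modeled, and each jump points at its destination instruction object rather than at an address.

```python
import struct
from dataclasses import dataclass
from typing import List, Optional

@dataclass(eq=False)                   # eq=False keeps identity hashing for dict keys
class Insn:
    raw: bytes                         # encoded bytes; for a Jcc, just the short opcode
    target: Optional["Insn"] = None    # destination instruction, if this is a Jcc
    long: bool = False                 # True once promoted to the 0F 8x rel32 form

    def size(self) -> int:
        if self.target is None:
            return len(self.raw)
        return 6 if self.long else 2

def assemble(insns: List[Insn]) -> bytes:
    while True:
        # Assign an address to every instruction using the current sizes.
        addr, pos = {}, 0
        for ins in insns:
            addr[ins] = pos
            pos += ins.size()
        # Promote every short jump whose offset no longer fits in a signed byte.
        promoted = False
        for ins in insns:
            if ins.target is not None and not ins.long:
                rel = addr[ins.target] - (addr[ins] + 2)
                if not -128 <= rel <= 127:
                    ins.long = promoted = True
        if not promoted:               # a promotion moves code, so repeat until stable
            break
    out = bytearray()
    for ins in insns:
        if ins.target is None:
            out += ins.raw
            continue
        cc = ins.raw[0] & 0x0F         # condition code is shared by both forms
        rel = addr[ins.target] - (addr[ins] + ins.size())
        if ins.long:
            out += bytes([0x0F, 0x80 | cc]) + struct.pack("<i", rel)
        else:
            out += bytes([0x70 | cc]) + struct.pack("<b", rel)
    return bytes(out)

target = Insn(b"\x90")
jnz = Insn(b"\x75", target=target)
code = [jnz] + [Insn(b"\x90") for _ in range(0x7E)] + [target]
print(assemble(code)[:2].hex())                   # 757e
code[1:1] = [Insn(b"\x90") for _ in range(4)]     # insert 4 bytes after the jump
print(assemble(code)[:6].hex())                   # 0f8582000000
```

The layout loop repeats because promoting one jump moves everything after it, which can push another short jump out of range. In this model the jump sits at offset 0, and since the jump itself grows by 4 bytes on top of the 4 inserted ones, the offset is measured from the end of the longer instruction.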
Unfortunately there is also another complication that needs handling.
Problematic jumps
Several instructions that operate on short relative memory offsets don't have far alternatives, so one instruction cannot simply be swapped for another. Instead, in order to properly handle such instructions, we need to replace each of them with several other instructions while retaining the original functionality.
The problematic instructions are: LOOP, JECXZ, LOOPNZ, LOOPZ
Fortunately I found a very helpful excerpt from the book The Art of Assembly Language Programming by Randall Hyde that covers the replacement of aforementioned instructions:
JECXZ
jecxz Target
becomes:
test ecx, ecx ;Sets the zero flag if ecx=0
jz Target
LOOP
loop Target
becomes:
dec ecx
jnz Target
LOOPNZ/LOOPZ
loopnz/loopz Target
becomes:
jz/jnz quit ;loopnz stops looping when ZF=1, loopz when ZF=0
dec ecx
jz quit2
jmp Target
quit:
dec ecx
quit2:
In the obfuscation tool, I decided to replace the problematic instructions with their longer alternatives at the beginning in order to avoid issues later.
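A quick way to gain confidence in the LOOPZ/LOOPNZ replacement is to model both versions in Python and compare them. This is a sketch with hypothetical function names; it models only ECX and whether the branch to Target is taken, and it assumes the expansion starts with jnz quit for LOOPZ and jz quit for LOOPNZ. It does not model the flags: the original instructions leave the flags untouched, while TEST and DEC in the replacements modify them.

```python
MASK = 0xFFFFFFFF

def loopcc(ecx, zf, want_zf):
    """Reference behaviour of LOOPZ (want_zf=True) or LOOPNZ (want_zf=False).
    Returns (new_ecx, branch_taken)."""
    ecx = (ecx - 1) & MASK
    return ecx, (ecx != 0 and zf == want_zf)

def loopcc_expanded(ecx, zf, want_zf):
    """The multi-instruction replacement, step by step."""
    if zf != want_zf:                     # jnz quit (LOOPZ) / jz quit (LOOPNZ)
        return (ecx - 1) & MASK, False    # quit: dec ecx, then fall through
    ecx = (ecx - 1) & MASK                # dec ecx
    if ecx == 0:                          # jz quit2
        return ecx, False
    return ecx, True                      # jmp Target

def equivalent():
    """Compare both models over a few counter values and both ZF states."""
    return all(
        loopcc(ecx, zf, want) == loopcc_expanded(ecx, zf, want)
        for ecx in (0, 1, 2, 5, MASK)
        for zf in (False, True)
        for want in (False, True)
    )

print(equivalent())   # True
```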
Data blocks
Most shellcode embeds some form of binary data that is not code. This may be a command to execute, an IP address for a reverse shell to connect back to, or anything else that shouldn't be treated as code. A disassembler cannot reliably tell code and data apart; any binary data can and will be interpreted as code.
Before we let the tool perform any code obfuscation, the user needs a way to specify where the real code instructions are, so that the tool treats the rest as data to be skipped during the obfuscation process.
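As an illustration of the idea, the sketch below splits a buffer into code and data segments from a list of user-supplied code ranges. The function name and the `(start, end)` range format are assumptions made for this example; the tool's real interface may look different.

```python
def split_segments(shellcode, code_ranges):
    """Split `shellcode` into ("code" | "data", start, chunk) tuples.

    `code_ranges` is a list of (start, end) byte offsets, end exclusive,
    marking the parts that hold real instructions. Everything else is
    treated as data and must be copied through unchanged.
    """
    segments, pos = [], 0
    for start, end in sorted(code_ranges):
        if start < pos or start >= end or end > len(shellcode):
            raise ValueError("overlapping or out-of-bounds range")
        if start > pos:
            segments.append(("data", pos, shellcode[pos:start]))
        segments.append(("code", start, shellcode[start:end]))
        pos = end
    if pos < len(shellcode):
        segments.append(("data", pos, shellcode[pos:]))
    return segments

buf = bytes.fromhex("fcb105") + b"cmd\x00" + bytes.fromhex("90")
for kind, start, chunk in split_segments(buf, [(0, 3), (7, 8)]):
    print(kind, start, chunk.hex())
```

Only the "code" segments would be handed to the disassembler; note that data still moves when code grows, so references to it need fixing just like jump targets.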
Obfuscation
Now, this is where the fun part starts.
The tool is able to properly parse the disassembled code. It is able to insert and delete instructions while fixing any relative jumps that may become affected in the process. It is now time to make the tool do the real work and start replacing instructions with random and unique equivalents.
The more support we add for different instructions, the more unique the output shellcode will become.
I will start slowly with one instruction and add support for more in future parts of this article series.
MOV REG, IMM
(moves immediate static value to register)
Examples:
B8 EF BE AD DE : MOV EAX, 0xDEADBEEF
66 BA FE CA : MOV DX, 0xCAFE
B1 77 : MOV CL, 0x77
This is one of the simplest instructions, but if we don't replace these, they may become a great source of information for writing effective signatures to detect our shellcode.
I won't go into much detail here about how x86 instructions are encoded. If you want to learn more (and you should), you can find useful information at these links:
x86 Instruction Encoding Revealed: Bit Twiddling for Fun and Profit
Encoding Real x86 Instructions
X86 Opcode and Instruction Reference
The MOV R16/32, IMM16/32 instruction starts with the 0xB8 opcode. The register used with the instruction is encoded in the lowest 3 bits of this opcode.
The register values are as follows:
EAX/AX/AL : 000b / 00h
ECX/CX/CL : 001b / 01h
EDX/DX/DL : 010b / 02h
EBX/BX/BL : 011b / 03h
ESP/SP/AH : 100b / 04h
EBP/BP/CH : 101b / 05h
ESI/SI/DH : 110b / 06h
EDI/DI/BH : 111b / 07h
To form the opcode value with the register we want, we need to add the register value to the opcode starting value (0xB8).
B8 44 33 22 11 : MOV EAX, 11223344h
BA 44 33 22 11 : MOV EDX, 11223344h
If we want to use 16-bit registers and values instead of 32-bit ones, the trick is to prefix the opcode with the 0x66 byte.
66 B8 22 11 : MOV AX, 1122h
66 BA 22 11 : MOV DX, 1122h
The MOV R8, IMM8 instruction is very similar to the previous one. The only difference is that the opcode starting value is 0xB0.
B0 11 : MOV AL, 11h
B2 11 : MOV DL, 11h
To summarize, in order to properly detect MOV REG, IMM instructions we need to look for opcodes whose first byte is in the range 0xB0 to 0xBF, and also keep in mind that this byte may be prefixed with 0x66, which puts the instruction in 16-bit mode.
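The detection rule can be sketched as a small decoder. This is a minimal illustration for 32-bit mode with hypothetical names, not the tool's code; it handles only the optional 0x66 prefix followed by an opcode in the 0xB0 to 0xBF range.

```python
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]

def decode_mov_reg_imm(code):
    """Return (register, immediate, length) if `code` starts with
    MOV REG, IMM, otherwise None."""
    pos, prefix66 = 0, False
    if code[:1] == b"\x66":            # operand-size prefix
        prefix66, pos = True, 1
    if pos >= len(code) or not 0xB0 <= code[pos] <= 0xBF:
        return None
    opcode = code[pos]
    reg = opcode & 0x07                # register number lives in the low 3 bits
    if opcode < 0xB8:                  # B0..B7: MOV R8, IMM8
        width, name = 1, REG8[reg]
    elif prefix66:                     # 66 B8..BF: MOV R16, IMM16
        width, name = 2, REG16[reg]
    else:                              # B8..BF: MOV R32, IMM32
        width, name = 4, REG32[reg]
    imm = code[pos + 1:pos + 1 + width]
    if len(imm) != width:              # truncated instruction
        return None
    return name, int.from_bytes(imm, "little"), pos + 1 + width

for sample in ("b8efbeadde", "66bafeca", "b177"):
    print(decode_mov_reg_imm(bytes.fromhex(sample)))
```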
We will convert each such instruction into a MOV of a random value followed by several ADD, SUB or XOR instructions that compute the original immediate value.
As an example, this is what the obfuscation looks like:
Before:
BA 44 33 22 11 : MOV EDX, 0x11223344
After:
BA 38 2C 30 A2 : MOV EDX, 0xA2302C38
81 C2 BD 85 4F D8 : ADD EDX, 0xD84F85BD
81 EA E0 5C 59 BF : SUB EDX, 0xBF595CE0
81 F2 1C 9A 82 23 : XOR EDX, 0x23829A1C
81 F2 4D FC 86 89 : XOR EDX, 0x8986FC4D
which gives:
EDX = ((0xA2302C38 + 0xD84F85BD - 0xBF595CE0) & 0xFFFFFFFF) ^ 0x23829A1C ^ 0x8986FC4D = 0x11223344
As you can see, the code is now completely different, but after the last instruction is executed, EDX will still have the original value of 0x11223344. We can generate as many computation instructions as we want.
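A chain like this can be generated by picking random operations and constants first and then working backwards from the wanted value to find the starting immediate. The sketch below does this for 32-bit registers only. The function names are hypothetical and this is not necessarily how the tool does it; it always uses the generic 81 /r encodings seen in the example above.

```python
import random
import struct

MASK = 0xFFFFFFFF
# ModRM base bytes for "81 /n reg, imm32": ADD is /0, SUB is /5, XOR is /6
MODRM = {"add": 0xC0, "sub": 0xE8, "xor": 0xF0}

def evaluate(start, ops):
    """Run the chain the way the CPU would, wrapping at 32 bits."""
    value = start
    for op, k in ops:
        if op == "add":
            value = (value + k) & MASK
        elif op == "sub":
            value = (value - k) & MASK
        else:
            value ^= k
    return value

def obfuscate_mov32(reg, value, steps=4, rng=random):
    """Replace MOV r32, value with MOV r32, start plus `steps` random ops."""
    ops = [(rng.choice(["add", "sub", "xor"]), rng.getrandbits(32))
           for _ in range(steps)]
    start = value
    for op, k in reversed(ops):        # undo each step to find the start value
        if op == "add":
            start = (start - k) & MASK
        elif op == "sub":
            start = (start + k) & MASK
        else:
            start ^= k
    code = bytes([0xB8 | reg]) + struct.pack("<I", start)
    for op, k in ops:
        code += bytes([0x81, MODRM[op] | reg]) + struct.pack("<I", k)
    return start, ops, code

# The chain from the example above evaluates back to the original value.
chain = [("add", 0xD84F85BD), ("sub", 0xBF595CE0),
         ("xor", 0x23829A1C), ("xor", 0x8986FC4D)]
print(hex(evaluate(0xA2302C38, chain)))            # 0x11223344

start, ops, code = obfuscate_mov32(2, 0x11223344)  # register 2 is EDX
print(code.hex(), hex(evaluate(start, ops)))
```

One caveat worth keeping in mind: unlike MOV, the ADD, SUB and XOR instructions modify the flags, so this substitution is only safe where the code that follows does not rely on flags set before the original MOV.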
Wrapping up
I decided to write this tool in Python, considering it would be more approachable to a wider audience. Although I performed a bit of simple disassembly myself to fingerprint several instruction types, I needed a full-blown disassembler library that would detect the length of each disassembled instruction.
For that purpose I used the diStorm3 disassembler library by Gil Dabah. It is fast and easy to use. I thought using Capstone was a bit of an overkill for my needs.
The current version of the tool obfuscates only MOV REG, IMM instructions, but support for more instructions is coming soon.
Here is a dud example shellcode that I prepared for testing purposes to demonstrate the obfuscation:
Input:
00000000: fc CLD
00000001: b105 MOV CL, 0x5
00000003: 80fc02 CMP AH, 0x2
00000006: 7504 JNZ 0xc
00000008: 66b88888 MOV AX, 0x8888
0000000c: ba44332211 MOV EDX, 0x11223344
00000011: e2f9 LOOP 0xc
Output:
00000000: fc CLD
00000001: b1a2 MOV CL, 0xa2
00000003: 82c1c4 ADD CL, 0xc4
00000006: 82e961 SUB CL, 0x61
00000009: 80fc02 CMP AH, 0x2
0000000c: 7518 JNZ 0x26
0000000e: 66b80596 MOV AX, 0x9605
00000012: 6681c0ce9a ADD AX, 0x9ace
00000017: 6681c0a039 ADD AX, 0x39a0
0000001c: 6681f04371 XOR AX, 0x7143
00000021: 6681c0886d ADD AX, 0x6d88
00000026: ba44351577 MOV EDX, 0x77153544
0000002b: 81f2fbd13236 XOR EDX, 0x3632d1fb
00000031: 81f299f2d3ed XOR EDX, 0xedd3f299
00000037: 81c22c9ffb09 ADD EDX, 0x9fb9f2c
0000003d: 81eae2468266 SUB EDX, 0x668246e2
00000043: 81f2345d4f41 XOR EDX, 0x414f5d34
00000049: 49 DEC ECX
0000004a: 75da JNZ 0x26
The source code is fully available on GitHub.
To be continued...
This is the end of Part 1 of this blog series. Next part will cover obfuscation of other instructions.
Stay tuned!
You can follow me on Twitter and Google+.
If you have any questions or suggestions, please post them in the comments below.
Update:
Part 2 is out!
X86 Shellcode Obfuscation - Part 2