Handcrafting Linux Shellcode

Crafting your own shellcode requires getting muddy with low-level programming. One does not simply write machine code from memory. This blog post is my attempt at providing a template and tutorial for the shellcode creation process on a 32-bit Linux machine.

The first step is to write the task we want our shellcode to perform in a high-level language:

Then statically compile the source and check to make sure it works as expected.
Note: There is already a file called flag.txt in the ‘ctf’ directory.

Now we need to disassemble the execve(2) function and gather information about how it works at the assembly level.

From the disassembly dump we can identify the registers and their arrangement as well as the system call number (0xb). We know this is the system call number because of the Linux system call convention on 32-bit x86: the number is loaded into eax and the arguments into ebx, ecx, and edx before the kernel is entered. The Intel Architecture Software Developer Manuals go over how the stack and the various registers behave around a call or interrupt instruction, while the convention itself belongs to the Linux kernel ABI. The system call numbers are documented in the following locations on most Linux systems:

With all of this information we can postulate that the execve(2) system call resembles the following:
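As a rough model of that arrangement, here is a small Python sketch. It assumes the standard i386 `int 0x80` convention (eax carries the system call number, ebx, ecx, and edx carry the first three arguments); the function name and the placeholder addresses are hypothetical and only illustrate the mapping.

```python
# Sketch of the i386 Linux system call layout for execve(2).
SYS_EXECVE = 0xB  # system call number 11 on 32-bit x86 Linux


def execve_register_layout(path_ptr, argv_ptr, envp_ptr=0):
    """Return which register carries which execve(2) argument."""
    return {
        "eax": SYS_EXECVE,  # system call number
        "ebx": path_ptr,    # const char *pathname
        "ecx": argv_ptr,    # char *const argv[]
        "edx": envp_ptr,    # char *const envp[] (0 stands for NULL)
    }


if __name__ == "__main__":
    # Made-up addresses, only here to show the mapping.
    for reg, value in execve_register_layout(0xFFFFD000, 0xFFFFD010).items():
        print(f"{reg} = {value:#x}")
```

Checking the real register values in a debugger, as described next, is what confirms or refutes this model.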

I’m using GDB with GEF, but bare-bones GDB will also work. We need to check the values of each register to confirm our assumptions.

First we set a breakpoint after all the registers in question have been populated and then run the program so we can look at the values. Refer to the GDB documentation if you are not familiar with how to examine memory locations and registers. Next we need to craft our assembly code with the caveat that we have to avoid bad characters. More on that later.

The code above is 32-bit assembly for Linux-based systems. The giveaway is the software interrupt (int 0x80), which is the legacy 32-bit system call entry point. Depending on your experience with assembly, the comments may or may not help you understand what is happening on each line, but I will try to elaborate briefly.

Executable programs are divided into sections (.text, .bss, .data, etc.) that vary based on the specification for the file format, which in our instance is the Executable and Linkable Format (ELF). On the Windows platform it is typically the Portable Executable (PE) format. The .text section is where program code is stored. The .bss section holds uninitialized (zero-filled) global and static variables, and the .data section is a read/write region for initialized variables that do not have local scope.

The Netwide Assembler (NASM) uses the global directive to export symbols for use during object code linking. The _start label tells the GNU linker (/usr/bin/ld) where code execution begins. We could have used _main or wrench as the label, but then we would have to pass that label to the linker with the ‘-e’ switch.

The _start label is the default that ld(1) expects as the entry point for your code.

Moving down to the instructions, note that NASM uses Intel syntax as opposed to AT&T syntax. The most notable difference between the two is the ordering of the source and destination operands: Intel syntax puts the destination first, AT&T syntax puts it last. If you don’t like the default, the format can be changed by passing the ‘-M’ switch to objdump or by setting ‘disassembly-flavor’ in gdb.

Strings are pushed onto the stack in reverse order, four bytes at a time, because the stack grows toward lower addresses and x86 is little-endian. They also have to be NULL-terminated. We capture the memory location of each string from the position of the stack pointer (esp). The wonderful thing about this technique is that we do not have to rely on a static memory location. From the disassembly of our C program we saw that execve uses the ebx, ecx, and edx registers. We also used the esi register above to store a memory location on the stack. This leaves esp pointing at a pointer, a pointer to a pointer so to speak, which we copy into ecx. Calling conventions dictate which registers are safe to modify during the scope of a function call, and the Intel Developer Manuals document the registers themselves. Therefore, we cannot use an arbitrary register and expect consistent results.
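The reverse ordering is easy to get wrong by hand, so here is a minimal Python sketch that works out the push operands for a string. It assumes the string has already been padded to a multiple of four bytes (for example /bin//sh rather than /bin/sh) and that the NULL terminator is pushed separately from a zeroed register; the function name is hypothetical.

```python
import struct


def push_instructions(text):
    """Return the push instructions that place `text` on a 32-bit stack."""
    data = text.encode("ascii")
    if len(data) % 4 != 0:
        raise ValueError("pad the string to a multiple of 4 bytes, e.g. /bin//sh")
    # Split into 4-byte chunks; the last chunk must be pushed first
    # because the stack grows toward lower addresses.
    chunks = [data[i:i + 4] for i in range(0, len(data), 4)]
    lines = []
    for chunk in reversed(chunks):
        # "<I" reads the chunk as a little-endian dword, which is the value
        # that reproduces the original byte order once it is in memory.
        (value,) = struct.unpack("<I", chunk)
        lines.append(f"push 0x{value:08x}")
    return lines


if __name__ == "__main__":
    for line in push_instructions("/bin//sh"):
        print(line)
```

For /bin//sh this prints push 0x68732f2f followed by push 0x6e69622f, so esp ends up pointing at the first byte of the string.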

Using xor to zero out a register, instead of pushing a literal ‘0x0’ onto the stack, avoids one of the most common bad characters that hinders shellcode. Another technique used in the code above to keep ‘0x0’ out of our shellcode is to write to the least significant byte (8 bits) of EAX. The general-purpose registers EAX, EBX, ECX, and EDX can each be referenced as a 16-bit register as well as high-order and low-order 8-bit registers. For the EAX register they are AX, AH, and AL respectively. We move the value into AL instead of EAX because a 32-bit immediate is padded with zero bytes in the encoded instruction, and that is not what we want.
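To make the difference concrete, the sketch below compares the machine code bytes of a few instruction choices and flags the ones that contain a null byte. The encodings are the standard 32-bit x86 ones; the table and helper name are just illustrative scaffolding.

```python
# Standard 32-bit x86 encodings for the instruction choices discussed above.
ENCODINGS = {
    "mov eax, 0xb": bytes.fromhex("b80b000000"),  # imm32 is padded with zeros
    "mov al, 0xb": bytes.fromhex("b00b"),         # imm8, no padding
    "push 0x0": bytes.fromhex("6a00"),            # the immediate itself is null
    "xor eax, eax": bytes.fromhex("31c0"),        # zeroes eax without a null byte
}


def contains_null(code):
    """Return True if the encoded instruction has a 0x00 byte."""
    return b"\x00" in code


if __name__ == "__main__":
    for asm, code in ENCODINGS.items():
        verdict = "contains null bytes" if contains_null(code) else "null-free"
        print(f"{asm:<14} {code.hex(' '):<16} {verdict}")
```

The usual pattern is therefore xor eax, eax followed by mov al, 0xb, which sets eax to 0xb using four null-free bytes.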

To assemble and link the code above, use nasm and ld:

If all of this seems daunting, I would recommend reading “The Art of Assembly Language” by Randall Hyde or “Assembly Language Step-by-Step” by Jeff Duntemann. Having a copy of the “Intel Architecture Software Developer Manuals” on your bookshelf or PDF reader is also a tremendous help, and the PDFs are free. I had the hardbacks shipped several years ago.

Moving on, we now need to get the hex values for the opcodes in our program. We can use GDB or objdump:

If you look at the middle column, the two-digit hex numbers are what we need; they are the encoded form of the instructions shown in the last two columns. Normally, you would just copy each byte by hand and prefix it with ‘\x’, so for example:

We can automate this process in a number of ways, but let’s use Python:

The script above takes standard input from objdump and outputs the shellcode with the size in bytes.
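A parser of this kind can be sketched in a few lines. The version below is an illustration rather than the original script: it assumes the tab-separated layout that GNU objdump -d prints (address, hex bytes, mnemonic), and the function names and the two-line sample listing are hypothetical.

```python
import re

HEX_BYTE = re.compile(r"^[0-9a-fA-F]{2}$")

# Hypothetical two-instruction listing in objdump -d layout.
SAMPLE = (
    " 8049000:\t31 c0                \txor    eax,eax\n"
    " 8049002:\t50                   \tpush   eax\n"
)


def extract_bytes(objdump_text):
    """Collect the machine code bytes from objdump -d style output."""
    out = bytearray()
    for line in objdump_text.splitlines():
        fields = line.split("\t")
        # Instruction lines look like "<address>:\t<hex bytes>\t<mnemonic>";
        # headers and section banners contain no tab and are skipped.
        if len(fields) < 2 or not fields[0].strip().endswith(":"):
            continue
        tokens = fields[1].split()
        if tokens and all(HEX_BYTE.match(t) for t in tokens):
            out.extend(int(t, 16) for t in tokens)
    return bytes(out)


def to_c_string(code):
    """Format bytes as a \\x-escaped string suitable for a C literal."""
    return "".join(f"\\x{b:02x}" for b in code)


if __name__ == "__main__":
    code = extract_bytes(SAMPLE)
    print(f'"{to_c_string(code)}"')
    print(f"{len(code)} bytes")
```

To use it in a pipeline, replace SAMPLE with sys.stdin.read() (after importing sys) and pipe the objdump output into the script.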

You can take the shellcode and test it with the following code:

Then compile and execute:

Voila! We are done. Just for fun here is a bash one-liner that does the same thing.

Other options to speed up the process include modifying publicly available shellcode or using msfvenom to create what you need.

But notice that the default output includes \x00, which will cut the payload short when it passes through string functions. This is what is known as a bad character (badchar). There are several of them, and which ones matter depends on the target, so you will have to do bad character analysis to figure out which ones to avoid; otherwise your payload will have unintended consequences.
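A first pass at that analysis can be automated. The sketch below scans a payload for a configurable set of bad characters and reports where they occur. The default set contains only the null byte; any other entries (newline and carriage return are common suspects) are assumptions that have to be confirmed against the actual target. The function name is hypothetical.

```python
def find_badchars(payload, badchars=b"\x00"):
    """Return (offset, byte) pairs for every bad character in the payload."""
    return [(offset, byte) for offset, byte in enumerate(payload) if byte in badchars]


if __name__ == "__main__":
    # mov eax, 0xb encoded with a 32-bit immediate: three null bytes.
    payload = bytes.fromhex("b80b000000")
    for offset, byte in find_badchars(payload):
        print(f"offset {offset}: \\x{byte:02x}")
```

An empty result means the payload is clean with respect to the set you supplied, not that it is safe for every target.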

Thanks for reading.



Categories: Binary Exploitation
