Learning Assembly in Linux

For anyone working in the security world, assembly is a language you have to master. Because it is a low-level language, close to machine language (binary), learning it is very worthwhile: it more or less automatically gives us a deeper understanding of how computers work.
This article also serves as an opening before I discuss shellcode, buffer overflows and other exploitation techniques that require an understanding of assembly and operating systems. If you want to be a good hacker, you must master this language.
Machine language, Assembly and C
Basically, a computer is a digital creature that understands only the digits 1 and 0 (binary). It accepts data only in binary form and understands commands only in binary form. Commands in binary form are called machine language.
In general, a program can be viewed as a sequence of steps (commands, instructions) for accomplishing something. A programmer can write those commands directly as 1s and 0s (machine language), or use a more human-friendly high-level language such as C, Visual Basic or Java.

    A computer understands only two symbols: 1 and 0
Consider a simple example: a programmer wants to save the value of the EAX register onto the stack. In machine language the programmer must write 01010000, whereas in assembly language it is enough to write PUSH EAX. Which is more human-friendly? Assembly, of course. It would be very hard for humans to write every command as 1s and 0s.
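To make the link between the mnemonic and the bits concrete, here is a small Python sketch (Python is used only for illustration, and the helper name push_opcode_bits is made up for this example). It relies on the fact that on 32-bit x86 the one-byte encoding of PUSH for a general register is 0x50 plus the register number, with EAX being register 0.

```python
# Register numbers used by the one-byte "PUSH r32" encoding on 32-bit x86.
REGISTER_NUMBER = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3,
                   "ESP": 4, "EBP": 5, "ESI": 6, "EDI": 7}

def push_opcode_bits(register):
    """Return the machine-language byte for PUSH <register> as a string of bits."""
    opcode = 0x50 + REGISTER_NUMBER[register]
    return format(opcode, "08b")

if __name__ == "__main__":
    print("PUSH EAX ->", push_opcode_bits("EAX"))
```

For EAX the function returns the same eight bits quoted in the text above.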
The higher-level the language, the more human-friendly the way of giving orders. For example, to display text on the screen in C it is enough to write printf("Hello World"), which is easy and quick. In a lower-level language such as assembly, it takes approximately five steps to complete the same task.
Every program, whatever language it is written in, ends up being executed as machine language, because that is the only language the processor understands.
Assembly Language and Processor
Because assembly language is a set of mnemonics (abbreviations) for machine-language instructions, its commands are closely tied to the processor. Each processor has its own instruction set, so assembly language for Intel differs from assembly for other processors. But because of Intel's large market share, many other manufacturers make processors whose instruction set is compatible with Intel's.

    An assembly instruction is a mnemonic for a machine-language instruction (in binary form), called an opcode
Complete documentation for programming Intel processors, including the full instruction list, is available on Intel's official website in the Intel developer manuals. In this article I explain only a few of the most commonly used basic instructions; the rest can be found in those manuals.
AT&T and NASM Assembly
There are two assembly language syntaxes: the AT&T format and the NASM format. AT&T syntax is widely used in GNU environments, for example by the GNU Assembler, and is the default format of the GNU Debugger (GDB). The NASM format is the one used by the Netwide Assembler (NASM) and is widely used in the Windows environment.

    Note that the difference between NASM and AT&T is purely a matter of syntax; both produce exactly the same machine language
Some of the differences between the AT&T and NASM formats are:

    * Comments begin with a semicolon ";" in NASM. AT&T begins comments with a hash "#".
    * In the AT&T format, every register name begins with %. NASM does not use %.
    * In the AT&T format, every literal value (constant) begins with $. NASM does not use $.
    * For instructions with source and destination operands, AT&T writes the destination as the second operand (example: CMD <source>, <destination>), while NASM writes the destination as the first operand (example: CMD <destination>, <source>).
Registers
A register is an internal variable built into the processor that programmers can use for a variety of purposes. Because registers sit inside the processor rather than in memory, using a register as a variable is much faster than using a variable stored at an address in memory.
Here are the types of registers available on Intel processors.
Each entry below gives the register names followed by an explanation, grouped by category.

Category: General Purpose
    * EAX, EBX, ECX, EDX: 32 bits wide, may be used for any purpose. E stands for Extended (because the general-purpose registers were originally only 16 bits wide).
    * AX, BX, CX, DX: the lower 16 bits of the 32-bit registers above. AX is the lower 16 bits of EAX.
    * AH, AL, BH, BL, CH, CL, DH, DL: the 8-bit halves of the 16-bit registers above. AH is the upper 8 bits of AX. AL is the lower 8 bits of AX.

Category: Segment Register
    * CS, SS, DS, ES, FS, GS: hold the first part (the segment) of a memory address. CS = Code, SS = Stack, DS = Data; ES, FS and GS are extra segment registers.

Category: Offset Register
Hold the last part (the offset) of a memory address. A memory address is given by the combination of segment and offset.
    * EBP: used as an offset within the stack frame. Usually points to the bottom of the stack frame of a function. ESP points to the top of the stack, EBP to the base.
    * ESI: usually used as the offset of the source in string operations involving blocks of memory.
    * EDI: usually used as the offset of the destination in string operations involving blocks of memory.
    * ESP: stack pointer, points to the top of the stack.

Category: Special
    * EFLAGS: cannot be used directly by programmers; the processor uses it to hold the results of logical operations and state.
    * EIP: cannot be used directly by programmers; the processor uses it to hold the memory address of the next instruction to be executed.
See the figure below for the registers of the IA-32 (Intel Architecture 32-bit) processor family.
http://www.faculty.iu-bremen.de/birk/lectures/PC101-2003/01x86/80x86%20Architecture/registers.htm
courtesy of iu-bremen.de
The figure above shows that the Extended registers (those beginning with E) are 32-bit registers. To stay compatible with earlier programs, written when registers were only 16 bits wide, the older registers are the lower part of their extended versions. For example, SI is the lower 16 bits of ESI, and AX is the lower 16 bits of EAX. AX is in turn split into the 8-bit AH and the 8-bit AL. Programmers are free to use whichever fits their needs.
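The way AX, AH and AL overlap EAX can be sketched with a few bit masks. This is only an illustration in Python, not processor code; the function name split_register and the sample value 0x12345678 are chosen for this example.

```python
def split_register(eax):
    """Split a 32-bit EAX value into its AX, AH and AL parts."""
    eax &= 0xFFFFFFFF          # keep only 32 bits
    ax = eax & 0xFFFF          # AX: lower 16 bits of EAX
    ah = (ax >> 8) & 0xFF      # AH: upper 8 bits of AX
    al = ax & 0xFF             # AL: lower 8 bits of AX
    return ax, ah, al

ax, ah, al = split_register(0x12345678)
print(hex(ax), hex(ah), hex(al))   # 0x5678 0x56 0x78
```

Writing to AL or AH on a real processor changes the corresponding bits of AX and EAX, because they are the same storage viewed at different widths.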
The classic "Hello World"
Enough theory; now let's get our hands dirty. Let us write our first assembly program, which displays the text "Hello World". In this article I use the Intel (NASM) syntax rather than AT&T. Type in the following source and save it as hello.asm
    section .text                   ; text section, reserved for code
    global _start

    _start:
        ; system call => write(1, msg, len)
        mov edx, len                ; length of the string goes into EDX
        mov ecx, msg                ; address of the string goes into ECX
        mov ebx, 1                  ; file descriptor (1 = stdout, the console by default) goes into EBX
        mov eax, 4                  ; 4 is the system call number of sys_write()
        int 0x80                    ; invoke the system call with interrupt 80 hex

        ; system call => exit(0)
        xor ebx, ebx                ; set EBX to 0, the return code on exit
        mov eax, 1                  ; 1 is the system call number of exit()
        int 0x80                    ; invoke the system call with interrupt 80 hex

    section .data                   ; data section, reserved for data / variables
    msg db "Hello, World!", 0xa     ; the string, followed by 0xA (newline, \n)
    len equ $ - msg                 ; length of the string: address of this line minus address of msg
Next we assemble the ASM file into ELF-format object code with NASM (Netwide Assembler). This produces the file hello.o, which must then be linked with the linker ld to become an executable.

    $ nasm -f elf hello.asm
    $ ld -s -o hello hello.o
    $ ./hello
    Hello, World!

Congratulations, you have created a Hello World program in assembly language. The program is very simple: we call the system call write() to display the string (msg), then we call the system call exit() to end the program. The string msg is placed in the .data section, which is reserved for data / variables. The assembly instructions are placed in the .text section, which is reserved for code.
Hello World Opcode
To see the relationship between assembly and machine language, we can look at the opcodes of the assembly program we wrote using the objdump program, as in the following picture.
[Figure: the relationship between assembly and opcodes]
The opcodes on the left are the machine-language version of the assembly language on the right. This shows how closely assembly and machine language are linked. For example, the assembly instruction INT 0x80 is translated into the machine language 0xCD 0x80 (in hex), or 11001101 (binary of 0xCD) 10000000 (binary of 0x80).
Note that the source line "MOV EDX, len" is translated into "MOV EDX, 0xE" after assembly. This is because len is a constant, the length of the string "Hello, World!" including its trailing newline, which is 14 (0xE) characters. The source line "MOV ECX, msg" is translated into "MOV ECX, 0x80490a4". Why is msg translated into 0x80490a4? Because msg is the address of the string "Hello, World!", so it is replaced by the address 0x80490a4. The picture above also shows the string "Hello, World!" at location 0x80490a4.
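The two numbers discussed here can be checked with a short Python sketch. It only re-does the arithmetic on the same bytes (the variable names are chosen for this example); it does not assemble anything.

```python
msg = b"Hello, World!\n"         # the same bytes as msg in hello.asm (0xa is "\n")
length = len(msg)                # what "len equ $ - msg" evaluates to

int_0x80 = bytes([0xCD, 0x80])   # machine code of INT 0x80
bits = " ".join(format(b, "08b") for b in int_0x80)

print(hex(length))               # 0xe
print(bits)                      # 11001101 10000000
```

The length is 14 because the newline byte 0xa is counted along with the 13 visible characters.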
System Call
In the hello world program above we used system calls to display text on the screen. A system call is the gateway into kernel mode for programs that need services only the kernel can perform.
[Figure: the system call as the gateway to kernel mode]
System calls in Linux are invoked with interrupt 80 hex. The system call number is placed in register EAX. A complete list of Linux system call numbers can be found in the header file /usr/include/asm/unistd.h. Here is an excerpt from unistd.h
    #ifndef _ASM_I386_UNISTD_H_
    #define _ASM_I386_UNISTD_H_

    /*
     * This file contains the system call numbers.
     */

    #define __NR_restart_syscall 0
    #define __NR_exit 1
    #define __NR_fork 2
    #define __NR_read 3
    #define __NR_write 4
    #define __NR_open 5
    #define __NR_close 6
    #define __NR_waitpid 7
    #define __NR_creat 8
In the hello world example we used system call number 4 (write) and number 1 (exit). To learn how to use these system calls and what arguments they take, we can read the man pages on Linux.

    $ man 2 write
    SYNOPSIS
        ssize_t write(int fd, const void *buf, size_t count);
    $ man 2 exit
        void _exit(int status);

According to the manual, the write system call takes 3 arguments: a file descriptor of type integer, the memory address where the string is located, and finally the length of the string as an integer. The arguments are stored in registers starting from EBX, then ECX and EDX: the first argument in EBX, the second in ECX and the third in EDX. The EAX register holds the system call number. According to the manual, the exit system call takes 1 argument: the status code, of type integer, which is stored in the EBX register.
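For comparison, the same write call can be reached from a high-level language. The sketch below uses Python's os.write, which goes through the C library's write() rather than issuing int 0x80 itself; the function name say_hello is made up for this example. Note how the three arguments map: the descriptor is passed explicitly, while the buffer address and the count are both implied by the bytes object.

```python
import os

def say_hello(fd=1):
    """Write the greeting to file descriptor fd (1 = stdout), like write(1, msg, len)."""
    msg = b"Hello, World!\n"
    # os.write takes the descriptor and the buffer; the byte count is len(msg).
    return os.write(fd, msg)
```

Calling say_hello() prints the greeting on the console and returns the number of bytes written, which is what the write system call returns in EAX.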
In the hello world example above we used three assembly instructions: MOV, XOR and INT. Let us discuss each of them.
MOV instruction
We use MOV to copy data from a source to a destination. The source and destination can be memory addresses or registers. Consider the following examples:

    NASM / Intel       AT&T                 Description
    MOV EAX, 0x51      MOVL $0x51, %EAX     Fills the EAX register with the value 51 hex
    MOV ESP, EBP       MOVL %EBP, %ESP      Copies the contents of register EBP into register ESP

The difference between NASM and AT&T syntax is the direction of the copy. In NASM syntax the destination is the first operand, whereas in AT&T syntax the destination is the second operand.
XOR Instruction
The XOR instruction performs the logical operation exclusive OR, bit by bit. XOR produces 0 where the bits of both operands are the same, and 1 where they differ. XOR is commonly used to set a register to 0 by XORing it with itself, as in the hello world example.

    NASM / Intel       AT&T                 Description
    XOR EBX, EAX       XOR %EAX, %EBX       XORs the contents of EBX with EAX; the result is stored in EBX
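Why XORing a register with itself always gives 0 is easy to check with Python's ^ operator. This is only a sketch of the arithmetic; the helper xor32 and the sample values are chosen for this example.

```python
def xor32(a, b):
    """Bitwise XOR of two values, truncated to 32 bits like a register."""
    return (a ^ b) & 0xFFFFFFFF

ebx = 0xDEADBEEF
ebx = xor32(ebx, ebx)                          # like XOR EBX, EBX
print(ebx)                                     # 0
print(format(xor32(0b1100, 0b1010), "04b"))    # 0110
```

Every bit is compared with an identical bit, so every result bit is 0, whatever the register held before.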
INT instruction
The INT instruction is used to raise an interrupt on the processor. In the example above we use interrupt number 80 hex to request a service from the kernel.

    NASM / Intel       AT&T                 Description
    INT 0x80           INT $0x80            Raises interrupt number 80 hex
Another example: Hello World X Times
This time we will modify the hello world program above in order to display the same message multiple times depending on the arguments the user entered.
    section .text
    global _start

    _start:
        pop eax                 ; pop argc (ignored)
        pop eax                 ; pop argv[0] (ignored, it holds the name of the program)
        pop eax                 ; pop argv[1] (passed to stringtoint)
        call stringtoint        ; ECX now holds the argument as an integer, used as the counter

    _print:
        push ecx                ; save the counter on the stack, because ECX is also used in _print_hello
        call _print_hello       ; print hello world
        pop ecx                 ; restore the counter from the stack, because LOOP uses it
        loop _print             ; subtract 1 from ECX; if it is not 0, go back to _print

        ; system call exit(0)
        mov ebx, 0
        mov eax, 1
        int 0x80

    _print_hello:
        ; system call write(1, msg, len)
        mov edx, len
        mov ecx, msg
        mov ebx, 1
        mov eax, 4
        int 0x80
        ret

    stringtoint:
        ; convert the string at the address in EAX into an integer in ECX
        ; EAX: address of the string
        xor ecx, ecx            ; clear ECX
        xor ebx, ebx            ; clear EBX
        mov bl, [eax]           ; BL holds the ASCII code found at the address in EAX
        sub bl, 0x30            ; ASCII codes of digits are 30h-39h, so subtract 30h
        add ecx, ebx            ; add EBX to ECX; ECX now holds the integer value
        ret

    section .data
    msg db "Hello, World!", 0xa
    len equ $ - msg
Save the source code above as helloxtimes.asm, then assemble and link it as shown below.

    $ nasm -f elf helloxtimes.asm
    $ ld -s -o helloxtimes helloxtimes.o
    $ ./helloxtimes 1
    Hello, World!
    $ ./helloxtimes 2
    Hello, World!
    Hello, World!
    $ ./helloxtimes 3
    Hello, World!
    Hello, World!
    Hello, World!
    $ ./helloxtimes 4
    Hello, World!
    Hello, World!
    Hello, World!
    Hello, World!
In this second example we learn a few new things: loops, the use of arguments, and procedures. The system calls used remain the same, namely write() and exit().
This time the program accepts an integer argument from 1 to 9, used as a counter for how many times the message appears on the screen. The argument is taken from the stack with the POP instruction. At the top of the stack is argc, the number of arguments when the program starts. Below it is the address of argv[0], the name of the program. Below that is the address of argv[1], the first argument. Notice the three POP EAX instructions at the start of _start. They are needed because argv[1] is in the third position: we must discard the top two elements before we can take its address. The address of the first argument, obtained from the third POP, is stored in register EAX. Because the argument is still a string, it must first be converted to an integer by calling the stringtoint procedure.
[Figure: POP instructions taking the argument from the stack]
When the program is started with an argument, as in "./helloxtimes 7", the number of arguments (argc) is 2: the name of the program itself plus one argument. argc is stored at the top of the stack, the element below it contains the memory address of the program name, and the element below that contains the memory address of the first argument. The picture above shows how the memory address of the first argument string is taken from the stack. In this example the argument is the string "7", which is the character with ASCII code 37 hex followed by ASCII 0 (the NULL character). The memory address of the first argument string is taken from the stack and stored in register EAX.
In the stringtoint procedure, register EAX holds the address of the string to be converted to an integer. We take only the first character: the MOV BL, [EAX] instruction copies the contents of the memory address pointed to by EAX into register BL.

    "MOV EBX, [EAX]" is different from "MOV EBX, EAX". MOV EBX, [EAX] copies the contents of memory at the address stored in EAX into register EBX, while MOV EBX, EAX copies the contents of register EAX into register EBX
I use the BL register because an ASCII code is only 8 bits wide. If the argument really is a digit, BL will contain a value from 30h to 39h (the ASCII codes for "0" to "9"). BL is then reduced by 30h to obtain a value from 0 to 9. The result is added to register ECX, so the procedure returns with the converted integer value in ECX.
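The conversion done by stringtoint can be mirrored in Python as a quick check of the arithmetic. This is a sketch, not a translation tool: the function below is a hypothetical stand-in whose comments name the assembly instruction each line imitates.

```python
def stringtoint(arg):
    """Mimic the stringtoint procedure: convert only the first character of arg."""
    ecx = 0                      # xor ecx, ecx
    bl = ord(arg[0]) & 0xFF      # mov bl, [eax]  (ASCII code of the first character)
    bl = (bl - 0x30) & 0xFF      # sub bl, 0x30   (30h is the ASCII code of "0")
    ecx = ecx + bl               # add ecx, ebx
    return ecx

print(stringtoint("7"))          # 7
```

Like the assembly version, it looks at a single character only, so "42" converts to 4, and nothing checks that the character really is a digit.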
Once the integer value of the argument is in ECX, it must first be saved on the stack (PUSH ECX), because ECX is also used inside the _print_hello procedure (CALL _print_hello), where it holds the address of the string msg for the write() system call. After returning from _print_hello, ECX must be restored to its original value with POP ECX, because it is used as the counter by LOOP. When the LOOP instruction runs, ECX is decremented by 1; if ECX is then not 0, the program jumps to _print. If ECX is 0, the loop stops and the program goes on to the exit(0) system call.
Now that we understand how the second sample program works, let us look at the new instructions it uses: CALL, RET, PUSH, POP and LOOP.
PUSH and POP Instructions
The PUSH instruction is used to store a value onto the stack. Its opposite is the POP instruction, which retrieves a value from the stack. The stack in Linux grows toward lower memory addresses: the top of the stack is at a lower address, while the bottom of the stack is at a higher address.

    NASM / Intel       AT&T                 Description
    PUSH value         PUSHL value          Saves the value onto the stack
    POP dest           POPL dest            Takes the value at the top of the stack into dest

[Figure: the stack data structure]
A stack is a data structure similar to a pile of plates. The item taken from the stack is always the last one put in; the term for this is LIFO (last in, first out). So if we want to retrieve an item in the middle of the pile, we must first take off the items above it one by one, until the item we want is at the top of the stack.
[Figure: PUSH and POP on the stack]
The ESP register holds the memory address of the top of the stack. On every PUSH instruction, ESP decreases (remember that the stack grows toward smaller addresses) because the top of the stack has moved. Conversely, on every POP instruction, ESP increases.
PUSH EAX
PUSH EAX is equivalent to the two instructions below:

    SUB ESP, 4
    MOV DWORD PTR SS:[ESP], EAX

PUSH EAX (4 bytes) can be done by subtracting 4 from ESP, then copying the contents of EAX into memory at location SS:[ESP], that is, the stack segment at the offset given by ESP. DWORD PTR indicates that the data copied to memory by the MOV instruction is 4 bytes wide.
POP EAX
POP EAX is equivalent to the two instructions below:

    MOV EAX, DWORD PTR SS:[ESP]
    ADD ESP, 4

POP EAX (4 bytes) can be done by copying the contents of memory at location SS:[ESP], that is, the stack segment at the offset given by ESP, into the EAX register, then adding 4 to ESP. DWORD PTR indicates that the data copied from memory by the MOV instruction is 4 bytes wide.
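The two-step expansions of PUSH and POP can be modelled with a toy stack in Python. This is a sketch under simplifying assumptions: memory is a dictionary, the class name Stack is made up, and the starting address 0x1000 is an arbitrary value chosen for the example.

```python
class Stack:
    """Toy model of the x86 stack: memory is a dict of address -> 32-bit value."""

    def __init__(self, esp=0x1000):
        self.esp = esp           # hypothetical starting address of the stack top
        self.memory = {}

    def push(self, value):
        self.esp -= 4                               # SUB ESP, 4
        self.memory[self.esp] = value & 0xFFFFFFFF  # MOV [ESP], value

    def pop(self):
        value = self.memory[self.esp]               # MOV dest, [ESP]
        self.esp += 4                               # ADD ESP, 4
        return value
```

Pushing two values moves esp down by 8; popping them returns the values in reverse order and brings esp back to where it started, which is the LIFO behaviour described earlier.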
CALL and RET Instructions
The CALL instruction is used to call a procedure, while RET is used to return from the procedure to the location just after the CALL instruction. When CALL is executed, the processor pushes the address of the instruction following the CALL onto the stack (the return address), then jumps to the address of the subroutine. When RET is executed, the processor pops the return address (the address pushed by CALL) and jumps to it.

    NASM / Intel       AT&T                 Description
    CALL subroutine1   CALL subroutine1     Calls the procedure subroutine1
    RET                RET                  Returns from the procedure

[Figure: CALL and RET]
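The bookkeeping done by CALL and RET can be sketched in a few lines of Python. This is a simplified model, not an emulator: a plain list stands in for the stack, the returned number stands in for the new value of EIP, and all addresses are hypothetical.

```python
def call(stack, return_address, target):
    """CALL: push the address of the next instruction, then jump to target."""
    stack.append(return_address)
    return target                 # new value of EIP

def ret(stack):
    """RET: pop the return address and jump to it."""
    return stack.pop()            # new value of EIP

stack = []
eip = call(stack, 0x105, 0x200)   # hypothetical: CALL at 0x100, next instruction at 0x105
eip = ret(stack)                  # execution resumes at 0x105
```

Because the return address lives on the stack, a procedure must leave the stack exactly as it found it before executing RET; otherwise RET would pop some other value and jump there.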
LOOP Instruction
LOOP is used to repeat a block as many times as the value in register ECX. On a LOOP instruction, the processor decrements ECX by 1 and then checks the result. If ECX is still greater than 0 (that is, not 0), the processor jumps to the address given in the LOOP instruction. If ECX is now 0, the processor does not jump but continues with the instruction after the LOOP.

    NASM / Intel       AT&T                 Description
    LOOP address       LOOP address         Jumps to address if ECX is not 0 after being decremented

The single instruction "LOOP address" is equivalent to the following two assembly instructions (except that LOOP itself leaves the flags untouched):

    DEC ECX            ; DECREMENT: ECX = ECX - 1
    JNZ address        ; JUMP IF NOT ZERO: if ECX is not 0, jump to address

[Figure: illustration of LOOP]
The picture above shows two possible conditions: ECX > 0 or ECX = 0. Someone may ask: what about the condition ECX < 0? Remember that the computer knows only two symbols, 0 and 1, so there is no "-1" or "-0" as such in the binary representation. Negative numbers are represented with two's-complement encoding; please read about signed number representation, since it is beyond the topic discussed here.
If ECX is 0 before the LOOP instruction runs, the program will jump back 0xFFFFFFFF or 4,294,967,295 times. This happens because 0 - 1 = -1, which in two's complement is 0xFFFFFFFF.
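The wrap-around can be reproduced in Python by masking the result to 32 bits. This is a sketch of the arithmetic only; the helper names dec32 and loop_body_runs are made up for this example.

```python
def dec32(value):
    """DEC on a 32-bit register: subtract 1 and wrap around at 32 bits."""
    return (value - 1) & 0xFFFFFFFF

def loop_body_runs(ecx):
    """Run a LOOP-style loop and count how often its body executes."""
    runs = 0
    while True:
        runs += 1            # the loop body
        ecx = dec32(ecx)     # LOOP: ECX = ECX - 1
        if ecx == 0:         # no jump once ECX reaches 0
            return runs

print(hex(dec32(0)))         # 0xffffffff
print(loop_body_runs(3))     # 3
```

Do not call loop_body_runs(0) casually: just like the assembly loop, it would have to count all the way down from 0xFFFFFFFF before stopping.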

http://www.ilmuhacking.com/programming/belajar-assembly-di-linux/