ARM

  • Reduced Instruction Set Computing (RISC)
    • Fewer than 100 instructions
    • Instructions only operate on Registers
    • ONLY Load/Store instructions can access memory.
  • Instructions can be used for Conditional Execution
  • Versions before ARMv3 use little-endian format for data
  • ARMv3 and later are bi-endian: data is little-endian by default, but the endianness is switchable
  • Uses little-endian format for Instructions
| ARM Family | ARM Architecture |
| --- | --- |
| ARM7 | ARMv4 |
| ARM9 | ARMv5 |
| ARM11 | ARMv6 |
| Cortex-A | ARMv7-A |
| Cortex-R | ARMv7-R |
| Cortex-M | ARMv7-M |
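
To make the data endianness notes above concrete, here is a minimal sketch of how the same 32-bit word is laid out in memory in each format. The value `0x11223344` is an arbitrary example, not something from the ARM documentation.

```python
import struct

value = 0x11223344  # arbitrary 32-bit word chosen for illustration

# "<I" packs an unsigned 32-bit integer little-endian, ">I" big-endian.
little = struct.pack("<I", value)  # least significant byte stored first
big = struct.pack(">I", value)     # most significant byte stored first

print(little.hex())  # 44332211
print(big.hex())     # 11223344
```

Reading the bytes back with the wrong format character yields a different number, which is why the data endianness setting matters when inspecting memory dumps.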

ARM Mode:

  • Instructions are always 4 bytes wide, so R15 (Program Counter) is always 4-byte aligned

Writing Assembly

Use as to assemble an ASM file into an object file
Use ld to link object files into a binary

    as program.s -o program.o
    ld program.o -o program

.string is null terminated
.ascii is not null terminated

Instructions

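| Instruction | Description |
| --- | --- |
| EOR | Bitwise XOR |
| MVN | Move NOT (writes the bitwise complement of the operand) |
| SUB | Subtraction |
| LDM | Load Multiple |
| MUL | Multiplication |
| STM | Store Multiple |
| LSL | Logical Shift Left |
| PUSH | Push on Stack |
| LSR | Logical Shift Right |
| POP | Pop off Stack |
| ASR | Arithmetic Shift Right |
| ROR | Rotate Right |
| BL | Branch with Link |
| BX | Branch and eXchange |
| AND | Bitwise AND |
| BLX | Branch with Link and eXchange |
| ORR | Bitwise OR |
| SWI/SVC | System Call |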

The Barrel Shifter applies a shift or rotate to an instruction's second operand, so work that would otherwise take multiple instructions fits into one.

  • Rx, ASR n: Register x with arithmetic shift right by n bits (1 ≤ n ≤ 32)
  • Rx, LSL n: Register x with logical shift left by n bits (0 ≤ n ≤ 31)
  • Rx, LSR n: Register x with logical shift right by n bits (1 ≤ n ≤ 32)
  • Rx, ROR n: Register x with rotate right by n bits (1 ≤ n ≤ 31)
  • Rx, RRX: Register x with rotate right by one bit, with extend
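
The shift and rotate operations listed above can be modelled on 32-bit values in a few lines. This is only a sketch: the helper names (`lsl`, `lsr`, `asr`, `ror`) are made up for illustration, carry-out is ignored, and RRX is left out because it depends on the carry flag.

```python
MASK32 = 0xFFFFFFFF  # keep every result within 32 bits, like a register


def lsl(x, n):
    """Logical shift left: zeros enter from the right."""
    return (x << n) & MASK32


def lsr(x, n):
    """Logical shift right: zeros enter from the left."""
    return (x & MASK32) >> n


def asr(x, n):
    """Arithmetic shift right: the sign bit is copied in from the left."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32  # reinterpret as a negative Python integer
    return (x >> n) & MASK32


def ror(x, n):
    """Rotate right: bits leaving on the right re-enter on the left."""
    x &= MASK32
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32


# Equivalent of `ADD R0, R1, R2, LSL #2` with hypothetical register values.
r1, r2 = 0x1000, 3
r0 = (r1 + lsl(r2, 2)) & MASK32  # 0x1000 + 12 = 0x100C
```

Folding the shift into the ADD is what lets an array index be scaled and added to a base address in a single instruction.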

Examples:

    ADD R0, R1, R2              // R1 + R2 -> R0
    ADD R0, R1, #2              // R1 + 2 -> R0
    LDR R2, [R0]                // Use the address in R0 and load the data at the address into R2.
    LDR R1, [PC, #12]           // Use the address in PC plus an offset of 12 and load the data at that address into R1.
    STR R2, [R1]                // Store the value of R2 at the address denoted by R1
    STR r2, [r1, #4]!           // R1 + 4 -> R1
                                // Store the value in R2 at the new address in R1 (pre-indexed, offset 4).
    LDR r3, [r1], #4            // Load the value at the memory address found in R1 into register R3.
                                // R1 + 4 -> R1
    STR r2, [r1, r2, LSL#2]     // Store the value in R2 at the memory address R1 plus the offset R2 left-shifted by 2.
    STR r2, [r1, r2, LSL#2]!    // R1 + R2<<2 -> R1
                                // Store the value in R2 at the new memory address found in R1.
    LDR r3, [r1], r2, LSL#2     // Load the value at the memory address found in R1 into register R3.
                                // R1 + R2<<2 -> R1
    MOVLE R0, #5                // If LE (Less Than or Equal) is set 5 -> R0
    MOV R0, R1, LSL #1          // Store left shifted R1 -> R0

    adr r0, words+12             /* address of words[3] -> r0 */
    ldr r1, array_buff_bridge    /* address of array_buff[0] -> r1 */
    ldr r2, array_buff_bridge+4  /* address of array_buff[2] -> r2 */
    ldm r0, {r4,r5}              /* words[3] -> r4 = 0x03; words[4] -> r5 = 0x04 */
    stm r1, {r4,r5}              /* r4 -> array_buff[0] = 0x03; r5 -> array_buff[1] = 0x04 */
    ldmia r0, {r4-r6}            /* words[3] -> r4 = 0x03, words[4] -> r5 = 0x04; words[5] -> r6 = 0x05; */
    stmia r1, {r4-r6}            /* r4 -> array_buff[0] = 0x03; r5 -> array_buff[1] = 0x04; r6 -> array_buff[2] = 0x05 */
    ldmib r0, {r4-r6}            /* words[4] -> r4 = 0x04; words[5] -> r5 = 0x05; words[6] -> r6 = 0x06 */
    stmib r1, {r4-r6}            /* r4 -> array_buff[1] = 0x04; r5 -> array_buff[2] = 0x05; r6 -> array_buff[3] = 0x06 */
    ldmda r0, {r4-r6}            /* words[3] -> r6 = 0x03; words[2] -> r5 = 0x02; words[1] -> r4 = 0x01 */
    ldmdb r0, {r4-r6}            /* words[2] -> r6 = 0x02; words[1] -> r5 = 0x01; words[0] -> r4 = 0x00 */
    stmda r2, {r4-r6}            /* r6 -> array_buff[2] = 0x02; r5 -> array_buff[1] = 0x01; r4 -> array_buff[0] = 0x00 */
    stmdb r2, {r4-r5}            /* r5 -> array_buff[1] = 0x01; r4 -> array_buff[0] = 0x00; */

    push {r0, r1}
    pop {r2, r3}
    stmdb sp!, {r0, r1}
    ldmia sp!, {r4, r5}
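
The IA/IB/DA/DB suffixes in the LDM/STM examples above differ only in which addresses are touched relative to the base register. The sketch below computes those addresses; `ldm_addresses` is a hypothetical helper name, and the base address 12 simply stands in for `words+12` with `words` assumed to start at address 0.

```python
def ldm_addresses(base, count, mode):
    """Return the word addresses accessed by LDM/STM, lowest register first."""
    size = 4 * count
    if mode == "IA":    # Increment After: start at the base
        start = base
    elif mode == "IB":  # Increment Before: skip the base word
        start = base + 4
    elif mode == "DA":  # Decrement After: end at the base
        start = base - size + 4
    elif mode == "DB":  # Decrement Before: end just below the base
        start = base - size
    else:
        raise ValueError(f"unknown addressing mode: {mode}")
    # The lowest-numbered register always maps to the lowest address.
    return [start + 4 * i for i in range(count)]


base = 12  # stands in for words+12, i.e. the address of words[3]
for mode in ("IA", "IB", "DA", "DB"):
    indexes = [address // 4 for address in ldm_addresses(base, 3, mode)]
    print(mode, indexes)
```

With three registers the word indexes come out as 3-5 for IA, 4-6 for IB, 1-3 for DA and 0-2 for DB, matching the comments on the `ldmia`, `ldmib`, `ldmda` and `ldmdb` lines.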

Immediate Values in ARM

An immediate value in an ARM data-processing instruction must be representable as an 8-bit value rotated right by an even number of bits within the 32-bit word.

    MOV R0, #255   // Valid:   0b11111111 << 0
    MOV R0, #960   // Valid:   (0x3C0) = 0b00001111 << 6 = 0b1111000000
    MOV R0, #961   // Invalid: (0x3C1) = 0b1111000001, the set bits span more than 8 bits
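
The rule can be checked mechanically. The following sketch tests whether a constant fits the "8 bits plus an even rotation" encoding used by ARM-state data-processing instructions; the function name is invented for illustration, and Thumb-2 uses a different immediate scheme that this does not model.

```python
def is_valid_arm_immediate(value):
    """True if value is an 8-bit constant rotated right by an even amount."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Rotating left by `rot` undoes a rotate-right by `rot`.
        rotated = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if rotated < 0x100:
            return True
    return False


print(is_valid_arm_immediate(255))  # True
print(is_valid_arm_immediate(960))  # True
print(is_valid_arm_immediate(961))  # False
```

When a constant fails this check, assemblers usually need a literal-pool load (`LDR Rn, =value`) or a sequence of instructions instead of a single MOV.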

Data Types

  • Signed data: Holds negative and positive values, so the positive range is smaller

  • Unsigned data: Holds only zero and positive values, so the positive range is larger

  • ldr: Load Word

  • ldrh: Load unsigned Half Word

  • ldrsh: Load signed Half Word

  • ldrb: Load unsigned Byte

  • ldrsb: Load signed Byte

  • str: Store Word

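  • strh: Store Half Word (the low 16 bits of the register)

  • strb: Store Byte (the low 8 bits of the register)

  • strsh / strsb: Not actual ARM instructions. A store only truncates the register, so strh and strb serve both signed and unsigned data.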
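
The difference between the signed and unsigned loads is whether the loaded value is sign-extended or zero-extended to 32 bits. A small sketch using Python's `struct` module on hypothetical little-endian memory contents:

```python
import struct

# Hypothetical little-endian memory contents, chosen so the low bytes are "negative".
memory = bytes([0xFE, 0xFF, 0x34, 0x12])

ldrb = struct.unpack_from("<B", memory, 0)[0]   # zero-extended byte: 254
ldrsb = struct.unpack_from("<b", memory, 0)[0]  # sign-extended byte: -2
ldrh = struct.unpack_from("<H", memory, 0)[0]   # zero-extended halfword: 65534
ldrsh = struct.unpack_from("<h", memory, 0)[0]  # sign-extended halfword: -2
ldr = struct.unpack_from("<I", memory, 0)[0]    # full word: 0x1234FFFE

# What the destination register would hold after each signed load.
ldrsb_register = ldrsb & 0xFFFFFFFF  # 0xFFFFFFFE
ldrsh_register = ldrsh & 0xFFFFFFFF  # 0xFFFFFFFE
```

The same byte `0xFE` therefore ends up as `0x000000FE` after `ldrb` but `0xFFFFFFFE` after `ldrsb`.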

Registers

  • 30 General Purpose 32-bit Registers
  • First 16 (R0-R15 General Purpose Registers) are accessible in User-Level Mode
    • R7 (Holds Syscall Number)
    • R11 (Frame Pointer) Points to the base of the current stack frame
    • R12 (Intra-Procedure-call scratch register, IP)
    • R13 (Stack Pointer) Points to the top of the stack, where the most recently pushed element is.
    • R14 (Link Register) Stores the return address
    • R15 (Program Counter)
      • When a Branch/Jump is executed, holds the destination address
      • Otherwise reads as two instructions ahead of the current instruction (current address + 8 in ARM state, + 4 in Thumb state). Older ARM processors fetched instructions two ahead, and the behavior is kept to ensure compatibility.
  • Current Program Status Register (CPSR)
    • Bit 0-4: (Processor/Privilege Mode)
    • Bit 5: (Thumb) 1 when in Thumb
    • Bit 6: (FIQ disable)
    • Bit 7: (IRQ disable)
    • Bit 8: (Abort disable)
    • Bit 9: (Endian-ness) 0 for little-endian 1 for big-endian
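    • Bit 10-15: (IT[7:2]) Upper bits of the If-Then execution state used by Thumb-2 IT blocks
    • Bit 16-19: (GE[3:0]) Greater-than-or-Equal flags set by SIMD instructions
    • Bit 20-23: Reserved
    • Bit 24: (Jazelle bit) Allows some ARM processors to execute Java bytecode in hardware.
    • Bit 25-26: (IT[1:0]) Lower bits of the If-Then execution state
    • Bit 27: (Q, cumulative saturation) Sticky flag set when a saturating arithmetic instruction saturates or overflows
    • Bit 28: (Overflow) Set when the signed result of an add, subtract, or compare is greater than or equal to 2^31, or less than -2^31.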
    • Bit 29: (Carry)
      • Set when result of an addition is greater than or equal to 2^32
      • Set when result of a subtraction is positive or zero
      • Set to the last bit shifted out by an inline barrel shifter operation in a flag-setting move or logical instruction.
    • Bit 30: (Zero) 1 when result is zero
    • Bit 31: (Negative) 1 when result is negative
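
As a rough illustration of the bit layout above, this sketch splits a raw CPSR value into the named fields. `decode_cpsr` and its dictionary keys are hypothetical names, and the sample value `0x60000010` is made up for the example.

```python
def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into the fields listed above."""

    def bit(position):
        return (cpsr >> position) & 1

    return {
        "mode": cpsr & 0x1F,       # bits 0-4
        "thumb": bit(5),
        "fiq_disabled": bit(6),
        "irq_disabled": bit(7),
        "abort_disabled": bit(8),
        "big_endian": bit(9),
        "jazelle": bit(24),
        "overflow": bit(28),
        "carry": bit(29),
        "zero": bit(30),
        "negative": bit(31),
    }


# Made-up sample: mode bits 0b10000 (User mode) with Zero and Carry set.
fields = decode_cpsr(0x60000010)
```

This is handy when a debugger or emulator only prints the register as a hex number.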

Example:

    mov r0, #2
    mov r1, #4
    cmp r1, r0    // 4-2 Carry flag is set
    cmp r0, r1    // 2-4 Negative flag is set
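
The flag results in the example can be reproduced with a small model of CMP, which computes `a - b` and keeps only the flags. This is a sketch with a made-up function name, not an emulator.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(a, b):
    """Return (N, Z, C, V) as set by `cmp a, b`, i.e. by computing a - b."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    n = result >> 31                             # sign bit of the result
    z = int(result == 0)                         # result is zero
    c = int(a >= b)                              # carry set means "no borrow"
    v = ((a ^ b) & (a ^ result)) >> 31 & 1       # signed overflow
    return n, z, c, v


print(cmp_flags(4, 2))  # (0, 0, 1, 0): Carry set
print(cmp_flags(2, 4))  # (1, 0, 0, 0): Negative set
```

Overflow only shows up when the operands have different signs and the result's sign differs from the first operand, for example `cmp_flags(0x80000000, 1)`.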

Conditionals

The condition codes below can be appended to almost any ARM instruction; the instruction then executes only when the flags are in the stated state.

| Condition Code | Meaning (for cmp or subs) | Status of Flags |
| --- | --- | --- |
| GT | Signed Greater Than | (Z==0) && (N==V) |
| GE | Signed Greater Than or Equal | N==V |
| LE | Signed Less Than or Equal | (Z==1) \|\| (N!=V) |
| CS or HS | Unsigned Higher or Same (or Carry Set) | C==1 |
| CC or LO | Unsigned Lower (or Carry Clear) | C==0 |
| MI | Negative (or Minus) | N==1 |
| PL | Positive (or Plus) | N==0 |
| VC | No signed Overflow | V==0 |
| HI | Unsigned Higher | (C==1) && (Z==0) |
| LS | Unsigned Lower or Same | (C==0) \|\| (Z==1) |

Example:

    .global main
    main:
        mov   r0, #2        @ r0 = 2
        cmp   r0, #3        @ computes r0 - 3 = -1: Negative flag is set
        addlt r0, r0, #1    @ LT (N != V) holds, so r0 = r0 + 1 = 3
        cmp   r0, #3        @ computes r0 - 3 = 0: Zero flag is set, Negative flag is cleared
        addlt r0, r0, #1    @ LT no longer holds, so this instruction is skipped
        bx    lr            @ Branch to the address in the lr register
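
To see how a condition code is evaluated against the flags, here is a minimal sketch that encodes the flag tests as Python predicates. The `CONDITIONS` table and `condition_passes` are illustrative names; LT (N != V) is included because the example uses `addlt`.

```python
# Each predicate receives the N, Z, C, V flags as 0/1 integers.
CONDITIONS = {
    "GT": lambda n, z, c, v: z == 0 and n == v,
    "GE": lambda n, z, c, v: n == v,
    "LT": lambda n, z, c, v: n != v,
    "LE": lambda n, z, c, v: z == 1 or n != v,
    "CS": lambda n, z, c, v: c == 1,
    "HS": lambda n, z, c, v: c == 1,
    "CC": lambda n, z, c, v: c == 0,
    "LO": lambda n, z, c, v: c == 0,
    "MI": lambda n, z, c, v: n == 1,
    "PL": lambda n, z, c, v: n == 0,
    "VC": lambda n, z, c, v: v == 0,
    "HI": lambda n, z, c, v: c == 1 and z == 0,
    "LS": lambda n, z, c, v: c == 0 or z == 1,
}


def condition_passes(code, n, z, c, v):
    """True if an instruction suffixed with `code` would execute."""
    return bool(CONDITIONS[code.upper()](n, z, c, v))


# Flags after `cmp r0, #3` with r0 = 2 (negative result, borrow): N=1 Z=0 C=0 V=0
first_addlt_runs = condition_passes("LT", 1, 0, 0, 0)
# Flags after `cmp r0, #3` with r0 = 3 (zero result, no borrow): N=0 Z=1 C=1 V=0
second_addlt_runs = condition_passes("LT", 0, 1, 1, 0)
```

Here the first `addlt` executes and the second is skipped, so r0 ends at 3.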

IF-THEN-(Else) Conditional Instruction

IT is a Thumb-2 instruction that makes up to four following instructions conditional, much like a compact if-then-else for assembly.

  • IT: refers to If-Then (If TRUE then execute the next instruction)
  • ITT: refers to If-Then-Then (If TRUE then execute the next 2 instructions)
  • ITE: refers to If-Then-Else (If TRUE then execute the next instruction, If FALSE skip the next instruction and execute the one after that)
  • ITTE: refers to If-Then-Then-Else (If TRUE then execute the next 2 instructions and skip the next one, If FALSE skip 2 instructions and execute the one after that)
  • ITTEE: refers to If-Then-Then-Else-Else (If TRUE then execute the next 2 instructions and skip the next 2 instructions after that, If FALSE skip 2 instructions and execute the two after that)

Example:

    ITTE   NE               ; Next 3 instructions are conditional
    ANDNE  R0, R0, R1       ; ANDNE does not update condition flags
    ADDSNE R2, R2, #1       ; ADDSNE updates condition flags
    MOVEQ  R2, R3           ; Conditional move, where EQ is the inverse of NE

    ITE    GT               ; Next 2 instructions are conditional
    ADDGT  R1, R0, #55      ; Conditional addition in case GT is true
    ADDLE  R1, R0, #48      ; Conditional addition in case GT is not true

    ITTEE  EQ               ; Next 4 instructions are conditional
    MOVEQ  R0, R1           ; Conditional MOV
    ADDEQ  R2, R2, #10      ; Conditional ADD
    ANDNE  R3, R3, #1       ; Conditional AND
    BNE.W  dloop            ; A branch can only be the last instruction of an IT block
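
The mapping from an IT mnemonic to the condition each following instruction must carry can be sketched as below. `expand_it` and the `INVERSE` table are hypothetical helpers written for this note, not part of any assembler.

```python
# Each condition code paired with its logical opposite.
INVERSE = {
    "EQ": "NE", "NE": "EQ",
    "CS": "CC", "CC": "CS",
    "HS": "LO", "LO": "HS",
    "MI": "PL", "PL": "MI",
    "VS": "VC", "VC": "VS",
    "HI": "LS", "LS": "HI",
    "GE": "LT", "LT": "GE",
    "GT": "LE", "LE": "GT",
}


def expand_it(mnemonic, cond):
    """Return the condition suffix required on each instruction in the IT block."""
    mnemonic = mnemonic.upper()
    cond = cond.upper()
    pattern = mnemonic[1:]  # "ITTE" -> "TTE"; the first slot is always T
    if not mnemonic.startswith("IT") or len(pattern) > 4 or set(pattern) - {"T", "E"}:
        raise ValueError(f"not a valid IT mnemonic: {mnemonic}")
    # T slots use the base condition, E slots use its inverse.
    return [cond if slot == "T" else INVERSE[cond] for slot in pattern]


print(expand_it("ITTE", "NE"))   # ['NE', 'NE', 'EQ']
print(expand_it("ITE", "GT"))    # ['GT', 'LE']
print(expand_it("ITTEE", "EQ"))  # ['EQ', 'EQ', 'NE', 'NE']
```

The three results line up with the condition suffixes used in the ITTE, ITE and ITTEE examples.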

Branching

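Branch (B): Simple jump to a label
Branch with Link (BL): Saves the return address (the address of the instruction after the BL) in the LR register and jumps to the function
Branch and Exchange (BX): Jumps to the address held in a register and switches instruction set (ARM <-> Thumb) according to the least significant bit of that address
Branch with Link and Exchange (BLX): Saves the return address in the LR register, jumps to the function, and can switch instruction set (ARM <-> Thumb)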

Switch THUMB Mode:

    .text
    .global _start

    _start:
        .code 32            @ ARM mode
        add r2, pc, #1      @ put PC+1 into R2
        bx  r2              @ branch + exchange to R2

        .code 16            @ Thumb mode
        mov r0, #1
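
Why `add r2, pc, #1` lands on the Thumb code follows from the PC reading two instructions ahead in ARM state. A quick arithmetic sketch, assuming the `add` sits at the made-up address `0x8000`:

```python
ARM_INSTRUCTION_SIZE = 4


def arm_pc_read(instruction_address):
    """Value an ARM-state instruction sees when it reads PC (two instructions ahead)."""
    return instruction_address + 2 * ARM_INSTRUCTION_SIZE


add_address = 0x8000                               # hypothetical address of `add r2, pc, #1`
bx_address = add_address + ARM_INSTRUCTION_SIZE    # `bx r2` follows at 0x8004
thumb_code = bx_address + ARM_INSTRUCTION_SIZE     # first Thumb instruction at 0x8008

r2 = arm_pc_read(add_address) + 1                  # 0x8008 + 1 = 0x8009
```

So R2 holds the address of the first Thumb instruction with bit 0 set, which is exactly what `bx r2` needs to switch state.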

Conditional Branch Example:

    .text
    .global _start

    _start:
        mov r0, #2          @ r0 = 2
        mov r1, #2          @ r1 = 2
        add r0, r0, r1      @ r0 = r0 + r1
        cmp r0, #4          @ compare r0 with 4
        beq func1           @ if r0 == 4 jump to func1
        add r1, #5          @ else r1 = r1 + 5
        b   func2           @ jump to func2

    func1:
        mov r1, r0          @ r1 = r0
        bx  lr              @ jump to the address in lr

    func2:
        mov r0, r1          @ r0 = r1
        bx  lr              @ jump to the address in lr

Stack

A stack can grow up or down in memory.

If the stack grows toward lower memory addresses it is a descending stack.
If the stack grows toward higher memory addresses it is an ascending stack.

If the stack pointer points to the last item pushed then it is a full stack.
If the stack pointer points to the next free slot then it is an empty stack.

| Stack Type | Store Instruction | Load Instruction |
| --- | --- | --- |
| Full descending | STMFD (STMDB, Decrement Before) | LDMFD (LDM, Increment After) |
| Full ascending | STMFA (STMIB, Increment Before) | LDMFA (LDMDA, Decrement After) |
| Empty descending | STMED (STMDA, Decrement After) | LDMED (LDMIB, Increment Before) |
| Empty ascending | STMEA (STM, Increment After) | LDMEA (LDMDB, Decrement Before) |
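
A full descending stack, the type PUSH and POP use, can be modelled in a few lines. The class below is an illustrative sketch with invented names; it mimics `STMDB sp!, {...}` for push and `LDMIA sp!, {...}` for pop.

```python
class FullDescendingStack:
    """Toy model: SP points at the last pushed word and moves down on push."""

    def __init__(self, top):
        self.sp = top      # initial stack pointer (hypothetical address)
        self.memory = {}   # word address -> 32-bit value

    def push(self, *values):
        # STMDB: decrement SP first, then store; lowest register at lowest address.
        self.sp -= 4 * len(values)
        for i, value in enumerate(values):
            self.memory[self.sp + 4 * i] = value & 0xFFFFFFFF

    def pop(self, count):
        # LDMIA: load starting at SP, then increment SP past the loaded words.
        values = [self.memory.pop(self.sp + 4 * i) for i in range(count)]
        self.sp += 4 * count
        return values


stack = FullDescendingStack(top=0x1000)
stack.push(0x11, 0x22)       # like `push {r0, r1}` with r0 = 0x11, r1 = 0x22
sp_after_push = stack.sp     # 0x0FF8
r2, r3 = stack.pop(2)        # like `pop {r2, r3}`
```

After the pop, r2 and r3 hold the old r0 and r1 values and SP is back at its starting address.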

Thumb Mode

Thumb-1:

  • 16 bit Instructions
  • Instructions are 2 bytes wide, so R15 (Program Counter) is always 2-byte aligned
  • Used in ARMv6 and earlier

Thumb-2:

  • Extends Thumb-1
  • 16 bit or 32 bit Instructions
    • A .w suffix on an instruction selects the 32-bit encoding
  • Used in ARMv6T2, ARMv7
  • R15 (Program Counter) is always 2-byte aligned
  • Conditional Execution using the IT instruction

ThumbEE:

  • Variant of Thumb-2 aimed at code compiled on the device either shortly before or during execution (for example by a JIT compiler).

Switching state

Switching to Thumb mode:

  1. Use the BX (Branch and Exchange) or the BLX (Branch with Link and Exchange) instruction and set the least significant bit of the destination register to 1.
    • This does not cause alignment issues because the processor ignores the least significant bit when forming the branch address.
  2. We know that we are in Thumb mode if the T bit in the current program status register is set.
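
The way BX interprets its register operand can be sketched as a tiny decoder: bit 0 selects the instruction set and is masked off to form the actual branch address. `bx_target` is a made-up helper name and the addresses are arbitrary.

```python
def bx_target(register_value):
    """Return (branch_address, thumb_state) for `bx` with the given register value."""
    thumb = bool(register_value & 1)             # bit 0 set means switch to Thumb
    address = register_value & ~1 & 0xFFFFFFFF   # bit 0 is ignored for the address
    return address, thumb


print(bx_target(0x8009))  # (32776, True): execute Thumb code at 0x8008
print(bx_target(0x8000))  # (32768, False): execute ARM code at 0x8000
```

Tools that show function pointers in Thumb binaries often display the odd address, so masking bit 0 is needed before using it as a memory location.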

Emulating ARM with Unicorn

    import binascii

    from capstone import Cs, CS_ARCH_ARM, CS_MODE_ARM
    from unicorn import Uc, UcError, UC_ARCH_ARM, UC_MODE_ARM, UC_HOOK_CODE, UC_PROT_ALL
    from unicorn.arm_const import (UC_ARM_REG_SP, UC_ARM_REG_PC, UC_ARM_REG_R0,
                                   UC_ARM_REG_R1, UC_ARM_REG_R2)

    STACK_SIZE = 2 * 1024 * 1024
    RAM_SIZE = 8 * 1024 * 1024

    # create a new instance of capstone (it takes CS_* constants, not UC_*)
    capmd = Cs(CS_ARCH_ARM, CS_MODE_ARM)


    # callback of the code hook
    def hook_code(uc, addr, size, user_data):
        mem = uc.mem_read(addr, size)
        disas_single(bytes(mem), addr)


    # disassemble each instruction and print the mnemonic name
    def disas_single(data, addr):
        for i in capmd.disasm(data, addr):
            print(f"0x{i.address:x}:\t{i.mnemonic}\t{i.op_str}")
            break


    def map_memory(unicorn_obj, map_data, align_size=(1024 * 1024), default_perm=UC_PROT_ALL):
        next_free_block = 0x0
        # Map regions in ascending address order so overlaps can be pushed upward
        for memory_loc, data_info in sorted(map_data.items()):
            # Set size if not set (rounded up to a multiple of align_size)
            if data_info.get("size") is None:
                data_info["size"] = ((len(data_info["data"]) // align_size) + 1) * align_size
            # Set permissions if not set
            if data_info.get("permissions") is None:
                data_info["permissions"] = default_perm
            # Move the region up if it would overlap the previous one
            if memory_loc < next_free_block:
                memory_loc = next_free_block
            # Map the memory to the unicorn obj
            unicorn_obj.mem_map(memory_loc, data_info["size"], perms=data_info["permissions"])
            # Write the memory
            unicorn_obj.mem_write(memory_loc, data_info["data"])
            # Remember where the region actually ended up
            data_info["address"] = memory_loc
            # Update the next possible write location
            next_free_block = memory_loc + data_info["size"]


    def get_address(map_data, tag_name):
        for memory_loc, data_info in map_data.items():
            if data_info["tag"] == tag_name:
                return data_info.get("address", memory_loc)


    # code to be emulated
    with open("u-boot.bin", "rb") as in_file:  # opening for [r]eading as [b]inary
        ARM_CODE32 = in_file.read()

    # file to be decrypted
    with open("kernel.img.raw", "rb") as in_file:  # opening for [r]eading as [b]inary
        FILE_TOBE_DEC = in_file.read()

    print("Emulate ARM code")
    print("Shielder")
    try:
        # Initialize emulator in ARM-32bit mode
        # with "ARM" ARM instruction set
        mu = Uc(UC_ARCH_ARM, UC_MODE_ARM)

        # Map Memory from Dictionary
        # Uboot | Stack | RAM
        # Dictionary keys must be unique; the RAM base below is an assumed
        # address placed just above the (rounded-up) stack region.
        mem_map = {
            0x80800000: {"tag": "uboot", "data": ARM_CODE32},
            0x00000000: {"tag": "stack", "data": b"\x00" * STACK_SIZE},
            # Assumption: the file to be decrypted is placed at the start of RAM
            0x00300000: {"tag": "ram", "data": FILE_TOBE_DEC.ljust(RAM_SIZE, b"\x00")}}

        map_memory(mu, mem_map)

        # initialize machine registers
        # the stack grows downward, so SP starts at the top of the stack data
        mu.reg_write(UC_ARM_REG_SP, get_address(mem_map, "stack") + STACK_SIZE)
        # first argument, memory pointer to the location of the file
        mu.reg_write(UC_ARM_REG_R0, get_address(mem_map, "ram"))
        # second argument, memory pointer to the location on which write the file
        mu.reg_write(UC_ARM_REG_R1, get_address(mem_map, "ram"))
        # third argument, block size to be read from memory pointed by r0
        mu.reg_write(UC_ARM_REG_R2, 512)

        # hook any instruction and disassemble it with capstone
        mu.hook_add(UC_HOOK_CODE, hook_code)

        # emulate code in infinite time
        # Address + start/end of the block_aes_decrypt function
        # this trick saves many headaches
        mu.emu_start(get_address(mem_map, "uboot") + 0x8c40,
                     get_address(mem_map, "uboot") + 0x8c44)

        # now print out some registers
        print("Emulation done. Below is the CPU context")
        r_r0 = mu.reg_read(UC_ARM_REG_R0)
        r_r1 = mu.reg_read(UC_ARM_REG_R1)
        r_r2 = mu.reg_read(UC_ARM_REG_R2)
        r_pc = mu.reg_read(UC_ARM_REG_PC)
        print(f">>> r0 = 0x{r_r0:x}")
        print(f">>> r1 = 0x{r_r1:x}")
        print(f">>> r2 = 0x{r_r2:x}")
        print(f">>> pc = 0x{r_pc:x}")

        print("\nReading data from first 512byte of the RAM at: " + hex(get_address(mem_map, "ram")))
        print("==== BEGIN ====")
        ram_data = mu.mem_read(get_address(mem_map, "ram"), 512)
        print(str(binascii.hexlify(ram_data)))
        print("==== END ====")

        # from the reversed binary, we know which are the magic bytes
        # at the beginning of the kernel
        if b"27051956" == binascii.hexlify(bytearray(ram_data[:4])):
            print("\nMagic Bytes match :)\n\n")

        with open("test.bin", "wb") as f:
            f.write(ram_data)

    except UcError as e:
        print("ERROR: %s" % e)