ARM

  • Reduced Instruction Set Computing (RISC)
    • Less than 100 Instructions
    • Instructions only operate on Registers
    • ONLY Load/Store instructions can access memory.
  • Most instructions support Conditional Execution
  • Versions before ARMv3 use little-endian format for data
  • ARMv3 and later are bi-endian: data endian-ness is switchable, with little-endian as the usual default
  • Instructions are stored in little-endian format (in ARMv7-A regardless of the data endian-ness)
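
To make the byte-order difference concrete, here is a minimal Python sketch showing how the same 32-bit word would sit in memory under each data endian-ness. The helper name word_to_bytes is made up for this illustration.

```python
import struct

def word_to_bytes(value, big_endian=False):
    """Return the 4 bytes of a 32-bit word in memory order (lowest address first)."""
    fmt = ">I" if big_endian else "<I"
    return struct.pack(fmt, value & 0xFFFFFFFF)

# Same word, two memory layouts
print(word_to_bytes(0x11223344).hex())                   # little-endian
print(word_to_bytes(0x11223344, big_endian=True).hex())  # big-endian
```

In the little-endian layout the least significant byte (0x44) is stored at the lowest address; in the big-endian layout the most significant byte (0x11) comes first.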

ARM Family   ARM Architecture
ARM7         ARM v4
ARM9         ARM v5
ARM11        ARM v6
Cortex-A     ARM v7-A
Cortex-R     ARM v7-R
Cortex-M     ARM v7-M

ARM Mode:
- Instructions are always 4 bytes, so the R15 Program Counter advances in steps of 4 bytes

Writing Assembly

Use as to assemble the ASM file into an object file
Use ld to link the object files into a binary

as program.s -o program.o
ld program.o -o program

.string is null terminated
.ascii is not null terminated

Instructions

Instruction  Description              Instruction  Description
MOV          Move data                EOR          Bitwise XOR
MVN          Move bitwise NOT         LDR          Load
ADD          Addition                 STR          Store
SUB          Subtraction              LDM          Load Multiple
MUL          Multiplication           STM          Store Multiple
LSL          Logical Shift Left       PUSH         Push on Stack
LSR          Logical Shift Right      POP          Pop off Stack
ASR          Arithmetic Shift Right   B            Branch
ROR          Rotate Right             BL           Branch with Link
CMP          Compare                  BX           Branch and eXchange
AND          Bitwise AND              BLX          Branch with Link and eXchange
ORR          Bitwise OR               SWI/SVC      System Call

The Barrel Shifter applies a shift or rotate to the second operand as part of the same instruction, which can shrink multiple instructions into one.

  • Rx, ASR n: Register x with arithmetic shift right by n bits (1 ≤ n ≤ 32)
  • Rx, LSL n: Register x with logical shift left by n bits (0 ≤ n ≤ 31)
  • Rx, LSR n: Register x with logical shift right by n bits (1 ≤ n ≤ 32)
  • Rx, ROR n: Register x with rotate right by n bits (1 ≤ n ≤ 31)
  • Rx, RRX: Register x with rotate right by one bit, with extend
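
The shift operations listed above can be modelled on 32-bit values in a few lines of Python. This is only an illustrative sketch: the function names mirror the mnemonics but are otherwise made up here, and the carry-out that the real barrel shifter produces is not modelled.

```python
MASK32 = 0xFFFFFFFF

def lsl(x, n):
    """Logical shift left, 0 <= n <= 31."""
    return (x << n) & MASK32

def lsr(x, n):
    """Logical shift right, 1 <= n <= 32: zeros are shifted in from the left."""
    return (x & MASK32) >> n

def asr(x, n):
    """Arithmetic shift right, 1 <= n <= 32: the sign bit is replicated."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32  # reinterpret the word as a negative Python integer
    return (x >> n) & MASK32

def ror(x, n):
    """Rotate right, 1 <= n <= 31: bits leaving on the right re-enter on the left."""
    x &= MASK32
    return ((x >> n) | (x << (32 - n))) & MASK32

def rrx(x, carry_in):
    """Rotate right by one bit with extend: the carry flag enters at bit 31."""
    return ((x & MASK32) >> 1) | ((carry_in & 1) << 31)

# Model of "ADD R0, R1, R2, LSL #2": the shift and the add happen in one instruction
r1, r2 = 10, 3
r0 = (r1 + lsl(r2, 2)) & MASK32
print(r0)
```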

Examples:

ADD   R0, R1, R2     // R1 + R2 -> R0
ADD   R0, R1, #2     // R1  + 2 -> R0

LDR R2, [R0]         // Load the word at the address in R0 into R2
LDR R1, [PC, #12]    // Load the word at address PC + 12 into R1
STR R2, [R1]         // Store the value of R2 to the address in R1


STR r2, [r1, #4]!    // Pre-indexed: store R2 at address R1 + 4,
                     // then R1 + 4 -> R1
LDR r3, [r1], #4     // Post-indexed: load the word at address R1 into R3,
                     // then R1 + 4 -> R1

STR r2, [r1, r2, LSL#2]  // Offset: store R2 at address R1 + (R2 << 2); R1 is unchanged
STR r2, [r1, r2, LSL#2]! // Pre-indexed: store R2 at address R1 + (R2 << 2),
                         // then R1 + (R2 << 2) -> R1
LDR r3, [r1], r2, LSL#2  // Post-indexed: load the word at address R1 into R3,
                         // then R1 + (R2 << 2) -> R1



MOVLE R0, #5         // If the LE (Less Than or Equal) condition holds, 5 -> R0
MOV   R0, R1, LSL #1 // R1 << 1 -> R0



 adr r0, words+12             /* address of words[3] -> r0 */
 ldr r1, array_buff_bridge    /* address of array_buff[0] -> r1 */
 ldr r2, array_buff_bridge+4  /* address of array_buff[2] -> r2 */
 ldm r0, {r4,r5}              /* words[3] -> r4 = 0x03; words[4] -> r5 = 0x04 */
 stm r1, {r4,r5}              /* r4 -> array_buff[0] = 0x03; r5 -> array_buff[1] = 0x04 */
 ldmia r0, {r4-r6}            /* words[3] -> r4 = 0x03, words[4] -> r5 = 0x04; words[5] -> r6 = 0x05; */
 stmia r1, {r4-r6}            /* r4 -> array_buff[0] = 0x03; r5 -> array_buff[1] = 0x04; r6 -> array_buff[2] = 0x05 */
 ldmib r0, {r4-r6}            /* words[4] -> r4 = 0x04; words[5] -> r5 = 0x05; words[6] -> r6 = 0x06 */
 stmib r1, {r4-r6}            /* r4 -> array_buff[1] = 0x04; r5 -> array_buff[2] = 0x05; r6 -> array_buff[3] = 0x06 */
 ldmda r0, {r4-r6}            /* words[3] -> r6 = 0x03; words[2] -> r5 = 0x02; words[1] -> r4 = 0x01 */
 ldmdb r0, {r4-r6}            /* words[2] -> r6 = 0x02; words[1] -> r5 = 0x01; words[0] -> r4 = 0x00 */
 stmda r2, {r4-r6}            /* r6 -> array_buff[2] = 0x02; r5 -> array_buff[1] = 0x01; r4 -> array_buff[0] = 0x00 */
 stmdb r2, {r4-r5}            /* r5 -> array_buff[1] = 0x01; r4 -> array_buff[0] = 0x00; */

 push {r0, r1}
 pop {r2, r3}
 stmdb sp!, {r0, r1}
 ldmia sp!, {r4, r5}

Immediate Values in ARM

An immediate value in ARM must be representable as an 8-bit value rotated right by an even number of bit positions within the 32-bit word.

MOV R0, #255 // Valid: 0b11111111 with no rotation
MOV R0, #960 // Valid: (0x3C0) = 0b00001111 << 6 = 0b1111000000
MOV R0, #961 // Invalid: (0x3C1) = 0b1111000001 spans 10 bits
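
The rule can be checked mechanically. The sketch below tries every even rotation and reports whether a value fits in 8 bits under one of them; the function name is_arm_immediate is made up for this illustration, and it only models the classic ARM-mode encoding (not Thumb-2 modified immediates or assembler tricks such as turning MOV into MVN).

```python
def is_arm_immediate(value):
    """True if value is an 8-bit constant rotated right by an even amount (0-30)."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Rotating left by rot undoes a rotate right by rot
        undone = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if undone < 256:
            return True
    return False

for candidate in (255, 960, 961):
    print(candidate, is_arm_immediate(candidate))
```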

Data Types

  • Signed data: can hold negative values, at the cost of a smaller positive range
  • Unsigned data: holds only zero and positive values, with a larger positive range
  • ldr: Load Word
  • ldrh: Load unsigned Half Word
  • ldrsh: Load signed Half Word
  • ldrb: Load unsigned Byte
  • ldrsb: Load signed Byte
  • str: Store Word
  • strh: Store Half Word
  • strb: Store Byte
  • Stores have no signed variants (there is no strsh or strsb): a store simply writes the low 16 or 8 bits of the register

Registers

  • 30 General Purpose 32-bit Registers
  • First 16 (R0-R15 General Purpose Registers) are accessible in User-Level Mode
    • R7 (Holds Syscall Number)
    • R11 (Frame Pointer) Points to the base of the current stack frame
    • R12 (Intra-Procedure-call scratch register)
    • R13 (Stack Pointer) Points to the top of the stack, where the most recently pushed element is.
    • R14 (Link Register) Used to store the Return address
    • R15 (Program Counter)
      • When a Branch/Jump is executed holds the destination address
      • Otherwise reads as two instructions after the current instruction: current address + 8 in ARM state, + 4 in Thumb state (older ARM processors fetched instructions two ahead, and the behavior is kept to ensure compatibility)
  • Current Program Status Register (CPSR)
    • Bit 0-4: (Processor/Privilege Mode)
    • Bit 5: (Thumb) 1 when in Thumb
    • Bit 6: (FIQ disable)
    • Bit 7: (IRQ disable)
    • Bit 8: (Imprecise Abort disable)
    • Bit 9: (Endian-ness) 0 for little-endian 1 for big-endian
    • Bit 10-15: (IT state, bits 2-7) Execution state of Thumb IT blocks
    • Bit 16-19: (GE flags) Greater-than-or-equal flags set by SIMD instructions
    • Bit 20-23: Reserved
    • Bit 24: (Jazelle bit) Allows some ARM processors to execute Java bytecode in hardware.
    • Bit 25-26: (IT state, bits 0-1)
    • Bit 27: (Q, Cumulative Saturation) Sticky flag set when a saturating instruction saturates
    • Bit 28: (Overflow) Set when the signed result of an add, subtract, or compare is greater than or equal to 2^31, or less than -2^31.
    • Bit 29: (Carry)
      • Set when result of an addition is greater than or equal to 2^32
      • Set when result of a subtraction is positive or zero (no borrow)
      • Set to the last bit shifted out by an inline barrel shifter operation in a move or logical instruction.
    • Bit 30: (Zero) 1 when result is zero
    • Bit 31: (Negative) 1 when result is negative
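
As a rough illustration of the bit layout, the following Python sketch pulls the commonly inspected fields out of a raw CPSR value. The function name decode_cpsr and the sample value are made up for this example, and the IT, GE and Q fields are left out to keep it short.

```python
def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into its most commonly inspected fields."""
    return {
        "N": (cpsr >> 31) & 1,   # Negative
        "Z": (cpsr >> 30) & 1,   # Zero
        "C": (cpsr >> 29) & 1,   # Carry
        "V": (cpsr >> 28) & 1,   # Overflow
        "J": (cpsr >> 24) & 1,   # Jazelle
        "E": (cpsr >> 9) & 1,    # Endian-ness
        "A": (cpsr >> 8) & 1,    # Abort disable
        "I": (cpsr >> 7) & 1,    # IRQ disable
        "F": (cpsr >> 6) & 1,    # FIQ disable
        "T": (cpsr >> 5) & 1,    # Thumb
        "mode": cpsr & 0x1F,     # Processor/Privilege mode bits
    }

# Hypothetical sample value, chosen only to exercise the decoder
fields = decode_cpsr(0x600001D3)
print(fields)
```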

Example:

mov r0, #2
mov r1, #4
cmp r1, r0 // 4-2 Carry flag is set
cmp r0, r1 // 2-4 Negative flag is set
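
The flag results in this example can be reproduced with a small Python model of CMP, which computes a - b and discards the result. The function name cmp_flags is made up for this sketch; the overflow test used is the usual rule for subtraction (the operands have different signs and the result's sign differs from the first operand's).

```python
MASK32 = 0xFFFFFFFF

def cmp_flags(a, b):
    """Return the N, Z, C, V flags that 'cmp a, b' would produce on 32-bit values."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    return {
        "N": result >> 31,                        # bit 31 of the result
        "Z": int(result == 0),                    # result is zero
        "C": int(a >= b),                         # no borrow on unsigned subtraction
        "V": ((a ^ b) & (a ^ result)) >> 31,      # signed overflow
    }

print(cmp_flags(4, 2))  # the 4-2 case from the example
print(cmp_flags(2, 4))  # the 2-4 case from the example
```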

Conditionals

The condition codes below can be appended to most ARM instructions; the instruction only executes when the flags are in the required state.

Condition Code   Meaning (for cmp or subs)                Status of Flags
EQ               Equal                                    Z==1
NE               Not Equal                                Z==0
GT               Signed Greater Than                      (Z==0) && (N==V)
LT               Signed Less Than                         N!=V
GE               Signed Greater Than or Equal             N==V
LE               Signed Less Than or Equal                (Z==1) || (N!=V)
CS or HS         Unsigned Higher or Same (or Carry Set)   C==1
CC or LO         Unsigned Lower (or Carry Clear)          C==0
MI               Negative (or Minus)                      N==1
PL               Positive or Zero (or Plus)               N==0
AL               Always executed                          -
NV               Never executed                           -
VS               Signed Overflow                          V==1
VC               No signed Overflow                       V==0
HI               Unsigned Higher                          (C==1) && (Z==0)
LS               Unsigned Lower or Same                   (C==0) || (Z==1)
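
The flag tests can be written down as a lookup table. The Python sketch below is one way to do it; the names CONDITIONS and condition_passed are made up for this illustration, LS is taken as (C==0) || (Z==1), and the deprecated NV code is left out.

```python
# One flag test per condition code; f is a dict with the keys N, Z, C, V
CONDITIONS = {
    "EQ": lambda f: f["Z"] == 1,
    "NE": lambda f: f["Z"] == 0,
    "GT": lambda f: f["Z"] == 0 and f["N"] == f["V"],
    "LT": lambda f: f["N"] != f["V"],
    "GE": lambda f: f["N"] == f["V"],
    "LE": lambda f: f["Z"] == 1 or f["N"] != f["V"],
    "CS": lambda f: f["C"] == 1,
    "CC": lambda f: f["C"] == 0,
    "MI": lambda f: f["N"] == 1,
    "PL": lambda f: f["N"] == 0,
    "VS": lambda f: f["V"] == 1,
    "VC": lambda f: f["V"] == 0,
    "HI": lambda f: f["C"] == 1 and f["Z"] == 0,
    "LS": lambda f: f["C"] == 0 or f["Z"] == 1,
    "AL": lambda f: True,
}
CONDITIONS["HS"] = CONDITIONS["CS"]  # synonyms
CONDITIONS["LO"] = CONDITIONS["CC"]

def condition_passed(cond, flags):
    """Return True if an instruction with this condition suffix would execute."""
    return CONDITIONS[cond.upper()](flags)

# Flags after "cmp r0, #3" with r0 = 2: the result 2 - 3 is negative and borrows
flags_lower = {"N": 1, "Z": 0, "C": 0, "V": 0}
# Flags after "cmp r0, #3" with r0 = 3: the result is zero, no borrow
flags_equal = {"N": 0, "Z": 1, "C": 1, "V": 0}

print(condition_passed("LT", flags_lower), condition_passed("LT", flags_equal))
```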

Example:

.global main

main:
        mov     r0, #2     @ r0 = 2
        cmp     r0, #3     @ computes 2 - 3; the result is negative, so the Negative bit is set
        addlt   r0, r0, #1 @ LT holds (N != V), so r0 = r0 + 1 = 3
        cmp     r0, #3     @ computes 3 - 3; the Zero bit is set and the Negative bit is reset
        addlt   r0, r0, #1 @ LT no longer holds, so this add is skipped
        bx      lr         @ Branch to the address in the lr register

IF-THEN-(Else) Conditional Instruction

The IT instruction (Thumb-2) makes up to four of the following instructions conditional, acting as a small if/else block for assembly.

  • IT: refers to If-Then (If TRUE then execute the next instruction)
  • ITT: refers to If-Then-Then (If TRUE then execute the next 2 instructions)
  • ITE: refers to If-Then-Else (If TRUE then execute the next instruction, If FALSE skip the next instruction and execute the one after that)
  • ITTE: refers to If-Then-Then-Else (If TRUE then execute the next 2 instructions and skip the next one, If FALSE skip 2 instructions and execute the one after that)
  • ITTEE: refers to If-Then-Then-Else-Else (If TRUE then execute the next 2 instructions and skip the next 2 instructions after that, If FALSE skip 2 instructions and execute the two after that)

Example:

ITTE   NE           ; Next 3 instructions are conditional
ANDNE  R0, R0, R1   ; ANDNE does not update condition flags
ADDSNE R2, R2, #1   ; ADDSNE updates condition flags
MOVEQ  R2, R3       ; Conditional move Where EQ is the Inverse of NE

ITE    GT           ; Next 2 instructions are conditional
ADDGT  R1, R0, #55  ; Conditional addition in case the GT is true
ADDLE  R1, R0, #48  ; Conditional addition in case the GT is not true

ITTEE  EQ           ; Next 4 instructions are conditional
MOVEQ  R0, R1       ; Conditional MOV
ADDEQ  R2, R2, #10  ; Conditional ADD
ANDNE  R3, R3, #1   ; Conditional AND
BNE.W  dloop        ; Branch instruction can only be used in the last instruction of an IT block
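
To see which condition each instruction in an IT block must carry, the mnemonic can be expanded mechanically: the first instruction always uses the stated condition, each further T repeats it, and each E uses its inverse. The Python sketch below does that; the names INVERSE and it_block_conditions are made up for this illustration.

```python
# Each condition code paired with its logical inverse
INVERSE = {
    "EQ": "NE", "NE": "EQ", "CS": "CC", "CC": "CS", "HS": "LO", "LO": "HS",
    "MI": "PL", "PL": "MI", "VS": "VC", "VC": "VS", "HI": "LS", "LS": "HI",
    "GE": "LT", "LT": "GE", "GT": "LE", "LE": "GT",
}

def it_block_conditions(mnemonic, cond):
    """Expand e.g. ('ITTE', 'NE') into the condition required on each instruction."""
    mnemonic, cond = mnemonic.upper(), cond.upper()
    suffix = mnemonic[2:]
    if not mnemonic.startswith("IT") or len(suffix) > 3 or set(suffix) - {"T", "E"}:
        raise ValueError("not an IT mnemonic: " + mnemonic)
    conditions = [cond]  # the first instruction is always the 'Then' case
    for slot in suffix:
        conditions.append(cond if slot == "T" else INVERSE[cond])
    return conditions

print(it_block_conditions("ITTE", "NE"))
print(it_block_conditions("ITE", "GT"))
print(it_block_conditions("ITTEE", "EQ"))
```

The three calls correspond to the three IT blocks in the example above.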

Branching

Branch (B): Simple jump to a label
Branch link (BL): Saves the return address (the address of the instruction after the BL) in the LR register and jumps to the function
Branch exchange (BX): Jumps to the address held in a register and can switch instruction set (ARM <-> Thumb) depending on bit 0 of that address
Branch link exchange (BLX): Saves the return address in the LR register, jumps to the function, and can switch instruction set (ARM <-> Thumb)

Switch THUMB Mode:

.text
.global _start

_start:
     .code 32         @ ARM mode
     add r2, pc, #1   @ put PC+1 into R2
     bx r2            @ branch + exchange to R2

    .code 16          @ Thumb mode
     mov r0, #1

Conditional Branch Example:

.text
.global _start

_start:
   mov r0, #2     @ r0 = 2
   mov r1, #2     @ r1 = 2
   add r0, r0, r1 @ r0 = r0 + r1
   cmp r0, #4     @ compare r0 with 4
   beq func1      @ if r0 == 4 jump to func1
   add r1, #5     @ else r1 = r1 + 5
   b func2        @ jump to func2
func1:
   mov r1, r0     @ r1 = r0
   bx  lr         @ jump to the address in lr
func2:
   mov r0, r1     @ r0 = r1
   bx  lr         @ jump to the address in lr

Stack

The stack can grow up or down.

If the stack grows toward lower memory addresses it is a descending stack.
If the stack grows toward higher memory addresses it is an ascending stack.

If the stack pointer points to the last item pushed, it is a full stack.
If the stack pointer points to the next free slot, it is an empty stack.

Stack Type         Store Instruction                  Load Instruction
Full descending    STMFD (STMDB, Decrement Before)    LDMFD (LDM/LDMIA, Increment After)
Full ascending     STMFA (STMIB, Increment Before)    LDMFA (LDMDA, Decrement After)
Empty descending   STMED (STMDA, Decrement After)     LDMED (LDMIB, Increment Before)
Empty ascending    STMEA (STM/STMIA, Increment After) LDMEA (LDMDB, Decrement Before)
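
The full descending case, which is what PUSH and POP use, can be modelled in a few lines of Python. The class name FullDescendingStack and the starting address 0x1000 are made up for this sketch; memory is a plain dictionary keyed by address.

```python
class FullDescendingStack:
    """Toy model of a full descending stack of 32-bit words."""

    def __init__(self, top=0x1000):
        self.sp = top      # full stack: SP points at the last item pushed
        self.memory = {}

    def push(self, *values):
        """Model 'STMDB sp!, {...}': decrement SP before each store.

        The first value listed ends up at the lowest address, as the
        lowest-numbered register does in the real instruction.
        """
        for value in reversed(values):
            self.sp -= 4
            self.memory[self.sp] = value

    def pop(self, count):
        """Model 'LDMIA sp!, {...}': load, then increment SP after each load."""
        values = []
        for _ in range(count):
            values.append(self.memory.pop(self.sp))
            self.sp += 4
        return values

stack = FullDescendingStack()
stack.push(0xAA, 0xBB)          # like push {r0, r1}
sp_after_push = stack.sp
popped = stack.pop(2)           # like pop {r2, r3}
print(hex(sp_after_push), [hex(v) for v in popped], hex(stack.sp))
```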

Thumb Mode

Thumb-1:
- 16 bit Instructions
- Instructions are 2-byte aligned, so the R15 Program Counter advances in steps of 2 bytes
- Used in ARMv6 and earlier

Thumb-2:
- Extends Thumb-1
- 16 bit or 32 bit Instructions
- A .w suffix on an instruction forces the 32 bit encoding
- Used in ARMv6T2, ARMv7
- Instructions are 2-byte aligned, so the R15 Program Counter advances in steps of 2 or 4 bytes
- Conditional Execution using the IT instruction

ThumbEE:
- Variant aimed at code compiled on the device either shortly before or during execution.

Switching state

Switching to Thumb mode:
1. Use BX (Branch and Exchange) or BLX (Branch with Link and Exchange) with the least significant bit of the destination register set to 1.
- This does not cause alignment issues because the processor uses that bit only to select the instruction set and ignores it when forming the branch address.
2. The processor is in Thumb mode when the T bit in the current program status register is set.
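
The way BX interprets its register can be sketched in Python as follows. The function name bx_target is made up for this illustration, and the sample addresses are arbitrary.

```python
def bx_target(register_value):
    """Return (branch address, instruction set) for 'bx' with this register value."""
    thumb = bool(register_value & 1)              # bit 0 selects the instruction set
    address = register_value & 0xFFFFFFFE         # bit 0 is not part of the address
    return address, ("Thumb" if thumb else "ARM")

print(bx_target(0x8001))  # odd value: switch to Thumb
print(bx_target(0x8000))  # even value: stay in or switch to ARM
```

This is why the "add r2, pc, #1" trick in the Thumb switch example works: the added 1 only sets the state bit.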

Emulating ARM with Unicorn

from unicorn import *
from unicorn.arm_const import *
from unicorn.unicorn_const import *
from capstone import *
import binascii

#callback of the code hook: called before each emulated instruction
def hook_code(uc, addr, size, user_data):
    mem = uc.mem_read(addr, size)
    disas_single(bytes(mem), addr)

#disassemble a single instruction and print its mnemonic and operands
def disas_single(data, addr):
    for i in capmd.disasm(data, addr):
        print(f"0x{i.address:x}:\t{i.mnemonic}\t{i.op_str}")
        break
         
next_free_block = 0x0
def map_memory(unicorn_obj, map_data, align_size=(1024 * 1024), default_perm=UC_PROT_ALL ):
    #The function reassigns the module-level cursor, so it must be declared global
    global next_free_block
    for memory_loc, data_info in map_data.items():
        #Set size if not set (rounded up to a multiple of align_size)
        if data_info.get('size') is None:
            data_info["size"] = ((len(data_info["data"]) // align_size) + 1 ) * align_size

        #Set Permissions if not set
        if data_info.get('permissions') is None:
            data_info["permissions"] = default_perm

        #Check Memory map location: move the block up if it would overlap the previous one
        if memory_loc < next_free_block:
            memory_loc = next_free_block

        #Remember where the block actually ended up, for get_address()
        data_info["address"] = memory_loc

        #Map the memory to the unicorn obj
        unicorn_obj.mem_map(memory_loc, data_info["size"], perms=data_info["permissions"])

        #Write the memory
        unicorn_obj.mem_write(memory_loc, data_info["data"])

        #Update the next possible write location
        next_free_block = memory_loc + data_info["size"]

def get_address(map_data, tag_name):
    for memory_loc, data_info in map_data.items():
        if data_info["tag"] == tag_name:
            #Prefer the address recorded by map_memory, in case the block was moved
            return data_info.get("address", memory_loc)


#create a new instance of capstone (it takes capstone's own CS_* constants, not unicorn's UC_* ones)
capmd = Cs(CS_ARCH_ARM, CS_MODE_ARM)

#code to be emulated
with open("u-boot.bin", "rb") as in_file: # opening for [r]eading as [b]inary
    ARM_CODE32 = in_file.read()

# file to be decrypted
with open("kernel.img.raw", "rb") as in_file: # opening for [r]eading as [b]inary
    FILE_TOBE_DEC = in_file.read()

print("Emulate ARM code")
print("Shielder")
try:
    # Initialize emulator in ARM-32bit mode
    # with "ARM" ARM instruction set
    mu = Uc(UC_ARCH_ARM, UC_MODE_ARM)


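    #Map Memory from Dictionary
    #Stack | RAM | Uboot
    #Each region needs its own base address: a repeated dictionary key silently drops the earlier entry
    #The RAM base below is an arbitrary choice made only to keep the regions from overlapping
    mem_map = { 0x00000000: {"tag": "stack", "data": b"\x00" * (2 * 1024 * 1024)},
                0x01000000: {"tag": "ram",   "data": b"\x00" * (8 * 1024 * 1024)},
                0x80800000: {"tag": "uboot", "data": ARM_CODE32}}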

    map_memory(mu, mem_map)

    # Assumption: the file to decrypt has to be copied into RAM before the call,
    # since r0 below is described as pointing at it
    mu.mem_write(get_address(mem_map, "ram"), FILE_TOBE_DEC)

        
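    # initialize machine registers
    # the stack grows downward, so SP starts at the top of the stack region
    # (map_memory stores the rounded-up region size in the "size" field)
    stack_info = next(info for info in mem_map.values() if info["tag"] == "stack")
    mu.reg_write(UC_ARM_REG_SP, get_address(mem_map, "stack") + stack_info["size"])
    # first argument, memory pointer to the location of the file
    mu.reg_write(UC_ARM_REG_R0, get_address(mem_map, "ram"))
    # second argument, memory pointer to the location where the output is written
    mu.reg_write(UC_ARM_REG_R1, get_address(mem_map, "ram"))
    # third argument, block size to be read from memory pointed by r0
    mu.reg_write(UC_ARM_REG_R2, 512)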

    # hook every instruction and disassemble it with capstone
    mu.hook_add(UC_HOOK_CODE, hook_code)

    # emulate code with no time limit
    # Address + start/end of the block_aes_decrypt function
    # this trick saves a lot of headaches
    mu.emu_start(get_address(mem_map, "uboot")+0x8c40, get_address(mem_map, "uboot")+0x8c44)

    # now print out some registers
    print("Emulation done. Below is the CPU context")

    r_r0 = mu.reg_read(UC_ARM_REG_R0)
    r_r1 = mu.reg_read(UC_ARM_REG_R1)
    r_r2 = mu.reg_read(UC_ARM_REG_R2)
    r_pc = mu.reg_read(UC_ARM_REG_PC)
    print(f">>> r0 = 0x{r_r0:x}")
    print(f">>> r1 = 0x{r_r1:x}")
    print(f">>> r2 = 0x{r_r2:x}")
    print(f">>> pc = 0x{r_pc:x}")

    print("\nReading data from first 512byte of the RAM at: " + hex(get_address(mem_map, "ram")))
    print("==== BEGIN ====")
    ram_data = mu.mem_read(get_address(mem_map, "ram"), 512)
    print(str(binascii.hexlify(ram_data)))
    print("==== END ====")

    # from the reversed binary, we know which are the magic bytes
    # at the beginning of the kernel
    if b"27051956" == binascii.hexlify(bytearray(ram_data[:4])):
        print("\nMagic Bytes match :)\n\n")
        with open("test.bin", "wb") as f:
            f.write(ram_data)

except UcError as e:
    print("ERROR: %s" % e)