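Mainstream flagship Android phones have all moved to 64-bit, and the kernel image zImage has changed accordingly. Reverse engineering an arm64 phone kernel in IDA Pro, and in particular getting the kernel symbols loaded, takes a fair amount of fiddling.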
Extract boot.img from /dev/block or from a ROM package, then unpack it with abootimg -x to get zImage.
If zImage is gzip-compressed, decompress it with gzip -d to get kernel.
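The gzip check in the second step can also be scripted. The following is a minimal sketch, not part of the original workflow: the function name maybe_gunzip is hypothetical, and it assumes the gzip stream starts at offset 0 of zImage, which is the same case gzip -d handles. Any bytes after the compressed stream are ignored.

```python
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def maybe_gunzip(zimage_data):
    """Return the decompressed kernel if zimage_data is gzip-compressed,
    otherwise return the input unchanged."""
    if not zimage_data.startswith(GZIP_MAGIC):
        return zimage_data
    # 16 + MAX_WBITS makes zlib expect a gzip header and trailer.
    # A decompress object stops at the end of the stream and leaves any
    # trailing bytes in .unused_data instead of raising an error.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    return decompressor.decompress(zimage_data)


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        kernel = maybe_gunzip(f.read())
    with open("kernel", "wb") as f:
        f.write(kernel)
```

Run as a script, it takes the zImage path as its only argument and writes the result to a file named kernel in the current directory.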
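Both steps above are routine. The real work is extracting from kernel the symbols that would normally appear in /proc/kallsyms, so that IDA Pro has much more to work with when it loads and analyzes the image. Following the 32-bit kernel symbol extraction method described in the Bits, Please! article, a 64-bit solution comes to mind quickly:
First you need the virtual address at which the kernel is loaded. One shortcut is to boot the phone and run: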
shell@surabaya:/ $ dmesg
...
[    0.000000] Virtual kernel memory layout:
[    0.000000]     vmalloc : 0xffffff8000000000 - 0xffffffbdbfff0000   (   246 GB)
[    0.000000]     vmemmap : 0xffffffbdc0000000 - 0xffffffbfc0000000   (     8 GB maximum)
[    0.000000]     PCI I/O : 0xffffffbffa000000 - 0xffffffbffb000000   (    16 MB)
[    0.000000]     fixed   : 0xffffffbffbdfe000 - 0xffffffbffbdff000   (     4 KB)
[    0.000000]     modules : 0xffffffbffc000000 - 0xffffffc000000000   (    64 MB)
[    0.000000]     memory  : 0xffffffc000000000 - 0xffffffc0fe550000   (  4069 MB)
[    0.000000]       .init : 0xffffffc001600000 - 0xffffffc001813000   (  2124 KB)
[    0.000000]       .text : 0xffffffc000080000 - 0xffffffc001600000   ( 22016 KB)
[    0.000000]       .data : 0xffffffc00181d000 - 0xffffffc001995f80   (  1508 KB)
...
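If you would rather not read the address off by eye, the layout lines can be parsed. This is only a sketch under the assumption that the dmesg text has the "name : start - end" shape shown above; parse_kernel_layout is a hypothetical helper name, not something from the original post.

```python
import re

# Optional "[ timestamp]" prefix, a region name, then "0xSTART - 0xEND".
LAYOUT_RE = re.compile(
    r"^(?:\[[^\]]*\])?\s*(.+?)\s*:\s*(0x[0-9a-fA-F]+)\s*-\s*(0x[0-9a-fA-F]+)"
)


def parse_kernel_layout(dmesg_text):
    """Map each region name (e.g. '.text') to its (start, end) addresses."""
    layout = {}
    for line in dmesg_text.splitlines():
        match = LAYOUT_RE.match(line)
        if match:
            name, start, end = match.groups()
            layout[name] = (int(start, 16), int(end, 16))
    return layout


if __name__ == "__main__":
    import sys
    # Example: adb shell dmesg | python parse_layout.py
    layout = parse_kernel_layout(sys.stdin.read())
    if ".text" in layout:
        print("kernel text start: 0x%x" % layout[".text"][0])
```

The start of the .text region is the value to pass to the extraction script as the optional text start argument.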
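Since phones currently do not enable KASLR, the base address is almost always 0xffffffc000080000, and with it you can locate the symbol table inside kernel. The first exported symbols, such as stext and _text, always point to 0xffffffc000080000, so searching for two consecutive copies of 0xffffffc000080000 finds the symbol table. From there, the Bits, Please! method exports all the symbols. The only things to watch when going from 32-bit to 64-bit are that addresses are now 8 bytes long and that the alignment has grown from 0x10 to 0x100. I modified the original Python script into one that parses arm64 symbols: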
import sys
import struct

# The default address at which the kernel text segment is loaded
DEFAULT_KERNEL_TEXT_START = 0xffffffc000080000

# The size of a QWORD (64-bit value)
QWORD_SIZE = struct.calcsize("Q")

# The size of a DWORD (32-bit value)
DWORD_SIZE = struct.calcsize("I")

# The size of a WORD (16-bit value)
WORD_SIZE = struct.calcsize("H")

# The alignment of labels in the resulting kernel file
LABEL_ALIGN = 0x100

# The minimal number of repeating addresses pointing to the kernel's text start address
# which are used as a heuristic in order to find the beginning of the kernel's symbol
# table. Since usually there are at least two symbols pointing to the beginning of the
# text segment ("stext", "_text"), the minimal number for the heuristic is 2.
KALLSYMS_ADDRESSES_MIN_HEURISTIC = 2

def read_qword(kernel_data, offset):
    '''
    Reads a QWORD from the given offset within the kernel data
    '''
    return struct.unpack("<Q", kernel_data[offset : offset + QWORD_SIZE])[0]

def read_dword(kernel_data, offset):
    '''
    Reads a DWORD from the given offset within the kernel data
    '''
    return struct.unpack("<I", kernel_data[offset : offset + DWORD_SIZE])[0]

def read_word(kernel_data, offset):
    '''
    Reads a WORD from the given offset within the kernel data
    '''
    return struct.unpack("<H", kernel_data[offset : offset + WORD_SIZE])[0]

def read_byte(kernel_data, offset):
    '''
    Reads an unsigned byte from the given offset within the kernel data
    '''
    return struct.unpack("<B", kernel_data[offset : offset + 1])[0]

def read_c_string(kernel_data, offset):
    '''
    Reads a NUL-delimited C-string from the given offset
    '''
    end = kernel_data.index(b"\x00", offset)
    return kernel_data[offset:end].decode("latin-1")

def label_align(address):
    '''
    Aligns the given value down to the closest label output boundary
    '''
    return address & ~(LABEL_ALIGN - 1)

def find_kallsyms_addresses(kernel_data, kernel_text_start):
    '''
    Searching for the beginning of the kernel's symbol table
    Returns the offset of the kernel's symbol table, or -1 if the symbol table could not be found
    '''
    search_str = struct.pack("<Q", kernel_text_start) * KALLSYMS_ADDRESSES_MIN_HEURISTIC
    return kernel_data.find(search_str)

def get_kernel_symbol_table(kernel_data, kernel_text_start):
    '''
    Retrieves the kernel's symbol table from the given kernel file
    Returns None if the table could not be located or validated
    '''
    # Getting the beginning and end of the kallsyms_addresses table
    kallsyms_addresses_off = find_kallsyms_addresses(kernel_data, kernel_text_start)
    if kallsyms_addresses_off == -1:
        print("[-] Could not find the kernel symbol table")
        return None
    kallsyms_addresses_end_off = kernel_data.find(struct.pack("<Q", 0), kallsyms_addresses_off)
    num_symbols = (kallsyms_addresses_end_off - kallsyms_addresses_off) // QWORD_SIZE

    # Making sure that kallsyms_num_syms matches the table size
    kallsyms_num_syms_off = label_align(kallsyms_addresses_end_off + LABEL_ALIGN)
    kallsyms_num_syms = read_qword(kernel_data, kallsyms_num_syms_off)
    if kallsyms_num_syms != num_symbols:
        print("[-] Actual symbol table size: %d, read symbol table size: %d" % (num_symbols, kallsyms_num_syms))
        return None

    # Calculating the location of the markers table
    kallsyms_names_off = label_align(kallsyms_num_syms_off + LABEL_ALIGN)
    current_offset = kallsyms_names_off
    for i in range(0, num_symbols):
        current_offset += read_byte(kernel_data, current_offset) + 1
    kallsyms_markers_off = label_align(current_offset + LABEL_ALIGN)

    # Reading the token table
    # Not sure if this can be a universal solution
    kallsyms_token_table_off = label_align(kernel_data.find(struct.pack("<Q", 0) * 2, kallsyms_markers_off) + LABEL_ALIGN)
    ## kallsyms_token_table_off = label_align(kallsyms_markers_off + (((num_symbols + 255) >> 8) * QWORD_SIZE))
    current_offset = kallsyms_token_table_off
    for i in range(0, 256):
        token_str = read_c_string(kernel_data, current_offset)
        current_offset += len(token_str) + 1
    kallsyms_token_index_off = label_align(current_offset + LABEL_ALIGN)

    # Creating the token table
    token_table = []
    for i in range(0, 256):
        index = read_word(kernel_data, kallsyms_token_index_off + i * WORD_SIZE)
        token_table.append(read_c_string(kernel_data, kallsyms_token_table_off + index))

    # Decompressing the symbol table using the token table
    offset = kallsyms_names_off
    symbol_table = []
    for i in range(0, num_symbols):
        num_tokens = read_byte(kernel_data, offset)
        offset += 1
        symbol_name = ""
        for j in range(num_tokens, 0, -1):
            token_table_idx = read_byte(kernel_data, offset)
            symbol_name += token_table[token_table_idx]
            offset += 1

        symbol_address = read_qword(kernel_data, kallsyms_addresses_off + i * QWORD_SIZE)
        symbol_table.append((symbol_address, symbol_name[0], symbol_name[1:]))

    return symbol_table

def main():
    # Verifying the arguments
    if len(sys.argv) < 2:
        print("USAGE: %s: <KERNEL_FILE> [optional: <0xKERNEL_TEXT_START>]" % sys.argv[0])
        return
    with open(sys.argv[1], "rb") as f:
        kernel_data = f.read()
    kernel_text_start = int(sys.argv[2], 16) if len(sys.argv) == 3 else DEFAULT_KERNEL_TEXT_START

    # Getting the kernel symbol table
    symbol_table = get_kernel_symbol_table(kernel_data, kernel_text_start)
    if symbol_table is None:
        return

    # Printing the symbols in /proc/kallsyms format and saving them to "syms"
    with open("syms", "w") as fp:
        for symbol in symbol_table:
            line = "%016X %s %s" % symbol
            print(line)
            fp.write(line + "\n")

if __name__ == "__main__":
    main()
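The decompression loop at the end of the script is easier to follow on a toy input. The sketch below decodes two entries by hand; the token table contents and the byte values are made up for illustration and do not come from any real kernel, and decode_kallsyms_name is a hypothetical helper name.

```python
def decode_kallsyms_name(names_blob, offset, token_table):
    """Decode one length-prefixed entry of the compressed names table.

    Returns (type_char, name, next_offset).
    """
    data = bytearray(names_blob)  # indexing yields ints on Python 2 and 3
    num_tokens = data[offset]
    indices = data[offset + 1 : offset + 1 + num_tokens]
    # Each byte is an index into the 256-entry token table; the expanded
    # string starts with the symbol type, followed by the symbol name.
    decoded = "".join(token_table[i] for i in indices)
    return decoded[0], decoded[1:], offset + 1 + num_tokens


# Made-up token table: only three of the 256 slots are filled in.
token_table = [""] * 256
token_table[0x01] = "Ts"
token_table[0x02] = "text"
token_table[0x03] = "T_"

# Two entries: [len=2, 0x01, 0x02] and [len=2, 0x03, 0x02].
blob = bytes(bytearray([2, 0x01, 0x02, 2, 0x03, 0x02]))

first = decode_kallsyms_name(blob, 0, token_table)
second = decode_kallsyms_name(blob, first[2], token_table)
print(first)   # ('T', 'stext', 3)
print(second)  # ('T', '_text', 6)
```

Each entry is a length byte followed by that many token indexes, which is why the script can skip over the whole names table by repeatedly adding the length byte plus one.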
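The symbols are printed in /proc/kallsyms format and also written to a syms file in the current directory. The next step is getting IDA Pro to use the syms file. My approach is to try renaming the address of each symbol; if that fails, undefine the item and try once more. For functions in the code segment, I also run MakeCode again: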
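from idc import MakeNameEx, MakeUnkn, MakeCode, SN_NOWARN

with open("syms", "r") as f:
    lines = f.read().split("\n")

for line in lines:
    # Skip blank lines, such as the one left after the final newline
    if not line.strip():
        continue
    addr, sym_type, name = line.split(" ")
    ea = int(addr, 16)
    # Try to rename; on failure, undefine the item and try once more
    if not MakeNameEx(ea, name, SN_NOWARN):
        MakeUnkn(ea, 1)
        MakeNameEx(ea, name, SN_NOWARN)
    # Re-create code at the start of every text-segment symbol
    if sym_type in ("t", "T"):
        MakeUnkn(ea, 1)
        MakeCode(ea)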
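It can help to keep the parsing separate from the IDA calls, so that the syms file can be checked outside IDA first. A possible sketch follows; parse_syms is a hypothetical name, and the IDA side is left out on purpose because it depends on the IDAPython version in use.

```python
def parse_syms(text):
    """Yield (address, type, name) tuples from kallsyms-style text.

    Blank or malformed lines are skipped instead of raising.
    """
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        try:
            address = int(parts[0], 16)
        except ValueError:
            continue
        yield address, parts[1], parts[2]


if __name__ == "__main__":
    with open("syms", "r") as f:
        symbols = list(parse_syms(f.read()))
    code_symbols = [s for s in symbols if s[1] in ("t", "T")]
    print("%d symbols, %d in the text segment" % (len(symbols), len(code_symbols)))
```

Inside IDA, each tuple can then be fed to the rename and make-code calls shown in the script above.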