Overview
VM generates a crash report tombstone when there is a processor-defined exception (signal from the kernel – SIGILL, SIGABRT, SIGBUS, SIGFPE, SIGSEGV, SIGSTKFLT).
Once the core processor sees the problem, the kernel sends a signal to the debuggerd daemon socket in user space (system/core/debuggerd.c and bionic/linker/debugger.c).
The daemon generates a Tombstone Log (/data/tombstones/tombstone_xx) by getting a stack dump from each user process and the current CPU register value.
这里将研究Android Tombstone机制以及相应的异常机制
- exception如何产生
- SIGILL, SIGABRT, SIGBUS, SIGFPE, SIGSEGV, SIGSTKFLT如何传递?
- Tombstone如何产生?
- Tombstone如何解析?
exception如何产生?
To be continued...
Tombstone 原因-Framework Reboot
- Watchdog killing system process
- Fatal exception in one of system_server’s threads
- Excessive JNI references
SIGILL, SIGABRT, SIGBUS, SIGFPE, SIGSEGV, SIGSTKFLT如何传递?
To be continued...
Tombstone如何产生?
Two Files.
bionic/linker/debugger.c
system/core/debuggerd.c
bionic/linker/debugger.c
Step 1: android Dynamic Linker
Android 的加载/链接器linker 主要用于实现共享库的加载与链接。它支持应用程序对库函数的隐式和显式调用。
- 对于隐式调用,应用程序的编译与静态库大致相同,只是在静态链接的时候通过--dynamic-linker /system/bin/linker 指定动态链接器,(该信息将被存放在ELF文件的.interp节中,内核执行目标映像文件前将通过该信息加载并运行相应的解释器程序linker.)并链接相应的共享库。与ld.so不同的是,Linker目前没有提供Lazy Binding机制,所有外部过程引用都在映像执行之前解析。
- 对于显式调用,可以同过linker中提供的接口dlopen,dlsym,dlerror和dlclose来动态加载和链接共享库。
代码bionic/linker/linker.c
生成/system/bin/linker
也就是说,在加载过程时已为进程设定了信号处理方式。
Step 2: Called when a signal is received from Kernel, uses socket() to connect to The “android:debuggerd” socket, and write()s to the socket
__linker_init() (bionic/linker/linker.c) -> debugger_init() (bionic/linker/debugger.c)-> debugger_signal_handler()
void debugger_init()
{
struct sigaction act;
memset(&act, 0, sizeof(act));
act.sa_sigaction = debugger_signal_handler;
act.sa_flags = SA_RESTART | SA_SIGINFO;
sigemptyset(&act.sa_mask);
sigaction(SIGILL, &act, NULL);
sigaction(SIGABRT, &act, NULL);
sigaction(SIGBUS, &act, NULL);
sigaction(SIGFPE, &act, NULL);
sigaction(SIGSEGV, &act, NULL);
sigaction(SIGSTKFLT, &act, NULL);
sigaction(SIGPIPE, &act, NULL);
}
采用Socke方式进行进程间通信
/ * Catches fatal signals so we can ask debuggerd to ptrace us before * we crash. /
void debugger_signal_handler(int n, siginfo_t* info, void* unused)
{
unsigned tid;
int s;
logSignalSummary(n, info);
tid = gettid();
s = socket_abstract_client("android:debuggerd", SOCK_STREAM);
if(s >= 0) {
/* debugger knows our pid from the credentials on the
* local socket but we need to tell it our tid. It
* is paranoid and will verify that we are giving a tid
* that's actually in our process
*/
int ret;
RETRY_ON_EINTR(ret, write(s, &tid, sizeof(unsigned)));
if (ret == sizeof(unsigned)) {
/* if the write failed, there is no point to read on
* the file descriptor. */
RETRY_ON_EINTR(ret, read(s, &tid, 1));
notify_gdb_of_libraries();
}
close(s);
}
/* remove our net so we fault for real when we return */
signal(n, SIG_DFL);
}
可以看到logSignalSummary将输出如下示例信息:
F/libc ( 1373): Fatal signal 11 (SIGSEGV) at 0x0000055d (code=0)
system/core/debuggerd.c
Step 1:The debuggerd daemon creates a socket server android:debuggerd and loops forever,waiting for some client to write into the socket
main()(system/core/debuggerd/debuggerd.c)
Step 2:Dump stack trace and registers in /data/tombstone/
handle_crashing_process()->engrave_tombstone() (system/core/debuggerd/debuggerd.c)
if(WIFSTOPPED(status)){
n = WSTOPSIG(status);
switch(n) {
case SIGSTOP:
XLOG("stopped -- continuing\n");
n = ptrace(PTRACE_CONT, tid, 0, 0);
if(n) {
LOG("ptrace failed: %s\n", strerror(errno));
goto done;
}
continue;
case SIGABRT:
isAnr = true;
case SIGILL:
case SIGBUS:
case SIGFPE:
case SIGSEGV:
case SIGSTKFLT: {
XLOG("stopped -- fatal signal\n");
need_cleanup = engrave_tombstone(cr.pid, tid, debug_uid, n, isAnr);
kill(tid, SIGSTOP);
goto done;
}
default:
XLOG("stopped -- unexpected signal\n");
goto done;
}
} else {
XLOG("unexpected waitpid response\n");
goto done;
}
内容
保存位置
static int find_and_open_tombstone(bool isAnr)
snprintf(path, sizeof(path), TOMBSTONE_DIR"/tombstone%s_%02d", isAnr == true? "NoCrash":"", oldest);
如果是ANR类型引起的,即SIGABRT(源自anrNoResponding),那么就会生成
case SIGABRT: isAnr = true;
/tombstoneNoCrash_xx
- 如果不是ANR类型引起的,即SIGILL、SIGBUS、SIGFPE、SIGSEGV、SIGSTKFLT。那么就会生成
/tombstone_xx
保存内容
dump_crash_banner(fd, pid, tid, signal);
dump_crash_report(fd, pid, tid, true);
dump_logs(fd, pid, true);
dump_sibling_thread_report(fd, pid, tid);
dump_logs(fd, pid, false);
内容包括编译版本信息、进程和线程号码以及进程名字、异常信号信息、寄存器信息、PC和LR指针附近的代码、线程堆栈信息、Logcat中system和main的缓冲区中与崩溃进程有关的调试信息。
void dump_crash_banner(int tfd, unsigned pid, unsigned tid, int sig)
{
char data[1024];
char *x = 0;
FILE *fp;
sprintf(data, "/proc/%d/cmdline", pid);
fp = fopen(data, "r");
if(fp) {
x = fgets(data, 1024, fp);
fclose(fp);
}
_LOG(tfd, false,
"*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***\n");
dump_build_info(tfd);// ***property_get("ro.build.fingerprint", fingerprint, "unknown");***
_LOG(tfd, false, "pid: %d, tid: %d >>> %s <<<\n",
pid, tid, x ? x : "UNKNOWN");
if(sig) dump_fault_addr(tfd, tid, sig);
}
void dump_crash_report(int tfd, unsigned pid, unsigned tid, bool at_fault)
dump_registers(tfd, tid, at_fault);
parse_elf_info(milist, tid);
dump_pc_and_lr(tfd, tid, milist, stack_depth, at_fault);
dump_stack_and_code(tfd, tid, milist, stack_depth, sp_list, at_fault);
dump_dalvik(tfd, milist, tid, at_fault);
/*
* Dumps the logs generated by the specified pid to the tombstone, from both
* "system" and "main" log devices. Ideally we'd interleave the output.
*/
static void dump_logs(int tfd, unsigned pid, bool tailOnly)
{
dump_log_file(tfd, pid, "/dev/log/system", tailOnly);
dump_log_file(tfd, pid, "/dev/log/main", tailOnly);
}
tombstones 例子
Build fingerprint: 'xxx'
pid: 1373, tid: 1373 >>> com.android.development <<<
signal 11 (SIGSEGV), code 0 (?), fault addr 0000055d
r0 00000000 r1 0000000b r2 0000055d r3 0000000b
r4 4bbe49d8 r5 00958e38 r6 00000000 r7 00000025
r8 be8955e0 r9 4baded9c 10 00000008 fp be8955f4
ip 4021ae3c sp be8955c0 lr 401db0df pc 400bd8d0 cpsr 20000010
d0 0000000000000000 d1 0000000000000000
d2 0000000000000000 d3 0000000000000000
d4 0000000000000000 d5 0000000000000000
d6 0000000043200000 d7 0000000000000000
d8 0000000000000000 d9 440b000043150000
d10 000000000000022c d11 0000000000000000
d12 0000000000000000 d13 0000000000000000
d14 0000000000000000 d15 0000000000000000
d16 0000000040aa0eb8 d17 3f80000041808889
d18 3f80000041c4cccd d19 0701070100700798
d20 0000000000000c07 d21 0000043f00890000
d22 0000000000080008 d23 0000000000000008
d24 0078007700760074 d25 007a0078007a0079
d26 0000000000000000 d27 0000000000000000
d28 0048004700460044 d29 004a0048004a0049
d30 007a007a007a007a d31 0000000000000000
scr 80000017
#00 pc 0000d8d0 /system/lib/libc.so (kill)
#01 pc 000660dc /system/lib/libandroid_runtime.so (_Z29android_os_Process_sendSignalP7_JNIEnvP8_jobjectii)
code around pc:
400bd8b0 e2601000 e0100001 116f0f10 12600020 ..`.......o. .`.
400bd8c0 e12fff1e e92d50f0 e3a07025 ef000000 ../..P-.%p......
400bd8d0 e8bd50f0 e1b00000 512fff1e ea00b264 .P......../Qd...
400bd8e0 e92d50f0 e3a070ee ef000000 e8bd50f0 .P-..p.......P..
400bd8f0 e1b00000 512fff1e ea00b25d f5d0f000 ....../Q].......
code around lr:
401db0bc 0003ecaa b5102a00 4610dd03 f7d34619 .....*.....F.F..
401db0cc bd10e886 b5102a00 4610dd03 f7d34619 .....*.....F.F..
401db0dc bd10e87e 4601b513 4611b91a e820f7d1 ~......F...F.. .
401db0ec ac01e00c f7fe4620 9b01fce7 681ab133 .... F......3..h
401db0fc f8524621 18180c0c eebcf7d0 bf00bd1c !FR.............
stack:
be895580 40aa0eb8 /dev/ashmem/dalvik-heap (deleted)
be895584 be8955c8 [stack]
be895588 0095d5c8 [heap]
be89558c be8955c8 [stack]
be895590 a5c1c5ac
be895594 40890bd7 /system/lib/libdvm.so
be895598 00c1ce30 [heap]
be89559c 400f85a0
be8955a0 00000028
be8955a4 400f8554
be8955a8 00c1ce38 [heap]
be8955ac 00000006
be8955b0 4badece0
be8955b4 400c5bfd /system/lib/libc.so
be8955b8 df0027ad
be8955bc 00000000
#01 be8955c0 4bbe49d8 /dev/ashmem/dalvik-LinearAlloc (deleted)
be8955c4 00958e38 [heap]
be8955c8 00000000
be8955cc 4badeda4
be8955d0 4021ae3c /system/lib/libandroid_runtime.so
be8955d4 401db0df /system/lib/libandroid_runtime.so
be8955d8 4bbe49d8 /dev/ashmem/dalvik-LinearAlloc (deleted)
be8955dc 4085f134 /system/lib/libdvm.so
be8955e0 4baded9c
be8955e4 00000001
be8955e8 40aa0eb8 /dev/ashmem/dalvik-heap (deleted)
be8955ec 00958e48 [heap]
be8955f0 00000002
be8955f4 40899299 /system/lib/libdvm.so
be8955f8 4baded9c
be8955fc 4d01bc75 /data/dalvik-cache/system@framework@framework.jar@classes.dex
be895600 401db0d1 /system/lib/libandroid_runtime.so
be895604 00958e48 [heap]
--------- tail end of log /dev/log/main
06-14 13:05:39.319 1373 1373 D ActivityThread: setTargetHeapUtilization:0.25
06-14 13:05:39.329 1373 1373 D ActivityThread: setTargetHeapIdealFree:8388608
06-14 13:05:39.329 1373 1373 D ActivityThread: setTargetHeapConcurrentStart:2097152
06-14 13:05:54.269 1373 1373 I BadBehaviorActivity: Native crash pressed -- about to kill -11 self
06-14 13:05:54.269 1373 1373 F libc : Fatal signal 11 (SIGSEGV) at 0x0000055d (code=0)
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --
Effects
If a crash happens in the user space application, e.g., market applications or system-privileged OEM application, it kills the process and restarts if it is necessary.
If a crash happens in system_server, it restarts Android user space.
Tombstone如何解析?
Use the following toolchain tools to map the failed address to the source code:
- arm-eabi-objdump
arm-eabi-addr2line
Android 产生出來的还沒進行strip的执行档或shared libraries 是放在
out/target/product/YOUR_PRODUCT_NAME/symbols/system/bin out/target/product/YOUR_PRODUCT_NAME/symbols/system/lib
Android所使用的toolchain是放在
prebuilt/linux-x86/toolchain/arm-eabi-4.4.3/bin/
假设我們要看 #00 pc 0000d8d0 /system/lib/libc.so (kill) #01 pc 000660dc /system/lib/libandroid_runtime.so 是 call 到哪一個function (假设tombstone_XX 是在 $android_root 目录下)
$ ./prebuilt/linux-x86/toolchain/arm-eabi-4.4.3/bin/arm-eabi-addr2line -f -e ./out/target/product/YOUR_PRODUCT_NAME/symbols/system/lib/libc.so 0000d8d0
memcmp ...bionic/libc/arch-arm/bionic/memcmp.S:131