1 Overview
Android Dalvik应用收到该信号后,会打印改应用中所有线程的状态以及调用栈情况,并且并不是强制退出。这些状态通常保存在一个特定的叫做trace的文件中。
Traces.txt holds java level stacktraces
The stack trace of the application is save in /data/anr/traces.txt
研究内容:
- Trace file如何生成?
- Trace file内容及解释?
2 Trace file如何生成
Trace文件是 android davik 虚拟机在收到异常终止信号 (SIGQUIT)时产生的。
2.1 触发条件
最经常的触发条件是 android应用中产生了 ANR或FC (force close)。
实质
/frameworks/base/services/java/com/android/server/am/ActivityManagerService.java
setBroadcastTimeoutLocked-> handleMessage:BROADCAST_TIMEOUT_MSG -> broadcastTimeoutLocked->appNotResponding->dumpStackTraces
serviceTimeout->appNotResponding->dumpStackTraces
ActivityStack
handleMessage: PAUSE_TIMEOUT_MSG/LAUNCH_TICK_MSG logAppTooSlow->dumpStackTraces
./frameworks/base/services/java/com/android/server/am/DeviceMonitor.java
private static final File BASE = new File("/data/anr/");
dumpStackTraces
// First collect all of the stacks of the most important pids.
// Next measure CPU usage.
// sending signal
synchronized (observer) {
Process.sendSignal(firstPids.get(i), Process.SIGNAL_QUIT);
observer.wait(200); // Wait for write-close, give up after 200msec
}
/frameworks/base/core/java/android/os/Process.java: public static final int SIGNAL_QUIT = 3;
2.2 服务对象
由于是该文件的产生是在 DVM里,所以只有运行 dvm实例的进程(如普通的java应用,java服务等)才会产生该文件
android 本地应用 (native app,指运行在 android lib层,用c/c++编写的linux应用、库、服务等)在收到 SIGQUIT时是不会产生 trace文件的。
可以在终端通过adb发送SIGQUIT给应用来生成trace文件。
2.3 实现过程
相关实现代码文件
Android ICS 实现文件后缀是 .cpp, 早期实现是.c
dalvik/vm/init.cpp
davik/vm/SignalCatcher.cpp
dalvik/vm/Thread.cpp
实现过程
Step #1: DVM初始化时,设置信号屏蔽字,屏蔽要特殊处理的信号(SIGQUIT, SIGUSR1, SIGUSR2)。由于信号处理方式是进程范围起作用的, 这意味着该进程里所有的线程都将屏蔽该信号。 实现代码在init.c中如下:
dalvik/vm/init.cpp
std::string dvmStartup(int argc, const char* const argv[],
bool ignoreUnrecognized, JNIEnv* pEnv)
{
....
/* configure signal handling */
if (!gDvm.reduceSignals)
blockSignals();
...
}
/*
* Configure signals. We need to block SIGQUIT so that the signal only
* reaches the dump-stack-trace thread.
*
* This can be disabled with the "-Xrs" flag.
*/
static void blockSignals()
{
sigset_t mask;
int cc;
sigemptyset(&mask);
sigaddset(&mask, SIGQUIT);
sigaddset(&mask, SIGUSR1); // used to initiate heap dump
#if defined(WITH_JIT) && defined(WITH_JIT_TUNING)
sigaddset(&mask, SIGUSR2); // used to investigate JIT internals
#endif
//sigaddset(&mask, SIGPIPE);
cc = sigprocmask(SIG_BLOCK, &mask, NULL);
assert(cc == 0);
if (false) {
/* TODO: save the old sigaction in a global */
struct sigaction sa;
memset(&sa, 0, sizeof(sa));
sa.sa_sigaction = busCatcher;
sa.sa_flags = SA_SIGINFO;
cc = sigaction(SIGBUS, &sa, NULL);
assert(cc == 0);
}
}
Step #2: DVM 生成单独的信号处理线程,用来对三个信号做特殊处理 (init.cpp):
dalvik/vm/init.cpp
/*
* Do non-zygote-mode initialization. This is done during VM init for
* standard startup, or after a "zygote fork" when creating a new process.
*/
bool dvmInitAfterZygote()
{
....
/* start signal catcher thread that dumps stacks on SIGQUIT */
if (!gDvm.reduceSignals && !gDvm.noQuitHandler) {
if (!dvmSignalCatcherStartup())
return false;
}
....
davik/vm/SignalCatcher.cpp
/*
* Crank up the signal catcher thread.
*
* Returns immediately.
*/
bool dvmSignalCatcherStartup()
{
gDvm.haltSignalCatcher = false;
if (!dvmCreateInternalThread(&gDvm.signalCatcherHandle,
"Signal Catcher", signalCatcherThreadStart, NULL))
return false;
return true;
}
DVM调用dvmCreateInternalThread()来生成一个新的内部线程来专门处理dvm进程里的信号。
dvmCreateInternalThread()其实是使用pthread_create()来产生新的线程。
该线程的处理函数是 signalCatcherThreadStart()。
(dvm里所谓的内部线程,就是用来帮助dvm实现本身使用的线程,比如信号处理线程,binder线程,Compiler线程,JDWP线程等,而不是应用程序申请的线程)
/*
* Sleep in sigwait() until a signal arrives.
*/
static void* signalCatcherThreadStart(void* arg)
{
...
/* set up mask with signals we want to handle */
sigemptyset(&mask);
sigaddset(&mask, SIGQUIT);
sigaddset(&mask, SIGUSR1);
#if defined(WITH_JIT) && defined(WITH_JIT_TUNING)
sigaddset(&mask, SIGUSR2);
#endif
...
while (true) {
...
/*
* Signals for sigwait() must be blocked but not ignored. We
* block signals like SIGQUIT for all threads, so the condition
* is met. When the signal hits, we wake up, without any signal
* handlers being invoked.
*
* When running under GDB we occasionally return from sigwait()
* with EINTR (e.g. when other threads exit).
*/
loop:
cc = sigwait(&mask, &rcvd);
...
switch (rcvd) {
case SIGQUIT:
handleSigQuit();
break;
case SIGUSR1:
handleSigUsr1();
break;
#if defined(WITH_JIT) && defined(WITH_JIT_TUNING)
case SIGUSR2:
handleSigUsr2();
break;
#endif
...
}
- 首先设置我们要处理的信号集(SIGQUIT, SIGUSR1, SIGUSR2)
- 然后 调用 sigwait()。 我们知道sigwait()会在当前的线程里重新打开指定的信号屏蔽字屏蔽的信号集。
dvm在启动时,首先在整个进程里设置信号屏蔽字屏蔽掉三个信号,sigwait()的调用,使的这三个信号只在 SignalCatcher线程里响应。
dvm对三个信号分别所做的特殊用途:
- SIGUSR1 被用来做手工垃圾收集。处理函数是 HandleSigUsr1()
/ * Respond to a SIGUSR1 by forcing a GC. /
static void handleSigUsr1()
{
LOGI("SIGUSR1 forcing GC (no HPROF)");
dvmCollectGarbage();
}
SIGUSR2 被用来做 JIT的调试。如果JIT下编译时打开,收到SIGUSR2时dvm会dump出相关的调试信息。
#if defined(WITH_JIT) && defined(WITH_JIT_TUNING)
/ Sample callback function for dvmJitScanAllClassPointers /
void printAllClass(void *ptr)
{
ClassObject **classPP = (ClassObject **) ptr;
LOGE("class %s", (*classPP)->descriptor);
}
/ * Respond to a SIGUSR2 by dumping some JIT stats and possibly resetting * the code cache. /
static void handleSigUsr2()
{
static int codeCacheResetCount = 0;
gDvmJit.receivedSIGUSR2 ^= true;
if ((--codeCacheResetCount & 7) == 0) {
/* Dump all class pointers in the traces */
dvmJitScanAllClassPointers(printAllClass);
gDvmJit.codeCacheFull = true;
} else {
dvmCompilerDumpStats();
/* Stress-test unchain all */
dvmJitUnchainAll();
LOGD("Send %d more signals to reset the code cache",
codeCacheResetCount & 7);
}
dvmCheckInterpStateConsistency();
}
#endif
SIGUSR1、SIGUSR2两个信号都仅用于DVM的内部实现的调试。可以在终端通过adb发送SIGUSR1和SIGUSR2信号来观察它的行为。
- SIGQUIT 用来输出trace文件,以记录异常终止是dvm的上下文信息.
/ * Respond to a SIGQUIT by dumping the thread stacks. Optionally dump * a few other things while we're at it. * * Thread stacks can either go to the log or to a file designated for holding * ANR traces. If we're writing to a file, we want to do it in one shot, * so we can use a single O_APPEND write instead of contending for exclusive * access with flock(). There may be an advantage in resuming the VM * before doing the file write, so we don't stall the VM if disk I/O is * bottlenecked. * * If JIT tuning is compiled in, dump compiler stats as well. /
static void handleSigQuit()
{
char* traceBuf = NULL;
size_t traceLen;
dvmSuspendAllThreads(SUSPEND_FOR_STACK_DUMP);
dvmDumpLoaderStats("sig");
if (gDvm.stackTraceFile == NULL) {
/* just dump to log */
DebugOutputTarget target;
dvmCreateLogOutputTarget(&target, ANDROID_LOG_INFO, LOG_TAG);
dvmDumpAllThreadsEx(&target, true);
} else {
/* write to memory buffer */
FILE* memfp = open_memstream(&traceBuf, &traceLen);
if (memfp == NULL) {
LOGE("Unable to create memstream for stack traces");
traceBuf = NULL; /* make sure it didn't touch this */
/* continue on */
} else {
logThreadStacks(memfp);
fclose(memfp);
}
}
#if defined(WITH_JIT) && defined(WITH_JIT_TUNING)
dvmCompilerDumpStats();
#endif
if (false) dvmDumpTrackedAllocations(true);
dvmResumeAllThreads(SUSPEND_FOR_STACK_DUMP);
if (traceBuf != NULL) {
/*
* We don't know how long it will take to do the disk I/O, so put us
* into VMWAIT for the duration.
*/
ThreadStatus oldStatus = dvmChangeStatus(dvmThreadSelf(), THREAD_VMWAIT);
/*
* Open the stack trace output file, creating it if necessary. It
* needs to be world-writable so other processes can write to it.
*/
int fd = open(gDvm.stackTraceFile, O_WRONLY | O_APPEND | O_CREAT, 0666);
if (fd < 0) {
LOGE("Unable to open stack trace file '%s': %s",
gDvm.stackTraceFile, strerror(errno));
} else {
ssize_t actual = write(fd, traceBuf, traceLen);
if (actual != (ssize_t) traceLen) {
LOGE("Failed to write stack traces to %s (%d of %zd): %s",
gDvm.stackTraceFile, (int) actual, traceLen,
strerror(errno));
} else {
LOGI("Wrote stack traces to '%s'", gDvm.stackTraceFile);
}
close(fd);
}
free(traceBuf);
dvmChangeStatus(dvmThreadSelf(), oldStatus);
}
}
- 首先查看有没有有指定 trace输出文件,没有就将trace信息打印到log里。如果有,就先将trace信息打印到内存文件traceBuf中,然后再将改内存文件traceBuf内容输出到指定trace文件中。
为什么指定了trace文件后,不直接打印trace信息到trace文件中呢。
原因是trace文件实际上记录的是当前运行的所有的线程的上下文信息。他需要暂停所有的线程才能输出。 dvmSuspendAllThreads(SUSPEND_FOR_STACK_DUMP);的调用正式这个目的。这个操作代价是很高的,它把当前所有的线程都停了下来。执行的时间越短,对正常运行的线程的影响越小。 输出信息到内存比直接到外部文件要快得多。所以 dvm采取了先输出到内存,马上恢复线程程dvmResumeAllThreads(SUSPEND_FOR_STACK_DUMP);,然后就可以慢慢的输出到外部文件里了。
而这真正的输出信息实现在 logThreadStacks()中:
/*
* Dump the stack traces for all threads to the supplied file, putting
* a timestamp header on it.
*/
static void logThreadStacks(FILE* fp)
{
DebugOutputTarget target;
dvmCreateFileOutputTarget(&target, fp);
pid_t pid = getpid();
time_t now = time(NULL);
struct tm* ptm;
#ifdef HAVE_LOCALTIME_R
struct tm tmbuf;
ptm = localtime_r(&now, &tmbuf);
#else
ptm = localtime(&now);
#endif
dvmPrintDebugMessage(&target,
"\n\n----- pid %d at %04d-%02d-%02d %02d:%02d:%02d -----\n",
pid, ptm->tm_year + 1900, ptm->tm_mon+1, ptm->tm_mday,
ptm->tm_hour, ptm->tm_min, ptm->tm_sec);
printProcessName(&target);
dvmPrintDebugMessage(&target, "\n");
dvmDumpAllThreadsEx(&target, true);
fprintf(fp, "----- end %d -----\n", pid);
}
该函数打印了trace文件的框架,其输出类似如下所示:
----- pid 26013 at 2012-06-14 09:59:28 -----
Cmd line: com.android.developmen
显示当前dvm进程的进程id,名字,输出的时间。最重要的所有线程的上下文信息是有函数 dvmDumpAllThreadsEx()里实现的,该函数定义在 thread.c里
dalvik/vm/Thread.cpp
/*
* Print information about all known threads. Assumes they have been
* suspended (or are in a non-interpreting state, e.g. WAIT or NATIVE).
*
* If "grabLock" is true, we grab the thread lock list. This is important
* to do unless the caller already holds the lock.
*/
void dvmDumpAllThreadsEx(const DebugOutputTarget* target, bool grabLock)
{
Thread* thread;
dvmPrintDebugMessage(target, "DALVIK THREADS:\n");
#ifdef HAVE_ANDROID_OS
dvmPrintDebugMessage(target,
"(mutexes: tll=%x tsl=%x tscl=%x ghl=%x)\n",
gDvm.threadListLock.value,
gDvm._threadSuspendLock.value,
gDvm.threadSuspendCountLock.value,
gDvm.gcHeapLock.value);
#endif
if (grabLock)
dvmLockThreadList(dvmThreadSelf());
thread = gDvm.threadList;
while (thread != NULL) {
dvmDumpThreadEx(target, thread, false);
/* verify link */
assert(thread->next == NULL || thread->next->prev == thread);
thread = thread->next;
}
if (grabLock)
dvmUnlockThreadList();
}
Trace file内容及解释?
----- pid 26013 at 2012-06-14 09:59:28 -----
Cmd line: com.android.developmen
显示当前dvm进程的进程id,名字,输出的时间。
DALVIK THREADS:
(mutexes: tll=0 tsl=0 tscl=0 ghl=0)
"main" prio=5 tid=1 TIMED_WAIT
| group="main" sCount=1 dsCount=0 obj=0x40a7fc40 self=0x1ab2e38
| sysTid=26013 nice=0 sched=0/0 cgrp=default handle=1074406760
| schedstat=( 0 0 0 ) utm=30 stm=11 core=0
at java.lang.VMThread.sleep(Native Method)
at java.lang.Thread.sleep(Thread.java:1031)
at java.lang.Thread.sleep(Thread.java:1013)
at com.android.development.BadBehaviorActivity.onCreate(BadBehaviorActivity.java:118)
at android.app.Activity.performCreate(Activity.java:4465)
at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1049)
at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:1928)
at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:1989)
at android.app.ActivityThread.access$600(ActivityThread.java:126)
at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1155)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loop(Looper.java:137)
at android.app.ActivityThread.main(ActivityThread.java:4482)
at java.lang.reflect.Method.invokeNative(Native Method)
at java.lang.reflect.Method.invoke(Method.java:511)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:787)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:554)
at dalvik.system.NativeStart.main(Native Method
trace文件中thread信息
- 第一行是固定的头, 指明下面的都是当前运行的 dvm thread :“DALVIK THREADS:”
- 第二行输出的是该进程里各种线程互斥量的值。
- 第三行输出分别是线程的名字(“main”),线程优先级(“prio=5”),线程id(“tid=1”) 以及线程的类型(“TIMED_WAIT”)【others such as NATIVE】
- 第四行分别是线程所述的线程组 (“main”),线程被正常挂起的次处(“sCount=1”),线程因调试而挂起次数(”dsCount=0“),当前线程所关联的java线程对象(”obj=0x40a7fc40“)以及该线程本身的地址(“self=0x1ab2e38”)。
- 第五行显示线程调度信息。 分别是该线程在linux系统下得本地线程id (“ sysTid=26013”),线程的调度有优先级(“nice=0”),调度策略(sched=0/0),优先组属(“cgrp=default”)以及处理函数地址(“handle=1074406760”) 6 第六行 显示更多该线程当前上下文,分别是调度状态(从 /proc/[pid]/task/[tid]/schedstat读出)(最新可使用"/proc/self/task/%d/schedstat")(“schedstat=( 0 0 0 )”),以及该线程运行信息 ,它们是线程用户态下使用的时间值(单位是jiffies)(“utm=30”), 内核态下得调度时间值(“stm=11”),以及最后运行改线程的cup标识(“core=0”); 7.后面几行输出该线程调用栈。