Table of Contents
- Android 8.0 System Exception Handling Flow
- Exception Handling Flow
- Displaying the Crash Dialog and User Actions
- Follow-up Cleanup
- Summary
Android 8.0 System Exception Handling Flow
Exception Handling Flow
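Java handles uncaught exceptions through Thread.UncaughtExceptionHandler, and Android handles them the same way, by implementing this interface.
Android's default system exception handler is installed while the SystemServer process is being started.
When Zygote starts SystemServer, it calls ZygoteInit.forkSystemServer(). That method prepares the new SystemServer process in handleSystemServerProcess(), which eventually calls RuntimeInit.commonInit(). App processes forked from Zygote go through the same RuntimeInit.commonInit() call, which is why app crashes also end up in the handler below.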
frameworks/base/core/java/com/android/internal/os/RuntimeInit.java
protected static final void commonInit() {
    Thread.setUncaughtExceptionPreHandler(new LoggingHandler());
    // Install KillApplicationHandler as the default handler for uncaught exceptions
    Thread.setDefaultUncaughtExceptionHandler(new KillApplicationHandler());
    ...
}
KillApplicationHandler is implemented as follows:
frameworks/base/core/java/com/android/internal/os/RuntimeInit.java
private static class KillApplicationHandler implements Thread.UncaughtExceptionHandler {
    public void uncaughtException(Thread t, Throwable e) {
        try {
            ...
            // 1. mApplicationObject identifies the current application
            ActivityManager.getService().handleApplicationCrash(
                    mApplicationObject, new ApplicationErrorReport.ParcelableCrashInfo(e));
        } ...
        finally {
            // Whatever happens, make sure the crashed process does not survive
            Process.killProcess(Process.myPid());
            System.exit(10);
        }
    }
}
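The handler mechanism itself is plain Java, so it can be tried outside Android. The minimal sketch below uses the standard Thread.setDefaultUncaughtExceptionHandler API. The class names RecordingHandler and DefaultHandlerDemo are hypothetical. Unlike KillApplicationHandler, this handler does not call AMS or kill the process. It only records the crash so the result can be inspected:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for KillApplicationHandler: it records the crash instead of
// reporting it to AMS and killing the process.
class RecordingHandler implements Thread.UncaughtExceptionHandler {
    final AtomicReference<String> lastThread = new AtomicReference<>();
    final AtomicReference<Throwable> lastError = new AtomicReference<>();

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        // KillApplicationHandler calls ActivityManager.getService().handleApplicationCrash(...)
        // at this point and kills the process in a finally block; this sketch only records.
        lastThread.set(t.getName());
        lastError.set(e);
    }
}

class DefaultHandlerDemo {
    static RecordingHandler run() {
        RecordingHandler handler = new RecordingHandler();
        // Used by every thread that has no handler of its own (same call as in commonInit()).
        Thread.setDefaultUncaughtExceptionHandler(handler);
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("boom");
        }, "worker");
        worker.start();
        try {
            worker.join(); // the handler runs on the dying thread before join() returns
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ie);
        }
        return handler;
    }

    public static void main(String[] args) {
        RecordingHandler h = run();
        System.out.println(h.lastThread.get() + " -> " + h.lastError.get());
    }
}
```

The default handler only applies to threads that have not set a handler of their own. Android's separate pre-handler (LoggingHandler, installed through an Android-internal Thread method) is not modeled here.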
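At note 1, ActivityManager.getService() returns a Binder proxy for ActivityManagerService (an IActivityManager), so handleApplicationCrash() is a Binder call into AMS. Let's look at how AMS handles it in handleApplicationCrash():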
frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java
public void handleApplicationCrash(IBinder app,
        ApplicationErrorReport.ParcelableCrashInfo crashInfo) {
    ProcessRecord r = findAppProcess(app, "Crash");
    final String processName = app == null ? "system_server"
            : (r == null ? "unknown" : r.processName);
    handleApplicationCrashInner("crash", r, processName, crashInfo);
}

void handleApplicationCrashInner(String eventType, ProcessRecord r, String processName,
        ApplicationErrorReport.CrashInfo crashInfo) {
    // 1. Write the crash information to the event log
    EventLog.writeEvent(EventLogTags.AM_CRASH, Binder.getCallingPid(),
            UserHandle.getUserId(Binder.getCallingUid()), processName,
            r == null ? -1 : r.info.flags,
            crashInfo.exceptionClassName,
            crashInfo.exceptionMessage,
            crashInfo.throwFileName,
            crashInfo.throwLineNumber);
    addErrorToDropBox(eventType, r, processName, null, null, null, null, null, crashInfo);
    // 2. Hand the crash over to AppErrors
    mAppErrors.crashApplication(r, crashInfo);
}
Note 1 records the crash in the event log, and addErrorToDropBox() also saves a DropBox entry for it. Note 2 calls AppErrors.crashApplication():
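The processName passed to handleApplicationCrashInner() comes from a small fallback rule in handleApplicationCrash(). A null binder is treated as system_server, and a binder that AMS cannot match to a ProcessRecord is reported as "unknown". The sketch below isolates just that rule. The Proc class is a hypothetical stand-in for ProcessRecord, and Object stands in for the IBinder:

```java
// Hypothetical stand-in for ProcessRecord: only the field this rule needs.
class Proc {
    final String processName;

    Proc(String processName) {
        this.processName = processName;
    }
}

class CrashNames {
    // Mirrors: app == null ? "system_server" : (r == null ? "unknown" : r.processName)
    static String resolve(Object appBinder, Proc r) {
        if (appBinder == null) {
            return "system_server"; // no app binder: the crash is treated as system_server's
        }
        return r == null ? "unknown" : r.processName; // binder not matched to a process record
    }
}
```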
frameworks/base/services/core/java/com/android/server/am/AppErrors.java
void crashApplication(ProcessRecord r, ApplicationErrorReport.CrashInfo crashInfo) {
    final int callingPid = Binder.getCallingPid();
    final int callingUid = Binder.getCallingUid();
    final long origId = Binder.clearCallingIdentity();
    try {
        // Delegate to the internal crashApplicationInner()
        crashApplicationInner(r, crashInfo, callingPid, callingUid);
    } finally {
        Binder.restoreCallingIdentity(origId);
    }
}
Next, let's look at crashApplicationInner():
frameworks/base/services/core/java/com/android/server/am/AppErrors.java
void crashApplicationInner(ProcessRecord r, ApplicationErrorReport.CrashInfo crashInfo,
        int callingPid, int callingUid) {
    ...
    synchronized (mService) {
        // 1. Handle an installed IActivityController first; if the controller has already
        // handled the error, no error dialog is shown
        if (handleAppCrashInActivityController(r, crashInfo, shortMsg, longMsg, stackTrace,
                timeMillis, callingPid, callingUid)) {
            return;
        }
        ...
        AppErrorDialog.Data data = new AppErrorDialog.Data();
        data.result = result;
        data.proc = r;
        ...
        // 2. Send SHOW_ERROR_UI_MSG to AMS's mUiHandler, which pops up an error dialog
        // telling the user that the process crashed
        final Message msg = Message.obtain();
        msg.what = ActivityManagerService.SHOW_ERROR_UI_MSG;
        task = data.task;
        msg.obj = data;
        mService.mUiHandler.sendMessage(msg);
    }
    // 3. AppErrorResult.get() calls wait() internally, so this call blocks. When the user
    // handles the dialog, AppErrorResult.set() calls notifyAll() to wake this thread.
    // Two threads are involved: crashApplicationInner() runs on the Binder thread that
    // received the call, while the dialog runs on AMS's UI thread
    int res = result.get();
    Intent appErrorIntent = null;
    MetricsLogger.action(mContext, MetricsProto.MetricsEvent.ACTION_APP_CRASH, res);
    // 4. Check the user's choice and act on it
    if (res == AppErrorDialog.TIMEOUT || res == AppErrorDialog.CANCEL) {
        res = AppErrorDialog.FORCE_QUIT;
    }
    synchronized (mService) {
        // Stop reporting crashes for this app
        if (res == AppErrorDialog.MUTE) {
            stopReportingCrashesLocked(r);
        }
        // Try to restart the process
        if (res == AppErrorDialog.RESTART) {
            mService.removeProcessLocked(r, false, true, "crash");
            if (task != null) {
                try {
                    mService.startActivityFromRecents(task.taskId,
                            ActivityOptions.makeBasic().toBundle());
                } ...
            }
        }
        // Force-quit the process
        if (res == AppErrorDialog.FORCE_QUIT) {
            long orig = Binder.clearCallingIdentity();
            try {
                // Kill it with fire!
                mService.mStackSupervisor.handleAppCrashLocked(r);
                if (!r.persistent) {
                    mService.removeProcessLocked(r, false, false, "crash");
                    mService.mStackSupervisor.resumeFocusedStackTopActivityLocked();
                }
            } finally {
                Binder.restoreCallingIdentity(orig);
            }
        }
        // Quit the process and report the error
        if (res == AppErrorDialog.FORCE_QUIT_AND_REPORT) {
            appErrorIntent = createAppErrorIntentLocked(r, timeMillis, crashInfo);
        }
        ...
    }
    if (appErrorIntent != null) {
        try {
            // Launch the error report UI
            mContext.startActivityAsUser(appErrorIntent, new UserHandle(r.userId));
        } catch (ActivityNotFoundException e) {
            Slog.w(TAG, "bug report receiver dissappeared", e);
        }
    }
}
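Note 1 gives the crash observer the first chance to handle the crash. The observer is an IActivityController installed through AMS's setActivityController(). If it has already handled the crash, no error dialog is shown. Note 2 sends SHOW_ERROR_UI_MSG to AMS's mUiHandler to request the error dialog. Note 3 blocks the thread by calling AppErrorResult.get(). Two threads are involved here: crashApplicationInner() runs on the Binder thread handling the call, while the dialog is shown on AMS's UI thread. How AppErrorResult works is covered below. Once the user acts on the dialog or the timeout expires, get() wakes up and returns the result. Note 4 then acts on that result, for example by force-stopping or restarting the process.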
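The decision logic of step 4 can be separated from its AMS side effects. In the sketch below, the hypothetical CrashChoice enum reuses the AppErrorDialog constant names, although the real code compares ints. action() does not perform any work. It returns a description of the branch crashApplicationInner() takes:

```java
// Hypothetical enum named after AppErrorDialog's result constants; AOSP uses int constants.
enum CrashChoice {
    FORCE_QUIT, FORCE_QUIT_AND_REPORT, RESTART, MUTE, TIMEOUT, CANCEL, ALREADY_SHOWING, CANT_SHOW
}

class CrashChoicePolicy {
    // A dialog that timed out or was cancelled is treated as FORCE_QUIT.
    static CrashChoice normalize(CrashChoice res) {
        if (res == CrashChoice.TIMEOUT || res == CrashChoice.CANCEL) {
            return CrashChoice.FORCE_QUIT;
        }
        return res;
    }

    // Describes the branch crashApplicationInner() takes; performs no real work.
    static String action(CrashChoice raw) {
        switch (normalize(raw)) {
            case MUTE:
                return "stopReportingCrashes";
            case RESTART:
                return "removeProcess, then startActivityFromRecents if a task was recorded";
            case FORCE_QUIT:
                return "handleAppCrash, then removeProcess if not persistent";
            case FORCE_QUIT_AND_REPORT:
                return "start the error report activity";
            default:
                return "none"; // e.g. ALREADY_SHOWING or CANT_SHOW
        }
    }
}
```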
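Now let's see how note 2 gets the error dialog displayed. When AMS's UiHandler receives the message, it shows the dialog.
Displaying the Crash Dialog and User Actions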
frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java
final class UiHandler extends Handler {
    @Override
    public void handleMessage(Message msg) {
        switch (msg.what) {
            // Show the error dialog
            case SHOW_ERROR_UI_MSG: {
                mAppErrors.handleShowAppErrorUi(msg);
                ensureBootCompleted();
            } break;
            // Show the ANR dialog
            case SHOW_NOT_RESPONDING_UI_MSG: {
                mAppErrors.handleShowAnrUi(msg);
                ensureBootCompleted();
            } break;
            ...
        }
    }
}
UiHandler handles both the error dialog and the ANR dialog. Here we follow the error dialog, which is again delegated to AppErrors:
frameworks/base/services/core/java/com/android/server/am/AppErrors.java
void handleShowAppErrorUi(Message msg) {
    ...
    synchronized (mService) {
        ProcessRecord proc = data.proc;
        AppErrorResult res = data.result;
        // 1. A crash dialog is already showing, so there is no need to show another one
        if (proc != null && proc.crashDialog != null) {
            if (res != null) {
                res.set(AppErrorDialog.ALREADY_SHOWING);
            }
            return;
        }
        ...
        final boolean crashSilenced = mAppsNotReportingCrashes != null &&
                mAppsNotReportingCrashes.contains(proc.info.packageName);
        if ((mService.canShowErrorDialogs() || showBackground) && !crashSilenced) {
            // 2. Create the crash dialog
            proc.crashDialog = new AppErrorDialog(mContext, mService, data);
        } else {
            // 3. If AMS does not allow error dialogs, or the device is asleep, no dialog is shown
            if (res != null) {
                res.set(AppErrorDialog.CANT_SHOW);
            }
        }
    }
    // 4. Call Dialog.show() to display the crash dialog
    if (data.proc.crashDialog != null) {
        data.proc.crashDialog.show();
    }
}
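Note 1 checks whether the crashed process already has a dialog showing. If it does, nothing more is shown. At note 2, a dialog is created only when three conditions hold: the device is awake and AMS allows error dialogs (or showBackground is set), and crash reports for the package have not been muted. Otherwise the code goes to note 3 and reports that no dialog will be shown. If a dialog was created, note 4 displays it by calling Dialog.show().

The res.set() calls at notes 1 and 3 need a word of explanation. res is the AppErrorResult created in crashApplicationInner(). After asking AMS to show the dialog, that method blocks in result.get(). Calling set() wakes the blocked Binder thread, which then continues and evaluates the result.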
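The three outcomes of handleShowAppErrorUi() reduce to a pure decision over four conditions. The sketch below uses that simplification. The ShowDecision enum and the ErrorUiPolicy class are hypothetical, each boolean parameter stands for the check named in its comment, and the null checks on proc and res are left out:

```java
// Hypothetical outcome type; AOSP encodes these as AppErrorDialog int results plus the dialog field.
enum ShowDecision { ALREADY_SHOWING, CREATE_DIALOG, CANT_SHOW }

class ErrorUiPolicy {
    /**
     * @param dialogShowing  proc.crashDialog != null
     * @param canShowDialogs mService.canShowErrorDialogs() (false e.g. while asleep)
     * @param showBackground the showBackground flag computed in the elided code
     * @param crashSilenced  the package is in mAppsNotReportingCrashes (e.g. after MUTE)
     */
    static ShowDecision decide(boolean dialogShowing, boolean canShowDialogs,
                               boolean showBackground, boolean crashSilenced) {
        if (dialogShowing) {
            return ShowDecision.ALREADY_SHOWING; // note 1: res.set(ALREADY_SHOWING)
        }
        if ((canShowDialogs || showBackground) && !crashSilenced) {
            return ShowDecision.CREATE_DIALOG;   // notes 2 and 4: create, then show()
        }
        return ShowDecision.CANT_SHOW;           // note 3: res.set(CANT_SHOW)
    }
}
```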
Here is how AppErrorResult implements get() and set():
frameworks/base/services/core/java/com/android/server/am/AppErrorResult.java
final class AppErrorResult {
    public void set(int res) {
        synchronized (this) {
            mHasResult = true;
            // 1. set() stores the value in mResult
            mResult = res;
            // 2. notifyAll() wakes every thread waiting on this object's monitor
            notifyAll();
        }
    }

    public int get() {
        synchronized (this) {
            while (!mHasResult) {
                try {
                    // 3. The current thread is actually blocked by wait()
                    wait();
                } catch (InterruptedException e) {
                }
            }
        }
        // 4. Return mResult
        return mResult;
    }

    boolean mHasResult = false;
    int mResult;
}
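get() blocks the calling thread. set() updates mResult and wakes the waiting threads. A woken thread resumes after the wait() call in get() and returns the mResult value stored by set().
What happens once the error dialog is up, when the user acts on it or the timeout expires? Let's look at AppErrorDialog: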
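The get()/set() handshake is the standard guarded-wait idiom, and it runs unchanged on a plain JVM. The sketch below copies it into a hypothetical ResultBox class. The only difference is that get() returns while still holding the lock. A second thread plays the AMS UI thread that calls set(), and the calling thread plays the blocked Binder thread:

```java
// Standalone copy of the AppErrorResult pattern (guarded wait/notifyAll), outside AOSP.
final class ResultBox {
    private boolean hasResult = false;
    private int result;

    public synchronized void set(int res) {
        hasResult = true;
        result = res;
        notifyAll(); // wake every thread waiting on this monitor
    }

    public synchronized int get() {
        while (!hasResult) {     // re-check: wait() may also return spuriously
            try {
                wait();          // releases the monitor while waiting
            } catch (InterruptedException e) {
                // AppErrorResult ignores interrupts and keeps waiting; so does this sketch
            }
        }
        return result;
    }
}

class ResultBoxDemo {
    // The calling thread plays the Binder thread; "ui" plays the AMS UI thread.
    static int run(int userChoice) {
        ResultBox box = new ResultBox();
        Thread ui = new Thread(() -> {
            try {
                Thread.sleep(50); // the user takes a moment to click
            } catch (InterruptedException ignored) {
            }
            box.set(userChoice); // what AppErrorDialog.setResult() does via mResult.set()
        }, "ui");
        ui.start();
        return box.get(); // blocks until the "ui" thread calls set()
    }
}
```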
frameworks/base/services/core/java/com/android/server/am/AppErrorDialog.java
@Override
public void onClick(View v) {
    // 1. Decide what to do based on which view was clicked
    switch (v.getId()) {
        // Request a process restart
        case com.android.internal.R.id.aerr_restart:
            mHandler.obtainMessage(RESTART).sendToTarget();
            break;
        // Request an error report
        case com.android.internal.R.id.aerr_report:
            mHandler.obtainMessage(FORCE_QUIT_AND_REPORT).sendToTarget();
            break;
        // Request closing the crash dialog and killing the process
        case com.android.internal.R.id.aerr_close:
            mHandler.obtainMessage(FORCE_QUIT).sendToTarget();
            break;
        // Request that crashes no longer be reported (mute)
        case com.android.internal.R.id.aerr_mute:
            mHandler.obtainMessage(MUTE).sendToTarget();
            break;
        default:
            break;
    }
}

// 2. On receiving a request, call setResult() and dismiss the dialog
private final Handler mHandler = new Handler() {
    public void handleMessage(Message msg) {
        setResult(msg.what);
        dismiss();
    }
};

private void setResult(int result) {
    synchronized (mService) {
        if (mProc != null && mProc.crashDialog == AppErrorDialog.this) {
            mProc.crashDialog = null;
        }
    }
    // 3. Call AppErrorResult.set() to wake the blocked thread and pass it the user's choice
    mResult.set(result);
    mHandler.removeMessages(TIMEOUT);
}
The comments spell out the steps. In the end, mResult.set() wakes the blocked thread, and the code in crashApplicationInner() continues:
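Either of two events can wake the blocked thread: a click posted through mHandler, or the TIMEOUT message that setResult() later removes. The hypothetical FakeErrorDialog below models that race. A single-thread ScheduledExecutorService stands in for the dialog's Handler, and a delayed task stands in for TIMEOUT. Because both paths run on the same thread, whichever arrives first decides the result. The TIMEOUT value here is made up and is not AOSP's constant:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical model of AppErrorDialog's result path, not the Android API.
class FakeErrorDialog {
    static final int TIMEOUT = -1; // hypothetical value

    // Single thread, like the dialog's Handler; daemon so it never keeps the JVM alive.
    private final ScheduledExecutorService handler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "dialog-handler");
                t.setDaemon(true);
                return t;
            });
    private final CountDownLatch answered = new CountDownLatch(1);
    private final ScheduledFuture<?> timeout;
    private int result;

    FakeErrorDialog(long timeoutMillis) {
        // Like posting a delayed TIMEOUT message when the dialog is created.
        timeout = handler.schedule(() -> setResult(TIMEOUT), timeoutMillis, TimeUnit.MILLISECONDS);
    }

    // Like onClick(): hand the chosen action to the handler thread.
    void click(int action) {
        handler.execute(() -> {
            timeout.cancel(false); // like mHandler.removeMessages(TIMEOUT)
            setResult(action);
        });
    }

    // Always runs on the handler thread, so a click and the timeout cannot interleave.
    private void setResult(int value) {
        if (answered.getCount() == 0) {
            return; // first answer wins
        }
        result = value;
        answered.countDown(); // like mResult.set(result): wakes the blocked caller
    }

    // Plays the role of AppErrorResult.get() on the "Binder" thread.
    int awaitResult() {
        try {
            answered.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result;
    }
}
```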
frameworks/base/services/core/java/com/android/server/am/AppErrors.java
void crashApplicationInner(ProcessRecord r, ApplicationErrorReport.CrashInfo crashInfo,
        int callingPid, int callingUid) {
    ...
    // 3. Block until the timeout expires or the user acts on the dialog
    int res = result.get();
    // 4. Check the user's choice and act on it
    ...
}
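Follow-up Cleanup
From the flow above, we know that a crashed process is eventually killed. AMS then still has to clean up after it.
First, recall part of how a process registers with AMS after it starts: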
frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java
// After a process starts, its ActivityThread attaches to AMS
private final boolean attachApplicationLocked(IApplicationThread thread,
        int pid) {
    ...
    final String processName = app.processName;
    try {
        // 1. Create the death recipient (the "obituary" receiver)
        AppDeathRecipient adr = new AppDeathRecipient(
                app, pid, thread);
        thread.asBinder().linkToDeath(adr, 0);
        app.deathRecipient = adr;
    }
    ...
}
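When the process registers with AMS, AMS links an AppDeathRecipient (an "obituary" receiver) to the binder of the process's ApplicationThread.
So when the crashed process is killed, AppDeathRecipient.binderDied() is called back. The source shows that binderDied() in turn calls appDiedLocked():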
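Binder death notifications cannot be exercised off-device, but their contract is simple: a recipient registered with linkToDeath() gets binderDied() once when the remote process dies. FakeBinder, FakeDeathRecipient and FakeAppDeathRecipient below are hypothetical stand-ins that mimic only that contract. They are not the android.os API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for IBinder.DeathRecipient.
interface FakeDeathRecipient {
    void binderDied();
}

// Hypothetical stand-in for a remote IBinder; kill() simulates the hosting process dying.
class FakeBinder {
    private final List<FakeDeathRecipient> recipients = new ArrayList<>();
    private boolean alive = true;

    synchronized void linkToDeath(FakeDeathRecipient recipient) {
        if (!alive) {
            // The real linkToDeath() throws RemoteException in this case.
            throw new IllegalStateException("binder already dead");
        }
        recipients.add(recipient);
    }

    void kill() {
        List<FakeDeathRecipient> toNotify;
        synchronized (this) {
            if (!alive) {
                return; // death is reported only once
            }
            alive = false;
            toNotify = new ArrayList<>(recipients);
            recipients.clear();
        }
        // Notify outside the lock: a callback may take other locks (AMS takes its own).
        for (FakeDeathRecipient r : toNotify) {
            r.binderDied();
        }
    }
}

// Plays the role of AMS's AppDeathRecipient: remembers the app and forwards its death.
class FakeAppDeathRecipient implements FakeDeathRecipient {
    private final String processName;
    private final List<String> log;

    FakeAppDeathRecipient(String processName, List<String> log) {
        this.processName = processName;
        this.log = log;
    }

    @Override
    public void binderDied() {
        // AMS would call appDiedLocked(app, pid, thread, true) here while holding its lock.
        log.add("appDied:" + processName);
    }
}
```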
frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java
final void appDiedLocked(ProcessRecord app, int pid, IApplicationThread thread,
        boolean fromBinderDied) {
    ...
    // 1. If the process has not been killed yet, kill it
    if (!app.killed) {
        if (!fromBinderDied) {
            killProcessQuiet(pid);
        }
        killProcessGroup(app.uid, pid);
        app.killed = true;
    }
    if (app.pid == pid && app.thread != null &&
            app.thread.asBinder() == thread.asBinder()) {
        ...
        // 2. Handle the app's death
        handleAppDiedLocked(app, false, true);
        ...
    } ...
}
Note 1 kills the process, and note 2 is the key step in handling the app's death:
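The killed flag at note 1 makes the kill step idempotent: whichever call finds the process not yet marked as killed sends the kill signals, and later calls skip them. The trimmed-down sketch below uses hypothetical ProcState and AppDeathSketch classes that log actions instead of sending signals. It omits the pid/thread identity check that guards note 2:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, trimmed-down ProcessRecord with only the fields this sketch needs.
class ProcState {
    final int pid;
    boolean killed = false;

    ProcState(int pid) {
        this.pid = pid;
    }
}

// Hypothetical model of appDiedLocked(): records actions instead of sending signals.
class AppDeathSketch {
    final List<String> actions = new ArrayList<>();

    void appDied(ProcState app, boolean fromBinderDied) {
        // Note 1: only the first caller that sees killed == false sends kill signals.
        if (!app.killed) {
            if (!fromBinderDied) {
                // Coming from binderDied(), the process is already gone, so this is skipped.
                actions.add("killProcessQuiet " + app.pid);
            }
            actions.add("killProcessGroup " + app.pid);
            app.killed = true;
        }
        // Note 2 (the pid/thread identity check that guards it is omitted here).
        actions.add("handleAppDied " + app.pid);
    }
}
```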
frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java
private final void handleAppDiedLocked(ProcessRecord app,
        boolean restarting, boolean allowRestart) {
    int pid = app.pid;
    // 1. Clean up the process's services, ContentProviders, BroadcastReceivers, etc.
    boolean kept = cleanUpApplicationRecordLocked(app, restarting, allowRestart, -1,
            false /*replacingPid*/);
    if (!kept && !restarting) {
        removeLruProcessLocked(app);
        if (pid > 0) {
            ProcessList.remove(pid);
        }
    }
    ...
    // 2. Check whether any visible activities remain
    boolean hasVisibleActivities = mStackSupervisor.handleAppDiedLocked(app);
    // Clear the activity list
    app.activities.clear();
    ...
    try {
        if (!restarting && hasVisibleActivities
                && !mStackSupervisor.resumeFocusedStackTopActivityLocked()) {
            // 3. If the crashed process had visible activities, AMS still makes sure all visible
            // activities keep running normally, so the process is restarted
            mStackSupervisor.ensureActivitiesVisibleLocked(null, 0, !PRESERVE_WINDOWS);
        }
    } finally {
        mWindowManager.continueSurfaceLayout();
    }
}
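The most important part of note 1 concerns bound services in the crashed process: the connections between each service and its clients are cleaned up, and a client whose importance is too low may also be killed outright. Note 2 checks whether the app still has visible activities. For those visible activities, note 3 makes sure they keep running, which restarts the process.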
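Summary
So this is what lies behind "app has stopped". App crashes can't be avoided completely, but when one happens it looks bad, and the developer has no way to collect the log. The next article looks at how developers can handle these uncaught exceptions themselves.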