3.2.4 ANR分析入门与实战

8 minute read

如何分析ANR的trace

线程状态

 /* these match up with JDWP values */
    THREAD_ZOMBIE       = 0,        /* TERMINATED */
    THREAD_RUNNING      = 1,        /* RUNNABLE or running now */
    THREAD_TIMED_WAIT   = 2,        /* TIMED_WAITING in Object.wait() */
    THREAD_MONITOR      = 3,        /* BLOCKED on a monitor */
    THREAD_WAIT         = 4,        /* WAITING in Object.wait() */
    /* non-JDWP states */
    THREAD_INITIALIZING = 5,        /* allocated, not yet running */
    THREAD_STARTING     = 6,        /* started, not yet on thread list */
    THREAD_NATIVE       = 7,        /* off in a JNI native method */
    THREAD_VMWAIT       = 8,        /* waiting on a VM resource */
    THREAD_SUSPENDED    = 9,        /* suspended, usually by GC or debugger */

CPU

  • user – 用户空间的进程的cpu消耗时间
  • nice – 系统调整进程优先级所花费的时间
  • system – 内核态的cpu消耗时间, 如申请内存, 执行I/O
  • idle – cpu空闲时间
  • iowait – 等待I/O操作完成的cpu消耗时间
  • irq – 硬件中断cpu消耗时间
  • softirq – 软中断cpu消耗时间
  • minor或者major的页错误次数

堆栈

1.1.DVM mutexes format

"(mutexes: tll=%x tsl=%x tscl=%x ghl=%x hwl=%x hwll=%x)",
        tll  - thread list lock,
        tsl  - thread suspend lock,
        tscl - thread suspend count lock,
        dhl  - GC heap lock,
        hwl  - heap worker lock,
        hwll - heap worker list lock

1.2.Thread’s info format Thread’s info format - first line

%name %priority %tid %status,
        name     - thread name,
        priority - thread priority,
        tid      - thread id,
        status   - thread status

Thread’s info format - second line

" group=%s sCount=%d dsCount=%d obj=%p self=%p,
group   - group name,
sCount  - suspend count,
dsCount - debug suspend count,
obj     - Linux thread that we are associated with
self    - self reference

Thread’s info format - third line

 " sysTid=%d nice=%d sched=%d/%d cgrp=%s handle=%d",
sysTid - Linux thread id,
nice   - Linux "nice priority (lower numbers indicate higher priority),
sched  - scheduling priority,
cgrp   - scheduling group buffer
handle - thread handle
  • 查看堆栈是否有明确指向app的代码, 如果有, 检查是否可能的原因
  • 检查堆栈是否wait方法, 考虑主线程被其它线程block, 如果是wait on lock, 很大原因是死锁
  • 查看是否有IPC, 确认是否可能是被对端block

线索

  • traces.txt/dropbox AMS在ANR发生的时候,

    • traces.txt存在的几种形式:

      • /data/anr/trace.txt

      • adb shell dumpsys dropbox

        需要注意的是,traces.txt是抓取的是超时后(如input超时就是5s后)的snapshot,并不一定能够真实的反应出block的点。 也存在抓到主线程没有block,在idle的情况。

  • Eventlog中的dvm_lock_sample. 在同步锁发生content的时候,虚拟机会将这个竞争记录在eventlog中: 实现可以参考art/runtime/monitor_android.cc LogContentionEvent函数 如果主线程是被binder对端、被同步锁block,那么eventlog中就很有可能会有dvm_lock_sample的打印。

  • ANR发生的时候,打印出的CPU的占用。

    注意,并不能因为有进程cpu占用高就果断的去怀疑它。 因为最主要的线索traces.txt的有效性并不是非常高,所以ANR问题分析是存在一定的局限性的。

引起ANR的一些场景

  • Service Timeout:服务在20s内未执行完成;
  • BroadcastQueue Timeout:比如前台广播在10s内执行完成
  • ContentProvider Timeout:内容提供者执行超时
  • inputDispatching Timeout: 输入事件分发超时5s,包括按键分发事件的超时。

可能原因

  • 耗时操作,如复杂的layout,I/O,主线程大量DB操作等。
  • 被Binder 对端block
  • 被子线程同步锁block
  • Binder被占满导致主线程无法和SystemServer通信
  • 得不到系统资源(CPU/RAM/IO)

危险信号

1) 死锁:当一个线程处于WAIT状态,可以查找”held by”, 一般来说, 锁应该自己持有, 如果是别的线程持有, 那是个危险的标志。

2) 如果主线程处于WAIT状态,检查是否有”held by tid = 主线程”。

3) iowait高,说明有频繁的I/O操作

4) CPU使用率很高,则关注runnable的线程, 如果CPU使用量接近100%,说明当前设备很忙,有可能是CPU饥饿导致了ANR

5) CPU很闲时,则关注waiting for monitor entry的线程。

案例分析

1. Settings 被对端block
首先需要去看的就是traces.txt 中ANR进程的主线程的栈
   以settting的这次ANR为例

   "main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 obj=0x73ee6470 self=0xb4d76500
  | sysTid=22831 nice=0 cgrp=default sched=0/0 handle=0xb6f4bc00
  | state=S schedstat=( 0 0 0 ) utm=22 stm=22 core=0 HZ=100
  | stack=0xbe283000-0xbe285000 stackSize=8MB
  | held mutexes=
  native: #00 pc 000410ac  /system/lib/libc.so (__ioctl+8)
  native: #01 pc 000477e5  /system/lib/libc.so (ioctl+14)
  native: #02 pc 0001e7c5  /system/lib/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+132)
  native: #03 pc 0001ee17  /system/lib/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+38)
  native: #04 pc 0001efcd  /system/lib/libbinder.so (android::IPCThreadState::transact(int, unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+124)
  native: #05 pc 00019fb7  /system/lib/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+30)
  native: #06 pc 00086de9  /system/lib/libandroid_runtime.so (???)
  native: #07 pc 00d94629  /data/dalvik-cache/arm/system@framework@boot.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+140)
  at android.os.BinderProxy.transactNative(Native method)
  at android.os.BinderProxy.transact(Binder.java:503)
  at android.net.INetworkPolicyManager$Stub$Proxy.getNetworkPolicies(INetworkPolicyManager.java:410)
  at android.net.NetworkPolicyManager.getNetworkPolicies(NetworkPolicyManager.java:174)
  at com.android.settings.net.NetworkPolicyEditor.read(NetworkPolicyEditor.java:57)
  at com.android.settings.DataUsageSummary.onCreate(DataUsageSummary.java:361)
  at android.app.Fragment.performCreate(Fragment.java:2202)
  at android.app.FragmentManagerImpl.moveToState(FragmentManager.java:942)
  at android.app.FragmentManagerImpl.moveToState(FragmentManager.java:1148)
  at android.app.BackStackRecord.run(BackStackRecord.java:793)
  at android.app.FragmentManagerImpl.execPendingActions(FragmentManager.java:1535)
  at android.app.FragmentManagerImpl.executePendingTransactions(FragmentManager.java:562)
  at com.android.settings.SettingsActivity.switchToFragment(SettingsActivity.java:1084)
  at com.android.settings.SettingsActivity.onCreate(SettingsActivity.java:657)
  at android.app.Activity.performCreate(Activity.java:6251)
  at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1110)
  at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:2370)
  at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:2477)
  at android.app.ActivityThread.-wrap11(ActivityThread.java:-1)
  at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1345)
  at android.os.Handler.dispatchMessage(Handler.java:102)
  at android.os.Looper.loop(Looper.java:148)
  at android.app.ActivityThread.main(ActivityThread.java:5438)
  at java.lang.reflect.Method.invoke!(Native method)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:762)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:652)
  主线程被对端block而对端是在systemserver中的NetworkPolicyManager
  接下来继续去查NetworkPolicyManager为何会被block
  通常情况下可以搜索getNetworkPolicies一般的对端的函数并不会修改函数名
  看到我们的对端
  "Binder_4" prio=5 tid=56 Blocked
  | group="main" sCount=1 dsCount=0 obj=0x1321a0a0 self=0xad31e200
  | sysTid=2491 nice=0 cgrp=default sched=0/0 handle=0x9fd00930
  | state=S schedstat=( 0 0 0 ) utm=46441 stm=46582 core=1 HZ=100
  | stack=0x9fc04000-0x9fc06000 stackSize=1014KB
  | held mutexes=
  at com.android.server.net.NetworkPolicyManagerService.getNetworkPolicies(NetworkPolicyManagerService.java:1696)
  - waiting to lock <0x07439315> (a java.lang.Object) held by thread 35
  at android.net.INetworkPolicyManager$Stub.onTransact(INetworkPolicyManager.java:145)
  at android.os.Binder.execTransact(Binder.java:453
  被Tid=35的人拿住了一把锁0x07439315),那么继续看tid=35是谁有两种方法
  1. 搜索tid=35
  2. 搜索0x07439315找到  - locked <0x07439315> (a java.lang.Object)

  "NetworkPolicy" prio=5 tid=35 TimedWaiting
  | group="main" sCount=1 dsCount=0 obj=0x12d98940 self=0x9f91f700
  | sysTid=2415 nice=0 cgrp=default sched=0/0 handle=0xa0f33930
  | state=S schedstat=( 0 0 0 ) utm=7681 stm=7783 core=0 HZ=100
  | stack=0xa0e31000-0xa0e33000 stackSize=1038KB
  | held mutexes=
  at java.lang.Object.wait!(Native method)
  - waiting on <0x02580c1b> (a java.lang.Object)
  at java.lang.Thread.parkFor$(Thread.java:1220)
  - locked <0x02580c1b> (a java.lang.Object)
  at sun.misc.Unsafe.park(Unsafe.java:299)
  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2053)
  at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:372)
  at com.android.server.NativeDaemonConnector$ResponseQueue.remove(NativeDaemonConnector.java:634)
  at com.android.server.NativeDaemonConnector.executeForList(NativeDaemonConnector.java:426)
  at com.android.server.NativeDaemonConnector.execute(NativeDaemonConnector.java:345)
  at com.android.server.NativeDaemonConnector.execute(NativeDaemonConnector.java:340)
  at com.android.server.NetworkManagementService.setInterfaceQuota(NetworkManagementService.java:1712)
  - locked <0x0b0f91b8> (a java.lang.Object)
  at com.android.server.net.NetworkPolicyManagerService.setInterfaceQuota(NetworkPolicyManagerService.java:2421)
  at com.android.server.net.NetworkPolicyManagerService.updateNetworkRulesLocked(NetworkPolicyManagerService.java:1232)
  at com.android.server.net.NetworkPolicyManagerService$14.onReceive(NetworkPolicyManagerService.java:1060)
  - locked <0x07439315> (a java.lang.Object)
  at android.app.LoadedApk$ReceiverDispatcher$Args.run(LoadedApk.java:881)
  at android.os.Handler.handleCallback(Handler.java:739)
  at android.os.Handler.dispatchMessage(Handler.java:95)
  at android.os.Looper.loop(Looper.java:148)
  at android.os.HandlerThread.run(HandlerThread.java:61)

  可以看到NetworkPolicy在通过NativeDaemonConnector和netd通信setInterfaceQuota
  我们结合log来看下是否有有用信息按照之前的经验netd在执行完cmd的时候会打印出slow operation
  在发生ANR的前后查找netd相关的打印
  06-19 15:29:00.997  1235  1270 I am_anr  : [0,22831,com.android.settings,818429509,Input dispatching timed out (Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.)]
  06-19 15:29:05.683  1235  2415 E NetdConnector: NDC Command {55445 bandwidth setiquota seth_w0 9223372036854775807} took too long (4755ms
  06-19 15:29:05.723  1235  2491 I dvm_lock_sample: [system_server,1,Binder_4,4919,NetworkPolicyManagerService.java,1696,-,1056,100]
  从eventslog中还可以发现setting出现了两次连续的ANR而上下文中都有类似上面的dvm_lock_sample NetworkPolicyManagerService相关的告警  
  因此虽然15:28的这次ANR并没有打出有用的栈但是我们还是可以猜测出这两次ANR的原因都是netd的cmd耗时太久导致的在主线程的调用路径上存在必然的block
  那么在netd可能无法修改的情况下我们应该如何去resolve这个问题呢
  将可能存在block的操作放到非UI线程中去做
2. com.huawei.hwvplayer.youku对端block
打开dropbox中对应的system_app_anrxxxx:
     查看主线程的栈
  "main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 obj=0x752b0000 self=0xb4276500
  | sysTid=25390 nice=-1 cgrp=default sched=3/0 handle=0xb6f18b34
  | state=S schedstat=( 0 0 0 ) utm=81 stm=12 core=2 HZ=100
  | stack=0xbe78b000-0xbe78d000 stackSize=8MB
  | held mutexes=
  kernel: (couldn't read /proc/self/task/25390/stack)
  native: #00 pc 000422d0  /system/lib/libc.so (__ioctl+8)
  native: #01 pc 00047825  /system/lib/libc.so (ioctl+14)
  native: #02 pc 0001e835  /system/lib/libbinder.so (_ZN7android14IPCThreadState14talkWithDriverEb+132)
  native: #03 pc 0001ee93  /system/lib/libbinder.so (_ZN7android14IPCThreadState15waitForResponseEPNS_6ParcelEPi+38)
  native: #04 pc 0001f049  /system/lib/libbinder.so (_ZN7android14IPCThreadState8transactEijRKNS_6ParcelEPS1_j+124)
  native: #05 pc 00019fe3  /system/lib/libbinder.so (_ZN7android8BpBinder8transactEjRKNS_6ParcelEPS1_j+30)
  native: #06 pc 0008a035  /system/lib/libandroid_runtime.so (???)
  native: #07 pc 00d78869  /data/dalvik-cache/arm/system@framework@boot.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+140)
  at android.os.BinderProxy.transactNative(Native method)
  at android.os.BinderProxy.transact(Binder.java:510)
  at android.os.storage.IMountService$Stub$Proxy.getVolumeList(IMountService.java:771)
  at android.os.storage.StorageManager.getVolumeList(StorageManager.java:883)
  at android.os.Environment$UserEnvironment.getExternalDirs(Environment.java:95)
  at android.os.Environment.getExternalStorageDirectory(Environment.java:354)
  at com.huawei.common.utils.PathUtils.<clinit>(PathUtils.java:51)
  at com.huawei.common.utils.PathUtils.getWorkspacePath(PathUtils.java:80)
  at com.huawei.common.components.log.Logger.<clinit>(Logger.java:37)
  at com.huawei.common.components.log.Logger.i(Logger.java:162)
  at com.huawei.hwvplayer.data.db.DbProvider.attachInfo(DbProvider.java:89)
  at android.app.ActivityThread.installProvider(ActivityThread.java:5279)
  at android.app.ActivityThread.installContentProviders(ActivityThread.java:4868)
  at android.app.ActivityThread.handleBindApplication(ActivityThread.java:4799)
  at android.app.ActivityThread.access$1600(ActivityThread.java:165)
  at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1436)
  at android.os.Handler.dispatchMessage(Handler.java:102)
  at android.os.Looper.loop(Looper.java:188)
  at android.app.ActivityThread.main(ActivityThread.java:5578)
  at java.lang.reflect.Method.invoke!(Native method)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:794)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:684)

  主线程尝试去调用MountService的 getVolumeList接口可能没有返回
  查看system_server中相关的搜索getVolumeList
  发现systemserver中有三个Binder线程和主线程被block我们的对端是哪个Binder线程暂时无法确认但这并不影响我们继续分析因为他们被blockd的路径是一致的


"main" prio=5 tid=1 Blocked
  | group="main" sCount=1 dsCount=0 obj=0x752b0000 self=0xb4276500
  | sysTid=22735 nice=-2 cgrp=default sched=0/0 handle=0xb6f18b34
  | state=S schedstat=( 0 0 0 ) utm=432 stm=85 core=1 HZ=100
  | stack=0xbe78b000-0xbe78d000 stackSize=8MB
  | held mutexes=
  at com.android.server.MountService.getVolumeList(MountService.java:2759)
  - waiting to lock <0x0eeb54f1> (a java.lang.Object) held by thread 40
  at android.os.storage.StorageManager.getVolumeList(StorageManager.java:883)
  at android.os.storage.StorageManager.getVolumeList(StorageManager.java:858)
  at android.os.storage.StorageManager.getPrimaryVolume(StorageManager.java:906)
  at com.android.server.usb.UsbDeviceManager.systemReady(UsbDeviceManager.java:327)
  at com.android.server.usb.UsbService.systemReady(UsbService.java:181)
  at com.android.server.usb.UsbService$Lifecycle.onBootPhase(UsbService.java:78)
  at com.android.server.SystemServiceManager.startBootPhase(SystemServiceManager.java:135)
  at com.android.server.SystemServer$3.run(SystemServer.java:1489)
  at com.android.server.am.ActivityManagerService.systemReady(ActivityManagerService.java:12417)
  at com.android.server.am.HwActivityManagerService.systemReady(HwActivityManagerService.java:960)
  at com.android.server.SystemServer.startOtherServices(SystemServer.java:1485)
  at com.android.server.SystemServer.run(SystemServer.java:381)
  at com.android.server.SystemServer.main(SystemServer.java:272)
  at java.lang.reflect.Method.invoke!(Native method)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:794)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:684)

  "Binder_8" prio=5 tid=76 Blocked
  | group="main" sCount=1 dsCount=0 obj=0x13bd60a0 self=0x9c1abe00
  | sysTid=25191 nice=-1 cgrp=default sched=0/0 handle=0x97158930
  | state=S schedstat=( 0 0 0 ) utm=9 stm=7 core=2 HZ=100
  | stack=0x9705c000-0x9705e000 stackSize=1014KB
  | held mutexes=
  at com.android.server.MountService.getVolumeList(MountService.java:2759)
  - waiting to lock <0x0eeb54f1> (a java.lang.Object) held by thread 40
  at android.os.storage.IMountService$Stub.onTransact(IMountService.java:1634)
  at android.os.Binder.execTransact(Binder.java:453)

  "Binder_2" prio=5 tid=8 Blocked
  | group="main" sCount=1 dsCount=0 obj=0x12cac0a0 self=0xaebf0300
  | sysTid=22761 nice=-1 cgrp=default sched=0/0 handle=0xaef7d930
  | state=S schedstat=( 0 0 0 ) utm=41 stm=25 core=0 HZ=100
  | stack=0xaee81000-0xaee83000 stackSize=1014KB
  | held mutexes=
  at com.android.server.MountService.getVolumeList(MountService.java:2759)
  - waiting to lock <0x0eeb54f1> (a java.lang.Object) held by thread 40
  at android.os.storage.IMountService$Stub.onTransact(IMountService.java:1634)
  at android.os.Binder.execTransact(Binder.java:453)

  他们均是被tid=40的人block按照上面的方法搜索tid=40或者 0x0eeb54f1得到block的人
  "MountService" prio=5 tid=40 TimedWaiting
  | group="main" sCount=1 dsCount=0 obj=0x132c1160 self=0x9ce57400
  | sysTid=23512 nice=0 cgrp=default sched=0/0 handle=0x9a239930
  | state=S schedstat=( 0 0 0 ) utm=29 stm=2 core=2 HZ=100
  | stack=0x9a137000-0x9a139000 stackSize=1038KB
  | held mutexes=
  at java.lang.Object.wait!(Native method)
  - waiting on <0x00fea1f3> (a java.lang.Object)
  at java.lang.Thread.parkFor$(Thread.java:1235)
  - locked <0x00fea1f3> (a java.lang.Object)
  at sun.misc.Unsafe.park(Unsafe.java:299)
  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2053)
  at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:372)
  at com.android.server.NativeDaemonConnector$ResponseQueue.remove(NativeDaemonConnector.java:777)
  at com.android.server.NativeDaemonConnector.executeForList(NativeDaemonConnector.java:489)
  at com.android.server.NativeDaemonConnector.execute(NativeDaemonConnector.java:386)
  at com.android.server.NativeDaemonConnector.execute(NativeDaemonConnector.java:381)
  at com.android.server.MountService.resetIfReadyAndConnectedLocked(MountService.java:827)
  at com.android.server.MountService.handleSystemReady(MountService.java:776)
  - locked <0x0eeb54f1> (a java.lang.Object)
  at com.android.server.MountService.access$500(MountService.java:152)
  at com.android.server.MountService$MountServiceHandler.handleMessage(MountService.java:596)
  at android.os.Handler.dispatchMessage(Handler.java:102)
  at android.os.Looper.loop(Looper.java:150)
  at android.os.HandlerThread.run(HandlerThread.java:61)

  和上面netd类似的mountservice也是通过ndc和vold通信这里我们需要继续查看是否vold存在异常
  之前提到过类似这种同步锁block的dvm_lock_sample一定会有打印于是先去找eventslog不过这个是华为的log是没有eventlog的
  而华为实现了一个blockMonitor的功能和dvm_lock_sample类似当某个操作特别耗时的时候会将其打印出来
  在ANR的附近找到如下
 07-19 10:17:50.739 25271 25271 W BlockMonitor: The binder calling took 55209ms.
07-19 10:17:50.739 25271 25271 W BlockMonitor: android.os.BlockMonitor.checkBinderTime(BlockMonitor.java:141)
07-19 10:17:50.739 25271 25271 W BlockMonitor: android.os.BinderProxy.transact(Binder.java:511)
07-19 10:17:50.739 25271 25271 W BlockMonitor: android.os.storage.IMountService$Stub$Proxy.getVolumeList(IMountService.java:771)
07-19 10:17:50.739 25271 25271 W BlockMonitor: android.os.storage.StorageManager.getVolumeList(StorageManager.java:883)
07-19 10:17:50.739 25271 25271 W BlockMonitor: android.os.Environment$UserEnvironment.getExternalDirs(Environment.java:95)
07-19 10:17:50.739 25271 25271 W BlockMonitor: android.os.Environment.getExternalStorageDirectory(Environment.java:354)
07-19 10:17:50.739 25271 25271 W BlockMonitor: android.support.v4.content.FileProvider.parsePathStrategy(FileProvider.java:583)
07-19 10:17:50.739 25271 25271 W BlockMonitor: android.support.v4.content.FileProvider.getPathStrategy(FileProvider.java:534)
07-19 10:17:50.739 25271 25271 W BlockMonitor: android.support.v4.content.FileProvider.attachInfo(FileProvider.java:352)
07-19 10:17:50.739 25271 25271 W BlockMonitor: android.app.ActivityThread.installProvider(ActivityThread.java:5279)
07-19 10:17:50.739 25271 25271 W BlockMonitor: android.app.ActivityThread.installContentProviders(ActivityThread.java:4868)
07-19 10:17:50.739 25271 25271 W BlockMonitor: android.app.ActivityThread.handleBindApplication(ActivityThread.java:4799)
07-19 10:17:50.739 25271 25271 W BlockMonitor: android.app.ActivityThread.access$1600(ActivityThread.java:165)
07-19 10:17:50.739 25271 25271 W BlockMonitor: android.app.ActivityThread$H.handleMessage(ActivityThread.java:1436)
07-19 10:17:50.739 25271 25271 W BlockMonitor: android.os.Handler.dispatchMessage(Handler.java:102)
07-19 10:17:50.739 25271 25271 W BlockMonitor: android.os.Looper.loop(Looper.java:188)
07-19 10:17:50.739 25271 25271 W BlockMonitor: android.app.ActivityThread.main(ActivityThread.java:5578)
07-19 10:17:50.739 25271 25271 W BlockMonitor: java.lang.reflect.Method.invoke(Native Method)
07-19 10:17:50.739 25271 25271 W BlockMonitor: com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:794)
07-19 10:17:50.739 25271 25271 W BlockMonitor: com.android.internal.os.ZygoteInit.main(ZygoteInit.java:684)
再加上之前的
07-19 10:17:50.729 22735 23512 E NativeDaemonConnector.ResponseQueue: Timeout waiting for response
07-19 10:17:50.729 22735 23512 E VoldConnector: timed-out waiting for response to 4 volume reset
07-19 10:17:50.729 22735 23512 E VoldConnector: timed-out waiting for response mOutputStream = android.net.LocalSocketImpl$SocketOutputStream@922644, mSocket = vold
07-19 10:17:50.731 22735 23512 W MountService: Failed to reset vold
07-19 10:17:50.731 22735 23512 W MountService: com.android.server.NativeDaemonTimeoutException: command '4 volume reset' failed with 'null'
07-19 10:17:50.731 22735 23512 W MountService: 	at com.android.server.NativeDaemonConnector.executeForList(NativeDaemonConnector.java:501)
07-19 10:17:50.731 22735 23512 W MountService: 	at com.android.server.NativeDaemonConnector.execute(NativeDaemonConnector.java:386)
07-19 10:17:50.731 22735 23512 W MountService: 	at com.android.server.NativeDaemonConnector.execute(NativeDaemonConnector.java:381)
07-19 10:17:50.731 22735 23512 W MountService: 	at com.android.server.MountService.resetIfReadyAndConnectedLocked(MountService.java:827)
07-19 10:17:50.731 22735 23512 W MountService: 	at com.android.server.MountService.handleSystemReady(MountService.java:776)
07-19 10:17:50.731 22735 23512 W MountService: 	at com.android.server.MountService.access$500(MountService.java:152)
07-19 10:17:50.731 22735 23512 W MountService: 	at com.android.server.MountService$MountServiceHandler.handleMessage(MountService.java:596)
07-19 10:17:50.731 22735 23512 W MountService: 	at android.os.Handler.dispatchMessage(Handler.java:102)
07-19 10:17:50.731 22735 23512 W MountService: 	at android.os.Looper.loop(Looper.java:150)
07-19 10:17:50.731 22735 23512 W MountService: 	at android.os.HandlerThread.run(HandlerThread.java:61)
我们有理由去推断vold的状态是不对的但是又没有vold的栈
那么全局grep vold试试能不能找到线索在kernellog中发现了vold的异常
07-19 14:22:24.669 <3>[10772.492156] c0 Freezing of tasks failed after 20.008 seconds (1 tasks refusing to freeze, wq_busy=0):
07-19 14:22:24.669 <6>[10772.492217] c0 vold            R running      0   224      1 0x00000001
07-19 14:22:24.669 <4>[10772.492278] c0 [<c05ebecc>] (__schedule+0x38c/0x5bc) from [<c05ea478>] (schedule_timeout+0x18/0x1e8)
07-19 14:22:24.669 <4>[10772.492309] c0 [<c05ea478>] (schedule_timeout+0x18/0x1e8) from [<c05eb90c>] (wait_for_common+0x11c/0x164)
07-19 14:22:24.669 <4>[10772.492309] c0 [<c05eb90c>] (wait_for_common+0x11c/0x164) from [<c03cd8c8>] (mmc_wait_for_req+0xb4/0xe4)
07-19 14:22:24.669 <4>[10772.492339] c0 [<c03cd8c8>] (mmc_wait_for_req+0xb4/0xe4) from [<c03cd95c>] (mmc_wait_for_cmd+0x64/0x74)
07-19 14:22:24.669 <4>[10772.492370] c0 [<c03cd95c>] (mmc_wait_for_cmd+0x64/0x74) from [<c03d41f0>] (mmc_send_status+0x6c/0x8c)
07-19 14:22:24.670 <4>[10772.492400] c0 [<c03d41f0>] (mmc_send_status+0x6c/0x8c) from [<c03d4504>] (sd_send_status+0x14/0x44)
07-19 14:22:24.670 <4>[10772.492431] c0 [<c03d4504>] (sd_send_status+0x14/0x44) from [<c03d491c>] (mmc_lock_unlock_by_buf+0xac/0x168)
07-19 14:22:24.670 <4>[10772.492431] c0 [<c03d491c>] (mmc_lock_unlock_by_buf+0xac/0x168) from [<c03dabd8>] (mmc_lockable_store+0x594/0x75c)
07-19 14:22:24.670 <4>[10772.492461] c0 [<c03dabd8>] (mmc_lockable_store+0x594/0x75c) from [<c029d560>] (dev_attr_store+0x18/0x24)
07-19 14:22:32.070 <4>[10772.492492] c0 [<c029d560>] (dev_attr_store+0x18/0x24) from [<c013b370>] (sysfs_write_file+0x104/0x148)
07-19 14:22:32.070 <4>[10772.492522] c0 [<c013b370>] (sysfs_write_file+0x104/0x148) from [<c00eabb4>] (vfs_write+0xd0/0x180)
07-19 14:22:32.070 <4>[10772.492553] c0 [<c00eabb4>] (vfs_write+0xd0/0x180) from [<c00eb070>] (SyS_write+0x38/0x68)
07-19 14:22:32.071 <4>[10772.492583] c0 [<c00eb070>] (SyS_write+0x38/0x68) from [<c000e840>] (ret_fast_syscall+0x0/0x30)
vold一直在这个操作中没有退出来所以不能响应客户端的请求从而导致了ANR
这个问题需要mmc的同事进一步去分析目前怀疑是SD卡发生了错误
3. setting 主线程耗时操作
"main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 obj=0x7425caf8 self=0xb4827800
  | sysTid=671 nice=0 cgrp=default sched=0/0 handle=0xb6febbec
  | state=S schedstat=( 0 0 0 ) utm=1853 stm=50 core=2 HZ=100
  | stack=0xbe7bd000-0xbe7bf000 stackSize=8MB
  | held mutexes=
  kernel: (couldn't read /proc/self/task/671/stack)
  native: #00 pc 000133cc  /system/lib/libc.so (syscall+28)
  native: #01 pc 000a9a83  /system/lib/libart.so (art::ConditionVariable::Wait(art::Thread*)+82)
  native: #02 pc 001b16f1  /system/lib/libart.so (art::JNI::NewString(_JNIEnv*, unsigned short const*, int)+640)
  native: #03 pc 00075887  /system/lib/libandroid_runtime.so (???)
  native: #04 pc 008570ab  /data/dalvik-cache/arm/system@framework@boot.oat (Java_android_database_CursorWindow_nativeGetString__JII+110)
  at android.database.CursorWindow.nativeGetString(Native method)
  at android.database.CursorWindow.getString(CursorWindow.java:438)
  at android.database.AbstractWindowedCursor.getString(AbstractWindowedCursor.java:51)
  at android.database.CursorWrapper.getString(CursorWrapper.java:114)
  at com.android.settings.ApnSettings.fillList(ApnSettings.java:259)
  at com.android.settings.ApnSettings.onResume(ApnSettings.java:208)
  at android.app.Fragment.performResume(Fragment.java:2096)
  at android.app.FragmentManagerImpl.moveToState(FragmentManager.java:928)
  at android.app.FragmentManagerImpl.moveToState(FragmentManager.java:1067)
  at android.app.FragmentManagerImpl.moveToState(FragmentManager.java:1049)
  at android.app.FragmentManagerImpl.dispatchResume(FragmentManager.java:1879)
  at android.app.Activity.performResume(Activity.java:6113)
  at android.app.ActivityThread.performResumeActivity(ActivityThread.java:3015)
  at android.app.ActivityThread.handleResumeActivity(ActivityThread.java:3061)
  at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:2415)
  at android.app.ActivityThread.access$800(ActivityThread.java:151)
  at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1313)
  at android.os.Handler.dispatchMessage(Handler.java:102)
  at android.os.Looper.loop(Looper.java:135)
  at android.app.ActivityThread.main(ActivityThread.java:5345)
  at java.lang.reflect.Method.invoke!(Native method)
  at java.lang.reflect.Method.invoke(Method.java:372)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:947)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:742)


  从ApnSettings.java中可以看到fillList这个可能的耗时操作主线程处理的地方有OnResume和OnReceive
  且代码中存在使用HandlerThread异步处理这个操作的地方可以借鉴下将这两处都放到HandlerThread中去做