
Application crashes on Linux when __get_thread fails to fetch the TLS

Background

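The Android emulator runs on a PC, and Android apps run inside the emulator. When hardware virtualization (VT-x on Intel CPUs, AMD-V on AMD CPUs) is not enabled in the PC's BIOS, games that ship ARM libraries crash inside the emulator, either right away or after running for a while. The crash point is __get_thread()+6. As an example, take the game 当乐.apk, delete the x86 libraries under libs so that only the ARM libraries remain, then install and run it. The full crash log is: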

03-27 15:51:21.236 E/ZKOPCountUtil( 3290): find Name = 当乐
03-27 15:51:21.344 D/dalvikvm( 4203): GC_CONCURRENT freed 1093K, 8% free 13255K/14404K, paused 15ms+12ms, total 212ms
03-27 15:51:21.344 D/dalvikvm( 4203): WAIT_FOR_CONCURRENT_GC blocked 96ms
03-27 15:51:21.348 D/dalvikvm( 4203): WAIT_FOR_CONCURRENT_GC blocked 89ms
03-27 15:51:21.360 D/dalvikvm( 4203): WAIT_FOR_CONCURRENT_GC blocked 96ms
03-27 15:51:21.404 W/View    ( 4203): requestLayout() improperly called by android.support.v7.widget.AppCompatTextView{52831f4c V.ED.... ......I. 20,0-148,91 #7f0d0438 app:id/expand_title} during layout: running second layout pass
03-27 15:51:21.584 D/Volley  ( 4203): [148] b.a: HTTP response for request=<[ ] http://res5.d.cn/cp/img/502487/o_1bbl6epie170sbec184qs9i1ggou.png 0x22e400ee LOW 2> [lifetime=4156], [size=67], [rc=200], [retryCount=0]
03-27 15:51:21.920 F/libc    ( 4203): Fatal signal 11 (SIGSEGV) at 0x24244c8d (code=1), thread 4246 (Thread-133)
03-27 15:51:22.044 I/DEBUG   (  112): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
03-27 15:51:22.044 I/DEBUG   (  112): Build fingerprint: 'SAMSUNG/hlteatt/hlteuc:4.4.4/tt/eng.jenkins.20170306.140753:userdebug/test-keys'
03-27 15:51:22.044 I/DEBUG   (  112): Revision: '0'
03-27 15:51:22.044 I/DEBUG   (  112): pid: 4203, tid: 4246, name: Thread-133  >>> com.diguayouxi <<<
03-27 15:51:22.044 I/DEBUG   (  112): signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 24244c8d
03-27 15:51:23.412 D/dalvikvm(  784): GC_CONCURRENT freed 437K, 28% free 3834K/5272K, paused 7ms+6ms, total 75ms
03-27 15:51:23.568 I/GAv4-SVC( 2937): Google Analytics 8.7.03 is starting up.
03-27 15:51:23.884 I/DEBUG   (  112):     eax 24244c89  ebx b76b7fcc  ecx 00000018  edx 00004000
03-27 15:51:23.888 I/DEBUG   (  112):     esi b76c694c  edi 00000000
03-27 15:51:23.888 I/DEBUG   (  112):     xcs 00000073  xds 0000007b  xes 0000007b  xfs 0000003b  xss 0000007b
03-27 15:51:23.888 I/DEBUG   (  112):     eip b76343c6  ebp 00004000  esp 956396cc  flags 00210206
03-27 15:51:23.900 D/dalvikvm(  697): GC_CONCURRENT freed 489K, 12% free 4671K/5284K, paused 10ms+1ms, total 74ms
03-27 15:51:23.976 I/DEBUG   (  112):
03-27 15:51:23.976 I/DEBUG   (  112): backtrace:
03-27 15:51:23.984 I/DEBUG   (  112):     #00  pc 000183c6  /system/lib/libc.so (__get_thread+6)
03-27 15:51:23.984 I/DEBUG   (  112):     #01  pc 0000de2d  /system/lib/libc.so (pthread_mutex_lock+205)
03-27 15:51:23.988 I/DEBUG   (  112):     #02  pc 0005a745  /system/lib/libc.so (flockfile+37)
03-27 15:51:23.988 I/DEBUG   (  112):     #03  pc 0004651f  /system/lib/libc.so (fread+335)
03-27 15:51:23.992 I/DEBUG   (  112):     #04  pc 00075f6a  /system/lib/libc.so (android_getaddrinfo_proxy+1050)
03-27 15:51:23.996 I/DEBUG   (  112):     #05  pc 00078c30  /system/lib/libc.so (android_getaddrinfoforiface+1936)
03-27 15:51:24.000 I/DEBUG   (  112):     #06  pc 00078e97  /system/lib/libc.so (getaddrinfo+55)
03-27 15:51:24.000 I/DEBUG   (  112):     #07  pc 00037160  /system/lib/libjavacore.so (Posix_getaddrinfo(_JNIEnv*, _jobject*, _jstring*, _jobject*)+336)
03-27 15:51:24.000 I/DEBUG   (  112):     #08  pc 0002a4ab  /system/lib/libdvm.so (dvmPlatformInvoke+79)
03-27 15:51:24.008 I/DEBUG   (  112):     #09  pc 00077a27  [heap]
03-27 15:51:24.008 I/DEBUG   (  112):     #10  pc 00086da2  /system/lib/libdvm.so (dvmCallJNIMethod(unsigned int const*, JValue*, Method const*, Thread*)+434)
03-27 15:51:24.008 I/DEBUG   (  112):     #11  pc 001775b8  /system/lib/libdvm.so
03-27 15:51:24.008 I/DEBUG   (  112):     #12  pc 00003cf7  <unknown>
03-27 15:51:24.008 I/DEBUG   (  112):     #13  pc 0003b962  /system/lib/libdvm.so (dvmMterpStd(Thread*)+66)
03-27 15:51:24.008 I/DEBUG   (  112):     #14  pc 00037029  /system/lib/libdvm.so (dvmInterpret(Thread*, Method const*, JValue*)+217)
03-27 15:51:24.008 I/DEBUG   (  112):     #15  pc 000bd027  /system/lib/libdvm.so (dvmCallMethodV(Thread*, Method const*, Object*, bool, JValue*, char*)+759)
03-27 15:51:24.008 I/DEBUG   (  112):     #16  pc 000bd437  /system/lib/libdvm.so (dvmCallMethod(Thread*, Method const*, Object*, JValue*, ...)+55)
03-27 15:51:24.008 I/DEBUG   (  112):     #17  pc 000993c3  /system/lib/libdvm.so (interpThreadStart(void*)+995)
03-27 15:51:24.008 I/DEBUG   (  112):     #18  pc 0000bc3c  /system/lib/libc.so (__thread_entry+236)
03-27 15:51:24.008 I/DEBUG   (  112):     #19  pc 0003e1b5  /system/lib/libc.so (__pthread_clone+69)
03-27 15:51:24.008 I/DEBUG   (  112):     #20  pc 00098fdf  /system/lib/libdvm.so (internalThreadStart(void*)+655)
03-27 15:51:24.008 I/DEBUG   (  112):
03-27 15:51:24.008 I/DEBUG   (  112): stack:
03-27 15:51:24.008 I/DEBUG   (  112):          9563968c  b4db080e  /system/lib/libdvm.so (dvmMterp_OP_RETURN_VOID_BARRIER+158)
03-27 15:51:24.008 I/DEBUG   (  112):          95639690  b8cadbc0  [heap]
03-27 15:51:24.008 I/DEBUG   (  112):          95639694  00000001
03-27 15:51:24.008 I/DEBUG   (  112):          95639698  00000000
03-27 15:51:24.008 I/DEBUG   (  112):          9563969c  b7629f39  /system/lib/libc.so (pthread_mutex_unlock+25)
03-27 15:51:24.008 I/DEBUG   (  112):          956396a0  00000000
03-27 15:51:24.024 I/DEBUG   (  112):          956396a4  9db6fdee  /data/dalvik-cache/system@[email protected]@classes.dex
03-27 15:51:24.024 I/DEBUG   (  112):          956396a8  9563dce4
03-27 15:51:24.024 I/DEBUG   (  112):          956396ac  b7629fba  /system/lib/libc.so (pthread_mutex_unlock+154)
03-27 15:51:24.024 I/DEBUG   (  112):          956396b0  00000000
03-27 15:51:24.024 I/DEBUG   (  112):          956396b4  b8cadbd0  [heap]
03-27 15:51:24.024 I/DEBUG   (  112):          956396b8  9dd30518  /dev/ashmem/dalvik-LinearAlloc (deleted)
03-27 15:51:24.024 I/DEBUG   (  112):          956396bc  b7629fba  /system/lib/libc.so (pthread_mutex_unlock+154)
03-27 15:51:24.032 I/DEBUG   (  112):          956396c0  00004000
03-27 15:51:24.032 I/DEBUG   (  112):          956396c4  b8cae030  [heap]
03-27 15:51:24.036 I/DEBUG   (  112):          956396c8  b7629d69  /system/lib/libc.so (pthread_mutex_lock+9)
03-27 15:51:24.052 I/DEBUG   (  112):     #00  956396cc  b7629e2e  /system/lib/libc.so (pthread_mutex_lock+206)
03-27 15:51:24.052 I/DEBUG   (  112):     #01  956396d0  a59e7eec  /dev/ashmem/dalvik-heap (deleted)
03-27 15:51:24.052 I/DEBUG   (  112):          956396d4  b8ea6808  [heap]
03-27 15:51:24.052 I/DEBUG   (  112):          956396d8  b76bc718
03-27 15:51:24.052 I/DEBUG   (  112):          956396dc  b762ed4f  /system/lib/libc.so (dlmalloc+351)
03-27 15:51:24.052 I/DEBUG   (  112):          956396e0  b76bc800
03-27 15:51:24.052 I/DEBUG   (  112):          956396e4  b8cae030  [heap]
03-27 15:51:24.052 I/DEBUG   (  112):          956396e8  00004000
03-27 15:51:24.052 I/DEBUG   (  112):          956396ec  00004000
03-27 15:51:24.052 I/DEBUG   (  112):          956396f0  00000050
03-27 15:51:24.052 I/DEBUG   (  112):          956396f4  b8e2bee8  [heap]
03-27 15:51:24.052 I/DEBUG   (  112):          956396f8  b7629d69  /system/lib/libc.so (pthread_mutex_lock+9)
03-27 15:51:24.052 I/DEBUG   (  112):          956396fc  b76b7fcc  /system/lib/libc.so
03-27 15:51:24.052 I/DEBUG   (  112):          95639700  b8ea6808  [heap]
03-27 15:51:24.052 I/DEBUG   (  112):          95639704  00000001
03-27 15:51:24.052 I/DEBUG   (  112):          95639708  b76c63a0
03-27 15:51:24.052 I/DEBUG   (  112):          9563970c  b7676746  /system/lib/libc.so (flockfile+38)
03-27 15:51:24.052 I/DEBUG   (  112):     #02  95639710  b76c694c
03-27 15:51:24.052 I/DEBUG   (  112):          95639714  b8e2bee8  [heap]
03-27 15:51:24.052 I/DEBUG   (  112):          95639718  00001000
03-27 15:51:24.068 I/DEBUG   (  112):          9563971c  b76b7fcc  /system/lib/libc.so
03-27 15:51:24.068 I/DEBUG   (  112):          95639720  956397da  [stack:4246]
03-27 15:51:24.068 I/DEBUG   (  112):          95639724  b7676726  /system/lib/libc.so (flockfile+6)
03-27 15:51:24.068 I/DEBUG   (  112):          95639728  b76b7fcc  /system/lib/libc.so
03-27 15:51:24.068 I/DEBUG   (  112):          9563972c  b7662520  /system/lib/libc.so (fread+336)
03-27 15:51:24.296 I/DEBUG   (  112):
03-27 15:51:24.296 I/DEBUG   (  112): memory map around fault addr 24244c8d:
03-27 15:51:24.304 I/DEBUG   (  112):     1c142000-1c145000 rw-
03-27 15:51:24.308 I/DEBUG   (  112):     (no map for address)
03-27 15:51:24.308 I/DEBUG   (  112):     9296d000-9296e000 ---
03-27 15:51:24.940 I/PhenotypeConfigurator(  697): Scheduling Phenotype for one-off execution 667 seconds from now (1490601084941)
03-27 15:51:25.244 D/dalvikvm( 2937): GC_CONCURRENT freed 214K, 6% free 5252K/5572K, paused 19ms+26ms, total 106ms

Locating the problem

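The crash log points at __get_thread(), which starts by reading the TLS base the same way __get_tls() does. In the source, __get_tls() is implemented as follows: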

//android-4.4.4\bionic\libc\arch-x86\bionic\__get_tls.c

/* see the implementation of __set_tls and pthread.c to understand this
 * code. Basically, the content of gs:[0] always is a pointer to the base
 * address of the tls region
 */
void*   __get_tls(void)
{
  void*  tls;
  asm ( "   movl  %%gs:0, %0" : "=r"(tls) );
  return tls;
}

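The code comment says that gs:[0] always holds a pointer to the base address of the TLS (Thread Local Storage) region. IDA shows the crash point more directly. Below is __get_thread() from libc.so as opened in IDA. The crash at __get_thread()+6 is the line mov eax, [eax+4], an indirect load that faults.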

.text:000183C0
.text:000183C0 ; =============== S U B R O U T I N E =======================================
.text:000183C0
.text:000183C0
.text:000183C0                 public __get_thread
.text:000183C0 __get_thread    proc near               ; CODE XREF: __pthread_cleanup_push+1Bp
.text:000183C0                                         ; __pthread_cleanup_pop+1Bp ...
.text:000183C0                 mov     eax, large gs:0
.text:000183C6                 mov     eax, [eax+4]
.text:000183C9                 nop
.text:000183CA                 nop
.text:000183CB                 nop
.text:000183CC                 nop
.text:000183CD                 retn
.text:000183CD __get_thread    endp
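
To make the listing easier to follow, here is a small C model of what the two mov instructions compute. It is a sketch, not the bionic source: model_get_thread, model_init_tls and fake_thread are hypothetical names, the TLS area is an ordinary array instead of memory reached through gs, and the layout is only what the disassembly and the code comment imply (slot 0 holds the TLS base pointer, and the slot at offset 4, i.e. index 1 with 32-bit pointers, holds the per-thread structure).

```c
#include <stddef.h>

/* Hypothetical stand-in for bionic's per-thread structure. */
struct fake_thread {
    int tid;
};

/* Slot indexes assumed from the disassembly: gs:0 yields the TLS base,
 * and [base + 4] is slot 1 when pointers are 4 bytes wide. */
enum { MODEL_SLOT_SELF = 0, MODEL_SLOT_THREAD = 1, MODEL_SLOT_COUNT = 4 };

/* Build a TLS area the way the code comment describes it:
 * slot 0 points back at the area, slot 1 at the thread structure. */
static void model_init_tls(void **tls, struct fake_thread *thread)
{
    tls[MODEL_SLOT_SELF] = tls;
    tls[MODEL_SLOT_THREAD] = thread;
}

/* Model of __get_thread: gs0 plays the role of the value read by
 * "mov eax, large gs:0"; the array access is "mov eax, [eax+4]".
 * If gs0 is garbage, this second load is the one that faults. */
static struct fake_thread *model_get_thread(void *gs0)
{
    void **tls = (void **)gs0;
    return (struct fake_thread *)tls[MODEL_SLOT_THREAD];
}
```

With a correctly initialized area, model_get_thread(tls) returns the thread structure stored in slot 1. In the crash, the value playing the role of gs0 did not point at a TLS area at all, so the second load hit unmapped memory.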

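So here is the problem: eax is loaded from gs:0, and the indirect load at eax+4 fails. In the crash log eax is 24244c89 and the fault address is 24244c8d, which is exactly eax+4, and the memory map shows nothing mapped at that address, so the value read from gs:0 is bogus. What we need to find out next is where gs is set up and what it is for.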
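
The register dump and the backtrace can be cross-checked with a little address arithmetic. The helpers below are a minimal sketch (fault_address and module_base are names made up for this example): the first computes the operand address of mov eax, [eax+4], the second recovers a module's load base from an absolute eip and the module-relative pc printed in the backtrace.

```c
#include <stdint.h>

/* Address touched by the second load: [eax + 4]. */
static uint32_t fault_address(uint32_t eax)
{
    return eax + 4u;
}

/* Load base of a module, given an absolute eip and the module-relative
 * pc that the backtrace prints for the same instruction. */
static uint32_t module_base(uint32_t eip, uint32_t relative_pc)
{
    return eip - relative_pc;
}
```

Feeding in the values from the log, fault_address(0x24244c89) is 0x24244c8d, the reported fault addr, and module_base(0xb76343c6, 0x000183c6) is 0xb761c000, which places eip at libc.so offset 0x183c6, i.e. __get_thread+6.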

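Since the code comment says gs is used to reach the TLS base pointer, and the TLS segment descriptors live in the kernel's GDT, gs should be set up by the kernel. Taking the x86 segment layout as an example, the segments are defined in asm\Segment.h: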


// genymotion_kernel_3.10\arch\x86\include\asm\Segment.h

/*
 * The layout of the per-CPU GDT under Linux:
 *
 *   0 - null
 *   1 - reserved
 *   2 - reserved
 *   3 - reserved
 *
 *   4 - unused         <==== new cacheline
 *   5 - unused
 *
 *  ------- start of TLS (Thread-Local Storage) segments:
 *
 *   6 - TLS segment #1         [ glibc's TLS segment ]
 *   7 - TLS segment #2         [ Wine's %fs Win32 segment ]
 *   8 - TLS segment #3
 *   9 - reserved
 *  10 - reserved
 *  11 - reserved
 *
 *  ------- start of kernel segments:
 *
 *  12 - kernel code segment        <==== new cacheline
 *  13 - kernel data segment
 *  14 - default user CS
 *  15 - default user DS
 *  16 - TSS
 *  17 - LDT
 *  18 - PNPBIOS support (16->32 gate)
 *  19 - PNPBIOS support
 *  20 - PNPBIOS support
 *  21 - PNPBIOS support
 *  22 - PNPBIOS support
 *  23 - APM BIOS support
 *  24 - APM BIOS support
 *  25 - APM BIOS support
 *
 *  26 - ESPFIX small SS
 *  27 - per-cpu            [ offset to per-cpu data area ]
 *  28 - stack_canary-20        [ for stack protector ]
 *  29 - unused
 *  30 - unused
 *  31 - TSS for double fault handler
 */

 ... ...
 // some code omitted


/*
 * Save a segment register away
 */
#define savesegment(seg, value)             \
    asm("mov %%" #seg ",%0":"=r" (value) : : "memory")

/*
 * x86_32 user gs accessors.
 */
#ifdef CONFIG_X86_32
#ifdef CONFIG_X86_32_LAZY_GS
#define get_user_gs(regs)   (u16)({unsigned long v; savesegment(gs, v); v;})
#define set_user_gs(regs, v)    loadsegment(gs, (unsigned long)(v))
#define task_user_gs(tsk)   ((tsk)->thread.gs)
#define lazy_save_gs(v)     savesegment(gs, (v))
#define lazy_load_gs(v)     loadsegment(gs, (v))
#else   /* X86_32_LAZY_GS */
#define get_user_gs(regs)   (u16)((regs)->gs)
#define set_user_gs(regs, v)    do { (regs)->gs = (v); } while (0)
#define task_user_gs(tsk)   (task_pt_regs(tsk)->gs)
#define lazy_save_gs(v)     do { } while (0)
#define lazy_load_gs(v)     do { } while (0)
#endif  /* X86_32_LAZY_GS */
#endif  /* X86_32 */
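
The GDT table above can be tied back to the segment registers in the crash log by decoding the selectors. The sketch below uses the standard x86 selector layout (bits 15..3 are the table index, bit 2 selects GDT or LDT, bits 1..0 are the requested privilege level); decode_selector and selector_fields are illustrative names, not kernel APIs.

```c
#include <stdint.h>

struct selector_fields {
    unsigned index; /* descriptor table entry number */
    unsigned ti;    /* table indicator: 0 = GDT, 1 = LDT */
    unsigned rpl;   /* requested privilege level, 0..3 */
};

/* Split a 16-bit x86 segment selector into its three fields. */
static struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index = sel >> 3;
    f.ti = (sel >> 2) & 1u;
    f.rpl = sel & 3u;
    return f;
}
```

Applied to the log, xcs 0x73 decodes to GDT entry 14 (default user CS) and xds 0x7b to entry 15 (default user DS), both at privilege level 3, while xfs 0x3b decodes to entry 7, one of the TLS slots. The log does not print gs itself.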

Solution

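The table above shows the layout of the whole GDT, including the TLS segments. The key part is at the end: the accessors for the gs register. When the kernel is configured with CONFIG_X86_32, there are two ways of getting the gs value, depending on whether the macro CONFIG_X86_32_LAZY_GS is defined.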

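Checking the kernel configuration shows that CONFIG_X86_32_LAZY_GS is selected. In that case get_user_gs() ignores regs and reads the live gs register: savesegment(gs, v) writes the register into the local variable v, which is then returned (v is an output of the asm statement, so it is not read uninitialized). Without CONFIG_X86_32_LAZY_GS, get_user_gs() returns the gs value saved in regs; where regs->gs gets set is left aside for now. This may still not make clear what role gs plays in the kernel or how it flows through it. That is fine, a deeper look will follow later.

As for the fix: since CONFIG_X86_32_LAZY_GS changes how gs is read and written, I reconfigured the kernel with CONFIG_X86_32_LAZY_GS removed, rebuilt, and verified that 当乐.apk runs normally. This indicates that the option affects the value gs ends up with.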
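
The difference between the two get_user_gs() variants quoted from Segment.h can be modelled in portable C. This is only a sketch under stated assumptions: fake_pt_regs, fake_live_gs and fake_savesegment are hypothetical stand-ins (a plain variable plays the role of the gs register and a one-field struct plays the role of the saved register frame), and the selector values used with it are illustrative, not taken from the crash.

```c
#include <stdint.h>

/* Hypothetical stand-ins, not kernel types. */
struct fake_pt_regs {
    unsigned long gs;   /* selector saved on kernel entry */
};

static unsigned long fake_live_gs;  /* plays the role of the %gs register */

/* Stand-in for savesegment(gs, v): in the real macro, v is an OUTPUT
 * operand of the asm statement, so the asm writes it. */
#define fake_savesegment(value) ((value) = fake_live_gs)

/* Shape of get_user_gs() with CONFIG_X86_32_LAZY_GS=y. */
static uint16_t model_get_user_gs_lazy(const struct fake_pt_regs *regs)
{
    unsigned long v;
    (void)regs;             /* the lazy variant ignores the saved frame */
    fake_savesegment(v);    /* v is written here, before it is read */
    return (uint16_t)v;
}

/* Shape of get_user_gs() without CONFIG_X86_32_LAZY_GS. */
static uint16_t model_get_user_gs_saved(const struct fake_pt_regs *regs)
{
    return (uint16_t)regs->gs;
}
```

As long as the live register and the saved frame agree, both variants return the same selector. They diverge only when something changes the live register behind the kernel's back, which is the kind of situation the lazy variant cannot detect.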

The fix is the patch below; merge it into the x86 defconfig file:


@@ -37,7 +37,6 @@ CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
 CONFIG_HAVE_INTEL_TXT=y
 CONFIG_X86_32_SMP=y
 CONFIG_X86_HT=y
-CONFIG_X86_32_LAZY_GS=y
 CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-ecx -fcall-saved-edx"
 CONFIG_ARCH_CPU_PROBE_RELEASE=y
 CONFIG_ARCH_SUPPORTS_UPROBES=y
@@ -452,7 +451,7 @@ CONFIG_ARCH_RANDOM=y
 CONFIG_X86_SMAP=y
 # CONFIG_EFI is not set
 # CONFIG_SECCOMP is not set
-# CONFIG_CC_STACKPROTECTOR is not set
+CONFIG_CC_STACKPROTECTOR=y
 # CONFIG_HZ_100 is not set
 CONFIG_HZ_250=y
 # CONFIG_HZ_300 is not set
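  • CONFIG_X86_32_LAZY_GS and CONFIG_CC_STACKPROTECTOR are tied by a dependency: removing CONFIG_X86_32_LAZY_GS requires selecting CONFIG_CC_STACKPROTECTOR=y.
  • If enabling this kernel option makes the build fail with error: undefined reference to '__stack_chk_guard', see my other article: Linux编译x86架构内核出现_stack_chk_guard未定义错误 (the undefined _stack_chk_guard error when building an x86 kernel on Linux).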

Summary

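That solves the problem, but many open questions remain, and that is the worst part: as a developer, not understanding the whole flow leaves me uneasy. Still, one step at a time. The next step is to study the GDT and memory management as a whole.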


Thanks

2017 ……, roll up your trouser legs and run, roll up your sleeves and work!

yanxiangyfg's column: "Stay true to practice, record every bit"

