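A gcc build anecdote

I have a car head unit on hand and wanted to write a program for it that uses the unit's bundled graphics library to process images. The image library is libJpeg.so, and the test program looks roughly like this: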

#include <stdio.h>
#include <stdlib.h>
#include "vfile.h"
#include "jpeglib.h"

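int main(int argc, char *argv[])
{
    /* Expect the image path as the first argument. */
    if (argc < 2)
        return 1;
    char *pFileName = argv[1];
    /* VFILE and TDiskFile come from the vendor's vfile.h. */
    VFILE *m_pInFile = new TDiskFile(pFileName, "rb");
    if (NULL == m_pInFile)
    {
        printf("error open\n");
        return 1;
    }
    struct jpeg_decompress_struct m_jds;
    /* The only call into libJpeg.so; enough to exercise the link step. */
    jpeg_CreateDecompress(&m_jds, 62, 432);
    return 0;
}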

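jpeglib.h declares jpeg_CreateDecompress as extern, and the function itself lives in libJpeg.so. So the right way to build this program should presumably be the command below (cross-compiling with arm-linux-gnueabi-g++). Instead, the link step fails with these errors: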

$ arm-linux-gnueabi-g++ -o wrapper wrapper.cpp vfile.cpp -lJpeg
/usr/arm-linux-gnueabi/lib/libBasic.so: undefined reference to `dlopen'
/usr/arm-linux-gnueabi/lib/libBasic.so: undefined reference to `dlclose'
/usr/arm-linux-gnueabi/lib/libBasic.so: undefined reference to `dlerror'
/usr/arm-linux-gnueabi/lib/libBasic.so: undefined reference to `dlsym'

The odd part is that the errors point at libBasic.so. Here are the dependencies of libJpeg.so and libBasic.so:

$ arm-linux-gnueabi-readelf -d /usr/arm-linux-gnueabi/lib/libJpeg.so 
Tag        Type                         Name/Value
0x00000001 (NEEDED)                     Shared library: [libBasic.so]
0x00000001 (NEEDED)                     Shared library: [libstdc++.so.6]
0x00000001 (NEEDED)                     Shared library: [libm.so.6]
0x00000001 (NEEDED)                     Shared library: [libgcc_s.so.1]
0x00000001 (NEEDED)                     Shared library: [libc.so.6]

$ arm-linux-gnueabi-readelf -d /usr/arm-linux-gnueabi/lib/libBasic.so 
0x00000001 (NEEDED)                     Shared library: [libpthread.so.0]
0x00000001 (NEEDED)                     Shared library: [libts_ipc_client_new.so]
0x00000001 (NEEDED)                     Shared library: [libiconv.so.2]
0x00000001 (NEEDED)                     Shared library: [libzmq.so.1]
0x00000001 (NEEDED)                     Shared library: [libuuid.so]
0x00000001 (NEEDED)                     Shared library: [libstdc++.so.6]
0x00000001 (NEEDED)                     Shared library: [libm.so.6]
0x00000001 (NEEDED)                     Shared library: [libgcc_s.so.1]
0x00000001 (NEEDED)                     Shared library: [libc.so.6]
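
To compare the two lists without the surrounding columns, the NEEDED entries can be pulled out with a small filter. This is only a sketch: needed_libs is a name made up here, and it assumes the readelf output format shown above.

```shell
# needed_libs (hypothetical helper): reads `readelf -d` output on stdin
# and prints one DT_NEEDED library name per line.
needed_libs() {
    # Keep only lines tagged (NEEDED) and print the text inside [ ].
    sed -n 's/.*(NEEDED).*\[\([^]]*\)].*/\1/p'
}
```

For example, piping `arm-linux-gnueabi-readelf -d /usr/arm-linux-gnueabi/lib/libBasic.so` into needed_libs and searching the result for libdl makes the missing entry obvious.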

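libJpeg.so imports functions from libBasic.so, but when the linker pulls in libBasic.so, the link fails because dlopen and the related functions have no definition. libBasic.so does use dlopen internally, yet its list of needed libraries above does not include libdl.so, the library that provides dlopen. Naming -lBasic and -ldl explicitly on the command line does not help either: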

$ arm-linux-gnueabi-g++ -o wrapper wrapper.cpp vfile.cpp -lJpeg -lBasic -ldl
/usr/lib/gcc-cross/arm-linux-gnueabi/5/../../../../arm-linux-gnueabi/lib/../lib/libBasic.so: undefined reference to `dlopen'
/usr/lib/gcc-cross/arm-linux-gnueabi/5/../../../../arm-linux-gnueabi/lib/../lib/libBasic.so: undefined reference to `dlclose'
/usr/lib/gcc-cross/arm-linux-gnueabi/5/../../../../arm-linux-gnueabi/lib/../lib/libBasic.so: undefined reference to `dlerror'
/usr/lib/gcc-cross/arm-linux-gnueabi/5/../../../../arm-linux-gnueabi/lib/../lib/libBasic.so: undefined reference to `dlsym'
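
That libBasic.so references the dl* functions without defining them can be checked from its dynamic symbol table. The filter below is a rough sketch. undefined_dl_syms is a hypothetical name, and the filter assumes the usual `nm -D --undefined-only` layout, where an undefined symbol appears as a line with "U" followed by the name.

```shell
# undefined_dl_syms (hypothetical helper): reads `nm -D --undefined-only`
# output on stdin and prints undefined symbols whose names start with "dl".
undefined_dl_syms() {
    # Strip any @VERSION suffix before matching on the symbol name.
    awk '$1 == "U" { sub(/@.*/, "", $2); if ($2 ~ /^dl/) print $2 }'
}
```

A possible invocation is `arm-linux-gnueabi-nm -D --undefined-only /usr/arm-linux-gnueabi/lib/libBasic.so | undefined_dl_syms`. Given the errors above, it would be expected to list the four dl* functions.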

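The fix I ended up with was to hand-write a dlopen call in wrapper.cpp, which forces libdl.so to be linked in. After that, the command line above builds.

Reproducing it

At first I thought the libBasic.so file had been tampered with. In fact, when building a shared library you can either name the libraries it depends on explicitly with -l, or name nothing at all, in which case the resulting .so carries no record of those dependencies. Take sm2.c as an example: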

#include <stdio.h>
#include <dlfcn.h>
void foo2(void)
{
    void *hd = dlopen("test.so", RTLD_LAZY);
    puts("Hello, I'm shared library2");
}

The two ways of building it give these results:

$ arm-linux-gnueabi-gcc -shared -o sm2.so sm2.c
$ arm-linux-gnueabi-readelf -d sm2.so
Tag        Type                         Name/Value
0x00000001 (NEEDED)                     Shared library: [libc.so.6]
$ arm-linux-gnueabi-gcc -shared -o sm2.so sm2.c -ldl
$ arm-linux-gnueabi-readelf -d sm2.so
Tag        Type                         Name/Value
0x00000001 (NEEDED)                     Shared library: [libdl.so.2]
0x00000001 (NEEDED)                     Shared library: [libc.so.6]

In other words, a library that calls dlopen builds with or without -ldl, but the resulting files differ. Next, write sm.c, which calls into sm2:

#include <stdio.h>
extern void foo2(void);
void foo(void)
{
    foo2();
    puts("Hello, I'm shared library1");
}

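To recreate the same situation as libJpeg.so and libBasic.so, build sm with the command below. (-lsm2 looks for a file named libsm2.so, so the library built above has to be available under that name in a directory the linker searches; the error message further down shows it in /usr/arm-linux-gnueabi/lib.)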

$ arm-linux-gnueabi-gcc -shared -o sm.so sm.c -lsm2
$ arm-linux-gnueabi-readelf -d sm.so
Tag        Type                         Name/Value
0x00000001 (NEEDED)                     Shared library: [libsm2.so]
0x00000001 (NEEDED)                     Shared library: [libc.so.6]

Then write a caller. Linking it against sm.so reproduces the problem:

#include <stdio.h>
#include <dlfcn.h>
extern void foo(void);
int main(void)
{
    foo();
    return 0;
}

$ arm-linux-gnueabi-gcc -o test test.c -lsm -lsm2 -ldl
/usr/lib/gcc-cross/arm-linux-gnueabi/5/../../../../arm-linux-gnueabi/lib/../lib/libsm2.so: undefined reference to `dlopen'
collect2: error: ld returned 1 exit status

But if test.c is changed to:

#include <stdio.h>
#include <dlfcn.h>
extern void foo(void);
int main(void)
{
    /* The handle is never used; the call only makes test.c reference libdl. */
    void *hd = dlopen("any.so", RTLD_LAZY);
    foo();
    return 0;
}

the command line above builds successfully.

Conclusion

Besides the approach above, you can also choose not to link sm2 when building sm.so:

$ arm-linux-gnueabi-gcc -shared -o sm.so sm.c 
$ arm-linux-gnueabi-readelf -d sm.so 
Tag        Type                         Name/Value
0x00000001 (NEEDED)                     Shared library: [libc.so.6]

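With this, the command line above succeeds even with test.c unmodified. Note that in this case the order -lsm -lsm2 -ldl must not be changed; no other ordering links. So the likely story is that developers' habits differ: some record the third-party libraries they depend on when building a .so and some do not, and the inconsistency ends up being patched over by having the executable reference the missing functions itself.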
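
One possible explanation for why -ldl on the command line seemed to be ignored in the failing case: if the toolchain passes --as-needed to ld by default (Ubuntu's gcc packages are known to do this, though that is an assumption about the setup above), ld drops a library that the objects being linked do not use directly. That would also account for the order sensitivity. Under that assumption, switching the behaviour off just for libdl may be a less intrusive fix than editing test.c. The sketch below has not been tried on the setup above. link_test is a hypothetical wrapper name, and CC is only a convenience variable.

```shell
# link_test (hypothetical wrapper): the link step from the reproduction,
# with libdl forced in. CC defaults to the cross compiler used in this post.
link_test() {
    # --no-as-needed asks ld to record libdl as a dependency of the
    # executable even though test.c itself calls nothing from it.
    "${CC:-arm-linux-gnueabi-gcc}" -o test test.c -lsm -lsm2 \
        -Wl,--no-as-needed -ldl
}
```

Setting CC=echo before calling link_test prints the arguments instead of running the compiler, which is handy for checking the command line first.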