KN Code Size Optimization

Tuning compiler options

Options built into KN

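  • KN ships a smallBinary option that was introduced for Apple Watch targets. It only takes effect in release builds: it skips some inlining and lowers the optimization level on LLVM IR from O3 to Oz. Enable it with binaryOption("smallBinary", "true")
  • Inlining at the LLVM IR level copies small functions into their call sites, trading larger code size for better runtime performance. Disabling inlining, or lowering the size threshold up to which a function may be inlined, reduces code size. Set it with binaryOption("inlineThreshold", "0")
  • 8-bit strings are an experimental feature new in Kotlin 2.2. When enabled, a string whose characters all fall in the range 0~255 is stored in Latin-1 encoding, which can shrink the string constants in the so. Enable it with binaryOption("latin1Strings", "true")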
(default) ➜  ovCompose-sample git:(main) ls -l baseline.so sb.so sb-noinline.so sb-noinline-latin1.so
-rwxr-xr-x@ 1 ohoskt  staff  13333168 Feb 26 00:29 baseline.so
-rwxr-xr-x@ 1 ohoskt  staff  12606928 Feb 26 01:16 sb-noinline-latin1.so
-rwxr-xr-x@ 1 ohoskt  staff  11919840 Feb 26 00:49 sb-noinline.so
-rwxr-xr-x@ 1 ohoskt  staff  12606928 Feb 26 00:32 sb.so
(default) ➜  ovCompose-sample git:(main) ✗ python 
Python 3.12.12 | packaged by conda-forge | (main, Jan 27 2026, 00:01:15) [Clang 19.1.7 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> (13333168 - 12606928) / 13333168
0.0544686754115751
>>> (13333168 - 11919840) / 13333168
0.1060009144113387
>>> 

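On ovcomposeSample, the first two options (reduced inlining plus the Oz optimization level) trade runtime performance for roughly a 10% code size reduction (10.6% in the listing above). The latin1 option still needs investigation: sb-noinline-latin1.so came out the same size as sb.so, larger than sb-noinline.so.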

--pack-dyn-relocs=relr

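The address at which an so will be loaded at runtime is unknown at compile time, so PIE executables and shared objects contain relative relocations. Roughly speaking, the compiler tells the linker: "the address I wrote at position x in the so is relative to the start of the so; when this so is loaded, please replace it with the load base address plus the offset I wrote." Relocations cover other cases too, but this kind is by far the most common.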
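The patching step described above can be sketched in a few lines of Python. This is only an illustration under assumptions: a 64-bit little-endian target, a toy 32-byte image, and a hypothetical helper name apply_relative_relocs. A real dynamic loader does considerably more.

```python
import struct


def apply_relative_relocs(image: bytearray, load_base: int,
                          relocs: list[tuple[int, int]]) -> None:
    """For each (r_offset, r_addend) pair, store load_base + r_addend
    into the 64-bit little-endian word at r_offset."""
    for r_offset, r_addend in relocs:
        struct.pack_into("<Q", image, r_offset, load_base + r_addend)


# Toy image: the word at offset 0x10 should end up pointing at offset 0x18.
image = bytearray(32)
apply_relative_relocs(image, load_base=0x7000_0000, relocs=[(0x10, 0x18)])
patched = struct.unpack_from("<Q", image, 0x10)[0]
print(hex(patched))  # 0x70000018
```

Every such pointer needs one record telling the loader where to patch, which is why the encoding of these records matters for the size of the so.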

// REL
typedef struct {
  Elf64_Addr  r_offset; // where to apply relocation
  Elf64_Xword r_info;   // type + symbol index
} Elf64_Rel;

// RELA
typedef struct {
  Elf64_Addr  r_offset; // where to apply relocation
  Elf64_Xword r_info;   // type + symbol index; for a relative relocation the type is a fixed enum value and the index is always 0
  Elf64_Sxword r_addend;// addend; like r_offset, an offset from the start of the so
} Elf64_Rela;

// RELR: just a single word, no struct
typedef Elf64_Xword Elf64_Relr;

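In the RELR format, an entry whose lowest bit is 0 is an address relative to the start of the so, and that address is itself one relocation. An entry whose lowest bit is 1 is a bitmap: each set bit says that the word a given number of words past the preceding address entry is also a relocation. One address can be followed by several bitmaps, for example:

[address A] [bitmap 1] [bitmap 2] [address B] [bitmap 3]

For very sparse relocations, consecutive address entries are also allowed:

[address A] [address B] [bitmap 3]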
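To make the layout concrete, here is a minimal Python sketch of an encoder for this format, together with a decoder that follows the same logic as the LLVM function quoted in this post. It assumes a 64-bit ELF (8-byte words, 63 relocation bits per bitmap) and sorted, unique, word-aligned offsets. The names encode_relr and decode_relr are hypothetical, and this is not the linker's actual code.

```python
WORD = 8             # bytes per word, assuming a 64-bit ELF
BITS = WORD * 8 - 1  # 63 relocation bits per bitmap (bit 0 is the tag)


def encode_relr(offsets: list[int]) -> list[int]:
    """Pack sorted, unique, word-aligned offsets into RELR entries."""
    if any(o % WORD for o in offsets) or sorted(set(offsets)) != offsets:
        raise ValueError("offsets must be sorted, unique and word-aligned")
    entries, i = [], 0
    while i < len(offsets):
        entries.append(offsets[i])      # even entry: address, one relocation
        base = offsets[i] + WORD
        i += 1
        while True:
            bitmap = 0
            # Collect offsets that fall inside the next 63-word window.
            while i < len(offsets) and offsets[i] - base < BITS * WORD:
                bitmap |= 1 << ((offsets[i] - base) // WORD + 1)
                i += 1
            if bitmap == 0:
                break                   # empty window: emit a new address
            entries.append(bitmap | 1)  # odd entry: bitmap
            base += BITS * WORD
    return entries


def decode_relr(entries: list[int]) -> list[int]:
    """Expand RELR entries back into relocation offsets."""
    offsets, base = [], 0
    for entry in entries:
        if entry & 1 == 0:
            offsets.append(entry)
            base = entry + WORD
        else:
            bits, offset = entry >> 1, base
            while bits:
                if bits & 1:
                    offsets.append(offset)
                bits >>= 1
                offset += WORD
            base += BITS * WORD
    return offsets


sample = [0x1000, 0x1008, 0x1010, 0x1020, 0x2000]
packed = encode_relr(sample)
print([hex(e) for e in packed])  # ['0x1000', '0x17', '0x2000']
```

In this example five relocations fit into three words: 0x1000 is an address, 0x17 is a bitmap covering 0x1008, 0x1010 and 0x1020, and 0x2000 is too far away for the bitmap window and becomes a second address entry.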

llvm/lib/Object/ELF.cpp

template <class ELFT>
std::vector<typename ELFT::Rel>
ELFFile<ELFT>::decode_relrs(Elf_Relr_Range relrs) const {
  // This function decodes the contents of an SHT_RELR packed relocation
  // section.
  //
  // Proposal for adding SHT_RELR sections to generic-abi is here:
  //   https://groups.google.com/forum/#!topic/generic-abi/bX460iggiKg
  //
  // The encoded sequence of Elf64_Relr entries in a SHT_RELR section looks
  // like [ AAAAAAAA BBBBBBB1 BBBBBBB1 ... AAAAAAAA BBBBBB1 ... ]
  //
  // i.e. start with an address, followed by any number of bitmaps. The address
  // entry encodes 1 relocation. The subsequent bitmap entries encode up to 63
  // relocations each, at subsequent offsets following the last address entry.
  //
  // The bitmap entries must have 1 in the least significant bit. The assumption
  // here is that an address cannot have 1 in lsb. Odd addresses are not
  // supported.
  //
  // Excluding the least significant bit in the bitmap, each non-zero bit in
  // the bitmap represents a relocation to be applied to a corresponding machine
  // word that follows the base address word. The second least significant bit
  // represents the machine word immediately following the initial address, and
  // each bit that follows represents the next word, in linear order. As such,
  // a single bitmap can encode up to 31 relocations in a 32-bit object, and
  // 63 relocations in a 64-bit object.
  //
  // This encoding has a couple of interesting properties:
  // 1. Looking at any entry, it is clear whether it's an address or a bitmap:
  //    even means address, odd means bitmap.
  // 2. Just a simple list of addresses is a valid encoding.

  Elf_Rel Rel;
  Rel.r_info = 0;
  Rel.setType(getRelativeRelocationType(), false);
  std::vector<Elf_Rel> Relocs;

  // Word type: uint32_t for Elf32, and uint64_t for Elf64.
  using Addr = typename ELFT::uint;

  Addr Base = 0;
  for (Elf_Relr R : relrs) {
    typename ELFT::uint Entry = R;
    if ((Entry & 1) == 0) {
      // Even entry: encodes the offset for next relocation.
      Rel.r_offset = Entry;
      Relocs.push_back(Rel);
      // Set base offset for subsequent bitmap entries.
      Base = Entry + sizeof(Addr);
    } else {
      // Odd entry: encodes bitmap for relocations starting at base.
      for (Addr Offset = Base; (Entry >>= 1) != 0; Offset += sizeof(Addr))
        if ((Entry & 1) != 0) {
          Rel.r_offset = Offset;
          Relocs.push_back(Rel);
        }
      Base += (CHAR_BIT * sizeof(Entry) - 1) * sizeof(Addr);
    }
  }

  return Relocs;
}
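  • Before: RELA needs 3 words to express one relocation
  • After: RELR needs at most one word per relocation, which brings the size down to 1 / 3 = 33% of the original. When relocations are contiguous, one bitmap word covers up to 63 of them (1 word instead of 63 × 3 = 189), so in the extreme case the size reduction approaches 100%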
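As a rough back-of-the-envelope check, the section sizes can be estimated as follows. This is a sketch under assumptions: 8-byte words, a worst case where every relocation needs its own address entry, and a best case where all relocations sit in consecutive words. The function names are made up for illustration.

```python
WORD = 8  # bytes per word, assuming a 64-bit ELF


def rela_bytes(n: int) -> int:
    """RELA: three words (r_offset, r_info, r_addend) per relocation."""
    return n * 3 * WORD


def relr_bytes_worst(n: int) -> int:
    """RELR, fully sparse: one address entry per relocation."""
    return n * WORD


def relr_bytes_best(n: int) -> int:
    """RELR, fully contiguous: one address entry, then one bitmap
    for every 63 relocations that follow it."""
    if n == 0:
        return 0
    bitmaps = -(-(n - 1) // 63)  # ceiling division
    return (1 + bitmaps) * WORD


n = 64
print(rela_bytes(n), relr_bytes_worst(n), relr_bytes_best(n))  # 1536 512 16
```

For 64 contiguous relocations this gives 1536 bytes as RELA, 512 bytes as RELR in the worst case, and 16 bytes in the best case. Real binaries land somewhere between the two RELR figures.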

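  • Before: .rela.dyn contains both relative and non-relative relocations
  • After: .rela.dyn contains only the non-relative relocations, and .relr.dyn contains all the relative relocations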

KN export optimization

TODO

Best practices for cinterop static libraries

If static libraries are linked into the KN so, this post is recommended reading