KN Codesize Optimization
Compiler option tuning
Built-in KN options
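- KN added a smallBinary option for Apple Watch targets. It only applies to release builds; it skips some inlining and lowers the LLVM IR optimization level from O3 to Oz. To enable it:
  binaryOption("smallBinary", "true")
- Inlining at the LLVM IR level copies small functions into their call sites, trading a larger codesize for better runtime performance. Disabling inlining, or lowering the threshold on how large a function may be and still get inlined, reduces codesize. To enable it:
  binaryOption("inlineThreshold", "0")
- Kotlin 2.2 adds 8-bit strings as an experimental feature. When it is on, a string whose characters all fall in the 0~255 range is stored with Latin-1 encoding, which can shrink the string constants in the so. To enable it:
  binaryOption("latin1Strings", "true")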
(default) ➜ ovCompose-sample git:(main) ✗ ls -l baseline.so sb.so sb-noinline.so sb-noinline-latin1.so
-rwxr-xr-x@ 1 ohoskt staff 13333168 Feb 26 00:29 baseline.so
-rwxr-xr-x@ 1 ohoskt staff 12606928 Feb 26 01:16 sb-noinline-latin1.so
-rwxr-xr-x@ 1 ohoskt staff 11919840 Feb 26 00:49 sb-noinline.so
-rwxr-xr-x@ 1 ohoskt staff 12606928 Feb 26 00:32 sb.so
(default) ➜ ovCompose-sample git:(main) ✗ python
Python 3.12.12 | packaged by conda-forge | (main, Jan 27 2026, 00:01:15) [Clang 19.1.7 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> (13333168 - 12606928) / 13333168
0.0544686754115751
>>> (13333168 - 11919840) / 13333168
0.1060009144113387
>>>
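On ovcomposeSample, the first two options (no inlining plus the Oz optimization level) trade runtime performance for roughly a 10% codesize reduction: smallBinary alone saves about 5.4%, and adding inlineThreshold=0 brings the saving to about 10.6%. The latin1 build came out the same size as sb.so (12606928 bytes), so that option still needs investigation.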
--pack-dyn-relocs=relr
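At compile time there is no way to know the address at which an so will be loaded at run time, so PIE executables and shared objects contain relative relocations. Roughly speaking, the toolchain is telling the dynamic linker: "the address I wrote at position x in the so is relative to the start of the so; when you load this so, please replace it with the load base of the so plus the offset I wrote." Relocations cover other cases as well, but this kind is by far the most common.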
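// REL
typedef struct {
    Elf64_Addr  r_offset;  // where to apply relocation
    Elf64_Xword r_info;    // type + symbol index
} Elf64_Rel;

// RELA
typedef struct {
    Elf64_Addr   r_offset;  // where to apply relocation
    Elf64_Xword  r_info;    // type + symbol index; for a relative relocation the type is a fixed enum value and the symbol index is always 0
    Elf64_Sxword r_addend;  // addend; like r_offset, an offset relative to the start of the so (the loader writes load base + r_addend at r_offset)
} Elf64_Rela;

// RELR: just a single word, no struct
typedef Elf64_Xword Elf64_Relr;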
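To make the "load base + offset" rule concrete, here is a minimal Python sketch of what a loader does with RELA-style relative entries. The function name apply_relative_relocs, the (r_offset, r_addend) tuple layout, and the bytearray standing in for the mapped so are all illustrative assumptions, not a real loader API; a 64-bit little-endian target is assumed.

```python
import struct

WORD = 8  # bytes per word; a 64-bit target is assumed


def apply_relative_relocs(image: bytearray, load_base: int,
                          relocs: list[tuple[int, int]]) -> None:
    """Patch `image` in place, as a loader would for relative relocations.

    Each entry is (r_offset, r_addend), both relative to the start of the so.
    The word at r_offset is overwritten with load_base + r_addend.
    """
    for r_offset, r_addend in relocs:
        value = (load_base + r_addend) & 0xFFFFFFFFFFFFFFFF
        struct.pack_into("<Q", image, r_offset, value)


if __name__ == "__main__":
    image = bytearray(4 * WORD)  # a tiny 4-word stand-in for the so
    apply_relative_relocs(image, 0x70000000, [(8, 0x10), (16, 0x18)])
    print([hex(v) for v in struct.unpack("<4Q", image)])
```

Only the words named by r_offset change; everything else in the image is left untouched, which is why the relocation table is pure size overhead that is worth compressing.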
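In the RELR format, an entry whose lowest bit is 0 is an address relative to the start of the so, and that address is itself one relocation. An entry whose lowest bit is 1 is a bitmap: each set bit says that the word a given number of words past the preceding address entry is also a relocation. One address can be followed by several bitmaps, for example:
[address A] [bitmap 1] [bitmap 2] [address B] [bitmap 3]
For very sparse relocations, consecutive address entries are also allowed:
[address A] [address B] [bitmap 3]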
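The layout above can be sketched as a small encoder. This is a hedged Python illustration of the encoding rule only, not the linker's actual implementation; the name encode_relr and the choice of a 64-bit word are assumptions made for the example.

```python
WORD = 8             # bytes per word; a 64-bit target is assumed
BITS = WORD * 8 - 1  # 63 relocation bits per bitmap word (bit 0 is the tag)


def encode_relr(offsets: list[int]) -> list[int]:
    """Encode relocation offsets (relative to the so start) into RELR words."""
    offsets = sorted(set(offsets))
    words: list[int] = []
    i, n = 0, len(offsets)
    while i < n:
        # Address entry: an even value that is itself one relocation.
        addr = offsets[i]
        if addr % 2 != 0:
            raise ValueError("RELR cannot represent odd addresses")
        words.append(addr)
        i += 1
        base = addr + WORD
        # Emit bitmaps while the next offsets fall inside the next 63-word window.
        while True:
            bitmap = 0
            while i < n:
                delta = offsets[i] - base
                if delta >= BITS * WORD or delta % WORD != 0:
                    break
                bitmap |= 1 << (delta // WORD + 1)
                i += 1
            if bitmap == 0:
                break
            words.append(bitmap | 1)  # lowest bit 1 marks a bitmap entry
            base += BITS * WORD
    return words


if __name__ == "__main__":
    print([hex(w) for w in encode_relr([0x1000, 0x1008, 0x1010, 0x2000])])
```

Three adjacent relocations collapse into one address plus one bitmap, while the distant one at 0x2000 gets its own address entry, matching the two layouts shown above.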
llvm/lib/Object/ELF.cpp
template <class ELFT>
std::vector<typename ELFT::Rel>
ELFFile<ELFT>::decode_relrs(Elf_Relr_Range relrs) const {
  // This function decodes the contents of an SHT_RELR packed relocation
  // section.
  //
  // Proposal for adding SHT_RELR sections to generic-abi is here:
  //   https://groups.google.com/forum/#!topic/generic-abi/bX460iggiKg
  //
  // The encoded sequence of Elf64_Relr entries in a SHT_RELR section looks
  // like [ AAAAAAAA BBBBBBB1 BBBBBBB1 ... AAAAAAAA BBBBBB1 ... ]
  //
  // i.e. start with an address, followed by any number of bitmaps. The address
  // entry encodes 1 relocation. The subsequent bitmap entries encode up to 63
  // relocations each, at subsequent offsets following the last address entry.
  //
  // The bitmap entries must have 1 in the least significant bit. The assumption
  // here is that an address cannot have 1 in lsb. Odd addresses are not
  // supported.
  //
  // Excluding the least significant bit in the bitmap, each non-zero bit in
  // the bitmap represents a relocation to be applied to a corresponding machine
  // word that follows the base address word. The second least significant bit
  // represents the machine word immediately following the initial address, and
  // each bit that follows represents the next word, in linear order. As such,
  // a single bitmap can encode up to 31 relocations in a 32-bit object, and
  // 63 relocations in a 64-bit object.
  //
  // This encoding has a couple of interesting properties:
  // 1. Looking at any entry, it is clear whether it's an address or a bitmap:
  //    even means address, odd means bitmap.
  // 2. Just a simple list of addresses is a valid encoding.

  Elf_Rel Rel;
  Rel.r_info = 0;
  Rel.setType(getRelativeRelocationType(), false);
  std::vector<Elf_Rel> Relocs;

  // Word type: uint32_t for Elf32, and uint64_t for Elf64.
  using Addr = typename ELFT::uint;

  Addr Base = 0;
  for (Elf_Relr R : relrs) {
    typename ELFT::uint Entry = R;
    if ((Entry & 1) == 0) {
      // Even entry: encodes the offset for next relocation.
      Rel.r_offset = Entry;
      Relocs.push_back(Rel);
      // Set base offset for subsequent bitmap entries.
      Base = Entry + sizeof(Addr);
    } else {
      // Odd entry: encodes bitmap for relocations starting at base.
      for (Addr Offset = Base; (Entry >>= 1) != 0; Offset += sizeof(Addr))
        if ((Entry & 1) != 0) {
          Rel.r_offset = Offset;
          Relocs.push_back(Rel);
        }
      Base += (CHAR_BIT * sizeof(Entry) - 1) * sizeof(Addr);
    }
  }

  return Relocs;
}
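For experimenting with the format outside of LLVM, the decoding loop above can be ported to a few lines of Python. This is an illustrative port under the assumption of a 64-bit target; the name decode_relr is made up for the example, and it returns plain offsets rather than Elf_Rel records.

```python
WORD = 8  # sizeof(Addr) on a 64-bit target (assumption)


def decode_relr(words: list[int]) -> list[int]:
    """Expand RELR words into the list of relocation offsets they encode."""
    offsets: list[int] = []
    base = 0
    for entry in words:
        if entry & 1 == 0:
            # Even entry: an address, which is itself one relocation.
            offsets.append(entry)
            base = entry + WORD
        else:
            # Odd entry: a bitmap covering the 63 words starting at `base`.
            offset = base
            entry >>= 1  # drop the tag bit
            while entry != 0:
                if entry & 1:
                    offsets.append(offset)
                entry >>= 1
                offset += WORD
            # The next bitmap, if any, continues 63 words further on.
            base += (8 * WORD - 1) * WORD
    return offsets


if __name__ == "__main__":
    print([hex(o) for o in decode_relr([0x1000, 0x7, 0x2000])])
```

As in the C++ version, a bitmap always advances the base by a full 63 words regardless of how many bits are set, so several bitmaps in a row cover one long contiguous range.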
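- Before: RELA needs 3 words to describe one relocation.
- After: RELR needs at most 1 word per relocation, so even in the worst case the table shrinks to 1 / 3 ≈ 33% of its original size. When relocations are contiguous, a single bitmap word covers up to 63 of them, so the table shrinks toward 1 / 189 ≈ 0.5% of the RELA size, i.e. a reduction approaching 100%.
- Before: .rela.dyn holds both relative and non-relative relocations.
- After: .rela.dyn holds only non-relative relocations, and .relr.dyn holds all relative relocations.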
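The size comparison above can be checked with a little arithmetic. The sketch below assumes a 64-bit target (8-byte words, 63 relocations per bitmap) and only models the two extremes: fully sparse relocations, and one fully contiguous run. The helper names are made up for this example.

```python
import math

WORD = 8              # bytes per word on a 64-bit target (assumption)
RELA_WORDS = 3        # r_offset, r_info, r_addend
BITS_PER_BITMAP = 63  # relocations one bitmap word can describe


def rela_bytes(n: int) -> int:
    """Size of n relocations stored as RELA records."""
    return n * RELA_WORDS * WORD


def relr_bytes_sparse(n: int) -> int:
    """Worst case for RELR: every relocation needs its own address word."""
    return n * WORD


def relr_bytes_contiguous(n: int) -> int:
    """Best case for RELR: one address word, then bitmaps for the rest."""
    if n == 0:
        return 0
    return (1 + math.ceil((n - 1) / BITS_PER_BITMAP)) * WORD


if __name__ == "__main__":
    for n in (64, 6301):
        rela = rela_bytes(n)
        print(n,
              f"sparse: {relr_bytes_sparse(n) / rela:.1%}",
              f"contiguous: {relr_bytes_contiguous(n) / rela:.2%}")
```

Real binaries land somewhere between the two cases, depending on how densely the relocated pointers are packed (vtables and constant pointer arrays tend to be dense).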
KN export optimization
TODO
Best practices for cinterop static libraries
If static libraries are linked into the KN so, we recommend referring to this article