website/slides/2025-10-10-how-rust-compiles/index.html
Noratrieb 9dc632b657
2025-10-03 23:31:53 +02:00

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta
name="viewport"
content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no"
/>
<title>how rust compiles</title>
<link rel="stylesheet" href="../dist/reset.css" />
<link rel="stylesheet" href="../dist/reveal.css" />
<link rel="stylesheet" href="../dist/theme/black.css" />
<link rel="stylesheet" type="text/css" href="asciinema-player.css" />
<script src="asciinema-player.min.js"></script>
<!-- Theme used for syntax highlighted code -->
<link rel="stylesheet" href="../plugin/highlight/monokai.css" />
<style>
* {
--r-heading-text-transform: initial;
}
</style>
</head>
<body>
<div class="reveal">
<div class="slides">
<section>
<h2>how Rust compiles</h2>
<h4>i promise it's actually doing something useful while you wait</h4>
</section>
<section style="height: 100%">
<div style="display: flex; align-items: flex-start; height: 100%">
<details>
<summary>the rust compilation model has surprising effects</summary>
<iframe
height="600"
width="800"
src="https://play.integer32.com/?version=stable&mode=debug&edition=2024&gist=7ac62650fd0b942ae5952b0027e0c1ce"
referrerpolicy="no-referrer"
></iframe>
</details>
</div>
</section>
<section>
<h2>introduction to myself</h2>
<div style="display: flex">
<div>
<div>Noratrieb (she/her)</div>
contributing to the compiler since 2021
<br />
</div>
<div>
<img src="me.png" height="300" />
</div>
</div>
</section>
<section data-markdown>
<textarea data-template>
# speed 🚀
</textarea>
</section>
<section>
<!-- cargo build -v -j1 -->
<div id="cargo-build-v-asciinema-player"></div>
<script>
AsciinemaPlayer.create(
"cargo-build-v.cast",
document.getElementById("cargo-build-v-asciinema-player"),
{
cols: 134,
rows: 36,
}
);
</script>
</section>
<section>
<h2>what does rustc like, do?</h2>
<h4>a quick overview of the compilation phases</h4>
</section>
<section>
<h2>the frontend and the backend</h2>
<div class="mermaid">
<pre>
%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true, 'fontSize': '25px' }}}%%
flowchart LR
subgraph compiler
frontend --> backend
end
source --> frontend
backend --> binary
</pre>
</div>
</section>
<section>
<h2>it all starts at the source</h2>
<pre><code data-trim>
pub fn add(a: u8, b: u8) -> u8 {
    a.wrapping_add(b)
}
</code></pre>
</section>
<section>
<h2>it gets processed</h2>
<pre><code data-trim>
#[attr = MacroUse {arguments: UseAll}]
extern crate std;
#[prelude_import]
use std::prelude::rust_2024::*;
fn add(a: u8, b: u8) -> u8 { a.wrapping_add(b) }
</code></pre>
</section>
<section>
<h2>until it doesn't even look like Rust anymore</h2>
<p>MIR</p>
<img src="add-runtime-mir.svg" />
</section>
<section data-markdown>
<textarea data-template>
## further going to LLVM IR
```
; meow::add
define noundef i8 @add(i8 noundef %a, i8 noundef %b) #0 {
start:
  %_0 = add i8 %b, %a
  ret i8 %_0
}
```
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## and then you're done
```
&lt;add&gt;:
  lea (%rsi,%rdi,1),%eax
  ret
```
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## ok but why does my program compile so slowly now?
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## it's often LLVM's fault
- but like not really
</textarea>
</section>
<section>
<h2>a crate - the compilation unit</h2>
<p>quite big</p>
<p>in C it's just a single file</p>
</section>
<section>
<h2>a codegen unit</h2>
<p>LLVM is single-threaded</p>
<p>rustc: hi LLVM, look we are like a C file, now be fast</p>
<p>~1-256 depending on size and configuration (⚙️)</p>
<div class="mermaid">
<pre>
%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true, 'fontSize': '25px' }}}%%
flowchart LR
crate
crate --> cgu1["Codegen-Unit 1"]
crate --> cgu2["Codegen-Unit 2"]
crate --> cgu3["Codegen-Unit 3"]
crate --> cgu4["Codegen-Unit 4"]
</pre>
</div>
</section>
<section>
<h2>codegen units</h2>
<pre><code>
fn main() {}
</code></pre>
<div class="mermaid">
<pre>
%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true, 'fontSize': '25px' }}}%%
flowchart LR
mainmir["main (MIR)"]
subgraph mycgu1[my CGU 1]
mainll["main (LLVM IR)"]
end
mycgu1 --> |LLVM| mycgu1.rcgu.o
mainmir --> mainll
mycgu1.rcgu.o --> my_binary
std["std (and others)"] --> my_binary
</pre>
</div>
</section>
<section>
<h2>codegen units (but more)</h2>
<pre><code>
fn main() {}
fn foo1() {}
fn foo2() {}
</code></pre>
<div class="mermaid">
<pre>
%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true, 'fontSize': '25px' }}}%%
flowchart LR
mainmir["main (MIR)"]
foo1mir["foo1 (MIR)"]
foo2mir["foo2 (MIR)"]
subgraph mycgu1[my CGU 1]
mainll["main (LLVM IR)"]
end
subgraph mycgu2[my CGU 2]
foo1ll["foo1 (LLVM IR)"]
foo2ll["foo2 (LLVM IR)"]
end
mycgu1 --> mycgu1.rcgu.o
mycgu2 --> mycgu2.rcgu.o
mainmir --> mainll
foo1mir --> foo1ll
foo2mir --> foo2ll
mycgu1.rcgu.o --> my_binary
mycgu2.rcgu.o --> my_binary
std["std (and others)"] --> my_binary
</pre>
</div>
</section>
<section>
<h2>codegen units (cross-crate)</h2>
<div style="display: flex; flex-direction: row; gap: 16px">
<pre><code>
fn add() {}
</code></pre>
<pre><code>
fn main() { math::add() }
</code></pre>
</div>
<div class="mermaid">
<pre>
%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true, 'fontSize': '25px' }}}%%
flowchart LR
subgraph crate math
addmir["add (MIR)"]
subgraph mathcgu1[math CGU 1]
addll["add (LLVM IR)"]
end
addmir --> addll
mathcgu1 --> mathcgu1.rcgu.o
mathcgu1.rcgu.o --> libmath.rlib
end
subgraph my crate
mainmir["main (MIR)"]
subgraph mycgu1[my CGU 1]
mainll["main (LLVM IR)"]
end
mycgu1 --> mycgu1.rcgu.o
mainmir --> mainll
mycgu1.rcgu.o --> my_binary
libmath.rlib --> my_binary
std["std (and others)"] --> my_binary
end
</pre>
</div>
</section>
<section data-markdown>
<textarea data-template>
# so compile times just depend on the number of functions?
- yes...
- but not source functions!
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## generics
```rust
use std::ops::Add;

fn add<T: Add>(a: T, b: T) -> T::Output {
    a + b
}

add(0_u16, 0_u16); // creates add<u16> function
add(0_u32, 0_u32); // creates add<u32> function
```
</textarea>
</section>
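<section data-markdown>
<textarea data-template>
## generics: one copy per type
a minimal sketch, not part of the original demo (the `width_of` helper is hypothetical): every instantiation is compiled on its own, so each copy can bake in facts about its own `T`.

```rust
use std::mem::size_of;

// generic over T: the compiler emits one copy per concrete T that is used
fn width_of<T>(_value: T) -> usize {
    // a different compile-time constant in every instantiation
    size_of::<T>()
}

fn main() {
    println!("{}", width_of(0_u16)); // uses width_of::<u16>
    println!("{}", width_of(0_u32)); // uses width_of::<u32>
}
```
</textarea>
</section>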
<section>
<h2>instantiating generics</h2>
<pre><code data-trim class="language-rust">
fn add&lt;T: Add&gt;(a: T, b: T) -> T::Output { a + b }
fn main() {
    add(0_u16, 0_u16);
    add(0_u32, 0_u32);
}
</code></pre>
<div class="mermaid">
<pre>
%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true, 'fontSize': '25px' }}}%%
flowchart LR
addmir["add (MIR)"]
useitmir["main (MIR)"]
subgraph mycgu1[my CGU 1]
addu16ll["add_u16 (LLVM IR)"]
addu32ll["add_u32 (LLVM IR)"]
useitll["main (LLVM IR)"]
end
mycgu1 --> mycgu1.rcgu.o
addmir -->|instantiate with T=u16| addu16ll
addmir -->|instantiate with T=u32| addu32ll
useitmir --> useitll
mycgu1.rcgu.o --> my_binary
std["std (and others)"] --> my_binary
</pre>
</div>
</section>
<section>
<h2>generics (cross-crate)</h2>
<div style="display: flex; flex-direction: row; gap: 16px">
<pre><code data-trim class="language-rust">
pub fn add&lt;T: Add&gt;(a: T, b: T) -> T::Output {
    a + b
}
</code></pre>
<pre><code data-trim class="language-rust">
fn main() {
    math::add(0_u16, 0_u16);
    math::add(0_u32, 0_u32);
}
</code></pre>
</div>
<div class="mermaid">
<pre>
%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true, 'fontSize': '25px' }}}%%
flowchart LR
subgraph crate math
addmir["add (MIR)"]
end
subgraph my crate
mainmir["main (MIR)"]
end
subgraph my crate
subgraph mycgu1[my CGU 1]
addu16ll["add_u16 (LLVM IR)"]
addu32ll["add_u32 (LLVM IR)"]
mainll["main (LLVM IR)"]
end
mainmir --> mainll
addmir --> addu16ll
addmir --> addu32ll
mycgu1 --> mycgu1.rcgu.o
mycgu1.rcgu.o --> my_binary
std["std (and others)"] --> my_binary
end
</pre>
</div>
</section>
<section data-markdown>
<textarea data-template>
# generics are slow to compile
- the function gets optimized N times, once per instantiation
- and there are duplicate instances!
- `cargo-llvm-lines`
</textarea>
</section>
<section data-markdown>
<textarea data-template>
# and the duplicates get worse
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## inlining
```rust
fn add(a: u8, b: u8) -> u8 {
    a + b
}

fn main() {
    let x = add(1, 4);
    println!("{x}");
}
```
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## let's inline it
```rust
fn add(a: u8, b: u8) -> u8 {
    a + b
}

fn main() {
    let x = {
        let a = 1;
        let b = 4;
        a + b
    };
    println!("{x}");
}
```
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## cross-crate inlining
```rust
pub fn add(a: u8, b: u8) -> u8 {
    a + b
}
```
```rust
fn main() {
    let x = math::add(1, 4); // what is the body?...
    println!("{x}");
}
```
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## #[inline] to the rescue
```rust
#[inline]
pub fn add(a: u8, b: u8) -> u8 {
    a + b
}
```
```rust
fn main() {
    let x = math::add(1, 4); // 💡 it's a + b
    println!("{x}");
}
```
</textarea>
</section>
<section>
<div style="display: flex; flex-direction: row; gap: 16px">
<pre><code data-trim class="language-rust">
#[inline]
pub fn add(a: u8, b: u8) -> u8 {
    a + b
}
</code></pre>
<pre><code data-trim class="language-rust">
fn main() {
    let x = math::add(1, 4);
    println!("{x}");
}
</code></pre>
</div>
<div class="mermaid">
<pre>
%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true, 'fontSize': '25px' }}}%%
flowchart LR
subgraph crate math
addmir["add (MIR)"]
end
subgraph my crate
mainmir["main (MIR)"]
end
subgraph my crate
subgraph mycgu1[my CGU 1]
addll["add (LLVM IR)"]
mainll["main (LLVM IR)"]
end
mainmir --> mainll
addmir --> addll
mycgu1 --> mycgu1.rcgu.o
mycgu1.rcgu.o --> my_binary
std["std (and others)"] --> my_binary
end
</pre>
</div>
</section>
<section data-markdown>
<textarea data-template>
## `#[inline]` (⚙️)
- only needed for non-generic functions (generic ones are instantiated in the calling crate anyway)
- for very small functions, this happens automatically
- for other functions, it doesn't, because it would be slow
- don't over-apply it in a library, but also don't forget about it
- benchmark!
</textarea>
</section>
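<section data-markdown>
<textarea data-template>
## `#[inline]` in a library, sketched
a rough sketch under assumptions: `mod math` stands in for a separate `math` crate, and `add_generic` is a hypothetical name. the non-generic function carries the attribute, the generic one does not need it.

```rust
mod math {
    use std::ops::Add;

    // non-generic and tiny: `#[inline]` lets other crates see the body
    #[inline]
    pub fn add(a: u8, b: u8) -> u8 {
        a.wrapping_add(b)
    }

    // generic: instantiated in the calling crate, so no attribute here
    pub fn add_generic<T: Add>(a: T, b: T) -> T::Output {
        a + b
    }
}

fn main() {
    println!("{}", math::add(1, 4));
    println!("{}", math::add_generic(1_u32, 4_u32));
}
```
</textarea>
</section>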
<section data-markdown>
<textarea data-template>
## being lazy has advantages
- that's why i wrote most of this talk last week
- `#[inline]` means that the function is *never* instantiated if it's never used!
- see also: https://blog.rust-lang.org/inside-rust/2025/07/15/call-for-testing-hint-mostly-unused/
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## but performance is great, i love performance
- it's ok i can wait forever
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## link-time optimization (LTO)
- optimizes everything in your program together at the end
- breaks crate boundaries
- is awesome
- is slow
- comes in many forms
</textarea>
</section>
<!--
# r-a
base:
Benchmark 1: cargo build --release
Time (mean ± σ): 58.150 s ± 0.163 s [User: 758.211 s, System: 37.637 s]
Range (min … max): 57.936 s … 58.321 s 5 runs
thin:
Benchmark 1: cargo build --release
Time (mean ± σ): 63.999 s ± 0.105 s [User: 879.703 s, System: 40.045 s]
Range (min … max): 63.921 s … 64.182 s 5 runs
fat:
Time (mean ± σ): 264.606 s ± 2.238 s [User: 570.800 s, System: 31.826 s]
Range (min … max): 261.573 s … 267.297 s 5 runs
# cargo
base:
Benchmark 1: cargo build --release
Time (mean ± σ): 89.381 s ± 0.460 s [User: 689.874 s, System: 55.347 s]
Range (min … max): 88.605 s … 89.696 s 5 runs
thin:
Benchmark 1: cargo build --release
Time (mean ± σ): 91.208 s ± 0.610 s [User: 757.353 s, System: 58.558 s]
Range (min … max): 90.415 s … 92.112 s 5 runs
fat:
Time (mean ± σ): 212.215 s ± 2.062 s [User: 576.259 s, System: 50.961 s]
Range (min … max): 208.662 s … 213.818 s 5 runs
# ripgrep
base:
Time (mean ± σ): 7.507 s ± 0.223 s [User: 64.115 s, System: 4.514 s]
Range (min … max): 7.357 s … 7.882 s 5 runs
thin:
Time (mean ± σ): 9.285 s ± 0.019 s [User: 81.101 s, System: 5.241 s]
Range (min … max): 9.262 s … 9.308 s 5 runs
fat:
Time (mean ± σ): 29.202 s ± 0.279 s [User: 51.015 s, System: 3.652 s]
Range (min … max): 28.860 s … 29.574 s 5 runs
# triagebot
base:
Time (mean ± σ): 74.532 s ± 0.378 s [User: 766.778 s, System: 58.719 s]
Range (min … max): 74.105 s … 75.109 s 5 runs
thin:
Time (mean ± σ): 89.505 s ± 0.299 s [User: 1523.951 s, System: 102.429 s]
Range (min … max): 89.024 s … 89.796 s 5 runs
fat:
Time (mean ± σ): 273.275 s ± 1.694 s [User: 929.604 s, System: 65.856 s]
Range (min … max): 271.007 s … 275.619 s 5 runs
-->
<section>
<h2>lto = "fat" (monolithic)</h2>
<div class="mermaid">
<pre>
%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true, 'fontSize': '25px' }}}%%
flowchart LR
subgraph crate math
addmir["add (LLVM IR)"]
end
subgraph crate math2
submir["sub (LLVM IR)"]
end
subgraph my crate
mainmir["main (LLVM IR)"]
end
subgraph my crate
subgraph fatlto[fat LTO]
addll["add (LLVM IR)"]
subll["sub (LLVM IR)"]
mainll["main (LLVM IR)"]
end
mainmir --> mainll
addmir --> addll
submir --> subll
fatlto --> my_binary
end
</pre>
</div>
<p>takes about 4.5x as long to compile r-a (58 s → 265 s)</p>
</section>
<section>
<h2>lto = "thin" (sharded)</h2>
<div class="mermaid">
<pre>
%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true, 'fontSize': '25px' }}}%%
flowchart LR
subgraph crate math
addmir["add (LLVM IR)"]
end
subgraph crate math2
submir["sub (LLVM IR)"]
end
subgraph my crate
mainmir["main (LLVM IR)"]
end
subgraph my crate
subgraph thinltosummary[ThinLTO Summary]
end
subgraph thinlto1[ThinLTO 1]
addll["add (LLVM IR)"]
subll["sub (LLVM IR)"]
end
subgraph thinlto2[ThinLTO 2]
mainll["main (LLVM IR)"]
end
mainmir --> thinltosummary
addmir --> thinltosummary
submir --> thinltosummary
thinltosummary --> mainll
thinltosummary --> addll
thinltosummary --> subll
thinlto1 --> my_binary
thinlto2 --> my_binary
end
</pre>
</div>
<p>takes about 10% longer to compile r-a (58 s → 64 s)</p>
</section>
<section data-markdown>
<textarea data-template>
## linker-plugin-lto
- fat LTO style and thin LTO style
- the merging is not done by rustc but by the linker
- works across languages (Rust (rustc) + C (clang))
- great for FFI
- annoying to set up
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## there was some LTO all along
- in release mode, automatic ThinLTO across codegen units in the same crate
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## how do i make my program run quickly?
- let the compiler inline functions
- libraries: remember `#[inline]`
- binaries: you want LTO
- really, it really needs to inline your functions
- without it, it's so over
- and read this: https://nnethercote.github.io/perf-book
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## how do i make my program compile quickly?
- reduce the number and size of functions
- importantly: in LLVM IR, not necessarily source!
- duplicate and frequent instantiations are bad
- and read this:
- https://corrode.dev/blog/tips-for-faster-rust-compile-times
- https://doc.rust-lang.org/nightly/cargo/guide/build-performance.html
</textarea>
</section>
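<section data-markdown>
<textarea data-template>
## fewer instantiations, sketched
one common way to shrink the generated LLVM IR, shown as a sketch (the `normalize` function is a made-up example): keep the generic part tiny and move the real work into a non-generic inner function.

```rust
use std::path::{Path, PathBuf};

// generic shell: instantiated once per caller type, but it is tiny
pub fn normalize<P: AsRef<Path>>(path: P) -> PathBuf {
    // non-generic body: turned into LLVM IR only once
    fn inner(path: &Path) -> PathBuf {
        path.components().collect()
    }
    inner(path.as_ref())
}

fn main() {
    println!("{}", normalize("a/./b").display());
    println!("{}", normalize(String::from("a/./b")).display());
}
```

only the thin wrapper gets duplicated for `&str` and `String`.
</textarea>
</section>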
<section data-markdown>
<textarea data-template>
## and both? 🥺👉👈
- no
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## and why did `#[inline]` make the error go away?
```rust
pub fn error() {
    //      vvvvvvvvvvv post-mono error!
    let _x: [u8; usize::MAX] = [0; usize::MAX];
}
```
</textarea>
</section>
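<section data-markdown>
<textarea data-template>
## post-mono errors, sketched
a hypothetical illustration with a different post-mono error than the one above (`AssertNonEmpty` and `first_of` are made-up names): the check only runs for instantiations that actually get generated, and an unused `#[inline]` function is never instantiated at all.

```rust
struct AssertNonEmpty<const N: usize>;

impl<const N: usize> AssertNonEmpty<N> {
    // evaluated only when some code uses it for a concrete N
    const CHECK: () = assert!(N > 0, "N must not be zero");
}

fn first_of<const N: usize>(values: [u8; N]) -> u8 {
    let () = AssertNonEmpty::<N>::CHECK; // forces the check for this N
    values[0]
}

fn main() {
    println!("{}", first_of([7, 8, 9]));
    // first_of::<0>([]) is rejected, but only once it is instantiated
}
```
</textarea>
</section>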
<section data-markdown>
<textarea data-template>
## happy compiling
slides at <a href="https://noratrieb.dev/slides/">https://noratrieb.dev/slides/</a>
<img src="seaslug.png" alt="sea slug with a tada emoji">
</textarea>
</section>
</div>
</div>
<script src="../dist/reveal.js"></script>
<script src="../plugin/notes/notes.js"></script>
<script src="../plugin/markdown/markdown.js"></script>
<script src="../plugin/highlight/highlight.js"></script>
<script src="./reveal.js-mermaid-plugin_11-6-0.js"></script>
<script>
// More info about initialization & config:
// - https://revealjs.com/initialization/
// - https://revealjs.com/config/
Reveal.initialize({
hash: true,
// Learn about plugins: https://revealjs.com/plugins/
plugins: [RevealMarkdown, RevealHighlight, RevealNotes, RevealMermaid],
});
</script>
</body>
</html>