<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta
name="viewport"
content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no"
/>
<title>how Rust compiles</title>
<link rel="stylesheet" href="../dist/reset.css" />
<link rel="stylesheet" href="../dist/reveal.css" />
<link rel="stylesheet" href="../dist/theme/black.css" />
<link rel="stylesheet" type="text/css" href="asciinema-player.css" />
<script src="asciinema-player.min.js"></script>
<!-- Theme used for syntax highlighted code -->
<link rel="stylesheet" href="../plugin/highlight/monokai.css" />
<style>
* {
--r-heading-text-transform: initial;
}
</style>
</head>
<body>
<div class="reveal">
<div class="slides">
<section>
<h2>how Rust compiles</h2>
<h4>i promise it's actually doing something useful while you wait</h4>
</section>
<section style="height: 100%">
<div style="display: flex; align-items: flex-start; height: 100%">
<details>
<summary>the Rust compilation model has surprising effects</summary>
<iframe
height="600"
width="800"
src="https://play.integer32.com/?version=stable&mode=debug&edition=2024&gist=7ac62650fd0b942ae5952b0027e0c1ce"
referrerpolicy="no-referrer"
></iframe>
</details>
</div>
</section>
<section>
<h2>introduction to myself</h2>
<div style="display: flex">
<div>
<div>Noratrieb (she/her)</div>
contributing to the compiler since 2021
<br />
</div>
<div>
<img src="me.png" height="300" />
</div>
</div>
</section>
<section data-markdown>
<textarea data-template>
# speed 🚀
- runtime performance
- compile times
</textarea>
</section>
<section>
<h2>what does rustc like, do?</h2>
<h4>a quick overview of the compilation phases</h4>
</section>
<section>
<h2>the frontend and the backend</h2>
<div class="mermaid">
<pre>
%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true, 'fontSize': '25px' }}}%%
flowchart LR
subgraph compiler
frontend --> backend
end
source --> frontend
backend --> binary
</pre>
</div>
</section>
<section>
<h2>it all starts at the source</h2>
<pre><code data-trim class="language-rust">
pub fn add(a: u8, b: u8) -> u8 {
a.wrapping_add(b)
}
</code></pre>
</section>
<section>
<h2>until it doesn't even look like Rust anymore</h2>
<p>MIR</p>
<img src="add-runtime-mir.svg" />
</section>
<section data-markdown>
<textarea data-template>
## further going to LLVM IR
```
; meow::add
define noundef i8 @add(i8 noundef %a, i8 noundef %b) #0 {
start:
%_0 = add i8 %b, %a
ret i8 %_0
}
```
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## and then you're done
```
&lt;add&gt;:
lea (%rsi,%rdi,1),%eax
ret
```
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## ok but why does my program compile so slowly now?
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## it's often LLVM's fault
- but like not really
<br>
<img alt="output of cargo build --timings, showing blocks of blue and purple bars of roughly equal size" src="cargo-timings-debug-ra.png">
`cargo build --timings`
</textarea>
</section>
<section>
<!-- cargo build -v -j1 -->
<div id="cargo-build-v-asciinema-player"></div>
<script>
AsciinemaPlayer.create(
"cargo-build-v.cast",
document.getElementById("cargo-build-v-asciinema-player"),
{
cols: 134,
rows: 36,
}
);
</script>
</section>
<section>
<h2>a crate - the compilation unit</h2>
<p>quite big</p>
<p>in C it's just a single file</p>
</section>
<section>
<h2>a codegen unit</h2>
<p>LLVM is single-threaded</p>
<p>rustc: hi LLVM, look we are like a C file, now be fast</p>
<p>~1-256 depending on size and configuration</p>
<div class="mermaid">
<pre>
%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true, 'fontSize': '25px' }}}%%
flowchart LR
crate
crate --> cgu1["Codegen-Unit 1"]
crate --> cgu2["Codegen-Unit 2"]
crate --> cgu3["Codegen-Unit 3"]
</pre>
</div>
</section>
<section>
<h2>codegen units</h2>
<pre><code data-trim class="language-rust">
fn main() {}
</code></pre>
<div class="mermaid">
<pre>
%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true, 'fontSize': '25px' }}}%%
flowchart LR
mainmir["main (MIR)"]
subgraph mycgu1[my CGU 1]
mainll["main (LLVM IR)"]
end
mycgu1 --> |LLVM| mycgu1.rcgu.o
mainmir --> mainll
mycgu1.rcgu.o --> |link| my_binary
std["std (and others)"] --> |link| my_binary
</pre>
</div>
</section>
<section data-markdown>
<textarea data-template>
## the linker
can be a slow part for incremental builds
- LLD (Linux, default on x86-64; Windows)
- [mold (Linux)](https://github.com/rui314/mold)
- [wild (Linux, experimental)](https://github.com/davidlattimore/wild)
- macOS's default ld64 is already fast
</textarea>
</section>
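<section data-markdown>
<textarea data-template>
## switching linkers
a sketch of opting into mold on Linux via `.cargo/config.toml` (assumes clang and mold are installed)
```toml
# .cargo/config.toml
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=mold"]
```
</textarea>
</section>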
<section>
<h2>codegen units (but more)</h2>
<pre><code data-trim class="language-rust">
fn main() {}
mod foos {
fn foo1() {}
fn foo2() {}
}
</code></pre>
<div class="mermaid">
<pre>
%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true, 'fontSize': '25px' }}}%%
flowchart LR
mainmir["main (MIR)"]
foo1mir["foo1 (MIR)"]
foo2mir["foo2 (MIR)"]
subgraph mycgu1[my CGU 1]
mainll["main (LLVM IR)"]
end
subgraph mycgu2[my CGU 2]
foo1ll["foo1 (LLVM IR)"]
foo2ll["foo2 (LLVM IR)"]
end
mycgu1 --> mycgu1.rcgu.o
mycgu2 --> mycgu2.rcgu.o
mainmir --> mainll
foo1mir --> foo1ll
foo2mir --> foo2ll
mycgu1.rcgu.o --> my_binary
mycgu2.rcgu.o --> my_binary
std["std (and others)"] --> my_binary
</pre>
</div>
</section>
<section>
<h2>codegen units (cross-crate)</h2>
<div style="display: flex; flex-direction: row; gap: 16px">
<pre><code data-trim class="language-rust">
fn add() {}
</code></pre>
<pre><code data-trim class="language-rust">
fn main() { math::add() }
</code></pre>
</div>
<div class="mermaid">
<pre>
%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true, 'fontSize': '25px' }}}%%
flowchart LR
subgraph crate math
addmir["add (MIR)"]
subgraph mathcgu1[math CGU 1]
addll["add (LLVM IR)"]
end
addmir --> addll
mathcgu1 --> mathcgu1.rcgu.o
mathcgu1.rcgu.o --> libmath.rlib
end
subgraph my crate
mainmir["main (MIR)"]
subgraph mycgu1[my CGU 1]
mainll["main (LLVM IR)"]
end
mycgu1 --> mycgu1.rcgu.o
mainmir --> mainll
mycgu1.rcgu.o --> my_binary
libmath.rlib --> my_binary
std["std (and others)"] --> my_binary
end
style mainmir fill:purple
style mycgu1 fill:purple
style mainll fill:purple
style mycgu1.rcgu.o fill:purple
style addmir fill:darkgreen
style mathcgu1 fill:darkgreen
style addll fill:darkgreen
style mathcgu1.rcgu.o fill:darkgreen
style libmath.rlib fill:darkgreen
</pre>
</div>
</section>
<section data-markdown>
<textarea data-template>
# so compile times just depend on the number of functions?
- yes...
- but not source functions!
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## generics
```rust
fn add<T: Add>(a: T, b: T) -> T::Output {
a + b
}
add(0_u16, 0_u16); // creates add<u16> function
add(0_u32, 0_u32); // creates add<u32> function
```
- monomorphization, creating a copy for each type it is used with
</textarea>
</section>
<section>
<h2>instantiating generics</h2>
<pre><code data-trim class="language-rust">
fn add&lt;T: Add&gt;(a: T, b: T) -> T::Output { a + b }
fn main() {
add(0_u16, 0_u16);
add(0_u32, 0_u32);
}
</code></pre>
<div class="mermaid">
<pre>
%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true, 'fontSize': '25px' }}}%%
flowchart LR
addmir["add (MIR)"]
useitmir["main (MIR)"]
subgraph mycgu1[my CGU 1]
addu16ll["add_u16 (LLVM IR)"]
addu32ll["add_u32 (LLVM IR)"]
useitll["main (LLVM IR)"]
end
mycgu1 --> mycgu1.rcgu.o
addmir -->|instantiate with T=u16| addu16ll
addmir -->|instantiate with T=u32| addu32ll
useitmir --> useitll
mycgu1.rcgu.o --> my_binary
std["std (and others)"] --> my_binary
</pre>
</div>
</section>
<section>
<h2>generics (cross-crate)</h2>
<div style="display: flex; flex-direction: row; gap: 16px">
<pre><code data-trim class="language-rust">
pub fn add&lt;T: Add&gt;(a: T, b: T) -> T::Output {
a + b
}
</code></pre>
<pre><code data-trim class="language-rust">
fn main() {
math::add(0_u16, 0_u16);
math::add(0_u32, 0_u32);
}
</code></pre>
</div>
<div class="mermaid">
<pre>
%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true, 'fontSize': '25px' }}}%%
flowchart LR
subgraph crate math
addmir["add (MIR)"]
end
subgraph my crate
mainmir["main (MIR)"]
end
subgraph my crate
subgraph mycgu1[my CGU 1]
addu16ll["add_u16 (LLVM IR)"]
addu32ll["add_u32 (LLVM IR)"]
mainll["main (LLVM IR)"]
end
mainmir --> mainll
addmir --> addu16ll
addmir --> addu32ll
mycgu1 --> mycgu1.rcgu.o
mycgu1.rcgu.o --> my_binary
std["std (and others)"] --> my_binary
end
style addmir fill:darkgreen
style mainmir fill:purple
style addu16ll fill:darkgreen
style addu32ll fill:darkgreen
style mainll fill:purple
style mycgu1.rcgu.o fill:purple
</pre>
</div>
</section>
<section data-markdown>
<textarea data-template>
# generics are slow to compile
- LLVM optimizes the same function N times over
- and there are duplicate instances!
- share-generics helps for non-release builds
- `cargo-llvm-lines`
</textarea>
</section>
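<section data-markdown>
<textarea data-template>
## trimming monomorphization
a common trick (sketch; the `normalize` helper is hypothetical, not from this talk): keep the generic shim thin and put the body in a non-generic inner function, so the real work is only compiled once
```rust
use std::path::{Path, PathBuf};

// hypothetical helper to illustrate the pattern
pub fn normalize<P: AsRef<Path>>(path: P) -> PathBuf {
    // non-generic inner function: compiled exactly once,
    // no matter how many types P is instantiated with
    fn inner(path: &Path) -> PathBuf {
        path.components().collect()
    }
    // only this thin shim is monomorphized per call-site type
    inner(path.as_ref())
}
```
the standard library uses this pattern in several of its `AsRef`-taking APIs
</textarea>
</section>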
<section data-markdown>
<textarea data-template>
# and the duplicates get worse
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## inlining
```rust
fn add(a: u8, b: u8) -> u8 {
a + b
}
fn main() {
let x = add(1, 4);
println!("{x}");
}
```
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## let's inline it
```rust
fn add(a: u8, b: u8) -> u8 {
a + b
}
fn main() {
let x = {
let a = 1;
let b = 4;
a + b
};
println!("{x}");
}
```
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## cross-crate inlining
```rust
pub fn add(a: u8, b: u8) -> u8 {
a + b
}
```
```rust
fn main() {
let x = math::add(1, 4); // what is the body?...
println!("{x}");
}
```
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## #[inline] to the rescue
```rust
#[inline]
pub fn add(a: u8, b: u8) -> u8 {
a + b
}
```
```rust
fn main() {
let x = math::add(1, 4); // 💡 it's a + b
println!("{x}");
}
```
</textarea>
</section>
<section>
<div style="display: flex; flex-direction: row; gap: 16px">
<pre><code data-trim class="language-rust">
#[inline]
pub fn add(a: u8, b: u8) -> u8 {
a + b
}
</code></pre>
<pre><code data-trim class="language-rust">
fn main() {
let x = math::add(1, 4);
println!("{x}");
}
</code></pre>
</div>
<div class="mermaid">
<pre>
%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true, 'fontSize': '25px' }}}%%
flowchart LR
subgraph crate math
addmir["add (MIR)"]
end
subgraph my crate
mainmir["main (MIR)"]
end
subgraph my crate
subgraph mycgu1[my CGU 1]
addll["add (LLVM IR)"]
mainll["main (LLVM IR)"]
end
mainmir --> mainll
addmir --> addll
mycgu1 --> mycgu1.rcgu.o
mycgu1.rcgu.o --> my_binary
std["std (and others)"] --> my_binary
end
style addmir fill:purple
style mainmir fill:darkgreen
style addll fill:darkgreen
style mainll fill:purple
style mycgu1.rcgu.o fill:purple
</pre>
</div>
</section>
<section data-markdown>
<textarea data-template>
## `#[inline]`
- for non-generic functions
- for very small functions, this happens automatically
- for other functions, it doesn't, because it would be slow
- don't over-apply it in a library, but also don't forget about it
- benchmark!
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## being lazy has advantages
- that's why i wrote most of this talk last week
- `#[inline]` means that the function is *never* instantiated if it's never used!
https://blog.rust-lang.org/inside-rust/2025/07/15/call-for-testing-hint-mostly-unused/
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## but performance is great, i love performance
- it's ok, i can wait forever
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## link-time optimization (LTO)
- optimizes everything in your program together at the end
- breaks crate boundaries
- is slow
- comes in many forms
</textarea>
</section>
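<section data-markdown>
<textarea data-template>
## enabling LTO
a sketch of the relevant `Cargo.toml` knobs (defaults differ per profile)
```toml
[profile.release]
lto = "thin"       # sharded LTO; "fat" optimizes harder but compiles slower
codegen-units = 1  # optional: one CGU per crate, better codegen, slower builds
```
</textarea>
</section>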
<section>
<h2>lto = "fat" (monolithic)</h2>
<div class="mermaid">
<pre>
%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true, 'fontSize': '25px' }}}%%
flowchart LR
subgraph crate math
addmir["add (LLVM IR)"]
end
subgraph crate math2
submir["sub (LLVM IR)"]
end
subgraph my crate
mainmir["main (LLVM IR)"]
end
subgraph my crate
subgraph fatlto[fat LTO]
addll["add (LLVM IR)"]
subll["sub (LLVM IR)"]
mainll["main (LLVM IR)"]
end
mainmir --> mainll
addmir --> addll
submir --> subll
fatlto --> my_binary
end
style addmir fill:purple
style addll fill:purple
style submir fill:darkblue
style subll fill:darkblue
style mainmir fill:darkgreen
style mainll fill:darkgreen
</pre>
</div>
<p>easily compiles 2-4x more slowly</p>
</section>
<section>
<h2>lto = "thin" (sharded)</h2>
<div class="mermaid">
<pre>
%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true, 'fontSize': '25px' }}}%%
flowchart LR
subgraph crate math
addmir["add (LLVM IR)"]
end
subgraph crate math2
submir["sub (LLVM IR)"]
end
subgraph my crate
mainmir["main (LLVM IR)"]
end
subgraph my crate
subgraph thinltosummary[ThinLTO Index]
end
subgraph thinlto1[ThinLTO 1]
addll["add (LLVM IR)"]
end
subgraph thinlto2[ThinLTO 2]
subll["sub (LLVM IR)"]
end
subgraph thinlto3[ThinLTO 3]
mainll["main (LLVM IR)"]
end
mainmir --> thinltosummary
addmir --> thinltosummary
submir --> thinltosummary
thinltosummary --> mainll
thinltosummary --> addll
thinltosummary --> subll
thinlto1 --> my_binary
thinlto2 --> my_binary
thinlto3 --> my_binary
end
style mainmir fill:purple
style addmir fill:darkgreen
style submir fill:darkblue
style addll fill:darkgreen
style subll fill:darkblue
style mainll fill:purple
</pre>
</div>
<p>
compiles ~1.1x-1.2x more slowly |
<a href="https://www.youtube.com/watch?v=p9nH2vZ2mNo">ThinLTO Talk</a>
</p>
</section>
<section data-markdown>
<textarea data-template>
## linker-plugin-lto
- fat LTO style and thin LTO style
- the merging is not done by rustc but by the linker
- works across languages (Rust (rustc) + C (clang))
- great for FFI
- annoying to set up
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## there was some LTO all along
- in release mode, automatic ThinLTO across codegen units in the same crate
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## how do i make my program run quickly?
- let the compiler inline functions
- libraries: remember `#[inline]`
- binaries: you want LTO
- really, it really needs to inline your functions
- without it, it's so over
- and read this: https://nnethercote.github.io/perf-book
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## how do i make my program compile quickly?
- reduce the number and size of functions
- importantly: in LLVM IR, not necessarily source!
- duplicate and frequent instantiations are bad
- and read this:
- https://corrode.dev/blog/tips-for-faster-rust-compile-times
- https://doc.rust-lang.org/nightly/cargo/guide/build-performance.html
</textarea>
</section>
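<section data-markdown>
<textarea data-template>
## where is the time going?
tools worth running before guessing (assumes `cargo install cargo-llvm-lines`)
```shell
# per-crate timeline of frontend (blue) vs codegen (purple) time
cargo build --timings

# which instantiations produce the most LLVM IR
cargo llvm-lines | head -n 20
```
</textarea>
</section>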
<section data-markdown>
<textarea data-template>
## and both? 🥺👉👈
- no
- at least not at once
- debug/release
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## and why did `#[inline]` make the error go away?
```rust
pub fn error() {
// vvvvvvvvvvv post-mono error!
let _x: [u8; usize::MAX] = [0; usize::MAX];
}
```
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## happy compiling
slides at <a href="https://noratrieb.dev/slides/">https://noratrieb.dev/slides/</a>
<img src="seaslug.png" alt="sea slug with a tada emoji">
</textarea>
</section>
</div>
</div>
<script src="../dist/reveal.js"></script>
<script src="../plugin/notes/notes.js"></script>
<script src="../plugin/markdown/markdown.js"></script>
<script src="../plugin/highlight/highlight.js"></script>
<script src="./reveal.js-mermaid-plugin_11-6-0.js"></script>
<script>
// More info about initialization & config:
// - https://revealjs.com/initialization/
// - https://revealjs.com/config/
Reveal.initialize({
hash: true,
// Learn about plugins: https://revealjs.com/plugins/
plugins: [RevealMarkdown, RevealHighlight, RevealNotes, RevealMermaid],
});
</script>
</body>
</html>