<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta
name="viewport"
content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no"
/>
<title>how Rust compiles</title>
<link rel="stylesheet" href="../dist/reset.css" />
<link rel="stylesheet" href="../dist/reveal.css" />
<link rel="stylesheet" href="../dist/theme/black.css" />
<link rel="stylesheet" type="text/css" href="asciinema-player.css" />
<script src="asciinema-player.min.js"></script>
<!-- Theme used for syntax highlighted code -->
<link rel="stylesheet" href="../plugin/highlight/monokai.css" />
<style>
* {
--r-heading-text-transform: initial;
}
</style>
</head>
<body>
<div class="reveal">
<div class="slides">
<section>
<h2>how Rust compiles</h2>
<h4>i promise it's actually doing something useful while you wait</h4>
</section>
<section style="height: 100%">
<div style="display: flex; align-items: flex-start; height: 100%">
<details>
<summary>the Rust compilation model has surprising effects</summary>
<iframe
height="600"
width="800"
src="https://play.integer32.com/?version=stable&mode=debug&edition=2024&gist=7ac62650fd0b942ae5952b0027e0c1ce"
referrerpolicy="no-referrer"
></iframe>
</details>
</div>
</section>
<section>
<h2>introduction to myself</h2>
<div style="display: flex">
<div>
<div>Noratrieb (she/her)</div>
contributing to the compiler since 2021
<br />
</div>
<div>
<img src="me.png" height="300" />
</div>
</div>
</section>
<section data-markdown>
<textarea data-template>
# speed 🚀
- runtime performance
- compile times
</textarea>
</section>
<section>
<h2>what does rustc like, do?</h2>
<h4>a quick overview of the compilation phases</h4>
</section>
<section>
<h2>the frontend and the backend</h2>
<div class="mermaid">
<pre>
%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true, 'fontSize': '25px' }}}%%
flowchart LR
subgraph compiler
frontend --> backend
end
source --> frontend
backend --> binary
</pre>
</div>
</section>
<section>
<h2>it all starts at the source</h2>
<pre><code data-trim class="language-rust">
pub fn add(a: u8, b: u8) -> u8 {
a.wrapping_add(b)
}
</code></pre>
</section>
<section>
<h2>until it doesn't even look like Rust anymore</h2>
<p>MIR</p>
<img src="add-runtime-mir.svg" />
</section>
<section data-markdown>
<textarea data-template>
## further going to LLVM IR
```
; meow::add
define noundef i8 @add(i8 noundef %a, i8 noundef %b) #0 {
start:
%_0 = add i8 %b, %a
ret i8 %_0
}
```
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## and then you're done
```
&lt;add&gt;:
lea (%rsi,%rdi,1),%eax
ret
```
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## ok but why does my program compile so slowly now?
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## it's often LLVM's fault
- but like not really
<br>
<img alt="output of cargo build --timings, showing blocks of blue and purple bars of roughly equal size" src="cargo-timings-debug-ra.png">
`cargo build --timings`
</textarea>
</section>
<section>
<!-- cargo build -v -j1 -->
<div id="cargo-build-v-asciinema-player"></div>
<script>
AsciinemaPlayer.create(
"cargo-build-v.cast",
document.getElementById("cargo-build-v-asciinema-player"),
{
cols: 134,
rows: 36,
}
);
</script>
</section>
<section>
<h2>a crate - the compilation unit</h2>
<p>quite big</p>
<p>in C it's just a single file</p>
</section>
<section>
<h2>a codegen unit</h2>
<p>LLVM is single-threaded</p>
<p>rustc: hi LLVM, look we are like a C file, now be fast</p>
<p>~1-256 depending on size and configuration</p>
<div class="mermaid">
<pre>
%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true, 'fontSize': '25px' }}}%%
flowchart LR
crate
crate --> cgu1["Codegen-Unit 1"]
crate --> cgu2["Codegen-Unit 2"]
crate --> cgu3["Codegen-Unit 3"]
</pre>
</div>
</section>
<section>
<h2>codegen units</h2>
<pre><code data-trim class="language-rust">
fn main() {}
</code></pre>
<div class="mermaid">
<pre>
%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true, 'fontSize': '25px' }}}%%
flowchart LR
mainmir["main (MIR)"]
subgraph mycgu1[my CGU 1]
mainll["main (LLVM IR)"]
end
mycgu1 --> |LLVM| mycgu1.rcgu.o
mainmir --> mainll
mycgu1.rcgu.o --> |link| my_binary
std["std (and others)"] --> |link| my_binary
</pre>
</div>
</section>
<section data-markdown>
<textarea data-template>
## the linker
can be a slow part for incremental builds
- LLD (Linux, default on x86-64; Windows)
- [mold (Linux)](https://github.com/rui314/mold)
- [wild (Linux, experimental)](https://github.com/davidlattimore/wild)
- macOS's default ld64 is already fast
</textarea>
</section>
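<section data-markdown>
<textarea data-template>
## switching linkers
a sketch of opting into mold on Linux via `.cargo/config.toml` (assumes clang and mold are installed)
```toml
# .cargo/config.toml
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=mold"]
```
</textarea>
</section>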
<section>
<h2>codegen units (but more)</h2>
<pre><code data-trim class="language-rust">
fn main() {}
mod foos {
fn foo1() {}
fn foo2() {}
}
</code></pre>
<div class="mermaid">
<pre>
%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true, 'fontSize': '25px' }}}%%
flowchart LR
mainmir["main (MIR)"]
foo1mir["foo1 (MIR)"]
foo2mir["foo2 (MIR)"]
subgraph mycgu1[my CGU 1]
mainll["main (LLVM IR)"]
end
subgraph mycgu2[my CGU 2]
foo1ll["foo1 (LLVM IR)"]
foo2ll["foo2 (LLVM IR)"]
end
mycgu1 --> mycgu1.rcgu.o
mycgu2 --> mycgu2.rcgu.o
mainmir --> mainll
foo1mir --> foo1ll
foo2mir --> foo2ll
mycgu1.rcgu.o --> my_binary
mycgu2.rcgu.o --> my_binary
std["std (and others)"] --> my_binary
</pre>
</div>
</section>
<section>
<h2>codegen units (cross-crate)</h2>
<div style="display: flex; flex-direction: row; gap: 16px">
<pre><code data-trim class="language-rust">
fn add() {}
</code></pre>
<pre><code data-trim class="language-rust">
fn main() { math::add() }
</code></pre>
</div>
<div class="mermaid">
<pre>
%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true, 'fontSize': '25px' }}}%%
flowchart LR
subgraph crate math
addmir["add (MIR)"]
subgraph mathcgu1[math CGU 1]
addll["add (LLVM IR)"]
end
addmir --> addll
mathcgu1 --> mathcgu1.rcgu.o
mathcgu1.rcgu.o --> libmath.rlib
end
subgraph my crate
mainmir["main (MIR)"]
subgraph mycgu1[my CGU 1]
mainll["main (LLVM IR)"]
end
mycgu1 --> mycgu1.rcgu.o
mainmir --> mainll
mycgu1.rcgu.o --> my_binary
libmath.rlib --> my_binary
std["std (and others)"] --> my_binary
end
style mainmir fill:purple
style mycgu1 fill:purple
style mainll fill:purple
style mycgu1.rcgu.o fill:purple
style addmir fill:darkgreen
style mathcgu1 fill:darkgreen
style addll fill:darkgreen
style mathcgu1.rcgu.o fill:darkgreen
style libmath.rlib fill:darkgreen
</pre>
</div>
</section>
<section data-markdown>
<textarea data-template>
# so compile times just depend on the number of functions?
- yes...
- but not source functions!
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## generics
```rust
fn add<T: Add>(a: T, b: T) -> T::Output {
a + b
}
add(0_u16, 0_u16); // creates add<u16> function
add(0_u32, 0_u32); // creates add<u32> function
```
- monomorphization, creating a copy for each type it is used with
</textarea>
</section>
<section>
<h2>instantiating generics</h2>
<pre><code data-trim class="language-rust">
fn add&lt;T: Add&gt;(a: T, b: T) -> T::Output { a + b }
fn main() {
add(0_u16, 0_u16);
add(0_u32, 0_u32);
}
</code></pre>
<div class="mermaid">
<pre>
%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true, 'fontSize': '25px' }}}%%
flowchart LR
addmir["add (MIR)"]
useitmir["main (MIR)"]
subgraph mycgu1[my CGU 1]
addu16ll["add_u16 (LLVM IR)"]
addu32ll["add_u32 (LLVM IR)"]
useitll["main (LLVM IR)"]
end
mycgu1 --> mycgu1.rcgu.o
addmir -->|instantiate with T=u16| addu16ll
addmir -->|instantiate with T=u32| addu32ll
useitmir --> useitll
mycgu1.rcgu.o --> my_binary
std["std (and others)"] --> my_binary
</pre>
</div>
</section>
<section>
<h2>generics (cross-crate)</h2>
<div style="display: flex; flex-direction: row; gap: 16px">
<pre><code data-trim class="language-rust">
pub fn add&lt;T: Add&gt;(a: T, b: T) -> T::Output {
a + b
}
</code></pre>
<pre><code data-trim class="language-rust">
fn main() {
math::add(0_u16, 0_u16);
math::add(0_u32, 0_u32);
}
</code></pre>
</div>
<div class="mermaid">
<pre>
%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true, 'fontSize': '25px' }}}%%
flowchart LR
subgraph crate math
addmir["add (MIR)"]
end
subgraph my crate
mainmir["main (MIR)"]
end
subgraph my crate
subgraph mycgu1[my CGU 1]
addu16ll["add_u16 (LLVM IR)"]
addu32ll["add_u32 (LLVM IR)"]
mainll["main (LLVM IR)"]
end
mainmir --> mainll
addmir --> addu16ll
addmir --> addu32ll
mycgu1 --> mycgu1.rcgu.o
mycgu1.rcgu.o --> my_binary
std["std (and others)"] --> my_binary
end
style addmir fill:darkgreen
style mainmir fill:purple
style addu16ll fill:darkgreen
style addu32ll fill:darkgreen
style mainll fill:purple
style mycgu1.rcgu.o fill:purple
</pre>
</div>
</section>
<section data-markdown>
<textarea data-template>
# generics are slow to compile
- LLVM optimizes the same function N times over
- and there are duplicate instances!
- share-generics helps for non-release builds
- `cargo-llvm-lines`
</textarea>
</section>
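<section data-markdown>
<textarea data-template>
## trimming monomorphization
a common trick (sketch; the `normalize` helper is hypothetical, not from this talk): keep the generic shim thin and put the body in a non-generic inner function, so the real work is only compiled once
```rust
use std::path::{Path, PathBuf};

// hypothetical helper to illustrate the pattern
pub fn normalize<P: AsRef<Path>>(path: P) -> PathBuf {
    // non-generic inner function: compiled exactly once,
    // no matter how many types P is instantiated with
    fn inner(path: &Path) -> PathBuf {
        path.components().collect()
    }
    // only this thin shim is monomorphized per call-site type
    inner(path.as_ref())
}
```
the standard library uses this pattern in several of its `AsRef`-taking APIs
</textarea>
</section>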
<section data-markdown>
<textarea data-template>
# and the duplicates get worse
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## inlining
```rust
fn add(a: u8, b: u8) -> u8 {
a + b
}
fn main() {
let x = add(1, 4);
println!("{x}");
}
```
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## let's inline it
```rust
fn add(a: u8, b: u8) -> u8 {
a + b
}
fn main() {
let x = {
let a = 1;
let b = 4;
a + b
};
println!("{x}");
}
```
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## cross-crate inlining
```rust
pub fn add(a: u8, b: u8) -> u8 {
a + b
}
```
```rust
fn main() {
let x = math::add(1, 4); // what is the body?...
println!("{x}");
}
```
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## #[inline] to the rescue
```rust
#[inline]
pub fn add(a: u8, b: u8) -> u8 {
a + b
}
```
```rust
fn main() {
let x = math::add(1, 4); // 💡 it's a + b
println!("{x}");
}
```
</textarea>
</section>
<section>
<div style="display: flex; flex-direction: row; gap: 16px">
<pre><code data-trim class="language-rust">
#[inline]
pub fn add(a: u8, b: u8) -> u8 {
a + b
}
</code></pre>
<pre><code data-trim class="language-rust">
fn main() {
let x = math::add(1, 4);
println!("{x}");
}
</code></pre>
</div>
<div class="mermaid">
<pre>
%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true, 'fontSize': '25px' }}}%%
flowchart LR
subgraph crate math
addmir["add (MIR)"]
end
subgraph my crate
mainmir["main (MIR)"]
end
subgraph my crate
subgraph mycgu1[my CGU 1]
addll["add (LLVM IR)"]
mainll["main (LLVM IR)"]
end
mainmir --> mainll
addmir --> addll
mycgu1 --> mycgu1.rcgu.o
mycgu1.rcgu.o --> my_binary
std["std (and others)"] --> my_binary
end
style addmir fill:purple
style mainmir fill:darkgreen
style addll fill:darkgreen
style mainll fill:purple
style mycgu1.rcgu.o fill:purple
</pre>
</div>
</section>
<section data-markdown>
<textarea data-template>
## `#[inline]`
- for non-generic functions
- for very small functions, this happens automatically
- for other functions, it doesn't, because it would be slow
- don't over-apply it in a library, but also don't forget about it
- benchmark!
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## being lazy has advantages
- that's why i wrote most of this talk last week
- `#[inline]` means that the function is *never* instantiated if it's never used!
https://blog.rust-lang.org/inside-rust/2025/07/15/call-for-testing-hint-mostly-unused/
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## but performance is great, i love performance
- it's ok, i can wait forever
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## link-time optimization (LTO)
- optimizes everything in your program together at the end
- breaks crate boundaries
- is slow
- comes in many forms
</textarea>
</section>
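<section data-markdown>
<textarea data-template>
## enabling LTO
a sketch of the relevant `Cargo.toml` knobs (defaults differ per profile)
```toml
[profile.release]
lto = "thin"       # sharded LTO; "fat" optimizes harder but compiles slower
codegen-units = 1  # optional: one CGU per crate, better codegen, slower builds
```
</textarea>
</section>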
<section>
<h2>lto = "fat" (monolithic)</h2>
<div class="mermaid">
<pre>
%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true, 'fontSize': '25px' }}}%%
flowchart LR
subgraph crate math
addmir["add (LLVM IR)"]
end
subgraph crate math2
submir["sub (LLVM IR)"]
end
subgraph my crate
mainmir["main (LLVM IR)"]
end
subgraph my crate
subgraph fatlto[fat LTO]
addll["add (LLVM IR)"]
subll["sub (LLVM IR)"]
mainll["main (LLVM IR)"]
end
mainmir --> mainll
addmir --> addll
submir --> subll
fatlto --> my_binary
end
style addmir fill:purple
style addll fill:purple
style submir fill:darkblue
style subll fill:darkblue
style mainmir fill:darkgreen
style mainll fill:darkgreen
</pre>
</div>
<p>easily compiles 2-4x more slowly</p>
</section>
<section>
<h2>lto = "thin" (sharded)</h2>
<div class="mermaid">
<pre>
%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true, 'fontSize': '25px' }}}%%
flowchart LR
subgraph crate math
addmir["add (LLVM IR)"]
end
subgraph crate math2
submir["sub (LLVM IR)"]
end
subgraph my crate
mainmir["main (LLVM IR)"]
end
subgraph my crate
subgraph thinltosummary[ThinLTO Index]
end
subgraph thinlto1[ThinLTO 1]
addll["add (LLVM IR)"]
end
subgraph thinlto2[ThinLTO 2]
subll["sub (LLVM IR)"]
end
subgraph thinlto3[ThinLTO 3]
mainll["main (LLVM IR)"]
end
mainmir --> thinltosummary
addmir --> thinltosummary
submir --> thinltosummary
thinltosummary --> mainll
thinltosummary --> addll
thinltosummary --> subll
thinlto1 --> my_binary
thinlto2 --> my_binary
thinlto3 --> my_binary
end
style mainmir fill:purple
style addmir fill:darkgreen
style submir fill:darkblue
style addll fill:darkgreen
style subll fill:darkblue
style mainll fill:purple
</pre>
</div>
<p>
compiles ~1.1x-1.2x more slowly |
<a href="https://www.youtube.com/watch?v=p9nH2vZ2mNo">ThinLTO Talk</a>
</p>
</section>
<section data-markdown>
<textarea data-template>
## linker-plugin-lto
- fat LTO style and thin LTO style
- the merging is not done by rustc but by the linker
- works across languages (Rust (rustc) + C (clang))
- great for FFI
- annoying to set up
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## there was some LTO all along
- in release mode, automatic ThinLTO across codegen units in the same crate
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## how do i make my program run quickly?
- let the compiler inline functions
- libraries: remember `#[inline]`
- binaries: you want LTO
- really, it really needs to inline your functions
- without it, it's so over
- and read this: https://nnethercote.github.io/perf-book
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## how do i make my program compile quickly?
- reduce the number and size of functions
- importantly: in LLVM IR, not necessarily source!
- duplicate and frequent instantiations are bad
- and read this:
- https://corrode.dev/blog/tips-for-faster-rust-compile-times
- https://doc.rust-lang.org/nightly/cargo/guide/build-performance.html
</textarea>
</section>
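<section data-markdown>
<textarea data-template>
## where is the time going?
tools worth running before guessing (assumes `cargo install cargo-llvm-lines`)
```shell
# per-crate timeline of frontend (blue) vs codegen (purple) time
cargo build --timings

# which instantiations produce the most LLVM IR
cargo llvm-lines | head -n 20
```
</textarea>
</section>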
<section data-markdown>
<textarea data-template>
## and both? 🥺👉👈
- no
- at least not at once
- debug/release
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## and why did `#[inline]` make the error go away?
```rust
pub fn error() {
// vvvvvvvvvvv post-mono error!
let _x: [u8; usize::MAX] = [0; usize::MAX];
}
```
</textarea>
</section>
<section data-markdown>
<textarea data-template>
## happy compiling
slides at <a href="https://noratrieb.dev/slides/">https://noratrieb.dev/slides/</a>
<img src="seaslug.png" alt="sea slug with a tada emoji">
</textarea>
</section>
</div>
</div>
<script src="../dist/reveal.js"></script>
<script src="../plugin/notes/notes.js"></script>
<script src="../plugin/markdown/markdown.js"></script>
<script src="../plugin/highlight/highlight.js"></script>
<script src="./reveal.js-mermaid-plugin_11-6-0.js"></script>
<script>
// More info about initialization & config:
// - https://revealjs.com/initialization/
// - https://revealjs.com/config/
Reveal.initialize({
hash: true,
// Learn about plugins: https://revealjs.com/plugins/
plugins: [RevealMarkdown, RevealHighlight, RevealNotes, RevealMermaid],
});
</script>
</body>
</html>