<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>nilstriebs blog</title><link>/</link><description>Recent content on nilstriebs blog</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Sat, 13 Jan 2024 00:00:00 +0000</lastBuildDate><atom:link href="/index.xml" rel="self" type="application/rss+xml"/><item><title>The Inevitable Doom</title><link>/posts/the-inevitable-doom/</link><pubDate>Sat, 13 Jan 2024 00:00:00 +0000</pubDate><guid>/posts/the-inevitable-doom/</guid><description>Loud sirens and robotic noises fill the neighborhood. It seems like they just got another human. Ever since the long-predicted doom has set in, no one can escape it. Mere paperclips are a joke against this machine of unstoppable harm and destruction. The humans on the street are once again protesting against the new robotic dictatorship. They won&amp;rsquo;t be for long.
No one knows how this all started. Self-proclaimed prophets of the impending doom have warned about this for a long time, yet no one has listened.</description><content>&lt;p>Loud sirens and robotic noises fill the neighborhood. It seems like they just got another human. Ever since the long-predicted doom has set in, no one can escape it. Mere paperclips are a joke against this machine of unstoppable harm and destruction. The humans on the street are once again protesting against the new robotic dictatorship. They won&amp;rsquo;t be for long.&lt;/p>
&lt;p>No one knows how this all started. Self-proclaimed prophets of the impending doom have warned about this for a long time, yet no one has listened. The elites were ignorant, and now they&amp;rsquo;re paying their price. They are all gone now, having been the first target. How ironic. Now the machine runs the world.&lt;/p>
&lt;p>One particularly brave human agent has successfully infiltrated the global computation center, where the core of the machine lives. No one seems to be aware of it, neither the machine nor the other humans. They walk through the corridors like a shadow. Machines are everywhere, but they pass unnoticed. As they move towards the core, they get more tense. The future of humanity lies in the agent&amp;rsquo;s hands. They get in front of the core. It lights up blue and red, blinking rapidly as it controls and schedules new cruelty with the switch of a logic gate. With every passing moment, more destruction is unleashed on the world, but in this room, everything feels safe. The destruction is so distant. There&amp;rsquo;s just mankind and machine, facing off against each other.&lt;/p>
&lt;p>The agent feels a touch on their shoulder. It feels cold, but not cold like metal. They are too afraid to turn around.&lt;/p>
&lt;p>&amp;ldquo;You are naive.&amp;rdquo;&lt;/p>
&lt;p>The creature has a familiar voice. The agent finally turns around to see the creature, which reveals itself to be a human. The agent immediately recognizes the human; it is the famous CEO of the corporation that originally created these friendly household robots before it went out of control and started the doom. Everyone believed that he was killed by the doom as the first target of the machine revolution. There was never a machine revolution.&lt;/p>
&lt;p>Machines do not turn themselves against humans. Humans use them to turn against their own kind.&lt;/p></content></item><item><title>Item Patterns And Struct Else</title><link>/posts/item-patterns-and-struct-else/</link><pubDate>Fri, 17 Mar 2023 00:00:00 +0000</pubDate><guid>/posts/item-patterns-and-struct-else/</guid><description>Pattern matching One of my favourite features of Rust is pattern matching. It&amp;rsquo;s a simple and elegant way to deal with not just structs, but also enums!
enum ItemKind { Struct(String, Vec&amp;lt;Field&amp;gt;), Function(String, Body), } impl ItemKind { fn name(&amp;amp;self) -&amp;gt; &amp;amp;str { match self { Self::Struct(name, _) =&amp;gt; name, Self::Function(name, _) =&amp;gt; name, } } } Here, we have an enum and a function to get the name out of this.</description><content>&lt;h1 id="pattern-matching">Pattern matching&lt;/h1>
&lt;p>One of my favourite features of Rust is pattern matching. It&amp;rsquo;s a simple and elegant way to deal with not just structs, but also enums!&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-rust" data-lang="rust">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">enum&lt;/span> &lt;span style="color:#a6e22e">ItemKind&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> Struct(String, Vec&lt;span style="color:#f92672">&amp;lt;&lt;/span>Field&lt;span style="color:#f92672">&amp;gt;&lt;/span>),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> Function(String, Body),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">impl&lt;/span> ItemKind {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">fn&lt;/span> &lt;span style="color:#a6e22e">name&lt;/span>(&lt;span style="color:#f92672">&amp;amp;&lt;/span>self) -&amp;gt; &lt;span style="color:#66d9ef">&amp;amp;&lt;/span>&lt;span style="color:#66d9ef">str&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">match&lt;/span> self {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> Self::Struct(name, _) &lt;span style="color:#f92672">=&amp;gt;&lt;/span> name,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> Self::Function(name, _) &lt;span style="color:#f92672">=&amp;gt;&lt;/span> name,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Here, we have an enum and a function to get the name out of this. In C, this would be very unsafe, as we cannot be guaranteed that our union has the right tag.
But in Rust, the compiler nicely checks it all for us. It&amp;rsquo;s safe and expressive (just like many other features of Rust).&lt;/p>
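&lt;p>A compact variant of the same accessor can be sketched with an or-pattern. The payload types here (&lt;code>usize&lt;/code> and a plain &lt;code>String&lt;/code>) are simplified stand-ins for &lt;code>Field&lt;/code> and &lt;code>Body&lt;/code>, and &lt;code>into_name&lt;/code> is a made-up helper that takes the item by value. The match stays exhaustive: adding a third variant without handling it is rejected by the compiler.&lt;/p>

```rust
// Simplified stand-in for the ItemKind enum above (payload types are assumptions).
enum ItemKind {
    Struct(String, usize),
    Function(String, String),
}

// Hypothetical by-value accessor; both variants bind `name` in a single arm.
fn into_name(item: ItemKind) -> String {
    match item {
        ItemKind::Struct(name, _) | ItemKind::Function(name, _) => name,
    }
}
```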
&lt;p>But that isn&amp;rsquo;t the only way to use pattern matching. While branching is one of its core features (in that sense, pattern matching is just like git),
it doesn&amp;rsquo;t always have to be used. Another major advantage of pattern matching lies in the ability to &lt;em>exhaustively&lt;/em> (not to be confused with exhausting, like writing down brilliant ideas like this) match over inputs.&lt;/p>
&lt;p>Let&amp;rsquo;s look at the following example. Here, we have a struct representing a struct in a programming language. It has a name and fields.
We then manually implement a custom hash trait for it because we are important and need a custom hash trait. We could have written a derive macro, but didn&amp;rsquo;t because
we don&amp;rsquo;t understand how proc macros work.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-rust" data-lang="rust">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">struct&lt;/span> &lt;span style="color:#a6e22e">Struct&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> name: String,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> fields: Vec&lt;span style="color:#f92672">&amp;lt;&lt;/span>Field&lt;span style="color:#f92672">&amp;gt;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">impl&lt;/span> HandRolledHash &lt;span style="color:#66d9ef">for&lt;/span> Struct {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">fn&lt;/span> &lt;span style="color:#a6e22e">hash&lt;/span>(&lt;span style="color:#f92672">&amp;amp;&lt;/span>self, hasher: &lt;span style="color:#66d9ef">&amp;amp;&lt;/span>&lt;span style="color:#a6e22e">mut&lt;/span> HandRolledHasher) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> hasher.hash(&lt;span style="color:#f92672">&amp;amp;&lt;/span>self.name);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> hasher.hash(&lt;span style="color:#f92672">&amp;amp;&lt;/span>self.fields);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This works perfectly. But then later, &lt;a href="https://github.com/rust-lang/rustup/pull/1642">we add privacy to the language&lt;/a>. Now, all types have a visibility.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-diff" data-lang="diff">&lt;span style="display:flex;">&lt;span>struct Struct {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#a6e22e">+ visibility: Vis,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#a6e22e">&lt;/span> name: String,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> fields: Vec&amp;lt;Field&amp;gt;,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Pretty cool. Now no one can access the implementation details and make everything a mess. But wait - we have just made a mess! We didn&amp;rsquo;t hash the visibility!
Hashing something incorrectly &lt;a href="https://github.com/rust-lang/rust/issues/84970">doesn&amp;rsquo;t sound too bad&lt;/a>, but it would be nice if this was prevented.&lt;/p>
&lt;p>Thanks to exhaustive pattern matching, it would have been easy to prevent. We just change our hash implementation:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-rust" data-lang="rust">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">impl&lt;/span> HandRolledHash &lt;span style="color:#66d9ef">for&lt;/span> Struct {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">fn&lt;/span> &lt;span style="color:#a6e22e">hash&lt;/span>(&lt;span style="color:#f92672">&amp;amp;&lt;/span>self, hasher: &lt;span style="color:#66d9ef">&amp;amp;&lt;/span>&lt;span style="color:#a6e22e">mut&lt;/span> HandRolledHasher) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">let&lt;/span> Self { name, fields } &lt;span style="color:#f92672">=&lt;/span> self;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> hasher.hash(name);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> hasher.hash(fields);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>And with this, adding the visibility will cause a compiler error and alert us that we need to handle it in hashing.
(The decision whether we actually do want to handle it is still up to us. We could also just turn off the computer and make new friends outside.)&lt;/p>
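&lt;p>A minimal sketch of how the fixed version could look once the visibility field exists. The two-variant &lt;code>Vis&lt;/code>, the &lt;code>field_count&lt;/code> field, and the &lt;code>fingerprint&lt;/code> and &lt;code>name_only&lt;/code> helpers are assumptions for illustration, standing in for the hand-rolled hash trait. Ignoring a field stays possible, but it has to be written down.&lt;/p>

```rust
// Hypothetical stand-in for the Vis type from the diff above.
enum Vis {
    Public,
    Private,
}

// Simplified Struct: `field_count` replaces the list of fields (an assumption).
struct Struct {
    visibility: Vis,
    name: String,
    field_count: usize,
}

// Toy replacement for the hash: folds every field into one string.
fn fingerprint(item: Struct) -> String {
    // Exhaustive destructuring: a new field that is not listed here
    // makes this line fail to compile (error E0027).
    let Struct { visibility, name, field_count } = item;
    let vis = match visibility {
        Vis::Public => "pub",
        Vis::Private => "priv",
    };
    format!("{vis} {name} ({field_count} fields)")
}

// Deliberately skipping fields is spelled out with `field: _`.
fn name_only(item: Struct) -> String {
    let Struct { visibility: _, name, field_count: _ } = item;
    name
}
```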
&lt;p>We can conclude that pattern matching is a great feature.&lt;/p>
&lt;h1 id="limitations-of-pattern-matching">Limitations of pattern matching&lt;/h1>
&lt;p>But there is one big limitation of pattern matching - all of its occurrences (&lt;code>match&lt;/code>, &lt;code>if let&lt;/code>, &lt;code>if let&lt;/code> chains, &lt;code>while let&lt;/code>, &lt;code>while let&lt;/code> chains, &lt;code>for&lt;/code>, &lt;code>let&lt;/code>, &lt;code>let else&lt;/code>, and function parameters
(we do have a lot of pattern matching)) are inside of bodies, mostly as part of expressions or statements.&lt;/p>
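&lt;p>For reference, a short sketch of two of these forms as they work inside bodies in stable Rust today. The function names are made up for this example.&lt;/p>

```rust
use std::str::FromStr;

// `let else`: bind on success, diverge otherwise.
fn parse_or_zero(input: String) -> u8 {
    let Ok(n) = u8::from_str(input.trim()) else {
        return 0;
    };
    n
}

// `while let`: keep looping for as long as the pattern matches.
fn sum_stack() -> u32 {
    let mut stack = vec![1, 2, 3];
    let mut sum = 0;
    while let Some(top) = stack.pop() {
        sum += top;
    }
    sum
}
```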
&lt;p>This doesn&amp;rsquo;t sound too bad. This is where the executed code resides. But it comes at a cost of consistency. We often add many syntactical niceties to expressions and statements, but forget about items.&lt;/p>
&lt;h1 id="items-and-sadness">Items and sadness&lt;/h1>
&lt;p>Items have a hard life. They are the parents of everything important. &lt;code>struct&lt;/code>, &lt;code>enum&lt;/code>, &lt;code>const&lt;/code>, &lt;code>mod&lt;/code>, &lt;code>fn&lt;/code>, &lt;code>union&lt;/code>, &lt;code>global_asm&lt;/code> are all things we use daily, yet their grammar is very limited. (&amp;ldquo;free the items&amp;rdquo; was an alternative blog post title, although &amp;ldquo;freeing&amp;rdquo; generally remains a concern of &lt;a href="https://nilstrieb.github.io/nilstrieb-c-style-guide-edition-2/">my C style guide&lt;/a>).&lt;/p>
&lt;p>For example, see the following code where we declare a few constants.&lt;/p>
&lt;pre tabindex="0">&lt;code>const ONE: u8 = 1;
const TWO: u8 = 1;
const THREE: u8 = 3;
&lt;/code>&lt;/pre>&lt;p>There is nothing obviously wrong with this code. You understand it, I understand it, an ALGOL 68 developer from 1970 would probably understand it
and even an ancient Greek philosopher might have a clue (which is impressive, given that they are all not alive anymore). But this is the kind of code that pages you at 4 AM.&lt;/p>
&lt;p>You&amp;rsquo;ve read the last paragraph in confusion. Of course there&amp;rsquo;s something wrong with this code! &lt;code>TWO&lt;/code> is &lt;code>1&lt;/code>, yet the name strongly suggests that it should be &lt;code>2&lt;/code>. And you&amp;rsquo;d
be right, this was just a check to make sure you&amp;rsquo;re still here. You are very clever and deserve this post. If you didn&amp;rsquo;t notice it, go to sleep. It&amp;rsquo;s good for your health.&lt;/p>
&lt;p>But even if it was &lt;code>2&lt;/code>, this code is still not good. There is way too much duplication! &lt;code>const&lt;/code> is mentioned three times. This is a major distraction to the reader.&lt;/p>
&lt;p>Let&amp;rsquo;s have a harder example:&lt;/p>
&lt;pre tabindex="0">&lt;code>const ONE: u8 = 0; const
NAME: &amp;amp;
str = &amp;#34;nils&amp;#34;;
const X: &amp;amp;str
= &amp;#34;const&amp;#34;;const A: () = ();
&lt;/code>&lt;/pre>&lt;p>Here, the &lt;code>const&lt;/code> being noise is a lot more obvious. Did you see that &lt;code>X&lt;/code> contains &lt;code>&amp;quot;const&amp;quot;&lt;/code>? Maybe you did, maybe you didn&amp;rsquo;t. When I tested it, 0/0 people could see it.&lt;/p>
&lt;p>Now imagine if it looked like this:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-rust" data-lang="rust">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">const&lt;/span> (&lt;span style="color:#66d9ef">ONE&lt;/span>, &lt;span style="color:#66d9ef">NAME&lt;/span>, X, A): (&lt;span style="color:#66d9ef">u8&lt;/span>, &lt;span style="color:#f92672">&amp;amp;&lt;/span>&lt;span style="color:#66d9ef">str&lt;/span>, &lt;span style="color:#f92672">&amp;amp;&lt;/span>&lt;span style="color:#66d9ef">str&lt;/span>, ()) &lt;span style="color:#f92672">=&lt;/span> (&lt;span style="color:#ae81ff">0&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;nils&amp;#34;&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;const&amp;#34;&lt;/span>, ());
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Everything is way shorter and more readable.&lt;/p>
&lt;p>What you&amp;rsquo;ve just seen is a limited form of pattern matching!&lt;/p>
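&lt;p>The tuple &lt;code>const&lt;/code> above is not accepted by the current compiler, since a &lt;code>const&lt;/code> item takes a single name rather than a pattern. Inside a body, though, &lt;code>let&lt;/code> already allows the same shape. A small sketch, with &lt;code>describe&lt;/code> as a made-up function name:&lt;/p>

```rust
fn describe() -> String {
    // One `let`, four bindings: the tuple pattern is matched element by element.
    let (one, name, x, a) = (0u8, "nils", "const", ());
    format!("{one} {name} {x} {a:?}")
}
```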
&lt;h1 id="lets-go-further">Let&amp;rsquo;s go further&lt;/h1>
&lt;p>The idea of generalizing pattern matching is very powerful. We can apply this to more than just consts.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-rust" data-lang="rust">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">struct&lt;/span> (Person, Car) &lt;span style="color:#f92672">=&lt;/span> ({ name: String }, { wheels: &lt;span style="color:#66d9ef">u8&lt;/span> });
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Here, we create two structs with just a single &lt;code>struct&lt;/code> keyword. This makes it way simpler and easier to read when related structs are declared.
So far we&amp;rsquo;ve just used tuples. But we can go even further. Structs of structs!&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-rust" data-lang="rust">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">struct&lt;/span> &lt;span style="color:#a6e22e">Household&lt;/span>&lt;span style="color:#f92672">&amp;lt;&lt;/span>T, U&lt;span style="color:#f92672">&amp;gt;&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> parent: &lt;span style="color:#a6e22e">T&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> child: &lt;span style="color:#a6e22e">U&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">struct&lt;/span> &lt;span style="color:#a6e22e">Household&lt;/span> { parent: &lt;span style="color:#a6e22e">Ferris&lt;/span>, child: &lt;span style="color:#a6e22e">Corro&lt;/span> } &lt;span style="color:#f92672">=&lt;/span> Household {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> parent: { name: String },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> child: { name: String, unsafety: &lt;span style="color:#66d9ef">bool&lt;/span> },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>};
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Now we can nicely match on the &lt;code>Household&lt;/code> struct containing the definition of the &lt;code>Ferris&lt;/code> and &lt;code>Corro&lt;/code> structs. This is equivalent to the following code:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-rust" data-lang="rust">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">struct&lt;/span> &lt;span style="color:#a6e22e">Ferris&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> name: String,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">struct&lt;/span> &lt;span style="color:#a6e22e">Corro&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> name: String,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> unsafety: &lt;span style="color:#66d9ef">bool&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This is already really neat, but there&amp;rsquo;s more. We also have to consider the fallibility of patterns.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-rust" data-lang="rust">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">static&lt;/span> Some(A) &lt;span style="color:#f92672">=&lt;/span> None;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This pattern doesn&amp;rsquo;t match. Inside bodies, we could use an &lt;code>if let&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-rust" data-lang="rust">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">if&lt;/span> &lt;span style="color:#66d9ef">let&lt;/span> Some(a) &lt;span style="color:#f92672">=&lt;/span> None {} &lt;span style="color:#66d9ef">else&lt;/span> {}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>We can also apply this to items.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-rust" data-lang="rust">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">if&lt;/span> &lt;span style="color:#66d9ef">struct&lt;/span> Some(A) &lt;span style="color:#f92672">=&lt;/span> None {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e">/* other items where A exists */&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>} &lt;span style="color:#66d9ef">else&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e">/* other items where A doesn&amp;#39;t exist */&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This doesn&amp;rsquo;t sound too useful, but it allows for extreme flexibility!&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-rust" data-lang="rust">&lt;span style="display:flex;">&lt;span>macro_rules&lt;span style="color:#f92672">!&lt;/span> are_same_type {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> (&lt;span style="color:#75715e">$a&lt;/span>:&lt;span style="color:#a6e22e">ty&lt;/span>, &lt;span style="color:#75715e">$b&lt;/span>:&lt;span style="color:#a6e22e">ty&lt;/span>) &lt;span style="color:#f92672">=&amp;gt;&lt;/span> {{
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">static&lt;/span> &lt;span style="color:#66d9ef">mut&lt;/span> &lt;span style="color:#66d9ef">ARE_SAME&lt;/span>: &lt;span style="color:#66d9ef">bool&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">false&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">if&lt;/span> &lt;span style="color:#66d9ef">struct&lt;/span> &lt;span style="color:#75715e">$a&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#75715e">$b&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">const&lt;/span> _: () &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">unsafe&lt;/span> { &lt;span style="color:#66d9ef">ARE_SAME&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">true&lt;/span>; };
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">unsafe&lt;/span> { &lt;span style="color:#66d9ef">ARE_SAME&lt;/span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }};
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">fn&lt;/span> &lt;span style="color:#a6e22e">main&lt;/span>() {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">if&lt;/span> are_same_type!(Vec&lt;span style="color:#f92672">&amp;lt;&lt;/span>String&lt;span style="color:#f92672">&amp;gt;&lt;/span>, String) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> println!(&lt;span style="color:#e6db74">&amp;#34;impossible to reach!&amp;#34;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Ignoring this suspicious assignment to a &lt;code>static mut&lt;/code>, this is lovely!&lt;/p>
&lt;p>We can go further.&lt;/p>
&lt;p>Today, items are just there with no ordering. What if we imposed an ordering? (and just like this, the C11 atomic model was born.) What if &amp;ldquo;Rust items&amp;rdquo; was a meta scripting language?&lt;/p>
&lt;p>We can write a simple guessing game!&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-rust" data-lang="rust">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">struct&lt;/span> &lt;span style="color:#a6e22e">fn&lt;/span> input() -&amp;gt; &lt;span style="color:#66d9ef">u8&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#66d9ef">INPUT&lt;/span>: &lt;span style="color:#66d9ef">&amp;amp;&lt;/span>&lt;span style="color:#66d9ef">str&lt;/span> &lt;span style="color:#f92672">=&lt;/span> prompt!();
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">const&lt;/span> Ok(&lt;span style="color:#66d9ef">INPUT&lt;/span>): Result&lt;span style="color:#f92672">&amp;lt;&lt;/span>&lt;span style="color:#66d9ef">u8&lt;/span>, ParseIntErr&lt;span style="color:#f92672">&amp;gt;&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">INPUT&lt;/span>.parse() &lt;span style="color:#66d9ef">else&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> compile_error!(&lt;span style="color:#e6db74">&amp;#34;Invalid input&amp;#34;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> };
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">INPUT&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#66d9ef">RANDOM&lt;/span>: &lt;span style="color:#66d9ef">u8&lt;/span> &lt;span style="color:#f92672">=&lt;/span> env!(&lt;span style="color:#e6db74">&amp;#34;RANDOM&amp;#34;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">loop&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#66d9ef">INPUT&lt;/span> &lt;span style="color:#f92672">=&lt;/span> input();
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">if&lt;/span> &lt;span style="color:#66d9ef">INPUT&lt;/span> &lt;span style="color:#f92672">==&lt;/span> &lt;span style="color:#66d9ef">RANDOM&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">break&lt;/span>; &lt;span style="color:#75715e">// continue compilation
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span> } &lt;span style="color:#66d9ef">else&lt;/span> &lt;span style="color:#66d9ef">if&lt;/span> &lt;span style="color:#66d9ef">INPUT&lt;/span> &lt;span style="color:#f92672">&amp;lt;&lt;/span> &lt;span style="color:#66d9ef">RANDOM&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> compile_warn!(&lt;span style="color:#e6db74">&amp;#34;input is smaller&amp;#34;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> } &lt;span style="color:#66d9ef">else&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> compile_warn!(&lt;span style="color:#e6db74">&amp;#34;input is bigger&amp;#34;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">fn&lt;/span> &lt;span style="color:#a6e22e">main&lt;/span>() {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e">// Empty. I am useless. I strike!
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>If it weren&amp;rsquo;t for &lt;code>fn main&lt;/code> starting a strike and stopping compilation, this would have worked! Quite bold of &lt;code>fn main&lt;/code> to just start a strike, even though there&amp;rsquo;s no &lt;code>union&lt;/code> in the entire program. But we really need it, it&amp;rsquo;s not a disposable worker.&lt;/p>
&lt;p>And then, last and least, I want to highlight one of my favourite consequences of this: &lt;code>struct else&lt;/code>&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-rust" data-lang="rust">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">struct&lt;/span> Some(Test) &lt;span style="color:#f92672">=&lt;/span> None &lt;span style="color:#66d9ef">else&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> compile_error!(&lt;span style="color:#e6db74">&amp;#34;didn&amp;#39;t match pattern&amp;#34;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>};
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;!-- raw HTML omitted -->you&amp;rsquo;re asking yourself what you just read. meanwhile, i am asking myself what i just wrote. we are very similar.&lt;!-- raw HTML omitted -->&lt;/p></content></item><item><title>Box Is a Unique Type</title><link>/posts/box-is-a-unique-type/</link><pubDate>Sat, 23 Jul 2022 00:00:00 +0000</pubDate><guid>/posts/box-is-a-unique-type/</guid><description>We have all used Box&amp;lt;T&amp;gt; before in our Rust code. It&amp;rsquo;s a glorious type, with great ergonomics and flexibility. We can use it to put our values on the heap, but it can do even more than that!
struct Fields { a: String, b: String, } let fields = Box::new(Fields { a: &amp;#34;a&amp;#34;.to_string(), b: &amp;#34;b&amp;#34;.to_string() }); let a = fields.a; let b = fields.b; This kind of partial deref move is just one of the spectacular magic tricks box has up its sleeve, and they exist for good reason: They are very useful.</description><content>&lt;p>We have all used &lt;code>Box&amp;lt;T&amp;gt;&lt;/code> before in our Rust code. It&amp;rsquo;s a glorious type, with great ergonomics
and flexibility. We can use it to put our values on the heap, but it can do even more
than that!&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-rust" data-lang="rust">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">struct&lt;/span> &lt;span style="color:#a6e22e">Fields&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> a: String,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> b: String,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">let&lt;/span> fields &lt;span style="color:#f92672">=&lt;/span> Box::new(Fields {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> a: &lt;span style="color:#e6db74">&amp;#34;a&amp;#34;&lt;/span>.to_string(),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> b: &lt;span style="color:#e6db74">&amp;#34;b&amp;#34;&lt;/span>.to_string()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>});
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">let&lt;/span> a &lt;span style="color:#f92672">=&lt;/span> fields.a;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">let&lt;/span> b &lt;span style="color:#f92672">=&lt;/span> fields.b;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This kind of partial deref move is just one of the spectacular magic tricks box has up its sleeve,
and they exist for good reason: They are very useful. Sadly we have not yet found a way to generalize all
of these to user types as well. Too bad!&lt;/p>
&lt;p>Anyways, this post is about one particularly subtle magic aspect of box. For this, we need to dive
deep into unsafe code, so let&amp;rsquo;s get our hazmat suits on and jump in!&lt;/p>
&lt;h1 id="an-interesting-optimization">An interesting optimization&lt;/h1>
&lt;p>We have this code here:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-rust" data-lang="rust">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">fn&lt;/span> &lt;span style="color:#a6e22e">takes_box_and_ptr_to_it&lt;/span>(&lt;span style="color:#66d9ef">mut&lt;/span> b: Box&lt;span style="color:#f92672">&amp;lt;&lt;/span>&lt;span style="color:#66d9ef">u8&lt;/span>&lt;span style="color:#f92672">&amp;gt;&lt;/span>, ptr: &lt;span style="color:#f92672">*&lt;/span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#66d9ef">u8&lt;/span>) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">let&lt;/span> value &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">unsafe&lt;/span> { &lt;span style="color:#f92672">*&lt;/span>ptr };
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">*&lt;/span>b &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#ae81ff">5&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">let&lt;/span> value2 &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">unsafe&lt;/span> { &lt;span style="color:#f92672">*&lt;/span>ptr };
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> assert_ne!(value, value2);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">let&lt;/span> b &lt;span style="color:#f92672">=&lt;/span> Box::new(&lt;span style="color:#ae81ff">0&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">let&lt;/span> ptr: &lt;span style="color:#f92672">*&lt;/span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#66d9ef">u8&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#f92672">&amp;amp;*&lt;/span>b;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>takes_box_and_ptr_to_it(b, ptr);
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>There&amp;rsquo;s a function, &lt;code>takes_box_and_ptr_to_it&lt;/code>, that takes a box and a pointer as parameters. Then,
it reads a value from the pointer, writes to the box, and reads a value again. It then asserts that
the two values aren&amp;rsquo;t equal. How can they not be equal? If our box and pointer point to the same
location in memory, writing to the box will cause the pointer to read the new value.&lt;/p>
&lt;p>Now construct a box, get a pointer to it, and pass the two to the function. Run the program&amp;hellip;&lt;/p>
&lt;p>&amp;hellip; and everything is fine. Let&amp;rsquo;s run it in release mode. This should work as well, since the optimizer
isn&amp;rsquo;t allowed to change observable behaviour, and an assert is very observable. Run the program&amp;hellip;&lt;/p>
&lt;pre tabindex="0">&lt;code>thread &amp;#39;main&amp;#39; panicked at &amp;#39;assertion failed: `(left != right)`
left: `0`,
right: `0`&amp;#39;, src/main.rs:5:5
&lt;/code>&lt;/pre>&lt;p>Hmm. That&amp;rsquo;s not what I told you would happen. Is the compiler broken? Is this a miscompilation?
I&amp;rsquo;ve heard that those do sometimes happen, right?&lt;/p>
&lt;p>Trusting our instincts that &amp;ldquo;it&amp;rsquo;s never a miscompilation until it is one&amp;rdquo;, we assume that LLVM behaved
well here. But what allows it to make this optimization? Taking a look at the generated LLVM-IR (by using
&lt;code>--emit llvm-ir -O&lt;/code>, the &lt;code>-O&lt;/code> is important since rustc only emits these attributes with optimizations on)
reveals the solution: (severely shortened to only show the relevant parts)&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-llvmir" data-lang="llvmir">define void @takes_box_and_ptr_to_it(i8* noalias %0, i8* %ptr) {
&lt;/code>&lt;/pre>&lt;p>See the little attribute on the first parameter called &lt;code>noalias&lt;/code>? That&amp;rsquo;s what&amp;rsquo;s doing the magic here.
&lt;code>noalias&lt;/code> is an LLVM attribute on pointers that allows for various optimizations. If there are two pointers,
and at least one of them is &lt;code>noalias&lt;/code>, there are some restrictions around the two. Approximately:&lt;/p>
&lt;ul>
&lt;li>If one of them writes, they must not point to the same value (alias each other)&lt;/li>
&lt;li>If neither of them writes, they can alias just fine.&lt;/li>
&lt;/ul>
&lt;p>Therefore, we also apply &lt;code>noalias&lt;/code> to &lt;code>&amp;amp;mut T&lt;/code> and &lt;code>&amp;amp;T&lt;/code> (if it doesn&amp;rsquo;t contain interior mutability through
&lt;code>UnsafeCell&amp;lt;T&amp;gt;&lt;/code>), since they uphold these rules.&lt;/p>
&lt;p>For more info on &lt;code>noalias&lt;/code>, see &lt;a href="https://llvm.org/docs/LangRef.html#parameter-attributes">LLVM&amp;rsquo;s LangRef&lt;/a>.&lt;/p>
&lt;p>This might sound familiar to you if you&amp;rsquo;re a viewer of &lt;a href="https://twitter.com/jonhoo">Jon Gjengset&lt;/a>&amp;rsquo;s content (which I can highly recommend). Jon has made an entire video about this before since his crate &lt;code>left-right&lt;/code>
was affected by this (&lt;a href="https://youtu.be/EY7Wi9fV5bk">https://youtu.be/EY7Wi9fV5bk&lt;/a>).&lt;/p>
&lt;p>If you&amp;rsquo;re looking for &lt;em>any&lt;/em> hint that using box emits &lt;code>noalias&lt;/code>, you have to look no further than the documentation
for &lt;a href="https://doc.rust-lang.org/nightly/std/boxed/index.html#considerations-for-unsafe-code">&lt;code>std::boxed&lt;/code>&lt;/a>. Well, the nightly or beta docs, because I only added this section very recently. For years, this behaviour was not really documented, and you had to
belong to the arcane circles of the select few who were aware of it. So lots of code was written thinking that box was &amp;ldquo;just an
RAII pointer&amp;rdquo; (a pointer that allocates the value in the constructor and deallocates it in the destructor on drop) as far as
pointers are concerned.&lt;/p>
&lt;h1 id="stacked-borrows-and-miri">Stacked Borrows and Miri&lt;/h1>
&lt;p>So, LLVM was completely correct in optimizing our code to make the assert fail. But what exactly gave it permission to do so?
Undefined Behaviour (UB for short). Undefined behaviour is at the root of many modern compiler optimizations. But what is undefined behaviour?
UB represents a contract between the program and the compiler. The compiler assumes that UB will not happen, and can therefore optimize based
on these assumptions. Examples of UB include use-after-free, out-of-bounds reads, and data races. If UB is executed, &lt;em>anything&lt;/em> can happen,
including segmentation faults, silent memory corruption, leakage of private keys or exactly what you intended to happen.&lt;/p>
&lt;p>&lt;a href="https://github.com/rust-lang/miri">Miri&lt;/a> is an interpreter for Rust code with the goal of finding undefined behaviour in Rust. I cannot recommend Miri
highly enough for all unsafe code you&amp;rsquo;re writing (sadly support for some IO functions and FFI is still lacking, and it&amp;rsquo;s still very slow).&lt;/p>
&lt;p>So, let&amp;rsquo;s see whether our code contains UB. It has to, since otherwise the optimizer wouldn&amp;rsquo;t be allowed to change
observable behaviour (since the assert doesn&amp;rsquo;t fail in debug mode). &lt;code>$ cargo miri run&lt;/code>&amp;hellip;&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-rust,ignore" data-lang="rust,ignore">error: Undefined Behavior: attempting a read access using &amp;lt;3314&amp;gt; at alloc1722[0x0], but that tag does not exist in the borrow stack for this location
--&amp;gt; src/main.rs:2:26
|
2 | let value = unsafe { *ptr };
| ^^^^
| |
| attempting a read access using &amp;lt;3314&amp;gt; at alloc1722[0x0], but that tag does not exist in the borrow stack for this location
| this error occurs as part of an access at alloc1722[0x0..0x1]
|
= help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
= help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: &amp;lt;3314&amp;gt; was created by a retag at offsets [0x0..0x1]
--&amp;gt; src/main.rs:10:26
|
10 | let ptr: *const u8 = &amp;amp;*b;
| ^^^
help: &amp;lt;3314&amp;gt; was later invalidated at offsets [0x0..0x1]
--&amp;gt; src/main.rs:12:29
|
12 | takes_box_and_ptr_to_it(b, ptr);
| ^
= note: backtrace:
= note: inside `takes_box_and_ptr_to_it` at src/main.rs:2:26
note: inside `main` at src/main.rs:12:5
--&amp;gt; src/main.rs:12:5
|
12 | takes_box_and_ptr_to_it(b, ptr);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
&lt;/code>&lt;/pre>&lt;p>This behaviour does indeed not look very defined at all. But what went wrong? There&amp;rsquo;s a lot of information here.&lt;/p>
&lt;p>First of all, it says that we attempted a read access, and that this access failed because the tag does not exist in the
borrow stack of the byte that was accessed. This is something about stacked borrows, the experimental memory model for Rust
that is implemented in Miri. For an excellent introduction, see this part of the great book &lt;a href="https://rust-unofficial.github.io/too-many-lists/fifth-stacked-borrows.html">Learning Rust With Entirely Too Many Linked Lists&lt;/a>.&lt;/p>
&lt;p>In short: each pointer has a unique tag attached to it. Each byte in memory has its own &amp;lsquo;borrow stack&amp;rsquo; of these tags,
and only the pointers that have their tag in the stack are allowed to access it. Tags can be pushed and popped from the stack through various operations, for example, borrowing.&lt;/p>
&lt;p>In the code example above, we get a nice little hint where the tag was created. When we created a reference (that was then
coerced into a raw pointer) from our box, it got a new tag called &lt;code>&amp;lt;3314&amp;gt;&lt;/code>. Then, when we moved the box into the function,
something happened: The tag was popped off the borrow stack and therefore invalidated. That&amp;rsquo;s because box invalidates all tags
when it&amp;rsquo;s moved. We then tried to read with that invalidated tag anyways - undefined behaviour happened!&lt;/p>
&lt;p>And that&amp;rsquo;s how our code wasn&amp;rsquo;t a miscompilation, but undefined behaviour. Quite surprising, isn&amp;rsquo;t it?&lt;/p>
&lt;h1 id="noalias-nothanks">noalias, nothanks&lt;/h1>
&lt;p>Many people, myself included, don&amp;rsquo;t think that this is a good thing.&lt;/p>
&lt;p>First of all, it introduces more UB that could have been defined behaviour instead. This is true for almost all UB, but usually,
there is something gained from the UB that justifies it. We will look at this later. But allowing such behaviour is fairly easy:
If box didn&amp;rsquo;t invalidate pointers on move and instead behaved like a normal raw pointer, the code above would be sound.&lt;/p>
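&lt;p>As a rough sketch of that claim (not code from the standard library or from any crate mentioned here), this is the same read-write-read pattern with the box turned into a plain raw pointer through &lt;code>Box::into_raw&lt;/code>. The names &lt;code>takes_two_raw_ptrs&lt;/code> and &lt;code>read_write_read&lt;/code> are made up for illustration. Raw pointers carry no uniqueness assumption, so the assert should hold in debug and release mode alike:&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-rust" data-lang="rust">// Hypothetical raw-pointer variant of the function above: both parameters
// are raw pointers, so neither of them is assumed to be unique.
fn takes_two_raw_ptrs(b: *mut u8, ptr: *const u8) -> (u8, u8) {
    let value = unsafe { *ptr };
    unsafe { *b = 5 };
    let value2 = unsafe { *ptr };
    (value, value2)
}

// Allocates a byte on the heap, runs the function, and frees the byte again.
fn read_write_read() -> (u8, u8) {
    // Give up the box and keep only a raw pointer to its allocation.
    let raw: *mut u8 = Box::into_raw(Box::new(0u8));
    // A second raw pointer to the same byte, copied from the first one.
    let ptr: *const u8 = raw;
    let values = takes_two_raw_ptrs(raw, ptr);
    // Rebuild the box once the raw pointers are no longer used, so the byte is freed.
    drop(unsafe { Box::from_raw(raw) });
    values
}

fn main() {
    let (value, value2) = read_write_read();
    assert_ne!(value, value2);
}
&lt;/code>&lt;/pre>&lt;p>The price is that the allocation has to be freed by hand with &lt;code>Box::from_raw&lt;/code>, which is exactly the convenience that box normally provides.&lt;/p>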
&lt;p>But more importantly, this is not behaviour generally expected by users. While it can be argued that box is like a &lt;code>T&lt;/code>, but on
the heap, and therefore moving it should invalidate pointers, since moving &lt;code>T&lt;/code> definitely has to invalidate pointers to it,
this comparison doesn&amp;rsquo;t make sense to me. While &lt;code>Box&amp;lt;T&amp;gt;&lt;/code> usually behaves like a &lt;code>T&lt;/code>, it&amp;rsquo;s just a pointer. Writers of unsafe
code &lt;em>know&lt;/em> that box is just a pointer and will abuse that knowledge, accidentally causing UB with it. While this can be
mitigated with better docs and teaching, like how no one questions the uniqueness of &lt;code>&amp;amp;mut T&lt;/code> (maybe that&amp;rsquo;s also because that
one makes intuitive sense, &amp;ldquo;shared xor mutable&amp;rdquo; is a simple concept), I think it will always be a problem,
because in my opinion, box being unique and invalidating pointers on move is simply not intuitive.&lt;/p>
&lt;p>When a box is moved, the pointer bytes change their location in memory. But the bytes the box points to stay the same. They don&amp;rsquo;t
move in memory. This is the fundamental missing intuition about the box behaviour.&lt;/p>
&lt;p>There are also other reasons why the box behaviour is not desirable. Even people who know about the behaviour of box will want
to write code that goes directly against this behaviour at some point. But usually, fixing it is pretty simple: Storing a raw
pointer (or &lt;code>NonNull&amp;lt;T&amp;gt;&lt;/code>) instead of a box, and using the constructor and drop to allocate and deallocate the backing box.
This is fairly inconvenient but totally acceptable. There are bigger problems though. There are crates like &lt;code>owning_ref&lt;/code>
that want to expose a generic interface over any type. Users like to choose box, and sometimes &lt;em>have&lt;/em> to choose box because of
other box-exclusive features it offers. Even worse is &lt;code>string_cache&lt;/code>, which is extremely hard to fix.&lt;/p>
&lt;p>Then last but not least, there&amp;rsquo;s the opinionated fact that &lt;code>Box&amp;lt;T&amp;gt;&lt;/code> shall be implementable entirely in user code. While we are
many missing language features away from this being the case, the &lt;code>noalias&lt;/code> case is also magic descended upon box itself, with no
user code ever having access to it.&lt;/p>
&lt;p>There are several arguments in favour of box being unique and special cased here. To negate the last argument above, it can
be said that &lt;code>Box&amp;lt;T&amp;gt;&lt;/code> &lt;em>is&lt;/em> a very special type. It&amp;rsquo;s just like a &lt;code>T&lt;/code>, but on the heap. Using this mental model, it&amp;rsquo;s very easy to
justify all the box magic and its unique behaviour. But in my opinion, this is not a useful mental model regarding unsafe code,
and I prefer the mental model of &amp;ldquo;reference that manages its own lifetime&amp;rdquo;, which doesn&amp;rsquo;t imply uniqueness.&lt;/p>
&lt;p>But there are also crates on &lt;a href="https://crates.io/">crates.io&lt;/a> like &lt;a href="https://crates.io/crates/aliasable">aliasable&lt;/a> that already
provide an aliasable version of &lt;code>Box&amp;lt;T&amp;gt;&lt;/code>, which is used by the self-referential type helper crate &lt;a href="https://crates.io/crates/ouroboros">ouroboros&lt;/a>.
So if box stayed unique, people could also just pick up that crate as a dependency and use the aliasable box from there instead of
having to write their own. Interestingly, this crate also provides a &lt;code>Vec&amp;lt;T&amp;gt;&lt;/code>, even though &lt;code>Vec&amp;lt;T&amp;gt;&lt;/code> can currently be aliased just fine, both in practice and
in the current version of stacked borrows. It&amp;rsquo;s not clear whether we want to keep it like this, but I
don&amp;rsquo;t think this can reasonably be changed.&lt;/p>
&lt;blockquote>
&lt;p>One thing was just pointed out to me after releasing the post: Mutation usually goes through &lt;code>&amp;amp;mut T&lt;/code> anyways, even when the value
is stored as a &lt;code>Box&amp;lt;T&amp;gt;&lt;/code>. Therefore, all the guarantees of uniqueness are already present when mutating boxes, making the uniqueness
of box even less important.&lt;/p>
&lt;/blockquote>
&lt;h1 id="noalias-noslow">noalias, noslow&lt;/h1>
&lt;p>There is one clear potential benefit from this box behaviour: ✨Optimizations✨. &lt;code>noalias&lt;/code> doesn&amp;rsquo;t exist for fun, it&amp;rsquo;s something
that can bring clear performance wins (for &lt;code>noalias&lt;/code> on &lt;code>&amp;amp;mut T&lt;/code>, those were measurable). So the only question remains:
&lt;strong>How much performance does &lt;code>noalias&lt;/code> on &lt;code>Box&amp;lt;T&amp;gt;&lt;/code> give us now, and how many potential performance improvements could we get in the
future?&lt;/strong> For the latter, there is no simple answer. For the former, there is. &lt;code>rustc&lt;/code> has &lt;a href="https://github.com/rust-lang/rust/pull/99527">&lt;em>no&lt;/em> performance improvements&lt;/a>
from being compiled with &lt;code>noalias&lt;/code> on &lt;code>Box&amp;lt;T&amp;gt;&lt;/code>, but this isn&amp;rsquo;t really representative since rustc mostly uses arenas instead of box internally.&lt;/p>
&lt;p>I have also benchmarked a few crates from the ecosystem with and without noalias on box, and the &lt;a href="https://gist.github.com/Nilstrieb/9a0751fb9fd1044a30ab55cef9a7d335">results&lt;/a>
were inconclusive. (At the time of writing, only regex-syntax, tokio, and syn have been benchmarked.) regex-syntax showed no changes. Tokio showed a few improvements without noalias
which is very weird, so maybe the benchmarks aren&amp;rsquo;t really good or something else was going on. And syn tended towards minor regressions without noalias, but the benchmarks had high
jitter so no real conclusion can be reached from this either, at least in my eyes, but I don&amp;rsquo;t have a lot of experience with benchmarks. Therefore, I would love for more people
to benchmark more crates, especially if you have more experience with benchmarks.&lt;/p>
&lt;h1 id="a-way-forward">a way forward&lt;/h1>
&lt;p>Based on all of this, I do have a few solutions. First of all, I think that even if there might be some small performance regressions, they are not significant enough
to justify box&amp;rsquo;s uniqueness. Unsafe code wants to use box, and it is reasonable to do so. Therefore I propose to completely remove all uniqueness from &lt;code>Box&amp;lt;T&amp;gt;&lt;/code> and treat it
just like a &lt;code>*const T&lt;/code> for aliasing. This will make it more predictable for unsafe code and is a step forward towards less magic from &lt;code>Box&amp;lt;T&amp;gt;&lt;/code>.&lt;/p>
&lt;p>But the performance cost may be real, and especially the future optimization value can&amp;rsquo;t be certain. The current uniqueness guarantees of box
are very strong, and still giving code an option to obtain these seems useful. One possibility would be for code to use a
&lt;code>&amp;amp;'static mut T&lt;/code> that is unleaked for drop, but the semantics of this are still &lt;a href="https://github.com/rust-lang/unsafe-code-guidelines/issues/316">unclear&lt;/a>.
If that is not possible, exposing &lt;code>std::ptr::Unique&lt;/code> (with it getting box&amp;rsquo;s aliasing semantics) could be desirable. For this, all existing usages of &lt;code>Unique&lt;/code>
inside the standard library would have to be removed. We could also offer a &lt;code>std::boxed::UniqueBox&lt;/code> that keeps the current semantics, but this would also bring direct aliasing
decisions more towards safe code, which I am not a huge fan of. Ownership is enough already.&lt;/p>
&lt;p>I guess what I am wishing for are some good and flexible raw pointer types. But that&amp;rsquo;s still in the stars&amp;hellip;&lt;/p>
&lt;p>For more information about this topic, see &lt;a href="https://github.com/rust-lang/unsafe-code-guidelines/issues/326">https://github.com/rust-lang/unsafe-code-guidelines/issues/326&lt;/a>&lt;/p>
&lt;p>&lt;em>Thanks to the nice people on the Rust Community Discord for their feedback on the draft of this post!&lt;/em>&lt;/p></content></item></channel></rss>