mirror of
https://github.com/Noratrieb/blog.git
synced 2026-01-14 20:35:02 +01:00
deploy: c73b099eb7
parent cd04a656d9
commit 690a515f92
5 changed files with 46 additions and 46 deletions
18
index.xml
@ -38,7 +38,7 @@ the two values aren’t equal. How can they not be equal? If our box and
location in memory, writing to the box will cause the pointer to read the new value.</p>
<p>Now construct a box, get a pointer to it, and pass the two to the function. Run the program&hellip;</p>
<p>&hellip; and everything is fine. Let&rsquo;s run it in release mode. This should work as well, since the optimizer
isn&rsquo;t allowed to change observable behaviour, and an assert is very observable. Run the progrm&hellip;</p>
isn&rsquo;t allowed to change observable behaviour, and an assert is very observable. Run the program&hellip;</p>
<pre tabindex="0"><code>thread &#39;main&#39; panicked at &#39;assertion failed: `(left != right)`
left: `0`,
right: `0`&#39;, src/main.rs:5:5
@ -59,12 +59,12 @@ Therefore, we also apply <code>noalias</code> to <code>&amp;mut T&l
<code>UnsafeCell&lt;T&gt;</code>), since they uphold these rules.</li>
</ul>
<p>For more info on <code>noalias</code>, see <a href="https://llvm.org/docs/LangRef.html#parameter-attributes">LLVMs LangRef</a>.</p>
<p>This might sound familiar to you if you&rsquo;re a viewer of <a href="https://twitter.com/jonhoo">Jon Gjengset</a>&rsquo;s content (which I can highly recommend). Jon has made an entire video about this before, since his crate <code>left-right</code>
<p>This might sound familiar to you if you&rsquo;re a viewer of <a href="https://twitter.com/jonhoo">Jon Gjengset</a>&rsquo;s content (which I can highly recommend). Jon has made an entire video about this before since his crate <code>left-right</code>
was affected by this (<a href="https://youtu.be/EY7Wi9fV5bk)">https://youtu.be/EY7Wi9fV5bk)</a>.</p>
<p>If you&rsquo;re looking for <em>any</em> hint that using box emits <code>noalias</code>, you have to look no further than the documentation
for <a href="https://doc.rust-lang.org/nightly/std/boxed/index.html#considerations-for-unsafe-code"><code>std::boxed</code></a>. Well, the nightly or beta docs, because I only added this section very recently. For years, this behaviour was not really documented, and you had to
belong to the arcane circles of the select few who were aware of it. So lots of code was written thinking that box was &ldquo;just an
RAII pointer&rdquo; (a pointer that allocates the value in the constructor, and deallocates it in the destructor on drop) for all
RAII pointer&rdquo; (a pointer that allocates the value in the constructor and deallocates it in the destructor on drop) for all
pointers are concerned.</p>
<h1 id="stacked-borrows-and-miri">Stacked Borrows and Miri</h1>
<p>So, LLVM was completely correct in optimizing our code to make the assert fail. But what exactly gave it permission to do so?
@ -109,7 +109,7 @@ note: inside `main` at src/main.rs:12:5
borrow stack of the byte that was accessed. This is something about stacked borrows, the experimental memory model for Rust
that is implemented in Miri. For an excellent introduction, see this part of the great book <a href="https://rust-unofficial.github.io/too-many-lists/fifth-stacked-borrows.html">Learning Rust With Entirely Too Many Linked Lists</a>.</p>
<p>In short: each pointer has a unique tag attached to it. Each byte in memory has its own &lsquo;borrow stack&rsquo; of these tags,
and only the pointers that have their tag in the stack are allowed to access it. Tags can be pushed and popped from the stack through various operations, for example borrowing.</p>
and only the pointers that have their tag in the stack are allowed to access it. Tags can be pushed and popped from the stack through various operations, for example, borrowing.</p>
<p>In the code example above, we get a nice little hint where the tag was created. When we created a reference (that was then
coerced into a raw pointer) from our box, it got a new tag called <code>&lt;3314&gt;</code>. Then, when we moved the box into the function,
something happened: The tag was popped off the borrow stack and therefore invalidated. That&rsquo;s because box invalidates all tags
@ -123,7 +123,7 @@ If box didn&rsquo;t invalidate pointers on move and instead behaved like a n
<p>But more importantly, this is not behaviour generally expected by users. While it can be argued that box is like a <code>T</code>, but on
the heap, and therefore moving it should invalidate pointers, since moving <code>T</code> definitely has to invalidate pointers to it,
this comparison doesn&rsquo;t make sense to me. While <code>Box&lt;T&gt;</code> usually behaves like a <code>T</code>, it&rsquo;s just a pointer. Writers of unsafe
code <em>know</em> that box is just a pointer, and will abuse that knowledge, accidentally causing UB with it. While this can be
code <em>know</em> that box is just a pointer and will abuse that knowledge, accidentally causing UB with it. While this can be
mitigated with better docs and teaching, like how no one questions the uniqueness of <code>&amp;mut T</code> (maybe that&rsquo;s also because that
one makes intuitive sense, &ldquo;shared xor mutable&rdquo; is a simple concept), I think it will always be a problem,
because in my opinion, box being unique and invalidating pointers on move is simply not intuitive.</p>
@ -132,8 +132,8 @@ move in memory. This is the fundamental missing intuition about the box behaviou
<p>There are also other reasons why the box behaviour is not desirable. Even people who know about the behaviour of box will want
to write code that goes directly against this behaviour at some point. But usually, fixing it is pretty simple: Storing a raw
pointer (or <code>NonNull&lt;T&gt;</code>) instead of a box, and using the constructor and drop to allocate and deallocate the backing box.
This is fairly inconvenient, but totally acceptable. There are bigger problems though. There are crates like <code>owning_ref</code>
that want to expose a generic interface over any type. Users like to choose box, and sometimes <em>have</em> to chose box because of
This is fairly inconvenient but totally acceptable. There are bigger problems though. There are crates like <code>owning_ref</code>
that want to expose a generic interface over any type. Users like to choose box, and sometimes <em>have</em> to choose box because of
other box-exclusive features it offers. Even worse is <code>string_cache</code>, which is extremely hard to fix.</p>
<p>Then last but not least, there&rsquo;s the opinionated fact that <code>Box&lt;T&gt;</code> shall be implementable entirely in user code. While we are
many missing language features away from this being the case, the <code>noalias</code> case is also magic descended upon box itself, with no
@ -161,8 +161,8 @@ jitter so no real conclusion can be reached from this either, at least in my eye
to benchmark more crates, especially if you have more experience with benchmarks.</p>
<h1 id="a-way-forward">a way forward</h1>
<p>Based on all of this, I do have a few solutions. First of all, I think that even if there might be some small performance regressions, they are not significant enough
to justify boxes uniqueness. Unsafe code wants to use box, and it is reasonable to do so. Therefore I propose to completely remove all uniqueness from <code>Box&lt;T&gt;</code>, and treat it
just like a <code>*const T</code> for the purposes of aliasing. This will make it more predictable for unsafe code, and is a step forward towards less magic from <code>Box&lt;T&gt;</code>.</p>
to justify boxes uniqueness. Unsafe code wants to use box, and it is reasonable to do so. Therefore I propose to completely remove all uniqueness from <code>Box&lt;T&gt;</code> and treat it
just like a <code>*const T</code> for aliasing. This will make it more predictable for unsafe code and is a step forward towards less magic from <code>Box&lt;T&gt;</code>.</p>
<p>But the performance cost may be real, and especially the future optimization value can&rsquo;t be certain. The current uniqueness guarantees of box
are very strong, and still giving code an option to obtain these seems useful. One possibility would be for code to use a
<code>&amp;'static mut T</code> that is unleaked for drop, but the semantics of this are still <a href="https://github.com/rust-lang/unsafe-code-guidelines/issues/316">unclear</a>.
@ -1,6 +1,6 @@
<!doctype html><html lang=en><head><title>Box Is a Unique Type :: nilstriebs blog</title><meta charset=utf-8><meta name=viewport content="width=device-width,initial-scale=1"><meta name=description content="About better aliasing semantics for `Box`"><meta name=keywords content="box,noalias"><meta name=robots content="noodp"><link rel=canonical href=/posts/box-is-a-unique-type/><link rel=stylesheet href=/assets/style.css><link rel=apple-touch-icon href=/img/apple-touch-icon-192x192.png><link rel="shortcut icon" href=/img/favicon/orange.png><meta name=twitter:card content="summary"><meta property="og:locale" content="en"><meta property="og:type" content="article"><meta property="og:title" content="Box Is a Unique Type"><meta property="og:description" content="About better aliasing semantics for `Box`"><meta property="og:url" content="/posts/box-is-a-unique-type/"><meta property="og:site_name" content="nilstriebs blog"><meta property="og:image" content="/img/favicon/orange.png"><meta property="og:image:width" content="2048"><meta property="og:image:height" content="1024"><meta property="article:published_time" content="2022-07-23 00:00:00 +0000 UTC"></head><body class=orange><div class="container center headings--one-size"><header class=header><div class=header__inner><div class=header__logo><a href=/><div class=logo>nilstriebs blog</div></a></div></div></header><div class=content><div class=post><h1 class=post-title><a href=/posts/box-is-a-unique-type/>Box Is a Unique Type</a></h1><div class=post-meta><span class=post-date>2022-07-23</span>
<span class=post-author>:: Nilstrieb</span>
<span class=post-reading-time>:: 12 min read (2370 words)</span></div><span class=post-tags>#<a href=/tags/rust/>rust</a>
<span class=post-reading-time>:: 12 min read (2367 words)</span></div><span class=post-tags>#<a href=/tags/rust/>rust</a>
#<a href=/tags/unsafe-code/>unsafe code</a> </span><div class=post-content><div><p>We have all used <code>Box<T></code> before in our Rust code. It’s a glorious type, with great ergonomics
and flexibility. We can use it to put our values on the heap, but it can do even more
than that!</p><div class=highlight><pre tabindex=0 style=color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4><code class=language-rust data-lang=rust><span style=display:flex><span><span style=color:#66d9ef>struct</span> <span style=color:#a6e22e>Fields</span> {
@ -33,7 +33,7 @@ deep into unsafe code, so let’s get our hazmat suits on and jump in!</p><h
it reads a value from the pointer, writes to the box, and reads a value again. It then asserts that
the two values aren’t equal. How can they not be equal? If our box and pointer point to the same
location in memory, writing to the box will cause the pointer to read the new value.</p><p>Now construct a box, get a pointer to it, and pass the two to the function. Run the program…</p><p>… and everything is fine. Let’s run it in release mode. This should work as well, since the optimizer
isn’t allowed to change observable behaviour, and an assert is very observable. Run the progrm…</p><pre tabindex=0><code>thread 'main' panicked at 'assertion failed: `(left != right)`
isn’t allowed to change observable behaviour, and an assert is very observable. Run the program…</p><pre tabindex=0><code>thread 'main' panicked at 'assertion failed: `(left != right)`
left: `0`,
right: `0`', src/main.rs:5:5
</code></pre><p>Hmm. That’s not what I’ve told would happen. Is the compiler broken? Is this a miscompilation?
@ -45,11 +45,11 @@ reveals the solution: (severely shortened to only show the relevant parts)</p><p
<code>noalias</code> is an LLVM attribute on pointers that allows for various optimizations. If there are two pointers,
and at least one of them is <code>noalias</code>, there are some restrictions around the two. Approximately:</p><ul><li>If one of them writes, they must not point to the same value (alias each other)</li><li>If neither of them writes, they can alias just fine.
Therefore, we also apply <code>noalias</code> to <code>&mut T</code> and <code>&T</code> (if it doesn’t contain interior mutability through
<code>UnsafeCell<T></code>), since they uphold these rules.</li></ul><p>For more info on <code>noalias</code>, see <a href=https://llvm.org/docs/LangRef.html#parameter-attributes>LLVMs LangRef</a>.</p><p>This might sound familiar to you if you’re a viewer of <a href=https://twitter.com/jonhoo>Jon Gjengset</a>’s content (which I can highly recommend). Jon has made an entire video about this before, since his crate <code>left-right</code>
<code>UnsafeCell<T></code>), since they uphold these rules.</li></ul><p>For more info on <code>noalias</code>, see <a href=https://llvm.org/docs/LangRef.html#parameter-attributes>LLVMs LangRef</a>.</p><p>This might sound familiar to you if you’re a viewer of <a href=https://twitter.com/jonhoo>Jon Gjengset</a>’s content (which I can highly recommend). Jon has made an entire video about this before since his crate <code>left-right</code>
was affected by this (<a href=https://youtu.be/EY7Wi9fV5bk)>https://youtu.be/EY7Wi9fV5bk)</a>.</p><p>If you’re looking for <em>any</em> hint that using box emits <code>noalias</code>, you have to look no further than the documentation
for <a href=https://doc.rust-lang.org/nightly/std/boxed/index.html#considerations-for-unsafe-code><code>std::boxed</code></a>. Well, the nightly or beta docs, because I only added this section very recently. For years, this behaviour was not really documented, and you had to
belong to the arcane circles of the select few who were aware of it. So lots of code was written thinking that box was “just an
RAII pointer” (a pointer that allocates the value in the constructor, and deallocates it in the destructor on drop) for all
RAII pointer” (a pointer that allocates the value in the constructor and deallocates it in the destructor on drop) for all
pointers are concerned.</p><h1 id=stacked-borrows-and-miri>Stacked Borrows and Miri<a href=#stacked-borrows-and-miri class=hanchor arialabel=Anchor>⌗</a></h1><p>So, LLVM was completely correct in optimizing our code to make the assert fail. But what exactly gave it permission to do so?
Undefined Behaviour (UB for short). Undefined behaviour is at the root of many modern compiler optimizations. But what is undefined behaviour?
UB represents a contract between the program and the compiler. The compiler assumes that UB will not happen, and can therefore optimize based
@ -87,7 +87,7 @@ note: inside `main` at src/main.rs:12:5
</code></pre><p>This behaviour does indeed not look very defined at all. But what went wrong? There’s a lot of information here.</p><p>First of all, it says that we attempted a read access, and that this access failed because the tag does not exist in the
borrow stack of the byte that was accessed. This is something about stacked borrows, the experimental memory model for Rust
that is implemented in Miri. For an excellent introduction, see this part of the great book <a href=https://rust-unofficial.github.io/too-many-lists/fifth-stacked-borrows.html>Learning Rust With Entirely Too Many Linked Lists</a>.</p><p>In short: each pointer has a unique tag attached to it. Each byte in memory has its own ‘borrow stack’ of these tags,
and only the pointers that have their tag in the stack are allowed to access it. Tags can be pushed and popped from the stack through various operations, for example borrowing.</p><p>In the code example above, we get a nice little hint where the tag was created. When we created a reference (that was then
and only the pointers that have their tag in the stack are allowed to access it. Tags can be pushed and popped from the stack through various operations, for example, borrowing.</p><p>In the code example above, we get a nice little hint where the tag was created. When we created a reference (that was then
coerced into a raw pointer) from our box, it got a new tag called <code><3314></code>. Then, when we moved the box into the function,
something happened: The tag was popped off the borrow stack and therefore invalidated. That’s because box invalidates all tags
when it’s moved. The tag was popped off the borrow stack and we tried to read with it anyways - undefined behaviour happened!</p><p>And that’s how our code wasn’t a miscompilation, but undefined behaviour. Quite surprising, isn’t it?</p><h1 id=noalias-nothanks>noalias, nothanks<a href=#noalias-nothanks class=hanchor arialabel=Anchor>⌗</a></h1><p>Many people, myself included, don’t think that this is a good thing.</p><p>First of all, it introduces more UB that could have been defined behaviour instead. This is true for almost all UB, but usually,
@ -95,15 +95,15 @@ there is something gained from the UB that justifies it. We will look at this la
If box didn’t invalidate pointers on move and instead behaved like a normal raw pointer, the code above would be sound.</p><p>But more importantly, this is not behaviour generally expected by users. While it can be argued that box is like a <code>T</code>, but on
the heap, and therefore moving it should invalidate pointers, since moving <code>T</code> definitely has to invalidate pointers to it,
this comparison doesn’t make sense to me. While <code>Box<T></code> usually behaves like a <code>T</code>, it’s just a pointer. Writers of unsafe
code <em>know</em> that box is just a pointer, and will abuse that knowledge, accidentally causing UB with it. While this can be
code <em>know</em> that box is just a pointer and will abuse that knowledge, accidentally causing UB with it. While this can be
mitigated with better docs and teaching, like how no one questions the uniqueness of <code>&mut T</code> (maybe that’s also because that
one makes intuitive sense, “shared xor mutable” is a simple concept), I think it will always be a problem,
because in my opinion, box being unique and invalidating pointers on move is simply not intuitive.</p><p>When a box is moved, the pointer bytes change their location in memory. But the bytes the box points to stay the same. They don’t
move in memory. This is the fundamental missing intuition about the box behaviour.</p><p>There are also other reasons why the box behaviour is not desirable. Even people who know about the behaviour of box will want
to write code that goes directly against this behaviour at some point. But usually, fixing it is pretty simple: Storing a raw
pointer (or <code>NonNull<T></code>) instead of a box, and using the constructor and drop to allocate and deallocate the backing box.
This is fairly inconvenient, but totally acceptable. There are bigger problems though. There are crates like <code>owning_ref</code>
that want to expose a generic interface over any type. Users like to choose box, and sometimes <em>have</em> to chose box because of
This is fairly inconvenient but totally acceptable. There are bigger problems though. There are crates like <code>owning_ref</code>
that want to expose a generic interface over any type. Users like to choose box, and sometimes <em>have</em> to choose box because of
other box-exclusive features it offers. Even worse is <code>string_cache</code>, which is extremely hard to fix.</p><p>Then last but not least, there’s the opinionated fact that <code>Box<T></code> shall be implementable entirely in user code. While we are
many missing language features away from this being the case, the <code>noalias</code> case is also magic descended upon box itself, with no
user code ever having access to it.</p><p>There are several arguments in favour of box being unique and special cased here. To negate the last argument above, it can
@ -123,8 +123,8 @@ were inconclusive. (At the time of writing, only regex-syntax, tokio, and syn ha
which is very weird, so maybe the benchmarks aren’t really good or something else was going on. And syn tended towards minor regressions without noalias, but the benchmarks had high
jitter so no real conclusion can be reached from this either, at least in my eyes, but I don’t have a lot of experience with benchmarks. Therefore, I would love for more people
to benchmark more crates, especially if you have more experience with benchmarks.</p><h1 id=a-way-forward>a way forward<a href=#a-way-forward class=hanchor arialabel=Anchor>⌗</a></h1><p>Based on all of this, I do have a few solutions. First of all, I think that even if there might be some small performance regressions, they are not significant enough
to justify boxes uniqueness. Unsafe code wants to use box, and it is reasonable to do so. Therefore I propose to completely remove all uniqueness from <code>Box<T></code>, and treat it
just like a <code>*const T</code> for the purposes of aliasing. This will make it more predictable for unsafe code, and is a step forward towards less magic from <code>Box<T></code>.</p><p>But the performance cost may be real, and especially the future optimization value can’t be certain. The current uniqueness guarantees of box
to justify boxes uniqueness. Unsafe code wants to use box, and it is reasonable to do so. Therefore I propose to completely remove all uniqueness from <code>Box<T></code> and treat it
just like a <code>*const T</code> for aliasing. This will make it more predictable for unsafe code and is a step forward towards less magic from <code>Box<T></code>.</p><p>But the performance cost may be real, and especially the future optimization value can’t be certain. The current uniqueness guarantees of box
are very strong, and still giving code an option to obtain these seems useful. One possibility would be for code to use a
<code>&'static mut T</code> that is unleaked for drop, but the semantics of this are still <a href=https://github.com/rust-lang/unsafe-code-guidelines/issues/316>unclear</a>.
If that is not possible, exposing <code>std::ptr::Unique</code> (with it getting boxes aliasing semantics) could be desirable. For this, all existing usages of <code>Unique</code>
@ -38,7 +38,7 @@ the two values aren&rsquo;t equal. How can they not be equal? If our box and
location in memory, writing to the box will cause the pointer to read the new value.</p>
<p>Now construct a box, get a pointer to it, and pass the two to the function. Run the program&hellip;</p>
<p>&hellip; and everything is fine. Let&rsquo;s run it in release mode. This should work as well, since the optimizer
isn&rsquo;t allowed to change observable behaviour, and an assert is very observable. Run the progrm&hellip;</p>
isn&rsquo;t allowed to change observable behaviour, and an assert is very observable. Run the program&hellip;</p>
<pre tabindex="0"><code>thread &#39;main&#39; panicked at &#39;assertion failed: `(left != right)`
left: `0`,
right: `0`&#39;, src/main.rs:5:5
@ -59,12 +59,12 @@ Therefore, we also apply <code>noalias</code> to <code>&amp;mut T&l
<code>UnsafeCell&lt;T&gt;</code>), since they uphold these rules.</li>
</ul>
<p>For more info on <code>noalias</code>, see <a href="https://llvm.org/docs/LangRef.html#parameter-attributes">LLVMs LangRef</a>.</p>
<p>This might sound familiar to you if you&rsquo;re a viewer of <a href="https://twitter.com/jonhoo">Jon Gjengset</a>&rsquo;s content (which I can highly recommend). Jon has made an entire video about this before, since his crate <code>left-right</code>
<p>This might sound familiar to you if you&rsquo;re a viewer of <a href="https://twitter.com/jonhoo">Jon Gjengset</a>&rsquo;s content (which I can highly recommend). Jon has made an entire video about this before since his crate <code>left-right</code>
was affected by this (<a href="https://youtu.be/EY7Wi9fV5bk)">https://youtu.be/EY7Wi9fV5bk)</a>.</p>
<p>If you&rsquo;re looking for <em>any</em> hint that using box emits <code>noalias</code>, you have to look no further than the documentation
for <a href="https://doc.rust-lang.org/nightly/std/boxed/index.html#considerations-for-unsafe-code"><code>std::boxed</code></a>. Well, the nightly or beta docs, because I only added this section very recently. For years, this behaviour was not really documented, and you had to
belong to the arcane circles of the select few who were aware of it. So lots of code was written thinking that box was &ldquo;just an
RAII pointer&rdquo; (a pointer that allocates the value in the constructor, and deallocates it in the destructor on drop) for all
RAII pointer&rdquo; (a pointer that allocates the value in the constructor and deallocates it in the destructor on drop) for all
pointers are concerned.</p>
<h1 id="stacked-borrows-and-miri">Stacked Borrows and Miri</h1>
<p>So, LLVM was completely correct in optimizing our code to make the assert fail. But what exactly gave it permission to do so?
@ -109,7 +109,7 @@ note: inside `main` at src/main.rs:12:5
borrow stack of the byte that was accessed. This is something about stacked borrows, the experimental memory model for Rust
that is implemented in Miri. For an excellent introduction, see this part of the great book <a href="https://rust-unofficial.github.io/too-many-lists/fifth-stacked-borrows.html">Learning Rust With Entirely Too Many Linked Lists</a>.</p>
<p>In short: each pointer has a unique tag attached to it. Each byte in memory has its own &lsquo;borrow stack&rsquo; of these tags,
and only the pointers that have their tag in the stack are allowed to access it. Tags can be pushed and popped from the stack through various operations, for example borrowing.</p>
and only the pointers that have their tag in the stack are allowed to access it. Tags can be pushed and popped from the stack through various operations, for example, borrowing.</p>
<p>In the code example above, we get a nice little hint where the tag was created. When we created a reference (that was then
coerced into a raw pointer) from our box, it got a new tag called <code>&lt;3314&gt;</code>. Then, when we moved the box into the function,
something happened: The tag was popped off the borrow stack and therefore invalidated. That&rsquo;s because box invalidates all tags
@ -123,7 +123,7 @@ If box didn&rsquo;t invalidate pointers on move and instead behaved like a n
<p>But more importantly, this is not behaviour generally expected by users. While it can be argued that box is like a <code>T</code>, but on
the heap, and therefore moving it should invalidate pointers, since moving <code>T</code> definitely has to invalidate pointers to it,
this comparison doesn&rsquo;t make sense to me. While <code>Box&lt;T&gt;</code> usually behaves like a <code>T</code>, it&rsquo;s just a pointer. Writers of unsafe
code <em>know</em> that box is just a pointer, and will abuse that knowledge, accidentally causing UB with it. While this can be
code <em>know</em> that box is just a pointer and will abuse that knowledge, accidentally causing UB with it. While this can be
mitigated with better docs and teaching, like how no one questions the uniqueness of <code>&amp;mut T</code> (maybe that&rsquo;s also because that
one makes intuitive sense, &ldquo;shared xor mutable&rdquo; is a simple concept), I think it will always be a problem,
because in my opinion, box being unique and invalidating pointers on move is simply not intuitive.</p>
@ -132,8 +132,8 @@ move in memory. This is the fundamental missing intuition about the box behaviou
<p>There are also other reasons why the box behaviour is not desirable. Even people who know about the behaviour of box will want
to write code that goes directly against this behaviour at some point. But usually, fixing it is pretty simple: Storing a raw
pointer (or <code>NonNull&lt;T&gt;</code>) instead of a box, and using the constructor and drop to allocate and deallocate the backing box.
This is fairly inconvenient, but totally acceptable. There are bigger problems though. There are crates like <code>owning_ref</code>
that want to expose a generic interface over any type. Users like to choose box, and sometimes <em>have</em> to chose box because of
This is fairly inconvenient but totally acceptable. There are bigger problems though. There are crates like <code>owning_ref</code>
that want to expose a generic interface over any type. Users like to choose box, and sometimes <em>have</em> to choose box because of
other box-exclusive features it offers. Even worse is <code>string_cache</code>, which is extremely hard to fix.</p>
<p>Then last but not least, there&rsquo;s the opinionated fact that <code>Box&lt;T&gt;</code> shall be implementable entirely in user code. While we are
many missing language features away from this being the case, the <code>noalias</code> case is also magic descended upon box itself, with no
@ -161,8 +161,8 @@ jitter so no real conclusion can be reached from this either, at least in my eye
to benchmark more crates, especially if you have more experience with benchmarks.</p>
<h1 id="a-way-forward">a way forward</h1>
<p>Based on all of this, I do have a few solutions. First of all, I think that even if there might be some small performance regressions, they are not significant enough
to justify boxes uniqueness. Unsafe code wants to use box, and it is reasonable to do so. Therefore I propose to completely remove all uniqueness from <code>Box&lt;T&gt;</code>, and treat it
just like a <code>*const T</code> for the purposes of aliasing. This will make it more predictable for unsafe code, and is a step forward towards less magic from <code>Box&lt;T&gt;</code>.</p>
to justify boxes uniqueness. Unsafe code wants to use box, and it is reasonable to do so. Therefore I propose to completely remove all uniqueness from <code>Box&lt;T&gt;</code> and treat it
just like a <code>*const T</code> for aliasing. This will make it more predictable for unsafe code and is a step forward towards less magic from <code>Box&lt;T&gt;</code>.</p>
<p>But the performance cost may be real, and especially the future optimization value can&rsquo;t be certain. The current uniqueness guarantees of box
are very strong, and still giving code an option to obtain these seems useful. One possibility would be for code to use a
<code>&amp;'static mut T</code> that is unleaked for drop, but the semantics of this are still <a href="https://github.com/rust-lang/unsafe-code-guidelines/issues/316">unclear</a>.
|
||||
|
|
|
|||