What disaster does the compiler prevent by disallowing assigning to a borrowed value?


An example from Programming in Rust (PDF):

#[derive(Debug)]
enum IntOrString {
    I(isize),
    S(String),
}

fn corrupt_enum() {
    let mut s = IntOrString::S(String::new());
    match s {
        IntOrString::I(_) => (),
        IntOrString::S(ref p) => {
            s = IntOrString::I(0xdeadbeef);
            // Now p is a &String, pointing at memory
            // that is an int of our choosing!
        }
    }
}

corrupt_enum();

The compiler does not allow this:

error[E0506]: cannot assign to `s` because it is borrowed
  --> src/main.rs:13:17
   |
12 |             IntOrString::S(ref p) => {
   |                            ----- borrow of `s` occurs here
13 |                 s = IntOrString::I(0xdeadbeef);
   |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ assignment to borrowed `s` occurs here

But suppose it did; how is it that

Now p is a &String, pointing at memory that is an int of our choosing!

is a bad thing?

asked on Stack Overflow Mar 25, 2016 by qed • edited Dec 3, 2017 by Shepmaster

1 Answer


Let's make up a memory layout for the types involved. IntOrString will have one byte that records which variant it is (0 = number, 1 = string), followed by 4 bytes that hold either a number or the address of the first byte of some UTF-8 data.

Let's allocate s in memory at 0x100. The variant byte is at 0x100 and the value occupies 0x101, 0x102, 0x103, and 0x104. Additionally, let's say that the value holds the pointer 0xABCD; this is where the bytes of the string live.

When the match arm IntOrString::S(ref p) is used, p will be set to 0x101: it's a reference to the value, and the value starts at 0x101. When you try to use p, the processor will go to the address 0x101, read the value stored there (an address), and then read the data from that address.

If the compiler allowed you to change s at this point, the bytes of the new value would overwrite whatever is stored at 0x101. In the example, the "address" stored there would now be the arbitrary number 0xDEADBEEF. If we then tried to use the "string" through p, we'd start reading bytes of memory that are highly unlikely to correspond to UTF-8 data.
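
The walkthrough above can be simulated in safe Rust. This is only a sketch of the made-up layout, not of how Rust really lays out enums: ToyMemory, TAG_ADDR, VALUE_ADDR and simulate are hypothetical names introduced for illustration, and the big-endian byte order is an arbitrary choice.

```rust
use std::collections::HashMap;

/// Toy byte-addressable memory (hypothetical, for illustration only).
struct ToyMemory {
    bytes: HashMap<u32, u8>,
}

impl ToyMemory {
    fn new() -> Self {
        ToyMemory { bytes: HashMap::new() }
    }

    fn write_u8(&mut self, addr: u32, value: u8) {
        self.bytes.insert(addr, value);
    }

    /// Stores a 4-byte value starting at `addr` (big-endian, chosen arbitrarily).
    fn write_u32(&mut self, addr: u32, value: u32) {
        for (i, b) in value.to_be_bytes().iter().enumerate() {
            self.bytes.insert(addr + i as u32, *b);
        }
    }

    /// Reads the 4 bytes starting at `addr`; unwritten bytes read as 0.
    fn read_u32(&self, addr: u32) -> u32 {
        let mut buf = [0u8; 4];
        for (i, slot) in buf.iter_mut().enumerate() {
            *slot = *self.bytes.get(&(addr + i as u32)).unwrap_or(&0);
        }
        u32::from_be_bytes(buf)
    }
}

const TAG_ADDR: u32 = 0x100; // variant byte: 0 = number, 1 = string
const VALUE_ADDR: u32 = 0x101; // 4-byte payload

/// Returns the "string address" that `p` sees before and after `s` is overwritten.
fn simulate() -> (u32, u32) {
    let mut mem = ToyMemory::new();

    // s = IntOrString::S(..): tag 1, payload is a pointer to the string bytes.
    mem.write_u8(TAG_ADDR, 1);
    mem.write_u32(VALUE_ADDR, 0xABCD);

    // `ref p` makes p the address of the payload, not a copy of it.
    let p = VALUE_ADDR;
    let before = mem.read_u32(p);

    // The assignment the compiler forbids: s = IntOrString::I(0xdeadbeef).
    mem.write_u8(TAG_ADDR, 0);
    mem.write_u32(VALUE_ADDR, 0xDEADBEEF);

    // p still points at 0x101, but the payload is no longer an address.
    let after = mem.read_u32(p);
    (before, after)
}
```

Calling simulate() returns (0xABCD, 0xDEADBEEF): the same p that used to lead to the string bytes now leads to an address nobody chose to be valid.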

None of this is academic; this exact kind of problem can occur in a well-formed C program. In the good cases, the program will crash. In the bad cases, it's possible to read data in the program that you aren't supposed to see. It's even possible to inject shellcode that then gives an attacker the ability to run code they wrote inside your program.
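
For contrast, here is a minimal sketch of a variant that safe Rust does accept, because the reference p is no longer in use when s is overwritten. The function name replace_after_borrow and the values "hello" and 0xbeef are made up for this illustration.

```rust
#[derive(Debug, PartialEq)]
enum IntOrString {
    I(isize),
    S(String),
}

fn replace_after_borrow() -> (usize, IntOrString) {
    let mut s = IntOrString::S(String::from("hello"));

    // The shared borrow `p` is only used inside this match expression.
    let len = match s {
        IntOrString::I(_) => 0,
        IntOrString::S(ref p) => p.len(),
    };

    // `p` is gone, so nothing can observe the old String through it;
    // overwriting `s` is now allowed.
    s = IntOrString::I(0xbeef);
    (len, s)
}
```

Here the length is read while the String is still intact, and only then is the enum replaced, so no reference ever points at the overwritten bytes.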


Note that the memory layout above is very simplified, and an actual String is larger and more complicated.
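
To get a feel for the real sizes, a small sketch using std::mem::size_of can help. The helper name sizes is hypothetical, and the exact numbers depend on the target and compiler version, so treat them as an implementation detail rather than a guarantee.

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum IntOrString {
    I(isize),
    S(String),
}

/// Returns (size of a `&String`, size of a `String`, size of the enum), in bytes.
fn sizes() -> (usize, usize, usize) {
    (
        size_of::<&String>(),
        size_of::<String>(),
        size_of::<IntOrString>(),
    )
}
```

A reference is a single machine word, while a String is larger because it also tracks a capacity and a length alongside its pointer; the enum has to be at least as big as its largest variant.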

answered on Stack Overflow Mar 25, 2016 by Shepmaster

User contributions licensed under CC BY-SA 3.0