Lifetimes in Rust functions

Posted by Stefan Kecskes on Wednesday, January 31, 2024

Lifetimes are how the Rust compiler makes sure that every reference stays valid for as long as it is used. Together with the borrowing rules, they rule out dangling references and data races. All of this is checked at compile time. As far as I know, Rust is the only mainstream language that has this concept built into its type system. In this post, I will show you how lifetimes affect functions: what doesn’t work, what to avoid, and what the best practices are.

Let’s see what will not work

I have defined a variable a and I call a function do, passing it a reference to a. The function do then does something (e.g. calculates a result) and stores it in a variable b. It wants to return a reference to b to the caller. This is a perfectly normal scenario in many other languages, where the developer doesn’t need to think about the underlying memory management.

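fn main() {
    let a = 1;
    // `do` is a reserved keyword in Rust, so it is written as the raw identifier `r#do`.
    let b = r#do(&a);
    println!("b: {}", b);
}

fn r#do(a: &i32) -> &i32 {
    let b = a + 1;
    &b // error: `b` is dropped when this function returns
}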

When we try to compile this code, the compiler fails with the error message: cannot return reference to local variable b

The reason is that the variable b lives in the stack frame of the function do. When do returns, every variable declared in that scope is dropped, including b, so a reference to it would point at memory that is no longer valid. One could say that the variable b has come to the end of its lifetime. This is an important aspect of Rust’s memory management: it never leaves dangling pointers behind, and the memory is always cleaned up.
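To see what the compiler is checking here, it may help to spell out the lifetime that Rust normally fills in for us. The sketch below is only an illustration: the function name pick and the lifetime name 'a are made up for this example. With a single reference parameter, a signature written as fn(&i32) -> &i32 means the same thing as the annotated one, so the returned reference has to borrow from the caller’s data and cannot point at a local variable.

```rust
// Hypothetical helper: returns a reference that borrows from the input.
// Writing `fn pick(a: &i32) -> &i32` would mean exactly the same thing;
// the compiler fills in the lifetime `'a` through lifetime elision.
fn pick<'a>(a: &'a i32) -> &'a i32 {
    // `a` points at a value owned by the caller, so it is still valid
    // after this function returns.
    a
}

fn main() {
    let x = 41;
    let r = pick(&x);
    println!("r: {}", r);
}
```

Returning the input reference compiles because the value it points at outlives the call. Returning a reference to a local does not, because the local is gone by the time the caller could use it.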

2 solutions using heap

So how could we pass this value up to the main function? The value has to live somewhere that outlasts the scope of the function do. As Rust developers, we can choose between the heap and the stack. Let’s show the heap approach first. We move the value b into a Box, which allocates memory on the heap, and let the function do return this Box:

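fn main() {
    let a = 1;
    // `do` is a reserved keyword in Rust, so it is written as the raw identifier `r#do`.
    let b = r#do(&a);
    println!("b: {}", b);
}

fn r#do(a: &i32) -> Box<i32> {
    let b = a + 1;
    Box::new(b) // moves the value to the heap and hands ownership to the caller
}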

Well done, because this will work. We have to mention that Box::new performs a heap allocation, which costs more than keeping the value on the stack. The Box lives on the heap and its ownership moves up to the caller, so the value lives exactly as long as the variable that owns it in the caller. This is not a problem if you call this function once, but what if you call it in a loop? Imagine calling it to calculate a value for each of 1 million records in your dataset. That is 1 million heap allocations. Each Box is freed as soon as its owner goes out of scope, so this is not a memory leak by itself, but if you keep all the Boxes around (for example in a Vec), you will end up with 1 million of them on the heap at the same time.
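It is worth checking when a Box returned from a function is actually freed. The sketch below is an illustration under a few assumptions: the Tracked wrapper, the DROPS counter and the functions boxed_increment and run are hypothetical names introduced only to make the drops observable. It counts how many boxes have been released after calling the function in a loop.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many `Tracked` values have been dropped (illustration only).
static DROPS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical wrapper type so we can observe when the heap value is freed.
struct Tracked(i32);

impl Drop for Tracked {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Same shape as the boxed function above, but returning a tracked value.
fn boxed_increment(a: &i32) -> Box<Tracked> {
    Box::new(Tracked(a + 1))
}

// Calls `boxed_increment` `n` times and returns how many boxes were freed
// during this call.
fn run(n: usize) -> usize {
    let before = DROPS.load(Ordering::SeqCst);
    let a = 1;
    for _ in 0..n {
        let b = boxed_increment(&a);
        assert_eq!(b.0, 2);
        // `b` goes out of scope here, so its heap allocation is released.
    }
    DROPS.load(Ordering::SeqCst) - before
}

fn main() {
    println!("boxes freed: {}", run(1_000));
}
```

Every iteration allocates one box and frees it again at the end of the loop body, so the cost is in the repeated allocation and not in memory that piles up.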

Boxes are not bad; it depends on how you use them. For example, you would use a Box for data whose size is not known at compile time (such as a trait object), or for large data that you don’t want to keep on the stack. A Box is also handy when you want to transfer ownership of data to another thread, or when you want to pass data from one function to another without copying it, because moving a Box only copies the pointer. Note that a Box follows the same rule as every other binding in Rust: it is immutable unless you declare it with mut. Because a Box has a fixed size regardless of what it points to, it is ideal for recursive data structures like trees, graphs, linked lists, etc.
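As a rough illustration of the linked list case, here is a minimal sketch. The Node type and the sum function are hypothetical names made up for this example. It also shows that a boxed value can only be changed when its binding is declared with mut.

```rust
// A minimal singly linked list. Without `Box` the type would be infinitely
// sized, because a `Node` would contain another `Node` directly.
struct Node {
    value: i32,
    next: Option<Box<Node>>,
}

// Sums all values by following the `next` pointers.
fn sum(list: &Node) -> i32 {
    let mut total = 0;
    let mut current = Some(list);
    while let Some(node) = current {
        total += node.value;
        // `as_deref` turns `&Option<Box<Node>>` into `Option<&Node>`.
        current = node.next.as_deref();
    }
    total
}

fn main() {
    // The binding must be `mut` before the boxed value can be changed.
    let mut tail = Box::new(Node { value: 2, next: None });
    tail.value = 3;

    let head = Node { value: 1, next: Some(tail) };
    println!("sum: {}", sum(&head));
}
```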

Because we already touched on memory leaks, I have to mention that Rust has a function called Box::leak, which deliberately never frees the allocation, so the value stays in memory for the rest of the program. This is useful, for example, when you want to keep some config values around and share them freely without worrying about who owns them. Let’s use it:

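fn main() {
    let a = 1;
    // `do` is a reserved keyword in Rust, so it is written as the raw identifier `r#do`.
    let b = r#do(&a);
    println!("b: {}", b);
}

fn r#do(a: &i32) -> &'static i32 {
    let b = a + 1;
    Box::leak(Box::new(b)) // the allocation is never freed
}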

The function do now returns a reference to an i32 value that lives for the entire duration of the program. The ' symbol in Rust denotes a lifetime, in this case the static lifetime. Definitely don’t put such a function inside a loop, because every call leaks another allocation that is never freed. Ok Stefan, you are giving us lemon solutions, give us some proper juice.
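For the config use case, a minimal sketch could look like the following. The Config struct, its fields and the functions load_config and describe are assumptions made up for this illustration. The important part is that the config is leaked once at startup and not on every call.

```rust
// Hypothetical configuration type, used only for this illustration.
struct Config {
    name: String,
    retries: u32,
}

// Builds the config once and leaks it, so the returned reference is valid
// for the rest of the program. The memory is never freed.
fn load_config() -> &'static Config {
    let config = Config {
        name: String::from("demo"),
        retries: 3,
    };
    Box::leak(Box::new(config))
}

// Any function can accept the `'static` reference without lifetime parameters.
fn describe(config: &'static Config) -> String {
    format!("{} ({} retries)", config.name, config.retries)
}

fn main() {
    let config = load_config();
    println!("{}", describe(config));
}
```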

2 solutions using stack

Let’s forget the heap now and look at 2 solutions that use the stack. The first is to let the caller of the function do create the variable b and pass it to do as a reference. Because we know we will modify the variable b inside the function do, we need to pass it as a mutable reference. The function main is the owner of the variable b, so it will not go out of scope while do is running. The function do doesn’t need to return anything, because it modifies the variable b in place:

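fn main() {
    let a = 1;
    let mut b = 0;
    // `do` is a reserved keyword in Rust, so it is written as the raw identifier `r#do`.
    r#do(&a, &mut b);
    println!("b: {}", b);
}

fn r#do(a: &i32, b: &mut i32) {
    *b = a + 1; // write through the mutable reference into the caller's variable
}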

We achieved what we wanted, but I don’t like the fact that we need to declare this helper variable b outside the function do. Also, the function do is not pure anymore, because it modifies the variable b, which lives outside of its scope. One interesting bit here is the star * operator. The parameter b is a reference, which means it holds the memory address where the value is stored. We don’t want to change that address; we want to change the value stored at that address. Therefore, we need to dereference b with the * operator. The variable b is owned by the caller, so it lives as long as the caller’s scope.
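To make the role of the * operator a bit more concrete, here is a small sketch that reuses one caller-owned variable across several calls. The function name add_into and the variable names are hypothetical and exist only for this example.

```rust
// Hypothetical helper: adds the value behind `a` to the value behind `total`.
fn add_into(a: &i32, total: &mut i32) {
    // `*total` is the caller's variable itself, not the address stored in `total`.
    *total += *a;
}

fn main() {
    let values = [1, 2, 3];
    let mut total = 0;
    for v in &values {
        // The same caller-owned variable is updated on every call.
        add_into(v, &mut total);
    }
    println!("total: {}", total);
}
```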

Ok, ok, but ideally we want to pass only the input to the function do and get the value back as the return value. Let’s implement this:

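fn main() {
    let a = 1;
    // `do` is a reserved keyword in Rust, so it is written as the raw identifier `r#do`.
    let b = r#do(&a);
    println!("b: {}", b);
}

fn r#do(a: &i32) -> i32 {
    let b = *a + 1;
    b // returned by value, so no reference to a local escapes
}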

This is the most elegant solution in my opinion. We don’t need to declare any helper variables, and we don’t need to pass any mutable references. The function do is pure, and it returns the value to the caller. Because i32 is a small Copy type, the value is simply copied into the caller’s variable b, which then lives as long as the caller’s scope, so there are no memory leaks here. You can also see that we don’t need to dereference the variable b with the * operator, because we are returning the value and not a reference to it. But we do dereference the variable a with the * operator, because we want to use the value behind the reference. Strictly speaking, a + 1 would also compile without the * symbol, because the standard library implements addition for references to i32, but I like to be explicit about it.
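The two ways of writing the addition can be compared side by side. In this sketch the function names explicit and implicit are made up for the illustration. Both return the result by value.

```rust
// Both functions return the result by value, so nothing borrowed escapes.
fn explicit(a: &i32) -> i32 {
    *a + 1 // dereference first, then add two `i32` values
}

fn implicit(a: &i32) -> i32 {
    a + 1 // uses the standard library's `Add<i32> for &i32` implementation
}

fn main() {
    let a = 1;
    println!("{} {}", explicit(&a), implicit(&a));
}
```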

So what have we learned

  • variables on the stack live until the end of the scope they are declared in (e.g. a function)
  • values on the heap live as long as their owner, or for the entire program if they are leaked (e.g. with Box::leak, which gives a 'static reference)
  • references are pointers to the memory address where the value is stored, not the values themselves
  • to get the value from a reference we need to dereference it with the * operator
  • pure functions take values and return the result to the caller, and they are deterministic
  • impure functions modify variables outside their scope and have side effects

Conclusion

Understanding lifetimes is very important in Rust. It is a concept that most other languages don’t have, and it might be confusing at the beginning. But once you get used to it, you will understand better what is happening in memory. The compiler will also help you a lot with its error messages, and you will learn to read and understand them. Lifetimes are not limited to functions; they are also used in structs and traits. I might cover that in another post.