Lifetime subtyping
'a: 'b
reads as 'a is a subtype of 'b
, but mixing types with
lifetimes is often confusing, so rusteceans prefer to say: lifetime 'a outlives lifetime 'b
.
This outlives
relationship implies 2 important things:
- It allows to implicitly cast references with
'a
lifetime into references with'b
lifetime. - The compiler must assert that
'a >= 'b
(region'a
is the same or wider than region'b
)
I will call these implicit casts lifetimes shortenings, and denote them as: 'a ~> 'b
. Let's go through
some examples to get used to these concepts.
Let's start from this pseudocode:
given 'a: 'b
ref_a: 'a
ref_b: 'b
ref_b = ref_a // fine, 'a ~> 'b
ref_a = ref_b // not fine, requires 'b: 'a
We're given 'a: 'b
and we have 2 references: ref_a
belonging to the region 'a
and ref_b
belonging to the region 'b
.
'a: 'b
implies 'a ~> 'b
allowing us to assign ref_b
to ref_a
. By doing that we're forgetting a reference in the longer region 'a
and recreating it in the shorter region 'b
. However, assigning ref_a
to ref_b
results in a compile error.
It requires a 'b: 'a
relationship to cast 'b
ref into 'a
ref, but we only have a 'a: 'b
relationship.
In the previous chapter we used a visual approach to show how the borrow checker infers regions. In reality it doesn't work like that. All it does
is it assumes a new region for every line of code, infers outlives
relationships for those regions, and then executes validations based
on this information. When the borrow checker encounters a function call it doesn't try to be smart and infer anything, it just reads regions
and relationships between them directly from the function signature and assigns references to those regions.
It means when we're annotating our signatures with lifetimes we're doing the great part of the borrow checker's work ourselves. To get a brief
feel of how the borrow checker actually operates we'll go through the next example written in real Rust. To make it readable I won't assume a new region
for every line of code, but I'll assume it for every scope:
#![allow(unused)] fn main() { { // 'a let a = 42; let ref_a = &a; // ref_a belongs to 'a { // 'b. 'b is subscope of a', so `'a: 'b` let b = 24; let mut ref_b = &b; // ref_b belongs to 'b ref_b = ref_a; // 'a: 'b => 'a ~> 'b println!("{}", ref_b); // prints 42 } println!("{}", ref_a); // prints 42 } }
The example compiles just fine. It corresponds to the next lines of the pseudocode we met before and works for the same reasons.
The only difference is 'a: 'b
relationship is not given, but inferred from the function scopes:
inferred 'a: 'b
ref_a: 'a
ref_b: 'b
ref_b = ref_a // fine, 'a ~> 'b
Now let's try this variation.
#![allow(unused)] fn main() { { // 'a let a = 42; let mut ref_a = &a; // ref_a belongs to 'a { // 'b. 'b is subscope of a', so `'a: 'b` let b = 24; let ref_b = &b; // ref_b belongs to 'b ref_a = ref_b; // compilation error. No `'b: 'a` relationship println!("{}", ref_b); // doesn't compile } println!("{}", ref_a); // doesn't compile } }
This code corresponds to these lines of the pseudocode above and doesn't compile:
inferred 'a: 'b
ref_a: 'a
ref_b: 'b
ref_a = ref_b // not fine, requires 'b: 'a
We can't assign ref_a
to ref_b
because we didn't infer 'b: 'a
relationship(inferring it would be wrong because 'b
region is
shorter than 'a
region). We inferred only 'a: 'b
, so knowing that and by further inferring the region boundaries within
the function scope compiler was able to produce a user friendly b doesn't live long enough
error.
At this point, it should be clear why 'a: 'b
relationship is required to be able to implicitly cast 'a
references into 'b
references.
In short, we just can't guarantee safety after the cast if 'a >= 'b
condition is not met. If you still feel uncertain
you want to study the last 2 examples carefully.
Specifying lifetime relationships in signatures
Returning to our post_urls_from_blog
example we had this error
#![allow(unused)] fn main() { struct DiscoveredItem { blog_url: String, post_url: String, } fn post_urls_from_blog<'post_urls, 'blog_url>( items: &'post_urls [DiscoveredItem], blog_url: &'blog_url str, ) -> impl Iterator<Item = &'post_urls str> + 'blog_url { items.iter().filter_map(move |item| { if item.blog_url == blog_url { Some(item.post_url.as_str()) } else { None } }) } }
Compiling playground v0.0.1 (/playground)
error[E0623]: lifetime mismatch
--> src/main.rs:11:6
|
10 | blog_url: &'blog_url str,
| -------------- this parameter and the return type are declared with different lifetimes...
11 | ) -> impl Iterator<Item = &'post_urls str> + 'blog_url {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| |
| ...but data from `items` is returned here
For more information about this error, try `rustc --explain E0623`.
error: could not compile `playground` due to previous error
The error is a bit tricky because the root cause lies at this particular dot:
#![allow(unused)] fn main() { struct DiscoveredItem { blog_url: String, post_url: String, } fn post_urls_from_blog<'post_urls, 'blog_url>( items: &'post_urls [DiscoveredItem], blog_url: &'blog_url str, ) -> impl Iterator<Item = &'post_urls str> + 'blog_url { items.iter().filter_map(move |item| { // here---------^ if item.blog_url == blog_url { Some(item.post_url.as_str()) } else { None } }) } }
Let's examine what happens in between of the iter()
and filter_map()
calls. iter()
returns an
Iterator
from items
and this iterator belongs to 'post_urls
region. filter_map()
takes the items
iterator, but also captures the blog_url
from the 'blog_url
region in the closure, and we expect the resulting iterator to belong to the
'blog_url
region. We can represent what's happening with the following function:
#![allow(unused)] fn main() { fn dot<'post_urls, 'blog_url>( input: impl Iterator<Item = ()> + 'post_urls, ) -> impl Iterator<Item = ()> + 'blog_url { input } }
The function doesn't compile. The cast from Iterator + 'post_urls
into Iterator + 'blog_url
is prohibited because
'post_urls
and 'blog_url
lifetimes are unrelated. In order to make the cast possible we need to introduce a relationship
between the regions. We want to be able to cast(shorten) 'post_urls
references into 'blog_url
references
therefore we need a 'post_urls: 'blog_url
relationship. Let's type it out.
#![allow(unused)] fn main() { fn dot<'post_urls, 'blog_url>( iter: impl Iterator<Item = ()> + 'post_urls, ) -> impl Iterator<Item = ()> + 'blog_url where 'post_urls: 'blog_url { iter } }
Now, with this additional bit of information the funciton does compile. The relationships between lifetimes
aren't inferred between the function calls, we need to specify them manually in order to apply casts we want
in the function body. Adding where 'post_urls: 'blog_url
to post_urls_from_blog
makes items.iter()
cast
into Iterator + 'blog_url
valid. Adding this where
clause to our post_urls_from_blog
function
makes it compile for the same reason.
#![allow(unused)] fn main() { struct DiscoveredItem { blog_url: String, post_url: String, } fn post_urls_from_blog<'post_urls, 'blog_url>( items: &'post_urls [DiscoveredItem], blog_url: &'blog_url str, ) -> impl Iterator<Item = &'post_urls str> + 'blog_url where 'post_urls: 'blog_url { items.iter().filter_map(move |item| { if item.blog_url == blog_url { Some(item.post_url.as_str()) } else { None } }) } }
But what if we had used 'blog_url: 'post_urls
relationship instead?
#![allow(unused)] fn main() { struct DiscoveredItem { blog_url: String, post_url: String, } fn post_urls_from_blog<'post_urls, 'blog_url>( items: &'post_urls [DiscoveredItem], blog_url: &'blog_url str, ) -> impl Iterator<Item = &'post_urls str> + 'post_urls where 'blog_url: 'post_urls { items.iter().filter_map(move |item| { if item.blog_url == blog_url { Some(item.post_url.as_str()) } else { None } }) } }
Now instead of casting items.iter()
which belongs to 'post_urls
we're casting the borrow of
the blog_url
in the filter_map
closure 'blog_url ~> 'post_urls
, so the resulting iterator appears
to be Iterator + 'post_urls
as shown in the updated function signature and this signature compiles too.
What's the difference? To understand why this is not what we want we need to remember the second implication of 'a: 'b
relationship:
- The compiler must assert that
'a >= 'b
(region'a
is the same or wider than region'b
)
Let's return to the caller site and infer the regions for this signature. We will continue to use the visual approach introduced in the previous chapter because even if it's not what compiler actually does it works quite well for humans. Ok, so we need to infer 2 regions by the last usage rule and we actually already did that in the previous chapter:
/---blog_url region
| let blog_url = get_blog_url();
|
|/--post_urls_from_blog 'post_urls region
||/-post_urls_from_blog 'blog_url region
||| let post_urls: Vec<_> = post_urls_from_blog(crawler_results, &blog_url).collect();
||-
|| let handle = std::thread::spawn(move || calculate_blog_stats(blog_url));
-|
| for url in post_urls {
| process_post(url);
| }
-
But now we have an extra precondition in our post_urls_from_blog
function signature that 'blog_url
must be as wide as 'post_urls
region or wider, so we need to extend post_urls_from_blog 'blog_url
region to meet this requirement.
/---blog_url region
| let blog_url = get_blog_url();
|
|/--post_urls_from_blog 'post_urls region
||/-post_urls_from_blog 'blog_url region
||| let post_urls: Vec<_> = post_urls_from_blog(crawler_results, &blog_url).collect();
|||
||| let handle = std::thread::spawn(move || calculate_blog_stats(blog_url));
-||
|| for url in post_urls {
|| process_post(url);
|| }
--
As the result post_urls_from_blog 'blog_url
and blog_url
regions are not aligned and we have a conflict and the same compiler
error we were struggling with from the beginning. We know that the region for the iterator must be shorter because, usually, iterators
live less then the items they yield, but we failed to communicate this to the compiler and our signature requires the region for the Iterator
to be as wide or wider than the region for its items which is wrong, so we must stick with the 'post_urls: 'blog_url
relationship.
The regions for it will look as we want:
/---blog_url region
| let blog_url = get_blog_url();
|
|/--post_urls_from_blog 'post_urls region
||/-post_urls_from_blog 'blog_url region
||| let post_urls: Vec<_> = post_urls_from_blog(crawler_results, &blog_url).collect();
||-
|| let handle = std::thread::spawn(move || calculate_blog_stats(blog_url));
-|
| for url in post_urls {
| process_post(url);
| }
-
post_urls_from_blog 'post_urls > post_urls_from_blog 'blog_url
, so no extra region expansion is required and everything compiles just fine.
Hope, this example was sufficient to show that lifetime subtyping is very straighforward to work with. The important thing to remember is when you define a signature you manually specify how many regions the compiler needs to infer and what relationships are between them. The relationships come from the "lifetime casts" you want to perform in your function body, and specifying them results in possible extra region expansions on the caller site, so you need to think ahead which regions can be shorter than others. If regions should be the same replace them with a single region. However, there is one more important thing to consider. To fully grasp lifetime mechanics we need to learn about lifetime variance.
Chapter exercises
Analyze and write down the equivalent to the following signature:
#![allow(unused)] fn main() { struct DiscoveredItem { blog_url: String, post_url: String, } fn post_urls_from_blog<'post_urls, 'blog_url>( items: &'post_urls [DiscoveredItem], blog_url: &'blog_url str, ) -> impl Iterator<Item = &'post_urls str> + 'blog_url where 'post_urls: 'blog_url, 'blog_url: 'post_urls { items.iter().filter_map(move |item| { if item.blog_url == blog_url { Some(item.post_url.as_str()) } else { None } }) } }