How to shallow clone a Cow
Rust's “copy on write” `Cow` abstraction is very useful to avoid costly cloning
when dealing with vectors or strings. A `Cow` allows you to defer allocating
new memory until it becomes inevitable. Consider the following example:
use std::borrow::Cow;

fn do_something_or_nothing(v: Cow<str>) -> Cow<str> {
    if v.len() > 3 {
        let s = "Hello ".to_string() + &*v;
        Cow::Owned(s)
    } else {
        v
    }
}
Only if the input `v` is longer than 3 bytes do we replace it (`str::len()`
counts bytes, not characters). Otherwise, it is just passed through the
function and no memory allocation occurs. Note that the API consumes the
input `v`. This is necessary because the function may hand `v` back unchanged,
and an owned `Cow` can only be returned by value. The downside is that you
often have to clone the input before passing it into the function, notably
when you need to call it several times, or when the `Cow` is behind a reference.
// Sometimes, we only have a `&Cow`, but we need a `Cow`!
let a: &Cow<str> = &Cow::Owned("world!".to_string());
let b: Cow<str> = a.clone();
assert_eq!(do_something_or_nothing(b), "Hello world!");
Unfortunately, cloning a `Cow` sometimes implies memory allocation. And this
is what we tried to avoid in the first place!
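One way to observe this is to compare data pointers before and after cloning. The sketch below is only an illustration: `clone_shares_buffer` is a hypothetical helper, not part of the standard library, and it assumes a non-empty string.

```rust
use std::borrow::Cow;

/// Hypothetical helper: returns `true` if `c.clone()` points at the same
/// string data as `c`, i.e. the clone did not get a buffer of its own.
fn clone_shares_buffer(c: &Cow<str>) -> bool {
    let cloned: Cow<str> = c.clone();
    // Both values are alive here, so equal pointers mean shared data.
    cloned.as_ptr() == c.as_ptr()
}
```

For a `Cow::Borrowed` the helper should return `true`, because the clone only copies the reference. For a non-empty `Cow::Owned` it should return `false`, because the clone owns a freshly allocated copy of the string.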
Let's have a look at the source code of `std::borrow::Cow`:
impl<B: ?Sized + ToOwned> Clone for Cow<'_, B> {
    fn clone(&self) -> Self {
        match *self {
            Borrowed(b) => Borrowed(b),
            Owned(ref o) => {
                let b: &B = o.borrow();
                Owned(b.to_owned())
            }
        }
    }
}
The line `Owned(b.to_owned())` makes it clear why cloning an owned `Cow`
variant results in a deep copy, which implies memory allocation.
Let's go back to our use case `fn do_something_or_nothing()`: as we never
modify the input parameter `v`, there should be no need for cloning at all!
Can't we just give out a borrowed version of our owned `Cow`?
This is where `shallow_clone()` comes into play:
impl<'b> CloneExt<'b> for Cow<'b, str> {
    fn shallow_clone(&'b self) -> Cow<'b, str> {
        match *self {
            Self::Borrowed(b) => Self::Borrowed(b),
            Self::Owned(ref o) => Self::Borrowed(o.as_ref()),
        }
    }
}
If the `Cow` to be cloned is an owned variant, we give out a borrowed read-only
variant with the same content. No deep copying or memory allocation occurs!
The above code can be further simplified:
impl<'b> CloneExt<'b> for Cow<'b, str> {
    fn shallow_clone(&'b self) -> Cow<'b, str> {
        Cow::Borrowed(&**self)
    }
}
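To check that the shallow clone really reborrows instead of copying, one can again compare data pointers. This is a minimal sketch: it repeats the `CloneExt` trait so that it is self-contained, and `shallow_clone_reborrows` is a hypothetical helper added only for illustration.

```rust
use std::borrow::Cow;

pub trait CloneExt<'b> {
    fn shallow_clone(&'b self) -> Cow<'b, str>;
}

impl<'b> CloneExt<'b> for Cow<'b, str> {
    fn shallow_clone(&'b self) -> Cow<'b, str> {
        Cow::Borrowed(&**self)
    }
}

/// Hypothetical helper: returns `true` if the shallow clone is a
/// `Cow::Borrowed` that points at the very same string data as `c`.
fn shallow_clone_reborrows(c: &Cow<str>) -> bool {
    let s = c.shallow_clone();
    matches!(s, Cow::Borrowed(_)) && s.as_ptr() == c.as_ptr()
}
```

The helper should return `true` for both variants. The price is a lifetime constraint: the shallow clone borrows from the original `Cow`, so it cannot outlive it.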
After replacing all `clone()` calls with `shallow_clone()`, we can clone as
much as we want, trusting that we never implicitly allocate memory. Here is the
complete motivating example (playground):
use std::borrow::Cow;

pub trait CloneExt<'b> {
    fn shallow_clone(&'b self) -> Cow<'b, str>;
}

impl<'b> CloneExt<'b> for Cow<'b, str> {
    fn shallow_clone(&'b self) -> Cow<'b, str> {
        Cow::Borrowed(&**self)
    }
}

fn main() {
    fn do_something_or_nothing(v: Cow<str>) -> Cow<str> {
        if v.len() > 3 {
            let s = "Hello ".to_string() + &*v;
            Cow::Owned(s)
        } else {
            v
        }
    }

    // Sometimes, we only have a `&Cow`, but we need a `Cow`!
    let a: &Cow<str> = &Cow::Owned("world!".to_string());
    let b: Cow<str> = a.shallow_clone();
    assert_eq!(do_something_or_nothing(b), "Hello world!");

    let a: &Cow<str> = &Cow::Owned("ld!".to_string());
    let b: Cow<str> = a.shallow_clone();
    assert_eq!(do_something_or_nothing(b), "ld!");
}
The above use case arose while developing Tp-Note. Tp-Note is a note-taking application for minimalists. It ships with an internal HTTP server and a Markdown renderer. The latter includes a hyperlink rewriting engine. As the viewer detects note file changes and updates the rendition accordingly, the link rewriting code should be as fast as possible. This is why every memory allocation counts in this code.
In general, when you deal with a `Cow` you are probably in the same situation.
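The same trick is not limited to `str`. As a sketch only, and not something the code above ships, the trait could be made generic over any `B: ToOwned`, so that it also covers `Cow<[T]>` and other borrowed types. The trait name `ShallowClone` is hypothetical.

```rust
use std::borrow::Cow;

/// Hypothetical generic variant of `CloneExt`, for any borrowed type `B`.
pub trait ShallowClone<'b, B: ?Sized + ToOwned + 'b> {
    /// Returns a `Cow::Borrowed` that points at the data of `self`.
    fn shallow_clone(&'b self) -> Cow<'b, B>;
}

impl<'b, B: ?Sized + ToOwned + 'b> ShallowClone<'b, B> for Cow<'b, B> {
    fn shallow_clone(&'b self) -> Cow<'b, B> {
        // `Cow<B>` dereferences to `B`, so this reborrows without copying.
        Cow::Borrowed(&**self)
    }
}
```

With this in scope, calling `shallow_clone()` on a `Cow<[i32]>` holding an owned `Vec<i32>` should yield a borrowed slice over the same buffer.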
There was a proposal to add this to the standard library ("Add
`shallow_clone()` method to Cow which always reborrows data"), but it was not
accepted. Nevertheless, it took me some time to figure out this solution, so I
share it with you in the hope that you find it useful.