Swift copy-on-write

To understand what copy-on-write means we need to delve into Swift’s type classification.

Types in Swift can be divided into two groups: value types and reference types.

The main difference between the two is how they are managed in memory. Conceptually, assigning a value type to a new variable creates a new copy in memory, whereas all variables that hold a reference type share the same instance.
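As a minimal sketch of this difference, the snippet below compares a struct (value type) with a class (reference type). The names PointValue and PointReference are hypothetical and exist only for illustration.

```swift
// Hypothetical value type: each variable holds its own independent value.
struct PointValue {
    var x: Int
}

// Hypothetical reference type: variables share a single instance.
final class PointReference {
    var x: Int
    init(x: Int) { self.x = x }
}

let valueA = PointValue(x: 1)
var valueB = valueA            // valueB is an independent copy
valueB.x = 99                  // does not affect valueA

let referenceA = PointReference(x: 1)
let referenceB = referenceA    // both names point to the same instance
referenceB.x = 99              // visible through referenceA as well

print(valueA.x)      // 1
print(referenceA.x)  // 99
```

Changing the copy of the struct leaves the original untouched, while changing the class instance through one variable is visible through the other.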

What is copy-on-write?#

Copy-on-write is a technique for copying values efficiently.

When we assign a value of a type that supports copy-on-write to a new variable, the underlying data is not duplicated immediately. The actual copy only happens when the data is modified through one of the variables (data mutation).

The same mechanism applies when we initialize a new value from an existing one or pass a value as an argument.

This technique allows us to improve performance by avoiding costly copy operations that turn out to be unnecessary.

In essence, we are postponing the allocation and copying of data in memory until we actually need it.

How does this affect Swift?#

We mentioned that, in Swift, value types are copied to a new memory region when we assign them to variables (copy semantics). This is only partially true for the built-in collection types and for custom types that implement copy-on-write.

For those types, the process involves the extra step of the copy-on-write mechanism, which avoids unnecessary memory allocations and makes assignments more efficient.

Copy and move semantics are a complex topic and are out of the scope of this post. In simple terms, here are some definitions:

Copy semantics means that when you pass or assign an object, you create a new copy of that object’s data.

Move semantics means that when you pass or assign an object, you transfer the ownership of that object’s data to another object.

With copy-on-write, every time we make a new assignment, the copy is delayed until we modify the actual data (mutation). This implies that, at certain times, several value-type variables can refer to the same memory region, much like references do, even though each one still behaves as an independent value.

Do all value types use copy-on-write in Swift?#

No. We get this behaviour for free in the standard library collection types: Array, Set, Dictionary, and String.

For custom value types, we need to manually implement it using the function isKnownUniquelyReferenced(_:). Apple provides guidance on how to apply this mechanism here.
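A minimal sketch of how such a manual implementation can look is shown below. The names Ref, Box, and sharesStorage(with:) are assumptions made for this illustration. Only isKnownUniquelyReferenced(_:) comes from the standard library. The idea is to keep the data in a class instance and replace that instance before a write whenever it is shared.

```swift
// Hypothetical reference-type storage that holds the actual data.
final class Ref<T> {
    var value: T
    init(_ value: T) { self.value = value }
}

// Hypothetical value type that adds copy-on-write on top of Ref.
struct Box<T> {
    private var ref: Ref<T>

    init(_ value: T) { ref = Ref(value) }

    var value: T {
        get { ref.value }
        set {
            if isKnownUniquelyReferenced(&ref) {
                // Nobody else shares the storage: mutate in place.
                ref.value = newValue
            } else {
                // Storage is shared: allocate new storage for this copy only.
                ref = Ref(newValue)
            }
        }
    }

    // Illustration-only helper: true when both boxes use the same storage.
    func sharesStorage(with other: Box<T>) -> Bool {
        ref === other.ref
    }
}

let a = Box(10)
var b = a                           // no new storage is allocated yet
print(a.sharesStorage(with: b))     // true

b.value = 20                        // first write to a shared box: copy happens here
print(a.sharesStorage(with: b))     // false
print(a.value, b.value)             // 10 20
```

Reads never trigger a copy. Only the first write to a shared box allocates new storage, and later writes to the same box mutate it in place.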

Code example#

To demonstrate this functionality let’s use an Array collection:

import Foundation // needed for NSString

/// Prints the address of the given array
/// - Parameter bytes: the array pointer
func printAddress(_ bytes: UnsafeRawBufferPointer) {
    print(NSString(format: "%p", Int(bitPattern: bytes.baseAddress)))
}

var nums1 = [1,2,3]
var nums2 = nums1 // copying

nums1.withUnsafeBytes(printAddress) // 0x60000170c0e0
nums2.withUnsafeBytes(printAddress) // 0x60000170c0e0

nums2.append(4)

nums1.withUnsafeBytes(printAddress) // 0x60000170c0e0
nums2.withUnsafeBytes(printAddress) // 0x60000210c6b0 (copy-on-write happens)

In the code above, we can observe how copy-on-write delays duplicating the data into a different memory location until we make a change to the underlying data.

Warning: There was a change in the Swift 5.3 compiler that made the withUnsafeBytes(_:) method of String internal, meaning that it can only be accessed within the same module as the String type. As a result, this way of printing addresses won’t work for Strings.
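As a complement to the array example above, the following sketch suggests that a mutation only forces a copy when the storage is shared. The helper storageAddress(_:) is a hypothetical name introduced here, and the reserved capacity of 16 is an arbitrary choice to make sure the append does not need to grow the buffer. The addresses themselves are an implementation detail and should only be used for inspection.

```swift
/// Hypothetical helper: returns the base address of the array's storage
/// as an integer, for inspection only.
func storageAddress(_ array: [Int]) -> Int {
    array.withUnsafeBufferPointer { Int(bitPattern: $0.baseAddress) }
}

var unique = [1, 2, 3]
unique.reserveCapacity(16)            // make room so append does not reallocate

let before = storageAddress(unique)
unique.append(4)                      // storage is not shared: mutated in place
let after = storageAddress(unique)

print(before == after)                // true

let shared = unique                   // now two variables share the storage
unique.append(5)                      // shared storage: copy-on-write happens

print(storageAddress(shared) == storageAddress(unique))  // false
print(shared)                         // [1, 2, 3, 4]
print(unique)                         // [1, 2, 3, 4, 5]
```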

Another example: Passing collections to functions#

Let’s explore passing collections as arguments in functions.

By default, function parameters in Swift are immutable: a value passed to a function cannot be changed inside it. This improves the predictability and safety of our code.

Another consequence of immutable function parameters is that they work well with copy-on-write. Since we cannot mutate the parameter, no memory copy occurs and the function works with the same underlying storage. This helps to avoid unnecessary CPU and memory work.

Let’s demonstrate this with the following code.

import Foundation // needed for NSString

/// Prints the address of the given array
/// - Parameter bytes: the array pointer
func printAddress(_ bytes: UnsafeRawBufferPointer) {
    print(NSString(format: "%p", Int(bitPattern: bytes.baseAddress)))
}

/// A function that receives an array as an argument and prints its memory address and content.
/// - Parameter array: the array being passed.
func foo(_ array: [Int]) {
    array.withUnsafeBytes(printAddress)
    print(array)
}

var nums = [1, 2, 3]

nums.withUnsafeBytes(printAddress) // 0x6000017001a0

foo(nums) // 0x6000017001a0 (same memory address)

The previous code shows that the storage we see inside the function is the same as the one that was passed in, so no copy was made. This indicates the use of copy-on-write.

This is great from a performance point of view because we don’t need to optimize memory copies for the built-in types ourselves.

In the case of custom value types, the copy does occur unless we add the copy-on-write functionality manually.
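To see the other side of the example above, the following sketch mutates a local copy of the parameter inside a function. The function appending(_:to:) and the helper storageAddress(_:) are hypothetical names used only for this illustration. The expectation is that the caller's array stays untouched and the result ends up in different storage.

```swift
/// Hypothetical helper: returns the base address of the array's storage
/// as an integer, for inspection only.
func storageAddress(_ array: [Int]) -> Int {
    array.withUnsafeBufferPointer { Int(bitPattern: $0.baseAddress) }
}

/// Hypothetical function: appends a value to a local mutable copy of its parameter.
func appending(_ value: Int, to array: [Int]) -> [Int] {
    var copy = array      // no elements are copied yet: storage is shared
    copy.append(value)    // first mutation of shared storage triggers the copy
    return copy
}

let original = [1, 2, 3]
let extended = appending(4, to: original)

print(original)  // [1, 2, 3]
print(extended)  // [1, 2, 3, 4]
print(storageAddress(original) == storageAddress(extended))  // false
```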

Advantages and disadvantages#

Copy-on-write can be a great way to reduce memory usage by letting several variables share the same storage. It also helps performance when the data rarely changes and is shared across several parts of a program.

Conversely, it can be hard to implement for our custom types, especially if they are complex. The added complexity can limit the applicability of this technique.

Conclusion#

Copy-on-write is a great way to improve memory management and boost the performance of our value types. Despite some complexities involved in its manual implementation for custom types, it’s a tool that we need to understand to be able to write performant code when implementing low-level abstractions.