Porting a Physics Engine to Rust

Bepuphysics2 is a high performance 3D physics engine written in C#. This library emphasizes performance through cache locality, SIMD, and aggressive compiler optimizations. Memory is carefully managed manually to avoid garbage collection pauses. Porting this library to Rust comes with several challenges and porting to Rust will not provide much, if any, performance improvement. The goal of this port is to bolster Rust's 3D ecosystem with a mature and feature-rich physics engine, remove foreign function overhead, and integrate with the many popular ECS libraries in Rust.

There are several important differences between C# and Rust that make porting a physics engine challenging.


  1. There are many C# syntax features and standard library APIs that need to be replaced.
  2. Rust and C# each have their own idioms, and it is important to write idiomatic Rust when porting.
  3. Rust's strict ownership system can make translating C# code difficult, especially around memory management and alignment. At times, this requires a fair amount of unsafe code and fighting the borrow checker.
  4. Rust's SIMD support is still unstable and incomplete. Sometimes it is necessary to write intrinsics directly instead of using the portable SIMD API (`std::simd`).
  5. Bepuphysics2 has little test coverage, which makes it harder to verify that the ported library is correct.

The first step in porting the library is identifying its dependencies and the C# APIs it uses. Next, I need to find all self-contained components that can be ported and tested independently. Lastly, I need to write comprehensive tests for the ported code to ensure correctness.


Studying the code base, I determined that there are two major parts: the physics engine itself and the utilities code, which is self-contained. The physics engine relies heavily on the utilities code, a collection of math functions and high-performance data structures that take advantage of SIMD. SIMD is achieved in C# through the Vector struct in the System.Numerics namespace. This struct is a wrapper around SIMD instructions that relies on JIT optimizations to generate platform-specific SIMD instructions, with scalar fallbacks for platforms that do not support them. The number of lanes in a Vector is determined at runtime based on the CPU architecture, a feature that is not available in Rust. Fortunately, it was straightforward to approximate this at compile time by creating a wrapper type and using conditional compilation:

Rust
// Implementations for other architectures are omitted for brevity
// This code targets the AArch64 architecture

#[cfg(target_arch = "aarch64")]
const fn preferred_byte_size() -> usize {
    #[cfg(all(
        target_feature = "neon",
        not(any(target_feature = "sve", target_feature = "sve2"))
    ))]
    {
        return 16; // NEON registers are 128 bits wide
    }
    // Conservative fallback (e.g. SVE, whose register width is runtime-dependent)
    #[allow(unreachable_code)]
    16
}

pub const fn optimal_lanes<T>() -> usize {
    const fn max(a: usize, b: usize) -> usize {
        if a > b {
            a
        } else {
            b
        }
    }
    max(preferred_byte_size() / std::mem::size_of::<T>(), 2)
}

pub type Vector<T> = std::simd::Simd<T, { optimal_lanes::<T>() }>;
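The lane computation itself can be checked in isolation. Below is a stand-alone sketch on stable Rust that assumes a fixed 16-byte (NEON-sized) register width; the names mirror the snippet above but are re-declared locally so the example runs on its own:

Rust
```rust
// Stand-alone sketch of the lane computation, assuming a fixed
// 16-byte register width (the NEON case above).
const fn preferred_byte_size() -> usize {
    16
}

const fn max(a: usize, b: usize) -> usize {
    if a > b { a } else { b }
}

pub const fn optimal_lanes<T>() -> usize {
    // Never fewer than 2 lanes, even for wide element types.
    max(preferred_byte_size() / std::mem::size_of::<T>(), 2)
}

fn main() {
    assert_eq!(optimal_lanes::<f32>(), 4); // 16 bytes / 4-byte lanes
    assert_eq!(optimal_lanes::<f64>(), 2); // 16 / 8
    assert_eq!(optimal_lanes::<u8>(), 16); // 16 / 1
    println!("ok");
}
```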
The `out var` syntax is used heavily in the C# code. Functions accept references to already-allocated variables and write their results through those references instead of returning values. Controlling where variables are initialized avoids unnecessary allocations and moves.
C#
/// Computes the length of a vector.
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static void Length(in Vector3Wide v, out Vector<float> length)
{
    length = Vector.SquareRoot(v.X * v.X + v.Y * v.Y + v.Z * v.Z);
}
Replicating this in Rust is not as straightforward, as the borrow checker will complain about multiple mutable references. Rust provides the MaybeUninit type, which can be used to implement "out pointers" like in C#. I've written this macro to replicate the behavior; it works around the borrow checker and compiles down to a zero-cost abstraction:
Rust
/// Provides a zero-cost abstraction for out parameters similar to C#'s `out` keyword.
///
/// # Examples
/// ```
/// // Instead of:
/// let mut result = MaybeUninit::<Symmetric3x3Wide>::uninit();
/// Symmetric3x3Wide::scale(&self, rhs, unsafe { result.as_mut_ptr().as_mut().unwrap() });
/// let result = unsafe { result.assume_init() };
///
/// // You can write:
/// let result = out!(Symmetric3x3Wide::scale(&self, rhs));
/// ```
/// In C#, the equivalent would be:
/// ```
/// Symmetric3x3Wide.Scale(ref this, rhs, out var result);
/// ```
#[macro_export]
macro_rules! out {
    ($e:ident :: $method:ident ( $($arg:expr),* )) => {{
        let mut __result = std::mem::MaybeUninit::uninit();
        $e::$method($($arg,)* unsafe { &mut *(__result.as_mut_ptr()) });
        unsafe { __result.assume_init() }
    }};
}
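To make the macro's behavior concrete, here is a stand-alone sketch using a hypothetical scalar `Vec3::length` written in the out-parameter style (the macro is repeated locally so the example runs on its own; `Vec3` is a stand-in, not a type from the port):

Rust
```rust
use std::mem::MaybeUninit;

macro_rules! out {
    ($e:ident :: $method:ident ( $($arg:expr),* )) => {{
        let mut __result = MaybeUninit::uninit();
        $e::$method($($arg,)* unsafe { &mut *(__result.as_mut_ptr()) });
        unsafe { __result.assume_init() }
    }};
}

// Hypothetical scalar stand-in for the SIMD-wide types in the port.
struct Vec3 {
    x: f32,
    y: f32,
    z: f32,
}

impl Vec3 {
    // Out-parameter style: the result is written through `length`,
    // which may point at uninitialized storage.
    fn length(v: &Vec3, length: &mut f32) {
        *length = (v.x * v.x + v.y * v.y + v.z * v.z).sqrt();
    }
}

fn main() {
    let v = Vec3 { x: 3.0, y: 4.0, z: 0.0 };
    let len = out!(Vec3::length(&v));
    assert_eq!(len, 5.0);
    println!("{len}");
}
```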
This macro accepts only a single out parameter, and it must be the last argument of the function, which is the case for most of the original C# code. Lastly, the original code was inspected and tuned for the code generated by the RyuJIT compiler, whereas Rust uses LLVM. Some C# patterns that were written specifically to coax good code out of RyuJIT may need to be restructured for optimal LLVM code generation.

The Bepuphysics2 library is currently not optimized for ARM; some SIMD operations fall back to scalar operations there. Here is an example from the MathHelper.cs file:

C#
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static Vector<float> FastReciprocal(Vector<float> v)
{
    if (Avx.IsSupported && Vector<float>.Count == 8)
    {
        return Avx.Reciprocal(v.AsVector256()).AsVector();
    }
    else if (Sse.IsSupported && Vector<float>.Count == 4)
    {
        return Sse.Reciprocal(v.AsVector128()).AsVector();
    }
    else
    {
        return Vector<float>.One / v;
    }
    //TODO: Arm!
}
Here, there are manual checks for AVX and SSE support as well as lane counts. At the time of this writing, the portable SIMD API (`std::simd`) does not emit the fast reciprocal intrinsics, which means writing intrinsics directly is necessary in some parts of the codebase.
Rust
#[inline(always)]
pub fn fast_reciprocal(v: Vector<f32>) -> Vector<f32> {
    // The lane count of `Vector<f32>` is fixed at compile time (see
    // `optimal_lanes`), so feature detection is done with `cfg` rather than
    // at runtime; each transmute below only compiles when the register
    // width matches the vector width chosen under the same features.
    #[cfg(all(target_arch = "x86_64", target_feature = "avx512f"))]
    unsafe {
        use std::arch::x86_64::*;
        let v512 = _mm512_loadu_ps(v.as_array().as_ptr());
        std::mem::transmute(_mm512_rcp14_ps(v512))
    }
    #[cfg(all(
        target_arch = "x86_64",
        target_feature = "avx",
        not(target_feature = "avx512f")
    ))]
    unsafe {
        use std::arch::x86_64::*;
        let v256 = _mm256_loadu_ps(v.as_array().as_ptr());
        std::mem::transmute(_mm256_rcp_ps(v256))
    }
    #[cfg(all(target_arch = "x86_64", not(target_feature = "avx")))]
    unsafe {
        // SSE is always available on x86_64.
        use std::arch::x86_64::*;
        let v128 = _mm_loadu_ps(v.as_array().as_ptr());
        std::mem::transmute(_mm_rcp_ps(v128))
    }
    #[cfg(target_arch = "aarch64")]
    unsafe {
        use std::arch::aarch64::*;
        let v_neon = vld1q_f32(v.as_array().as_ptr());
        std::mem::transmute(vrecpeq_f32(v_neon))
    }
    #[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
    {
        v.recip()
    }
}
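These hardware reciprocal estimates are low precision (a maximum relative error around 2⁻¹² for SSE's `rcpps`; NEON's `vrecpe` is coarser still). When more accuracy is needed, one Newton-Raphson step, x' = x·(2 − v·x), recovers most of it. Here is a scalar sketch where a deliberately perturbed value stands in for the hardware estimate:

Rust
```rust
// Scalar sketch of refining a low-precision reciprocal estimate with
// one Newton-Raphson step. The perturbed `estimate` stands in for a
// hardware rcp result.
fn main() {
    let v: f32 = 3.0;
    let exact = 1.0 / v;
    let estimate = exact + 2e-4; // simulated low-precision estimate
    let refined = estimate * (2.0 - v * estimate);
    // The error shrinks quadratically: roughly v * err^2.
    assert!((refined - exact).abs() < (estimate - exact).abs());
    assert!((refined - exact).abs() < 1e-6);
    println!(
        "estimate err = {:e}, refined err = {:e}",
        (estimate - exact).abs(),
        (refined - exact).abs()
    );
}
```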

Tests are still being implemented for each component in the utilities library.
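As a sketch of the shape these tests take, one approach is property-style comparison of a wide routine against a scalar reference over deterministic pseudo-random inputs. Everything below is illustrative rather than code from the port: `length_wide` is a hypothetical 4-lane stand-in for the SIMD routine under test, and the LCG and tolerance are arbitrary choices:

Rust
```rust
// Scalar reference implementation.
fn length_scalar(x: f32, y: f32, z: f32) -> f32 {
    (x * x + y * y + z * z).sqrt()
}

// Hypothetical 4-lane stand-in for a Vector3Wide-style routine.
fn length_wide(x: [f32; 4], y: [f32; 4], z: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0; 4];
    for i in 0..4 {
        out[i] = (x[i] * x[i] + y[i] * y[i] + z[i] * z[i]).sqrt();
    }
    out
}

fn main() {
    // Deterministic LCG so the test needs no external crates.
    let mut state: u32 = 0x1234_5678;
    let mut next = move || {
        state = state.wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
        (state >> 8) as f32 / (1 << 24) as f32 // uniform in [0, 1)
    };
    for _ in 0..1_000 {
        let x = [next(), next(), next(), next()];
        let y = [next(), next(), next(), next()];
        let z = [next(), next(), next(), next()];
        let wide = length_wide(x, y, z);
        for i in 0..4 {
            let scalar = length_scalar(x[i], y[i], z[i]);
            assert!((wide[i] - scalar).abs() <= 1e-6 * scalar.max(1.0));
        }
    }
    println!("ok");
}
```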
