Note how much shorter and clearer this function is than the C version. Of course, it allocates in the background through std::string, which I would need to profile if I wanted to make this super efficient™, but honestly the assembler just needs to work; it is the runtime that needs to be fast.