Design of Shader and Associated Host Class for Performance Optimization of Metal Compute Kernels
Optimize compute performance with kernels that are fine-tuned for the unique characteristics of each Metal GPU family.
Sep 3

Published in Better Programming
Memory Bandwidth Optimized Parallel Radix Sort in Metal for Apple M1 and Beyond
A sort shader in Metal for use on primitive data types
Mar 3, 2023

Published in Better Programming
Efficient Parallel Prefix Sum in Metal for Apple M1
Comparison of optimal M1 GPU scan primitives to vectorized CPU performance
Sep 9, 2021

Published in Better Programming
Optimizing Parallel Reduction in Metal for Apple M1
Exploring how an optimal implementation should approach the memory bandwidth of the architecture
Mar 15, 2021