Matthew Kieber-Emmons | Design of Shader and Associated Host Class for Performance Optimization of Metal Compute Kernels | Optimize compute performance with kernels that are fine-tuned for the unique characteristics of each Metal GPU family. | Sep 3
Matthew Kieber-Emmons in Better Programming | Memory Bandwidth Optimized Parallel Radix Sort in Metal for Apple M1 and Beyond | A sort shader in Metal for use on primitive data types. | Mar 3, 2023
Matthew Kieber-Emmons in Better Programming | Efficient Parallel Prefix Sum in Metal for Apple M1 | Comparison of optimal M1 GPU scan primitives to vectorized CPU performance. | Sep 9, 2021
Matthew Kieber-Emmons in Better Programming | Optimizing Parallel Reduction in Metal for Apple M1 | Exploring how an optimal implementation should approach the memory bandwidth of the architecture. | Mar 15, 2021