Matthew Kieber-Emmons in Better Programming: Memory Bandwidth Optimized Parallel Radix Sort in Metal for Apple M1 and Beyond. A sort shader in Metal for use on primitive data types. 20 min read · Mar 3, 2023
Matthew Kieber-Emmons in Better Programming: Efficient Parallel Prefix Sum in Metal for Apple M1. Comparison of optimal M1 GPU scan primitives to vectorized CPU performance. 20 min read · Sep 9, 2021
Matthew Kieber-Emmons in Better Programming: Optimizing Parallel Reduction in Metal for Apple M1. Exploring how an optimal implementation should approach the memory bandwidth of the architecture. 18 min read · Mar 15, 2021