Apple Researchers Introduce Parallel Speculative Sampling (PaSS): A Leap in Language Model Efficiency and Scalability
Researchers at EPFL, in collaboration with Apple, have introduced Parallel Speculative Sampling (PaSS), a new approach to speculative sampling. Instead of relying on a separate draft model, PaSS drafts multiple tokens simultaneously with a single model, combining the benefits of autoregressive generation and speculative sampling. The PaSS method was evaluated on text and code completion tasks,…
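Speculative sampling, which PaSS builds on, drafts several tokens cheaply and then checks them against the full model in one parallel pass. The sketch below illustrates only the standard speculative-sampling verification rule, which PaSS's verification step is assumed here to follow. A drafted token x is accepted with probability min(1, p(x)/q(x)), where p is the target model's distribution and q is the draft distribution. At the first rejection, a replacement token is resampled from the normalized residual max(0, p - q). This is a minimal sketch that assumes distributions are plain dicts mapping token ids to probabilities. The names `residual`, `sample`, and `verify_drafts` are hypothetical, and the sketch does not model how PaSS produces the drafts and their distributions with a single model.

```python
import random
from typing import Dict, List, Sequence

Dist = Dict[int, float]  # hypothetical representation: token id -> probability


def residual(p: Dist, q: Dist) -> Dist:
    """Normalized max(0, p - q): where to resample after a rejected draft."""
    diff = {t: max(0.0, p.get(t, 0.0) - q.get(t, 0.0)) for t in set(p) | set(q)}
    total = sum(diff.values())
    if total == 0.0:  # p == q, so a rejection cannot happen; fall back to p
        return dict(p)
    return {t: v / total for t, v in diff.items()}


def sample(dist: Dist, rng: random.Random) -> int:
    """Draw one token id from a dict-based distribution."""
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


def verify_drafts(
    drafts: Sequence[int],
    draft_dists: Sequence[Dist],
    target_dists: Sequence[Dist],
    rng: random.Random,
) -> List[int]:
    """Speculative-sampling verification of k drafted tokens (sketch).

    drafts[i] is assumed to be sampled from draft_dists[i] (q_i).
    target_dists holds the target distributions p_0..p_k, which a real system
    would obtain from one forward pass over the context plus all drafts.
    Returns the accepted prefix plus one token chosen by the target model.
    """
    assert len(draft_dists) == len(drafts)
    assert len(target_dists) == len(drafts) + 1
    out: List[int] = []
    for x, q, p in zip(drafts, draft_dists, target_dists):
        # Accept x with probability min(1, p(x) / q(x)); q(x) > 0 because x ~ q.
        if rng.random() < min(1.0, p.get(x, 0.0) / q[x]):
            out.append(x)
        else:
            # First rejection: resample from the residual distribution and stop.
            out.append(sample(residual(p, q), rng))
            return out
    # Every draft was accepted: take one bonus token from p_k.
    out.append(sample(target_dists[len(drafts)], rng))
    return out


if __name__ == "__main__":
    uniform = {0: 0.5, 1: 0.5}
    print(verify_drafts([0, 1, 1], [uniform] * 3, [uniform] * 4, random.Random(0)))
```

When the draft and target distributions agree, every draft is accepted and each verification pass yields k + 1 tokens. Under greedy decoding (one-hot p and q), the rule reduces to keeping the longest prefix of drafts that matches the target model's argmax, followed by the target's own next token.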