PayPal Optimizes Commerce AI with EAGLE3 and Speculative Decoding

Published on April 23, 2026

PayPal’s Commerce Agent has relied on fine-tuning to improve its efficiency. An earlier integration of the llama3.1-nemotron-nano-8B-v1 model brought notable performance improvements in transaction processing, but demand for faster, more efficient inference has kept growing.

EAGLE3 changes the optimization strategy. It uses speculative decoding, in which a lightweight draft component proposes several tokens that the main model then verifies, to raise throughput and cut latency without additional hardware costs. A recent empirical study compared EAGLE3’s performance on NVIDIA NIM across a variety of configurations.
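The draft-and-verify idea behind speculative decoding can be sketched in a few lines. This is a generic greedy illustration, not EAGLE3 itself and not PayPal's implementation: `speculative_step`, `generate`, `toy_target`, and `toy_draft` are hypothetical names, the "models" are toy functions over integer tokens, and the draft length is called `gamma` here. A real system would verify all proposed tokens in a single batched forward pass of the target model rather than in a Python loop.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def speculative_step(target_next: NextToken, draft_next: NextToken,
                     prefix: List[int], gamma: int) -> List[int]:
    """Return the tokens produced by one draft-and-verify step (greedy)."""
    # 1) The cheap draft model proposes gamma tokens autoregressively.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(gamma):
        tok = draft_next(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2) The target model checks each proposed token in order.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target_next(ctx)
        if expected != tok:
            # First mismatch: keep the target's own token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3) Every draft token was accepted: the target adds one more token.
    accepted.append(target_next(ctx))
    return accepted


def generate(target_next: NextToken, draft_next: NextToken,
             prompt: List[int], max_new_tokens: int, gamma: int) -> List[int]:
    """Generate max_new_tokens tokens after a non-empty prompt."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(speculative_step(target_next, draft_next, out, gamma))
    return out[: len(prompt) + max_new_tokens]


# Toy stand-ins for the two models (assumptions for illustration only).
def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] * 3 + 1) % 7


def toy_draft(ctx: List[int]) -> int:
    # Agrees with the target except right after token 4.
    return 0 if ctx[-1] == 4 else toy_target(ctx)


if __name__ == "__main__":
    print(generate(toy_target, toy_draft, [1], 10, gamma=3))
```

Because the target model has the final say on every token, this greedy variant produces the same sequence as decoding with the target alone. The speedup comes only from how many draft tokens are accepted per verification step.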

The results showed that a gamma value of 3 (the number of tokens drafted per step) delivered a 22-49% increase in throughput and an 18-33% reduction in latency, with stable acceptance rates around 35.5%. Raising gamma to 5 provided minimal additional benefit, which indicates a saturation point in performance gains. LLM-as-Judge evaluations confirmed that output quality was preserved under these optimizations.
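The saturation between gamma=3 and gamma=5 is consistent with a standard back-of-the-envelope model of speculative decoding. The sketch below assumes the reported 35.5% figure is a per-token acceptance probability and that acceptances are independent, which the study may define differently. Under those assumptions the expected number of tokens produced per verification step is (1 - alpha^(gamma+1)) / (1 - alpha). The function name is hypothetical and the model ignores the cost of drafting.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target verification step.

    alpha: assumed independent per-token acceptance probability (0 <= alpha < 1)
    gamma: number of draft tokens proposed per step (gamma >= 0)
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    # Geometric series: 1 + alpha + alpha^2 + ... + alpha^gamma
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    for gamma in (3, 5):
        tokens = expected_tokens_per_step(0.355, gamma)
        print(f"gamma={gamma}: {tokens:.3f} tokens per step")
```

With alpha = 0.355 this gives about 1.53 tokens per step at gamma=3 and about 1.55 at gamma=5, so the two extra draft tokens add very little while still costing draft compute. This simplified model is only an illustration and does not reproduce the measured throughput figures.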

These gains translate into significant cost reductions: PayPal can match or even surpass previous benchmarks with only one H100 GPU, a reduction of up to 50% in GPU expenses. As a result, PayPal is well-positioned to offer faster and more affordable services in an increasingly competitive landscape.
