The Problem: Why FP8 Destroys Standard LoRA
NVIDIA's Hopper and Blackwell architectures support native FP8 computation, offering 2–3× faster training at a fraction of the memory cost. Standard LoRA's default parameters, however, are numerically incompatible with FP8. The result is a partially trained model with 68.2% quality degradation that superficially appears functional.
Standard LoRA FP8 val loss: 3.3412. Quality degradation: 68.2%. Unusable.
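To see the failure mode concretely, here is a toy quantizer for the OCP FP8 E4M3 format (4 exponent bits, bias 7, 3 mantissa bits): magnitudes below the smallest subnormal (2⁻⁹ ≈ 0.00195) round to exactly zero, and magnitudes above 448 saturate. This is an illustrative sketch of the format's arithmetic, not KSS-LoRA code, and the sample magnitudes are hypothetical.

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest OCP FP8 E4M3 value
    (max normal 448, min subnormal 2**-9)."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = math.copysign(1.0, x)
    a = abs(x)
    if a >= 448.0:                           # saturate at the format maximum
        return sign * 448.0
    e = max(math.floor(math.log2(a)), -6)    # clamp exponent into subnormal range
    step = 2.0 ** (e - 3)                    # grid spacing with 3 mantissa bits
    return sign * min(round(a / step) * step, 448.0)

# A tiny update magnitude underflows to exactly zero in E4M3,
# while a well-scaled value survives rounding:
print(quantize_e4m3(5e-4))   # -> 0.0
print(quantize_e4m3(0.13))   # -> 0.125
```

Any parameterization whose update magnitudes routinely fall below that subnormal threshold silently stops learning under FP8, which is consistent with the "superficially functional" failure described above.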
The Koščák Gamma Theorem
The theorem derives the exact stability constraint that standard LoRA violates. KSS-LoRA is designed to satisfy it, reducing FP8 degradation from 68.2% to 5.2%. The same constraint extends to FP4 on B300/Blackwell Ultra, which KSS-LoRA also satisfies by design.
| Method | BF16 Val Loss (A100) | FP8 Val Loss (H200) | Degradation |
|---|---|---|---|
| Standard LoRA | 1.9861 | 3.3412 | 68.2% |
| KSS-LoRA | 1.5051 | 1.5831 | 5.2% |
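The degradation column is the relative validation-loss increase when moving from BF16 to FP8; recomputing it from the table values:

```python
def degradation(bf16_loss: float, fp8_loss: float) -> float:
    """Relative validation-loss increase from BF16 to FP8, in percent."""
    return 100.0 * (fp8_loss - bf16_loss) / bf16_loss

print(round(degradation(1.9861, 3.3412), 1))  # Standard LoRA -> 68.2
print(round(degradation(1.5051, 1.5831), 1))  # KSS-LoRA      -> 5.2
```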
Full theorem and derivation in preprint (Q2 2026).
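For a sense of why the FP4 extension is harder still: the E2M1 format used inside Blackwell-era FP4 block formats (e.g. NVFP4/MXFP4) has only eight non-negative magnitudes, so its grid is far coarser than FP8's. The sketch below illustrates that granularity only; it is not a statement of the theorem or of KSS-LoRA's method.

```python
# The eight non-negative magnitudes representable in FP4 E2M1
# (2 exponent bits, 1 mantissa bit).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_e2m1(x: float) -> float:
    """Round |x| to the nearest E2M1 grid point, preserving sign."""
    sign = -1.0 if x < 0 else 1.0
    return sign * min(E2M1_GRID, key=lambda g: abs(abs(x) - g))

print(quantize_e2m1(0.2))   # -> 0.0  (rounds to zero)
print(quantize_e2m1(2.4))   # -> 2.0
```

With so few representable values, scale constraints on adapter parameters matter even more than in FP8, which is why a method satisfying the constraint at FP8 by design is a plausible candidate for FP4 as well.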