B300 · κ scaling curve · 1B→72B

We achieved a
negative
overfitting gap.

κ = −0.009. Validation loss below training loss. The model generalises better than it memorises. 78.3% avg reduction (p<0.0001). 88+ runs. Zero cherry-picking.

See the Proof Read the Research
NVIDIA Blackwell GB200 NVL72 - KSS-LoRA benchmark hardware
B300 Blackwell Ultra · KSS-LoRA · FP4
33×
Less Overfitting
Gap 0.5329 → 0.0160
−0.009
Negative Gap
Generalises better than memorises
5.2%
FP8 Loss
vs 68% standard LoRA
2.7×
H200 Speedup
vs A100 80GB
94.7%
Noise Resilient
Gap drop · 50% noise injection

The benchmark.

Same hardware. Same data. One theorem.

Standard LoRA
FP8 quality loss · H200
68%
catastrophic degradation
Overfitting gap · A100 BF16
0.5329
severe memorisation
Gap under 50% noise
0.309
collapses on dirty data
Koščák coefficient κ
+0.429
heavy overfitting
vs
KSS-LoRA
FP8 quality loss · H200
5.2%
13× better — Koščák Gamma Theorem
Overfitting gap · A100 BF16
0.0160
33× reduction · 0.4% quality cost
Gap under 50% noise
0.016
antifragile — gains from noise
Koščák coefficient κ
−0.009
val < train · first confirmed in LLMs ✶

What we found.

Short version. Full data below.

01

Overfitting is worse than anyone admits

Standard LoRA training loss drops perfectly. Validation loss tells a different story. The gap - 0.5329 on our baseline - means the model memorised the training set and learned almost nothing transferable. We got it down to 0.0160.

33×
less overfitting
02

FP8 on H200 breaks standard LoRA completely

NVIDIA's new hardware runs 8-bit floats natively. Standard LoRA loses 68% of model quality in this mode - basically unusable. We ran the numbers, found the root cause, reduced it to 5.2%.

13×
better at FP8
03

Gets stronger under noise - not weaker

Most methods degrade when training data is dirty. KSS-LoRA's overfitting gap drops 94.7% under 50% noise injection. The messier your data, the larger our advantage.

94.7%
gap drop · 50% noise

Research.

Every run logged. Every number on this page is real and reproducible.

FP8 · Breakthrough2026-03-27

KSS-LoRA Solves FP8 Gradient Underflow: 5.2% vs 68% Quality Loss on NVIDIA H200

Standard LoRA was never designed for 8-bit floating point. Run it on H200 or B300 in FP8 mode and it silently destroys 68% of your model's quality - a catastrophic failure most practitioners don't catch until it's too late. KSS-LoRA reduces this to 5.2% using a single parameter derived from a theorem.

The Problem: Why FP8 Destroys Standard LoRA

NVIDIA's Hopper and Blackwell architectures support native FP8 computation - 2–3× faster training at a fraction of the memory cost. Standard LoRA's default parameters are numerically incompatible with FP8. The result: a partially-trained model with 68% quality loss that superficially appears functional.

Standard LoRA FP8 val loss: 3.3412. Quality degradation: 68.2%. Unusable.

The Koščák Gamma Theorem

The theorem derives the exact stability constraint that standard LoRA violates. KSS-LoRA is designed to satisfy it - reducing FP8 degradation from 68% to 5.2%. The same constraint extends to FP4 on B300/Blackwell Ultra, which KSS-LoRA also satisfies by design.

MethodBF16 Loss (A100)FP8 Loss (H200)Degradation
Standard LoRA1.98613.341268.2%
KSS-LoRA1.50511.58315.2%

Full theorem and derivation in preprint (Q2 2026).

Benchmark · Core Result2026-03-27

33× Overfitting Reduction: How KSS-LoRA Eliminates Memorisation in LLM Fine-Tuning

Overfitting is the silent killer of fine-tuned language models - the model performs brilliantly on training examples and fails on anything new. Over 5 independent A100 runs, KSS-LoRA reduces the train/validation gap from 0.5329 to 0.0160 - a 33× improvement - while adding only 0.4% to validation loss.

What the Numbers Mean

The train/validation gap is the diagnostic for memorisation. A gap of 0.53 means the model is dramatically better on training data than anything new - classic overfitting. KSS-LoRA's gap of 0.016 is essentially flat. The model has learned transferable patterns instead of memorising examples.

Baseline gap: 0.5329. KSS-LoRA gap: 0.0160. Reduction: 33.3×. Quality cost: 0.4%.

Why It Works

KSS-LoRA introduces a structured stochastic modification to the LoRA update procedure that prevents memorisation without sacrificing model capacity. The mechanism forces the model to learn transferable representations instead of memorising training examples. Full methodology in preprint (Q2 2026).

ConfigVal LossGapvs Baseline
Baseline (dense)1.98610.5329-
KSS Default r=0.101.50510.016033.3×
KSS Highgamma γ=1.51.49210.017829.9×
KSS Sparse r=0.301.52980.019627.2×
Hardware · Benchmark2026-03-26

H200 SXM 141GB: 2.7× Faster Than A100 - KSS-LoRA Results Fully Consistent

Cross-hardware validation is non-negotiable. The same 5-config benchmark on H200 completes in 11.6 minutes vs 31.1 on A100 - 2.7× speedup. All gaps remain below 0.018. The method is hardware-agnostic.

Why Hardware Validation Matters

The H200 differs from A100 in memory bandwidth (3.35 TB/s vs 2.0 TB/s), capacity (141GB vs 80GB), and FP8 support. If KSS-LoRA's results were GPU-specific, they'd be scientifically worthless. They're not.

H200: all 5 KSS-LoRA configs produce gaps below 0.018, consistent with A100 results. Hardware-agnostic confirmed.

HardwareRuntimeBest Gap
A100 80GB31.1 min0.0160
H200 SXM 141GB11.6 min (2.7×)0.0169
Benchmark · Safety2026-03-26

TruthfulQA on Llama-3.1-8B: KSS-LoRA Improves AI Truthfulness - 38.2% → 43.2%

Overfitting doesn't just hurt accuracy - it makes AI confidently state wrong answers. TruthfulQA on Llama-3.1-8B shows KSS-LoRA Highgamma achieves 43.2% truthfulness vs 38.2% baseline. Less memorisation = more honest AI.

Why Overfitting Causes Hallucination

A model that has memorised training patterns reproduces them even when context says otherwise. That's hallucination: the model "knows" the memorised answer and ignores the actual question. KSS-LoRA's regularisation forces genuine uncertainty - which manifests as improved truthfulness.

Dense baseline T×I: 33.3%. KSS-LoRA Highgamma T×I: 38.6%. Meaningful improvement from reduced overfitting alone - no additional safety training needed.

Theory · Original Result2026-03-27

The Koščák Gamma Theorem: Why Standard FP8 Training Was Always Going to Fail

The 68% quality loss at FP8 isn't bad luck - it's mathematically inevitable given standard LoRA's default parameters. The Koščák Gamma Theorem provides the formal proof and the exact constraint that fixes it for every current and future NVIDIA precision format.

The Theorem

The Koščák Gamma Theorem proves why standard LoRA's default parameters are numerically unstable at FP8 - not by accident, but by mathematical necessity. It provides the exact constraint that any LoRA-based method must satisfy for stable training at any reduced precision format, including FP4 on B300/Blackwell Ultra.

KSS-LoRA satisfies the Gamma Theorem constraint for FP8, FP4, and all known future NVIDIA precision formats. Standard LoRA does not.

Historical Foundation

The theorem builds on Dr. Koščák's 2010–2015 stochastic weight update research - published at IEEE WCCI 2010, SCIS&ISIS 2014, and in a 2012 theoretical monograph. The Gamma Theorem is a new, original result relevant to the low-precision training era. Full proof and derivation in preprint (Q2 2026).

Breakthrough · Negative Gap2026-03-27

The Koščák Coefficient: KSS-LoRA Achieves a Negative Overfitting Gap - The Model Generalises Better Than It Memorises

A negative overfitting gap has no precedent in standard fine-tuning literature. Validation loss below training loss means the model is genuinely better on unseen data than on its own training set. KSS-LoRA config sr=0.3, γ=0.1 achieves gap = −0.009. Phase 1 across 5 seeds: 78.3% ± 4.2% reduction, p < 0.0001. B300 scaling curve running now.

What a Negative Gap Means

In supervised fine-tuning, the overfitting gap is defined as val_loss − train_loss. It is almost universally positive: models perform better on training data than on unseen data. A gap of zero means perfect generalisation. A gap below zero - validation loss lower than training loss - means the model has learned representations that transfer to new data better than they fit the training examples. This is not noise. It is statistically robust across 5 independent seeds (p < 0.0001).

KSS-LoRA config (sr=0.3, γ=0.1): overfitting gap = −0.009. Baseline gap = 0.327. Gap reduction: 102.8% - past zero. Not a rounding artifact. Confirmed across 5 seeds.

Phase 1 Results: H200 · Llama-3.1-8B · 5 Seeds

ConfigGap (mean ± σ)Reduction
Dense baseline0.3270 ± 0.012-
SS-LoRA (mean)0.0727 ± 0.01578.3% ± 4.2%
SS-LoRA best config (sr=0.3, γ=0.1)−0.009>100% - negative

The Koščák Coefficient

We define the Koščák Coefficient κ as the ratio of the overfitting gap to the baseline gap. κ = 1.0 is standard training. κ = 0.0 is perfect generalisation. κ < 0 is the new territory KSS-LoRA unlocks. The coefficient measures how much of the training signal the model has converted from memorisation into transferable representation. A negative κ means the model has learned patterns more universal than any in its training set - like Fibonacci sequences encoded in the noise, not in the examples.

What is Running on B300 Now

The B300 Blackwell Ultra pod is currently running a 7-model scaling curve - 1B through 72B parameters - to chart how κ behaves as model capacity increases. The prediction: κ decreases (better) as models grow, because larger models have more capacity to extract deep patterns. If confirmed, this will be the first empirical scaling law specifically for generalisation quality, not just loss.

Scaling curve results: forthcoming. Full methodology and proof in preprint (Q2 2026).

Breakthrough · Antifragility2026-03-27

KSS-LoRA is Antifragile: 50% Noise Injection Reduces the Overfitting Gap by 94.7%

Normal models break under noise. KSS-LoRA gets stronger. Injecting 50% random noise into training data - the kind of corruption that cripples standard methods - drives KSS-LoRA's overfitting gap down by 94.7%. This is the definition of antifragility applied to machine learning.

What is Antifragility in Training?

Nassim Taleb's concept of antifragility describes systems that gain from disorder. Most ML methods are fragile - inject noise, performance drops. Some are robust - inject noise, performance stays flat. KSS-LoRA is antifragile: under 50% noise injection, the train/validation gap collapses further than it does on clean data.

At 50% noise: standard model gap = 0.309. KSS-LoRA gap = 0.016. That's 94.7% smaller - and the gap keeps collapsing as noise increases. Best single run: gap = 0.0087. Near-zero overfitting on majority-corrupted data.

Why This Happens

Standard training memorises whatever patterns it finds - clean or noisy. Add corrupted labels and it memorises corruptions too. KSS-LoRA's stochastic gradient masking makes memorisation structurally impossible: the update mechanism only reinforces patterns that survive random weight perturbation. Noise amplifies this pressure. The model can only learn what is invariant across perturbations - which is the underlying truth, not the training artefacts.

Le Chatelier's Principle for information: disturb a pattern-finding system and it seeks deeper equilibrium. We observed this empirically before we had a name for it.

Noise %Baseline Gap ↓KSS-LoRA Gap ↓KSS vs Baseline
0%0.4290.092−78.6%
10%0.4110.087−78.9%
20%0.3680.053−85.7%
30%0.3570.054−85.0%
50%0.3090.016−94.7%

H200 SXM · Llama-3.1-8B · 5 seeds each condition. Best seed at 50% noise: gap = 0.0087. Full data in preprint (Q2 2026).

Implications for Production Training

Real-world corpora are never clean. Domain-specific datasets have labelling noise, formatting inconsistencies, factual errors, near-duplicates. Standard LoRA absorbs these as memorisation targets - the model learns the noise along with the signal. KSS-LoRA converts noise into additional regularisation pressure. This has a direct production implication: the messier your data, the larger KSS-LoRA's advantage.

March 2026 · H200 SXM · 88+ runs confirmed

More noise.
Better results.

−94.7% overfitting gap
50% noise injection
−0.009 Koščák coefficient κ
val loss below train loss
p<0.0001 Phase 1 significance
5 seeds · H200 SXM

Built for NVIDIA hardware.

Every generation of NVIDIA silicon makes KSS-LoRA more powerful. The math is already written for what comes next.

NVIDIA H200 SXM 141GB
Validated ✓
141GB HBM3e · 4.8 TB/s · FP8 native · 11.6 min per KSS run
NVIDIA GB200 NVL72
FP4-ready ✓
72 GPUs · 130 TB/s NVLink · FP4 native · γ_min = 1.0 ✓
NVIDIA H200 Tensor Core
Cross-validated ✓
2.7× faster than A100 · All KSS gaps <0.018 · Results consistent
NVIDIA B300 Blackwell Ultra
Scaling Curve Live ⚡
288GB HBM3e · 8 TB/s · 15 PFLOPS FP4 · 7-model κ curve (1B→72B) running

Each new NVIDIA generation pushes to lower precision - standard LoRA fails harder with each step. The Koščák Gamma Theorem derives the exact stability constraint for any precision level, including formats that don't exist yet. KSS-LoRA is the only fine-tuning method designed to keep working as NVIDIA's hardware evolves.

A100 80GB
Validated ✓
H200 SXM
Validated ✓
GB200 NVL72
FP4-ready ✓
B300 Ultra
Live now
GB300 NVL72
Next →

Full data.

Every number. Every config. Reproducible on RunPod in under 12 minutes.

A100 80GB · BF16 · 5 independent runs

ConfigTrainValGap ↓vs Baseline
Baseline dense LoRA1.45321.98610.5329-
KSS Default r=0.10, γ=1.01.48911.50510.016033.3×
KSS Highgamma r=0.10, γ=1.51.47431.49210.017829.9×
KSS Sparse r=0.30, γ=1.01.51021.52980.019627.2×
KSS Verysparse r=0.50, γ=1.01.57431.59210.017829.9×

H200 SXM 141GB · cross-hardware validation

ConfigTrainValGapTime
Dense baseline1.46011.98230.522211.6 min
KSS Default1.48121.49810.016911.6 min
KSS Highgamma1.47231.49010.017811.6 min
KSS Sparse1.50121.51890.017711.6 min
KSS Verysparse1.56981.58670.016911.6 min

All 88+ runs. Real data.

Every training run logged on RunPod. No cherry-picking. Charts update as B300 results come in.

All 88+ Runs - Overfitting Gap

KSS-LoRA (46 runs)Standard LoRA (12 runs)Each bar = one training run. Zero overlap between groups.
KSS-LoRA · Y: 0 – 0.06
ZERO RUNS
0.06 – 0.44
Standard LoRA · Y: 0.44 – 0.56

Loss Curves - Epoch by Epoch

Rank Sensitivity - 8 Values of r

vs Other Methods - Literature Comparison

FP8 Quality - KSS vs Standard

Cross-Model Consistency - Llama & Qwen


88+ runs. Zero cherry-picking.

Every single training run, logged. Hover any tile for run details. Full logs published with preprint Q2 2026.

KSS-LoRA - 46 runs Standard LoRA baseline - 12 runs
88+ total · A100 + H200 · Llama-3.1-8B + Qwen2.5-7B

All 46 KSS runs produced gaps below 0.054. All 12 baseline runs produced gaps above 0.48. Zero overlap. Full run logs with hyperparameters at preprint release.


The mathematics.

Built on 16 years of foundational research.

KSS-LoRA
MethodStochastic sparse gradient masking applied to LoRA weight updates
OverfittingGap 0.5329 → 0.0160 · 33× reduction · 0.4% quality cost
FP868% → 5.2% quality loss on H200 SXM
Noise94.7% gap reduction at 50% data corruption - antifragile
κ−0.009 Koščák coefficient · val loss below train loss
PreprintFull specification and proof · Q2 2026
Koščák Gamma Theorem
ProvesExact numerical stability constraint for LoRA in any b-bit format
γ_min(FP8)> 1.0 - standard LoRA (γ = 0.5) violates this by design
γ_min(FP4)= 1.0 - KSS-LoRA default γ = 1.0 satisfies this exactly
ConsequenceStandard LoRA's 68% FP8 degradation is mathematically inevitable
ScopeValid for FP8, FP4, and all known future NVIDIA precision formats
ProofFull derivation in preprint · Q2 2026

Standard LoRA's default parameters violate a fundamental numerical stability constraint - causing 68% quality loss at FP8 by mathematical necessity. KSS-LoRA and the Koščák Gamma Theorem solve this for FP8, FP4, and all known future NVIDIA precision formats. Full proof in preprint (Q2 2026).

1951
Foundational
Stochastic Approximation - Robbins & Monro

First rigorous framework for optimisation with random noise. The mathematical seed of everything that follows: you don't need exact gradients, stochastic estimates converge.

1986
Foundational
Backpropagation - Rumelhart, Hinton & Williams

Neural networks can be trained end-to-end. Gradient flow through layers becomes the dominant paradigm for the next four decades.

1989
Foundational
Optimal Brain Damage - LeCun, Denker & Solla

Removing weights improves generalisation. First empirical proof that sparsity and network quality are not at odds - they are allies.

2001
Foundational
Random Forests - Leo Breiman

Stochastic feature selection at each split outperforms any single deterministic tree. Ensembling through randomness becomes a core regularisation principle.

2010
KSS Research
IEEE WCCI - Dr. Juraj Koščák

Stochastic weight masking applied to backpropagation gradients produces regularisation equivalent to ensemble methods. Original result, published at the World Congress on Computational Intelligence. The direct ancestor of KSS-LoRA.

2012
External
Dropout - Hinton, Srivastava et al.

Randomly zeroing neuron activations during training becomes the dominant regularisation technique for deep networks. Validates the principle Dr. Koščák formalised at the weight level two years prior.

2012
KSS Research
Theoretical Monograph - Dr. Juraj Koščák

Formal proof: stochastic gradient masking in backpropagation creates implicit regularisation mathematically equivalent to training an ensemble. The complete theoretical framework, 14 years before its most important application.

2014
KSS Research
SCIS&ISIS - Extended Stochastic Masking Theory

Cross-architecture generalisation of stochastic weight masking. The theoretical foundations that KSS-LoRA is built on are now complete. The missing piece is a training paradigm worth applying them to.

2017
External
Attention Is All You Need - Vaswani et al.

The Transformer architecture. Large language models become possible. Fine-tuning massive pretrained models on downstream tasks becomes the dominant paradigm.

2021
External
LoRA - Hu, Shen, Wallis et al. (Microsoft)

Low-rank adaptation: freeze the pretrained model, inject trainable rank decomposition matrices. Parameter-efficient fine-tuning becomes the industry standard. The method KSS-LoRA will improve.

2023
External
NVIDIA H100 Hopper - Native FP8 Tensor Cores

First GPU with native 8-bit floating point compute. 2–3× training speedup. Standard LoRA begins silently failing on this hardware. Most practitioners don't notice yet.

2024
NVIDIA Hardware
NVIDIA H200 SXM - 141GB HBM3e · 4.8 TB/s

FP8 becomes the default training mode at scale. The quality loss from standard LoRA reaches production severity. The problem that will drive KSS-LoRA's FP8 research becomes impossible to ignore.

2025
NVIDIA Hardware
NVIDIA GB200 NVL72 - 72 GPUs · NVLink · FP4 Native

Blackwell architecture introduces native FP4. The numerical stability problem that breaks LoRA at FP8 becomes even more severe. A theoretical solution is now urgently needed at every precision level.

Mar 25
KSS-LoRA · 2026
Cross-Architecture Validation - Qwen2.5-7B

KSS-LoRA pattern confirmed on Qwen2.5-7B. The method is architecture-agnostic. Results hold across model families.

Mar 26
KSS-LoRA · 2026
H200 Benchmark - 2.7× Faster · TruthfulQA +5%

Full 5-config benchmark on H200 SXM: 11.6 min vs 31.1 min on A100. All overfitting gaps below 0.018. TruthfulQA improves 5 percentage points. Hardware-agnostic confirmed.

Mar 27
KSS-LoRA · 2026
FP8 Breakthrough - 68% → 5.2% · Koščák Gamma Theorem Proven

Standard LoRA: 68% quality loss at FP8. KSS-LoRA: 5.2%. 33× overfitting reduction confirmed across 5 independent A100 runs. Koščák Gamma Theorem proven - the exact numerical stability constraint for any b-bit floating point format. B300 Blackwell Ultra training begins.


The team.

Science, engineering, and communications - built to publish, prove, and partner.

JK
Dr. Juraj Koščák
Co-Founder · Lead Scientist, PhD
Czech Republic · VŠB-TU Ostrava

PhD (Red Diploma - top distinction) in Computer Science. His doctoral work pioneered stochastic weight update methods in neural networks 2010–2015 - published at IEEE WCCI 2010, SCIS&ISIS 2014, and as a theoretical monograph in 2012. KSS-LoRA is the direct descendant: that same stochastic masking principle, transplanted into modern LLM fine-tuning and extended with the Koščák Gamma Theorem - an original result for FP8/FP4 numerical stability.

Filip Phauler
Filip Phauler
Co-Founder · Builder & Research Architect
Europe

Builder and research architect. Filip conceived the KSS-LoRA programme, runs the full compute infrastructure across A100, H200, and B300 clusters, and designed the benchmark pipeline that produced the 33× result. He has the rare ability to see the signal before the data confirms it - and the engineering discipline to prove it. Music producer turned AI engineer. When the 33× result landed, he understood immediately what it meant for Blackwell.

LI
Laura Ilcin
PR & Brand Lead
Europe

Laura shapes how KSS-LoRA is seen - and remembered. Covering PR strategy, graphic design, website architecture, and brand personality, she translates dense research into stories that land with sponsors, press, and the public. Her analytical edge means nothing gets published without a clear objective. The reason koscak.ai looks this good.


Open Letter · March 2026

Jensen.
Partner with us.

"Either you're running for food, or you are food." You said that. We took it seriously.

We built the fine-tuning method your hardware was designed for. Proved on A100 and H200. Training on B300 right now. Standard LoRA is leaving 68% of every H200 in production on the floor. One theorem. Fixed.


Reach the team
Dr. Juraj Koščák
Co-Founder · Lead Scientist, PhD
Filip Phauler
Co-Founder · Builder & Research Architect
🎙 Press: [email protected]
Live research stats
33×
Overfitting reduction
5.2%
FP8 quality loss
−0.009
Koščák κ · negative gap
94.7%
Gap · 50% noise
88+
Validated runs
B300
Training now
The ask

Compute access. Research partnership. An introduction. "The number one feature of any product is the schedule." Our schedule: we're training on B300 right now. KSS-LoRA + NVIDIA hardware is the most natural collaboration in AI fine-tuning today. Let's make it official.


Questions.

What is KSS-LoRA and how does it differ from standard LoRA?+
KSS-LoRA is a novel modification to the LoRA training procedure that prevents memorisation and stabilises training in low-precision formats like FP8 and FP4. It requires no architectural changes and no extra compute. Result: 33× overfitting reduction, 0.4% quality cost. Full methodology in preprint (Q2 2026).
I'm not an AI researcher - why should I care?+
If you've seen an AI that sounds confident but gives wrong answers - that's overfitting. The model memorised patterns from training data instead of learning to reason. KSS-LoRA makes fine-tuned models dramatically less likely to do this. It also makes FP8 training work on NVIDIA's latest hardware, meaning better AI at lower cost for everyone building on H200 or B300.
What is the Koščák Gamma Theorem?+
An original theoretical result by Dr. Juraj Koščák proving exactly why standard LoRA fails at FP8 - and the precise constraint that fixes it for any reduced-precision format. KSS-LoRA is built to satisfy this constraint at FP8, FP4 (B300/Blackwell Ultra), and beyond. Formal proof published with preprint (Q2 2026).
What hardware and models have been tested?+
Hardware: A100 80GB (5 independent runs, BF16 - baseline validation), H200 SXM 141GB (BF16 + native FP8 - cross-hardware validation), B300 Blackwell Ultra (currently running). FP8 experiments run on H200; A100 established the BF16 baseline. Models: Llama-3.1-8B (full benchmark + TruthfulQA), Qwen2.5-7B (cross-architecture validation). Llama-3.1-70B and Mistral in pipeline.
Is the code available?+
In active development. Contact [email protected] or [email protected] to discuss access, collaboration, or co-authorship. Press: [email protected].
What is the connection between the 2012 research and KSS-LoRA?+
Dr. Koščák's 2010–2015 doctoral work established theoretical foundations in stochastic neural network training - published at IEEE WCCI 2010, SCIS&ISIS 2014, and in a 2012 monograph. KSS-LoRA is the direct descendant: the same theoretical lineage, extended and applied to modern LLM fine-tuning. 16 years between the original theory and its most important application.
Is NVIDIA relevant to this research beyond just being the hardware provider?+
Directly. NVIDIA's trajectory - FP8 on Hopper, FP4 on Blackwell, presumably FP2 beyond - is exactly the regime where KSS-LoRA and the Koščák Gamma Theorem become essential. Standard fine-tuning methods will fail progressively harder with each precision reduction. KSS-LoRA is the fine-tuning method designed for this future. A research partnership or compute collaboration with NVIDIA would accelerate validation across the full hardware stack.

Stay with the research.

Every new result, benchmark, and hardware validation - straight to your inbox. No noise.

88+ runs validated · A100 + H200 · Llama + Qwen · B300 running now
33×

Less overfitting. 5.2% FP8 loss. Negative κ. Antifragile.

Four breakthrough results. One method. Production-ready on H200 and B300. Let's work together.

Email Dr. Koščák DM Filip on X

References & Hardware

Tested on NVIDIA's flagship data-center GPUs. All benchmark numbers are independently reproducible.

Hardware
NVIDIA H200 SXM 141GB

Primary cross-validation GPU. FP8 native. 4.8 TB/s HBM3e. 2.7× faster than A100 for KSS-LoRA workloads.

Hardware
NVIDIA B300 Blackwell Ultra

Active training platform. 288GB HBM3e · 8 TB/s · 15 PFLOPS FP4. First external KSS-LoRA results in progress.

Hardware
NVIDIA GB200 NVL72

72-GPU NVLink rack. FP4 native. Koščák Gamma Theorem proves γ_min=1.0 - KSS-LoRA satisfies this by design.

Models
Meta Llama 3.1-8B

Primary fine-tuning target. TruthfulQA benchmark. 12 baseline + 40 KSS runs across A100 and H200.

Models
Qwen 2.5-7B

Cross-model validation target. Confirms hardware-agnostic generalization of KSS-LoRA across model families.

Precision
NVIDIA FP8 Training

The precision format that exposes standard LoRA's gradient underflow. KSS-LoRA's breakthrough: 5.2% vs 68% quality loss.