70 headline runs across 4 engine pairs; 0 additional runs in the full-engine appendix. Device observed: L40S (g6e.12xlarge). Each cell measures how often the MoE router's top-k experts disagree between two checkpoints / engines.
## Research question
Does FP8 quantization preserve the routing decisions of a Mixture-of-Experts model? This matters for training: if gradients flow through the router gate, set-level routing changes show up as different parameter updates even when the top-1 expert is stable.
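To build intuition for why set-level churn can coexist with a stable top-1, here is a small self-contained simulation (our own sketch, not the measurement pipeline): router logits over 128 experts are perturbed with small additive noise standing in for quantization error, and both metrics are recomputed. The token count and noise scale are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
E, K, T = 128, 8, 1000  # experts, top-k, tokens (illustrative sizes)

# stand-in router logits for T tokens
logits = rng.normal(size=(T, E))

def topk_idx(x, k=K):
    # indices of the k largest logits per token, descending
    return np.argsort(-x, axis=-1)[:, :k]

def compare(a_logits, b_logits, k=K):
    a, b = topk_idx(a_logits, k), topk_idx(b_logits, k)
    # top-1 flip: dominant expert differs
    top1_flip = float(np.mean(a[:, 0] != b[:, 0]))
    # set disagreement: the unordered top-k set differs
    set_disagree = float(np.mean([set(r) != set(t) for r, t in zip(a, b)]))
    return top1_flip, set_disagree

# small additive noise as a proxy for quantization error on the gate
noisy = logits + rng.normal(scale=0.02, size=logits.shape)
flip, disagree = compare(logits, noisy)
```

With tiny perturbations the rank-1 expert usually keeps a comfortable logit margin, while the rank-7/rank-8 boundary is nearly tied, so set membership churns first.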
## What we observed
Headline: FP8 quantization barely shifts the dominant routed expert (top-1 flip rate 8.3%, about 1 in 12 (token, layer) pairs) but reorders the rest of the top-k almost everywhere (100.0% of tokens show set-level disagreement on at least one MoE layer). The top-1 is robust; the top-k set is dominated by noise. The full-engine data is in the appendix below.
Next: multi-prompt run (currently n=1); per-layer stratified analysis to see whether early and late layers differ.
Green: ≤ 5% (top-1 router stable). Amber: 5–15%. Red: > 15% (the dominant expert often changes between engines).
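As a concrete reading of the legend, a minimal classifier (hypothetical helper; thresholds are taken from the legend above, expressed as fractions rather than percentages):

```python
def classify(flip_rate: float) -> str:
    """Map a top-1 flip rate (fraction, e.g. 0.083 for 8.3%) to a dashboard color."""
    if flip_rate <= 0.05:
        return "green"  # top-1 router stable
    if flip_rate <= 0.15:
        return "amber"
    return "red"        # dominant expert often changes between engines
```

The headline FP8 flip rate of 8.3% lands in amber under these cutoffs.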
Left: how often the dominant routed expert changes between rollout and trainer. Right: how often any of the top-k experts changes on at least one layer. The gap between the two is the headline finding — quantization noise barely shifts the top-1 but reorders the rest of the top-k.
| rollout engine | trainer engine | runs | mean flip rate | mean token disagreement | worst-layer flip rate | layers | tokens |
|---|---|---|---|---|---|---|---|
| hermes-qwen3-30b-a3b-bf16 | fsdp-bf16-moe | 19 | 0.0601 | 0.9976 | 0.3333 | 48 | 703 |
| hermes-qwen3-30b-a3b-bf16 | megatron-bf16-moe | 19 | 0.0572 | 0.9972 | 0.3750 | 48 | 703 |
| hermes-qwen3-30b-a3b-fp8 | fsdp-bf16-moe | 16 | 0.0829 | 1.0000 | 0.4167 | 48 | 685 |
| hermes-qwen3-30b-a3b-fp8 | megatron-bf16-moe | 16 | 0.0834 | 1.0000 | 0.4167 | 48 | 685 |
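The metrics in these tables can be reproduced from paired top-k expert indices. A sketch, assuming captures shaped `[tokens, layers, k]` with the top-1 expert at index 0 (the array format is our assumption, not a documented interface):

```python
import numpy as np

def router_metrics(rollout_topk, trainer_topk):
    """Compute dashboard metrics from paired top-k expert index arrays.

    Both inputs: int arrays of shape [tokens, layers, k], with the
    top-1 (dominant) expert at [..., 0].
    """
    tokens, layers, k = rollout_topk.shape
    # top-1 flip rate: fraction of (token, layer) pairs whose top-1 differs
    flip_rate = np.mean(rollout_topk[:, :, 0] != trainer_topk[:, :, 0])
    # set-level disagreement per (token, layer): compare as unordered sets,
    # so a reordering within the same top-k set does not count
    set_diff = np.array([
        [set(rollout_topk[t, l]) != set(trainer_topk[t, l]) for l in range(layers)]
        for t in range(tokens)
    ])
    # token disagreement: tokens with a set mismatch on at least one layer
    token_disagreement = np.mean(set_diff.any(axis=1))
    # per-layer flip rates, feeding the layer min / mean / max columns
    per_layer_flip = np.mean(rollout_topk[:, :, 0] != trainer_topk[:, :, 0], axis=0)
    return flip_rate, token_disagreement, per_layer_flip
```

Note the asymmetry this encodes: a top-1 flip where the two experts merely swap ranks within the same top-k set raises the flip rate but not the set disagreement, and vice versa.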
## Appendix: full-engine runs

| run_id | model | engines | device | tokens | layers | top_k | experts | flip rate | token disagreement | layer flip (min) | layer flip (mean) | layer flip (max) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| hermes-qwen3-30b-a3b-bf16-vs-fsdp-bf16-moe-code-fix-factorial-turn0 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 24 | 48 | 8 | 128 | 0.0556 | 1.0000 | 0.0000 | 0.0556 | 0.2500 |
| hermes-qwen3-30b-a3b-bf16-vs-fsdp-bf16-moe-code-fix-factorial-turn1 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 54 | 48 | 8 | 128 | 0.0556 | 1.0000 | 0.0000 | 0.0556 | 0.1296 |
| hermes-qwen3-30b-a3b-bf16-vs-fsdp-bf16-moe-code-fix-factorial-turn2 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 37 | 48 | 8 | 128 | 0.0495 | 1.0000 | 0.0000 | 0.0495 | 0.1351 |
| hermes-qwen3-30b-a3b-bf16-vs-fsdp-bf16-moe-cwc-repo-quality-loop-turn0 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 78 | 48 | 8 | 128 | 0.0831 | 1.0000 | 0.0000 | 0.0831 | 0.2051 |
| hermes-qwen3-30b-a3b-bf16-vs-fsdp-bf16-moe-cwc-repo-quality-loop-turn1 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 68 | 48 | 8 | 128 | 0.0423 | 1.0000 | 0.0000 | 0.0423 | 0.1176 |
| hermes-qwen3-30b-a3b-bf16-vs-fsdp-bf16-moe-cwc-repo-which-hook-commits-turn0 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 39 | 48 | 8 | 128 | 0.0700 | 1.0000 | 0.0000 | 0.0700 | 0.2308 |
| hermes-qwen3-30b-a3b-bf16-vs-fsdp-bf16-moe-cwc-repo-which-hook-commits-turn1 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 45 | 48 | 8 | 128 | 0.0963 | 1.0000 | 0.0000 | 0.0963 | 0.2667 |
| hermes-qwen3-30b-a3b-bf16-vs-fsdp-bf16-moe-cwc-repo-which-hook-commits-turn2 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 33 | 48 | 8 | 128 | 0.0499 | 1.0000 | 0.0000 | 0.0499 | 0.1818 |
| hermes-qwen3-30b-a3b-bf16-vs-fsdp-bf16-moe-math-tokens-per-gpu-day-turn0 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 35 | 48 | 8 | 128 | 0.0577 | 0.9714 | 0.0000 | 0.0577 | 0.2286 |
| hermes-qwen3-30b-a3b-bf16-vs-fsdp-bf16-moe-math-tokens-per-gpu-day-turn1 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 8 | 48 | 8 | 128 | 0.0625 | 1.0000 | 0.0000 | 0.0625 | 0.2500 |
| hermes-qwen3-30b-a3b-bf16-vs-fsdp-bf16-moe-mixed-research-and-math-turn0 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 44 | 48 | 8 | 128 | 0.0616 | 1.0000 | 0.0000 | 0.0616 | 0.2500 |
| hermes-qwen3-30b-a3b-bf16-vs-fsdp-bf16-moe-mixed-research-and-math-turn1 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 35 | 48 | 8 | 128 | 0.0500 | 1.0000 | 0.0000 | 0.0500 | 0.2000 |
| hermes-qwen3-30b-a3b-bf16-vs-fsdp-bf16-moe-mixed-research-and-math-turn2 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 36 | 48 | 8 | 128 | 0.0712 | 1.0000 | 0.0000 | 0.0712 | 0.2222 |
| hermes-qwen3-30b-a3b-bf16-vs-fsdp-bf16-moe-mixed-research-and-math-turn3 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 30 | 48 | 8 | 128 | 0.0951 | 1.0000 | 0.0000 | 0.0951 | 0.2667 |
| hermes-qwen3-30b-a3b-bf16-vs-fsdp-bf16-moe-mixed-research-and-math-turn4 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 6 | 48 | 8 | 128 | 0.0556 | 1.0000 | 0.0000 | 0.0556 | 0.3333 |
| hermes-qwen3-30b-a3b-bf16-vs-fsdp-bf16-moe-no-op-trivia-turn0 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 22 | 48 | 8 | 128 | 0.0350 | 1.0000 | 0.0000 | 0.0350 | 0.1364 |
| hermes-qwen3-30b-a3b-bf16-vs-fsdp-bf16-moe-plan-3step-experiment-turn0 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 28 | 48 | 8 | 128 | 0.0543 | 1.0000 | 0.0000 | 0.0543 | 0.1786 |
| hermes-qwen3-30b-a3b-bf16-vs-fsdp-bf16-moe-plan-3step-experiment-turn1 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 24 | 48 | 8 | 128 | 0.0660 | 1.0000 | 0.0000 | 0.0660 | 0.2083 |
| hermes-qwen3-30b-a3b-bf16-vs-fsdp-bf16-moe-plan-3step-experiment-turn2 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 57 | 48 | 8 | 128 | 0.0300 | 0.9825 | 0.0000 | 0.0300 | 0.0877 |
| hermes-qwen3-30b-a3b-bf16-vs-megatron-bf16-moe-code-fix-factorial-turn0 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 24 | 48 | 8 | 128 | 0.0495 | 1.0000 | 0.0000 | 0.0495 | 0.2083 |
| hermes-qwen3-30b-a3b-bf16-vs-megatron-bf16-moe-code-fix-factorial-turn1 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 54 | 48 | 8 | 128 | 0.0494 | 1.0000 | 0.0000 | 0.0494 | 0.1111 |
| hermes-qwen3-30b-a3b-bf16-vs-megatron-bf16-moe-code-fix-factorial-turn2 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 37 | 48 | 8 | 128 | 0.0535 | 1.0000 | 0.0000 | 0.0535 | 0.1892 |
| hermes-qwen3-30b-a3b-bf16-vs-megatron-bf16-moe-cwc-repo-quality-loop-turn0 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 78 | 48 | 8 | 128 | 0.0756 | 1.0000 | 0.0000 | 0.0756 | 0.1923 |
| hermes-qwen3-30b-a3b-bf16-vs-megatron-bf16-moe-cwc-repo-quality-loop-turn1 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 68 | 48 | 8 | 128 | 0.0423 | 1.0000 | 0.0000 | 0.0423 | 0.1176 |
| hermes-qwen3-30b-a3b-bf16-vs-megatron-bf16-moe-cwc-repo-which-hook-commits-turn0 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 39 | 48 | 8 | 128 | 0.0652 | 1.0000 | 0.0000 | 0.0652 | 0.2308 |
| hermes-qwen3-30b-a3b-bf16-vs-megatron-bf16-moe-cwc-repo-which-hook-commits-turn1 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 45 | 48 | 8 | 128 | 0.0912 | 1.0000 | 0.0000 | 0.0912 | 0.2444 |
| hermes-qwen3-30b-a3b-bf16-vs-megatron-bf16-moe-cwc-repo-which-hook-commits-turn2 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 33 | 48 | 8 | 128 | 0.0543 | 1.0000 | 0.0000 | 0.0543 | 0.1818 |
| hermes-qwen3-30b-a3b-bf16-vs-megatron-bf16-moe-math-tokens-per-gpu-day-turn0 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 35 | 48 | 8 | 128 | 0.0446 | 1.0000 | 0.0000 | 0.0446 | 0.1714 |
| hermes-qwen3-30b-a3b-bf16-vs-megatron-bf16-moe-math-tokens-per-gpu-day-turn1 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 8 | 48 | 8 | 128 | 0.0625 | 1.0000 | 0.0000 | 0.0625 | 0.3750 |
| hermes-qwen3-30b-a3b-bf16-vs-megatron-bf16-moe-mixed-research-and-math-turn0 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 44 | 48 | 8 | 128 | 0.0549 | 1.0000 | 0.0000 | 0.0549 | 0.2273 |
| hermes-qwen3-30b-a3b-bf16-vs-megatron-bf16-moe-mixed-research-and-math-turn1 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 35 | 48 | 8 | 128 | 0.0399 | 1.0000 | 0.0000 | 0.0399 | 0.2286 |
| hermes-qwen3-30b-a3b-bf16-vs-megatron-bf16-moe-mixed-research-and-math-turn2 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 36 | 48 | 8 | 128 | 0.0700 | 1.0000 | 0.0000 | 0.0700 | 0.1944 |
| hermes-qwen3-30b-a3b-bf16-vs-megatron-bf16-moe-mixed-research-and-math-turn3 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 30 | 48 | 8 | 128 | 0.0806 | 1.0000 | 0.0000 | 0.0806 | 0.2000 |
| hermes-qwen3-30b-a3b-bf16-vs-megatron-bf16-moe-mixed-research-and-math-turn4 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 6 | 48 | 8 | 128 | 0.0660 | 1.0000 | 0.0000 | 0.0660 | 0.3333 |
| hermes-qwen3-30b-a3b-bf16-vs-megatron-bf16-moe-no-op-trivia-turn0 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 22 | 48 | 8 | 128 | 0.0303 | 1.0000 | 0.0000 | 0.0303 | 0.0909 |
| hermes-qwen3-30b-a3b-bf16-vs-megatron-bf16-moe-plan-3step-experiment-turn0 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 28 | 48 | 8 | 128 | 0.0513 | 1.0000 | 0.0000 | 0.0513 | 0.1786 |
| hermes-qwen3-30b-a3b-bf16-vs-megatron-bf16-moe-plan-3step-experiment-turn1 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 24 | 48 | 8 | 128 | 0.0755 | 1.0000 | 0.0000 | 0.0755 | 0.1667 |
| hermes-qwen3-30b-a3b-bf16-vs-megatron-bf16-moe-plan-3step-experiment-turn2 | Qwen/Qwen3-30B-A3B | hermes-qwen3-30b-a3b-bf16 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 57 | 48 | 8 | 128 | 0.0303 | 0.9474 | 0.0000 | 0.0303 | 0.0877 |
| hermes-qwen3-30b-a3b-fp8-vs-fsdp-bf16-moe-code-fix-factorial-turn0 | Qwen/Qwen3-30B-A3B-FP8 | hermes-qwen3-30b-a3b-fp8 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 24 | 48 | 8 | 128 | 0.0642 | 1.0000 | 0.0000 | 0.0642 | 0.2083 |
| hermes-qwen3-30b-a3b-fp8-vs-fsdp-bf16-moe-code-fix-factorial-turn1 | Qwen/Qwen3-30B-A3B-FP8 | hermes-qwen3-30b-a3b-fp8 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 54 | 48 | 8 | 128 | 0.0745 | 1.0000 | 0.0185 | 0.0745 | 0.2037 |
| hermes-qwen3-30b-a3b-fp8-vs-fsdp-bf16-moe-code-fix-factorial-turn2 | Qwen/Qwen3-30B-A3B-FP8 | hermes-qwen3-30b-a3b-fp8 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 37 | 48 | 8 | 128 | 0.0715 | 1.0000 | 0.0000 | 0.0715 | 0.1622 |
| hermes-qwen3-30b-a3b-fp8-vs-fsdp-bf16-moe-cwc-repo-quality-loop-turn0 | Qwen/Qwen3-30B-A3B-FP8 | hermes-qwen3-30b-a3b-fp8 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 78 | 48 | 8 | 128 | 0.1084 | 1.0000 | 0.0000 | 0.1084 | 0.2436 |
| hermes-qwen3-30b-a3b-fp8-vs-fsdp-bf16-moe-cwc-repo-quality-loop-turn1 | Qwen/Qwen3-30B-A3B-FP8 | hermes-qwen3-30b-a3b-fp8 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 68 | 48 | 8 | 128 | 0.0640 | 1.0000 | 0.0000 | 0.0640 | 0.1912 |
| hermes-qwen3-30b-a3b-fp8-vs-fsdp-bf16-moe-cwc-repo-which-hook-commits-turn0 | Qwen/Qwen3-30B-A3B-FP8 | hermes-qwen3-30b-a3b-fp8 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 39 | 48 | 8 | 128 | 0.0999 | 1.0000 | 0.0000 | 0.0999 | 0.2308 |
| hermes-qwen3-30b-a3b-fp8-vs-fsdp-bf16-moe-cwc-repo-which-hook-commits-turn1 | Qwen/Qwen3-30B-A3B-FP8 | hermes-qwen3-30b-a3b-fp8 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 45 | 48 | 8 | 128 | 0.1088 | 1.0000 | 0.0000 | 0.1088 | 0.3333 |
| hermes-qwen3-30b-a3b-fp8-vs-fsdp-bf16-moe-cwc-repo-which-hook-commits-turn2 | Qwen/Qwen3-30B-A3B-FP8 | hermes-qwen3-30b-a3b-fp8 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 33 | 48 | 8 | 128 | 0.0657 | 1.0000 | 0.0000 | 0.0657 | 0.2121 |
| hermes-qwen3-30b-a3b-fp8-vs-fsdp-bf16-moe-math-tokens-per-gpu-day-turn0 | Qwen/Qwen3-30B-A3B-FP8 | hermes-qwen3-30b-a3b-fp8 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 35 | 48 | 8 | 128 | 0.0690 | 1.0000 | 0.0000 | 0.0690 | 0.2000 |
| hermes-qwen3-30b-a3b-fp8-vs-fsdp-bf16-moe-math-tokens-per-gpu-day-turn1 | Qwen/Qwen3-30B-A3B-FP8 | hermes-qwen3-30b-a3b-fp8 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 8 | 48 | 8 | 128 | 0.1276 | 1.0000 | 0.0000 | 0.1276 | 0.3750 |
| hermes-qwen3-30b-a3b-fp8-vs-fsdp-bf16-moe-mixed-research-and-math-turn0 | Qwen/Qwen3-30B-A3B-FP8 | hermes-qwen3-30b-a3b-fp8 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 44 | 48 | 8 | 128 | 0.0777 | 1.0000 | 0.0000 | 0.0777 | 0.2045 |
| hermes-qwen3-30b-a3b-fp8-vs-fsdp-bf16-moe-mixed-research-and-math-turn1 | Qwen/Qwen3-30B-A3B-FP8 | hermes-qwen3-30b-a3b-fp8 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 35 | 48 | 8 | 128 | 0.0643 | 1.0000 | 0.0000 | 0.0643 | 0.2000 |
| hermes-qwen3-30b-a3b-fp8-vs-fsdp-bf16-moe-mixed-research-and-math-turn2 | Qwen/Qwen3-30B-A3B-FP8 | hermes-qwen3-30b-a3b-fp8 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 36 | 48 | 8 | 128 | 0.0926 | 1.0000 | 0.0000 | 0.0926 | 0.2222 |
| hermes-qwen3-30b-a3b-fp8-vs-fsdp-bf16-moe-mixed-research-and-math-turn3 | Qwen/Qwen3-30B-A3B-FP8 | hermes-qwen3-30b-a3b-fp8 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 12 | 48 | 8 | 128 | 0.1198 | 1.0000 | 0.0000 | 0.1198 | 0.4167 |
| hermes-qwen3-30b-a3b-fp8-vs-fsdp-bf16-moe-plan-3step-experiment-turn0 | Qwen/Qwen3-30B-A3B-FP8 | hermes-qwen3-30b-a3b-fp8 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 28 | 48 | 8 | 128 | 0.0670 | 1.0000 | 0.0000 | 0.0670 | 0.2500 |
| hermes-qwen3-30b-a3b-fp8-vs-fsdp-bf16-moe-plan-3step-experiment-turn1 | Qwen/Qwen3-30B-A3B-FP8 | hermes-qwen3-30b-a3b-fp8 -> fsdp-bf16-moe | L40S (g6e.12xlarge) | 109 | 48 | 8 | 128 | 0.0518 | 1.0000 | 0.0092 | 0.0518 | 0.1376 |
| hermes-qwen3-30b-a3b-fp8-vs-megatron-bf16-moe-code-fix-factorial-turn0 | Qwen/Qwen3-30B-A3B-FP8 | hermes-qwen3-30b-a3b-fp8 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 24 | 48 | 8 | 128 | 0.0642 | 1.0000 | 0.0000 | 0.0642 | 0.2083 |
| hermes-qwen3-30b-a3b-fp8-vs-megatron-bf16-moe-code-fix-factorial-turn1 | Qwen/Qwen3-30B-A3B-FP8 | hermes-qwen3-30b-a3b-fp8 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 54 | 48 | 8 | 128 | 0.0637 | 1.0000 | 0.0000 | 0.0637 | 0.1296 |
| hermes-qwen3-30b-a3b-fp8-vs-megatron-bf16-moe-code-fix-factorial-turn2 | Qwen/Qwen3-30B-A3B-FP8 | hermes-qwen3-30b-a3b-fp8 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 37 | 48 | 8 | 128 | 0.0800 | 1.0000 | 0.0000 | 0.0800 | 0.1622 |
| hermes-qwen3-30b-a3b-fp8-vs-megatron-bf16-moe-cwc-repo-quality-loop-turn0 | Qwen/Qwen3-30B-A3B-FP8 | hermes-qwen3-30b-a3b-fp8 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 78 | 48 | 8 | 128 | 0.1098 | 1.0000 | 0.0000 | 0.1098 | 0.2308 |
| hermes-qwen3-30b-a3b-fp8-vs-megatron-bf16-moe-cwc-repo-quality-loop-turn1 | Qwen/Qwen3-30B-A3B-FP8 | hermes-qwen3-30b-a3b-fp8 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 68 | 48 | 8 | 128 | 0.0604 | 1.0000 | 0.0000 | 0.0604 | 0.1912 |
| hermes-qwen3-30b-a3b-fp8-vs-megatron-bf16-moe-cwc-repo-which-hook-commits-turn0 | Qwen/Qwen3-30B-A3B-FP8 | hermes-qwen3-30b-a3b-fp8 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 39 | 48 | 8 | 128 | 0.1042 | 1.0000 | 0.0000 | 0.1042 | 0.2564 |
| hermes-qwen3-30b-a3b-fp8-vs-megatron-bf16-moe-cwc-repo-which-hook-commits-turn1 | Qwen/Qwen3-30B-A3B-FP8 | hermes-qwen3-30b-a3b-fp8 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 45 | 48 | 8 | 128 | 0.1097 | 1.0000 | 0.0000 | 0.1097 | 0.2667 |
| hermes-qwen3-30b-a3b-fp8-vs-megatron-bf16-moe-cwc-repo-which-hook-commits-turn2 | Qwen/Qwen3-30B-A3B-FP8 | hermes-qwen3-30b-a3b-fp8 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 33 | 48 | 8 | 128 | 0.0701 | 1.0000 | 0.0000 | 0.0701 | 0.1818 |
| hermes-qwen3-30b-a3b-fp8-vs-megatron-bf16-moe-math-tokens-per-gpu-day-turn0 | Qwen/Qwen3-30B-A3B-FP8 | hermes-qwen3-30b-a3b-fp8 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 35 | 48 | 8 | 128 | 0.0649 | 1.0000 | 0.0000 | 0.0649 | 0.2000 |
| hermes-qwen3-30b-a3b-fp8-vs-megatron-bf16-moe-math-tokens-per-gpu-day-turn1 | Qwen/Qwen3-30B-A3B-FP8 | hermes-qwen3-30b-a3b-fp8 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 8 | 48 | 8 | 128 | 0.1172 | 1.0000 | 0.0000 | 0.1172 | 0.3750 |
| hermes-qwen3-30b-a3b-fp8-vs-megatron-bf16-moe-mixed-research-and-math-turn0 | Qwen/Qwen3-30B-A3B-FP8 | hermes-qwen3-30b-a3b-fp8 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 44 | 48 | 8 | 128 | 0.0795 | 1.0000 | 0.0000 | 0.0795 | 0.2727 |
| hermes-qwen3-30b-a3b-fp8-vs-megatron-bf16-moe-mixed-research-and-math-turn1 | Qwen/Qwen3-30B-A3B-FP8 | hermes-qwen3-30b-a3b-fp8 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 35 | 48 | 8 | 128 | 0.0673 | 1.0000 | 0.0000 | 0.0673 | 0.2286 |
| hermes-qwen3-30b-a3b-fp8-vs-megatron-bf16-moe-mixed-research-and-math-turn2 | Qwen/Qwen3-30B-A3B-FP8 | hermes-qwen3-30b-a3b-fp8 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 36 | 48 | 8 | 128 | 0.1019 | 1.0000 | 0.0000 | 0.1019 | 0.2500 |
| hermes-qwen3-30b-a3b-fp8-vs-megatron-bf16-moe-mixed-research-and-math-turn3 | Qwen/Qwen3-30B-A3B-FP8 | hermes-qwen3-30b-a3b-fp8 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 12 | 48 | 8 | 128 | 0.1285 | 1.0000 | 0.0000 | 0.1285 | 0.4167 |
| hermes-qwen3-30b-a3b-fp8-vs-megatron-bf16-moe-plan-3step-experiment-turn0 | Qwen/Qwen3-30B-A3B-FP8 | hermes-qwen3-30b-a3b-fp8 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 28 | 48 | 8 | 128 | 0.0632 | 1.0000 | 0.0000 | 0.0632 | 0.1786 |
| hermes-qwen3-30b-a3b-fp8-vs-megatron-bf16-moe-plan-3step-experiment-turn1 | Qwen/Qwen3-30B-A3B-FP8 | hermes-qwen3-30b-a3b-fp8 -> megatron-bf16-moe | L40S (g6e.12xlarge) | 109 | 48 | 8 | 128 | 0.0495 | 1.0000 | 0.0092 | 0.0495 | 0.1101 |
- **top-1 flip rate**: alias of `router_flip_rate`; see that entry for cap/drop.
- **top-k set disagreement**: alias of `token_expert_disagreement_rate`; see that entry for cap/drop.
- **`router_flip_rate`**: fraction of (token, layer) pairs where the MoE top-1 routed expert differs between rollout and trainer. Measures top-1 stability under quantization or precision changes; low values (< 5%) suggest the dominant expert is robust. Same as the top-1 flip rate shown on the router dashboard.
- **`token_expert_disagreement_rate`**: fraction of tokens with at least one MoE layer where the top-k *set* of routed experts differs. More sensitive than the top-1 flip rate: quantization noise often shuffles the lower-ranked experts even when the dominant one is stable. Visible in the gap between this metric and the top-1 flip rate.
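How the per-run appendix rows roll up into the pair-level summary is not stated above; both a plain mean over runs and a token-weighted mean are plausible. A sketch of the two, using two illustrative rows from the appendix (the fp8 → fsdp code-fix turns):

```python
# two appendix rows: per-run flip rate and token count
runs = [
    {"flip": 0.0642, "tokens": 24},
    {"flip": 0.0745, "tokens": 54},
]

# plain mean: every run counts equally regardless of length
unweighted = sum(r["flip"] for r in runs) / len(runs)

# token-weighted mean: equivalent to pooling all (token, layer) pairs
weighted = sum(r["flip"] * r["tokens"] for r in runs) / sum(r["tokens"] for r in runs)
```

The two diverge when run lengths vary widely (here: 8 to 109 tokens), so which aggregation the summary table uses is worth pinning down before the multi-prompt follow-up.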