Kotlin ViewModel
2026/5/11 18:01:40
rms_norm_fp8_noweight_fp16: Computation Flow and Optimization

void rms_norm_fp8_noweight_fp16(const __half *x, __nv_fp8_e4m3 *out,
                                int seq_len, int dim,
                                const float *d_scale, cudaStream_t stream) {
    // Launch one 256-thread block per row (grid size = seq_len) on the given stream.
    rms_norm_fp8_noweight_kernel<<<seq_len, 256, 0, stream>>>(x, out, seq_len, dim, d_scale);
}

// ── RMSNorm → FP8 with d_scale, no weight (norm weight baked into GEMM
// weights) ── Verbatim copy of production rms_norm_fp8_static_k. NOT
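The comment describes the kernel's contract: normalize each row by its RMS, apply no per-channel weight, and store FP8 using d_scale, with the norm weight folded into the following GEMM instead. The sketch below is a minimal host-side C++ reference for that contract, not the production kernel. The names `round_to_e4m3`, `rms_norm_fp8_noweight_ref` and `fold_gamma_into_weights` are hypothetical. It also assumes eps = 1e-6, float inputs in place of `__half`, a saturating E4M3 conversion, and that `*d_scale` is a per-tensor dequantization scale (so the stored value is fp8(y / scale)). None of these details appear in the source.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Round v to the nearest FP8 E4M3 value (OCP "fn" variant: 4 exponent bits,
// bias 7, 3 mantissa bits, max finite 448, no infinities) and return it as
// float. Out-of-range inputs saturate to +/-448; ties round to even.
float round_to_e4m3(float v) {
    if (std::isnan(v)) return v;
    const float kMax = 448.0f;
    const float a = std::fabs(v);
    if (a >= kMax) return std::copysign(kMax, v);   // saturating conversion
    int e = 0;
    std::frexp(a, &e);                 // a = f * 2^e with f in [0.5, 1)
    int exp2 = e - 1;                  // so a lies in [2^exp2, 2^(exp2 + 1))
    if (exp2 < -6) exp2 = -6;          // subnormals share the 2^-6 grid
    const float step = std::ldexp(1.0f, exp2 - 3);  // 8 representable steps per binade
    float q = std::nearbyint(a / step) * step;      // default rounding mode: ties-to-even
    if (q > kMax) q = kMax;
    return std::copysign(q, v);
}

// Hypothetical reference for rms_norm_fp8_noweight_kernel (assumed eps and
// d_scale convention, see lead-in). x is seq_len x dim, row-major.
std::vector<float> rms_norm_fp8_noweight_ref(const std::vector<float>& x,
                                             int seq_len, int dim,
                                             float scale, float eps = 1e-6f) {
    assert(x.size() == static_cast<std::size_t>(seq_len) * dim);
    std::vector<float> out(x.size());
    const float inv_scale = 1.0f / scale;
    for (int r = 0; r < seq_len; ++r) {          // the CUDA launch uses one block per row
        const float* row = x.data() + static_cast<std::size_t>(r) * dim;
        double sum_sq = 0.0;                     // row-wise sum of squares
        for (int c = 0; c < dim; ++c) sum_sq += static_cast<double>(row[c]) * row[c];
        const float rstd = 1.0f / std::sqrt(static_cast<float>(sum_sq / dim) + eps);
        for (int c = 0; c < dim; ++c)            // no gamma: normalize, scale, quantize
            out[static_cast<std::size_t>(r) * dim + c] =
                round_to_e4m3(row[c] * rstd * inv_scale);
    }
    return out;
}

// Why the norm weight can be dropped: for a following GEMM with W (dim x n,
// row-major), (y * gamma) W == y W' where W'[k][j] = gamma[k] * W[k][j].
std::vector<float> fold_gamma_into_weights(const std::vector<float>& W,
                                           const std::vector<float>& gamma,
                                           int dim, int n) {
    std::vector<float> folded(W);
    for (int k = 0; k < dim; ++k)
        for (int j = 0; j < n; ++j)
            folded[static_cast<std::size_t>(k) * n + j] *= gamma[k];
    return folded;
}
```

A reference like this can be compared element-wise against the GPU output (after converting each `__nv_fp8_e4m3` back to float) when testing the kernel. The folding identity is exact in real arithmetic; with FP8 activations, the values being quantized are the un-weighted y, so d_scale presumably has to be calibrated for that range rather than for y scaled by gamma.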