Hand-Writing an RMSNorm Operator with FP8 Activation Quantization
2026/5/11 15:27:54

rms_norm_fp8_noweight_fp16: Computation Flow and Optimizations

Full code

// Host-side launcher: seq_len blocks of 256 threads each, on the given stream.
void rms_norm_fp8_noweight_fp16(const __half *x, __nv_fp8_e4m3 *out,
                                int seq_len, int dim,
                                const float *d_scale, cudaStream_t stream) {
    rms_norm_fp8_noweight_kernel<<<seq_len, 256, 0, stream>>>(x, out, seq_len, dim, d_scale);
}
// ── RMSNorm → FP8 with d_scale, no weight (norm weight baked into GEMM
// weights) ──
// Verbatim copy of production rms_norm_fp8_static_k. NOT
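The body of rms_norm_fp8_noweight_kernel is not shown here, so the following is only a minimal sketch of what a kernel with this signature and launch shape might look like. It is not the production rms_norm_fp8_static_k. The kernel name rms_norm_fp8_noweight_sketch, the epsilon value 1e-6, the choice to multiply the normalized value by *d_scale (some pipelines divide instead), and the explicit clamp to the E4M3 finite range of ±448 are all assumptions made for illustration. The sketch also assumes it is launched with exactly 256 threads per block, as in the launcher above.

```cuda
#include <cuda_fp16.h>
#include <cuda_fp8.h>
#include <cuda_runtime.h>

// Hypothetical sketch, NOT the production kernel. Assumed details:
//   - eps = 1e-6f
//   - quantized value = normalized value * (*d_scale)
//   - blockDim.x == kThreads (256), one block per row
constexpr int   kThreads = 256;     // must match the launch block size
constexpr float kEps     = 1e-6f;   // assumed epsilon
constexpr float kFp8Max  = 448.0f;  // largest finite E4M3 magnitude

__global__ void rms_norm_fp8_noweight_sketch(const __half *x, __nv_fp8_e4m3 *out,
                                             int seq_len, int dim,
                                             const float *d_scale) {
    const int row = blockIdx.x;
    if (row >= seq_len) return;  // uniform per block, so safe before __syncthreads
    const __half *xr = x + (size_t)row * dim;
    __nv_fp8_e4m3 *outr = out + (size_t)row * dim;

    // 1) Each thread accumulates a partial sum of squares in FP32.
    float sum = 0.0f;
    for (int i = threadIdx.x; i < dim; i += blockDim.x) {
        const float v = __half2float(xr[i]);
        sum += v * v;
    }

    // 2) Reduce within each warp using shuffles (no shared memory needed).
    for (int off = 16; off > 0; off >>= 1)
        sum += __shfl_xor_sync(0xffffffffu, sum, off);

    // 3) Combine the per-warp sums through shared memory.
    __shared__ float warp_sums[kThreads / 32];
    __shared__ float s_inv_rms;
    const int lane = threadIdx.x & 31;
    const int warp = threadIdx.x >> 5;
    if (lane == 0) warp_sums[warp] = sum;
    __syncthreads();
    if (threadIdx.x == 0) {
        float total = 0.0f;
        for (int w = 0; w < kThreads / 32; ++w) total += warp_sums[w];
        s_inv_rms = rsqrtf(total / dim + kEps);  // 1 / sqrt(mean(x^2) + eps)
    }
    __syncthreads();

    // 4) Normalize, apply the static scale, clamp, and convert to FP8.
    const float mul = s_inv_rms * (*d_scale);
    for (int i = threadIdx.x; i < dim; i += blockDim.x) {
        float v = __half2float(xr[i]) * mul;
        v = fminf(fmaxf(v, -kFp8Max), kFp8Max);
        outr[i] = __nv_fp8_e4m3(v);
    }
}
```

Under these assumptions it would be launched the same way as the original, for example rms_norm_fp8_noweight_sketch<<<seq_len, 256, 0, stream>>>(x, out, seq_len, dim, d_scale). Keeping the accumulation in FP32 and folding the reciprocal RMS and the scale into a single multiplier means the second pass does one multiply per element before the conversion.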
