从数据统计到格式处理:SQL聚合与标量函数的实战应用指南
2026/5/16 12:12:03 网站建设 项目流程

1. 为什么SQL函数是数据分析的瑞士军刀

刚入行做数据分析时,我最头疼的就是面对杂乱无章的原始数据。记得第一次接手电商销售报表任务,导出的CSV文件里既有"¥1,299.00"这样的价格字符串,又有"2023/12/31 23:59"这样的时间戳,还有大量NULL值。当时用Python写了上百行清洗代码,直到 mentor 拍了拍我肩膀:"试试SQL函数吧,数据库内置的工具比你想象的强大得多。"

SQL函数就像数据分析师的瑞士军刀,主要分为两大门派:聚合函数(Aggregate)标量函数(Scalar)。聚合函数像是统计局的调查员,专门计算列数据的总体特征,比如平均工资、最高温度;而标量函数更像流水线上的加工工人,对每个数据点进行单独处理,比如把手机号中间四位打星号、将日期格式统一化。

实际工作中最爽的使用姿势,是先用聚合函数抓取宏观指标,再用标量函数微调展示格式。比如计算各区域销售业绩时:

SELECT region, ROUND(SUM(amount)/10000, 2) || '万元' AS 销售额, FORMAT(AVG(delivery_days), 1) || '天' AS 平均配送时长 FROM orders GROUP BY region

这个查询同时用到了SUM、AVG聚合函数,ROUND、FORMAT标量函数,还有字符串连接操作符||。最终报表直接可读,省去了导出到Excel再加工的步骤。

2. 聚合函数:从微观数据到宏观洞察

2.1 基础五虎将实战

AVG、COUNT、MAX、MIN、SUM这五个基础聚合函数,我习惯叫它们"五虎将"。最近分析用户活跃度时,就用它们快速输出了关键指标:

SELECT COUNT(DISTINCT user_id) AS 活跃用户数, AVG(session_duration) AS 平均停留时长, MAX(last_active_time) AS 最近活跃时间, SUM(page_views) AS 总浏览量 FROM user_activity WHERE date >= '2023-06-01'

这里有个容易踩的坑:COUNT(*)和COUNT(column)的区别。前者统计所有行数,后者会跳过NULL值。上周我就因为用错导致UV统计差了3%,被运营同事追着问数据异常原因。

2.2 GROUP BY的魔法时刻

单用聚合函数得到的是全局统计,配合GROUP BY才能开启多维分析。最近做商品类目分析时这样写:

SELECT category, COUNT(*) AS sku_count, SUM(stock) AS total_inventory, ROUND(SUM(sales*price)/SUM(sales),2) AS avg_unit_price FROM products WHERE is_active = 1 GROUP BY category

这里有个高级技巧:在计算加权平均单价时,先用SUM分别求出总销售额和总销量,再做除法。比直接用AVG(price)准确,因为考虑了不同SKU销售量的权重。

2.3 HAVING的筛选艺术

WHERE和HAVING的区别,就像装修时的"选材"和"验收"。最近排查低效商品时这样用:

SELECT product_id, SUM(quantity) AS total_sales, SUM(amount) AS gross_revenue FROM order_details WHERE order_date BETWEEN '2023-01-01' AND '2023-03-31' GROUP BY product_id HAVING SUM(amount) < 1000 AND COUNT(*) >= 5

这个查询找出Q1期间下单次数≥5但总销售额不足1000元的商品。HAVING就像质检员,专门检查聚合后的结果是否达标。

3. 标量函数:数据美容师

3.1 字符串变形记

处理用户提交数据时,标量函数是救命稻草。比如清洗用户名:

SELECT user_id, CONCAT( UPPER(LEFT(name, 1)), LOWER(SUBSTRING(name, 2)) ) AS formatted_name, REGEXP_REPLACE(phone, '(\d{3})\d{4}(\d{4})', '\1****\2') AS masked_phone FROM users

这个查询同时运用了:

  • LEFT取首字母
  • UPPER/LOWER转换大小写
  • SUBSTRING截取子串
  • REGEXP_REPLACE正则替换

3.2 数字与日期化妆术

金融数据对格式要求严格,我常用:

SELECT order_id, CAST(amount AS DECIMAL(10,2)) AS exact_amount, ROUND(amount/7, 4) AS usd_amount, DATE_FORMAT(pay_time, '%Y年%m月%d日 %H时') AS local_time, TIMESTAMPDIFF(HOUR, create_time, pay_time) AS process_hours FROM payments

CAST确保金额精度,ROUND控制小数位,DATE_FORMAT本地化时间显示,TIMESTAMPDIFF计算处理时长。

3.3 条件判断函数

给数据打标签时,CASE WHEN比编程语言的if-else更直观:

SELECT product_name, stock, CASE WHEN stock = 0 THEN '缺货' WHEN stock < 10 THEN '低库存' ELSE '充足' END AS stock_status, IF(discount > 0, price*discount, price) AS final_price FROM inventory

配合聚合函数还能实现多条件统计:

SELECT COUNT(CASE WHEN score >= 90 THEN 1 END) AS excellent, COUNT(CASE WHEN score BETWEEN 60 AND 89 THEN 1 END) AS qualified FROM test_results

4. 组合技实战案例

4.1 销售漏斗分析

用CTE配合窗口函数和聚合:

WITH funnel AS ( SELECT DATE_TRUNC('day', event_time) AS day, COUNT(DISTINCT CASE WHEN event_type = 'view' THEN user_id END) AS viewers, COUNT(DISTINCT CASE WHEN event_type = 'cart' THEN user_id END) AS cart_adders, COUNT(DISTINCT CASE WHEN event_type = 'purchase' THEN user_id END) AS buyers FROM user_events GROUP BY 1 ) SELECT day, viewers, cart_adders, buyers, ROUND(100.0 * buyers / viewers, 2) AS conversion_rate, LAG(viewers, 7) OVER (ORDER BY day) AS prev_week_viewers FROM funnel ORDER BY day

4.2 用户分群报表

结合日期函数和聚合:

SELECT FLOOR(DATEDIFF(NOW(), birth_date)/365/10)*10 AS age_group, COUNT(*) AS user_count, ROUND(AVG(balance), 2) AS avg_assets, GROUP_CONCAT(DISTINCT SUBSTRING_INDEX(email, '@', -1) ORDER BY SUBSTRING_INDEX(email, '@', -1) SEPARATOR ', ') AS email_providers FROM customers WHERE is_vip = 1 GROUP BY 1 HAVING user_count > 100 ORDER BY age_group

4.3 库存预警系统

标量函数嵌套聚合结果:

SELECT warehouse, product_type, SUM(current_stock) AS total_stock, SUM(last_month_sales) AS total_sales, CASE WHEN SUM(current_stock) < SUM(last_month_sales)*0.3 THEN '紧急补货' WHEN SUM(current_stock) < SUM(last_month_sales)*0.7 THEN '建议补货' ELSE '库存充足' END AS alert_level, CONCAT('安全库存天数:', ROUND(SUM(current_stock)/NULLIF(SUM(daily_avg_sales),0),1)) AS safe_days FROM inventory GROUP BY warehouse, product_type

记得有次把NULLIF用错成IFNULL,导致除零错误让整个ETL作业失败。现在养成了习惯,处理除法前一定会用NULLIF(分母,0)做防护。这些实战中的小教训,比任何文档都让人记忆深刻。

需要专业的网站建设服务?

联系我们获取免费的网站建设咨询和方案报价,让我们帮助您实现业务目标

立即咨询