本文介绍了三种计算MySQL中新登录用户次日留存率的SQL解法。
解法一使用自连接和子查询,兼容性好且性能优秀;
解法二基于MySQL 8.0+的窗口函数;
解法三通过WITH子句实现清晰逻辑。
核心步骤包括:
1)获取用户首次登录日期;
2)判断次日是否登录;
3)计算留存率(留存用户数/新用户总数)。
文章还对比了不同写法,并扩展了按日期、设备统计及多日留存率的场景,最后强调了整数除法转换和日期处理等注意事项。
MySQL 计算新登录用户次日留存率
解法一:使用自连接 + 子查询(推荐)
sql
SELECT ROUND( COUNT(DISTINCT t1.user_id) * 1.0 / (SELECT COUNT(DISTINCT user_id) FROM login WHERE (user_id, date) IN (SELECT user_id, MIN(date) FROM login GROUP BY user_id)), 3 ) AS p FROM ( SELECT user_id, MIN(date) AS first_date FROM login GROUP BY user_id ) t1 JOIN login l ON t1.user_id = l.user_id AND l.date = DATE_ADD(t1.first_date, INTERVAL 1 DAY);
解法二:使用窗口函数(MySQL 8.0+)
sql
SELECT ROUND( COUNT(DISTINCT CASE WHEN l.date = DATE_ADD(t.first_date, INTERVAL 1 DAY) THEN l.user_id END) * 1.0 / COUNT(DISTINCT t.user_id), 3 ) AS p FROM ( SELECT user_id, MIN(date) OVER (PARTITION BY user_id) AS first_date FROM login ) t JOIN login l ON t.user_id = l.user_id GROUP BY t.first_date;
解法三:使用两个子查询(最清晰)
sql
WITH first_login AS ( SELECT user_id, MIN(date) AS first_date FROM login GROUP BY user_id ), retention AS ( SELECT f.user_id, f.first_date, CASE WHEN l.date IS NOT NULL THEN 1 ELSE 0 END AS is_retained FROM first_login f LEFT JOIN login l ON f.user_id = l.user_id AND l.date = DATE_ADD(f.first_date, INTERVAL 1 DAY) ) SELECT ROUND(SUM(is_retained) * 1.0 / COUNT(*), 3) AS p FROM retention;
原理说明
计算步骤:
找出每个用户首次登录日期
sql
SELECT user_id, MIN(date) AS first_date FROM login GROUP BY user_id
判断次日是否登录
用户1:首次登录2020-10-12,次日(2020-10-13)有登录 → 留存 ✅
用户2:首次登录2020-10-12,次日(2020-10-13)有登录 → 留存 ✅
用户3:首次登录2020-10-12,次日(2020-10-13)无登录 → 未留存 ❌
用户4:首次登录2020-10-13,次日(2020-10-14)无登录 → 未留存 ❌
计算留存率
留存用户数:2
新用户总数:4
留存率:2/4 = 0.5 → 0.500
完整示例(建表 + 数据 + 查询)
sql
-- 建表 CREATE TABLE `login` ( `id` int(4) NOT NULL, `user_id` int(4) NOT NULL, `client_id` int(4) NOT NULL, `date` date NOT NULL, PRIMARY KEY (`id`) ); -- 插入数据 INSERT INTO login VALUES (1,2,1,'2020-10-12'), (2,3,2,'2020-10-12'), (3,1,2,'2020-10-12'), (4,2,2,'2020-10-13'), (5,4,1,'2020-10-13'), (6,1,2,'2020-10-13'), (7,1,2,'2020-10-14'); -- 查询(解法一详细版) SELECT ROUND( COUNT(DISTINCT retained.user_id) * 1.0 / COUNT(DISTINCT first.user_id), 3 ) AS p FROM ( SELECT user_id, MIN(date) AS first_date FROM login GROUP BY user_id ) first LEFT JOIN login retained ON first.user_id = retained.user_id AND retained.date = DATE_ADD(first.first_date, INTERVAL 1 DAY);
各种写法对比
写法1:COUNT + CASE WHEN
sql
SELECT ROUND( COUNT(CASE WHEN l.date = DATE_ADD(f.first_date, INTERVAL 1 DAY) THEN 1 END) * 1.0 / COUNT(*), 3 ) AS p FROM ( SELECT user_id, MIN(date) AS first_date FROM login GROUP BY user_id ) f LEFT JOIN login l ON f.user_id = l.user_id AND l.date = DATE_ADD(f.first_date, INTERVAL 1 DAY);
写法2:使用 EXISTS
sql
SELECT ROUND( COUNT(*) * 1.0 / ( SELECT COUNT(DISTINCT user_id) FROM login WHERE (user_id, date) IN (SELECT user_id, MIN(date) FROM login GROUP BY user_id) ), 3 ) AS p FROM ( SELECT user_id, MIN(date) AS first_date FROM login GROUP BY user_id ) f WHERE EXISTS ( SELECT 1 FROM login l WHERE f.user_id = l.user_id AND l.date = DATE_ADD(f.first_date, INTERVAL 1 DAY) );
扩展场景
场景1:计算不同日期的次日留存率
sql
SELECT first_date, ROUND( COUNT(CASE WHEN l.date = DATE_ADD(f.first_date, INTERVAL 1 DAY) THEN 1 END) * 1.0 / COUNT(*), 3 ) AS retention_rate FROM ( SELECT user_id, MIN(date) AS first_date FROM login GROUP BY user_id ) f LEFT JOIN login l ON f.user_id = l.user_id AND l.date = DATE_ADD(f.first_date, INTERVAL 1 DAY) GROUP BY first_date ORDER BY first_date;
场景2:计算3日、7日留存率
sql
SELECT ROUND( COUNT(CASE WHEN l3.date IS NOT NULL THEN 1 END) * 1.0 / COUNT(*), 3 ) AS retention_3day FROM ( SELECT user_id, MIN(date) AS first_date FROM login GROUP BY user_id ) f LEFT JOIN login l3 ON f.user_id = l3.user_id AND l3.date = DATE_ADD(f.first_date, INTERVAL 3 DAY);
场景3:按设备统计留存率
sql
SELECT client_id, ROUND( COUNT(CASE WHEN l.date = DATE_ADD(f.first_date, INTERVAL 1 DAY) THEN 1 END) * 1.0 / COUNT(*), 3 ) AS retention_rate FROM ( SELECT user_id, MIN(date) AS first_date, client_id FROM login GROUP BY user_id ) f LEFT JOIN login l ON f.user_id = l.user_id AND l.date = DATE_ADD(f.first_date, INTERVAL 1 DAY) GROUP BY client_id;
MySQL vs SQLite 语法对比
| 操作 | MySQL | SQLite |
|---|---|---|
| 日期加一天 | DATE_ADD(date, INTERVAL 1 DAY) | date(date, '+1 day') |
| 四舍五入 | ROUND(x, 3) | ROUND(x, 3) |
| 防止整数除法 | * 1.0 / | * 1.0 / |
验证计算过程
sql
-- 查看每个用户的新登录日期和次日登录情况 SELECT f.user_id, f.first_date, CASE WHEN l.date IS NOT NULL THEN '留存' ELSE '未留存' END AS status, l.date AS next_day_login FROM ( SELECT user_id, MIN(date) AS first_date FROM login GROUP BY user_id ) f LEFT JOIN login l ON f.user_id = l.user_id AND l.date = DATE_ADD(f.first_date, INTERVAL 1 DAY) ORDER BY f.user_id;
结果:
| user_id | first_date | status | next_day_login |
|---|---|---|---|
| 1 | 2020-10-12 | 留存 | 2020-10-13 |
| 2 | 2020-10-12 | 留存 | 2020-10-13 |
| 3 | 2020-10-12 | 未留存 | NULL |
| 4 | 2020-10-13 | 未留存 | NULL |
注意事项
新用户定义:首次登录的用户,不是所有登录用户
次日定义:首次登录日期的后一天(
DATE_ADD(first_date, INTERVAL 1 DAY))一天多次登录:只要次日有登录即可,使用
DISTINCT去重整数除法:必须
* 1.0或CAST(... AS DECIMAL)保留三位小数:使用
ROUND(..., 3)
推荐使用解法一(自连接 + 子查询),兼容性好,逻辑清晰,性能优秀。