数据库的选择

# 前言

玩家每参加一次游戏就可以获得 1 积分，现有需求需要统计游戏中过去 24 小时玩家积分排名，当前用户量约 200 万，未来会进一步增长。假设平均每人每天玩 3 次，那么每天约有 600 万的数据量，而需要对这 600 万的数据进行近乎实时的聚合统计。

挖坑待埋！！

# 关系数据库

单机 MySQL8 为例

# 建表

CREATE TABLE `point` (
	`id` INT(10) NOT NULL AUTO_INCREMENT,
	`point` INT(10) NOT NULL COMMENT '本次获得/扣减积分',
	`user_id` INT(10) NOT NULL COMMENT '用户id',
	`create_time` DATETIME NOT NULL COMMENT '创建时间',
	PRIMARY KEY (`id`) USING BTREE
)
COLLATE='utf8mb4_general_ci'
ENGINE=InnoDB
;

# 插入

随机生成 500 万条记录

Point point = new Point();
int userId = (int) (Math.random() * 1000000) + 1;
point.setUserId(userId);
Date date = new Date();
// 随机生成一个最近48小时内的时间
date.setTime(date.getTime() - (long) (Math.random() * 48 * 60 * 60 * 1000));
point.setCreateTime(date);
int pointValue = (int) (Math.random() * 5) + 1;
point.setPoint(pointValue);

# 查询

查询 SQL：

SELECT SUM(p.`point`) AS total_points,p.user_id
FROM `point` p
WHERE p.create_time > DATE_SUB(NOW(), INTERVAL 1 DAY)
GROUP BY p.user_id
ORDER BY total_points DESC
LIMIT 10;

未加索引时的耗时情况：

行数 / 万	耗时 / 秒
100	8
150	10
200	16
300	25
400	32
500	38

在 create_time 字段添加索引，耗时没变化，explain：没走索引， Using where; Using temporary; Using filesort
在 user_id 字段添加索引，耗时超过 2 分钟，explain：显示走了索引，但反而耗时变长
删除索引恢复原来耗时时长

# 时序数据库

timescale/timescaledb:2.11.0-pg14 为例

# 建表

使用 ChatGPT 转换的

CREATE TABLE point (
    id int4 NOT NULL DEFAULT nextval('point_id_seq'::regclass),
    point INT NOT NULL,
    user_id INT NOT NULL,
    create_time TIMESTAMP NOT NULL
);

COMMENT ON TABLE point IS '积分表';
COMMENT ON COLUMN point.point IS '本次获得/扣减积分';
COMMENT ON COLUMN point.user_id IS '用户id';
COMMENT ON COLUMN point.create_time IS '创建时间';

转超表

SELECT create_hypertable('point','create_time');

笔记

原表中不能用唯一索引，或者需要将唯一索引包含 create_time 字段。否则会报错： ERROR: cannot create a unique index without the column "create_time" (used in partitioning)

# 插入

这里我使用的 Navicat 从 MySQL 导出 CSV，再导入。

# 查询

SELECT SUM(p.point) AS total_points, p.user_id
FROM point p
WHERE p.create_time > NOW() - INTERVAL '1 DAY'
GROUP BY p.user_id
ORDER BY total_points DESC
LIMIT 10;

执行耗时约 3.8 秒，相较于 MySQL 的 38 秒提升明显

# 参考

Timescale Documentation | Get started with Timescale (opens new window)

上次更新: 2023/06/02, 16:55:07

← Python实践