MyBatis游标Cursor的正确使用和百万数据传输的内存测试
作者:isea533
很早以前为了处理大量数据想过使用Cursor,当时发现没有效果,就没有继续深入。
这次为了搞清楚 Cursor 是否真的有用,找些资料和源码发现是有效果的,只是缺了必要的配置。
准备测试数据
创建表:
CREATE TABLE test_table ( id INT PRIMARY KEY, name VARCHAR(20), age INT, address VARCHAR(200) );
创建存储过程:
-- 创建一个存储过程,用于插入10万测试数据 DELIMITER // CREATE PROCEDURE insert_test_data() BEGIN DECLARE i INT DEFAULT 1; WHILE i <= 100000 DO -- 随机生成姓名和年龄 SET @name = CONCAT('name', i); SET @address = CONCAT('address......................', i); SET @age = FLOOR(RAND() * 100); -- 插入数据 INSERT INTO test_table (id, name, age, address) VALUES (i, @name, @age, @address); -- 更新计数器 SET i = i + 1; END WHILE; END // DELIMITER ;
插入数据:
-- 调用存储过程 CALL insert_test_data();
准备测试接口
public interface TestMapper { class Person { private String name; private int age; private Integer id; private String address; public String getName() { return name; } public void setName(String name) { this.name = name; } public int getAge() { return age; } public void setAge(int age) { this.age = age; } public Integer getId() { return id; } public void setId(Integer id) { this.id = id; } public String getAddress() { return address; } public void setAddress(String address) { this.address = address; } } //TODO 注意sql中指定了表名 test,如果自己执行,需要按需修改 @Select("select * from test.test_table union all " + "select * from test.test_table union all " + "select * from test.test_table union all " + "select * from test.test_table union all " + "select * from test.test_table union all " + "select * from test.test_table union all " + "select * from test.test_table union all " + "select * from test.test_table union all " + "select * from test.test_table union all " + "select * from test.test_table") @Options(fetchSize = Integer.MIN_VALUE) Cursor<Person> selectAll(); @Select("select * from test.test_table union all " + "select * from test.test_table union all " + "select * from test.test_table union all " + "select * from test.test_table union all " + "select * from test.test_table union all " + "select * from test.test_table union all " + "select * from test.test_table union all " + "select * from test.test_table union all " + "select * from test.test_table union all " + "select * from test.test_table") List<Person> selectList(); }
前面插入10万数据,这里union all 10次达到百万数据。
测试代码
@Test public void testCursor() throws InterruptedException { //等待10秒方便jvisualVM监控 Thread.sleep(10000); long start = System.currentTimeMillis(); try (SqlSession sqlSession = getSqlSession()) { TestMapper testMapper = sqlSession.getMapper(TestMapper.class); try(Cursor<TestMapper.Person> cursor = testMapper.selectAll()){ int total = 0; for (TestMapper.Person o : cursor) { total++; } System.out.println("总数: " + total); } catch (IOException ignore) { } } System.out.println("耗时: " + (System.currentTimeMillis() - start)); Thread.sleep(10000); } @Test public void testSelectAll() throws InterruptedException { //等待10秒方便jvisualVM监控 Thread.sleep(10000); long start = System.currentTimeMillis(); try (SqlSession sqlSession = getSqlSession()) { TestMapper testMapper = sqlSession.getMapper(TestMapper.class); List<TestMapper.Person> people = testMapper.selectList(); System.out.println(people.size()); } System.out.println("耗时: " + (System.currentTimeMillis() - start)); Thread.sleep(10000); } private static SqlSessionFactory sqlSessionFactory; @BeforeAll public static void init() { try { Reader reader = Resources.getResourceAsReader("mybatis-config.xml"); sqlSessionFactory = new SqlSessionFactoryBuilder().build(reader); reader.close(); } catch (IOException ignore) { ignore.printStackTrace(); } } public SqlSession getSqlSession() { return sqlSessionFactory.openSession(); }
测试结果
1.1. 直接List接收100万数据
- 查询过程耗时:7833ms
- GC:21次
- 占用内存:885MB
1.2. 限制500MB内存,直接List接收100万数据
增加JVM参数 -Xmx500m
执行结果如下:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:149)
内存溢出。
2.1. 使用游标Cursor,不配置其他参数
@Select("select * from test.test_table union all " + "select * from test.test_table union all " + "select * from test.test_table union all " + "select * from test.test_table union all " + "select * from test.test_table union all " + "select * from test.test_table union all " + "select * from test.test_table union all " + "select * from test.test_table union all " + "select * from test.test_table union all " + "select * from test.test_table") Cursor<Person> selectAll();
- 查询过程耗时:5908ms
- GC:21次
- 占用内存:428MB
使用游标的情况在测试中,占用了第1种情况一半的内存,处理速度也更快,GC次数也没增加。
2.2. 使用游标Cursor,不配置其他参数,限制200MB内存
等了1分30秒都没出结果,而且线程卡在MySQL传输数据上:
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at com.mysql.cj.protocol.FullReadInputStream.readFully(FullReadInputStream.java:64)
at com.mysql.cj.protocol.a.SimplePacketReader.readMessageLocal(SimplePacketReader.java:137)
at com.mysql.cj.protocol.a.SimplePacketReader.readMessage(SimplePacketReader.java:102)
3.1. 使用游标Cursor,配置 FORWARD_ONLY
@Select("select * from test.test_table union all " + "select * from test.test_table union all " + "select * from test.test_table union all " + "select * from test.test_table union all " + "select * from test.test_table union all " + "select * from test.test_table union all " + "select * from test.test_table union all " + "select * from test.test_table union all " + "select * from test.test_table union all " + "select * from test.test_table") @Options(resultSetType = ResultSetType.FORWARD_ONLY) Cursor<Person> selectAll();
- 查询过程耗时:6313ms
- GC:22次
- 占用内存:454MB
加了这个参数不如2.1不配置参数的情况。
3.2. 使用游标Cursor,配置 FORWARD_ONLY,限制200MB内存
仍然内存溢出:
org.apache.ibatis.exceptions.PersistenceException:
### Error querying database. Cause: java.sql.SQLException: GC overhead limit exceeded
以上测试说明 @Options(resultSetType = ResultSetType.FORWARD_ONLY) 配置没用。
从 MyBatis 源码来看,就没有相关的代码,不起作用是正常的,但是奇怪的是,网上搜的大量文章都是加的这个配置。
接下来看看真正有用的配置。
4.1. 使用游标Cursor,配置 @Options(fetchSize = Integer.MIN_VALUE)
- 查询过程耗时:4735ms
- GC:12次
- 占用内存:206MB
这种情况比前面的都好,而且GC只有12次,内存比3.1少了一半。
4.2. 使用游标Cursor,配置 @Options(fetchSize = Integer.MIN_VALUE),内存限制50MB
- 查询过程耗时:4676ms
- GC:142次
- 占用内存:16MB
16MB内存就能处理百万数据,但是GC增加了,GC耗时231ms。
4.3. 使用游标Cursor,配置 @Options(fetchSize = Integer.MIN_VALUE),内存限制10MB
- 查询过程耗时:38715ms
- GC:1894次
- 占用内存:7.8MB
- 16MB内存就能处理百万数据,但是GC增加了,GC耗时34s。
程序一共运行了39秒,其中34秒是GC时间,吞吐量只有13%,太低了,限制50MB时使用了16MB,增加一次限制20MB的测试。
4.4. 使用游标Cursor,配置 @Options(fetchSize = Integer.MIN_VALUE),内存限制20MB
- 查询过程耗时:4880ms
- GC:366次
- 占用内存:7.8MB
- 16MB内存就能处理百万数据,但是GC增加了,GC耗时514ms,吞吐量90%。
正确使用MyBatis游标
从上面结果来看,真正有效的是 @Options(fetchSize = Integer.MIN_VALUE) 配置。
如果追查到JDBC层,会在 mysql 的 jdbc 驱动StatementImpl类中发现下面的方法:
/** * We only stream result sets when they are forward-only, read-only, and the * fetch size has been set to Integer.MIN_VALUE * * @return true if this result set should be streamed row at-a-time, rather * than read all at once. */ protected boolean createStreamingResultSet() { return ((this.query.getResultType() == Type.FORWARD_ONLY) && (this.resultSetConcurrency == java.sql.ResultSet.CONCUR_READ_ONLY) && (this.query.getResultFetchSize() == Integer.MIN_VALUE)); }
我们加的注解中,fetchSize条件满足了,另外两个在何时设置的呢?
在AbstractQuery中,存在下面的默认值:
protected Resultset.Type resultSetType = Type.FORWARD_ONLY;
在 ConnectionImpl 中的下面方法也有默认参数:
@Override public java.sql.PreparedStatement prepareStatement(String sql) throws SQLException { return prepareStatement(sql, DEFAULT_RESULT_SET_TYPE, DEFAULT_RESULT_SET_CONCURRENCY); }
所以在 MySQL 中,启用流式传输就需要 @Options(fetchSize = Integer.MIN_VALUE) 配置。
当考虑到更多类型的数据库时,fetchSize 一般都有不同大小的默认值,像 MySQL 这样直接用 Integer.MIN_VALUE 的不多见,Type.FORWARD_ONLY 也是一些数据库的默认值,为了保险可以设置上,就目前的游标功能来看,针对不同的数据库要做对应的测试才能找到合适的参数配置。
总结
以上为个人经验,希望能给大家一个参考,也希望大家多多支持脚本之家。