使用java实现百万级别数据导出excel的三种方式
作者:奔腾的野马
在业务系统中,导出报表的需求会很常见,而随着时间推移业务量不断增加,数据库的数据可能达到百万甚至千万级别。对于导出报表功能,最简单的做法就是从数据库里面把需要的数据一次性加载到内存,然后写入excel文件,再把excel文件返回给用户。这种做法在数据量不大的时候是可行的,但是一旦需要导出几十万甚至上百万的数据,很可能出现OOM导致服务崩溃的情况,而且导出所消耗的时间会大大增加。
这里提供一种支持百万级别数据导出的方法,并且消耗很少的内存,核心思想就是不要一次性把数据加载到内存中。
主要是从两个方面去解决:
1.从数据库加载数据不要一次性加载,可以分页的方式或者用游标的方式分批加载数据,加载一批数据处理一批并且释放内存,这样内存占用始终处于一个比较平稳的状态。分页的方式加载编码比较繁琐,我一般是采用游标方式逐行加载。目前常用的持久层框架有JPA,mybaits,hibernate,下面会分别列出JPA,hibernate及mybatis通过游标方式加载数据。
2.写入excel也是分批写入,推荐阿里的EasyExcel,占用内存极低。
EasyExcel的pom依赖:
<dependency> <groupId>com.alibaba</groupId> <artifactId>easyexcel</artifactId> <version>2.1.1</version> <optional>true</optional> </dependency>
运行环境
jdk1.8,idea2019,堆内存:-Xms256M -Xms256M(导出100万数据毫无压力),springboot,数据库是mysql
先来张效果图,这个是最大堆内存设置为256M,两张表联合查询的情况下导出100万数据的效果,可以看到堆内存变化比较平稳,导出100万数据耗时143秒,这个速度还有优化的空间,如果是单表导出的话速度会更快些:
JPA使用游标方式导出百万数据(两种方式,推荐使用QueryDSL)
1.使用jpa原生方式,这种适合sql比较简单的情况
pom.xml:
<!-- spring web依赖,搭建web项目需要这个依赖--> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-web</artifactId> </dependency> <!-- jpa --> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-data-jpa</artifactId> </dependency> <dependency> <groupId>org.projectlombok</groupId> <artifactId>lombok</artifactId> <optional>true</optional> </dependency>
repository:
@Repository public interface UserRepository extends JpaRepository<UserEntity,Integer> { //@QueryHint(name = HINT_FETCH_SIZE,value = Integer.MIN_VALUE+"") 值设置为Integer.MIN_VALUE告诉mysql需要逐条返回数据,并且返回值需要用stream来接收 @QueryHints(@QueryHint(name = HINT_FETCH_SIZE,value = Integer.MIN_VALUE+"")) @Query(value = "select * from user limit 500000",nativeQuery = true) Stream<UserEntity> findAllList(); }
service:
注意:
需要加事务注解,并且是只读事务
需要及时调用entityManager的detach方法释放内存,不然还是会出现OOM
@Autowired private EntityManager entityManager; @Autowired private UserRepository userRepository; Transactional(readOnly = true) public void exportData3(ScrollResultsHandler<UserExportVO> scrollResultsHandler){ Stream<UserEntity> allList = userRepository.findAllList.forEach((o)->{ UserEntity userEntity = (UserEntity) o; UserExportVO userExportVO = UserExportVO.builer() .userName(userEntity.getUsername()) .mobile(userEntity.getMobile()) .build(); scrollResultsHandler.handle(userExportVO); //对象被session持有,调用detach方法释放内存 entityManager.detach(userEntity); }); }
controller:
@RequestMapping("export4") public void export4(HttpServletResponse response) throws IOException { response.setContentType("application/vnd.ms-excel"); response.setCharacterEncoding("utf-8"); String filenames="bigdata4"; response.addHeader("Content-Disposition", "filename=" + filenames + ".xlsx"); ExcelWriter excelWriter = EasyExcel.write(response.getOutputStream(), UserExportVO.class).build(); WriteSheet[] writeSheet = new WriteSheet[] { EasyExcel.writerSheet(0, "sheet").build() }; userService.exportData(s->{ UserExportVO resultObject = s; ArrayList arrayList = new ArrayList<UserExportVO>(); arrayList.add(resultObject); excelWriter.write(arrayList, writeSheet[0]); }); excelWriter.finish(); }
使用到的相关的类:
/** * @author 奔腾的野马 * @date 2022/04/25 09:12 */ public interface ScrollResultsHandler<T> { void handle(T t); }
import com.alibaba.excel.annotation.ExcelProperty; import lombok.AllArgsConstructor; import lombok.Builder; import lombok.Data; import lombok.NoArgsConstructor; import java.math.BigDecimal; import java.time.LocalDateTime; /** * @Author: 奔腾的野马 * @Date: 2021/10/16 16:19 */ @Data @Builder @AllArgsConstructor @NoArgsConstructor public class UserExportVO { @ExcelProperty(value = "用户名") private String userName; @ExcelProperty(value = "手机号") private String mobile; }
2.使用querysql游标方式导出,推荐这种方式,可以实现动态sql,多表关联甚至是常见的组函数都可以支持
pom.xml:
<dependencies> <!-- spring web依赖,搭建web项目需要这个依赖--> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-web</artifactId> </dependency> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-data-jpa</artifactId> </dependency> <!--QueryDSL支持--> <dependency> <groupId>com.querydsl</groupId> <artifactId>querydsl-apt</artifactId> <version>5.0.0</version> <scope>provided</scope> </dependency> <!--QueryDSL支持--> <dependency> <groupId>com.querydsl</groupId> <artifactId>querydsl-jpa</artifactId> <version>5.0.0</version> </dependency> <dependency> <groupId>com.querydsl</groupId> <artifactId>querydsl-core</artifactId> <version>5.0.0</version> </dependency> </dependencies> <build> <plugins> <!-- QueryDSL 插件 --> <plugin> <groupId>com.mysema.maven</groupId> <artifactId>apt-maven-plugin</artifactId> <version>1.1.3</version> <executions> <execution> <goals> <goal>process</goal> </goals> <configuration> <outputDirectory>target/generated-sources/java</outputDirectory> <processor>com.querydsl.apt.jpa.JPAAnnotationProcessor</processor> </configuration> </execution> </executions> </plugin> </plugins> </build>
service:
@Autowired private JPAQueryFactory jpaQueryFactory; private QUserEntity qUserEntity = QUserEntity.userEntity; @Transactional(readOnly = true) public void exportData2(ScrollResultsHandler<UserExportVO> scrollResultsHandler){ //需要用stream方式接收,这样才能逐条处理 Stream<UserExportVO> userExportVOStream = jpaQueryFactory.select(Projections.bean(UserExportVO.class , qUserEntity.userName, qUserEntity.mobile)) .from(qUserEntity) //.join(xxxEntity) //.on(xxxx) //setHint(HINT_FETCH_SIZE,Integer.MIN_VALUE+"") 告诉mysql需要逐条返回数据,注意值需要设置为Integer.MIN_VALUE才能生效 .setHint(HINT_FETCH_SIZE,Integer.MIN_VALUE+"") .limit(1000000) .stream(); userExportVOStream.forEach(dto->{ scrollResultsHandler.handle(dto); }); }
controller:
@RequestMapping("export4") public void export4(HttpServletResponse response) throws IOException { response.setContentType("application/vnd.ms-excel"); response.setCharacterEncoding("utf-8"); String filenames="bigdata4"; response.addHeader("Content-Disposition", "filename=" + filenames + ".xlsx"); ExcelWriter excelWriter = EasyExcel.write(response.getOutputStream(), UserExportVO.class).build(); WriteSheet[] writeSheet = new WriteSheet[] { EasyExcel.writerSheet(0, "sheet").build() }; userService.exportData(s->{ UserExportVO resultObject = s; ArrayList arrayList = new ArrayList<UserExportVO>(); arrayList.add(resultObject); excelWriter.write(arrayList, writeSheet[0]); }); excelWriter.finish(); }
mybatis使用游标方式导出百万数据
pom.xml:
<dependency> <groupId>org.mybatis</groupId> <artifactId>mybatis</artifactId> <version>3.5.9</version> </dependency>
dao:
/** * @author 奔腾的野马 * @date 2022/04/16 19:14 */ @Mapper public interface UserDao { //ResultSetType.TYPE_FORWORD_ONLY 结果集的游标只能向下滚动,fetchSize需要设置为Integer.MIN_VALUE游标才能生效 @Options(resultSetType = ResultSetType.FORWARD_ONLY,fetchSize = Integer.MIN_VALUE) @ResultType(UserExportVO.class) @Select("select userName,mobile from user limit 500000") void reportAll2(ResultHandler<UserExportVO> handler); }
service:
@Transactional(readOnly = true) public void export2(ResultHandler<UserExportVO> handler){ userDao.reportAll2(handler); }
controller:
同上
hibernate使用游标方式导出百万数据
service:
@Autowired private EntityManager entityManager; public void exportData(ScrollResultsHandler<UserExportVO> scrollResultsHandler){ //当不需要缓存时,最好使用StatelessSession StatelessSession session = ((Session) entityManager.getDelegate()).getSessionFactory().openStatelessSession(); Query query = session.getNamedQuery("getAllList"); query.setCacheMode(CacheMode.IGNORE); //setFetchSize(Integer.MIN_VALUE)告诉mysql逐条返回数据 query.setFetchSize(Integer.MIN_VALUE); query.setFirstResult(0); query.setMaxResults(1000000); query.setReadOnly(true); query.setLockMode("a", LockMode.NONE); //ScrollMode.TYPE_FORWORD_ONLY 结果集的游标只能向下滚动 ScrollableResults results = query.scroll(ScrollMode.FORWARD_ONLY); while (results.next()) { UserEntity userEntity = (UserEntity) results.get(0); UserExportVO userExportVO = UserExportVO.builer() .userName(userEntity.getUsername()) .mobile(userEntity.getMobile()) .build(); scrollResultsHandler.handle(userExportVO); } results.close(); session.close(); }
controller:
同上
流式导出遇到的问题(基于jpa和querydsl)及解决方法
1.导出过程中堆内存占用急剧上升,发生OOM
1.1 项目中使用了log4jdbc-log4j2-jdbc4.1(版本是1.16),驱动为net.sf.log4jdbc.sql.jdbcapi.DriverSpy,改成mysql的原生驱动就好了。"log4jdbc-log4j2-jdbc4.1"本来是用来开发过程中方便打印sql的,结果却带来了OOM问题,看来使用第三方jar包一定要慎重啊。
1.2 项目的存在多个版本的querydsl,jar包冲突,解决jar包就正常了
1.3 二次查询时,hibernate的一级缓存没有及时释放,进一步分析,发现大量的对象都被缓存在(org.hibernate.engine.StatefulPersistenceContext)中,导致一级缓存泄漏
解决方法:
由于 Hibernate 的一级缓存是其内部使用的,无法关闭或停用(随着Session 销毁)。从Hibernate 的手册或文档中可知,Hibernate 的一级缓存的清除可通过以下方式:
1)对于单个对象的清除:
Session session=sessionFactory.getCurrentSession(); session.evict(entity);
2)对于实体集合的清除:
Session session=sessionFactory.getCurrentSession(); session.clear();建议在程序中加入对 Hibernate 一级缓存的清除工作,以便可以其内存数据可以及时释放。
2.导出过程中遍历stream需要二次查询数据库
导出过程中遍历stream需要二次查询数据库时提示"Streaming result set com.mysql.cj.protocol.a.result.ResultsetRowsStreaming@5800daf5 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries"
错误详细内容:
java.sql.SQLException: Streaming result set com.mysql.cj.protocol.a.result.ResultsetRowsStreaming@3b8732ec is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries. at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:129) at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97) at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122) at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:953) at com.mysql.cj.jdbc.ClientPreparedStatement.executeQuery(ClientPreparedStatement.java:1003) at net.sf.log4jdbc.sql.jdbcapi.PreparedStatementSpy.executeQuery(PreparedStatementSpy.java:780) at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeQuery(ProxyPreparedStatement.java:52) at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeQuery(HikariProxyPreparedStatement.java) at org.hibernate.engine.jdbc.internal.ResultSetReturnImpl.extract(ResultSetReturnImpl.java:57) at org.hibernate.loader.Loader.getResultSet(Loader.java:2292) at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:2050) at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:2012) at org.hibernate.loader.Loader.doQuery(Loader.java:953) at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:354) at org.hibernate.loader.Loader.doList(Loader.java:2838) at org.hibernate.loader.Loader.doList(Loader.java:2820) at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2652) at org.hibernate.loader.Loader.list(Loader.java:2647) at org.hibernate.loader.hql.QueryLoader.list(QueryLoader.java:506) at org.hibernate.hql.internal.ast.QueryTranslatorImpl.list(QueryTranslatorImpl.java:396) at org.hibernate.engine.query.spi.HQLQueryPlan.performList(HQLQueryPlan.java:219) at org.hibernate.internal.SessionImpl.list(SessionImpl.java:1404) at org.hibernate.query.internal.AbstractProducedQuery.doList(AbstractProducedQuery.java:1562) at org.hibernate.query.internal.AbstractProducedQuery.list(AbstractProducedQuery.java:1530) at org.hibernate.query.internal.AbstractProducedQuery.getSingleResult(AbstractProducedQuery.java:1578) at org.hibernate.query.criteria.internal.compile.CriteriaQueryTypeQueryAdapter.getSingleResult(CriteriaQueryTypeQueryAdapter.java:111) at org.springframework.data.jpa.repository.query.JpaQueryExecution$SingleEntityExecution.doExecute(JpaQueryExecution.java:196) at org.springframework.data.jpa.repository.query.JpaQueryExecution.execute(JpaQueryExecution.java:88) at org.springframework.data.jpa.repository.query.AbstractJpaQuery.doExecute(AbstractJpaQuery.java:154) at org.springframework.data.jpa.repository.query.AbstractJpaQuery.execute(AbstractJpaQuery.java:142) at org.springframework.data.repository.core.support.RepositoryFactorySupport$QueryExecutorMethodInterceptor.doInvoke(RepositoryFactorySupport.java:618) at org.springframework.data.repository.core.support.RepositoryFactorySupport$QueryExecutorMethodInterceptor.invoke(RepositoryFactorySupport.java:605) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) at org.springframework.data.projection.DefaultMethodInvokingMethodInterceptor.invoke(DefaultMethodInvokingMethodInterceptor.java:80) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:366) at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:99) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) at org.springframework.dao.support.PersistenceExceptionTranslationInterceptor.invoke(PersistenceExceptionTranslationInterceptor.java:139) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) at org.springframework.data.jpa.repository.support.CrudMethodMetadataPostProcessor$CrudMethodMetadataPopulatingMethodInterceptor.invoke(CrudMethodMetadataPostProcessor.java:149) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:95) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:212) at com.sun.proxy.$Proxy223.countAllByDeliveryNo(Unknown Source)
查阅资料后发现,是mysql不支持在流式查询过程中使用同一连接再次查询数据库
解决方法:
方法1.使用异步方法查询,这样就可以规避同一个连接二次查询的问题
方法2.需要二次查询时开启一个新的事务去查询就可以,spring中可以使用事务注解开启新的事务就搞定了,注解如下:
@Transactional(propagation = Propagation.REQUIRES_NEW,readOnly = true)
到此这篇关于使用java实现百万级别数据导出excel的三种方式的文章就介绍到这了,更多相关java百万级别数据导出excel内容请搜索脚本之家以前的文章或继续浏览下面的相关文章希望大家以后多多支持脚本之家!