Analyzing and Fixing an Outage Caused by Java Heap Exhaustion
Author: movee
1 Symptoms and Diagnosis
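Our team has a service that had been running fine until requests suddenly started failing: first they timed out, then the service became completely unreachable. After the problem was reported, the first step was to check the logs. They contained no errors from the application logic, but request handling times had grown extremely long, and calls to external interfaces were all timing out.
Handling times for the same endpoint, for comparison: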
2024-01-25 00:03:44.451 [http-nio-8843-exec-7] INFO c.b.w.o.api.v1.ApiControllerV1 - 2410710a-17ec-49ce-a74e-23e410189b08 clientRealIp: 10.181.202.12, finish getStatus, duration time: 634 ms
2024-01-25 02:29:38.707 [http-nio-8843-exec-105] INFO c.b.w.o.a.a.BasicApiExceptionHandler - 3f3b24ef-b1c0-4558-ad15-ec88b30aa0bd clientRealIp: 10.181.200.33, finish getStatus, duration time: 1217935 ms
The Eureka client periodically logged the following:
2024-01-25 01:00:59.823 [DiscoveryClient-1] WARN c.n.discovery.TimedSupervisorTask - task supervisor timed out
java.util.concurrent.TimeoutException: null
    at java.util.concurrent.FutureTask.get(FutureTask.java:205)
    at com.netflix.discovery.TimedSupervisorTask.run(TimedSupervisorTask.java:66)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Database connections were dropped abnormally:
2024-01-25 01:12:21.965 [http-nio-8843-exec-13] WARN c.zaxxer.hikari.pool.ProxyConnection - d0487d3e-89cf-4385-ae59-6d26ff1e277a HikariPool-1 - Connection com.mysql.jdbc.JDBC4Connection@2b165bf4 marked as broken because of SQLSTATE(08007), ErrorCode(0)
com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Communications link failure during commit(). Transaction resolution unknown.
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:377)
    at com.mysql.jdbc.Util.getInstance(Util.java:360)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:956)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:935)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:924)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:870)
    at com.mysql.jdbc.ConnectionImpl.commit(ConnectionImpl.java:1611)
    at com.zaxxer.hikari.pool.ProxyConnection.commit(ProxyConnection.java:361)
    at com.zaxxer.hikari.pool.HikariProxyConnection.commit(HikariProxyConnection.java)
Calls to external interfaces failed with:
2024-01-25 01:12:22.148 [http-nio-8843-exec-3] ERROR c.b.w.o.a.a.BasicApiExceptionHandler - 7c4a2b46-dc8a-4121-85ff-cf25ec16a17d
org.springframework.web.util.NestedServletException: Handler dispatch failed; nested exception is java.lang.OutOfMemoryError: Java heap space
    at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1055)
    at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:943)
    at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1006)
    at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:898)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:634)
    at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:883)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:741)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
    at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
    at org.springframework.boot.actuate.web.trace.servlet.HttpTraceFilter.doFilterInternal(HttpTraceFilter.java:88)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
    at com.baidu.wuyou.operation.api.aspect.CachingRequestBodyFilter.doFilterInternal(CachingRequestBodyFilter.java:25)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
    at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
    at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:93)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
    at org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:94)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
    at org.springframework.boot.actuate.metrics.web.servlet.WebMvcMetricsFilter.filterAndRecordMetrics(WebMvcMetricsFilter.java:114)
    at org.springframework.boot.actuate.metrics.web.servlet.WebMvcMetricsFilter.doFilterInternal(WebMvcMetricsFilter.java:104)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
    at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:202)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:541)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:139)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343)
    at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:367)
    at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65)
    at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:860)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1598)
    at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at com.fasterxml.jackson.core.util.BufferRecycler.calloc(BufferRecycler.java:191)
    at com.fasterxml.jackson.core.util.BufferRecycler.allocCharBuffer(BufferRecycler.java:161)
    at com.fasterxml.jackson.core.util.BufferRecycler.allocCharBuffer(BufferRecycler.java:151)
    at com.fasterxml.jackson.core.io.IOContext.allocConcatBuffer(IOContext.java:234)
    at com.fasterxml.jackson.core.json.UTF8JsonGenerator.<init>(UTF8JsonGenerator.java:139)
    at com.fasterxml.jackson.core.JsonFactory._createUTF8Generator(JsonFactory.java:1817)
    at com.fasterxml.jackson.core.JsonFactory.createGenerator(JsonFactory.java:1281)
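This last log entry is "kind" enough to state the cause directly: java.lang.OutOfMemoryError: Java heap space. We immediately checked the application's heap monitoring, which showed that the heap had filled up completely and the service had gone down. The monitoring chart is shown below:
[Figure: heap usage monitoring chart]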
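The next step was to find out exactly which code was driving memory usage up. Fortunately the project was started with the following options, so the JVM dumps a heap snapshot when it runs out of memory.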
JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError"
JVM_OPTS="$JVM_OPTS -XX:HeapDumpPath=${LOG_DIR}/${PROJ_NAME}-`date +%s`-pid$$.hprof"
JVM_OPTS="$JVM_OPTS -Xloggc:${LOG_DIR}/${PROJ_NAME}-gc.log"
JVM_OPTS="$JVM_OPTS -XX:ErrorFile=${LOG_DIR}/${PROJ_NAME}-`date +%s`-pid%p.log"
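Whether such options actually took effect can be checked from inside the running JVM. The sketch below is only an illustration and not part of the original project: the class name HeapDumpFlagCheck is hypothetical, and it assumes a HotSpot JVM, where com.sun.management.HotSpotDiagnosticMXBean is available.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper (not from the original project): reads HotSpot VM options at runtime.
final class HeapDumpFlagCheck {

    // Returns the current value of a HotSpot VM option as a string, e.g. "true" or "false".
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        System.out.println("HeapDumpOnOutOfMemoryError = " + vmOption("HeapDumpOnOutOfMemoryError"));
        System.out.println("HeapDumpPath = " + vmOption("HeapDumpPath"));
        // Maximum heap the JVM will try to use, in MB.
        System.out.println("Max heap (MB) = " + Runtime.getRuntime().maxMemory() / (1024 * 1024));
    }
}
```

Running this with the same JVM_OPTS as the service shows whether the heap dump flag is on and where the dump would be written.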
We immediately opened the heap dump with MAT (MemoryAnalyzer) and finally found the culprit, as shown below:
[Figure: MAT view of the heap dump]
2 Root Cause Analysis
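The code that exhausted memory has one core job: it pulls data from multiple data sources and serves it to grafana for dashboard display (via the simplejson plugin). The grafana dashboard also refreshes its data automatically at a fixed interval. When a single panel shows many curves over a wide time range, and the dashboard contains many panels, the data processing logic cannot keep up: the requests of the next refresh cycle arrive before the previous cycle has been fully handled. On top of that, the Java heap was not sized generously enough. Heap usage therefore kept accumulating, and the JVM spent its time in STW (Stop The World) garbage collection until the service finally went down.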
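The accumulation can be illustrated with a little arithmetic. The sketch below is a deliberately simplified model, not a measurement of the real service: the class name and all numbers (20 panels, 15 requests completed per refresh cycle, 8 MB held per pending request) are hypothetical, and it ignores the extra slowdown caused by garbage collection itself.

```java
// Hypothetical model: requests left unfinished when each refresh cycle adds more work
// than the service can complete within that cycle.
final class RefreshBacklogSketch {

    // Number of requests still pending after the given number of refresh cycles.
    static long backlogAfter(int cycles, int requestsPerCycle, int completedPerCycle) {
        long backlog = 0;
        for (int i = 0; i < cycles; i++) {
            backlog += requestsPerCycle;                      // each refresh issues one request per panel
            backlog -= Math.min(backlog, completedPerCycle);  // the service finishes what it can
        }
        return backlog;
    }

    public static void main(String[] args) {
        int panels = 20;            // assumed number of panels on the dashboard
        int completedPerCycle = 15; // assumed throughput per refresh cycle
        long mbPerRequest = 8;      // assumed memory held by one pending request

        long pending = backlogAfter(60, panels, completedPerCycle);
        System.out.println("Pending requests after 60 cycles: " + pending);
        System.out.println("Memory held by pending requests (MB): " + pending * mbPerRequest);
    }
}
```

In this model the backlog grows by 5 requests every cycle, so the memory held by pending requests rises steadily no matter how large the heap is; a larger heap only delays the point at which it runs out.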
3 Fixes and Optimizations
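This incident showed that the module, although its processing logic looks fairly simple, still had quite a few design problems. We applied the following optimizations:
First, of course, increase the heap size; holding out a little longer is better than nothing.
While going through the processing logic we found that performance could still be improved. Faster processing shortens the time data is held, so memory is released sooner.
Requests to external data interfaces and waits for concurrent tasks to finish already had timeouts to avoid hanging. They had originally been set quite long so that data could be fetched whenever possible (the external interfaces are sometimes genuinely slow to respond). We shortened these timeouts to remove that hazard.
Limit the maximum number of curves in a single panel, and limit the maximum size of the displayed time window.
Fetch data incrementally.
Split this feature out into its own microservice instead of keeping it in the same service as other logic (as a standalone microservice it does not lengthen the call chain either).
Optimize the original data structure.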
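The original data structure was:
public class Avail {
    // Boxed fields: each one references a separate wrapper object on the heap.
    private Long total;
    private Integer loss;
    private Double latency;
    private Double rate;
}
It was changed to:
public class Avail {
    // Primitive fields: values are stored inline in the Avail object.
    private long total;
    private int loss;
    private double latency;
    private double rate;
}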
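To get a feel for the effect of this change, the footprint of one data point can be estimated by hand. The sketch below is a rough calculation under stated assumptions rather than a measurement: a 64-bit HotSpot JVM, 8-byte object alignment, a 12-byte object header and 4-byte references with compressed pointers (16 and 8 bytes without), and every boxed field pointing to its own wrapper object (no sharing through the Integer or Long caches). The class and method names are hypothetical; a tool such as OpenJDK JOL can measure the real layout.

```java
// Hypothetical back-of-the-envelope estimate of the memory used by one Avail data point.
final class AvailSizeEstimate {

    // Rounds an object size up to the next multiple of 8 bytes.
    static long align8(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // Boxed version: the Avail object holds four references, plus four wrapper objects.
    static long boxedBytes(int headerBytes, int referenceBytes) {
        long avail = align8(headerBytes + 4L * referenceBytes);
        long longBox = align8(headerBytes + 8L);   // java.lang.Long
        long intBox = align8(headerBytes + 4L);    // java.lang.Integer
        long doubleBox = align8(headerBytes + 8L); // java.lang.Double
        return avail + longBox + intBox + 2 * doubleBox;
    }

    // Primitive version: long + int + double + double stored inline.
    static long primitiveBytes(int headerBytes) {
        return align8(headerBytes + 8L + 4L + 8L + 8L);
    }

    public static void main(String[] args) {
        System.out.println("Compressed pointers:   boxed=" + boxedBytes(12, 4)
                + " primitive=" + primitiveBytes(12));
        System.out.println("Uncompressed pointers: boxed=" + boxedBytes(16, 8)
                + " primitive=" + primitiveBytes(16));
    }
}
```

Under these assumptions the estimate is 120 versus 40 bytes with compressed pointers and 144 versus 48 bytes without, a reduction of about two thirds in both cases. The saving is smaller when many boxed values are shared, for example small Integer values served from the wrapper cache.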
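Each boxed field (Long, Integer, Double) is a reference to a separate wrapper object with its own object header, whereas a primitive field is stored inline in the Avail object. Switching to primitives therefore reduces the memory used per data point by roughly 50%~67%; the exact number of bytes saved depends on whether compressed object pointers are enabled.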
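For the timeout item in the list of optimizations above, one way to bound the wait for concurrent tasks is ExecutorService.invokeAll with a deadline. This is a minimal sketch under assumptions, not the project's actual code: the class BoundedFanOut, the method names, and the 500 ms limit are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: query several data sources in parallel with one overall deadline.
final class BoundedFanOut {

    // Returns the results of the sources that finished in time; the rest are dropped.
    static <T> List<T> fetchAll(ExecutorService pool, List<Callable<T>> sources, long timeoutMs)
            throws InterruptedException {
        List<T> results = new ArrayList<>();
        // invokeAll waits at most timeoutMs and cancels every task that has not finished by then.
        for (Future<T> future : pool.invokeAll(sources, timeoutMs, TimeUnit.MILLISECONDS)) {
            if (future.isCancelled()) {
                continue; // this source timed out
            }
            try {
                results.add(future.get()); // already complete, so this does not block
            } catch (ExecutionException e) {
                // this source failed; skip it and keep the others
            }
        }
        return results;
    }

    // Demo with one fast and one slow (simulated) source and a 500 ms deadline.
    static List<String> demo() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Callable<String> fast = () -> "fast";
            Callable<String> slow = () -> {
                Thread.sleep(10_000); // simulates a slow external interface
                return "slow";
            };
            return fetchAll(pool, Arrays.asList(fast, slow), 500);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Cancelling a task only interrupts its thread, and a blocking network call may not react to the interrupt, so the client used for the external interface still needs its own connect and read timeouts.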