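A DirectByteBuffer memory leak with netty-grpc
Author: xihuanyuye
Application scenario
The application sends large amounts of data over gRPC, and the level of concurrency is not fixed.
Error log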
2022-10-31 22:20:51.630 INFO 8 --- [ueue4-thread-20] c.t.s.d.p.ParquetNoBlockDataReaderWriter : groupCount:1,count:300000,end read end read
2022-10-31 22:20:51.630 INFO 8 --- [ueue4-thread-20] c.t.privpy.sdk.task.data.DataSender : MTU4NzA4NzE3MDkxNTUzNjg5Nw noFragment read data plain://ds02/xgboost_vertical-20221031221952-df1650ce/bins_Bob_13_0_tree0_depth0.parquet;type=int32 finished!
Exception in thread "send1-thread-7" 2022-10-31 22:20:51.634 INFO 8 --- [Queue3-thread-1] c.t.p.s.t.d.n.NoFragmentDataSender : MTU4NzA4NzE3MDkxNTUzNjg5Nw read plain data plain://ds02/xgboost_vertical-20221031221952-df1650ce/bins_Bob_13_0_tree0_depth0.parquet;type=int32 success!
Exception in thread "NoFragRec7-thread-8" 2022-10-31 22:20:51.634 INFO 8 --- [ptor2-thread-10] c.t.p.s.t.d.n.NoFragmentDataSender : MTU4NzA4NzE3MDkxNTUzNjg5Nw start cipher for plain data transfer plain://ds02/xgboost_vertical-20221031221952-df1650ce/bins_Bob_13_0_tree0_depth0.parquet;type=int32
io.grpc.netty.shaded.io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 1256 byte(s) of direct memory (used: 68121888, max: 67108864)
at io.grpc.netty.shaded.io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:754)
at io.grpc.netty.shaded.io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:709)
at io.grpc.netty.shaded.io.netty.buffer.UnpooledUnsafeNoCleanerDirectByteBuf.allocateDirect(UnpooledUnsafeNoCleanerDirectByteBuf.java:30)
at io.grpc.netty.shaded.io.netty.buffer.UnpooledByteBufAllocator$InstrumentedUnpooledUnsafeNoCleanerDirectByteBuf.allocateDirect(UnpooledByteBufAllocator.java:186)
at io.grpc.netty.shaded.io.netty.buffer.UnpooledDirectByteBuf.<init>(UnpooledDirectByteBuf.java:64)
at io.grpc.netty.shaded.io.netty.buffer.UnpooledUnsafeDirectByteBuf.<init>(UnpooledUnsafeDirectByteBuf.java:41)
at io.grpc.netty.shaded.io.netty.buffer.UnpooledUnsafeNoCleanerDirectByteBuf.<init>(UnpooledUnsafeNoCleanerDirectByteBuf.java:25)
at io.grpc.netty.shaded.io.netty.buffer.UnpooledByteBufAllocator$InstrumentedUnpooledUnsafeNoCleanerDirectByteBuf.<init>(UnpooledByteBufAllocator.java:181)
at io.grpc.netty.shaded.io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:91)
at io.grpc.netty.shaded.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:187)
at io.grpc.netty.shaded.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:178)
at io.grpc.netty.shaded.io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:115)
at io.grpc.netty.shaded.io.netty.handler.codec.base64.Base64.encode(Base64.java:108)
at io.grpc.netty.shaded.io.netty.handler.ssl.SslUtils.toBase64(SslUtils.java:375)
at io.grpc.netty.shaded.io.netty.handler.ssl.PemX509Certificate.append(PemX509Certificate.java:128)
at io.grpc.netty.shaded.io.netty.handler.ssl.PemX509Certificate.toPEM(PemX509Certificate.java:86)
at io.grpc.netty.shaded.io.netty.handler.ssl.OpenSslKeyMaterialProvider.validateSupported(OpenSslKeyMaterialProvider.java:78)
at io.grpc.netty.shaded.io.netty.handler.ssl.OpenSslKeyMaterialProvider.validateKeyMaterialSupported(OpenSslKeyMaterialProvider.java:44)
at io.grpc.netty.shaded.io.netty.handler.ssl.OpenSslClientContext.<init>(OpenSslClientContext.java:193)
at io.grpc.netty.shaded.io.netty.handler.ssl.SslContext.newClientContextInternal(SslContext.java:827)
at io.grpc.netty.shaded.io.netty.handler.ssl.SslContextBuilder.build(SslContextBuilder.java:576)
at com.tsingj.privpy.sdk.client.grpc.ChannelPool.getESManagedChannel(ChannelPool.java:211)
at com.tsingj.privpy.sdk.client.grpc.EsClient.getNewChannel(EsClient.java:60)
at com.tsingj.privpy.sdk.client.grpc.EsClient.getResultData(EsClient.java:289)
at com.tsingj.privpy.sdk.task.data.nofragment.NoFragmentDataReceiver.lambda$new$2(NoFragmentDataReceiver.java:205)
at com.tsingj.privpy.sdk.task.core.QueueWithConsumer.lambda$new$0(QueueWithConsumer.java:41)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Exception in thread "send1-thread-5" Exception in thread "send1-thread-1" io.grpc.netty.shaded.io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 4096 byte(s) of direct memory (used: 68121888, max: 67108864)
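Analysis
The max value in the error message, 67108864 bytes, works out to 64 MB, which is exactly the -XX:MaxDirectMemorySize value set in the JVM startup command: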
java -server -Xms$XMS_VALUE -Xmx$XMX_VALUE -XX:+PrintGCDetails -XX:+PrintCommandLineFlags -XX:+UseG1GC -Djava.rmi.server.hostname=localhost -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.rmi.port=8890 -Dcom.sun.management.jmxremote.port=8890 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -XX:+PrintGCDetails -XX:+HeapDumpOnOutOfMemoryError -XX:+UnlockExperimentalVMOptions -XX:G1MaxNewSizePercent=40 -XX:MaxDirectMemorySize=64M -XX:HeapDumpPath=/opt/gemini/javadsheapdump$DS_ID$DS_INSTANCE_ID -Xloggc:/opt/gemini/javadsgclog$DS_ID$DS_INSTANCE_ID -jar /ds/ds_run.jar --logging.level.root=$LOG_LEVEL
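To double-check the arithmetic, the two numbers can be pulled out of the error message and converted to MiB. This is only a small sketch: the class DirectMemoryErrorParser and its methods are hypothetical helpers written for this note, not part of Netty or of the application, and the regular expression assumes the message keeps the "(used: N, max: M)" tail seen in the log above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for this note; not part of Netty or the application.
final class DirectMemoryErrorParser {
    // Matches the "(used: N, max: M)" tail of the message seen in the log.
    private static final Pattern USAGE = Pattern.compile("used: (\\d+), max: (\\d+)");

    /** Returns {used, max} in bytes, or null if the message has no usage tail. */
    static long[] parse(String message) {
        Matcher m = USAGE.matcher(message);
        if (!m.find()) {
            return null;
        }
        return new long[] {Long.parseLong(m.group(1)), Long.parseLong(m.group(2))};
    }

    /** Converts a byte count to MiB (1 MiB = 1024 * 1024 bytes). */
    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        String msg = "failed to allocate 1256 byte(s) of direct memory "
                + "(used: 68121888, max: 67108864)";
        long[] usage = parse(msg);
        System.out.printf("used = %.2f MiB, max = %.2f MiB%n",
                toMiB(usage[0]), toMiB(usage[1]));
    }
}
```

The max comes out at exactly 64 MiB, matching -XX:MaxDirectMemorySize=64M, and the used figure is already slightly above it when the allocation is refused.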
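Install the plugin in VisualVM.
With it, the Buffer Pools usage was observed to stay steady at 40 MB+, and it only recovered after a restart.
The current suspicion is that when a burst of sends suddenly pushes usage past the total DirectByteBuffer limit, an OOM occurs, and from then on the memory can never be reclaimed.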
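The Buffer Pools numbers can also be read from inside the process through the standard java.lang.management.BufferPoolMXBean, which makes it possible to log them around a send burst instead of watching a chart. A minimal sketch follows; the class name BufferPoolReport is made up for this note, and the pool name "direct" is what HotSpot reports, which is an assumption for other JVMs.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical helper for this note; uses only the standard management API.
final class BufferPoolReport {
    /** Builds one line per JDK buffer pool (on HotSpot: "direct", "mapped", ...). */
    static String describe() {
        StringBuilder sb = new StringBuilder();
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            sb.append(String.format("%s: count=%d, used=%d bytes, capacity=%d bytes%n",
                    pool.getName(), pool.getCount(),
                    pool.getMemoryUsed(), pool.getTotalCapacity()));
        }
        return sb.toString();
    }

    /** Returns the bytes used by the pool named "direct", or -1 if there is none. */
    static long directUsed() {
        for (BufferPoolMXBean pool :
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hold a 1 MiB direct buffer so the "direct" pool is visibly non-empty.
        ByteBuffer held = ByteBuffer.allocateDirect(1024 * 1024);
        System.out.print(describe());
        System.out.println("holding " + held.capacity() + " bytes");
    }
}
```

One caveat, stated as my understanding rather than something verified in this investigation: the stack trace goes through PlatformDependent.allocateDirectNoCleaner, and Netty accounts for such buffers in its own counter (the used/max figures in the error message) rather than in the JDK's direct pool. If that holds here, the JDK bean may under-report Netty's usage, and Netty's PlatformDependent.usedDirectMemory() would be the closer figure.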
Current strategy
Increase -XX:MaxDirectMemorySize to 512M and keep observing.
-XX:MaxDirectMemorySize=512M
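After changing the startup script it is worth confirming that the running JVM really received the new value. A rough sketch is shown below: it reads the JVM input arguments through the standard RuntimeMXBean and parses the size suffix. The class MaxDirectMemoryFlag is hypothetical, and taking the last occurrence of a repeated flag is an assumption about how the JVM resolves duplicates.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for this note; not part of the application.
final class MaxDirectMemoryFlag {
    private static final String PREFIX = "-XX:MaxDirectMemorySize=";

    /** Returns the configured size in bytes, or -1 if the flag is absent. */
    static long fromArgs(List<String> jvmArgs) {
        long result = -1;
        for (String arg : jvmArgs) {
            if (arg.startsWith(PREFIX)) {
                // Assumption: if the flag is repeated, the last one wins.
                result = parseSize(arg.substring(PREFIX.length()));
            }
        }
        return result;
    }

    /** Parses values such as "64M", "512m", "1g" or a plain byte count. */
    static long parseSize(String text) {
        String s = text.trim().toLowerCase(Locale.ROOT);
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty size");
        }
        long multiplier = 1;
        char last = s.charAt(s.length() - 1);
        if (last == 'k') {
            multiplier = 1024L;
        } else if (last == 'm') {
            multiplier = 1024L * 1024L;
        } else if (last == 'g') {
            multiplier = 1024L * 1024L * 1024L;
        }
        if (multiplier != 1) {
            s = s.substring(0, s.length() - 1);
        }
        return Long.parseLong(s) * multiplier;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        long bytes = fromArgs(jvmArgs);
        System.out.println(bytes < 0
                ? "MaxDirectMemorySize not set on the command line"
                : "MaxDirectMemorySize = " + bytes + " bytes");
    }
}
```

Logging this once at startup makes it easy to tell from the application log whether an instance is still running with the old 64M limit.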
Summary
The above is my personal experience; I hope it can serve as a reference.