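Java 8 String Memory Optimization: The String Constant Pool in Detail
Author: osnot
Preface
At work I ran into a scenario that required caching a large amount of data locally, on the order of millions of entries. It consumed a lot of memory (4 to 5 GB), and investigation showed that most of it was String text. Because the machine's memory is limited, I wanted to reduce this footprint, starting with the String fields.
This article records experiments on how much memory String uses under different conditions.
Environment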
- macOS 10.12.6
- java version "1.8.0_112"
- Java™ SE Runtime Environment (build 1.8.0_112-b16)
- Java HotSpot™ 64-Bit Server VM (build 25.112-b16, mixed mode)
- IntelliJ IDEA
- G1 garbage collector
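Experiment design
According to what I found online, the string constant pool has been stored on the heap since Java 7, so in Java 8 it has no size limit of its own and is bounded only by the heap.
In addition, when memory runs low, GC reclaims pooled strings that are no longer referenced. The experiments are therefore designed as follows: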
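No. | Name | Purpose |
---|---|---|
1 | Store nearly 1 GB of strings | Control for experiment 2 |
2 | Store nearly 1 GB of strings, interned into the string constant pool | Show that pooled strings live on the heap with no separate size limit |
3 | Store nearly 1 GB of strings with identical content | Control for experiment 4 |
4 | Store nearly 1 GB of strings with identical content, interned into the string constant pool | Show that the string constant pool reuses memory |
5 | Store pooled strings close to the JVM heap limit | Control for experiment 6 |
6 | Store pooled strings close to the JVM heap limit, then release part of them | Show that pooled strings can be reclaimed |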
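The design above leans on two behaviours of String.intern(): equal strings collapse to one pooled instance, and a string created with new stays a separate object until it is interned. A minimal sketch of that behaviour follows; the class name InternIdentityDemo and its helper are hypothetical names chosen for illustration and are not part of the original experiments.

```java
// Hypothetical demo class (not from the original article).
final class InternIdentityDemo {

    /** True when both references point at the same object (identity, not equals()). */
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "A";            // string literals are interned automatically
        String fresh = new String("A");  // a distinct object with equal content
        String pooled = fresh.intern();  // canonical instance from the string pool

        System.out.println(sameObject(literal, fresh));   // false
        System.out.println(sameObject(literal, pooled));  // true
        System.out.println(literal.equals(fresh));        // true
    }
}
```

Experiments 1 and 3 correspond to the "fresh" case repeated millions of times, while experiments 2 and 4 to 6 always store the pooled instance.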
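Experiments 1 and 2: string memory has no separate size limit
Experiment 1: storing nearly 1 GB of strings

    public class StringTest {

        // JVM flags: -Xmx1G -XX:+PrintGCDetails -XX:+UseG1GC
        public static void main(String[] args) {
            System.out.println("--begin...");
            String[] result = test1_1();
            System.out.println("--end");
            System.out.println("--gc...");
            System.gc();
            System.out.println("--gc end");
        }

        private static String[] test1_1() {
            // 35 * 1024 * 1024 = 36,700,160 iterations (about 36.7 million)
            String[] array = new String[35 * 1024 * 1024];
            for (int i = 0; i < array.length; i++) {
                // A new String object per slot; the content is always "A"
                array[i] = new String("A");
                // Progress line every 1,024 iterations. The original condition,
                // i % 1024 * 1024 == 0, parses as (i % 1024) * 1024 == 0, which is equivalent.
                if (i % 1024 == 0) {
                    System.out.println("now i=" + i);
                }
            }
            return array;
        }
    }

Final output: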
now i=36696064
now i=36697088
now i=36698112
now i=36699136
--end
--gc...
[Full GC (System.gc()) 1005M->980M(1024M), 3.8400925 secs]
[Eden: 3072.0K(44.0M)->0.0B(51.0M) Survivors: 7168.0K->0.0B Heap: 1005.5M(1024.0M)->980.4M(1024.0M)], [Metaspace: 3345K->3345K(1056768K)]
[Times: user=5.63 sys=0.23, real=3.84 secs]
[GC concurrent-mark-abort]
--gc end
Heap
garbage-first heap total 1048576K, used 1003938K [0x0000000780000000, 0x0000000780102000, 0x00000007c0000000)
region size 1024K, 1 young (1024K), 0 survivors (0K)
Metaspace used 3351K, capacity 4564K, committed 4864K, reserved 1056768K
class space used 369K, capacity 388K, committed 512K, reserved 1048576K
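The 980M left after the Full GC can be cross-checked with rough arithmetic. The sketch below assumes the usual Java 8 HotSpot object layout with compressed references (the default for a 1 GB heap): 24 bytes per String object, 4 bytes per array slot, and a 16-byte array header. It also assumes that new String("A") shares the character array of the literal, so no per-entry char[] is counted. These sizes are assumptions about the JVM, not measurements from the article, and the class name is illustrative.

```java
// Hypothetical estimator; object sizes are assumptions for Java 8 HotSpot with compressed references.
final class Experiment1Estimate {
    static final long STRING_OBJECT_BYTES = 24; // 12-byte header + 4-byte value ref + 4-byte hash, padded to 8
    static final long REFERENCE_BYTES = 4;      // compressed reference stored in each String[] slot
    static final long ARRAY_HEADER_BYTES = 16;

    /** Estimated heap bytes for n String objects sharing one char[], plus the String[] holding them. */
    static long estimateBytes(long n) {
        long strings = n * STRING_OBJECT_BYTES;
        long array = ARRAY_HEADER_BYTES + n * REFERENCE_BYTES;
        return strings + array;
    }

    /** Converts bytes to whole MiB, rounding down. */
    static long toMiB(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        long n = 35L * 1024 * 1024; // 36,700,160 iterations in experiment 1
        System.out.println("estimated MiB: " + toMiB(estimateBytes(n)));
    }
}
```

Under these assumptions the estimate comes to 980 MiB, which lines up with the 980M in the log: about 840 MiB for the String objects and about 140 MiB for the array itself.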
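Experiment 2: storing nearly 1 GB of strings, interned into the string constant pool
String.intern() returns the equal string from the string constant pool if one exists; otherwise it adds the string to the pool and returns it. This experiment therefore uses that method:

    public class StringTest {

        // JVM flags: -Xmx1G -XX:+PrintGCDetails -XX:+UseG1GC
        public static void main(String[] args) {
            System.out.println("--begin...");
            String[] result = test1_1();
            System.out.println("--end");
            System.out.println("--gc...");
            System.gc();
            System.out.println("--gc end");
        }

        private static String[] test1_1() {
            // 35 * 1024 * 1024 = 36,700,160 iterations (about 36.7 million)
            String[] array = new String[35 * 1024 * 1024];
            for (int i = 0; i < array.length; i++) {
                // A different string on every iteration, interned into the pool
                array[i] = String.valueOf(i).intern();
                // Progress line every 1,024 iterations
                // (equivalent to the original i % 1024 * 1024 == 0)
                if (i % 1024 == 0) {
                    System.out.println("now i=" + i);
                }
            }
            return array;
        }
    }

Final output: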
now i=16506880
now i=16507904
now i=16508928
now i=16509952
[GC pause (G1 Evacuation Pause) (young) (to-space exhausted), 0.5090989 secs]
[Parallel Time: 497.8 ms, GC Workers: 4]
[GC Worker Start (ms): Min: 159566.9, Avg: 159566.9, Max: 159567.0, Diff: 0.1]
[Ext Root Scanning (ms): Min: 129.4, Avg: 131.1, Max: 132.8, Diff: 3.4, Sum: 524.3]
[Update RS (ms): Min: 0.8, Avg: 1.1, Max: 1.4, Diff: 0.7, Sum: 4.6]
[Processed Buffers: Min: 1, Avg: 1.8, Max: 2, Diff: 1, Sum: 7]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 363.3, Avg: 365.3, Max: 367.1, Diff: 3.7, Sum: 1461.0]
[Termination (ms): Min: 0.0, Avg: 0.3, Max: 0.5, Diff: 0.5, Sum: 1.0]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 4]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[GC Worker Total (ms): Min: 497.7, Avg: 497.7, Max: 497.8, Diff: 0.1, Sum: 1991.0]
[GC Worker End (ms): Min: 160064.7, Avg: 160064.7, Max: 160064.7, Diff: 0.0]
[Code Root Fixup: 0.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.0 ms]
[Other: 11.2 ms]
[Evacuation Failure: 11.0 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 0.1 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.0 ms]
[Eden: 11.0M(51.0M)->0.0B(51.0M) Survivors: 0.0B->0.0B Heap: 1022.1M(1024.0M)->1022.1M(1024.0M)]
[Times: user=1.10 sys=0.34, real=0.51 secs]
[GC pause (G1 Evacuation Pause) (young) (initial-mark), 0.1444584 secs]
[Parallel Time: 144.1 ms, GC Workers: 4]
[GC Worker Start (ms): Min: 160076.1, Avg: 160076.1, Max: 160076.1, Diff: 0.0]
[Ext Root Scanning (ms): Min: 105.2, Avg: 105.8, Max: 106.3, Diff: 1.1, Sum: 423.1]
[Update RS (ms): Min: 1.0, Avg: 1.5, Max: 1.9, Diff: 0.9, Sum: 6.0]
[Processed Buffers: Min: 2, Avg: 2.5, Max: 3, Diff: 1, Sum: 10]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 36.0, Avg: 36.4, Max: 36.9, Diff: 1.0, Sum: 145.5]
[Termination (ms): Min: 0.0, Avg: 0.4, Max: 0.8, Diff: 0.8, Sum: 1.6]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 4]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[GC Worker Total (ms): Min: 144.1, Avg: 144.1, Max: 144.1, Diff: 0.0, Sum: 576.3]
[GC Worker End (ms): Min: 160220.2, Avg: 160220.2, Max: 160220.2, Diff: 0.0]
[Code Root Fixup: 0.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.0 ms]
[Other: 0.3 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 0.2 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.0 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.0 ms]
[Eden: 0.0B(51.0M)->0.0B(51.0M) Survivors: 0.0B->0.0B Heap: 1022.1M(1024.0M)->1022.1M(1024.0M)]
[Times: user=0.56 sys=0.01, real=0.15 secs]
[GC concurrent-root-region-scan-start]
[GC concurrent-root-region-scan-end, 0.0000053 secs]
[GC concurrent-mark-start]
[Full GC (Allocation Failure) 1022M->1022M(1024M), 3.8210984 secs]
[Eden: 0.0B(51.0M)->0.0B(51.0M) Survivors: 0.0B->0.0B Heap: 1022.1M(1024.0M)->1022.0M(1024.0M)], [Metaspace: 3347K->3347K(1056768K)]
[Times: user=5.67 sys=0.03, real=3.82 secs]
[Full GC (Allocation Failure) 1022M->1022M(1024M), 3.7742211 secs]
[Eden: 0.0B(51.0M)->0.0B(51.0M) Survivors: 0.0B->0.0B Heap: 1022.0M(1024.0M)->1022.0M(1024.0M)], [Metaspace: 3347K->3347K(1056768K)]
[Times: user=5.64 sys=0.03, real=3.77 secs]
[GC concurrent-mark-abort]
[GC pause (G1 Evacuation Pause) (young), 0.1590867 secs]
[Parallel Time: 158.8 ms, GC Workers: 4]
[GC Worker Start (ms): Min: 167816.3, Avg: 167816.4, Max: 167816.4, Diff: 0.1]
[Ext Root Scanning (ms): Min: 122.4, Avg: 123.8, Max: 125.6, Diff: 3.2, Sum: 495.1]
[Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Processed Buffers: Min: 0, Avg: 0.2, Max: 1, Diff: 1, Sum: 1]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 33.0, Avg: 34.8, Max: 36.1, Diff: 3.2, Sum: 139.3]
[Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.3]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 4]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[GC Worker Total (ms): Min: 158.6, Avg: 158.7, Max: 158.7, Diff: 0.1, Sum: 634.7]
[GC Worker End (ms): Min: 167975.0, Avg: 167975.0, Max: 167975.1, Diff: 0.0]
[Code Root Fixup: 0.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.1 ms]
[Other: 0.3 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 0.1 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.0 ms]
[Eden: 0.0B(51.0M)->0.0B(51.0M) Survivors: 0.0B->0.0B Heap: 1022.0M(1024.0M)->1022.0M(1024.0M)]
[Times: user=0.57 sys=0.01, real=0.16 secs]
[GC pause (G1 Evacuation Pause) (young) (initial-mark), 0.2154171 secs]
[Parallel Time: 215.2 ms, GC Workers: 4]
[GC Worker Start (ms): Min: 167975.6, Avg: 167976.4, Max: 167979.0, Diff: 3.4]
[Ext Root Scanning (ms): Min: 162.4, Avg: 165.3, Max: 167.9, Diff: 5.5, Sum: 661.3]
[Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Processed Buffers: Min: 0, Avg: 0.2, Max: 1, Diff: 1, Sum: 1]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 46.1, Avg: 48.9, Max: 52.5, Diff: 6.4, Sum: 195.4]
[Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.3]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 4]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[GC Worker Total (ms): Min: 211.7, Avg: 214.3, Max: 215.1, Diff: 3.4, Sum: 857.1]
[GC Worker End (ms): Min: 168190.7, Avg: 168190.7, Max: 168190.7, Diff: 0.0]
[Code Root Fixup: 0.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.0 ms]
[Other: 0.2 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 0.1 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.0 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.0 ms]
[Eden: 0.0B(51.0M)->0.0B(51.0M) Survivors: 0.0B->0.0B Heap: 1022.0M(1024.0M)->1022.0M(1024.0M)]
[Times: user=0.71 sys=0.00, real=0.22 secs]
[GC concurrent-root-region-scan-start]
[GC concurrent-root-region-scan-end, 0.0000065 secs]
[GC concurrent-mark-start]
[Full GC (Allocation Failure) 1022M->400K(8192K), 0.7234020 secs]
[Eden: 0.0B(51.0M)->0.0B(3072.0K) Survivors: 0.0B->0.0B Heap: 1022.0M(1024.0M)->400.6K(8192.0K)], [Metaspace: 3347K->3347K(1056768K)]
[Times: user=1.29 sys=0.09, real=0.72 secs]
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.lang.Integer.toString(Integer.java:401)
at java.lang.String.valueOf(String.java:3099)
at com.kite.java.StringTest.test1_1(StringTest.java:26)
at com.kite.java.StringTest.main(StringTest.java:14)
[GC concurrent-mark-abort]
Heap
garbage-first heap total 8192K, used 400K [0x0000000780000000, 0x0000000780100040, 0x00000007c0000000)
region size 1024K, 1 young (1024K), 0 survivors (0K)
Metaspace used 3377K, capacity 4564K, committed 4864K, reserved 1056768K
class space used 372K, capacity 388K, committed 512K, reserved 1048576K
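The loop reached roughly 16,509,952 (about 16.51 million) iterations before failing with an OutOfMemoryError. The figure is approximate because a progress line is printed only once every 1,024 iterations. Unlike experiment 1, each iteration here uses a different text to avoid duplicates, so every string needs its own character array and the memory used per entry is larger.
Raising the -Xmx limit would allow more strings to be stored (not tested here), so the string constant pool has no size limit of its own beyond the heap.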
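Why experiment 2 stops near 16.5 million entries can also be estimated. Unlike experiment 1, every string here needs its own char[]; under the assumed Java 8 HotSpot layout (compressed references, 8-byte alignment) a 7- or 8-digit decimal string costs roughly 24 bytes for the String plus 32 bytes for the char[]. The sketch ignores the pool's own bookkeeping and the few shorter strings below 10,000, so treat it as a rough bound, not an exact model. All names in it are hypothetical.

```java
// Hypothetical estimator; sizes are assumptions for Java 8 HotSpot with compressed references.
final class DistinctStringEstimate {
    static final long STRING_OBJECT_BYTES = 24;
    static final long ARRAY_HEADER_BYTES = 16;
    static final long REFERENCE_BYTES = 4;

    /** Rounds a size up to the next multiple of 8, the assumed object alignment. */
    static long align8(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    /** Estimated heap bytes for one String of the given length, including its own char[]. */
    static long bytesPerString(int length) {
        return STRING_OBJECT_BYTES + align8(ARRAY_HEADER_BYTES + 2L * length);
    }

    /** How many distinct strings of that length fit after a String[] of arraySlots is allocated. */
    static long maxStrings(long heapBytes, long arraySlots, int length) {
        long arrayBytes = ARRAY_HEADER_BYTES + arraySlots * REFERENCE_BYTES;
        return (heapBytes - arrayBytes) / bytesPerString(length);
    }

    public static void main(String[] args) {
        long heap = 1024L * 1024 * 1024; // -Xmx1G
        long slots = 35L * 1024 * 1024;  // array size used in experiment 2
        System.out.println("bytes per 8-digit string: " + bytesPerString(8));
        System.out.println("estimated capacity: " + maxStrings(heap, slots, 8));
    }
}
```

With these assumptions each string costs 56 bytes and the estimated capacity is about 16.55 million strings, close to the roughly 16.51 million reached before the OutOfMemoryError. The small gap is plausibly heap space that the collector and the rest of the program need for themselves.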
Experiments 3 and 4: the string constant pool reuses memory
Experiment 3
The data from experiment 1 can be reused.
Experiment 4

    public class StringTest {

        // JVM flags: -Xmx1G -XX:+PrintGCDetails -XX:+UseG1GC
        public static void main(String[] args) {
            System.out.println("--begin...");
            String[] result = test1_1();
            System.out.println("--end");
            System.out.println("--gc...");
            System.gc();
            System.out.println("--gc end");
        }

        private static String[] test1_1() {
            // 35 * 1024 * 1024 = 36,700,160 iterations (about 36.7 million)
            String[] array = new String[35 * 1024 * 1024];
            for (int i = 0; i < array.length; i++) {
                // Same content every time, taken from the string pool
                array[i] = String.valueOf("A").intern();
                // Progress line every 1,024 iterations
                // (equivalent to the original i % 1024 * 1024 == 0)
                if (i % 1024 == 0) {
                    System.out.println("now i=" + i);
                }
            }
            return array;
        }
    }

Final output:
now i=36696064
now i=36697088
now i=36698112
now i=36699136
--end
--gc...
[Full GC (System.gc()) 147M->140M(269M), 0.5767602 secs]
[Eden: 7168.0K(12.0M)->0.0B(15.0M) Survivors: 1024.0K->0.0B Heap: 147.0M(269.0M)->140.4M(269.0M)], [Metaspace: 3346K->3346K(1056768K)]
[Times: user=0.54 sys=0.01, real=0.58 secs]
--gc end
Heap
garbage-first heap total 275456K, used 143778K [0x0000000780000000, 0x0000000780100868, 0x00000007c0000000)
region size 1024K, 1 young (1024K), 0 survivors (0K)
Metaspace used 3352K, capacity 4564K, committed 4864K, reserved 1056768K
class space used 369K, capacity 388K, committed 512K, reserved 1048576K
Now suppose the loop is removed and only the array is allocated:

    public class StringTest {

        // JVM flags: -Xmx1G -XX:+PrintGCDetails -XX:+UseG1GC
        public static void main(String[] args) {
            System.out.println("--begin...");
            String[] result = test1_1();
            System.out.println("--end");
            System.out.println("--gc...");
            System.gc();
            System.out.println("--gc end");
        }

        private static String[] test1_1() {
            // Only the 36,700,160-slot array is allocated; the loop is disabled
            String[] array = new String[35 * 1024 * 1024];
            // for (int i = 0; i < array.length; i++) {
            //     array[i] = String.valueOf("A").intern();
            //     if (i % 1024 == 0) {
            //         System.out.println("now i=" + i);
            //     }
            // }
            return array;
        }
    }

Final output:
[Full GC (System.gc()) 140M->140M(269M), 0.3415958 secs]
[Eden: 1024.0K(14.0M)->0.0B(15.0M) Survivors: 1024.0K->0.0B Heap: 140.8M(269.0M)->140.4M(269.0M)], [Metaspace: 3308K->3308K(1056768K)]
[Times: user=0.33 sys=0.00, real=0.35 secs]
--gc end
Heap
garbage-first heap total 275456K, used 143773K [0x0000000780000000, 0x0000000780100868, 0x00000007c0000000)
region size 1024K, 1 young (1024K), 0 survivors (0K)
Metaspace used 3314K, capacity 4564K, committed 4864K, reserved 1056768K
class space used 369K, capacity 388K, committed 512K, reserved 1048576K
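Whether or not strings are added to the array, only 140M of heap is in use after GC, which is essentially the 35 * 1024 * 1024-slot array itself at about 4 bytes per slot. Experiment 3, by contrast, used 980M, so the string constant pool clearly reuses memory.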
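The same reuse can be observed at a much smaller scale without reading GC logs, by counting how many distinct objects an array refers to. The sketch below is illustrative only: the class and method names are made up here, and it uses an IdentityHashMap so that strings are compared by reference, not by content.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical demo class (not from the original article).
final class DistinctInstanceCount {

    /** Counts distinct object identities among the array elements. */
    static int countInstances(String[] array) {
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        Collections.addAll(seen, array);
        return seen.size();
    }

    /** Fills an array with "A", either as fresh objects or as the pooled instance. */
    static String[] fill(int n, boolean intern) {
        String[] array = new String[n];
        for (int i = 0; i < n; i++) {
            String s = new String("A");
            array[i] = intern ? s.intern() : s;
        }
        return array;
    }

    public static void main(String[] args) {
        System.out.println(countInstances(fill(10_000, false))); // 10000: one object per slot
        System.out.println(countInstances(fill(10_000, true)));  // 1: every slot shares the pooled "A"
    }
}
```

One object per slot is the situation of experiment 3; a single shared object is the situation of experiment 4, where only the array itself costs memory.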
Experiments 5 and 6: releasing string constant pool memory
Experiment 5: pooled strings close to the JVM heap limit
Building on the data from experiment 2, the 16,509,952 iterations are loaded into memory in two batches:

    public class StringTest {

        // JVM flags: -Xmx1G -XX:+PrintGCDetails -XX:+UseG1GC
        public static void main(String[] args) {
            System.out.println("--begin...");
            String[][] result = test1_1();
            System.out.println("--end");
            System.out.println("--gc...");
            System.gc();
            System.out.println("--gc end");
        }

        private static String[][] test1_1() {
            final int total = 16509952; // about 16.51 million, taken from experiment 2
            final int half = total / 2;
            String[][] result = new String[2][];

            // First half of the iterations
            String[] array1 = new String[half];
            for (int i = 0; i < half; i++) {
                array1[i] = String.valueOf(i).intern();
                // Progress line every 1,024 iterations
                // (equivalent to the original i % 1024 * 1024 == 0)
                if (i % 1024 == 0) {
                    System.out.println("now i=" + i);
                }
            }
            result[0] = array1;

            String[] array2 = new String[half];
            System.out.println(half + " entries added; starting the second half:");
            // Second half of the iterations
            for (int i = half; i < total; i++) {
                array2[i - half] = String.valueOf(i).intern();
                if (i % 1024 == 0) {
                    System.out.println("now i=" + i);
                }
            }
            result[1] = array2;
            return result;
        }
    }

Final output:
now i=16506880
now i=16507904
now i=16508928
--end
--gc...
[Full GC (System.gc()) 970M->945M(1024M), 4.4443937 secs]
[Eden: 12.0M(44.0M)->0.0B(51.0M) Survivors: 6144.0K->0.0B Heap: 971.0M(1009.0M)->945.0M(1024.0M)], [Metaspace: 3348K->3348K(1056768K)]
[Times: user=6.17 sys=0.06, real=4.44 secs]
[GC concurrent-mark-abort]
--gc end
Heap
garbage-first heap total 1048576K, used 967702K [0x0000000780000000, 0x0000000780102000, 0x00000007c0000000)
region size 1024K, 1 young (1024K), 0 survivors (0K)
Metaspace used 3354K, capacity 4564K, committed 4864K, reserved 1056768K
class space used 369K, capacity 388K, committed 512K, reserved 1048576K
Total memory in use after GC: 945M.
Experiment 6: pooled strings close to the JVM heap limit, with memory released
After the first half has been added, it is not stored in the returned result (so nothing references it and it can be reclaimed):

    public class StringTest {

        // JVM flags: -Xmx1G -XX:+PrintGCDetails -XX:+UseG1GC
        public static void main(String[] args) {
            System.out.println("--begin...");
            String[][] result = test1_1();
            System.out.println("--end");
            System.out.println("--gc...");
            System.gc();
            System.out.println("--gc end");
        }

        private static String[][] test1_1() {
            final int total = 16509952; // about 16.51 million, taken from experiment 2
            final int half = total / 2;
            String[][] result = new String[2][];

            // First half of the iterations
            String[] array1 = new String[half];
            for (int i = 0; i < half; i++) {
                array1[i] = String.valueOf(i).intern();
                // Progress line every 1,024 iterations
                // (equivalent to the original i % 1024 * 1024 == 0)
                if (i % 1024 == 0) {
                    System.out.println("now i=" + i);
                }
            }
            // Deliberately not stored, so array1 is unreachable once this method returns
            // result[0] = array1;

            String[] array2 = new String[half];
            System.out.println(half + " entries added; starting the second half:");
            // Second half of the iterations
            for (int i = half; i < total; i++) {
                array2[i - half] = String.valueOf(i).intern();
                if (i % 1024 == 0) {
                    System.out.println("now i=" + i);
                }
            }
            result[1] = array2;
            return result;
        }
    }

Final output:
now i=16506880
now i=16507904
now i=16508928
--end
--gc...
[Full GC (System.gc()) 522M->472M(864M), 2.5913440 secs]
[Eden: 36.0M(37.0M)->0.0B(43.0M) Survivors: 6144.0K->0.0B Heap: 523.0M(864.0M)->472.8M(864.0M)], [Metaspace: 3348K->3348K(1056768K)]
[Times: user=3.74 sys=0.02, real=2.59 secs]
--gc end
Heap
garbage-first heap total 884736K, used 484110K [0x0000000780000000, 0x0000000780101b00, 0x00000007c0000000)
region size 1024K, 1 young (1024K), 0 survivors (0K)
Metaspace used 3354K, capacity 4564K, committed 4864K, reserved 1056768K
class space used 369K, capacity 388K, committed 512K, reserved 1048576K
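Only 472M is in use at the end, about half of the 945M in experiment 5, which shows that the strings added in the first half have been reclaimed.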
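Reclamation of an interned string can also be probed directly with a WeakReference, as a small companion to the GC-log comparison. This is a sketch only: System.gc() is a request that the JVM may ignore, so a cleared reference is likely on HotSpot but not guaranteed, and the class, field, and method names here are hypothetical.

```java
import java.lang.ref.WeakReference;

// Hypothetical demo class (not from the original article).
final class InternReclaimProbe {
    // Strong reference used to pin the interned string when requested.
    static String pinned;

    /**
     * Interns a copy of the text, optionally keeps a strong reference to it,
     * requests GC a few times, and reports whether the weak reference was cleared.
     */
    static boolean clearedAfterGc(String text, boolean keepStrongRef) {
        String interned = new StringBuilder(text).toString().intern();
        pinned = keepStrongRef ? interned : null;
        WeakReference<String> ref = new WeakReference<>(interned);
        interned = null; // drop the local strong reference

        for (int attempt = 0; attempt < 10 && ref.get() != null; attempt++) {
            System.gc(); // only a hint to the JVM
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ref.get() == null;
    }

    public static void main(String[] args) {
        // Built at run time so that no class constant keeps the pooled string alive
        String unique = "probe-" + System.nanoTime();
        System.out.println("cleared while pinned:    " + clearedAfterGc(unique + "-a", true));
        System.out.println("cleared without pinning: " + clearedAfterGc(unique + "-b", false));
    }
}
```

While the string is pinned the method must report false, since a strongly reachable object cannot be collected. Without pinning, HotSpot will typically report true, which mirrors at small scale what experiment 6 shows with millions of entries.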
Summary
The above reflects my personal experience; I hope it serves as a useful reference.