Java String Pool—How Unbounded intern() Causes PermGen OOM
After 3 days, a trading platform hit PermGen OOM from unbounded String.intern() on 500K XML tags.
20+ years shipping production Java in banking & fintech. Lessons pulled from things that broke in production.
- The String Pool is a JVM-managed hash table of canonical String references
- String literals are interned automatically when the JVM resolves them from a class's constant pool
- new String() creates a heap object outside the pool
- intern() looks up or inserts into the pool, with O(1) average cost
- The pool moved from PermGen to the main heap in Java 7, so unreferenced pooled strings are reclaimed by ordinary heap GC
- G1 String Deduplication is a separate mechanism that shares backing byte[] arrays at GC time
The Java String Pool is a JVM-managed table of canonical String references. The strings it points to lived in PermGen before Java 7 and live in the main heap since then. The pool caches String literals and explicitly interned strings to reduce memory duplication. When you write String s = "hello", the JVM checks the pool first. If a matching string exists, it returns the pooled reference instead of creating a new object.
This is why "hello" == "hello" is true in Java, but new String("hello") == "hello" is false unless you call intern() on the new String. The pool exists because strings are ubiquitous in Java applications (often 25-40% of heap). Deduplicating them at the JVM level saves significant memory, especially in data-heavy systems like web servers, ORM caches, or configuration loaders.
The critical gotcha: String.intern() is unbounded by default. Every call to intern() adds a new entry to the pool if the string isn't already there, and nothing caps the pool's size. An entry leaves only when the GC finds nothing else referencing it. Before Java 7, the pool lived in PermGen, a fixed-size region collected only during full GCs. A single loop calling intern() on unique strings that stay referenced (dynamically generated SQL queries, XML tag names, or user input) could therefore exhaust PermGen with an OutOfMemoryError: PermGen space.
Since Java 7, the pool lives in the main heap and unreferenced entries are GC-eligible. Unbounded intern() can still exhaust the heap, though. Every interned string your code keeps referencing stays in the JVM's internal StringTable, a HashMap-like structure that grows with each new distinct value. Production incidents often trace back to frameworks or libraries that aggressively intern strings without limits: Apache XMLBeans, old Hibernate versions, or custom caching layers.
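Alternatives exist. G1 GC's string deduplication (enabled with -XX:+UseStringDeduplication) automatically deduplicates the backing arrays (char[] on Java 8, byte[] since Java 9) of long-lived strings, without polluting the pool or risking OOM. For controlled use cases, a WeakHashMap<String, WeakReference<String>> gives you a GC-friendly interning cache whose entries can be reclaimed once nothing else references them. Wrap it in synchronization for concurrent use, because WeakHashMap is not thread-safe.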
The rule of thumb: never call intern() on strings whose cardinality you don't control. If you must intern, use a cache that can shed entries: Guava's Interners.newWeakInterner() is production-safe. The pool is a performance optimization, not a memory management tool. Treat it like a global mutable cache with no size-based eviction policy, because that's exactly what it is.
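Below is a minimal sketch of the WeakHashMap<String, WeakReference<String>> approach, assuming single-JVM use with coarse synchronization. The class name WeakStringInterner is hypothetical. The map holds its keys weakly and reaches the value only through a WeakReference, so an entry does not keep its own key alive. Once no caller references the canonical string, the GC is free to clear the entry. Guava's Interners.newWeakInterner() remains the ready-made option the text recommends.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical weak interner: entries disappear once nothing outside the map
// references the canonical string, so the cache cannot pin memory forever.
final class WeakStringInterner {
    // WeakHashMap is not thread-safe; every access below synchronizes on this instance.
    private final Map<String, WeakReference<String>> pool = new WeakHashMap<>();

    synchronized String intern(String value) {
        WeakReference<String> ref = pool.get(value);
        String canonical = (ref == null) ? null : ref.get();
        if (canonical != null) {
            return canonical; // an equal string is already pooled
        }
        // First sighting (or the old entry was collected): this instance becomes canonical.
        // The value is wrapped in a WeakReference so it does not keep its own key alive.
        pool.put(value, new WeakReference<>(value));
        return value;
    }

    synchronized int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        WeakStringInterner interner = new WeakStringInterner();
        String a = interner.intern(new String("order-42"));
        String b = interner.intern(new String("order-42"));
        int entries = interner.size();
        System.out.println(a == b);   // true: the second call returned the first instance
        System.out.println(entries);  // 1
    }
}
```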
Imagine a school library with one copy of every textbook. Instead of printing a new copy every time a student needs 'Harry Potter', the librarian just hands everyone the same book. Java's String Pool works exactly like that library — when two parts of your code use the literal 'hello', the JVM hands them both the same object from a shared shelf instead of making two copies. This saves memory and makes comparisons lightning-fast. The 'intern()' method is your way of asking the librarian to shelve a book you brought from outside.
Strings are the most-created objects in virtually every Java application. A typical web service sees thousands of 'GET', 'Content-Type', and status strings flowing through it every second. Without some form of deduplication, the heap would fill up with byte-for-byte identical objects doing nothing but wasting RAM. That was exactly the situation Java's designers were trying to prevent before version 1.0 shipped. The String Pool (also called the String Intern Pool or String Constant Pool) is the JVM's answer to that problem, and understanding it is not optional for anyone who writes Java professionally.
The pool solves two problems at once: memory efficiency and comparison speed. When the JVM loads a class, it already knows every string literal baked into that class file. By stashing them in one canonical location, the runtime avoids duplicate allocations and lets you compare those strings with a cheap pointer comparison instead of a character-by-character walk. The trade-off — and there always is one — is that the pool itself occupies memory and has its own GC lifecycle, which changed dramatically in Java 7 and again in Java 8.
By the end of this article you'll know exactly where the pool lives in JVM memory and why that location changed, what happens byte-by-byte when you write a string literal versus 'new String()', how 'intern()' works and when it's worth calling, how to profile pool pressure in a running application, and the three mistakes that trip up even experienced engineers in code reviews. You'll also have crisp answers to the interview questions that consistently separate candidates who truly understand Java from those who just use it.
How Java's String Pool Really Works
The Java String Pool is a JVM-managed table (historically in PermGen, now in the main heap) that stores unique String literals and explicitly interned strings. When you write String s = "hello", the JVM checks the pool first. If "hello" exists, s points to the existing object; if not, a new String is created and added. This is a flyweight pattern built into the language: it saves memory by deduplicating identical string values at runtime.
Internally, the pool is a hash table. String.intern() is the manual entry point. Calling it on any String object either returns a pooled reference (if an equal string exists) or adds the current string to the pool.

What keeps a pooled string alive depends on how it got there. A string literal stays reachable as long as the class whose constant pool references it is loaded. A string added via intern() is not kept alive by the pool itself; it lives only as long as your own code, caches, or collections reference it.

In older JVMs with PermGen, the pool sat in a small fixed-size region that was cleaned only during full GCs, which is what made the classic OOM so easy to hit. In modern HotSpot (Java 7+), the pool lives in the main heap, and unreferenced entries are collected along with everything else.
Use intern() sparingly — only when you have a bounded, well-known set of strings (e.g., HTTP method names, status codes, enum-like constants). For unbounded data (user input, log messages, database values), intern() is a memory bomb. The pool is a tool for canonicalization, not a general-purpose cache. Misunderstanding this distinction has caused countless production outages.
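For unbounded input such as user-agent strings, a size-capped cache is the usual alternative to intern(). Here is a minimal sketch of an LRU canonicalizer built on LinkedHashMap's access-order mode. The class name and the maxEntries cap are assumptions for illustration, not a library API. Values that fall out of the cache become ordinary garbage once callers drop them, so memory stays bounded however many distinct strings arrive.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical size-capped canonicalizer: shares instances for recently seen values
// without ever growing past maxEntries.
final class BoundedLruInterner {
    private final Map<String, String> cache;

    BoundedLruInterner(int maxEntries) {
        // accessOrder = true orders entries from least- to most-recently used,
        // so removeEldestEntry evicts the LRU entry once the cap is exceeded.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Returns the cached canonical instance if present, otherwise caches this one.
    synchronized String canonicalize(String value) {
        String existing = cache.get(value); // get() also refreshes recency
        if (existing != null) {
            return existing;
        }
        cache.put(value, value);
        return value;
    }

    synchronized int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        BoundedLruInterner interner = new BoundedLruInterner(2);
        String first = interner.canonicalize(new String("Mozilla/5.0"));
        interner.canonicalize(new String("curl/8.0"));
        interner.canonicalize(new String("Wget/1.21")); // evicts "Mozilla/5.0"
        String again = interner.canonicalize(new String("Mozilla/5.0"));
        System.out.println(interner.size()); // 2
        System.out.println(first == again);  // false: the old canonical copy was evicted
    }
}
```

Synchronizing every call keeps the sketch simple. The trade-off is that an evicted value gets a new canonical instance, which is harmless as long as callers compare with equals().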
Calling intern() on every string in a loop can silently fill the pool with millions of entries, triggering long full GC pauses or OOM, even on modern JVMs.

In one web server, intern() was used to deduplicate user-agent strings on the assumption that GC would clean them up. The pool grew to 2 million entries, causing 10-second GC pauses and eventual PermGen OOM. Rule: never intern() unbounded input; use a bounded LRU cache instead.

intern() is for canonicalization only. Unbounded intern() is a memory leak; always bound the set of strings you pool.

Where the String Pool Lives and Why It Moved
Before Java 7, the String Pool lived in PermGen (Permanent Generation), a fixed-size memory region outside the regular heap. PermGen stored class metadata, interned strings, and other JVM internals. The hard ceiling on PermGen size meant that applications with large numbers of unique interned strings — think XML parsers, ORMs loading thousands of column names, or apps that called intern() naively — would hit 'java.lang.OutOfMemoryError: PermGen space' and crash. Tuning required guessing '-XX:MaxPermSize' upfront, and getting it wrong meant either wasted reserved memory or production outages.
Java 7 moved the String Pool onto the main heap. This was a quiet but massive change. The pool can now grow and shrink with the rest of heap allocations, is subject to normal GC pressure, and participates in regular GC cycles. Pooled strings that are no longer referenced by any loaded class or live variable are reclaimed like any other garbage, instead of waiting for a PermGen collection. Java 8 went further and eliminated PermGen entirely, replacing it with Metaspace (native memory). That makes the old PermGen OOM effectively impossible for string-related reasons.
The practical consequence: on Java 7+ you don't need to panic about the pool size for normal applications, but you still need to understand its structure because careless use of intern() on dynamic strings can still create subtle memory leaks by anchoring objects to the heap longer than you expect.
```java
public class StringPoolLocation {
    public static void main(String[] args) {
        // Literal strings: the JVM interns these when the class's constant pool entries are resolved.
        // Both variables point to THE SAME object in the pool.
        String greeting1 = "hello";
        String greeting2 = "hello";

        // new String() bypasses the pool and allocates on the regular heap.
        // This creates a BRAND NEW object, even though the content is identical.
        String greeting3 = new String("hello");

        // intern() looks up the pool for a canonical copy.
        // If "hello" is already pooled (it is: we declared it as a literal above),
        // intern() returns that pooled reference. No new object is created.
        String greeting4 = greeting3.intern();

        System.out.println("=== Reference Equality (==) ===");
        // true: both literals resolve to the same pooled object
        System.out.println("greeting1 == greeting2 : " + (greeting1 == greeting2));
        // false: greeting3 is a heap object, NOT the pooled reference
        System.out.println("greeting1 == greeting3 : " + (greeting1 == greeting3));
        // true: intern() returned the same pooled object that greeting1 points to
        System.out.println("greeting1 == greeting4 : " + (greeting1 == greeting4));

        System.out.println("\n=== Value Equality (equals) ===");
        // All three print true: equals() compares characters, not memory addresses
        System.out.println("greeting1.equals(greeting2) : " + greeting1.equals(greeting2));
        System.out.println("greeting1.equals(greeting3) : " + greeting1.equals(greeting3));
        System.out.println("greeting1.equals(greeting4) : " + greeting1.equals(greeting4));

        System.out.println("\n=== Identity Hash Codes (per object, not a memory address) ===");
        // greeting1 and greeting2 show the SAME identity hash: same object
        System.out.println("greeting1 identity: " + System.identityHashCode(greeting1));
        System.out.println("greeting2 identity: " + System.identityHashCode(greeting2));
        // greeting3 is a different heap object, so its identity hash almost always differs
        System.out.println("greeting3 identity: " + System.identityHashCode(greeting3));
        // greeting4 matches greeting1: intern() handed back the pooled reference
        System.out.println("greeting4 identity: " + System.identityHashCode(greeting4));
    }
}
```
If a pre-Java 7 application hits PermGen OOM, look for intern() in a loop, or upgrade to Java 8+, where the problem is structurally eliminated.

How the JVM Populates the Pool: Compile Time vs Runtime
The pool is not populated by some magic background process — it fills up in two distinct phases, and confusing them causes real bugs.
Phase 1, compile time: The Java compiler (javac) scans your source for string literals and writes them into the class file's constant pool section. When the JVM resolves those constant pool entries, it interns each string literal automatically. HotSpot does this lazily, the first time a given literal is used. This is why two separate .java files that both declare 'status = "active"' end up sharing the same pooled object at runtime: interning is part of literal resolution, and you never call intern() yourself.
Phase 2 — Runtime via intern(): Any string created dynamically at runtime — from user input, file reads, network data, StringBuilder.toString(), String.format(), and so on — starts its life as a plain heap object. It has nothing to do with the pool unless you explicitly call intern() on it. When you call intern(), the JVM looks up its internal hash table (the pool's backing data structure). If it finds a string with equal content, it returns that reference. If not, it adds this string to the pool and returns it.
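The intern() contract described above can be checked directly. The sketch below builds a string at runtime, interns it, and compares references. The nanoTime suffix is only an assumption used to make the content practically certain not to be pooled yet. On Java 7+ HotSpot, the first intern() adds that very object to the pool and returns it. Java 6 copied the string into PermGen instead, so the first comparison printed false there.

```java
// Checks the String.intern() contract for a runtime-built string (Java 7+ behaviour).
final class InternContractDemo {
    // Built at runtime, so it cannot be a compile-time literal.
    static String buildUnique() {
        return new StringBuilder("intern-demo-").append(System.nanoTime()).toString();
    }

    public static void main(String[] args) {
        String built = buildUnique();
        // Not yet pooled: intern() adds THIS object to the pool and returns it.
        String canonical = built.intern();
        System.out.println(canonical == built);     // true

        // An equal string built separately is a different heap object...
        String copy = new String(built);
        System.out.println(copy == built);          // false
        // ...but interning it hands back the canonical instance added above.
        System.out.println(copy.intern() == built); // true
    }
}
```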
String concatenation with '+' is worth its own paragraph. When you write 'String result = "foo" + "bar"', the compiler collapses constant expressions at compile time, and the bytecode contains a single literal 'foobar', not a concatenation. But 'String result = prefix + suffix', where either operand is not a compile-time constant, is evaluated at runtime. Before Java 9 that goes through StringBuilder; since Java 9 it goes through invokedynamic and StringConcatFactory. Either way the result is a heap object that is NOT pooled.
```java
public class StringPoolPopulation {
    // This constant is resolved at COMPILE TIME.
    // The bytecode for this class will contain the literal "active" in its constant pool.
    static final String COMPILE_TIME_STATUS = "active";

    public static void main(String[] args) {
        // --- Compile-time constant folding ---
        // The compiler sees two string literals being concatenated.
        // It folds them into one literal "activeuser" at compile time.
        // Bytecode: ldc "activeuser", a single load-constant instruction.
        String foldedAtCompile = "active" + "user";

        // This is also the literal "activeuser": same pooled object.
        String explicitLiteral = "activeuser";

        // true: the compiler folded the concatenation; both are the same pooled object
        System.out.println("Compile-time fold == literal: " + (foldedAtCompile == explicitLiteral));

        // --- Runtime concatenation: NOT folded ---
        String roleSuffix = "user"; // a non-final variable, not a compile-time constant
        // Evaluated at runtime: new StringBuilder().append("active").append(roleSuffix).toString()
        // before Java 9, invokedynamic (StringConcatFactory) since Java 9.
        // Either way the result is a NEW String on the heap. Not pooled.
        String builtAtRuntime = "active" + roleSuffix;

        // false: builtAtRuntime is a heap object, NOT the pooled "activeuser"
        System.out.println("Runtime concat == literal  : " + (builtAtRuntime == explicitLiteral));
        // true: content is the same; equals() doesn't care about pool membership
        System.out.println("Runtime concat .equals()   : " + builtAtRuntime.equals(explicitLiteral));

        // --- intern() bridges the gap ---
        // Force the runtime-built string into the pool (or get back the existing entry).
        String internedRuntime = builtAtRuntime.intern();
        // true: intern() returned the canonical pooled reference
        System.out.println("After intern() == literal  : " + (internedRuntime == explicitLiteral));

        // --- final variables initialized with constants ARE compile-time constants ---
        final String finalPrefix = "active";          // a constant variable
        String builtFromFinal = finalPrefix + "user"; // so the compiler CAN fold this
        // true: because finalPrefix is a compile-time constant, the compiler folds it
        System.out.println("Final variable fold == literal: " + (builtFromFinal == explicitLiteral));
    }
}
```
intern() Internals, Performance Cost, and When It's Worth It
The String Pool is backed by a hash table inside the JVM, the StringTable, implemented in native C++ code in HotSpot. In Java 8 the table has a fixed number of buckets: 60013 by default, a prime number to reduce hash collisions. JDK 11 and later can also resize it. You can tune the bucket count with the JVM flag '-XX:StringTableSize=N'. Each bucket is a linked list of String references, making it a classic separate-chaining hash table.
Every intern() call does roughly the following:
1. Compute the string's hash and pick a bucket.
2. Walk that bucket's chain, comparing candidates with equals().
3. Return the match if one exists; otherwise insert the new string and return it.

That means intern() is not free: it costs a hash computation, a chain walk, and synchronization. In HotSpot 8, lookups of existing entries run without a lock, but inserting a new entry takes a table-wide lock. Hammering intern() with many distinct strings from many threads can therefore create hot lock contention. JDK 11 replaced the table with a concurrent hash table, which reduces that cost but does not eliminate it.
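To make those steps concrete, here is a toy separate-chaining intern table. It is a teaching model under simplifying assumptions (fixed bucket count, one coarse lock, plain Java), not HotSpot's native StringTable. The averageChainLength() helper shows why the bucket count matters: chains average entries divided by buckets.

```java
// Toy separate-chaining intern table. NOT HotSpot's implementation.
final class ToyStringTable {
    private static final class Node {
        final String value;
        final int hash;
        final Node next;
        Node(String value, int hash, Node next) { this.value = value; this.hash = hash; this.next = next; }
    }

    private final Node[] buckets; // fixed size: this toy never resizes
    private int size;

    ToyStringTable(int bucketCount) {
        this.buckets = new Node[bucketCount];
    }

    synchronized String intern(String s) {
        int hash = s.hashCode();                              // compute the hash
        int index = Math.floorMod(hash, buckets.length);       // pick a bucket
        for (Node n = buckets[index]; n != null; n = n.next) { // walk the chain
            if (n.hash == hash && n.value.equals(s)) {
                return n.value;                               // found: canonical instance
            }
        }
        buckets[index] = new Node(s, hash, buckets[index]);   // not found: insert at head
        size++;
        return s;
    }

    synchronized int size() { return size; }

    // Load factor: average number of entries a lookup may walk per bucket.
    synchronized double averageChainLength() { return (double) size / buckets.length; }

    public static void main(String[] args) {
        ToyStringTable table = new ToyStringTable(7);
        String a = table.intern(new String("GET"));
        String b = table.intern(new String("GET"));
        System.out.println(a == b); // true
        for (int i = 0; i < 70; i++) {
            table.intern("key-" + i);
        }
        System.out.println(table.size()); // 71
        System.out.println(table.averageChainLength());
    }
}
```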
So when is intern() worth it? The classic legitimate use cases are: (1) Parsing large datasets where the same string value repeats millions of times — think reading CSV files where a column has 10 distinct values but 10 million rows. Interning the column values collapses those 10 million heap objects to 10 pooled references, saving significant RAM. (2) Implementing fast string-keyed caches where you want identity equality for keys. Outside these cases, don't intern(). The JVM's GC is better at managing short-lived string objects than you are at managing a pool that never shrinks until full GC.
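Use case (2) relies on a subtle rule: identity-keyed lookups only work if every key and every probe goes through intern(). The sketch below assumes an IdentityHashMap-based counter. InternedKeyCache is a hypothetical name, and getWithoutIntern() is deliberately wrong to show the failure mode.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical identity-keyed cache: keys are compared with ==, so both the stored
// keys and the lookup probes must be interned first.
final class InternedKeyCache {
    private final Map<String, Integer> counts = new IdentityHashMap<>();

    void increment(String key) {
        counts.merge(key.intern(), 1, Integer::sum); // intern before storing
    }

    Integer get(String key) {
        return counts.get(key.intern());             // intern before probing
    }

    // Deliberately wrong: an equal but distinct String object misses the entry.
    Integer getWithoutIntern(String key) {
        return counts.get(key);
    }

    public static void main(String[] args) {
        InternedKeyCache cache = new InternedKeyCache();
        cache.increment(new String("GB"));
        cache.increment(new String("GB"));
        System.out.println(cache.get(new String("GB")));              // 2
        System.out.println(cache.getWithoutIntern(new String("GB"))); // null
    }
}
```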
```java
import java.util.ArrayList;
import java.util.List;

public class InternPerformanceDemo {
    // Simulate a dataset where only 5 distinct country codes appear
    // but they repeat across millions of records.
    private static final String[] COUNTRY_CODES = {"US", "GB", "DE", "FR", "JP"};

    // Rough used-heap reading after a GC request. System.gc() is only a hint,
    // so treat the numbers as approximations, not a benchmark.
    private static long usedHeapAfterGc() throws InterruptedException {
        System.gc();
        Thread.sleep(200); // give GC a moment
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) throws InterruptedException {
        final int RECORD_COUNT = 5_000_000;

        // --- Scenario A: No interning, 5 million heap String objects ---
        List<String> rawStrings = new ArrayList<>(RECORD_COUNT);
        long beforeRaw = usedHeapAfterGc();
        for (int i = 0; i < RECORD_COUNT; i++) {
            // new String() allocates a fresh String object every time (it may share the
            // source's backing array). Only 5 distinct values, yet 5M objects.
            String countryCode = new String(COUNTRY_CODES[i % COUNTRY_CODES.length]);
            rawStrings.add(countryCode);
        }
        long afterRaw = usedHeapAfterGc();
        // Using rawStrings here keeps the list reachable while it is being measured
        System.out.printf("Without intern(): ~%,d bytes for %,d strings%n",
                (afterRaw - beforeRaw), rawStrings.size());

        rawStrings = null; // allow GC of the raw list and its strings

        // --- Scenario B: With interning, only 5 pooled objects; the list holds 5M refs ---
        List<String> internedStrings = new ArrayList<>(RECORD_COUNT);
        long beforeInterned = usedHeapAfterGc();
        for (int i = 0; i < RECORD_COUNT; i++) {
            // intern() returns one of 5 canonical pooled objects.
            // The temporary new String() object becomes immediately eligible for GC.
            String countryCode = new String(COUNTRY_CODES[i % COUNTRY_CODES.length]).intern();
            internedStrings.add(countryCode);
        }
        long afterInterned = usedHeapAfterGc();
        System.out.printf("With intern():    ~%,d bytes for %,d strings%n",
                (afterInterned - beforeInterned), internedStrings.size());

        // --- Verify correctness: interned copies are reference-equal ---
        String firstEntry = internedStrings.get(0); // "US"
        String sixthEntry = internedStrings.get(5); // "US" again (index 5 % 5 == 0)
        // true: both are the same pooled "US" object
        System.out.println("\nSame pooled reference for repeated value: " + (firstEntry == sixthEntry));
        // "US" was already pooled because COUNTRY_CODES is built from string literals,
        // so the interned entries are reference-equal to the literal.
        System.out.println("Interned 'US' == literal 'US': " + (firstEntry == "US"));
    }
}
```
Avoid intern() in hot paths; prefer bounded domains. Use intern() only for bounded, high-repetition string domains, and remember that every reference you hold to an intern()ed string anchors it in the pool.

String Deduplication (G1 GC): The Pool's Lesser-Known Sibling
Java 8u20 introduced G1 GC String Deduplication (-XX:+UseStringDeduplication), and many engineers confuse it with the String Pool. They are completely different mechanisms solving the same problem from different angles.
The String Pool is proactive and developer-driven: you opt in by writing a literal or calling intern(). String Deduplication is reactive and JVM-driven. While collecting, G1 queues String objects that have survived enough collections as candidates. A background deduplication thread then hashes their backing arrays (char[] on Java 8, byte[] since Java 9's compact strings) and points duplicates at a single shared array. The String objects themselves remain separate heap objects; only the backing arrays are deduplicated.
This matters for a few reasons:
- Deduplication only considers strings that have survived several GC cycles (tunable with -XX:StringDeduplicationAgeThreshold, default 3) or have been promoted to the old generation. Short-lived young-gen strings are not deduplicated.
- It costs some CPU, mostly in the background deduplication thread.
- It does NOT make == comparisons return true for duplicates. Two String objects that share the same deduplicated backing array still compare false with ==.
- It's a transparent memory saving that requires no code changes, which makes it excellent for legacy codebases where you can't audit every string creation.
The rule of thumb: use the String Pool (via literals and careful intern()) when you need reference equality and maximum control. Enable String Deduplication when you're inheriting a large codebase with high string memory usage and can't refactor the allocation sites. They're complementary, not competing.
```java
public class DeduplicationVsPool {
    /**
     * Run with: java -XX:+UseG1GC -XX:+UseStringDeduplication
     *                -XX:+PrintStringDeduplicationStatistics
     *                DeduplicationVsPool
     * (-XX:+PrintStringDeduplicationStatistics is a JDK 8 flag; on JDK 9 and later,
     * deduplication statistics are reported through unified logging, -Xlog, instead.)
     *
     * This demo highlights the conceptual difference between the String Pool
     * and G1 String Deduplication.
     */
    public static void main(String[] args) throws InterruptedException {
        // --- String Pool behaviour (reference equality) ---
        String pooledA = "transaction"; // interned when the literal is resolved
        String pooledB = "transaction"; // JVM returns the SAME pooled reference
        // true: same object in the pool
        System.out.println("Pool: pooledA == pooledB → " + (pooledA == pooledB));

        // --- Heap strings (candidates for G1 deduplication) ---
        // These are NOT in the pool: they're regular heap objects.
        // new String(char[]) always allocates fresh, regardless of content.
        String heapA = new String(new char[]{'t','r','a','n','s','a','c','t','i','o','n'});
        String heapB = new String(new char[]{'t','r','a','n','s','a','c','t','i','o','n'});
        // false: two separate heap objects, even if G1 later deduplicates their backing arrays
        System.out.println("Heap: heapA == heapB → " + (heapA == heapB));
        // true: character content is identical
        System.out.println("Heap: heapA.equals(heapB) → " + heapA.equals(heapB));

        // Request a GC so G1 can deduplicate if the flag is set. System.gc() is only a hint,
        // and strings must reach the age threshold (or be promoted) to become candidates,
        // so heapA's and heapB's INTERNAL arrays MAY afterwards be one shared array.
        System.gc();
        Thread.sleep(500);

        // Still false: deduplication only collapses the backing array,
        // NOT the String wrapper objects. == still compares object references.
        System.out.println("After GC: heapA == heapB → " + (heapA == heapB));

        // --- intern() converts a heap string to a pool reference ---
        String internedA = heapA.intern();
        String internedB = heapB.intern();
        // true: both now refer to the canonical pooled "transaction"
        System.out.println("After intern(): internedA == internedB → " + (internedA == internedB));
        // true: the pooled reference is the same object as the original literal
        System.out.println("internedA == pooledA → " + (internedA == pooledA));
    }
}
```
String Pool Anti-Patterns: What Breaks in Production
Even with the pool on the heap, several anti-patterns cause production headaches:
1. Interning every string from an unbounded source. If you call intern() on user input, HTTP headers, or any data with unlimited distinct values, you'll grow the pool without bound. The pool only sheds entries that the GC finds unreferenced. If those strings are also held by caches or collections, they stay forever: a slow memory leak.
2. Using == after a hand-off across components. One component interns strings, another doesn't. The == check passes in unit tests where both sides use literals but fails in production where one side gets a heap string. This is the silent logic bug that only surfaces in integration environments.
3. Ignoring the StringTableSize default. Suppose your application legitimately needs a large pool (e.g., an in-memory store with 500k unique strings). Java 8's default 60013 buckets then mean chains average about 8 entries (500,000 / 60,013). Each intern() walks a longer chain, so its cost grows with the pool size instead of staying O(1). Profiling shows high CPU in StringTable::intern.
4. Confusing intern() with deduplication. Engineers sometimes expect G1 deduplication to make their == comparisons work. It doesn't. They add intern() calls anyway, defeating the purpose of deduplication.
The fix for all: know your data cardinality. Profile the pool size. Tune the table if needed. Always use equals() for correctness, and use intern() only for performance where you know the domain.
```java
public class StringPoolAntiPatterns {
    // Anti-pattern 1: interning unbounded data
    public static void antiPattern1(String userInput) {
        // userInput comes from an HTTP request and could be anything.
        // Interning it grows the pool with every distinct input.
        String interned = userInput.intern(); // NEVER DO THIS
    }

    // Anti-pattern 2: assuming == works across boundaries
    public static boolean antiPattern2(String fromDB) {
        // A value read via JDBC is built at runtime and is not interned, so this is false
        // for real DB values (true only if fromDB happens to be the pooled literal).
        return fromDB == "PENDING";
    }

    // Correct way
    public static boolean correctCheck(String fromDB) {
        return "PENDING".equals(fromDB); // compares content; also null-safe
    }

    // Anti-pattern 3: large pool without tuning
    // Run with -XX:StringTableSize=1000003 to reduce chains

    public static void main(String[] args) {
        // Simulate a status read from the database: a runtime-built, non-pooled string
        String statusFromDb = new String("PENDING");
        System.out.println(correctCheck(statusFromDb)); // true
        System.out.println(antiPattern2(statusFromDb)); // false: BUG
        // Passing the literal hides the bug, just like a unit test that uses literals
        System.out.println(antiPattern2("PENDING"));    // true
    }
}
```
Interning unbounded input is a slow memory leak on modern JDKs. Compare with equals() unless you control both sides' pooling, because equals() is safer than ==. Don't intern() everything: know your data cardinality.

Garbage Collection and the String Pool: What Actually Gets Freed
Most junior devs think strings in the pool live forever. Before Java 7 the pool lived in PermGen, a small fixed-size region collected only during full GCs. Interned strings there often outlived their usefulness, and a growing set of them was a classic memory leak. Then Java 7 moved the pool to the heap, and everything changed. Now the pool behaves like any other heap data: unreferenced interned strings can be GC'd.

But here's the trap. String literals are referenced from the constant pool of the class that uses them, so they stay reachable for as long as that class is loaded. If you're running a Spring Boot app with hot reload or custom classloaders, those strings stick around until the classloader dies. Strings you intern() yourself stay alive for as long as your caches and collections reference them. That's why long-running applications with aggressive interning can silently fill the old gen. The JVM doesn't leak; your reference graph, classloaders included, does.
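```java
// io.thecodeforge
public class StringPoolGcDemo {
    public static void main(String[] args) {
        // intern() returns the pooled copy of "production-secret". Because that literal
        // also appears in this class's constant pool, the pooled string stays reachable
        // for as long as this class is loaded.
        String interned = new String("production-secret").intern();

        // Clear our local reference
        interned = null;

        // Only a hint; even a full GC cannot remove a string the loaded class still references
        System.gc();

        // Resolves to the pooled object via this class's constant pool. This always prints:
        // it shows the literal is reachable, not that GC skips unreferenced pool entries.
        String same = "production-secret";
        System.out.println("Still alive? " + same);
    }
}
```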
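Don't assume intern()ed strings are safe to leak. If you're building a plugin system or dynamic classloading framework, string literals in plugin classes are pinned until their classloader unloads. Anything you intern() stays pinned as long as your code references it. Monitor old gen growth, and inspect the pool with -XX:+PrintStringTableStatistics (printed at JVM exit) or jcmd <pid> VM.stringtable.

String Pool and Spring Boot: Property Resolution Under the Hood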
Spring Boot loads thousands of property keys at startup. Every @Value("${something}") resolves through the Environment abstraction, which creates strings for keys and values. Spring doesn't call intern() on them, and placeholder resolution typically builds the resolved value as a new String at runtime rather than reusing a pooled literal. Nor can you count on substring sharing: since Java 7u6, String.substring() copies its characters instead of sharing the parent's array. In practice, if you define a property in application.yml and reference it ten times, you may get ten different String objects, unless G1 string deduplication later shares their backing arrays. The fix is simple: call .intern() at the point of injection on property values that repeat across many beans (a small, bounded set), or inject the value once into a shared bean.
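```java
// io.thecodeforge
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

@Component
public class PropertyConsumer {
    private final String dbUrl;

    public PropertyConsumer(@Value("${app.database.url}") String rawUrl) {
        // Intern once at injection so every consumer shares one canonical instance.
        // Worth it only if the same value would otherwise be materialized many times.
        this.dbUrl = rawUrl.intern();
    }

    public String getDbUrl() {
        return dbUrl; // the canonical pooled instance
    }
}
```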
Java 9+ Compact Strings: Why Your Pool Just Got Smaller
Before Java 9, every char in a String took two bytes (UTF-16 internally), yet most real-world strings are Latin-1 (one byte per char). Java 9 introduced Compact Strings. Every String is now backed by a byte[], and if all characters fit in a byte (Latin-1), the JVM stores one byte per character instead of two. This halves the backing array for most strings. The String Pool inherits this optimization automatically: when you intern a string like "error-404", the pool entry's backing array is half the old size.

But there's a subtle performance trap. Compact Strings add a coder field to every String object (LATIN1 vs UTF16). The JVM checks this coder on every equality comparison, substring, and concatenation, and in hot paths that extra branch can cause mispredictions. For high-throughput services with millions of interned short strings (HTTP status codes, metric names, log levels), the space savings dwarf the micro-branch cost. Disable the feature with -XX:-CompactStrings only if profiling proves the branch cost exceeds the heap savings. I've never seen it happen.
```java
// io.thecodeforge
public class CompactStringCheck {
    public static void main(String[] args) {
        // All Latin-1 characters: Java 9+ stores 1 byte per char in the backing byte[] (LATIN1 coder)
        String latin1 = "status:200".intern();
        // Contains '€' (U+20AC), outside Latin-1: still a byte[], but 2 bytes per char (UTF16 coder)
        String utf16 = "status:200€".intern();
        System.out.println("Latin-1 length: " + latin1.length()); // 10
        System.out.println("UTF-16 length: " + utf16.length());   // 11
        // One extra character, but the UTF16 backing array is roughly twice as large
        // (about 22 bytes of payload versus 10)
    }
}
```
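The representation choice described in this section can be approximated in plain Java. This sketch applies the Compact Strings rule from JEP 254: if every char is at most 0xFF, the string takes one byte per char; otherwise it takes two. It estimates the backing array payload only. It does not read the JVM's private coder field, and it ignores object headers and alignment.

```java
// Approximates the Compact Strings (Java 9+) representation choice for a String.
final class CompactStringEstimate {
    // LATIN1 is possible only if every char fits in 8 bits.
    static boolean fitsLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Estimated backing byte[] payload: 1 byte per char for LATIN1, 2 for UTF16.
    static int backingArrayBytes(String s) {
        return fitsLatin1(s) ? s.length() : s.length() * 2;
    }

    public static void main(String[] args) {
        String latin1 = "status:200";
        String utf16 = "status:200\u20AC"; // U+20AC (euro sign) is outside Latin-1
        System.out.println("latin1 -> " + backingArrayBytes(latin1) + " bytes"); // 10
        System.out.println("utf16  -> " + backingArrayBytes(utf16) + " bytes");  // 22
    }
}
```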
The PermGen OOM That Brought Down a Trading Platform
Root cause: String.intern() was called on every parsed element name and attribute string. With 500,000 unique XML tag names across different message schemas, the pool filled PermGen. The fix stopped interning dynamic strings and, after the upgrade, added -XX:StringTableSize=1000003 to reduce collisions.
- Never call intern() on unbounded dynamic strings: every reference you keep anchors them in the pool.
- On Java 6 and below, the pool lives in PermGen, a finite, fixed-size region cleaned only during full GCs, so a growing set of reachable interned strings exhausts it.
- Upgrading to Java 8+ eliminates PermGen entirely.
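One way to follow these lessons without giving up sharing entirely is to scope canonicalization to a single parse. The sketch below is an assumption-laden illustration, not the platform's actual fix. It creates a ConcurrentHashMap-backed name pool per message, so repeated tag names share one instance while that message is processed. Afterwards the whole map becomes garbage instead of pinning entries in the JVM-wide pool. ScopedNamePool and parseTags are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-parse name pool: sharing lasts only as long as the scope that owns it.
final class ScopedNamePool {
    private final ConcurrentHashMap<String, String> names = new ConcurrentHashMap<>();

    String canonical(String name) {
        // putIfAbsent returns the existing value, or null if 'name' was just inserted.
        String existing = names.putIfAbsent(name, name);
        return existing != null ? existing : name;
    }

    int size() {
        return names.size();
    }

    // Hypothetical stand-in for a parser that canonicalizes tag names for one message.
    static List<String> parseTags(String... rawTags) {
        ScopedNamePool pool = new ScopedNamePool(); // lives only for this call
        List<String> tags = new ArrayList<>();
        for (String raw : rawTags) {
            tags.add(pool.canonical(raw));
        }
        return tags; // the pool is unreachable after this returns
    }

    public static void main(String[] args) {
        List<String> tags = parseTags(new String("Order"), new String("Price"), new String("Order"));
        System.out.println(tags.get(0) == tags.get(2)); // true: shared within the parse
        System.out.println(tags);                       // [Order, Price, Order]
    }
}
```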
Debugging a string pool problem:
- PermGen OOM (Java 7 and earlier): check PermGen with jstat -gcpermcapacity <pid> or jmap -permgen <pid>, look for intern() loops, and confirm pool growth with -XX:+PrintStringTableStatistics. Upgrade to Java 8+ or remove unnecessary intern() calls.
- Pool growth on a modern JDK: inspect it with jcmd <pid> VM.stringtable or jcmd <pid> GC.heap_info | grep -i string. Remove intern() calls, or increase StringTableSize on the next restart.
- Contention inside String.intern(): capture thread dumps with jstack <pid> (or Thread.getAllStackTraces() via JMX). For dynamic strings, replace intern() with a dedicated pool backed by a HashMap<String, String> or ConcurrentHashMap<String, String>, and keep intern() for bounded domains.
- == comparisons that fail unexpectedly: use System.identityHashCode() on suspected strings to check whether .intern() was called. Use equals() for all value comparisons, and audit code for reliance on string literal interning beyond compile-time constants.

| Aspect | String Pool (intern()) | G1 String Deduplication |
|---|---|---|
| Mechanism | Hash table of canonical String references in heap (Java 7+) | GC finds surviving Strings; shares backing arrays (char[] on Java 8, byte[] on 9+) |
| Trigger | Explicit: string literal or intern() call | Automatic: G1 queues candidates during GC; a background thread deduplicates them |
| Effect on == | Makes == return true for equal-content strings | No effect — == still returns false for separate String objects |
| Memory saved | Entire String object + backing array deduplicated | Only the backing byte[] array is shared; String wrappers remain |
| GC eligibility | Pooled strings collected when no live references remain (Java 7+) | Only strings that reach the age threshold (default 3 GCs) or are promoted to old gen are candidates |
| CPU overhead | intern() hash lookup + possible lock contention per call | Small overhead, mostly in a background deduplication thread |
| Code changes required | Yes — must use literals or call intern() | No — enable with JVM flag only |
| Best use case | Known finite sets of repeated strings; cache keys | Legacy codebases with high string memory; no refactoring budget |
| JVM flag | N/A (built-in behaviour) | -XX:+UseG1GC -XX:+UseStringDeduplication |
| Available since | Java 1.0 (PermGen); modern behaviour since Java 7 | Java 8u20 |
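The fixes above suggest replacing intern() with a ConcurrentHashMap<String, String> for dynamic strings. A minimal sketch of such a dedicated pool follows, assuming a simple soft size cap; the class name BoundedInternPool, the maxEntries cap, and the pass-through policy are hypothetical illustrations, not a library API. Because it lives in an ordinary map, it can be capped, cleared, or discarded, which the JVM pool cannot.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical application-level intern pool. It canonicalizes strings in an ordinary
// ConcurrentHashMap instead of the JVM StringTable, so it can be capped, cleared, or discarded.
final class BoundedInternPool {
    private final ConcurrentMap<String, String> canonical = new ConcurrentHashMap<>();
    private final int maxEntries; // illustrative cap, not a JVM setting

    BoundedInternPool(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    String canonicalize(String s) {
        String existing = canonical.get(s);
        if (existing != null) {
            return existing;                      // already canonical: reuse the stored instance
        }
        // size() is approximate under concurrent updates, so treat the cap as soft.
        if (canonical.size() >= maxEntries) {
            return s;                             // cap reached: pass through instead of growing
        }
        String prior = canonical.putIfAbsent(s, s);
        return prior != null ? prior : s;         // if another thread won the race, use its instance
    }

    int size() {
        return canonical.size();
    }

    void clear() {
        canonical.clear();                        // unlike the JVM pool, this one can be emptied on demand
    }

    public static void main(String[] args) {
        BoundedInternPool statuses = new BoundedInternPool(16);
        String a = statuses.canonicalize(new String("ACTIVE"));
        String b = statuses.canonicalize(new String("ACTIVE"));
        System.out.println(a == b);               // true: both calls return the first instance
        System.out.println(statuses.size());      // 1
    }
}
```

A CSV loader like the one in the interview question below could hold one such pool per low-cardinality column and drop it when the import finishes.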
Key takeaways
Common mistakes to avoid
Comparing strings with == instead of equals()
Fix: Always use equals() for value comparison. Use == only when you've explicitly interned both sides and need the speed of a pointer comparison.

Calling intern() on every string in a high-throughput path
Interning a string that isn't already pooled takes a lock on the pool's hash table. Calling it millions of times per second on distinct strings floods the pool; on JVMs with a fixed-size table (sized by -XX:StringTableSize) it also creates long bucket chains, turning a theoretically O(1) lookup into O(n).
Fix: Intern only strings from a bounded, known-finite domain (status codes, country codes, enum-like values). For arbitrary user data, use equals() and let the GC manage the heap normally.

Assuming the String Pool was always on the heap
…intern() to avoid PermGen OOM on Java 8+ applications, creating unnecessary pool pressure. Worse: assuming that because intern() "saves memory" it should be used everywhere.
Fix: …intern() for the data-processing use case described above, and use jcmd <pid> VM.stringtable to inspect actual pool statistics before optimising.

Interview Questions on This Topic
Can a string in the pool be garbage collected? Walk me through the answer for Java 6 versus Java 7 and later.
You have a method that reads 50 million rows from a CSV file where a 'status' column contains only the values 'PENDING', 'ACTIVE', or 'CLOSED'. Would you call intern() on each status string? Why or why not — and what are the trade-offs?
Yes, I would intern() each status string. The domain is bounded (only 3 distinct values) and the repetition is massive (50 million rows). Without interning, every parsed row carries its own String, so if the rows are retained you hold 50 million equal heap objects. With intern(), you keep exactly 3 pooled String objects and 50 million references to them, and the per-row strings read from the file become short-lived garbage. The CPU cost of the intern() call per row is small compared to the memory savings. The trade-off is the hash lookup and synchronization overhead on every call; with only three distinct values the pool never grows, but measure contention under your real thread count. If the column were unbounded (e.g., free-text descriptions), intern() would be a disaster, and I'd let the GC handle it.

What does this print and why?
String a = new String("hello").intern(); String b = "hello"; System.out.println(a == b);
Follow-up: what if you remove the .intern() call?
First case: it prints true. new String("hello") creates a separate heap object, but intern() returns the canonical pooled reference, which is the same object the literal "hello" assigned to b resolves to. Second case: without intern(), a is a fresh heap object and b is the pooled reference, so a == b prints false because they are different objects. This is why equals() is safer: it returns true in both cases.

Frequently Asked Questions
Is String.intern() thread-safe?
Yes, but with nuance. On modern HotSpot JVMs the pool's underlying hash table uses striped locking, with a lock per bucket rather than one global lock. This means concurrent intern() calls on strings hashing to different buckets can proceed in parallel. However, high throughput on a narrow set of hash buckets can still cause contention. The pool structure itself is safe; its performance under concurrency requires profiling.
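A small sketch of the thread-safety claim, assuming a HotSpot JVM: several threads each build an equal String at run time and intern it, and every call should come back with the same canonical reference. The class ConcurrentInternDemo and the value "ACTIVE" are illustrative. It demonstrates correctness only and says nothing about contention, which still needs profiling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ConcurrentInternDemo {
    // Interns an equal, freshly built string on several threads and reports
    // whether every call returned the same canonical reference.
    static boolean allSameReference(int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> results = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                // Each task builds a new String object with the same content, then interns it.
                results.add(executor.submit(() -> new StringBuilder("ACT").append("IVE").toString().intern()));
            }
            String first = results.get(0).get();
            for (Future<String> result : results) {
                if (result.get() != first) {   // deliberate reference comparison
                    return false;
                }
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(allSameReference(8)); // true
    }
}
```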
Are strings returned by methods like String.valueOf() pooled automatically?
No. Methods like String.valueOf(42) and Integer.toString(someNumber) return new heap-allocated String objects, not pooled ones. The only way a runtime-created string ends up in the pool is an explicit intern() call. G1 String Deduplication may later share its backing array with an equal string, but that is a separate mechanism: it does not add the string to the pool and does not change the result of ==. If you need the result pooled, call .intern() on the return value, but only if you have a genuine use case for it.
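The claim about String.valueOf() can be checked with a few reference comparisons. This sketch assumes OpenJDK/HotSpot behaviour, where String.valueOf(int) builds a fresh String on every call; the class name RuntimeStringsDemo is illustrative.

```java
import java.util.Arrays;

final class RuntimeStringsDemo {
    // Returns {runtime == literal, runtime.equals(literal), runtime.intern() == literal}.
    static boolean[] check() {
        String runtime = String.valueOf(42);   // created at run time: a new heap object, not pooled
        String literal = "42";                 // string literal: resolves to the pooled instance
        return new boolean[] {
            runtime == literal,                // different objects
            runtime.equals(literal),           // same characters
            runtime.intern() == literal        // intern() returns the pooled instance
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(check())); // [false, true, true]
    }
}
```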
Why does == sometimes return true for strings?
The == operator always compares object references, not contents. It returns true for string literals and compile-time constant expressions because the JLS requires those to be interned, so equal ones resolve to the same pooled object. It fails for runtime-created strings (new String(), StringBuilder.toString(), method return values) because those are separate heap objects. The safe rule: always use equals() for string value comparison. Outside literals and constant expressions, treat any == returning true for strings as an implementation detail, not a contract.
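The difference between compile-time constants and runtime concatenation is easy to see side by side. In this sketch (the class name EqualityRulesDemo is illustrative), PREFIX + "lo" is a constant expression that javac folds into the literal "hello", while "hel" + suffix is evaluated at run time because suffix is not declared final.

```java
import java.util.Arrays;

final class EqualityRulesDemo {
    static final String PREFIX = "hel"; // constant variable: final and initialized with a constant expression

    // Returns {constantConcat == literal, runtimeConcat == literal, runtimeConcat.equals(literal)}.
    static boolean[] check() {
        String literal = "hello";
        String constantConcat = PREFIX + "lo";  // constant expression: folded to "hello" at compile time
        String suffix = "lo";                   // not declared final, so not a constant variable
        String runtimeConcat = "hel" + suffix;  // concatenated at run time into a new String
        return new boolean[] {
            constantConcat == literal,          // constant expressions are interned
            runtimeConcat == literal,           // separate heap object
            runtimeConcat.equals(literal)       // same characters
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(check())); // [true, false, true]
    }
}
```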
How can I inspect the String Pool on a running JVM?
Use jcmd: jcmd <pid> VM.stringtable. This prints the number of entries, bucket distribution, and memory usage. Alternatively, start the JVM with -XX:+PrintStringTableStatistics to print pool statistics at exit. JMX MBeans don't expose the pool directly, so jcmd is the standard tool for live inspection.
What happens if I intern a very long string?
On Java 7+, intern() adds a reference to the string to the pool's hash table (the existing heap object is pooled rather than copied), so it keeps occupying heap space proportional to its length. An interned string becomes collectable once nothing else references it, but its pool entry is only removed when the GC cleans the string table, so a very long interned string can linger. For large strings, consider a different caching strategy (e.g., a separate WeakHashMap) rather than interning, because intern() is designed for many small strings, not large blobs.
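For large strings, a separate WeakHashMap-based cache is one alternative to intern(). This is a minimal sketch, assuming single-process use and coarse synchronization; WeakCanonicalizer is a hypothetical name, and the demo uses String.repeat, which needs Java 11+.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical weak canonicalizing cache. Entries can be dropped by the GC once no strong
// reference to the key remains. WeakHashMap is not thread-safe, so access is synchronized.
final class WeakCanonicalizer {
    // The value is a WeakReference: a strong value pointing at its own key would keep it alive forever.
    private final Map<String, WeakReference<String>> cache = new WeakHashMap<>();

    synchronized String canonicalize(String s) {
        WeakReference<String> ref = cache.get(s);
        String existing = (ref == null) ? null : ref.get();
        if (existing != null) {
            return existing;                        // reuse the instance already cached
        }
        cache.put(s, new WeakReference<>(s));       // first occurrence becomes canonical
        return s;
    }

    public static void main(String[] args) {
        WeakCanonicalizer canon = new WeakCanonicalizer();
        String big1 = "x".repeat(100_000);          // two equal but distinct large strings
        String big2 = "x".repeat(100_000);
        System.out.println(big1 == big2);                                          // false
        System.out.println(canon.canonicalize(big1) == canon.canonicalize(big2));  // true
    }
}
```

If Guava is already on the classpath, Interners.newWeakInterner() provides a similar, thread-safe structure.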