What is Garbage Collection?
Garbage collection in computer science is
a form of automatic memory management. The garbage collector, or just
collector, attempts to reclaim garbage: memory occupied by objects that are
no longer in use by the program. Garbage collection traditionally does not
manage limited resources other than memory, such as
network sockets, database handles, user interaction windows, and file and
device descriptors.
History:
Garbage collection was invented by John
McCarthy around 1959 to solve problems in Lisp.
Basic Principles of Garbage Collection:
The basic principles of garbage collection are:
- Find data objects in a program that cannot be accessed in the future.
- Reclaim the resources used by those objects.
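It is interesting to see how objects without references are found. Java
does not search for garbage directly: it starts from a set of roots (such as
local variables on thread stacks and static fields), finds every object
reachable from them, and regards all remaining objects as unreferenced. This
is in fact a smart way of finding unreferenced Java objects, because a group
of objects that only refer to each other is still recognized as garbage.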
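The reachability idea can be sketched with a toy object graph. This is a minimal mark-and-sweep model, not HotSpot's actual implementation: the Obj class, the explicit heap list and the single root are assumptions made for illustration, and List.of requires Java 9 or later.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ReachabilitySketch {

    // A toy heap object: a name plus outgoing references to other objects.
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    // Mark phase: walk the object graph from the roots and record every reachable object.
    static Set<Obj> mark(List<Obj> roots) {
        Set<Obj> marked = new HashSet<>();
        Deque<Obj> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Obj o = work.pop();
            if (marked.add(o)) {      // add() returns false if o was already marked
                work.addAll(o.refs);  // visit everything this object points to
            }
        }
        return marked;
    }

    // Sweep phase: every object on the heap that was not marked is garbage.
    static List<Obj> sweep(List<Obj> heap, Set<Obj> marked) {
        List<Obj> garbage = new ArrayList<>();
        for (Obj o : heap) {
            if (!marked.contains(o)) garbage.add(o);
        }
        heap.removeAll(garbage);      // "reclaim" the unreachable objects
        return garbage;
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c"), d = new Obj("d");
        a.refs.add(b);                // a -> b
        c.refs.add(d);                // c -> d
        d.refs.add(c);                // d -> c: a cycle that no root reaches
        List<Obj> heap = new ArrayList<>(List.of(a, b, c, d));
        List<Obj> roots = List.of(a); // stands in for, e.g., a local variable on a thread stack

        for (Obj o : sweep(heap, mark(roots))) {
            System.out.println("reclaimed " + o.name);
        }
    }
}
```

Running main prints "reclaimed c" and then "reclaimed d": each of those objects still has a reference pointing at it, but neither is reachable from a root, so both are collected.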
Types of Java Garbage Collectors/Garbage Collection Algorithms:
On J2SE 5.0 and above, the following garbage collectors are available,
and programmers can choose between them through JVM parameters.
1.) Serial Collector:
JVM Option Parameter: -XX:+UseSerialGC
2.) Throughput Collector or The Parallel Collector:
JVM Option Parameter: -XX:+UseParallelGC
Young generation GC is done in parallel threads.
Tenured generation GC is done in a single (serial) thread.
3.) Parallel Old Generation Collector:
JVM Option Parameter: -XX:+UseParallelOldGC
Certain phases of an 'old generation' collection can be performed in
parallel, speeding up old generation collection.
4.) The Concurrent Low Pause Collector:
JVM Option Parameter: -Xincgc or -XX:+UseConcMarkSweepGC
The concurrent collector is used to collect the tenured generation and
does most of the collection concurrently with the execution of the application.
The application is paused for short periods during the collection.
A parallel version of the young generation copying collector is used
with the concurrent collector.
The concurrent low pause collector is used if the option
-XX:+UseConcMarkSweepGC is passed on the command line.
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
Selects the Concurrent Mark Sweep collector together with the parallel
young generation collector.
This collector may deliver better response times for the
application (i.e., low application pause times).
It is a parallel and mostly-concurrent collector and can be a good
match for the threading ability of large multi-processor systems.
5.) The Incremental (Sometimes Called Train) Low Pause Collector:
JVM Option Parameter: -XX:+UseTrainGC
This collector has not changed since J2SE Platform version 1.4.2
and is currently not under active development.
It will not be supported in future releases.
Note that -XX:+UseParallelGC should not be used with -XX:+UseConcMarkSweepGC.
Benefits of Garbage Collection:
Garbage collection frees the programmer
from manually dealing with memory deallocation. As a result, certain categories
of bugs are eliminated or substantially reduced:
- Dangling pointer bugs, which occur when a piece of memory is freed while
there are still pointers to it, and one of those pointers is then used. By
then the memory may have been re-assigned to another use, with unpredictable
results.
- Double free bugs, which occur when the program tries to free a region of
memory that has already been freed, and perhaps already been allocated again.
- Certain kinds of memory leaks, in which a program fails to free memory
occupied by objects that will not be used again, leading, over time, to memory
exhaustion.
Disadvantages of Garbage Collection:
- Consumes computing resources in deciding what memory is to be freed,
reconstructing facts that may have been known to the programmer, often
leading to decreased or uneven performance.
- Interaction with memory hierarchy effects can make this overhead
intolerable in circumstances that are hard to predict or to detect in routine
testing.
- The moment when the garbage is actually collected can be unpredictable,
resulting in stalls scattered throughout a session.
- Memory may leak despite the presence of a garbage collector if
references to unused objects are not themselves manually disposed of. This is
described as a logical memory leak. The belief that garbage collection
eliminates all leaks leads many programmers not to guard against creating such
leaks.
- In virtual memory environments, it can be difficult for the garbage
collector to notice when collection is needed, resulting in large amounts of
accumulated garbage, a long, disruptive collection phase, and other programs'
data swapped out.
- Garbage collectors often exhibit poor locality (interacting badly with
cache and virtual memory systems), occupy more address space than the program
actually uses at any one time, and touch otherwise idle pages.
- Garbage collectors may cause thrashing, in which a program spends more
time copying data between various grades of storage than performing useful
work.
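A logical memory leak, as described in the list above, is easy to write in Java. In this sketch, assuming a hypothetical request handler, every payload is stored in a static list that is never cleared. The static field acts as a GC root, so all payloads stay reachable and the collector is not allowed to reclaim them until the references are dropped.

```java
import java.util.ArrayList;
import java.util.List;

public class LogicalLeakSketch {

    // A static collection lives as long as its class is loaded, so everything
    // it references stays reachable and therefore cannot be collected.
    static final List<byte[]> HISTORY = new ArrayList<>();

    // Hypothetical handler that keeps each payload "for later" but never
    // removes it: a logical leak the garbage collector cannot fix.
    static void handleRequest(int size) {
        byte[] payload = new byte[size];
        HISTORY.add(payload);
    }

    // The fix: drop the references once the objects are no longer needed,
    // so they become unreachable and eligible for collection.
    static void clearHistory() {
        HISTORY.clear();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1_000; i++) {
            handleRequest(1024);
        }
        System.out.println("retained payloads: " + HISTORY.size());
        clearHistory();
        System.out.println("retained payloads after clear: " + HISTORY.size());
    }
}
```

Running main reports 1000 retained payloads before the clear and 0 after it.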
What's the difference between the serial, parallel and CMS
(Concurrent-Mark-Sweep) collectors?
First of all, let's look at which
collectors operate on the young generation and which on the tenured (old)
generation:
The following collectors operate on the young generation:
-XX:+UseSerialGC
-XX:+UseParallelGC
-XX:+UseParNewGC
The following collectors operate on the old generation:
-XX:+UseParallelOldGC
-XX:+UseConcMarkSweepGC
What's the difference between the Serial and the Parallel collector?
Both the serial and parallel collectors cause a stop-the-world pause during
GC. So what's the difference between them?
A serial collector is a default copying collector which uses only one GC
thread for the GC operation, while a parallel collector uses multiple GC
threads for the GC operation.
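The thread-count difference can be illustrated on a toy heap. This sketch only models a sweep over an array of slots with precomputed mark bits (an assumption for illustration; HotSpot's young generation collectors copy live objects rather than sweep), run once on a single thread and once split across a fixed thread pool. In both cases the caller blocks until the sweep finishes, which mirrors the stop-the-world behavior the two collectors share.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SerialVsParallelSweep {

    // Frees every unmarked, occupied slot in [from, to) and returns how many were freed.
    static int sweepRange(Object[] heap, boolean[] marked, int from, int to) {
        int freed = 0;
        for (int i = from; i < to; i++) {
            if (heap[i] != null && !marked[i]) {
                heap[i] = null;  // reclaim the slot
                freed++;
            }
        }
        return freed;
    }

    // Serial: a single GC thread walks the whole heap.
    static int sweepSerial(Object[] heap, boolean[] marked) {
        return sweepRange(heap, marked, 0, heap.length);
    }

    // Parallel: the heap is split into disjoint chunks, one task per chunk.
    // The caller still waits for every chunk, so the application stays
    // stopped for the whole sweep; only the GC work itself is shared out.
    static int sweepParallel(Object[] heap, boolean[] marked, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> parts = new ArrayList<>();
            int chunk = (heap.length + threads - 1) / threads;
            for (int start = 0; start < heap.length; start += chunk) {
                int from = start, to = Math.min(start + chunk, heap.length);
                parts.add(pool.submit(() -> sweepRange(heap, marked, from, to)));
            }
            int freed = 0;
            for (Future<Integer> f : parts) freed += f.get();
            return freed;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int n = 1_000_000;
        Object[] heap1 = new Object[n], heap2 = new Object[n];
        boolean[] marked = new boolean[n];
        for (int i = 0; i < n; i++) {
            heap1[i] = heap2[i] = new Object();
            marked[i] = (i % 3 == 0);  // pretend every third object is live
        }
        System.out.println("serial freed:   " + sweepSerial(heap1, marked));
        System.out.println("parallel freed: " + sweepParallel(heap2, marked, 4));
    }
}
```

Both calls free the same slots; the chunks never overlap, so the parallel tasks need no locking.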
What's the difference between the Parallel and the CMS collector?
The CMS performs the following steps (all made by only one GC thread):
- initial mark
- concurrent marking
- remark
- concurrent sweeping
There are two differences between the parallel and the CMS collectors:
1) The parallel collector uses multiple GC threads, while the CMS uses only one.
2) The parallel collector is a 'stop-the-world' collector, while the CMS stops
the world only during the initial mark and remark phases.
During the concurrent marking and sweeping phases, the CMS thread runs along
with the application's threads.
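The CMS cycle described above can be written down as a small model that records which phases pause the application. This is only a description of the cycle as given here (the Phase names are made up for the sketch), not an implementation of CMS.

```java
import java.util.ArrayList;
import java.util.List;

public class CmsPhases {

    // The four CMS phases listed above, each tagged with whether the
    // application threads are stopped while it runs.
    enum Phase {
        INITIAL_MARK(true),
        CONCURRENT_MARK(false),
        REMARK(true),
        CONCURRENT_SWEEP(false);

        final boolean stopTheWorld;

        Phase(boolean stopTheWorld) { this.stopTheWorld = stopTheWorld; }
    }

    // Returns, in cycle order, the phases during which the application is paused.
    static List<Phase> pausingPhases() {
        List<Phase> pauses = new ArrayList<>();
        for (Phase p : Phase.values()) {
            if (p.stopTheWorld) pauses.add(p);
        }
        return pauses;
    }

    public static void main(String[] args) {
        for (Phase p : Phase.values()) {
            System.out.println(p + (p.stopTheWorld
                    ? ": application paused"
                    : ": runs alongside application threads"));
        }
        System.out.println("pauses per CMS cycle: " + pausingPhases().size());
    }
}
```

By contrast, a parallel collection would be modeled as a single phase with stopTheWorld set to true for its whole duration.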
If you wish to combine both parallelism and
concurrency in your GC, you can use the following:
-XX:+UseParNewGC
for the new generation (multiple GC threads)
-XX:+UseConcMarkSweepGC
for the old generation (one GC thread, freezes the JVM only during the initial
mark and remark phases)
Selecting a Garbage Collector:
- If the application has a small data set (up to approximately 100MB), then
select the serial collector with -XX:+UseSerialGC.
- If the application will be run on a single processor and there are no
pause time requirements, then either let the VM select the collector, or
select the serial collector with -XX:+UseSerialGC.
- If (a) peak application performance is the first priority and (b) there
are no pause time requirements or pauses of one second or longer are
acceptable, then either let the VM select the collector, or select the
parallel collector with -XX:+UseParallelGC and (optionally) enable parallel
compaction with -XX:+UseParallelOldGC.
- If response time is more important than overall throughput and garbage
collection pauses must be kept shorter than approximately one second, then
select the concurrent collector with -XX:+UseConcMarkSweepGC. If only one or
two processors are available, consider using incremental mode
(-XX:+CMSIncrementalMode).
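The selection rules above can be encoded as a small decision function. This is a hypothetical helper: the parameter names, the use of positive infinity for "no pause time requirement", and the exact thresholds are assumptions that mirror the approximate figures in the text. Where a rule allows "let the VM select the collector", the sketch returns the explicit flag instead. The function only returns option strings; note that CMS was removed in JDK 14, so current JVMs reject the CMS options it can return.

```java
public class CollectorChooser {

    // Hypothetical marker for "no pause time requirement".
    static final double NO_PAUSE_REQUIREMENT = Double.POSITIVE_INFINITY;

    // Encodes the selection rules above. All parameters are hypothetical inputs:
    //   dataSetMb        approximate data set size in MB
    //   processors       number of available processors
    //   throughputFirst  true if peak application performance is the first priority
    //   maxPauseSeconds  longest acceptable GC pause (NO_PAUSE_REQUIREMENT if none)
    // Returns suggested JVM options; "" means "let the VM select the collector".
    static String chooseCollector(int dataSetMb, int processors,
                                  boolean throughputFirst, double maxPauseSeconds) {
        boolean noPauseRequirement = Double.isInfinite(maxPauseSeconds);

        // Small data set (up to approximately 100MB): serial collector.
        if (dataSetMb <= 100) {
            return "-XX:+UseSerialGC";
        }
        // Single processor and no pause time requirement: serial collector.
        if (processors == 1 && noPauseRequirement) {
            return "-XX:+UseSerialGC";
        }
        // Throughput first, pauses of one second or longer acceptable:
        // parallel collector plus the optional parallel compaction flag.
        if (throughputFirst && maxPauseSeconds >= 1.0) {
            return "-XX:+UseParallelGC -XX:+UseParallelOldGC";
        }
        // Response time first, pauses under about one second: concurrent
        // collector, in incremental mode with only one or two processors.
        if (!throughputFirst && maxPauseSeconds < 1.0) {
            return processors <= 2
                    ? "-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
                    : "-XX:+UseConcMarkSweepGC";
        }
        // None of the rules apply: leave the choice to the VM.
        return "";
    }

    public static void main(String[] args) {
        System.out.println(chooseCollector(2_000, 8, true, NO_PAUSE_REQUIREMENT));
        System.out.println(chooseCollector(2_000, 8, false, 0.2));
    }
}
```

The rules are checked in the order the text lists them, so a small data set selects the serial collector even when other conditions would also match.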
Parameters to Enable Garbage Collection Logs:
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
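Besides the log flags, a running program can query its own collectors through the standard java.lang.management API. This sketch prints each collector's name, collection count and accumulated collection time in milliseconds. The names also show which collectors the command-line flags selected; for example, they are typically "PS Scavenge" and "PS MarkSweep" under -XX:+UseParallelGC, and "ParNew" and "ConcurrentMarkSweep" under -XX:+UseConcMarkSweepGC on JVMs that still support CMS. The allocation loop in main exists only to give the collector some work.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcStats {

    // One line per collector registered by the running JVM: its name, how
    // many collections it has run, and the total time spent in them.
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Allocate short-lived garbage so the young generation collector has work to do.
        long allocated = 0;
        for (int i = 0; i < 200_000; i++) {
            allocated += new byte[256].length;
        }
        System.out.println("allocated bytes: " + allocated);
        describeCollectors().forEach(System.out::println);
    }
}
```

getCollectionCount and getCollectionTime return -1 when a collector does not provide the value, so the counts should be read as a rough complement to the GC logs rather than a replacement for them.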