Saturday 1 December 2012

Garbage collection Document



What is Garbage Collection?
Garbage Collection in computer science is a form of automatic memory management. The garbage collector, or just collector, attempts to reclaim garbage, or memory occupied by objects that are no longer in use by the program. Garbage collection does not traditionally manage limited resources other than memory that typical programs use, such as network sockets, database handles, user interaction windows, and file and device descriptors.
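Because the collector only reclaims memory, non-memory resources still have to be released explicitly. A minimal sketch of that, assuming Java 7 or later (try-with-resources); TrackedResource and ResourceDemo are hypothetical names used only for illustration, standing in for a socket, file or database handle:

```java
// Hypothetical resource used only for illustration; real code would wrap a
// socket, file descriptor or database handle.
class TrackedResource implements AutoCloseable {
    private boolean closed = false;

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        // Release the underlying resource here; the GC will not do it for us.
        closed = true;
    }
}

class ResourceDemo {
    // Uses the resource inside try-with-resources and returns it afterwards
    // so the caller can check that close() was invoked.
    static TrackedResource useAndRelease() {
        TrackedResource handle;
        try (TrackedResource r = new TrackedResource()) {
            handle = r;
            // ... work with the resource ...
        }
        return handle;
    }

    public static void main(String[] args) {
        System.out.println("closed = " + useAndRelease().isClosed());
    }
}
```

The resource is closed when the try block exits, whether or not the object itself has been garbage collected yet.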

History:
Garbage collection was invented by John McCarthy around 1959 to solve problems in Lisp.

Basic principle of Garbage Collection:
The basic principles of garbage collection are:
    Find data objects in a program that cannot be accessed in the future
    Reclaim the resources used by those objects
It is interesting to see how objects without references are found. Java normally finds all the objects that are still referenced (reachable) and then treats the rest of the objects as unreferenced, which is in fact a very smart way of finding unreferenced Java objects.
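The idea of finding the referenced objects first and treating everything else as garbage can be sketched with a toy object graph. This is only a simplified model, not how the JVM is implemented; Node, MarkDemo and their methods are hypothetical names invented for the example:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy "object" that can hold references to other objects.
class Node {
    final String name;
    final List<Node> refs = new ArrayList<>();

    Node(String name) {
        this.name = name;
    }
}

class MarkDemo {
    // Mark step: collect every node reachable from the roots.
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (marked.add(n)) {          // first visit only
                pending.addAll(n.refs);
            }
        }
        return marked;
    }

    // Everything on the toy heap that was not marked is garbage.
    static List<String> garbage(List<Node> heap, List<Node> roots) {
        Set<Node> live = mark(roots);
        List<String> dead = new ArrayList<>();
        for (Node n : heap) {
            if (!live.contains(n)) {
                dead.add(n.name);
            }
        }
        return dead;
    }

    static List<String> demo() {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);   // a -> b, and a is a root
        c.refs.add(d);   // c and d reference each other,
        d.refs.add(c);   // but nothing reachable refers to them
        return garbage(Arrays.asList(a, b, c, d), Arrays.asList(a));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [c, d]
    }
}
```

Note that c and d still reference each other, yet both are found to be garbage because neither can be reached from a root.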

Types of Java Garbage Collectors/Garbage Collection Algorithms:
On J2SE 5.0 and above, programmers can normally choose among the following types of Java garbage collectors through JVM parameters.

1.) Serial Collector:
JVM Option Parameter: -XX:+UseSerialGC

2.) Throughput Collector or The Parallel Collector:
 JVM Option Parameter: -XX:+UseParallelGC
    Young Generation GC done in parallel threads
    Tenured Generation GC done serially, by a single thread.

Parallel Old Generation Collector:
 JVM Option Parameter: -XX:+UseParallelOldGC
    Certain phases of an ‘Old Generation’ collection can be performed in parallel, speeding up an old generation collection.

The Concurrent Low Pause Collector:
  JVM Option Parameter: -Xincgc or -XX:+UseConcMarkSweepGC
    The concurrent collector is used to collect the tenured generation and does most of the collection concurrently with the execution of the application. The application is paused for short periods during the collection.
    A parallel version of the young generation copying collector is used with the concurrent collector.
    The concurrent low pause collector is used if the option -XX:+UseConcMarkSweepGC is passed on the command line.
    -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
        Selects the Concurrent Mark Sweep collector.
        This collector may deliver better response time properties for the application (i.e., low application pause time).
        It is a parallel and mostly-concurrent collector and can be a good match for the threading ability of large multi-processor systems.

The incremental (sometimes called train) low pause collector:
JVM Option Parameter: -XX:+UseTrainGC
This collector has not changed since the J2SE Platform version 1.4.2 and is currently not under active development.
It will not be supported in future releases.

Note that -XX:+UseParallelGC should not be used with -XX:+UseConcMarkSweepGC.

Benefits of Garbage Collection:
Garbage collection frees the programmer from manually dealing with memory deallocation. As a result, certain categories of bugs are eliminated or substantially reduced:
 Dangling pointer bugs, which occur when a piece of memory is freed while there are still pointers to it, and one of those pointers is then used. By then the memory may have been re-assigned to another use, with unpredictable results.
    Double free bugs, which occur when the program tries to free a region of memory that has already been freed, and perhaps already been allocated again.
    Certain kinds of memory leaks, in which a program fails to free memory occupied by objects that will not be used again, leading, over time, to memory exhaustion.

Disadvantages of Garbage Collection:
    Garbage collection consumes computing resources in deciding what memory is to be freed, reconstructing facts that may have been known to the programmer, often leading to decreased or uneven performance.
    Interaction with memory hierarchy effects can make this overhead intolerable in circumstances that are hard to predict or to detect in routine testing.
    The moment when the garbage is actually collected can be unpredictable, resulting in stalls scattered throughout a session.
    Memory may leak despite the presence of a garbage collector, if references to unused objects are not themselves manually disposed of. This is described as a logical memory leak. The belief that garbage collection eliminates all leaks leads many programmers not to guard against creating such leaks.
    In virtual memory environments, it can be difficult for the garbage collector to notice when collection is needed, resulting in large amounts of accumulated garbage, a long, disruptive collection phase, and other programs’ data swapped out.
    Garbage collectors often exhibit poor locality (interacting badly with cache and virtual memory systems), occupy more address space than the program actually uses at any one time, and touch otherwise idle pages.
    Garbage collectors may cause thrashing, in which a program spends more time copying data between various grades of storage than performing useful work.
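
The logical memory leak mentioned in the list above can be illustrated with a small sketch. LeakDemo and its methods are hypothetical names; the point is only that objects held in a long-lived collection stay reachable, so the collector cannot reclaim them until the references are dropped:

```java
import java.util.ArrayList;
import java.util.List;

class LeakDemo {
    // Every payload added here stays strongly reachable through this list,
    // so the collector can never reclaim it, even if it is never read again.
    private final List<byte[]> history = new ArrayList<>();

    void handle(byte[] payload) {
        history.add(payload); // reference kept "just in case": the logical leak
    }

    int retained() {
        return history.size();
    }

    // Dropping the references makes the payloads eligible for collection.
    void release() {
        history.clear();
    }

    // Returns the number of retained payloads before and after release().
    static int[] demo() {
        LeakDemo d = new LeakDemo();
        for (int i = 0; i < 1000; i++) {
            d.handle(new byte[1024]);
        }
        int before = d.retained();
        d.release();
        return new int[] { before, d.retained() };
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println(r[0] + " retained before release, " + r[1] + " after");
    }
}
```

Clearing the list does not free the memory immediately; it only makes the payloads unreachable so that a later collection can reclaim them.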



What's the difference between the serial, parallel and CMS (Concurrent-Mark-Sweep) collectors?
First of all, let's look at which collectors operate on the young generation and which on the tenured (old) generation.
The following collectors operate on the young generation:
-XX:+UseSerialGC
-XX:+UseParallelGC
-XX:+UseParNewGC

The following collectors operate on the old generation:
-XX:+UseParallelOldGC
-XX:+UseConcMarkSweepGC
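
To check which collectors a running JVM actually uses, and which memory pools each one manages, the standard java.lang.management API can be queried. A minimal sketch (CollectorInfo is a hypothetical class name; the collector and pool names printed depend on the JVM version and the flags passed):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CollectorInfo {
    // Builds one line per collector: its name and the memory pools it manages.
    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + " -> " + Arrays.toString(gc.getMemoryPoolNames()));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describe()) {
            System.out.println(line);
        }
    }
}
```

Running this with different flags from the lists above should show different collector names for the young and old generations.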

What's the difference between the Serial and the Parallel collector?
Both the serial and parallel collectors cause a stop-the-world pause during the GC.
So what's the difference between them?
A serial collector is a default copying collector which uses only one GC thread for the GC operation, while a parallel collector uses multiple GC threads for the GC operation.

What's the difference between the Parallel and the CMS collector?
The CMS performs the following steps (all performed by only one GC thread):
- initial mark
- concurrent marking
- remark
- concurrent sweeping
There are two differences between the parallel and the CMS collectors:
1) The parallel collector uses multiple GC threads, while the CMS uses only one.
2) The parallel collector is a 'stop-the-world' collector, while the CMS stops the world only during the initial mark and remark phases.
During the concurrent marking and sweeping phases, the CMS thread runs along with the application's threads.
If you wish to combine both parallelism and concurrency in your GC, you can use the following:
-XX:+UseParNewGC for the new generation (multiple GC threads)
-XX:+UseConcMarkSweepGC for the old generation (one GC thread, freezes the JVM only during the initial mark and remark phases)

Selecting a Garbage Collector
 If the application has a small data set (up to approximately 100MB), then
        select the serial collector with -XX:+UseSerialGC.

    If the application will be run on a single processor and there are no pause time requirements, then
        let the VM select the collector, or
        select the serial collector with -XX:+UseSerialGC.

    If (a) peak application performance is the first priority and (b) there are no pause time requirements or pauses of one second or longer are acceptable, then
        let the VM select the collector, or
        select the parallel collector with -XX:+UseParallelGC and (optionally) enable parallel compaction with -XX:+UseParallelOldGC.

    If response time is more important than overall throughput and garbage collection pauses must be kept shorter than approximately one second, then
        select the concurrent collector with -XX:+UseConcMarkSweepGC. If only one or two processors are available, consider using incremental mode.
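
The guidelines above can be written down as a small decision helper. This is only a sketch of the rules as listed, not an official tool; CollectorAdvisor, suggest and its parameters are hypothetical names, and an empty result stands for "let the VM select the collector":

```java
class CollectorAdvisor {
    // dataSetMb            approximate size of the application's data set
    // processors           number of processors available to the JVM
    // needsSubSecondPauses true if GC pauses must stay below roughly one second
    // throughputFirst      true if peak application performance is the first priority
    static String suggest(int dataSetMb, int processors,
                          boolean needsSubSecondPauses, boolean throughputFirst) {
        if (dataSetMb <= 100) {
            return "-XX:+UseSerialGC";          // small data set
        }
        if (processors == 1 && !needsSubSecondPauses) {
            return "-XX:+UseSerialGC";          // single processor, no pause requirement
        }
        if (throughputFirst && !needsSubSecondPauses) {
            return "-XX:+UseParallelGC";        // throughput matters most
        }
        if (!throughputFirst && needsSubSecondPauses) {
            return "-XX:+UseConcMarkSweepGC";   // response time matters most
        }
        return "";                              // no rule applies: let the VM choose
    }

    public static void main(String[] args) {
        System.out.println(suggest(2048, 8, true, false)); // prints -XX:+UseConcMarkSweepGC
    }
}
```

Whatever the helper suggests, the choice should be confirmed by measuring the application with the GC logs enabled.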

Parameter to Enable Garbage Collection Logs
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
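
Besides the log flags above, collection counts and accumulated collection time can also be read from inside the application through the standard GarbageCollectorMXBean interface. A minimal sketch (GcStats is a hypothetical class name; the numbers printed vary from run to run):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcStats {
    // Sums the collection counts of all collectors. A collector may report -1
    // when the count is undefined, so non-positive values are skipped.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Create short-lived garbage so that a collection is likely (not guaranteed).
        for (int i = 0; i < 100_000; i++) {
            byte[] junk = new byte[1024];
        }
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", time(ms)=" + gc.getCollectionTime());
        }
        System.out.println("total collections: " + totalCollections());
    }
}
```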