
.NET Memory Internals

By Ajay Yadav, August 13, 2013

Abstract

This article examines how memory is accessed and managed in .NET. Memory falls into several categories: static data areas (SDA), registers, heaps, virtual memory and thread local storage (TLS). Static and global values are stored automatically in the SDA; registers hold data that requires quick and efficient access; and each TLS slot is a 32-bit value that holds thread-specific information. Heaps contain memory allocated at run time from the virtual memory of an application and are controlled by the heap manager. Virtual memory is memory that the programmer manipulates directly at run time. Managed applications can work directly with the stack, the managed heap and TLS. Other forms of memory, such as registers and virtual memory, are largely unavailable except through interoperability. This article shows how the CLR manages allocated type instances via the garbage collector (GC); as you perhaps know, a C# developer never directly de-allocates a managed object from memory. Finally, you'll learn how to interact with the garbage collector programmatically and how the Finalize() method releases an object's internal resources.

Memory Objects

Objects typically refer to an allocation (a particular slot in memory). Once we define a new class or structure type along with its members, we can allocate memory for any number of its objects using the C# new keyword. Note that new returns a reference to the object on the heap, not the actual object itself. When a reference variable is declared as a local variable in a method, the reference is stored on the stack, and we can invoke members of the object through it using the dot operator. The following sample shows an object being allocated in memory:


[c]
using System;

public class test
{
    public test() { }

    public test(string name)
    {
        this.PetName = name;
    }

    public string PetName { get; set; }

    public override string ToString()
    {
        return string.Format("Hi {0}", PetName);
    }
}

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Garbage Collection Demo");

        // Creates a new object of the test class on the managed heap.
        test obj = new test("ajay");

        // The dot operator (.) invokes members through the reference.
        Console.WriteLine(obj.ToString());
    }
}
[/c]

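Value types declared as local variables are typically allocated directly on the stack. A value type that is a field of a class, or that has been boxed, lives on the managed heap along with its container. Reference types such as class objects, strings and arrays are always placed on the managed heap.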
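The practical difference between value types and reference types shows up when a variable is copied. The following is a minimal sketch rather than code from the original sample; PointValue, PointRef and ValueVsReferenceDemo are hypothetical names introduced only for illustration.

```csharp
using System;

// Hypothetical types for illustration only.
struct PointValue { public int X; }   // value type
class PointRef { public int X; }      // reference type

static class ValueVsReferenceDemo
{
    // Copying a struct copies its data, so editing the copy leaves the original alone.
    public static int CopyStruct()
    {
        PointValue a = new PointValue { X = 1 };
        PointValue b = a;   // b is an independent copy of a
        b.X = 99;
        return a.X;         // still 1
    }

    // Copying a class variable copies only the reference; both names see one heap object.
    public static int CopyReference()
    {
        PointRef a = new PointRef { X = 1 };
        PointRef b = a;     // b refers to the same object as a
        b.X = 99;
        return a.X;         // now 99
    }

    static void Main()
    {
        Console.WriteLine("Struct original after editing the copy : {0}", CopyStruct());    // prints 1
        Console.WriteLine("Class original after editing the copy  : {0}", CopyReference()); // prints 99
    }
}
```

Only the class instance is tracked by the garbage collector here; the struct local simply disappears when its method returns.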

Once a class has been instantiated, the garbage collector destroys the object when it is no longer required. A commonly asked question is: how does the GC determine when an object is no longer needed? The garbage collector removes an object from the heap once it is unreachable by any part of your code. Consider the following code:

[c]
static void NewMethod()
{
    // This object may be destroyed when the method returns.
    test dummyObj = new test();
}
[/c]

The aforementioned NewMethod() creates an object of the test class that is never passed to or used outside the defining scope. Thus, once the method call completes, the dummyObj reference is no longer reachable and the object becomes eligible for garbage collection. You cannot, however, guarantee that the object will be reclaimed from memory immediately after the method call completes.
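One way to observe reachability without keeping an object alive is System.WeakReference. The sketch below is an illustration under assumptions: Tracked and ReachabilityDemo are hypothetical names, and the result after the collection depends on the build configuration and runtime (a debug build may extend the lifetime of locals), so no particular output is promised.

```csharp
using System;
using System.Runtime.CompilerServices;

class Tracked { }   // hypothetical stand-in for the article's test class

static class ReachabilityDemo
{
    // NoInlining keeps the only strong reference confined to this method's frame.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference CreateAndDrop()
    {
        Tracked dummyObj = new Tracked();
        // A weak reference lets us watch the object without keeping it reachable.
        return new WeakReference(dummyObj);
    }

    static void Main()
    {
        WeakReference weak = CreateAndDrop();
        Console.WriteLine("Alive before collection: {0}", weak.IsAlive);

        GC.Collect();
        GC.WaitForPendingFinalizers();

        // Usually False at this point, but the runtime gives no guarantee.
        Console.WriteLine("Alive after collection : {0}", weak.IsAlive);
    }
}
```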

If you compile that sample code and inspect the resulting assembly using ildasm.exe, you will find that when the C# compiler encounters the new keyword, it emits a CIL newobj instruction into the method implementation, as follows:

[plain]
.method private hidebysig static void NewMethod() cil managed
{
  .maxstack 1
  .locals init ([0] class memory.test dummyObj)
  IL_0000: nop
  IL_0001: newobj instance void memory.test::.ctor()
  IL_0006: stloc.0
  IL_0007: ret
} // end of method Program::NewMethod
[/plain]

The newobj instruction performs a couple of tasks. First, it calculates the total amount of memory required for the object to be allocated. Second, it examines the managed heap to ensure that sufficient space is available to host the object. Finally, it advances the next object pointer to point to the next available slot on the managed heap.

We can change the aforesaid NewMethod() code to drop the reference explicitly by assigning null, rather than waiting for the variable to go out of scope. Note that this does not destroy the object; it only removes the reference, and the GC still decides when the memory is reclaimed.

test dummyObj = new test();
dummyObj = null;

You can confirm this in the generated CIL code by looking for the null assignment: an ldnull opcode followed by stloc has been added, as follows:

[plain]
.maxstack 1
.locals init ([0] class memory.test dummyObj)
IL_0000: nop
IL_0001: newobj instance void memory.test::.ctor()
IL_0006: stloc.0
IL_0007: ldnull
IL_0008: stloc.0
IL_0009: ret
[/plain]

Object Generation

The idea behind object generations is very simple: the longer an object has existed on the heap, the more likely it is to stay there. For instance, the main class of a desktop application stays in memory until the application terminates. Each object on the heap belongs to one of the following generations, whose initial sizes are roughly 256 KB, 2 MB and 10 MB respectively.

  • Generation-0: Identifies a newly allocated object that has never been marked for collection.
  • Generation-1: Identifies an object that has survived one garbage collection.
  • Generation-2: Identifies an object that has survived more than one sweep of the GC.

The garbage collector investigates all generation 0 objects first. If sweeping these objects frees the required amount of memory, any surviving objects are promoted to generation 1 and the collection stops there.

If all generation 0 objects have been evaluated but additional memory is still required, generation 1 objects are then investigated for reachability and collected accordingly. Surviving generation 1 objects are promoted to generation 2. If the GC still requires additional memory, generation 2 objects are evaluated.
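Promotion between generations can be watched with GC.GetGeneration(). The following is a small sketch, not part of the original sample; GenerationDemo and TrackPromotion are hypothetical names. On a typical desktop runtime the sequence climbs from 0 toward GC.MaxGeneration, but the exact values are up to the runtime.

```csharp
using System;

static class GenerationDemo
{
    // Records the generation of a still-referenced object
    // before and after each of `rounds` forced collections.
    public static int[] TrackPromotion(int rounds)
    {
        object survivor = new object();
        int[] generations = new int[rounds + 1];
        generations[0] = GC.GetGeneration(survivor);

        for (int i = 1; i <= rounds; i++)
        {
            GC.Collect();   // full, forced collection
            generations[i] = GC.GetGeneration(survivor);
        }

        // Make sure the object counts as referenced until this point.
        GC.KeepAlive(survivor);
        return generations;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(" -> ", TrackPromotion(3)));
    }
}
```

An object never moves beyond GC.MaxGeneration, so further collections leave a long-lived object where it is.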

Garbage Collection

The beauty of C# programming is that the programmer doesn't have to bother with memory management. The garbage collector in particular deals with cleaning up memory resources. Ideally, when the GC runs, it removes from the heap all objects that are no longer referenced. Immediately afterwards, the heap is compacted to recover the free space.

In the managed environment, garbage collection is nondeterministic. Generally, the garbage collector runs when the .NET runtime determines that a collection is required. You can force the GC to run at a certain point in your code by calling the GC.Collect() method, which is intended for rare situations in which you know it's a good time to collect. However, the logic of the GC does not guarantee that all unreferenced objects will be removed from the heap in a single execution.

The System.GC class in the base class libraries lets you interact with the garbage collector programmatically: we can both manipulate and monitor garbage collection with it. You will seldom need these methods; typically, you use them when you are creating a class that makes internal use of unmanaged resources. The following table lists some of the interesting members of System.GC:

System.GC member | Description
Collect() | Forces the GC to perform a garbage collection.
CollectionCount() | Returns the number of GC cycles that have occurred for the specified generation.
GetTotalMemory() | Returns the estimated number of bytes currently allocated on the managed heap.
SuppressFinalize() | Suppresses future finalization of the specified object.
MaxGeneration (property) | Returns the maximum generation number supported by the system.
GetGeneration() | Returns the generation of the specified object.
AddMemoryPressure() | Informs the runtime of a large allocation of unmanaged memory that should be taken into account when scheduling collections.
RemoveMemoryPressure() | Informs the runtime that unmanaged memory previously reported has been released.
KeepAlive() | Keeps the specified object alive from the beginning of the current method to the point where KeepAlive() is called.

The following code uses various members of the System.GC class to obtain garbage collection details. It reports the estimated allocated space using GetTotalMemory(), the number of generations, the generation of an object before and after a collection, and how many times each generation has been collected.

[c]
static void Main(string[] args)
{
    Console.WriteLine("Estimated Bytes on Heap : {0}", GC.GetTotalMemory(false));
    Console.WriteLine("Object Generation by OS : {0}", GC.MaxGeneration + 1);

    // Creates a new object of the test class
    test obj = new test("ajayn");
    Console.WriteLine(obj.ToString());

    Console.WriteLine("Generation of 'obj' is : {0}", GC.GetGeneration(obj));

    GC.Collect(0, GCCollectionMode.Forced);
    GC.WaitForPendingFinalizers();

    Console.WriteLine("Generation of 'obj' is : {0}", GC.GetGeneration(obj));

    Console.WriteLine("Gen 0 has swept : {0}", GC.CollectionCount(0));
    Console.WriteLine("Gen 1 has swept : {0}", GC.CollectionCount(1));
    Console.WriteLine("Gen 2 has swept : {0}", GC.CollectionCount(2));
}
[/c]

In very rare circumstances, it may be beneficial to force a garbage collection programmatically using the GC.Collect() method. You can explicitly trigger a collection using this code:

GC.Collect(0, GCCollectionMode.Forced);

GC.WaitForPendingFinalizers();

Here the WaitForPendingFinalizers() method suspends the calling thread until all pending finalizers have run. The sample then reports how many times each generation of the managed heap has been swept.
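Because CollectionCount() returns a running total for the whole process, it is often more telling to compare the counter before and after a forced collection. The sketch below assumes a hypothetical helper named CountForcedCollections; it is one possible way to measure this, not the article's own code.

```csharp
using System;

static class CollectionCountDemo
{
    // Returns how many collections of the given generation happened
    // while we forced one collection of that generation.
    public static int CountForcedCollections(int generation)
    {
        int before = GC.CollectionCount(generation);

        GC.Collect(generation, GCCollectionMode.Forced);
        GC.WaitForPendingFinalizers();

        return GC.CollectionCount(generation) - before;
    }

    static void Main()
    {
        for (int gen = 0; gen <= GC.MaxGeneration; gen++)
        {
            Console.WriteLine("Gen {0} collections during forced sweep : {1}",
                gen, CountForcedCollections(gen));
        }
    }
}
```

The difference is at least 1 for the forced generation, and it can be higher if the runtime happens to trigger a collection of its own in between.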


Finalize Method

The Finalize() method performs any necessary cleanup of resources. It's not possible to call an object's Finalize() method explicitly from a class instance via the dot operator, because the method is defined as protected in the Object class. Finalize() is called during a natural garbage collection or when we force a collection programmatically via GC.Collect(). It is also called automatically when the application domain hosting your application is unloaded from memory. As you know, an application domain hosts an executable assembly and its associated libraries. When the application domain is unloaded, the CLR finalizes every finalizable object created during its lifetime.

In C#, we supply a Finalize() implementation not by overriding System.Object.Finalize() directly but through destructor syntax, which the compiler turns into a Finalize() override. The syntax is as follows:

[c]
public class test
{
    ~test()
    {
        // Resource clean-up
        Console.Beep();
    }
}
[/c]

The important thing to note here is that we can't anticipate when the beep will sound, because executing Finalize() is a nondeterministic process. If you examine this C# destructor using the ildasm utility, you will notice that the compiler wraps the body in a try/finally block, which ensures that the base class's Finalize() method is always executed, regardless of any exception encountered.

[plain]
.method family hidebysig virtual instance void Finalize() cil managed
{
  // Code size 20 (0x14)
  .maxstack 1
  .try
  {
    IL_0000: nop
    IL_0001: call void [mscorlib]System.Console::Beep()
    IL_0006: nop
    IL_0007: nop
    IL_0008: leave.s IL_0012
  } // end .try
  finally
  {
    IL_000a: ldarg.0
    IL_000b: call instance void [mscorlib]System.Object::Finalize()
    IL_0010: nop
    IL_0011: endfinally
  } // end handler
  IL_0012: nop
  IL_0013: ret
} // end of method test::Finalize
[/plain]

When the C# compiler compiles a destructor, it implicitly translates the destructor code into the equivalent of a Finalize() method that also calls the Finalize() method of the parent class. The following example shows C# code that is conceptually equivalent to the IL the compiler generates for the ~test destructor (the C# compiler rejects a hand-written Finalize() override, so treat it as an illustration only):

[c]
protected override void Finalize()
{
    try
    {
        // destructor implementation
    }
    finally
    {
        base.Finalize();
    }
}
[/c]
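To see a finalizer being triggered by a forced collection, a counter is easier to observe than a beep. This is a hedged sketch: Finalizable and FinalizerDemo are hypothetical names, and since finalization is nondeterministic the count printed is usually 1 on a release build but is not guaranteed.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

class Finalizable
{
    public static int FinalizedCount;

    // Destructor syntax; the compiler turns this into a Finalize() override.
    ~Finalizable()
    {
        Interlocked.Increment(ref FinalizedCount);
    }
}

static class FinalizerDemo
{
    // Allocates an object and immediately drops the only reference to it.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Allocate()
    {
        new Finalizable();
    }

    public static int Run()
    {
        Allocate();
        GC.Collect();                   // queues unreachable finalizable objects
        GC.WaitForPendingFinalizers();  // blocks until the finalizer thread has run them
        return Finalizable.FinalizedCount;
    }

    static void Main()
    {
        Console.WriteLine("Finalizers run so far: {0}", Run());
    }
}
```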

IDisposable Interface

When defining a class, we often implement the IDisposable interface to free unmanaged resources. IDisposable provides a deterministic way to free resources and so avoids the timing problems of relying on the garbage collector. The IDisposable interface declares a single method named Dispose(), which accepts no parameters and returns void.

[c]
public class test : IDisposable
{
    public void Dispose()
    {
        // code
    }
}
[/c]

The implementation of Dispose() method should explicitly free all unmanaged resources used directly by an object and call Dispose() on any encapsulated objects that also implement the IDisposable interface. In this way, the Dispose() method provides exact control over when unmanaged resources are freed.

A Dispose() method is responsible not only for releasing the type's own resources but also for calling Dispose() on any disposable objects it contains. We can also call GC.SuppressFinalize() inside Dispose(), which informs the CLR that it no longer needs to call the destructor when this object is garbage-collected.

[c]
using System;

namespace memory
{
    public class test : IDisposable
    {
        public void Dispose()
        {
            // clean up resources
            GC.SuppressFinalize(this);
            Console.WriteLine("Dispose method is calling");
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            test obj = new test();

            if (obj is IDisposable)
            {
                obj.Dispose();
            }

            Console.ReadLine();
        }
    }
}
[/c]
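When a class owns unmanaged resources, Dispose() is commonly paired with a destructor as a safety net, so that cleanup still happens if the caller forgets to dispose. The following sketch shows that widely used arrangement under assumptions: ResourceHolder and IsDisposed are hypothetical names, and the comments mark where real cleanup code would go.

```csharp
using System;

// Hypothetical resource holder combining Dispose() with a destructor fallback.
public class ResourceHolder : IDisposable
{
    private bool disposed;

    public bool IsDisposed
    {
        get { return disposed; }
    }

    // Deterministic cleanup requested by the caller.
    public void Dispose()
    {
        Dispose(true);
        // Cleanup is done, so the destructor no longer needs to run.
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (disposed) return;   // makes repeated Dispose() calls harmless

        if (disposing)
        {
            // Called from Dispose(): release contained IDisposable objects here.
        }

        // Release unmanaged resources here (runs on both paths).
        disposed = true;
    }

    // Fallback used only if the caller never called Dispose().
    ~ResourceHolder()
    {
        Dispose(false);
    }
}

static class DisposePatternDemo
{
    static void Main()
    {
        ResourceHolder holder = new ResourceHolder();
        holder.Dispose();
        holder.Dispose();   // second call is a no-op
        Console.WriteLine("Disposed: {0}", holder.IsDisposed);   // prints True
    }
}
```

The disposing flag matters because, on the finalizer path, other managed objects may already have been finalized and should not be touched.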

We can also release resources with the C# using keyword. When an object is wrapped in a using statement, Dispose() is called automatically at the end of the block, so there is no need to release resources explicitly:

[c]
using (test obj = new test())
{
    // use of obj
}
[/c]

If you look at the generated CIL code, you'll find that the using syntax does indeed expand to try/finally logic, with the expected call to the Dispose() method:

[plain]
.method private hidebysig static void Main(string[] args) cil managed
{
  .entrypoint
  // Code size 35 (0x23)
  .maxstack 2
  .locals init ([0] class memory.test obj,
           [1] bool CS$4$0000)
  IL_0000: nop
  IL_0001: newobj instance void memory.test::.ctor()
  IL_0006: stloc.0
  .try
  {
    IL_0007: nop
    IL_0008: nop
    IL_0009: leave.s IL_001b
  } // end .try
  finally
  {
    IL_000b: ldloc.0
    IL_000c: ldnull
    IL_000d: ceq
    IL_000f: stloc.1
    IL_0010: ldloc.1
    IL_0011: brtrue.s IL_001a
    IL_0013: ldloc.0
    IL_0014: callvirt instance void [mscorlib]System.IDisposable::Dispose()
    IL_0019: nop
    IL_001a: endfinally
  } // end handler
  IL_001b: nop
  IL_001c: call string [mscorlib]System.Console::ReadLine()
  IL_0021: pop
  IL_0022: ret
}
[/plain]
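The same expansion can be written by hand in C#. The sketch below compares a using statement with a manual try/finally that mirrors the null check seen in the IL; Probe and UsingDemo are hypothetical names introduced for the comparison.

```csharp
using System;

// Hypothetical disposable type that records whether Dispose() ran.
class Probe : IDisposable
{
    public bool Disposed;

    public void Dispose()
    {
        Disposed = true;
    }
}

static class UsingDemo
{
    // The using statement disposes the object even when the body throws.
    public static bool DisposedDespiteException()
    {
        Probe probe = new Probe();
        try
        {
            using (probe)
            {
                throw new InvalidOperationException("simulated failure");
            }
        }
        catch (InvalidOperationException)
        {
            // Swallowed on purpose so the result can be inspected.
        }
        return probe.Disposed;
    }

    // Hand-written equivalent of what the compiler generates for using.
    public static bool ManualExpansion()
    {
        Probe probe = new Probe();
        try
        {
            // use of probe
        }
        finally
        {
            if (probe != null)
            {
                ((IDisposable)probe).Dispose();
            }
        }
        return probe.Disposed;
    }

    static void Main()
    {
        Console.WriteLine("using + exception : {0}", DisposedDespiteException());   // prints True
        Console.WriteLine("manual try/finally: {0}", ManualExpansion());            // prints True
    }
}
```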

Summary

This article walked through the process of garbage collection and object cleanup in detail. As you have seen, the GC normally runs when the runtime is unable to acquire the required memory from the managed heap. The article also illustrated how to interact with the garbage collector programmatically using the System.GC class, and covered the IDisposable interface and the Finalize() method as ways to clean up resources.

Ajay Yadav

Ajay Yadav is an author, Cyber Security Specialist, SME, Software Engineer, and System Programmer with more than eight years of work experience. He earned a Master and Bachelor Degree in Computer Science, along with abundant premier professional certifications. For several years, he has been researching Reverse Engineering, Secure Source Coding, Advance Software Debugging, Vulnerability Assessment, System Programming and Exploit Development.

He is a regular contributor to programming journals and assists the developer community with blogs, research articles, tutorials, training material and books on sophisticated technology. His spare-time activities include tourism, movies and meditation. He can be reached at om.ajay007[at]gmail[dot]com