Automated (simple) deobfuscation of .NET malware

upsidedwn
10 min read · Jul 14, 2023


Obfuscation can make .NET malware tedious to analyze by hand, even though .NET binaries are easy to decompile. In this article, we will explore how to automate the deobfuscation of a SolarMarker malware sample using the de4dot and MSIL tools.

A look at the sample

The SolarMarker dropper we received is a simple form of malware that primarily acts as a downloader. It receives commands and downloads additional payloads from a command and control (C2) server. Here’s what the obfuscated sample looks like.

In the methods listing on the left, we notice that this sample contains methods that return strings and do not take any arguments. These methods are used for string obfuscation in the binary.

The right shows the decompiled code of the binary’s entry point, which reveals junk Math and Thread statements. Additionally, the original names of classes, methods, and variables have been replaced with gibberish. During analysis, these need to be renamed by hand to make the binary more understandable.

String deobfuscation with de4dot

In the SolarMarker sample, string obfuscation is achieved by transforming strings into arrays of numbers and performing numerical manipulation on them. For example:

private static string A()
{
    string arg = "";
    int num = 47540;
    arg += (char)(num + -47420);
    int num2 = 98419;
    arg += (char)(num2 + -98365);
    int num3 = 73724;
    return arg + (char)(num3 + -73672);
}
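The arithmetic can be checked by hand: 47540 - 47420 = 120 ('x'), 98419 - 98365 = 54 ('6'), and 73724 - 73672 = 52 ('4'). The sketch below wraps the same method body in a small console program to confirm the result. The class name StringDecodeDemo is only an illustrative name and does not come from the sample.

```csharp
using System;

public static class StringDecodeDemo
{
    // Same body as the obfuscated method above, made public for the demo.
    public static string A()
    {
        string arg = "";
        int num = 47540;
        arg += (char)(num + -47420);        // 120 -> 'x'
        int num2 = 98419;
        arg += (char)(num2 + -98365);       // 54  -> '6'
        int num3 = 73724;
        return arg + (char)(num3 + -73672); // 52  -> '4'
    }

    public static void Main()
    {
        Console.WriteLine(A()); // prints "x64"
    }
}
```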

Simply running the function will generate the deobfuscated string. However, de4dot has some limitations out of the box that prevent it from deobfuscating such a sample. Running de4dot without any special options yields no results. Digging into the code below, we see that if the user does not specify any string decrypter methods, de4dot attempts to find them on its own. If it cannot identify a known obfuscator, however, it falls back to the Unknown obfuscator class, which implements an empty method, so de4dot does not actually deobfuscate any strings for us.

Instead, the user has to supply the --strtok option to tell de4dot what the string decrypter methods look like. The help text reads: --strtok METHOD String decrypter method token or [type::][name][(args,...)]. Going by that, the following options should work: --strtyp delegate --strtok "()" -o solarmarker-deobfuscated. However, this does not work either. Delving into de4dot’s code, we see that this is due to a bug in FindMethodTokens.

// From de4dot.code.ObfuscatedFile
IEnumerable<int> GetMethodTokens() {
    // Checks if the user specified any particular string decrypter methods
    if (!userStringDecrypterMethods) {
        // If not, use the deobfuscator's heuristics to identify string decrypter methods.
        // However, if de4dot does not detect any known obfuscator in use, this method
        // does not do anything.
        return deob.GetStringDecrypterMethods();
    }

    // Here, we see that the user can either specify methods using their method tokens,
    // or using a string pattern.
    var tokens = new List<int>();
    foreach (var val in options.StringDecrypterMethods) {
        var tokenStr = val.Trim();
        // User specifies specific method tokens
        if (Utils.StartsWith(tokenStr, "0x", StringComparison.OrdinalIgnoreCase))
            tokenStr = tokenStr.Substring(2);
        if (int.TryParse(tokenStr, NumberStyles.HexNumber, null, out int methodToken))
            tokens.Add(methodToken);
        else
            // User specifies a string pattern describing string decrypter methods,
            // search for these methods.
            tokens.AddRange(FindMethodTokens(val));
    }
    return tokens;
}

IEnumerable<int> FindMethodTokens(string methodDesc) {
    var tokens = new List<int>();
    SplitMethodDesc(methodDesc, out string typeString, out string methodName, out var argsStrings);

    foreach (var type in module.GetTypes()) {
        // Exact match on type name, if provided
        if (typeString != null && typeString != type.FullName)
            continue;
        foreach (var method in type.Methods) {
            // Must be a static method
            if (!method.IsStatic)
                continue;
            // Must return a string
            if (method.MethodSig.GetRetType().GetElementType() != ElementType.String && method.MethodSig.GetRetType().GetElementType() != ElementType.Object)
                continue;
            // Exact match on method name, if provided
            if (methodName != null && methodName != method.Name)
                continue;

            var sig = method.MethodSig;
            if (argsStrings == null) {
                if (sig.Params.Count == 0)
                    continue;
            }
            else {
                // Unfortunately, this does not work. This is because, SplitMethodDesc,
                // internally splits an empty string, as we did not provide any arguments.
                // This will result in a String array of [''], which means this check will
                // resolve to true, and skip our intended function.
                if (argsStrings.Length != sig.Params.Count)
                    continue;
                for (int i = 0; i < argsStrings.Length; i++) {
                    if (argsStrings[i] != sig.Params[i].FullName)
                        continue;
                }
            }

            Logger.v("Adding string decrypter; token: {0:X8}, method: {1}", method.MDToken.ToInt32(), Utils.RemoveNewlines(method.FullName));
            tokens.Add(method.MDToken.ToInt32());
        }
    }

    return tokens;
}

In the code above, if "()" is provided as the argument pattern, argsStrings will contain an array of one empty string, [""]. The length check then fails for every parameterless method, so none of them is ever added as a string decrypter. This bug can be resolved by adding the following lines to FindMethodTokens:

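using System.Linq;
// ...

IEnumerable<int> FindMethodTokens(string methodDesc) {
    var tokens = new List<int>();
    SplitMethodDesc(methodDesc, out string typeString, out string methodName, out var argsStrings);
    // Drop the empty entry produced by "()". argsStrings stays null when no
    // argument list was given at all, hence the null-conditional operator.
    argsStrings = argsStrings?.Where(x => !string.IsNullOrEmpty(x)).ToArray();
    // ...
}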
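The root cause is easy to reproduce in isolation. The sketch below assumes the argument list is split on commas, which is what the empty-string behaviour described above suggests. SplitArgs and FilterArgs are hypothetical stand-ins for illustration, not de4dot functions.

```csharp
using System;
using System.Linq;

public static class SplitDemo
{
    // Hypothetical stand-in for the argument parsing: takes the text
    // between the parentheses and splits it on commas.
    public static string[] SplitArgs(string insideParens)
    {
        return insideParens.Split(',');
    }

    // The filter from the patch: drop empty entries, pass null through.
    public static string[] FilterArgs(string[] args)
    {
        return args?.Where(x => !string.IsNullOrEmpty(x)).ToArray();
    }

    public static void Main()
    {
        string[] raw = SplitArgs("");        // what "()" produces
        string[] filtered = FilterArgs(raw);
        Console.WriteLine(raw.Length);       // 1: a single empty string
        Console.WriteLine(filtered.Length);  // 0: matches a parameterless method
    }
}
```

Splitting an empty string yields one empty element rather than an empty array, which is why the length comparison against a parameter count of zero never succeeds without the filter.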

And… great! de4dot inlined some strings, instantly making the decompilation more legible. Interestingly, this bug (if it is indeed not a feature) was also spotted a long time ago by LifeInHex here.

Results of de4dot deobfuscation before the patch
Results of de4dot deobfuscation after the patch

While de4dot inlines the strings, improving the readability of the decompiled code, it does not automatically remove the unused string methods or the junk Math and Thread code. These tasks need to be handled separately. Although these obfuscations do not impede analysis too much, let’s delve deeper into de4dot’s deobfuscation process to better understand how .NET binaries are deobfuscated.

String deobfuscation with a custom de4dot plugin

We start by creating a custom deobfuscator within de4dot, which will identify string obfuscation methods, inline their return values, and remove these methods from the binary. In de4dot, custom deobfuscators live in the de4dot.code/deobfuscators directory. Here, I created a directory, SolarMarker_Malware, for the custom deobfuscator, along with two files, Deobfuscator.cs and StringDecoderService.cs.

The Deobfuscator.cs file will mainly contain the metadata related to our custom deobfuscator plugin, as well as the logic on how to identify string obfuscation methods. A sample template for this file is found here.

The StringDecoderService.cs file will contain a class to help run the string obfuscation functions in a separate process, in a bid to improve the reliability of the deobfuscator.

To identify string obfuscation methods, we use de4dot’s interface and the dnlib library to parse the .NET metadata and sieve out methods that meet our criteria: static, no parameters, and a string return type. To be on the safe side, we also implemented additional checks that scan method bodies for patterns common among the string obfuscation methods.

// Check method body for patterns
private static bool IsStringObfuscationMethod(MethodDef method) {
    if (method.Body.Instructions.Count < 10) {
        return false;
    }

    // Check that one of the first two instructions is a Ldstr
    if (method.Body.Instructions[0].OpCode != dnlib.DotNet.Emit.OpCodes.Ldstr
        && method.Body.Instructions[1].OpCode != dnlib.DotNet.Emit.OpCodes.Ldstr) {
        return false;
    }

    foreach (var instruction in method.Body.Instructions) {
        if (instruction.OpCode == dnlib.DotNet.Emit.OpCodes.Call) {
            var calledMethod = instruction.Operand as MemberRef;
            if (calledMethod == null) {
                return false;
            }
            if (calledMethod.DeclaringType.FullName != "System.String" || calledMethod.Name != "Concat") {
                return false;
            }
        }
    }
    return true;
}

private static List<MethodDef> GetSolarMakerStringObfuscationFunctions(ModuleDefMD module) {
    List<MethodDef> obfuscatedStringMethods = new List<MethodDef>();
    foreach (TypeDef type in module.GetTypes()) {
        foreach (MethodDef method in type.Methods) {
            // Only private static methods with a body are candidates
            if (!method.IsPrivate || !method.IsStatic || !method.HasBody) {
                continue;
            }
            // Check the signature first; without one there is no return type to inspect
            var methodSig = method.MethodSig;
            if (methodSig == null) {
                continue;
            }
            // Must take no parameters
            if (methodSig.Params.Count != 0) {
                continue;
            }
            // Must return a string
            if (method.ReturnType.FullName != "System.String") {
                continue;
            }
            // The first local must exist and be the string being built up
            if (method.Body.Variables.Count == 0
                || method.Body.Variables[0].Type.FullName != "System.String") {
                continue;
            }
            if (!IsStringObfuscationMethod(method)) {
                continue;
            }
            obfuscatedStringMethods.Add(method);
        }
    }
    return obfuscatedStringMethods;
}

The next step, after identifying the string obfuscation methods, is to actually call them to retrieve the deobfuscated strings. Reflection can be used to load and execute the methods, but de4dot provides an interface, IUserGenericService, to run these methods in a remote process. This is probably to improve the stability of the deobfuscator, as malicious code can potentially be run.
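For comparison, the plain in-process reflection approach might look like the sketch below. It is not what the plugin does (the plugin goes through de4dot's remote service), and FakeObfuscated is a harmless stand-in class written for this example rather than anything from the sample. ReflectionDecoder and DecodeAll are likewise illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Harmless stand-in for an obfuscated class; not taken from the sample.
public static class FakeObfuscated
{
    private static string A()
    {
        string arg = "";
        arg += (char)(47540 + -47420);
        arg += (char)(98419 + -98365);
        return arg + (char)(73724 + -73672);
    }

    private static int NotAString() { return 1; }          // wrong return type
    private static string TakesArg(int x) { return "n"; }  // has a parameter
}

public static class ReflectionDecoder
{
    // Invoke every private static, parameterless, string-returning method
    // declared on the given type and collect the results by method name.
    public static Dictionary<string, string> DecodeAll(Type type)
    {
        var results = new Dictionary<string, string>();
        var flags = BindingFlags.Static | BindingFlags.NonPublic | BindingFlags.DeclaredOnly;
        foreach (MethodInfo method in type.GetMethods(flags))
        {
            if (method.ReturnType != typeof(string)) continue;
            if (method.GetParameters().Length != 0) continue;
            results[method.Name] = (string)method.Invoke(null, null);
        }
        return results;
    }

    public static void Main()
    {
        foreach (var pair in DecodeAll(typeof(FakeObfuscated)))
            Console.WriteLine(pair.Key + " -> " + pair.Value);
    }
}
```

Against a real sample, the types would come from an assembly loaded at runtime, which means the sample's code executes inside the analysis process. That is only reasonable inside an isolated analysis VM, and it is the kind of risk the remote-process service helps contain.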

Lastly, the identified methods can be passed to the AddMethodToBeRemoved method provided by de4dot, which removes them from the deobfuscated binary.

The resultant code is available here.

Removing junk code with a custom de4dot plugin

The last thing we need to do is remove the junk Math and Thread statements. de4dot does not provide a higher-level abstraction for .NET IL analysis. Instead, the developer of the custom plugin manipulates the IL directly, removing and adding instructions by hand. Hence, it is important to exercise caution when modifying the binary, as any incorrect manipulation of the IL instructions may result in a non-functional or broken binary.

Another limitation I observed was that the deobfuscation interface provided by de4dot does not support multiple passes over the binary. Deobfuscation is performed on a per-method basis, which makes it difficult to implement transformations that require knowledge of multiple methods in the binary. For example, my first thought was to remove the obvious junk code, which would leave a bunch of empty methods, and then to remove these empty methods and all references to them in a second pass through the binary. Unfortunately, this is not possible without some large changes to de4dot.

Hence, in the custom deobfuscator plugin, we can iterate over the methods of the deobfuscated binary and analyze the IL instructions to identify and remove the junk code. For example, if we want to remove all calls to ThreadPool.QueueUserWorkItem, we can use the following approach:

  1. Iterate over the methods of the deobfuscated binary.
  2. For each method, iterate over the IL instructions.
  3. Identify the instructions that represent a call to ThreadPool.QueueUserWorkItem.
  4. Remove the corresponding instructions from the IL body.

private void RemoveThreadJunkCode(Blocks blocks) {
    // Remove the creation of thread below. Does not care about what
    // the target method does, assumes that it is junk code, which
    // could be easily WRONG. Some of these checks for opcodes are also
    // very brittle, and do not check for alternative instructions that
    // can perform the same actions.
    // ThreadPool.QueueUserWorkItem(delegate(object ...) {
    // ...
    // });
    // In IL:
    // 68 ldsfld class [mscorlib]System.Threading.WaitCallback a.a::'CS$<>9__CachedAnonymousMethodDelegate17'
    // 69 brtrue.s 75 (00D6) ldsfld class [mscorlib]System.Threading.WaitCallback a.a::'CS$<>9__CachedAnonymousMethodDelegate17'
    // 70 ldnull
    // 71 ldftn void a.a::'<vAW9mPPYiZletM0pnQX>b__b'(object)
    // 72 newobj instance void [mscorlib]System.Threading.WaitCallback::.ctor(object, native int)
    // 73 stsfld class [mscorlib]System.Threading.WaitCallback a.a::'CS$<>9__CachedAnonymousMethodDelegate17'
    // 74 br.s 75 (00D6) ldsfld class [mscorlib]System.Threading.WaitCallback a.a::'CS$<>9__CachedAnonymousMethodDelegate17'
    // 75 ldsfld class [mscorlib]System.Threading.WaitCallback a.a::'CS$<>9__CachedAnonymousMethodDelegate17'
    // 76 call bool [mscorlib]System.Threading.ThreadPool::QueueUserWorkItem(class [mscorlib]System.Threading.WaitCallback)
    // 77 pop
    // In the IL, we see that .NET uses a caching mechanism. It first checks if
    // the anonymous method has been created before. If it has, it skips the creation and
    // proceeds directly to queue the method for execution.

    string methodName = blocks.Method.FullName;
    var allBlocks = blocks.MethodBlocks.GetAllBlocks();
    var instructionsBuffer = new List<Tuple<int, int, Instr> >();
    // ... insert instructions into our instructionsBuffer ...

    var junkInstructions = new List<System.Tuple<int, int> >();
    int insIdx = -1;
    foreach (var item in instructionsBuffer) {
        var blockNum = item.Item1;
        var blockInsNum = item.Item2;
        var ins = item.Item3;
        insIdx += 1;

        // The pattern needs four instructions on either side of the newobj,
        // so skip positions where that window would fall outside the buffer.
        if (insIdx >= 4 && insIdx + 4 < instructionsBuffer.Count
            && ins.OpCode.Code == Code.Newobj
            && ins.Operand is MemberRef targetMethod2
            && targetMethod2.IsMethodRef
            && targetMethod2.GetDeclaringTypeFullName() == "System.Threading.WaitCallback"
            && targetMethod2.Name == ".ctor"
        ) {
            var prevIns1 = instructionsBuffer[insIdx - 1].Item3;
            var prevIns2 = instructionsBuffer[insIdx - 2].Item3;
            var prevIns3 = instructionsBuffer[insIdx - 3].Item3;
            var prevIns4 = instructionsBuffer[insIdx - 4].Item3;

            var nextIns1 = instructionsBuffer[insIdx + 1].Item3;
            var nextIns2 = instructionsBuffer[insIdx + 2].Item3;
            var nextIns3 = instructionsBuffer[insIdx + 3].Item3;
            var nextIns4 = instructionsBuffer[insIdx + 4].Item3;

            if (prevIns4.OpCode.Code == Code.Ldsfld && prevIns4.Operand is FieldDef field4 && field4.FieldType.ToString() == "System.Threading.WaitCallback"
                && prevIns3.OpCode.Code == Code.Brtrue_S
                && prevIns2.OpCode.Code == Code.Ldnull
                && prevIns1.OpCode.Code == Code.Ldftn
                && nextIns3.OpCode.Code == Code.Call && nextIns3.Operand is MemberRef nmember3 && nmember3.IsMethodRef && nmember3.GetDeclaringTypeFullName() == "System.Threading.ThreadPool" && nmember3.Name == "QueueUserWorkItem"
                && nextIns4.OpCode.Code == Code.Pop
                // Check that store and loads reference the same slot
                && nextIns1.OpCode.Code == Code.Stsfld && nextIns1.Operand is FieldDef nfield1 && nfield1 == field4
                && nextIns2.OpCode.Code == Code.Ldsfld && nextIns2.Operand is FieldDef nfield2 && nfield2 == field4) {
                for (int i = -4; i <= 4; i++) {
                    junkInstructions.Add(Tuple.Create(instructionsBuffer[insIdx + i].Item1, instructionsBuffer[insIdx + i].Item2));
                }
            }
        }
    }

    // Neutralize the junk instructions by overwriting each one with a nop in its block
    foreach (var item in junkInstructions) {
        var blockNum = item.Item1;
        var blockInsNum = item.Item2;
        Logger.e($"Removing instruction: {allBlocks[blockNum].Instructions[blockInsNum]}");
        allBlocks[blockNum].Replace(blockInsNum, 1, OpCodes.Nop.ToInstruction());
    }
    return;
}
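The core of the routine above is a fixed-window match around an anchor instruction, followed by overwriting the matched window with nops. The sketch below models just that idea on plain opcode names, with no dnlib or de4dot types and no operand checks, so it is a simplification rather than a drop-in replacement. WindowMatchDemo and NopOutJunk are illustrative names.

```csharp
using System;
using System.Collections.Generic;

public static class WindowMatchDemo
{
    // Opcode names expected at offsets -4..+4 around the anchoring newobj.
    static readonly string[] Pattern =
    {
        "ldsfld", "brtrue.s", "ldnull", "ldftn", "newobj",
        "stsfld", "ldsfld", "call", "pop"
    };

    // Returns a copy of the opcode list with every full match replaced by nops.
    public static List<string> NopOutJunk(IReadOnlyList<string> opcodes)
    {
        var result = new List<string>(opcodes);
        for (int i = 0; i < opcodes.Count; i++)
        {
            // Guard first: the window [i - 4, i + 4] must lie inside the list.
            if (i < 4 || i + 4 >= opcodes.Count) continue;
            if (opcodes[i] != "newobj") continue;

            bool match = true;
            for (int k = -4; k <= 4 && match; k++)
                match = opcodes[i + k] == Pattern[k + 4];

            if (!match) continue;
            for (int k = -4; k <= 4; k++)
                result[i + k] = "nop";
        }
        return result;
    }

    public static void Main()
    {
        var il = new List<string>
        {
            "ldarg.0", "ldsfld", "brtrue.s", "ldnull", "ldftn", "newobj",
            "stsfld", "ldsfld", "call", "pop", "ret"
        };
        Console.WriteLine(string.Join(" ", NopOutJunk(il)));
    }
}
```

Checking the window bounds before indexing matters here: a newobj near the start or end of a method would otherwise index outside the buffer.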

And here’s the result from our automated deobfuscation. Notice that the junk code has been removed, and strings are now inlined! Also, down from tens of methods, we are left with only 6 to analyze in this class.

Conclusion

Automating the deobfuscation process for .NET malware can greatly simplify the analysis of obfuscated samples. By using tools like de4dot and custom deobfuscator plugins, we can automate the removal of obfuscation techniques and improve the readability of the decompiled code. This enables security researchers to more effectively analyze and understand the behavior of malware samples.

It is worth taking some time here to appreciate the developers and contributors behind tools such as de4dot and dnSpy, who have made significant efforts to open source and maintain them over the years, which has been invaluable to researchers. Unfortunately, these projects have been archived and are no longer actively maintained.

During my research, I came across the GitHub repository .NET-Deobfuscator, which curates a collection of various .NET deobfuscators. Some of these deobfuscators employ advanced techniques to handle more sophisticated obfuscation. Exploring these alternative tools can provide additional options and insights for handling more intricate obfuscated samples.

As the field of malware analysis continues to evolve, it’s crucial for researchers to stay updated with the latest tools, techniques, and community-driven efforts. By embracing automation and leveraging the collective knowledge of the community, analysts can effectively combat obfuscation and gain deeper insights into the inner workings of malicious software.
