I was recently asked how to investigate malware that has owned a particular Windows host. This post will address some of the elements, methods, and goals of such an analysis, but I will avoid any discussion of tools.
Elements of the whole
A holistic approach is needed, but you must understand each element to get the most value from it. These elements contain the evidence needed to gain insight into what happened. Not all of them are required, but the more you have, the more likely it is that you can validate the data set.
The first element is the network layer. This is your most trusted source of information: it is collected off the host, so malware on the compromised machine cannot tamper with it. Sources include NetFlow records (or any session data), full packet captures, proxy logs, and authentication logs (AD Domain Controllers, RADIUS, etc.). The ideal situation is a full packet capture that covers the initial infection and continues afterwards to track network activity. Barring encryption, this also confirms or rules out whether any information leaked.
The second element is the operating system layer. This consists of items such as full memory dumps or process lists, services, event logs, and application logs (antivirus, HIPS, IIS, SQL, etc.). All of this information may be suspect if the malware gained complete ownership of the host. Because of this integrity question, any operating system information should be validated against either the file system or the network layer. The operating system layer is much less obvious than the network layer and relies on a lot of small items that together build into a bigger picture.
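To make the validation idea concrete, here is a minimal sketch of checking the connections a host reports against sessions seen on the wire. The Session record, the unreported_sessions function, and the sample addresses are all hypothetical; the sketch assumes both data sets have already been exported and normalized into the same form, and it is not tied to any particular product.

```python
from typing import NamedTuple


class Session(NamedTuple):
    # Hypothetical normalized record; real exports need parsing first.
    local_port: int
    remote_ip: str
    remote_port: int


def unreported_sessions(network_sessions, host_sessions):
    """Return sessions seen on the network that the host never reported."""
    return sorted(set(network_sessions) - set(host_sessions))


# Made-up sample data using documentation address ranges.
network = [
    Session(49152, "203.0.113.7", 443),
    Session(49153, "198.51.100.9", 8080),
]
host = [
    Session(49152, "203.0.113.7", 443),
]

hidden = unreported_sessions(network, host)
for session in hidden:
    print("seen on the wire but not reported by the host:", session)
```

A session that shows up on the network but not in the host's own view is a hint that something on the host is hiding activity, and is worth following up in the file system layer.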
The final element is the file system layer. The file system should be inspected either from a bootable CD or by mounting it on another system. This prevents a rootkit on the infected host from interfering with what you see. Important starting points include the Internet cache, file modification and creation times, Prefetch files, and AV quarantine directories.
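As a rough illustration of using modification times as a starting point, the sketch below lists files under a mounted copy of the file system that were modified inside a window of interest. The function name and the example mount point are hypothetical, and it assumes the image is mounted read-only on the analysis system. Timestamps can be altered by malware, so treat the result as a lead to validate against the other layers, not as proof.

```python
import os
from datetime import datetime, timezone


def files_modified_between(root, start, end):
    """Return sorted (modified_time, path) pairs for files under root
    whose modification time falls within [start, end]."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat avoids following links out of the mounted image.
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # unreadable entry; skip it and keep going
            modified = datetime.fromtimestamp(mtime, tz=timezone.utc)
            if start <= modified <= end:
                hits.append((modified, path))
    return sorted(hits)


# Example call (hypothetical mount point and window):
# window_start = datetime(2024, 1, 10, tzinfo=timezone.utc)
# window_end = datetime(2024, 1, 11, tzinfo=timezone.utc)
# for when, path in files_modified_between("/mnt/evidence", window_start, window_end):
#     print(when.isoformat(), path)
```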
Keep the right mindset and pay attention to minute details throughout. A simple spyware infection could turn into a full-blown data loss fiasco. Collect as much evidence as you can, and do it quickly, but more importantly follow sound practice to prevent any spoliation. It will slow things down, but it will preserve any sort of case that may need to be developed.
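One common way to show later that collected evidence was not altered is to record a cryptographic hash of each item at collection time and compare it against any copy you work from. The sketch below uses only the Python standard library; the function names are hypothetical and the choice of SHA-256 is an assumption, not a requirement of any particular procedure.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large evidence files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_matches_original(original_path, copy_path):
    """Return True when the copy hashes to the same value as the original."""
    return sha256_of_file(original_path) == sha256_of_file(copy_path)
```

Recording the hash alongside who collected the item and when gives you something to point to if the integrity of the evidence is ever questioned.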
Once the data set is collected, a copy of it is used for analysis. This analysis typically ends up with lots of branches going out in different directions. Most of those branches will turn out to be dead ends. The ones that do not should all corroborate each other. If they don’t, you are missing something. A single piece of data is uninteresting if you don’t have at least one other source that validates it.
The goal in investigating such an intrusion is to discover precisely “what” the malware does. This will allow you to answer the more important questions of who, why, and how. The who will allow you to watch for future threats. The why will define the attack’s motive, which will clarify the potential and residual risk. The how will show where the weaknesses are and allow you to remediate them.