An Environment for Controlled Worm Replication and Analysis
or: Internet-inna-Box
Ian Whalley
Bill Arnold, David Chess,
John Morar, Alla Segal, Morton Swimmer
IBM TJ Watson Research Center, PO Box 704, Yorktown Heights, NY 10598, USA
Tel +1-914-784-7808 · Fax +1-914-784-6054 · Email inw@watson.ibm.com
So-called 'worms' have been a feature of the malware landscape since the beginning, and yet have been largely ignored by anti-virus companies until comparatively recently. However, the near-complete connectivity of computers in today's western world, coupled with the largely Win32-centric base of installed operating systems, makes the rise of worms inevitable.
The author will describe techniques and mechanisms for constructing and utilising an environment enabling the automatic examination of worms and network-aware viruses. Whilst these techniques are being developed for incorporation into the IBM/Symantec Immune System for Cyberspace, the paper is not intended to be a discussion of the Immune System concept. Instead, the intent is to describe an approach that has been applied to the problem with some measure of success.
The approach involves building a virtual SOHO network, which is in turn connected to a virtual Internet. Both the virtual LAN and WAN are populated with virtual machines. The suspected worm is introduced into this environment, and executed therein. The whole system is closely monitored as execution progresses in the isolated environment, and data is amassed describing what the suspected worm did as it executed. This data is then processed by the system in an attempt to automatically determine whether or not the suspect program is performing actions indicative of a worm or internet-aware malware.
As is now well-documented (see [1], and also the papers it references, perhaps most
notably [2] and [3]), it is possible to create a functioning and
practical system for automatically handling creation of detection and repair
instructions for a significant percentage of viruses. This system (the ‘Immune
System’) also handles distribution of the new detection/repair instructions to
the clients in a scalable and manageable way. This paper is concerned with a
subset of the process by which detection and repair instructions are produced.
Whilst it is true that the Immune
System performs well when it comes to producing detection and repair for the
types of threat that it understands, it is also true that, from time to time, it
needs to be ‘taught’ how to deal with new threat types. For example, at the time
the Immune System was originally conceived, the clear and present danger in the
area of viruses was from DOS file and boot infectors. Subsequently, the ability
to handle macro viruses was added to the system – the system is designed in a
modular fashion precisely to allow new components to be added without affecting
the functionality of other components.
Several new components are at
various stages of completion at the current time – this paper will touch briefly
on the Win32 virus replicator, but will concentrate on the more
interesting design and implementation challenges posed by a replication system
for worms. This author has dubbed the system ‘Internet-inna-Box’.[1]
In addition, this paper will concern
itself chiefly with the construction and deployment of a suitable environment
for worm replication – nonetheless, it will discuss mechanisms used to determine
what actions (if any) the program under test performed.
As with at least one of this
author’s previous papers [4], the question of defining the type of malware under
discussion is more vexed than one might have hoped. As with that previous work,
there follows a brief list of previously suggested definitions of ‘worm’, in the
context of malware.
[A worm is] a program that distributes
multiple copies of itself within a system or across a distributed
system. [5] and [6]
Programs which are able to replicate
themselves (usually across computer networks) as stand-alone programs (or sets
of programs) and which do not depend on the existence of a host program are
called computer worms. [7]
A worm is an independent program which, when run on a computer, will attempt to infect other computer systems […] In this case the host program is the operating system of the computer, and the infected code is a stand-alone process or thread of execution running under the operating system. [8]
The computer Worm is a program that is designed to copy itself from one computer to another, leveraging some network medium: email, TCP/IP, etc. The Worm is more interested in infecting as many machines as possible on the network, and less interested in spreading many copies of itself on a single computer (like a computer virus). The prototypical Worm infects (or causes code to run on) a target system only once; after the initial infection, the Worm attempts to spread to other machines on the network. [9]
Worms are another form of software that, like viruses, spread by making copies of themselves. Unlike a typical virus, a typical worm spreads as a single self-sufficient program, rather than injecting itself into other programs. While there is general agreement that a self-sufficient program that spreads itself across a network of connected computers without the need for human intervention properly counts as a worm, there is less consensus on which of these features is the most essential to wormhood [sic]. In popular culture, novels such as Neuromancer by William Gibson refer to worms by the term "virus", and the media has often used the words essentially interchangeably. There is currently no generally-accepted set of criteria for determining whether or not a given self-reproducing program is most properly called a "virus" or a "worm"; both words, however, refer to programs that spread, and that therefore pose similar problems in the area of security. [10]
Readers who desire a possible formal
definition of ‘worm’ are referred to [11] – however, math-phobes (the author included) may rest
assured that such mathematical treatments are not necessary to understand this
intensely practical paper!
The definition from [9] is, in many
ways, the most conceptually helpful. Whilst it is (inevitably) inexact in some
ways, and tends to anthropomorphise the malware it is discussing, it also
provides a clear and relatively concise definition. It does not explicitly
mention the fact that most experts consider worms to be programs that do not
infect other programs (in the generally parasitic manner of viruses), but this
is strongly implied. However, other definitions make up for this omission. This
author does not wish to spend time on suggesting another possible definition for
‘worm’ at this point, and will assume that readers understand the general
concept.
In order to understand the requirements of a worm replication system, it will be helpful to first examine (briefly!) the history of worms.
We will begin by looking at three very early worms – whilst these do not map directly onto the current problem, they are helpful in forming a historical perspective.
In the early 1980s, Shoch and Hupp [12] conducted experiments with worm-like programs to perform computations on idle hosts on an early Ethernet LAN. Interestingly enough, the worm programs they were using would occasionally get out of control and have to be killed, which proved harder than expected due to the worms’ ability to hop from host to host [7].
Another early worm was CHRISTMA EXEC, which was released in December 1987, and is well documented in [5], [7] and [9]. CHRISTMA EXEC required the user to consciously execute the infected email (which was written in REXX), and also required the most significantly afflicted company (IBM) to write what was probably the first ‘mail scanner’ in order to eradicate the worm from its systems.
In November 1988, the now famous RTM Internet worm was released – again, this worm has been well-documented in the past, and will not be covered in detail here. Interested readers are referred primarily to [13], and to [7] for a summary.
The worms described above (see Section 4.1) are atypical of modern worms – this is almost inevitable, given their age! In recent years, new forms of worms have appeared – worms concerned with consumer-level protocols and applications, and it is these worms with which Internet-inna-Box is concerned. A brief description of some of these worms follows – this list is not intended to be exhaustive; it merely provides a glance at how the modern worm landscape has evolved.
In December 1997, the first well-known ‘IRC worms’ appeared. These worms relied on mIRC (in later years, PIRCH has also become a popular target), and would spread themselves to the machines of people participating in the same IRC channels as an infected user.
Many of these worms permitted an attacker a primitive form of remote control over the infected PCs – the type of control later offered by so-called ‘Remote Access Trojans’ (RATs), but in a handy worm form.
In January 1999, the well-known virus writer known as Spanska released perhaps his best-known creation – the worm known both as Happy99 and as Ska [14].[2]
Happy99 worked by subverting the IP subsystem of Windows 95 such that it was able to recognise SMTP email being sent. It would then follow each message that the user sent with another message to the same recipient, containing a copy of itself. Even to this day, Happy99 is still widely reported – see [15] and [16].
Happy99 works best in consumer situations – because it hooks SMTP mail transmissions, it will not work in many corporate scenarios – non-SMTP-based corporate email systems will not be affected.
Melissa (which appeared at the end of March 1999) is another case of a piece of malware that is tricky to classify – even this author’s original article describing it [17] called it a virus. The truth of the matter is that Melissa has two methods of spread – either as a traditional macro virus (by infecting documents as they are opened in the local Word installation), or as an Outlook-enabled worm.
It is, of course, the second method of spreading that made Melissa infamous, and enables it to be considered as a worm within the context of this paper. The Melissa-type of malware has since been referred to as a ‘virus/worm hybrid’ [9].
Melissa’s worm component, as readers will recall, worked by sending the currently active document to the first 50 people in each and every Outlook address book – however, this payload was only performed once. The upshot of this was that the first document that the user received was remailed, which in turn meant that it was usually the ‘seed document’ (an alleged list of passwords for porn sites) that was remailed.
Melissa quickly became the most widespread virus/worm in history – at least, up to that point in time. Estimates vary, but oft-quoted numbers claim that Melissa infected around 1,000,000 computers [18] and caused ‘between $93 [million] and $385 million in actual damages’ [19]. Melissa was only widespread for a week – after that, incidents had dropped off very significantly.
ExplorerZip (or ExploreZip) [20] appeared in June 1999 – again (as with Melissa) it relied on the user manually executing an attachment on an email, but this did not hinder its rapid spread. Unlike Melissa, ExplorerZip was ‘binary’ malware (that is to say, it was not written in a macro language – in fact, it was written in Delphi). ExplorerZip spread both by copying itself to unprotected SMB shares (Server Message Block – that is to say, Windows shares) on the LAN (and attempting to install itself into the Windows start-up procedure on any remote Windows installations it could access in this way), and by sending itself (in the form of a reply) to the originators of messages sitting in the user’s Outlook inbox.[3]
ExplorerZip was very widespread both on the Internet and within corporations – groups with unprotected network shares were particularly vulnerable to its payload, which involved destroying (by truncation) the contents of files with certain extensions.
February 2000 saw the arrival of VBS/Kakworm [21], which built upon techniques demonstrated by VBS/BubbleBoy [22] – BubbleBoy was never widespread in the wild, but unfortunately Kakworm was (and is).
Kakworm used the now infamous Scriptlet.Typelib vulnerability [23] [24] to cause itself to execute as soon as an Outlook user viewed an infected email. Kakworm still causes problems to some mail scanning systems, due to the fact that the infective code is appended to outgoing HTML email as the signature.
In early May 2000, LoveLetter stole the coveted ‘most widespread malware in history’ crown from Melissa. LoveLetter [25] (although written in VBS) is a Melissa-style worm – it relies on the user to manually execute an email attachment, whereupon it infects the local machine, and sends itself to every address in every Outlook address book. Worse yet, next time it executes, it checks for newly added addresses in the address books, and is able to send itself to those.
LoveLetter is also IRC-aware, but this spreading mechanism is unimportant when compared to the mass-mailing functionality.
Section 4.2 teaches us several things about the way modern worms behave, and the way that the definitions of ‘worm’ and ‘virus’ are not (at least in any useful way) mutually exclusive. These lessons are useful in designing a practical worm replication system.
Worms can spread across Local-Area Networks in a variety of ways – worms that have been seen so far (some of which are described above) have used the following techniques:
· Unprotected network shares
· Protected network shares (via password guessing)
· Already-connected network shares (via drive letters)
· Corporate email systems
Whilst the primary technique for spreading from organisation to organisation (across the Internet) at the present time is email, a well-designed worm replication system will be able to detect attempts to spread via other mechanisms, and at least raise an alert for the attention of a human analyst.
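The alerting behaviour described above can be pictured as a small classifier over the connection attempts observed inside the isolated network. The following is a minimal sketch only, not the Immune System's implementation: the `Connection` record, the port table and the category names are assumptions introduced here for illustration, and a real system would need to inspect far more than a destination port.

```python
from dataclasses import dataclass
from typing import List

# Destination ports mapped to the spreading mechanism they suggest.
# This table is an illustrative assumption, not taken from the paper.
KNOWN_MECHANISMS = {
    25: "email (SMTP)",
    139: "network share (NetBIOS session)",
    445: "network share (SMB)",
    6667: "IRC",
}

EMAIL = "email (SMTP)"
UNKNOWN = "unknown"


@dataclass(frozen=True)
class Connection:
    """One outbound connection attempt seen on the virtual network (hypothetical record)."""
    source_host: str  # virtual machine that opened the connection
    dest_host: str    # host it tried to reach
    dest_port: int    # TCP destination port


def classify(conn: Connection) -> str:
    """Name the spreading mechanism suggested by the destination port."""
    return KNOWN_MECHANISMS.get(conn.dest_port, UNKNOWN)


def alerts_for_analyst(connections: List[Connection]) -> List[Connection]:
    """Return every attempt that is not plain email, for a human to review."""
    return [c for c in connections if classify(c) != EMAIL]


if __name__ == "__main__":
    observed = [
        Connection("vm-lan-1", "mail.virtual-isp", 25),
        Connection("vm-lan-1", "vm-lan-2", 139),
        Connection("vm-lan-1", "vm-wan-7", 31337),
    ]
    for conn in alerts_for_analyst(observed):
        print(f"ALERT: {conn.source_host} -> {conn.dest_host}:{conn.dest_port} ({classify(conn)})")
```

Treating email as the only mechanism handled without an alert mirrors the paragraph above; which mechanisms deserve automatic handling is a design decision that would change as new worms appear.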
In addition to the simple matter of spreading, malware has been seen which downloads components from remote Internet sites – the piece of the worm that initially arrives on the victim computer is a bootstrap system – in cases seen so far, this has had enough power to replicate. A good example is Babylonia [26].
Thus we come to the Internet-inna-Box system itself – in particular, this section considers the basic services that must be provided by a worm replication system in order that it can work correctly and adequately.
A functional worm replication system will have to appear to provide a number of Internet-style services from a wide variety of Internet hosts. In particular, the following services will be very relevant:
· HTTP (HyperText Transfer Protocol)
· FTP (File Transfer Protocol)
· IRC (Internet Relay Chat)
· DNS (Domain Name System)
· Drive sharing
· Packet routing
This section is intended to describe some of the more important areas in which Internet-inna-Box differs from other Analysis Centre components – namely, single-virtual-machine virus replicators.
To replicate worms, it is clear that a network is required. As suggested by the above definitions, a worm may well not exhibit any behaviour that can be considered ‘malicious’ with any degree of certainty in the absence of a network. Clearly, attempts to study worms (by execution) in an environment consisting of a single, non-networked, machine would be futile.
Immune System components (such as Internet-inna-Box) are expected to be able to run without any human intervention. The Immune System must run 24 hours a day, 7 days a week, 365 days a year, and consequently there can be no recourse to humans to reboot machines, shift network cables, reconfigure networking components, or install new operating systems. All such work must either be done ahead of time, or done automatically as the system executes. Whilst this requirement is basically the same as for a single-virtual-machine virus replicator, it is much harder to achieve with a complex virtual network of virtual machines.
Current Internet-inna-Box development efforts are concentrated on the various Win32 platforms, as this is where worms currently affect the computing population most significantly. In spite of this concentration of effort on Win32 platforms, the fundamental technology used in the Internet-inna-Box component is by no means restricted to Win32 (as will be shown later [see Section 9], non-Win32 operating systems provide critical services within the Internet-inna-Box environment).
As discussed in Section 5, a wide variety of network services (of both the LAN and WAN variety) must be provided in order to persuade a worm to replicate, or at the very least exhibit suspicious behaviour. These network services must be ‘faked out’ to some extent so that it is not necessary to provide access to the whole, real, Internet!
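As one illustration of what ‘faking out’ a service might involve, consider DNS: a worm that looks up an arbitrary Internet host name can be handed the address of a machine inside the virtual Internet, whatever name it asked for. The sketch below builds such a reply for a simple A-record query. It is a hedged example rather than a description of Internet-inna-Box itself: the function names, the 60-second TTL and the address 10.0.0.1 are assumptions chosen here, and only uncompressed, single-question queries of type A and class IN are handled.

```python
import socket
import struct

HEADER_LEN = 12  # fixed size of a DNS message header


def build_query(name: str, txid: int = 0x1234) -> bytes:
    """Build a minimal DNS A query; used here only to exercise the reply builder."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)  # RD set, one question
    labels = b"".join(bytes([len(p)]) + p.encode("ascii") for p in name.split("."))
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def build_wildcard_reply(query: bytes, answer_ip: str) -> bytes:
    """Answer any single-question A/IN query with answer_ip, regardless of the name."""
    if len(query) < HEADER_LEN:
        raise ValueError("query shorter than a DNS header")
    txid, flags, qdcount = struct.unpack("!HHH", query[:6])
    if qdcount != 1:
        raise ValueError("only single-question queries are handled")

    # Walk the length-prefixed labels to find the end of the question name.
    pos = HEADER_LEN
    while True:
        if pos >= len(query):
            raise ValueError("truncated question name")
        length = query[pos]
        if length == 0:
            break
        if length & 0xC0:
            raise ValueError("compressed names are not handled")
        pos += 1 + length
    end = pos + 1 + 4  # terminating zero byte, then QTYPE and QCLASS
    if end > len(query):
        raise ValueError("truncated question")
    qtype, qclass = struct.unpack("!HH", query[pos + 1:end])
    if (qtype, qclass) != (1, 1):
        raise ValueError("only A/IN questions are handled")

    question = query[HEADER_LEN:end]
    # Response flags: QR=1, RA=1, and the caller's RD bit copied back.
    reply_flags = 0x8080 | (flags & 0x0100)
    header = struct.pack("!HHHHHH", txid, reply_flags, 1, 1, 0, 0)
    # Answer: pointer to the name at offset 12, TYPE A, CLASS IN, TTL 60, 4-byte address.
    answer = struct.pack("!HHHIH", 0xC00C, 1, 1, 60, 4) + socket.inet_aton(answer_ip)
    return header + question + answer


if __name__ == "__main__":
    reply = build_wildcard_reply(build_query("www.example.com"), "10.0.0.1")
    print(socket.inet_ntoa(reply[-4:]))
```

A responder built on this idea would sit on the virtual network's DNS port and direct every looked-up name to a host that offers the other faked services, so no query need ever leave the isolated environment.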
There are two basic ways a worm replication system meeting the above requirements could be produced – either using real (physical) machines as the replication hosts, or using emulated (virtual) machines as the replication hosts. Both techniques have advantages and disadvantages – for the purposes of the IBM/Symantec Immune System, the virtual machine approach was chosen as being the most flexible, the best-suited to the already existing environment, and the most easily expandable. Research has been carried out on the physical-machine approach, but the primary research effort is now concentrated on the virtual machine approach.
For the purposes of this paper, the machine on which the emulated sessions are running will be called the ‘physical’ machine; and the emulated sessions running on the physical machine will be called the ‘virtual’ machines.
An emulation package which supported flexible emulated networking was chosen – the package in question allows multiple virtual machines to run atop a single physical machine, and have all the machines (both the virtual machines and the physical machine) accessible to one another via an emulated network (see Figure 1). The package also allows multiple virtual machines to be run atop multiple physical machines, and have all the machines (both the virtual machines and the physical machines) accessible to one another via an emulated network (which crosses a physical network in order to pass from physical machine to physical machine) – see