
Wednesday, July 15, 2015

PhD Defense Yacin Nadji - Understanding DNS-based Criminal Infrastructure for Informing Takedowns

Yacin Nadji, a PhD candidate in security at Georgia Tech, successfully defended his dissertation work today.

How does one disable a botnet?  It is difficult to identify and repair individually infected machines.  Therefore, targeting the command and control servers can instead break the linkage between the infected machines and the malicious controller.

Manual identification is time-consuming and can lead to collateral damage.  Automation is required to enumerate the machines, evaluate the threat, identify the takedown mechanism, and determine the potential collateral damage of the takedown.  The tools were tested against a longitudinal dataset of DNS registrations (from Damballa), effectively a sample of the Internet over time.

APTs (Advanced Persistent Threats) are particularly troublesome, as these machines persist and change their presence over time at the direction of the botnet controller.  The C&C machines also attempt to go dark by changing their IP resolution to localhost (127.0.0.1), thereby minimizing their observed signature by only having network traffic when an attack is required.  This suggests a suite of detection features for identifying the actual C&C machines: short-lived IP addresses, domain resolutions that change to NULL or localhost, and IP addresses that vary across a diverse set of infrastructure and geographic locations.
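To make the feature intuition concrete, here is a minimal sketch of extracting those three indicators from a domain's resolution history. The record format, field names, and the /8-prefix proxy for infrastructure diversity are my own illustrative assumptions, not details from the dissertation.

```python
from collections import namedtuple

# Hypothetical record format (not from the talk): one row per observed
# domain -> IP mapping, with first/last seen measured in days.
Resolution = namedtuple("Resolution", "domain ip first_seen last_seen")

def extract_features(history):
    """Compute the three C&C indicators described above from one
    domain's resolution history."""
    # Short-lived mappings: average days each IP mapping was held
    avg_lifetime = sum(r.last_seen - r.first_seen for r in history) / len(history)
    # "Going dark": the domain was at some point parked on localhost
    parked = any(r.ip == "127.0.0.1" for r in history)
    # Infrastructure diversity, crudely approximated here by distinct /8
    # prefixes; a real system would use ASN and geolocation data
    diversity = len({r.ip.split(".")[0] for r in history})
    return {"avg_lifetime": avg_lifetime, "parked": parked, "diversity": diversity}

history = [
    Resolution("evil.example", "203.0.113.5", 0, 2),
    Resolution("evil.example", "198.51.100.7", 2, 3),
    Resolution("evil.example", "127.0.0.1", 3, 30),
]
feats = extract_features(history)
```

A short average lifetime combined with a localhost parking period and high infrastructure diversity is exactly the profile the detector is looking for.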

A machine learning classifier is then developed, initially trained on a ground truth of 50k APT records.  The features are scored and then run through different models using a 90/10 train/test split on the ground truth dataset.  The following results are only approximate, as I was trying to copy them during the presentation.

Model                      Accuracy   True Positive Rate   False Positive Rate
Naive Bayes                70         91                   40
General Linear Regression  98         93                   1
Random Forest              99         97                   0.04
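For reference, the three reported metrics can be computed from a held-out test set as follows. This is a generic sketch of the evaluation, in pure Python with toy labels; it is not the actual pipeline or the Damballa data.

```python
def evaluate(y_true, y_pred):
    """Accuracy, true positive rate, and false positive rate
    for binary labels (1 = APT, 0 = benign)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    tpr = tp / (tp + fn) if tp + fn else 0.0   # of real APTs, fraction caught
    fpr = fp / (fp + tn) if fp + tn else 0.0   # of benign, fraction flagged
    return accuracy, tpr, fpr

# Toy labels for an imagined 10% held-out split
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
acc, tpr, fpr = evaluate(y_true, y_pred)
```

Note why all three columns matter: with heavily imbalanced data, a model can score high accuracy while still having an unusable false positive rate, which is what separates the Random Forest row from Naive Bayes above.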

The classifier is then applied to the full dataset of 300 million records.  These cluster into 1.1 million clusters, of which ~700 have above 0.8 confidence of being APTs.  At 90% confidence, every cluster contains fewer than 1000 domain names.

How then do botnets attempt to evade detection?  The infected machines generally use DNS to look up their C&C machines; however, the lookups can be occasionally spurious or to legitimate IPs.  The machines could be peer-to-peer, but this requires active connections that are often blocked or restricted by networks (against "legitimate" uses such as BitTorrent).

The suite of tools also operates on the malware running in VMs, whereby it works through possible takedown mechanisms and then observes the infection's response to each takedown, thereby identifying other, possibly unused, communication channels.  For most infections, enumerating the approaches takes on the order of hours; however, some can take days.
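The enumeration loop described above can be sketched abstractly as follows. This is a heavily simplified model of my own construction: "channels" and the fail-over behavior are stand-ins for whatever the instrumented VM actually observes, and the real system's per-step observation is far more involved.

```python
def enumerate_channels(sample_channels, takedown):
    """Apply a takedown to the channel the infection is currently using,
    observe which backup it fails over to, and repeat until the sample
    has no communication path left.  Returns the observed channel order,
    including previously unused backups."""
    observed = []
    active = list(sample_channels)
    while active:
        channel = active[0]        # channel the infection is using now
        observed.append(channel)
        takedown(channel)          # e.g., sinkhole the domain in the VM's view
        active = active[1:]        # sample fails over to its next backup
    return observed

# Hypothetical sample with a DGA domain, a hardcoded IP, and a P2P fallback
taken_down = []
order = enumerate_channels(["dga-domain", "hardcoded-ip", "p2p"], taken_down.append)
```

The value of the loop is exactly the `observed` list: it surfaces backup channels that would never appear in passive monitoring, so the eventual takedown can cover all of them at once.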

Open Problems:

  • Attributing the botnet to physical entities
  • Targeting P2P-based botnets

Thursday, April 3, 2014

The Information Technology Implications of the President's Intelligence Review Panel

Peter Swire gave a Thomas E. Noonan Distinguished Lecture, titled "The Information Technology Implications of the President's Intelligence Review Panel".  An interesting talk based on his time last fall on the President's 5-person committee charged with reviewing the practices of the intelligence community, partially in response to Snowden's leaks.  Many recommendations were made in their 300-page report, including the often-cited statement that Section 215 is "not essential".

A major theme of the talk was the claim that the "half life of secrets is declining".  At one time, something classified would stay that way for 25 or more years.  There is now increasing probability that, directly (through leaks) or indirectly (by inference from non-classified sources), a secret will be publicly disclosed.  Decisions must now be made by the intelligence community in light of the fact that their actions will likely be revealed in the near future.

Furthermore, there is an offense/defense tension in the gathering of intelligence.  In the past, the discovery of a vulnerability in codes (e.g., encryption) would result in orders to change them, orders that would themselves likely go undetected by potential foes.  But how do you ensure that current systems remain secure when most (90+%) are in the private sector?  And there is the further tension whereby e-commerce and dissent are weighed against intelligence gathering and military support (e.g., drones), all of it dwarfed by cat videos.

How does the United States resolve the tension between promoting a freedom agenda (use of Twitter, etc. in undemocratic countries) and the need for surveillance against foreign and domestic foes?  In the past, secrets and intelligence were the actions of nation-states, often gathered on physically separate networks against a background of predominantly local communication.  Now the predominant threat is from individuals (i.e., terrorists) operating against a backdrop of global communication.
Three final points:
  • Increased privacy protections for non-citizens regardless of locale (see PPD-28)
  • ACM/IETF Code of Ethics as relates to confidentiality and security
  • MLAT and the time scales of the treaty versus the internet
I take no stance beyond saying that I recognize that legitimate needs result in a tension and that I found the talk very interesting.