Showing posts from March, 2010

NSDI Paper - SplitScreen Virus Scanning

(work-related post ahead) The camera-ready version of our NSDI paper, SplitScreen: Enabling Efficient, Distributed Malware Detection, is now online. I'm mentioning it because it has (to me) a fun story behind it: the project started out with absolutely nothing to do with virus or malware scanning. I like this one as an example of happenstance in research, as I'll explain. And it has a cute trick involving Bloom filters, so what more could you want?

Two years ago, I was chatting with Tom Mitchell about his Read the Web project. Part of RTW involves searching the web to count how many times a phrase occurs, for millions of phrases. In other words, fgrep -f BigPhraseFile web/*. The BigPhraseFile is big. The web is really big. We wanted to run this search on our FAWN cluster of tiny computers, to see if we could do it better, faster, cheaper, lower-power.

But then a problem arose: grep built a 4-gigabyte search trie (using a variant of Aho-Corasick). Our first-