This shows you the differences between two versions of the page.
--- autoformat [2010/05/27 16:40] dxu created
+++ autoformat [2010/05/27 16:43] dxu created
Line 1:
- ====== REWARDS: Automatic Reverse Engineering of Data Structures from Binary Execution ======
+ ====== AutoFormat: Automatic Protocol Format Reverse Engineering Through Context-Aware Monitored Execution ======
- Given only the binary executable of a program, it is useful to discover the program's data structures and infer their syntactic and semantic definitions. Such knowledge is highly valuable in a variety of security and forensic applications. Although there has been prior work on program data structure inference, existing solutions are not suitable for our targeted application scenarios. In this paper, we propose a reverse engineering technique to automatically reveal program data structures from binaries. Our technique, called REWARDS, is based on dynamic analysis. More specifically, each memory location accessed by the program is tagged with a timestamped type attribute. Following the program's runtime data flow, this attribute is propagated to other memory locations and registers that share the same type. During propagation, a variable's type is resolved when the variable is involved in a type-revealing execution point, or type sink. More importantly, besides forward type propagation, REWARDS performs a backward type resolution procedure in which the types of previously accessed variables are recursively resolved starting from a type sink. This procedure is constrained by the timestamps of the relevant memory locations to disambiguate variables that reuse the same memory location. In addition, REWARDS can reconstruct the in-memory layout of data structures from the derived type information. We demonstrate that REWARDS provides unique benefits to two applications: memory image forensics and binary fuzzing for vulnerability discovery.
+ Protocol reverse engineering has often been a manual process that is time-consuming, tedious, and error-prone. To address this limitation, a number of solutions have recently been proposed for automatic protocol reverse engineering.
+ Unfortunately, they are either limited in extracting protocol fields, because network traces lack program semantics, or primitive, revealing only the flat structure of the protocol format. In this paper, we present a system called AutoFormat that aims not only to extract protocol fields with high accuracy but also to reveal the inherently "non-flat", hierarchical structure of protocol messages. AutoFormat is based on the key insight that different protocol fields in the same message are typically handled in different execution contexts (e.g., the runtime call stack). As such, by monitoring the program execution, we can collect the execution context information for every message byte (annotated with its offset in the entire message) and cluster the bytes to derive the protocol format. We have evaluated our system with more than 30 protocol messages from seven protocols, including two text-based protocols (HTTP and SIP), three binary-based protocols (DHCP, RIP, and OSPF), one hybrid protocol (CIFS/SMB), and one unknown protocol used by real-world malware. Our results show that AutoFormat can not only identify individual message fields automatically and with high accuracy (an average match ratio of 93.4% compared with Wireshark), but also unveil the structure of the protocol format by revealing possible relations (e.g., sequential, parallel, and hierarchical) among the message fields.
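The core idea in the AutoFormat abstract, grouping message bytes by the execution context in which they are processed, can be illustrated with a short sketch. It is not AutoFormat's implementation. It assumes a hypothetical trace format of one (offset, call stack) pair per message byte, with call stacks written as function names (a real Valgrind- or QEMU-based monitor would record return addresses), and the parser names `parse_request_line`, `parse_method` and `parse_uri` are invented for illustration. The sketch merges adjacent bytes with identical context into leaf fields, then nests those fields under shared call-stack prefixes, which exposes sequential (sibling) and hierarchical (parent/child) relations; parallel relations are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical trace format: one (offset, call_stack) pair per message byte,
# with the call stack listed from outermost to innermost frame.
Stack = Tuple[str, ...]
Field = Tuple[int, int, Stack]  # (first offset, last offset, execution context)


@dataclass
class Node:
    frame: str                 # function name at this level of the field tree
    start: int = -1            # first message offset handled under this frame
    end: int = -1              # last message offset handled under this frame
    children: List["Node"] = field(default_factory=list)

    def cover(self, start: int, end: int) -> None:
        """Widen this node's byte range to include [start, end]."""
        self.start = start if self.start < 0 else min(self.start, start)
        self.end = max(self.end, end)


def leaf_fields(trace: List[Tuple[int, Stack]]) -> List[Field]:
    """Merge runs of adjacent offsets that share an identical call stack."""
    fields: List[Field] = []
    for offset, stack in sorted(trace):
        if fields and fields[-1][2] == stack and fields[-1][1] == offset - 1:
            fields[-1] = (fields[-1][0], offset, stack)
        else:
            fields.append((offset, offset, stack))
    return fields


def build_tree(fields: List[Field]) -> Node:
    """Nest leaf fields under their shared call-stack prefixes."""
    root = Node("<message>")
    for start, end, stack in fields:
        node = root
        node.cover(start, end)
        for frame in stack:
            # Descend into the most recent child with the same frame, otherwise
            # open a new sibling, so siblings stay in message (sequential) order.
            if node.children and node.children[-1].frame == frame:
                node = node.children[-1]
            else:
                child = Node(frame)
                node.children.append(child)
                node = child
            node.cover(start, end)
    return root


def render(node: Node, depth: int = 0) -> List[str]:
    """Return the field tree as indented lines: 'frame [start-end]'."""
    lines = [f"{'  ' * depth}{node.frame} [{node.start}-{node.end}]"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    # Synthetic trace for the 6-byte message b"GET /x" (illustrative only).
    trace = [(i, ("main", "parse_request_line", "parse_method")) for i in range(3)]
    trace.append((3, ("main", "parse_request_line")))  # the space delimiter
    trace += [(i, ("main", "parse_request_line", "parse_uri")) for i in (4, 5)]
    print("\n".join(render(build_tree(leaf_fields(trace)))))
```

Running it prints a five-line tree in which `parse_method` covers bytes 0-2 and `parse_uri` covers bytes 4-5, both nested under `parse_request_line`, while the delimiter byte 3 is attributed only to the parent. Because the sketch keys nodes on function names alone, two separate invocations of the same parser can be merged; telling them apart would need richer context than the bare call stack.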
  ===== Publications =====
-   * "Automatic Reverse Engineering of Data Structures from Binary Execution". Zhiqiang Lin, Xiangyu Zhang, and Dongyan Xu. Proceedings of the 17th Network and Distributed System Security Symposium (NDSS 2010), San Diego, CA, February 2010.
+   * "Automatic Protocol Format Reverse Engineering through Context-Aware Monitored Execution". Zhiqiang Lin, Xuxian Jiang, Dongyan Xu, and Xiangyu Zhang. Proceedings of the 15th Network and Distributed System Security Symposium (NDSS 2008), San Diego, CA, February 2008.
-   * [[http://friends.cs.purdue.edu/pubs/NDSS10.pdf|Paper]] in PDF format.
+   * [[http://friends.cs.purdue.edu/pubs/NDSS08.pdf|Paper]] in PDF format.
-   * [[http://www.cs.purdue.edu/homes/zlin/file/NDSS10.ppt|Slides]] in PPT format.
+   * [[http://www.cs.purdue.edu/homes/zlin/file/NDSS08.ppt|Slides]] in PPT format.
  ===== Software =====
- We are working on the next generation of REWARDS and will release our code shortly.
+ We currently have two versions of AutoFormat, one based on Valgrind and one based on QEMU. If you would like to try it, please write to us.
-
  ===== People =====
    * [[http://www.cs.purdue.edu/homes/zlin/|Zhiqiang Lin]]
+   * [[http://www.csc.ncsu.edu/faculty/jiang/|Xuxian Jiang]]
+   * [[http://www.cs.purdue.edu/homes/dxu/|Dongyan Xu]]
    * [[http://www.cs.purdue.edu/homes/xyzhang/|Xiangyu Zhang]]
-   * [[http://www.cs.purdue.edu/homes/dxu/|Dongyan Xu]]