What is Deterministic Record/Replay
Deterministic application record and replay is the ability to record application execution and deterministically replay it at a later time. Record-replay has many potential uses:
- Diagnosing and debugging applications by capturing and reproducing hard-to-find bugs.
- Dynamic application analysis by performing costly instrumentation on replicas that replay application behavior recorded on production systems.
- Intrusion analysis by capturing intrusions involving non-deterministic effects.
- Fault-tolerance by providing replicas that replay execution and at the occurrence of a fault, go live in place of the previously running application instance.
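The core idea can be sketched in a few lines of Python. This is only a toy illustration, not how Scribe works internally (Scribe records at the operating system level, transparently to the application). The `Recorder`, `Replayer`, and `app` names are made up for this example: every nondeterministic input goes through an `env.call` hook that either logs the real result or returns the logged one.

```python
import random
import time


class Recorder:
    """Runs each nondeterministic call for real and appends its result to a log."""

    def __init__(self):
        self.log = []

    def call(self, name, fn):
        value = fn()
        self.log.append((name, value))
        return value


class Replayer:
    """Returns logged results in order instead of calling the real source."""

    def __init__(self, log):
        self._events = iter(log)

    def call(self, name, fn):
        logged_name, value = next(self._events)
        if logged_name != name:
            # The replayed run asked for something the recording did not.
            raise RuntimeError(f"divergence: expected {logged_name!r}, got {name!r}")
        return value


def app(env):
    """A tiny 'application' whose output depends on time and randomness."""
    started = env.call("time", time.time)
    roll = env.call("random", lambda: random.randint(1, 6))
    return f"started={started:.3f} roll={roll}"


recorder = Recorder()
recorded_output = app(recorder)                  # live run, inputs are logged
replayed_output = app(Replayer(recorder.log))    # replay, inputs come from the log
assert recorded_output == replayed_output
```

Unlike this sketch, a transparent engine cannot ask the application to route its inputs through a hook, which is why the interception has to happen below the application.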
What is Scribe
Scribe is a record/replay engine that provides deterministic execution record and replay for generic applications on Linux.
- Deterministic: replay produces the same outcome (filesystem, network, etc.) as the one recorded.
- Transparent: no need to change, relink, or recompile applications or libraries, and no specialized hardware.
- Multi-process/threaded: record and replay of applications with multiple processes and threads.
- Go-Live: replay of a recorded execution can transition to live execution at any point.
- Low runtime overhead: efficient record and replay of both server and desktop applications.
- Robust: tested on a wide variety of applications, such as Apache, MySQL, Nginx, MPlayer, and more.
- Extensible: Python API to control record and replay behavior and to explore execution recordings.
TL;DR Scribe is a time machine for applications – some wizardry to bend the time-space continuum.
Watch Scribe in Action
download mp4 (12:43 / 720p / 37Mb)
Scribe is Extensible
Scribe records application execution into a log file, and can later replay the same application from that log file. It has APIs to inspect the log files, to modify the logged execution, and to control recording and replaying and fiddle with their state.
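To give a feel for what inspecting and modifying a recording means, here is a small sketch over a made-up log. The JSON-lines format, the field names, and the `load_events` and `rewrite` helpers are all assumptions for illustration; they are not Scribe's log format or its Python API.

```python
import io
import json
from collections import Counter

# Hypothetical log: one JSON object per line (NOT Scribe's real format).
RAW_LOG = """\
{"pid": 1, "event": "syscall", "name": "open", "ret": 3}
{"pid": 1, "event": "syscall", "name": "read", "ret": 128}
{"pid": 2, "event": "signal", "name": "SIGCHLD", "ret": 0}
{"pid": 1, "event": "syscall", "name": "read", "ret": 0}
"""


def load_events(stream):
    """Parse one event per non-empty line."""
    return [json.loads(line) for line in stream if line.strip()]


def rewrite(events, name, new_ret):
    """Return a copy of the log in which every `name` event returns `new_ret`."""
    return [dict(e, ret=new_ret) if e["name"] == name else dict(e) for e in events]


events = load_events(io.StringIO(RAW_LOG))

# Inspect: how many events did each process log?
per_pid = Counter(e["pid"] for e in events)

# Modify: pretend every read returned a single byte; the original log is untouched.
short_reads = rewrite(events, "read", 1)
```

Replaying a modified log is only meaningful if the engine can cope with the application reacting differently, which is where the use cases below come in.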
But there is more to Scribe
- Scribe can do tandem-like application execution, where the recording on one host is streamed to a second host and replayed in real time.
- Scribe can be used to record application execution, then modify the resulting log to force different behavior when the application is replayed. For example, replay a multi-process application with different scheduling to automatically expose and detect harmful race conditions (RacePro).
- Scribe can be used to record an application execution, and replay it after modifying the application, tolerating a divergent execution from the original one to some extent. For example, replay an application with debugging enabled from a recording without debugging output. This is a new concept that we’ve introduced in the mutable replay paper. The mutable replay engine plugs into the Scribe engine through the Python library.
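The rescheduling idea can be illustrated with a toy model. This is a brute-force sketch under simplifying assumptions, not RacePro's actual algorithm: two hypothetical "processes" each perform a non-atomic increment, the recording is just the order in which their steps ran, and replaying a reordered recording reveals the lost update.

```python
from itertools import permutations


def increment(shared):
    """Non-atomic increment as a generator: one yield per schedulable step."""
    local = shared["counter"]       # step 1: read
    yield
    shared["counter"] = local + 1   # step 2: write
    yield


def replay(schedule):
    """Run processes A and B, advancing whichever one the schedule names next."""
    shared = {"counter": 0}
    procs = {"A": increment(shared), "B": increment(shared)}
    for pid in schedule:
        next(procs[pid])
    return shared["counter"]


# The recorded run happened to execute A entirely before B.
recorded = ("A", "A", "B", "B")
expected = replay(recorded)

# Replay every distinct reordering and keep those whose outcome differs.
racy = sorted({s for s in permutations(recorded) if replay(s) != expected})
```

Here four of the six distinct interleavings lose an update. A real system cannot enumerate schedules like this, so it has to pick candidate reorderings much more carefully.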
Here’s what I’d focus on next:
Coverage: Some applications don’t replay well. Some, like Firefox and OpenOffice, used to work on the original prototype but no longer do, for obscure reasons that need further exploration.
Interpreters: Scribe is transparent, meaning it requires no changes to the applications. In some cases, it actually does make sense to change the application. For example, if the goal is to record and replay programs in languages like Ruby/Python/Java, we may get away without recording and replaying the internals of the respective VM. I started to patch the Ruby interpreter to make it Scribe-aware (see here). Mutable replay works much better when it has context.
Distributed: I want to be able to record an application that spans multiple servers. Because Scribe records all the interactions the application has with its external environment, you don’t want to record the database and the application separately.
“I have a Dream”
With these three components in place, I can fulfil a dream: being a web developer, I’d like to have an entire web stack recorded. When a user clicks on the “Feedback” button, I would replay the whole system locally and observe exactly what the user got by replaying her entire session. With that, I’d like to replay it faithfully down to the race that may have happened in the database. I’d also like to be able to modify the code to understand it better while it’s replaying. With enough brains on this, we can make it a reality.
Installation instructions are in the kernel repository. The mutable replay engine has no documentation (sorry), but you can try
Try it yourself
Note: the VM does not contain the latest version of Scribe, so you may want to update the sources and recompile before you start to play with it.
The implementation of Scribe resulted in three papers:
Transparent Mutable Replay for Multicore Debugging and Patch Validation
Nicolas Viennot, Siddharth Nair, and Jason Nieh, Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2013), Houston, Texas, March 2013
Pervasive Detection of Process Races in Deployed Systems
Oren Laadan, Nicolas Viennot, Chia-Che Tsai, Chris Blinn, Junfeng Yang, and Jason Nieh, Proceedings of the Twenty-third ACM Symposium on Operating Systems Principles (SOSP 2011), Cascais, Portugal, October 2011
Transparent, Lightweight Application Execution Replay on Commodity Multiprocessor Operating Systems
Oren Laadan, Nicolas Viennot, Jason Nieh, Proceedings of ACM SIGMETRICS 2010 Conference on Measurement and Modeling of Computer Systems, New York, NY, June 2010
- My friend Sid Nair implemented the mutable replay engine with me; he was great to work with.
Without Sid, mutable replay would still be fiction.
- My PhD mentor Oren Laadan heavily brainstormed with me during the implementation of the first Scribe prototype. He had all the good ideas.
- My PhD advisor Jason Nieh asked the right questions at the right moment, and pushed me really hard to make it happen.