What is Deterministic Record/Replay
Deterministic application record and replay is the ability to record application execution and deterministically replay it at a later time. Record-replay has many potential uses:
- Diagnosing and debugging applications by capturing and reproducing hard-to-find bugs.
- Dynamic application analysis by performing costly instrumentation on replicas that replay application behavior recorded on production systems.
- Intrusion analysis by capturing intrusions involving non-deterministic effects.
- Fault-tolerance by providing replicas that replay execution and at the occurrence of a fault, go live in place of the previously running application instance.
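To make the idea concrete, here is a minimal toy sketch of record and replay in plain Python. It is not how Scribe works (Scribe operates transparently, without changing the application); it only illustrates the principle that logging the results of non-deterministic operations is enough to reproduce an execution. The `EventLog` class and the `application` function are hypothetical names invented for this illustration.

```python
import random
import time


class EventLog:
    """Records results of non-deterministic calls, or replays them in order.

    Hypothetical helper for illustration only; not part of Scribe.
    """

    def __init__(self, events=None):
        # With no events we are recording; with events we are replaying.
        self.replaying = events is not None
        self.events = list(events) if events is not None else []
        self.cursor = 0

    def call(self, name, func, *args):
        if self.replaying:
            if self.cursor >= len(self.events):
                raise RuntimeError(f"replay diverged: log exhausted at {name}")
            logged_name, value = self.events[self.cursor]
            if logged_name != name:
                raise RuntimeError(
                    f"replay diverged: expected {logged_name}, got {name}"
                )
            self.cursor += 1
            return value  # return the recorded result instead of calling func
        value = func(*args)
        self.events.append((name, value))
        return value


def application(log):
    """A tiny 'application' whose output depends on non-deterministic inputs."""
    started = log.call("time", time.time)
    roll = log.call("randint", random.randint, 1, 6)
    return f"started={started:.3f} roll={roll}"


recording = EventLog()
original = application(recording)               # record
replayed = application(EventLog(recording.events))  # replay from the log
print(original == replayed)  # prints True
```

During replay the clock and the random number generator are never consulted: the application sees exactly the values it saw while recording, so it produces the same output. A mismatch between the log and the replayed calls is reported as a divergence.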
What is Scribe
Scribe is a record/replay engine that provides deterministic record and replay of generic applications on Linux.
- Deterministic: replay produces the same outcome (filesystem, network, etc.) as was recorded.
- Transparent: no need to change, relink, or recompile applications or libraries, and no specialized hardware.
- Multi-process/threaded: record and replay of applications with multiple processes and threads.
- Go-Live: replay of a recorded execution can transition to live execution at any point.
- Low runtime overhead: efficient record and replay of both server and desktop applications.
- Robust: tested on a wide variety of applications, such as Apache, MySQL, Nginx, MPlayer, and more.
- Extensible: Python API to control record and replay behavior and to explore execution recordings.
TL;DR Scribe is a time machine for applications – some wizardry to bend the time-space continuum.
Watch Scribe in Action
download mp4 (12:43 / 720p / 37Mb)
Scribe is Extensible
Scribe records application execution into a log file, and can later replay the same application from that log file. It has APIs to inspect the log files, to modify the logged execution, and to control recording and replaying and fiddle with their state.
But there is more to Scribe ...
- Scribe can do tandem-like application execution, where the recording on one host is streamed to a second host and replayed in real time.
- Scribe can be used to record application execution, then modify the resulting log to force different behavior when the application is replayed. For example, replay a multi-process application with different scheduling to automatically expose and detect harmful race conditions (Racepro).
- Scribe can be used to record application execution, then replay a slightly modified application while tolerating certain divergences from the expected execution indicated in the logs. For example, replay an application with debugging enabled from a recording made without debugging output (mutable replay).
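The idea of modifying a log to force a different schedule can be sketched with a toy model. The snippet below is an illustration under stated assumptions, not Scribe's log format or its Python API: the "log" is just a list of worker names giving the order in which steps ran, and `replay` is a hypothetical function that re-executes two workers in that order. Each worker performs a non-atomic increment (a read step followed by a write step) on a shared counter.

```python
def replay(schedule):
    """Run two workers' steps in the order given by the schedule log.

    Toy model for illustration only; names and log format are hypothetical.
    """
    shared = {"counter": 0}

    def worker():
        # A non-atomic increment, split into two schedulable steps.
        local = shared["counter"]       # step 1: read
        yield
        shared["counter"] = local + 1   # step 2: write
        yield

    workers = {"A": worker(), "B": worker()}
    for pid in schedule:
        next(workers[pid])  # run that worker's next step
    return shared["counter"]


# Schedule as "recorded": each worker finishes before the other starts.
recorded = ["A", "A", "B", "B"]

# Modified log: same steps, but interleaved so both reads precede both writes.
modified = ["A", "B", "A", "B"]

print(replay(recorded), replay(modified))  # prints 2 1
```

Replaying the recorded schedule gives the expected count of 2, while the rewritten schedule loses one update and exposes the race. Exploring alternative orderings of a recorded execution in this way is the general flavor of the race detection use case described above.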
Indeed, in the video, I spend some time showing what we call mutable replay. It’s a new concept that we’ve introduced and formalized. I will be presenting our research next March at the ASPLOS 2013 conference. The mutable replay engine plugs into the Scribe engine through the Python library. The mutable replay sources are not yet distributed.
Scribe is Work-in-Progress
Scribe is pretty robust, but not production ready. Some applications are not well supported. I am looking for hackers to join the project, as I cannot do it by myself: after having dedicated three years to this project, it has reached the complexity level that evidently demands additional brain power.
The code quality is fairly high, but there is room for improvement.
This is how I envision the roadmap:
Coverage: Some applications don’t replay well. Some, like Firefox and OpenOffice, used to work on the original prototype but no longer do, for obscure reasons that need further investigation.
Interpreters: Scribe is transparent, meaning it requires no changes to the applications. In some cases, though, it does make sense to change the application. For example, if the goal is to record and replay programs in languages like Ruby/Python/Java, we may get away without recording and replaying the internals of the respective VM. In fact, I started to patch the Ruby interpreter to make it Scribe-aware (see here). Mutable replay works much better when it has context.
Distributed: I want to be able to record an application that spans multiple servers. Because Scribe records all the interactions the application has with its external environment, you don’t want to record the database and the application separately.
“I have a Dream”
With these three components in place, I can fulfill a dream: being a web developer, I’d like to have an entire web stack recorded. When a user clicks on the “Feedback” button, I would replay the whole system locally and observe exactly what the user got by replaying her entire session. With that, I’d like to replay it faithfully, down to the race that may have happened in the database. I’d also like to be able to modify the code to understand it better while it’s replaying. With enough brains on this, we can make it a reality. I will expand on this idea in another blog post.
Scribe is Open-source
Installation instructions are in the kernel repository.
Try it yourself
Note: the VM does not contain the latest version of Scribe, so you may want to update the sources and recompile before you start to play with it.
These papers are a must-read for anybody who wants to understand more:
Transparent Mutable Replay for Multicore Debugging and Patch Validation
Nicolas Viennot, Siddharth Nair, and Jason Nieh, Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2013), Houston, Texas, March 2013. To appear.
Pervasive Detection of Process Races in Deployed Systems
Oren Laadan, Nicolas Viennot, Chia-Che Tsai, Chris Blinn, Junfeng Yang, and Jason Nieh, Proceedings of the Twenty-third ACM Symposium on Operating Systems Principles (SOSP 2011), Cascais, Portugal, October 2011
Transparent, Lightweight Application Execution Replay on Commodity Multiprocessor Operating Systems
Oren Laadan, Nicolas Viennot, Jason Nieh, Proceedings of ACM SIGMETRICS 2010 Conference on Measurement and Modeling of Computer Systems, New York, NY, June 2010
- My friend Sid Nair implemented the mutable replay engine with me; he was great to work with. Without Sid, mutable replay would still be fiction.
- My PhD mentor Oren Laadan heavily brainstormed with me during the implementation of the first Scribe prototype. He had all the good ideas.
- My PhD advisor Jason Nieh asked the right questions at the right moment, and pushed me really hard to make it happen.