Scribe: The Deterministic Transparent Record/Replay Engine

Table of Contents

» What is Deterministic Record/Replay
» What is Scribe
» Watch Scribe in Action
» Scribe is Extensible
» Scribe is Work-in-Progress
» Scribe is Open-source
» Scribe Publications
» Acknowledgments

What is Deterministic Record/Replay

Deterministic application record and replay is the ability to record an application's execution and deterministically replay it at a later time. Record/replay has many potential uses:

  • Diagnosing and debugging applications by capturing and reproducing hard-to-find bugs.
  • Dynamic application analysis, by running costly instrumentation on replicas that replay application behavior recorded on production systems.
  • Intrusion analysis, by capturing intrusions that involve non-deterministic effects.
  • Fault tolerance, by providing replicas that replay the execution and, when a fault occurs, go live in place of the previously running application instance.
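To make the idea concrete, here is a toy sketch of the core principle, assuming the only sources of non-determinism are a random number and a clock read. During recording, every non-deterministic value is appended to a log. During replay, the same values are fed back from the log, so the execution repeats exactly. This is an application-level illustration only: Scribe works transparently underneath unmodified applications, and the `Recorder` class and its `nondet` method are hypothetical names, not part of Scribe's API.

```python
import random
import time
from typing import Any, Callable, List, Optional


class Recorder:
    """Toy record/replay of non-deterministic inputs (hypothetical, not Scribe's API)."""

    def __init__(self, log: Optional[List[Any]] = None) -> None:
        # No log given: record mode. Log given: replay mode.
        self.replaying = log is not None
        self.log: List[Any] = list(log) if log is not None else []
        self.pos = 0

    def nondet(self, source: Callable[[], Any]) -> Any:
        if self.replaying:
            # Replay: ignore the live source and return the next logged value.
            value = self.log[self.pos]
            self.pos += 1
            return value
        # Record: call the real source and remember its result.
        value = source()
        self.log.append(value)
        return value


def program(rec: Recorder) -> str:
    # A toy "application" whose output depends on non-deterministic inputs.
    token = rec.nondet(lambda: random.randint(0, 10**6))
    stamp = rec.nondet(time.time)
    return f"token={token} at={stamp:.6f}"


recorder = Recorder()
original = program(recorder)             # record run: values are logged
replayer = Recorder(log=recorder.log)    # replay run: values come from the log
assert program(replayer) == original     # replay reproduces the recorded outcome
print(original)
```

A real engine has to intercept far more than two calls (system calls, signals, shared-memory interleavings), but the contract is the same: whatever the recorded run observed, the replayed run observes too.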

What is Scribe

Scribe is a record/replay engine that provides deterministic record and replay of arbitrary applications on Linux.

  • Deterministic: replay reproduces the recorded outcome (filesystem, network, etc.).
  • Transparent: applications and libraries need no changes, relinking, or recompilation, and no specialized hardware is required.
  • Multi-process/threaded: record and replay of applications with multiple processes and threads.
  • Go-Live: replay of a recorded execution can transition to live execution at any point.
  • Low runtime overhead: efficient record and replay of both server and desktop applications.
  • Robust: tested on a wide variety of applications, such as Apache, MySQL, Nginx, MPlayer, and more.
  • Extensible: a Python API to control record and replay behavior and to explore execution recordings.
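The go-live property can be sketched in the same toy style, assuming the recording is simply a list of logged values. A replayer serves values from the log until a chosen point (by default, the end of the log), then switches to live mode and starts calling the real sources again. `GoLiveReplayer` and `go_live_at` are hypothetical names for this illustration; in Scribe the transition happens inside the engine, not in application code.

```python
import random
from typing import Any, Callable, List, Optional


class GoLiveReplayer:
    """Toy replayer that can switch to live execution (hypothetical, not Scribe's API)."""

    def __init__(self, log: List[Any], go_live_at: Optional[int] = None) -> None:
        self.log = list(log)
        # Replay at most this many events; never more than the log holds.
        limit = len(self.log) if go_live_at is None else go_live_at
        self.go_live_at = min(limit, len(self.log))
        self.pos = 0
        self.live = False

    def nondet(self, source: Callable[[], Any]) -> Any:
        if not self.live and self.pos < self.go_live_at:
            # Still replaying: return the recorded value.
            value = self.log[self.pos]
            self.pos += 1
            return value
        # Transition point reached: from now on, behave like a normal run.
        self.live = True
        return source()


recorded = [4, 8, 15]                    # values captured during recording
rep = GoLiveReplayer(recorded, go_live_at=2)
first = [rep.nondet(lambda: random.randint(100, 200)) for _ in range(4)]
print(first[:2], rep.live)               # prints: [4, 8] True
```

Once `live` is set, the replayer never consults the log again. That mirrors the one-way nature of going live: after the switch, the execution is new and can no longer be checked against the recording.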

TL;DR: Scribe is a time machine for applications, some wizardry to bend the time-space continuum.

Watch Scribe in Action

Download MP4 (12:43, 720p, 37 MB)

Scribe is Extensible

Scribe records application execution into a log file and can later replay the same application from that log file. It provides APIs to inspect log files, modify the logged execution, control recording and replaying, and fiddle with their state.

But there is more to Scribe …

  • Scribe can do tandem-like application execution, where the recording on one host is streamed to a second host and replayed in real time.
  • Scribe can be used to record application execution, then modify the resulting log to force different behavior when the application is replayed. For example, it can replay a multi-process application with different scheduling to automatically expose and detect harmful race conditions (Racepro).
  • Scribe can be used to record an application execution and replay it after modifying the application, tolerating a limited amount of divergence from the original execution. For example, it can replay an application with debugging enabled from a recording made without debugging output. This is a new concept that we’ve introduced in the mutable replay paper. The mutable replay engine plugs into the Scribe engine through the Python library.
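The race-detection idea can be illustrated with a toy model. Two "processes" each perform a non-atomic read-modify-write on a shared counter, and a schedule (a list of process ids, standing in for the ordering information a recording captures) decides which one takes each step. Replaying the recorded schedule reproduces the original result; replaying modified schedules exposes the lost update. This is a simplified sketch of the concept, not Racepro's actual algorithm, and `increment`, `run`, and the schedule format are hypothetical.

```python
from itertools import permutations
from typing import Dict, Generator, List


def increment(shared: Dict[str, int]) -> Generator[None, None, None]:
    # Non-atomic read-modify-write, split into two scheduling steps.
    value = shared["counter"]        # step 1: read
    yield
    shared["counter"] = value + 1    # step 2: write
    yield


def run(schedule: List[int]) -> int:
    """Execute two increments under an explicit schedule of process ids."""
    shared = {"counter": 0}
    procs = [increment(shared), increment(shared)]
    for pid in schedule:
        next(procs[pid], None)       # advance the chosen process by one step
    return shared["counter"]


recorded_schedule = [0, 0, 1, 1]     # hypothetical schedule observed while recording
assert run(recorded_schedule) == 2   # faithful replay: same outcome as recorded

# "Modify the log": try every other legal interleaving of the same steps.
racy = sorted(s for s in set(permutations(recorded_schedule)) if run(list(s)) != 2)
print(f"{len(racy)} alternative schedules lose an update, e.g. {racy[0]}")
# prints: 4 alternative schedules lose an update, e.g. (0, 1, 0, 1)
```

Brute-force enumeration only works for tiny examples like this one; with real applications the number of interleavings explodes, so a practical tool has to be selective about which schedule changes it tries.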

Future Work

Here’s what I’d focus on next:

  • Coverage: Some applications don’t replay well. Some, like Firefox and OpenOffice, used to work on the original prototype but no longer do, for obscure reasons that need further investigation.

  • Interpreters: Scribe is transparent, meaning it requires no changes to the applications. In some cases, though, it does make sense to change the application. For example, to record and replay programs written in languages like Ruby, Python, or Java, we may be able to avoid recording and replaying the internals of the respective VM. I started to patch the Ruby interpreter to make it Scribe-aware (see here). Mutable replay works much better when it has context.

  • Distributed: I want to be able to record an application that spans multiple servers. Because Scribe records all the interactions an application has with its external environment, you don’t want to record the database and the application separately.

“I have a Dream”

With these three components in place, I can fulfil a dream. As a web developer, I’d like to have an entire web stack recorded. When a user clicks the “Feedback” button, I would replay the whole system locally and observe exactly what the user got by replaying her entire session. I’d like that replay to be faithful down to any race that may have happened in the database. I’d also like to be able to modify the code while it replays, to understand it better. With enough brains on this, we can make it a reality.


Scribe sources

Git Repositories

Installation instructions are in the kernel repository. The mutable replay engine has no documentation (sorry), but you can try running mrecord -h and mreplay -h.

Try it yourself

Download an Ubuntu 12.04 VM loaded with vanilla Scribe here (1.1 GB).
You need VMware Workstation or VMware Player (free) to use it.
Log in with root/root.

Note: the VM does not contain the latest version of Scribe, so you may want to update the sources and recompile before you start playing with it.

Scribe Publications

The implementation of Scribe resulted in three papers:

Acknowledgments

  • My friend Sid Nair implemented the mutable replay engine with me; he was great to work with.
    Without Sid, mutable replay would still be fiction.
  • My PhD mentor Oren Laadan brainstormed heavily with me during the implementation of the first Scribe prototype. He had all the good ideas.
  • My PhD advisor Jason Nieh asked the right questions at the right moment, and pushed me really hard to make it happen.

Thank you.


Published: November 27 2012

Edited: March 26 2014
