Watcher Documentation

Watcher is a generative AI model that simulates patient timelines. It takes a patient timeline (real or synthetic) as input and autoregressively generates the future clinical timeline, which enables computational modeling of patient trajectories. These capabilities may support downstream applications such as personalized medicine, in-silico clinical trials, and counterfactual prediction. Watcher can also be used simply to synthesize large-scale synthetic patient databases.

How the model generates patient timelines

Watcher was initially designed as the backend simulator in our digital-twin framework (see below). However, it can also be used independently as a stand-alone simulation engine for various applications.

Start with 👉 Tutorial

GitHub 👉 https://github.com/yuakagi/Watcher

Model Overview

Watcher model architecture

Watcher takes a patient timeline as input and autoregressively generates future clinical events. It encodes categorical, numeric, and temporal entries into vectors, which are processed by decoder-only transformer layers. At each step, the model outputs a probability distribution over its entire vocabulary.
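The generation loop described above can be illustrated with a minimal sketch. This is not Watcher's API: the vocabulary, the toy_logits stand-in for the transformer, and the generate function are all hypothetical names chosen for illustration. The sketch only shows the general pattern of turning per-step scores into a probability distribution over the vocabulary, sampling one entry, and appending it to the timeline.

```python
import math
import random

# Hypothetical toy vocabulary; a real clinical vocabulary is far larger.
VOCAB = ["<eot>", "admission", "lab_test", "prescription", "discharge"]


def toy_logits(timeline):
    """Stand-in for the transformer: one score per vocabulary entry."""
    scores = [0.0] * len(VOCAB)
    # Purely illustrative rule: ending the timeline becomes more likely
    # as the timeline grows longer.
    scores[0] = 0.5 * len(timeline) - 2.0
    return scores


def softmax(logits):
    """Convert scores into a probability distribution over the vocabulary."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate(timeline, max_steps=20, seed=0):
    """Autoregressively extend a timeline, one sampled entry per step."""
    rng = random.Random(seed)
    timeline = list(timeline)  # do not mutate the caller's list
    for _ in range(max_steps):
        probs = softmax(toy_logits(timeline))
        token = rng.choices(VOCAB, weights=probs, k=1)[0]
        if token == "<eot>":  # hypothetical end-of-timeline marker
            break
        timeline.append(token)
    return timeline


if __name__ == "__main__":
    print(generate(["admission", "lab_test"]))
```

Because each step samples from the distribution rather than taking the most likely entry, repeated runs with different seeds yield different plausible continuations of the same input timeline.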

Digital-Twin EHR System

Watcher serves as the backend simulator in our digital-twin framework (Figure):

Watcher model architecture

This system simulates patient trajectories based on real-world clinical data, which may support downstream applications such as personalized medicine and in-silico clinical trials. It consists of three components:

  1. AI model [GitHub]: A generative model that simulates patient timelines. (👈 This is the package whose documentation you are currently reading.)

  2. Digital-twin EHR [GitHub]: A web-based, AI-powered EHR that interacts with the model and visualizes simulation results.

  3. Data pipeline: A pipeline that supplies real-world clinical data to the data server. (Use our data pipeline, or use clinical data you prepared yourself.)

To use the full digital-twin system, please follow these steps:

Step 1: Prepare your clinical data (please see the note below)
  • Required clinical data are defined in Clinical Records

  • You can use your own clinical data or publicly available datasets.

  • Alternatively, Japanese hospitals can use our data pipeline to collect and clean clinical data.

Step 2: Upload clinical data to the database (Use Watcher package)
  • The Watcher package provides a Docker container for a PostgreSQL database.

  • You can upload your clinical data to the database using the package.

Step 3: Train the AI model (Use Watcher package)
  • Train (pretrain & fine-tune) the AI model using the Watcher package following the tutorial.

Step 4: Launch the simulation API server (Use Watcher package)
  • The Watcher package provides a simulation API server that runs the AI model (gunicorn + Flask).

  • The digital-twin EHR system communicates with this API server.

  • Launch the server following the tutorial.

Step 5: Launch the digital-twin EHR system (Use TwinEHR)
  • TwinEHR is a Django-based web application that provides a user interface for the simulation API.

  • Clone the repository and set the environment variables needed to connect to the simulation API server.

  • Then, run the Django server and access the web application.

Note

  • Although we provide a data pipeline for the Japanese HL7 standard only, users outside Japan are fully supported.

  • The digital-twin system can run independently of location, as long as clinical data are structured into relational tables following our schema (see Clinical Records).

  • Use your own datasets or publicly available clinical datasets, and preprocess them to match the schema used in this package.

  • For Japanese users, our data pipeline can conveniently collect and clean clinical data, but its use is not mandatory.
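
When preparing your own data, a quick header check before upload can catch schema mismatches early. The sketch below is only an illustration under stated assumptions: the column names in REQUIRED_COLUMNS and the missing_columns helper are hypothetical and are not taken from this package. The actual tables and columns are defined in Clinical Records.

```python
import csv
import io

# Hypothetical column names for illustration only.
# The real schema is defined in Clinical Records.
REQUIRED_COLUMNS = {"patient_id", "timestamp", "item_code"}


def missing_columns(csv_text, required=REQUIRED_COLUMNS):
    """Return the required columns absent from a CSV header, sorted by name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = set(reader.fieldnames or [])  # fieldnames is None for empty input
    return sorted(required - header)


if __name__ == "__main__":
    sample = "patient_id,value\n1,3.2\n"
    print(missing_columns(sample))
```

An empty result means every expected column is present. The check looks at the header only, so value types and formats still need to be validated separately.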