Table definitions

Simulation Results

The simulator models (e.g., watcher.models.Simulator.simulate()) return a pandas.DataFrame containing the simulation results. The DataFrame has the following columns:

Note

  • By default, the age column represents patient age in minutes, relative to the date of birth (assumed to be at 00:00). Thus, the hours and minutes of age match the event’s clock time.

  • By setting watcher.models.Simulator.simulate(age_as_timestamp=True), the age column will represent the timestamp of the event instead of age.

  • The simulations are sorted by timeline length, from shortest to longest (patient_id='simulation0' is the shortest).
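As a worked example of the default age convention in the notes above, the sketch below recovers the clock time of an event from an age given in minutes. It assumes the age value has already been converted to an integer number of minutes (the column is documented as str, so a conversion step may be needed first); the helper name clock_time_from_age is hypothetical and not part of the watcher package.

```python
MINUTES_PER_DAY = 24 * 60


def clock_time_from_age(age_minutes: int) -> str:
    """Return the HH:MM clock time implied by an age in minutes.

    Relies on the documented assumption that the date of birth is at 00:00,
    so the remainder after removing whole days is the time of day.
    """
    minutes_into_day = age_minutes % MINUTES_PER_DAY
    hours, minutes = divmod(minutes_into_day, 60)
    return f"{hours:02d}:{minutes:02d}"


age = 1_000_000  # illustrative value, not taken from real output
print(age // MINUTES_PER_DAY, "days old, event at", clock_time_from_age(age))
# prints: 694 days old, event at 10:40
```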

pd.DataFrame generated by watcher.models.Simulator

| Column Name | Data Type | Description |
|---|---|---|
| patient_id | str | pseudo-ID like: simulation0, simulation1, …, simulation255 |
| type | int | 0: demographic, 1: admission, 2: discharge, 3: diagnosis, 4: prescription/injection order, 6: laboratory test result |
| age | str | Patient age as timedelta from date-of-birth or timestamp at event |
| code | str | Medical code or token (e.g., [DSC] for discharge, [ADM] for admission) |
| text | str | Human-readable label for the event (e.g., ‘discharge’ for [DSC], or disease name for ICD-10 code) |
| result | str | Associated value (e.g., lab result, discharge outcome) |
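The following sketch shows one way to work with a DataFrame of this shape using plain pandas. It does not call the simulator: the rows are invented placeholders built by hand with the documented column names, and the age strings, the lab code LAB001, and the text labels are assumptions made only for illustration.

```python
import pandas as pd

# Event type codes as listed in the table above.
EVENT_TYPES = {
    0: "demographic",
    1: "admission",
    2: "discharge",
    3: "diagnosis",
    4: "prescription/injection order",
    6: "laboratory test result",
}

# Hand-made stand-in for the simulator output; every value here is invented.
results = pd.DataFrame(
    {
        "patient_id": ["simulation0", "simulation0",
                       "simulation1", "simulation1", "simulation1"],
        "type": [1, 2, 1, 6, 2],
        "age": ["1000000", "1004320", "2000000", "2000060", "2002880"],
        "code": ["[ADM]", "[DSC]", "[ADM]", "LAB001", "[DSC]"],
        "text": ["admission", "discharge", "admission",
                 "hypothetical lab test", "discharge"],
        "result": [None, None, None, "1.2", None],
    }
)

# Attach a readable label for each event type.
results["type_label"] = results["type"].map(EVENT_TYPES)

# Keep only laboratory test results (type 6).
labs = results[results["type"] == 6]

# Count events per simulated timeline.
events_per_patient = results.groupby("patient_id").size()
print(events_per_patient)
```

Filtering on the integer type column rather than on the text column avoids depending on the exact wording of the labels.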

Clinical Records

Please prepare all the CSV files listed below. These are uploaded into the PostgreSQL database. If a CSV file name contains a wildcard (*), it indicates that the data can be split across multiple files. Make sure all the CSV files are in the same directory.

Note

  • Ideally, the CSV files for medical codes (dx_codes.csv, med_codes.csv, lab_codes.csv) should cover all codes in your dataset. The model can still work with partial code lists, but the readability of inference results may be reduced (some codes may remain untranslated).

  • If you cannot separate medication orders into prescription and injection categories, you may place all records into either of the two CSV files. (Internally, these will be aggregated.)
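Before uploading, it can help to confirm that the directory contains at least one file for each expected name. The sketch below is a stand-alone check using pathlib, not part of the watcher package; the function name find_missing_files is hypothetical, and it treats every pattern as required, which is an assumption given that medication records may all be placed in one of the two order files.

```python
from pathlib import Path

# File names and wildcard patterns as listed in this section.
REQUIRED_PATTERNS = [
    "dx_codes.csv",
    "med_codes.csv",
    "lab_codes.csv",
    "patients.csv",
    "admission_records*.csv",
    "discharge_records*.csv",
    "diagnosis_records*.csv",
    "prescription_order_records*.csv",
    "injection_order_records*.csv",
    "laboratory_test_results*.csv",
]


def find_missing_files(data_dir):
    """Return the patterns that match no file directly inside data_dir."""
    data_dir = Path(data_dir)
    return [
        pattern
        for pattern in REQUIRED_PATTERNS
        if not any(data_dir.glob(pattern))
    ]
```

Calling find_missing_files on the directory that holds the CSV files returns an empty list when every pattern is matched by at least one file.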

dx_codes.csv

A list of diagnosis codes in your clinical dataset.

| Column Name | Data Type | Non-null | Description |
|---|---|---|---|
| item_code | str |  | Diagnosis code (e.g., ICD-10) |
| item_name | str |  | Human-readable label for the code (e.g., disease name) |

med_codes.csv

A list of medication codes in your clinical dataset.

| Column Name | Data Type | Non-null | Description |
|---|---|---|---|
| item_code | str |  | Medication code (e.g., ATC) |
| item_name | str |  | Human-readable label for the code (e.g., drug name) |

lab_codes.csv

A list of laboratory test codes in your clinical dataset.

| Column Name | Data Type | Non-null | Description |
|---|---|---|---|
| item_code | str |  | Laboratory test code (e.g., LOINC, JLAC10) |
| item_name | str |  | Human-readable label for the code (e.g., test name) |
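Because codes that are missing from the code lists may remain untranslated, it can be useful to list them before training. The sketch below compares the item_code values in a records table against a code list; the helper name untranslated_codes is hypothetical, and the small DataFrames are made-up examples using ICD-10 codes.

```python
import pandas as pd


def untranslated_codes(records: pd.DataFrame, code_list: pd.DataFrame) -> set:
    """Return item_code values in records that have no entry in code_list."""
    return set(records["item_code"].dropna()) - set(code_list["item_code"])


# Made-up code list in the dx_codes.csv layout.
dx_codes = pd.DataFrame(
    {
        "item_code": ["I10", "E11"],
        "item_name": ["Essential (primary) hypertension",
                      "Type 2 diabetes mellitus"],
    }
)

# Made-up diagnosis records; J18 has no entry in the code list.
diagnosis_records = pd.DataFrame({"item_code": ["I10", "E11", "J18"]})

print(untranslated_codes(diagnosis_records, dx_codes))  # prints: {'J18'}
```

The same function can be applied to medication and laboratory records with med_codes.csv and lab_codes.csv, since all three code lists share the item_code column.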

patients.csv

Basic demographic information for each patient

Note

  • First and last names are not used by the Watcher AI model. The model does not learn or memorize patient names.

  • Patient name columns exist only for display in the digital-twin EHR app.

  • Therefore, patient names can be deidentified, random, or left empty.

  • Dates of birth are used only to compute patient age at each event. The model does not learn or memorize DOBs.

| Column Name | Data Type | Non-null | Description |
|---|---|---|---|
| patient_id | str |  | Unique identifier for each patient |
| sex | str |  | One of ‘M’ (male), ‘F’ (female), ‘O’ (other), ‘U’ (unknown), ‘A’ (ambiguous), ‘N’ (not applicable) |
| first_name | str |  | First name (Not used for training) |
| last_name | str |  | Last name (Not used for training) |
| date_of_birth | datetime |  | Date of birth (expected format: %Y%m%d) |
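A minimal sketch of a pre-upload check for patients.csv rows follows, covering the allowed sex values and the %Y%m%d date format from the table above. The function validate_patient_row is a hypothetical helper, not part of the watcher package, and the eight-character length check is an added assumption that guards against lenient parsing of shortened dates.

```python
from datetime import datetime

ALLOWED_SEX = {"M", "F", "O", "U", "A", "N"}
DOB_FORMAT = "%Y%m%d"


def validate_patient_row(row: dict) -> list:
    """Return a list of problems found in one patients.csv row."""
    problems = []

    if row.get("sex") not in ALLOWED_SEX:
        problems.append(f"unexpected sex value: {row.get('sex')!r}")

    dob = str(row.get("date_of_birth") or "")
    try:
        # Require exactly eight characters, e.g. 19800412.
        if len(dob) != 8:
            raise ValueError("wrong length")
        datetime.strptime(dob, DOB_FORMAT)
    except ValueError:
        problems.append(f"date_of_birth not in {DOB_FORMAT} format: {dob!r}")

    return problems


# Names may be left empty, as they are not used by the model.
row = {"patient_id": "patient-001", "sex": "F",
       "first_name": "", "last_name": "", "date_of_birth": "19800412"}
print(validate_patient_row(row))  # prints: []
```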

admission_records*.csv

Hospital admissions

| Column Name | Data Type | Non-null | Description |
|---|---|---|---|
| patient_id | str |  | Unique identifier for each patient |
| timestamp | datetime |  | Admission timestamp (format: %Y%m%d %H:%M) |

discharge_records*.csv

Hospital discharge events

| Column Name | Data Type | Non-null | Description |
|---|---|---|---|
| patient_id | str |  | Unique identifier for each patient |
| timestamp | datetime |  | Discharge timestamp (format: %Y%m%d %H:%M) |
| disposition | int |  | Survival status: 1 (survived), 0 (died) |
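The timestamp format %Y%m%d %H:%M is shared by all event record files. The sketch below formats a discharge event with the standard library and writes it as CSV text; the helper discharge_row and the patient ID are hypothetical, and writing a header row with the column names is an assumption about what the loader expects.

```python
import csv
import io
from datetime import datetime

TIMESTAMP_FORMAT = "%Y%m%d %H:%M"


def discharge_row(patient_id: str, discharged_at: datetime, survived: bool) -> dict:
    """Build one row in the discharge_records*.csv layout."""
    return {
        "patient_id": patient_id,
        "timestamp": discharged_at.strftime(TIMESTAMP_FORMAT),
        "disposition": 1 if survived else 0,  # 1: survived, 0: died
    }


# Write to an in-memory buffer here; a real file would be opened instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["patient_id", "timestamp", "disposition"])
writer.writeheader()
writer.writerow(discharge_row("patient-001", datetime(2021, 3, 5, 14, 30), survived=True))

print(buffer.getvalue())
```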

diagnosis_records*.csv

Diagnosis events with codes and provisional flags

| Column Name | Data Type | Non-null | Description |
|---|---|---|---|
| patient_id | str |  | Unique identifier for each patient |
| timestamp | datetime |  | Diagnosis timestamp (format: %Y%m%d %H:%M) |
| item_code | str |  | Diagnosis code (e.g., ICD-10) |
| provisional | int |  | Provisional flag: 1 if provisional, 0 otherwise |

prescription_order_records*.csv

Medication orders

| Column Name | Data Type | Non-null | Description |
|---|---|---|---|
| patient_id | str |  | Unique identifier for each patient |
| timestamp | datetime |  | Prescription order timestamp (format: %Y%m%d %H:%M) |
| item_code | str |  | Medication code (e.g., ATC) |

injection_order_records*.csv

Injectable medication orders

| Column Name | Data Type | Non-null | Description |
|---|---|---|---|
| patient_id | str |  | Unique identifier for each patient |
| timestamp | datetime |  | Injection order timestamp (format: %Y%m%d %H:%M) |
| item_code | str |  | Medication code (e.g., ATC) |

laboratory_test_results*.csv

Laboratory test results including numeric and categorical values

Warning

  • Each laboratory code should be associated with a single unit.

  • If a code is reported with multiple units (e.g., mg/dL and mg/L), users are encouraged to map them to a common unit whenever possible.

  • Although optional, standardizing units is highly recommended for efficient model training.

Note

  • Fill either numeric or nonnumeric, not both.

  • One of numeric or nonnumeric must be present.

| Column Name | Data Type | Non-null | Description |
|---|---|---|---|
| patient_id | str |  | Unique identifier for each patient |
| timestamp | datetime |  | Laboratory test timestamp (format: %Y%m%d %H:%M) |
| item_code | str |  | Laboratory test code (e.g., LOINC, JLAC10) |
| numeric | float |  | Numeric test result |
| unit | str |  | Unit associated with numeric result |
| nonnumeric | str |  | Non-numeric test result |
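The two rules above (exactly one of numeric or nonnumeric per row, and a single unit per laboratory code) can be checked with pandas before upload. This is a sketch under stated assumptions: check_lab_results is a hypothetical helper, the codes CRP and UPRO are invented, and empty cells are assumed to have been read as missing values.

```python
import pandas as pd


def check_lab_results(labs: pd.DataFrame):
    """Return (rows breaking the numeric/nonnumeric rule, codes with several units)."""
    has_numeric = labs["numeric"].notna()
    has_nonnumeric = labs["nonnumeric"].notna()

    # A row is invalid when both fields are filled or both are empty.
    bad_rows = labs[has_numeric == has_nonnumeric]

    # Count distinct units per code among rows with a numeric result.
    units_per_code = labs.loc[has_numeric].groupby("item_code")["unit"].nunique()
    mixed_unit_codes = sorted(units_per_code[units_per_code > 1].index)

    return bad_rows, mixed_unit_codes


# Invented rows: CRP is reported in two units, and the last row fills both fields.
labs = pd.DataFrame(
    {
        "item_code": ["CRP", "CRP", "UPRO", "CRP"],
        "numeric": [0.3, 3.0, None, 1.2],
        "unit": ["mg/dL", "mg/L", None, "mg/dL"],
        "nonnumeric": [None, None, "negative", "high"],
    }
)

bad_rows, mixed_unit_codes = check_lab_results(labs)
print(list(bad_rows.index))  # prints: [3]
print(mixed_unit_codes)      # prints: ['CRP']
```

Codes flagged in the second result are candidates for conversion to a common unit, as recommended in the warning above.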