Table definitions

Simulation Results

Simulator methods such as watcher.models.Simulator.simulate() return a pandas.DataFrame containing the simulation results. The DataFrame has the following columns:

Note

  • By default, the age column represents patient age in minutes, measured from the date of birth (the time of birth is assumed to be 00:00). The hours and minutes of the age therefore match the event’s clock time.

  • When watcher.models.Simulator.simulate() is called with age_as_timestamp=True, the age column holds the timestamp of the event instead of the age.

  • The simulations are sorted by timeline length, from shortest to longest (patient_id='simulation0' is the shortest).
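As a worked example of the default age encoding, the following sketch converts an age in minutes into whole days since birth plus a clock time. It assumes the age is available as an integer number of minutes; the helper name split_age_minutes and the example dates are hypothetical and not part of Watcher.

```python
from datetime import datetime

MINUTES_PER_DAY = 24 * 60


def split_age_minutes(age_minutes: int) -> tuple:
    """Split an age in minutes into (days since birth, hour, minute)."""
    days, minutes_into_day = divmod(age_minutes, MINUTES_PER_DAY)
    hour, minute = divmod(minutes_into_day, 60)
    return days, hour, minute


# Hypothetical patient born on 2000/01/01 (time of birth taken as 00:00)
# with an event recorded at 2000/01/03 14:30.
birth = datetime(2000, 1, 1, 0, 0)
event = datetime(2000, 1, 3, 14, 30)
age_minutes = int((event - birth).total_seconds() // 60)

days, hour, minute = split_age_minutes(age_minutes)
print(age_minutes, days, f"{hour:02d}:{minute:02d}")  # 3750 2 14:30
```

Because the time of birth is taken as 00:00, the remainder after removing whole days is exactly the event's time of day.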

pd.DataFrame generated by watcher.models.Simulator

| Column Name | Data Type | Description |
| --- | --- | --- |
| patient_id | str | pseudo-ID like: simulation0, simulation1, …, simulation255 |
| type | int | 0: demographic, 1: admission, 2: discharge, 3: diagnosis, 4: prescription/injection order, 6: laboratory test result |
| age | str | Patient age as timedelta from date-of-birth or timestamp at event |
| code | str | Medical code or token (e.g., [DSC] for discharge, [ADM] for admission) |
| text | str | Human-readable label for the event (e.g., ‘discharge’ for [DSC], or disease name for ICD-10 code) |
| result | str | Associated value (e.g., lab result, discharge outcome) |
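To illustrate how a result with these columns might be inspected, the following sketch builds a small DataFrame by hand and selects the laboratory test results. Every row is a made-up placeholder rather than real simulator output; in particular the age strings, the LAB001 code, and the result values are assumptions for illustration only.

```python
import pandas as pd

# Event type codes as listed in the column description for "type".
EVENT_TYPES = {
    0: "demographic",
    1: "admission",
    2: "discharge",
    3: "diagnosis",
    4: "prescription/injection order",
    6: "laboratory test result",
}

# Hand-written placeholder rows using the documented column names.
# These values are illustrative only, not real simulator output.
sim = pd.DataFrame(
    {
        "patient_id": ["simulation0", "simulation0", "simulation0", "simulation1"],
        "type": [1, 6, 2, 6],
        "age": ["placeholder"] * 4,
        "code": ["[ADM]", "LAB001", "[DSC]", "LAB001"],
        "text": ["admission", "hypothetical test", "discharge", "hypothetical test"],
        "result": [None, "5.2", None, "7.1"],
    }
)

# Add a readable event type and keep only laboratory test results.
sim["event_type"] = sim["type"].map(EVENT_TYPES)
labs = sim[sim["type"] == 6]

# Number of events per simulated timeline.
events_per_patient = sim.groupby("patient_id").size()

print(labs[["patient_id", "code", "result"]])
print(events_per_patient)
```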

Clinical Records

Please prepare all the CSV files listed below.
These are uploaded into the PostgreSQL database using watcher.db.init_db_with_csv().
Place all the CSV files in a single directory and provide the directory path to the function as data_source.
If a CSV file name contains a wildcard (*), the data can be split across multiple files. The * can be replaced with any string (e.g., admission_records_1.csv, admission_records_2.csv, etc.).

Note

  • Ideally, the CSV files for medical codes (dx_codes.csv, med_codes.csv, lab_codes.csv) should cover all codes in your dataset. The model can still work with partial code lists, but the readability of inference results may be reduced (some codes may remain untranslated).

  • If you cannot separate medication orders into prescription and injection categories, you may place all records into either of the two CSV files. (Internally, these will be aggregated.)
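Before loading, it can help to check which of the expected file name patterns have no matching file in the data directory. The following is a minimal sketch using only the standard library; find_missing_patterns is a hypothetical helper, not part of Watcher, and whether a missing prescription or injection file is acceptable in your setup is an assumption to verify.

```python
import tempfile
from pathlib import Path

# File name patterns taken from the tables in this section.
EXPECTED_PATTERNS = [
    "dx_codes.csv",
    "med_codes.csv",
    "lab_codes.csv",
    "patients.csv",
    "admission_records*.csv",
    "discharge_records*.csv",
    "diagnosis_records*.csv",
    "prescription_order_records*.csv",
    "injection_order_records*.csv",
    "laboratory_test_results*.csv",
]


def find_missing_patterns(data_source):
    """Return the expected patterns that match no file in the directory."""
    directory = Path(data_source)
    return [p for p in EXPECTED_PATTERNS if not any(directory.glob(p))]


# Demonstration on a temporary directory holding only three files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["patients.csv", "admission_records_1.csv", "admission_records_2.csv"]:
        (Path(tmp) / name).touch()
    missing = find_missing_patterns(tmp)

print(missing)
```

In the demonstration, patients.csv and the two admission files are found, so the remaining eight patterns are reported as missing.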

dx_codes.csv

A list of diagnosis codes in your clinical dataset.

Column Name

Data Type

Non-null

Description

item_code

str

Diagnosis code (e.g., ICD-10)

item_name

str

Human-readable label for the code (e.g., disease name)
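One way to see whether a code list covers a dataset is to compare the codes used in the records with the codes in the list. The following sketch does this for diagnosis codes with pandas. The in-memory tables and the DX001-style codes are placeholders; with real files you would typically load them with pd.read_csv(..., dtype=str).

```python
import pandas as pd

# Placeholder code list in the dx_codes.csv layout.
dx_codes = pd.DataFrame(
    {
        "item_code": ["DX001", "DX002"],
        "item_name": ["hypothetical disease A", "hypothetical disease B"],
    }
)

# Placeholder diagnosis records; only item_code matters for this check.
diagnosis_records = pd.DataFrame(
    {
        "patient_id": ["P0001", "P0001", "P0002"],
        "item_code": ["DX001", "DX003", "DX001"],
    }
)

# Codes that appear in the records but have no entry in the code list.
used = set(diagnosis_records["item_code"])
known = set(dx_codes["item_code"])
uncovered = sorted(used - known)

print(uncovered)  # ['DX003']
```

The same comparison applies to med_codes.csv and lab_codes.csv against their respective record files.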

med_codes.csv

A list of medication codes in your clinical dataset.

Column Name

Data Type

Non-null

Description

item_code

str

Medication code (e.g., ATC)

item_name

str

Human-readable label for the code (e.g., drug name)

lab_codes.csv

A list of laboratory test codes in your clinical dataset.

Column Name

Data Type

Non-null

Description

item_code

str

Laboratory test code (e.g., LOINC, JLAC10)

item_name

str

Human-readable label for the code (e.g., test name)

patients.csv

Basic demographic information for each patient

Note

  • First and last names are not used by the Watcher AI model. The model does not learn or memorize patient names.

  • Patient name columns exist only for display in the digital-twin EHR app.

  • Therefore, patient names can be deidentified, random, or left empty.

  • Dates of birth are used only to compute patient age at each event. The model does not learn or memorize DOBs.

Column Name

Data Type

Non-null

Description

patient_id

str

Unique identifier for each patient

sex

str

One of ‘M’ (male), ‘F’ (female), ‘O’ (other), ‘U’ (unknown), ‘A’ (ambiguous), ‘N’ (not applicable)

first_name

str

First name (Not used for training)

last_name

str

Last name (Not used for training)

date_of_birth

datetime

Date of birth (expected format: %Y/%m/%d)
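The following sketch shows one way to write a table in this layout with pandas so that dates of birth come out in the %Y/%m/%d format. The patient rows are invented placeholders, names are left empty as permitted by the note on patient names, and the column order shown is an assumption that follows the table.

```python
import io

import pandas as pd

# Invented placeholder patients; names are intentionally left empty.
patients = pd.DataFrame(
    {
        "patient_id": ["P0001", "P0002"],
        "sex": ["F", "M"],
        "first_name": ["", ""],
        "last_name": ["", ""],
        "date_of_birth": pd.to_datetime(["1980-02-29", "1975-11-03"]),
    }
)

# date_format controls how datetime columns are rendered in the CSV.
buffer = io.StringIO()
patients.to_csv(buffer, index=False, date_format="%Y/%m/%d")
csv_text = buffer.getvalue()

print(csv_text)
```

To write to disk instead, pass a file path such as patients.csv in place of the buffer.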

admission_records*.csv

Hospital admissions

Column Name

Data Type

Non-null

Description

patient_id

str

Unique identifier for each patient

timestamp

datetime

Admission timestamp (format: %Y/%m/%d %H:%M)
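Timestamps in the record files share the %Y/%m/%d %H:%M format, so a quick pre-check with the standard library can catch values exported in another layout. This is a minimal sketch; is_valid_timestamp is a hypothetical helper, and how Watcher itself parses or rejects timestamps is not specified here.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y/%m/%d %H:%M"


def is_valid_timestamp(value: str) -> bool:
    """Return True if the string parses with the documented format."""
    try:
        datetime.strptime(value, TIMESTAMP_FORMAT)
    except ValueError:
        return False
    return True


print(is_valid_timestamp("2021/04/01 09:30"))     # True
print(is_valid_timestamp("2021-04-01 09:30"))     # False: wrong date separator
print(is_valid_timestamp("2021/04/01 09:30:00"))  # False: seconds not expected
```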

discharge_records*.csv

Hospital discharge events

Column Name

Data Type

Non-null

Description

patient_id

str

Unique identifier for each patient

timestamp

datetime

Discharge timestamp (format: %Y/%m/%d %H:%M)

disposition

int

Survival status: 1 (survived), 0 (died)

diagnosis_records*.csv

Diagnosis events with codes and provisional flags

Column Name

Data Type

Non-null

Description

patient_id

str

Unique identifier for each patient

timestamp

datetime

Diagnosis timestamp (format: %Y/%m/%d %H:%M)

item_code

str

Diagnosis code (e.g., ICD-10)

provisional

int

Provisional flag: 1 if provisional, 0 otherwise

prescription_order_records*.csv

Medication orders

Column Name

Data Type

Non-null

Description

patient_id

str

Unique identifier for each patient

timestamp

datetime

Prescription order timestamp (format: %Y/%m/%d %H:%M)

item_code

str

Medication code (e.g., ATC)

injection_order_records*.csv

Injectable medication orders

Column Name

Data Type

Non-null

Description

patient_id

str

Unique identifier for each patient

timestamp

datetime

Injection order timestamp (format: %Y/%m/%d %H:%M)

item_code

str

Medication code (e.g., ATC)

laboratory_test_results*.csv

Laboratory test results including numeric and categorical values

Warning

  • Each laboratory code should be associated with a single unit.

  • If a code is reported with multiple units (e.g., mg/dL and mg/L), users are encouraged to map them to a common unit whenever possible.

  • Although optional, standardizing units is highly recommended for efficient model training.

Note

  • Fill either numeric or nonnumeric, not both.

  • One of numeric or nonnumeric must be present.

Column Name

Data Type

Non-null

Description

patient_id

str

Unique identifier for each patient

timestamp

datetime

Laboratory test timestamp (format: %Y/%m/%d %H:%M)

item_code

str

Laboratory test code (e.g., LOINC, JLAC10)

numeric

float

Numeric test result

unit

str

Unit associated with numeric result

nonnumeric

str

Non-numeric test result
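The two rules for this file (exactly one of numeric or nonnumeric per row, and a single unit per laboratory code) can be checked with pandas before loading. The following sketch uses invented placeholder rows and LAB001-style codes; it flags problems but does not convert units, and the mapping to a common unit is left to you.

```python
import pandas as pd

# Invented placeholder rows in the laboratory_test_results*.csv layout.
labs = pd.DataFrame(
    {
        "patient_id": ["P0001", "P0001", "P0002", "P0002"],
        "timestamp": ["2021/04/01 09:30"] * 4,
        "item_code": ["LAB001", "LAB002", "LAB001", "LAB002"],
        "numeric": [5.2, None, 52.0, 1.0],
        "unit": ["mg/dL", None, "mg/L", None],
        "nonnumeric": [None, "positive", None, "negative"],
    }
)

# Rule 1: exactly one of numeric / nonnumeric should be filled per row.
filled = labs[["numeric", "nonnumeric"]].notna().sum(axis=1)
bad_rows = labs[filled != 1]

# Rule 2: each laboratory code should be reported with a single unit.
units_per_code = labs.dropna(subset=["unit"]).groupby("item_code")["unit"].nunique()
mixed_unit_codes = units_per_code[units_per_code > 1].index.tolist()

print(bad_rows.index.tolist())  # [3]: both numeric and nonnumeric are filled
print(mixed_unit_codes)         # ['LAB001']: reported in mg/dL and mg/L
```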