Table definitions

Simulation Results 

Simulator methods such as watcher.models.Simulator.simulate() return a pandas.DataFrame containing the simulation results. The DataFrame has the following columns:

Note

By default, the age column represents patient age in minutes, measured from the date of birth (assumed to start at 00:00). The hours-and-minutes part of the age therefore matches the event’s clock time.
If you call watcher.models.Simulator.simulate(age_as_timestamp=True), the age column holds the timestamp of the event instead of the age.
The simulations are sorted by timeline length, from shortest to longest (patient_id='simulation0' is the shortest).
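The relationship between age in minutes and clock time can be checked with plain arithmetic. The sketch below uses a made-up date of birth and event time (both hypothetical) and does not call the Watcher API; it only illustrates why an age counted from midnight carries the event's clock time.

```python
from datetime import datetime

# Hypothetical patient: the date of birth is treated as midnight (00:00).
date_of_birth = datetime(1980, 5, 17, 0, 0)
# Hypothetical event time.
event_time = datetime(2020, 3, 2, 14, 35)

# Age in whole minutes since the (midnight) date of birth.
age_minutes = int((event_time - date_of_birth).total_seconds() // 60)

# Because the origin is midnight, the remainder after removing whole days
# is the number of minutes since midnight on the day of the event.
minutes_into_day = age_minutes % (24 * 60)
clock_hour, clock_minute = divmod(minutes_into_day, 60)

print(clock_hour, clock_minute)  # 14 35
```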

pd.DataFrame generated by watcher.models.Simulator
Column Name	Data Type	Description
patient_id	str	pseudo-ID like: simulation0, simulation1, …, simulation255
type	int	0: demographic, 1: admission, 2: discharge, 3: diagnosis, 4: prescription/injection order, 6: laboratory test result
age	str	Patient age as a timedelta from date of birth, or the event timestamp when age_as_timestamp=True
code	str	Medical code or token (e.g., [DSC] for discharge, [ADM] for admission)
text	str	Human-readable label for the event (e.g., ‘discharge’ for [DSC], or disease name for ICD-10 code)
result	str	Associated value (e.g., lab result, discharge outcome)
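As a rough illustration of working with a DataFrame of this shape, the sketch below builds a small hand-written table with the documented columns, attaches a readable label to each type code, and keeps only the laboratory rows. The rows, the LAB001 code, and the age strings are placeholders (the exact string format of age is not specified here); real output comes from the simulator.

```python
import pandas as pd

# Event type codes as documented in the table above.
TYPE_LABELS = {
    0: "demographic",
    1: "admission",
    2: "discharge",
    3: "diagnosis",
    4: "prescription/injection order",
    6: "laboratory test result",
}

# Hand-written stand-in for simulator output (all values are hypothetical).
df = pd.DataFrame(
    {
        "patient_id": ["simulation0", "simulation0", "simulation0", "simulation1"],
        "type": [1, 3, 6, 6],
        "age": ["t0", "t1", "t2", "t3"],  # placeholders, not the real format
        "code": ["[ADM]", "I10", "LAB001", "LAB001"],
        "text": ["admission", "Essential (primary) hypertension",
                 "hypothetical lab test", "hypothetical lab test"],
        "result": [None, None, "12.3", "8.9"],
    }
)

# Add a readable label for each event type.
df["type_label"] = df["type"].map(TYPE_LABELS)

# Keep only laboratory test results (type 6) and count them per simulation.
labs = df[df["type"] == 6]
labs_per_patient = labs.groupby("patient_id").size()

print(labs_per_patient)
```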

Please prepare all the CSV files listed below; they are uploaded into the PostgreSQL database. A wildcard (*) in a CSV file name indicates that the data can be split across multiple files. Make sure all the CSV files are in the same directory.

Note

Ideally, the CSV files for medical codes (dx_codes.csv, med_codes.csv, lab_codes.csv) should cover all codes in your dataset. The model can still work with partial code lists, but the readability of inference results may be reduced (some codes may remain untranslated).
If you cannot separate medication orders into prescription and injection categories, you may place all records into either of the two CSV files. (Internally, these will be aggregated.)
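Before uploading, it can help to confirm that every listed file name or wildcard pattern matches at least one file in the directory. The helper below is a hypothetical pre-flight check, not part of Watcher. It flags a prescription or injection pattern with no matching file, even though the note above allows all medication records to go into one of the two files; whether the other file may be absent is not stated here, so treat such a flag as a prompt to check rather than an error.

```python
from pathlib import Path

# File names and wildcard patterns as listed in this section.
REQUIRED_PATTERNS = [
    "dx_codes.csv",
    "med_codes.csv",
    "lab_codes.csv",
    "patients.csv",
    "admission_records*.csv",
    "discharge_records*.csv",
    "diagnosis_records*.csv",
    "prescription_order_records*.csv",
    "injection_order_records*.csv",
    "laboratory_test_results*.csv",
]


def find_missing_patterns(data_dir):
    """Return the patterns that match no file directly inside data_dir."""
    data_dir = Path(data_dir)
    return [p for p in REQUIRED_PATTERNS if not any(data_dir.glob(p))]
```

Calling find_missing_patterns("path/to/csv_dir") returns an empty list when every pattern is matched by at least one file.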

dx_codes.csv 

A list of diagnosis codes in your clinical dataset.

Column Name	Data Type	Non-null	Description
item_code	str	✓	Diagnosis code (e.g., ICD-10)
item_name	str	✓	Human-readable label for the code (e.g., disease name)

med_codes.csv 

A list of medication codes in your clinical dataset.

Column Name	Data Type	Non-null	Description
item_code	str	✓	Medication code (e.g., ATC)
item_name	str	✓	Human-readable label for the code (e.g., drug name)

lab_codes.csv 

A list of laboratory test codes in your clinical dataset.

Column Name	Data Type	Non-null	Description
item_code	str	✓	Laboratory test code (e.g., LOINC, JLAC10)
item_name	str	✓	Human-readable label for the code (e.g., test name)

patients.csv 

Basic demographic information for each patient

Note

First and last names are not used by the Watcher AI model. The model does not learn or memorize patient names.
Patient name columns exist only for display in the digital-twin EHR app.
Therefore, patient names can be deidentified, random, or left empty.
Dates of birth are used only to compute patient age at each event. The model does not learn or memorize DOBs.

Column Name	Data Type	Non-null	Description
patient_id	str	✓	Unique identifier for each patient
sex	str	✓	One of 'M' (male), 'F' (female), 'O' (other), 'U' (unknown), 'A' (ambiguous), 'N' (not applicable)
first_name	str		First name (Not used for training)
last_name	str		Last name (Not used for training)
date_of_birth	datetime	✓	Date of birth (expected format: %Y%m%d)
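The constraints in this table can be checked with pandas before upload. The following is a minimal sketch using a small in-memory stand-in for patients.csv (the rows are hypothetical, and the presence of a header row is an assumption); it flags sex values outside the allowed set and dates of birth that do not match %Y%m%d.

```python
import io

import pandas as pd

ALLOWED_SEX = {"M", "F", "O", "U", "A", "N"}

# Tiny in-memory stand-in for patients.csv (hypothetical rows).
csv_text = """patient_id,sex,first_name,last_name,date_of_birth
p001,F,,,19800517
p002,X,,,19751231
p003,M,,,1990-01-01
"""

# Read every column as text so IDs and dates keep their exact form.
patients = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Rows whose sex value is not one of the documented codes.
bad_sex = patients[~patients["sex"].isin(ALLOWED_SEX)]

# errors="coerce" turns values that do not match %Y%m%d into NaT.
parsed_dob = pd.to_datetime(
    patients["date_of_birth"], format="%Y%m%d", errors="coerce"
)
bad_dob = patients[parsed_dob.isna()]

print(bad_sex["patient_id"].tolist())
print(bad_dob["patient_id"].tolist())
```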

admission_records*.csv 

Hospital admissions

Column Name	Data Type	Non-null	Description
patient_id	str	✓	Unique identifier for each patient
timestamp	datetime	✓	Admission timestamp (format: %Y%m%d %H:%M)

discharge_records*.csv 

Hospital discharge events

Column Name	Data Type	Non-null	Description
patient_id	str	✓	Unique identifier for each patient
timestamp	datetime	✓	Discharge timestamp (format: %Y%m%d %H:%M)
disposition	int	✓	Survival status: 1 (survived), 0 (died)
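If the source timestamps are held as pandas datetimes, they can be rendered in the %Y%m%d %H:%M format before writing the CSV. The sketch below uses hypothetical discharge rows and assumes the column names from the table above appear as the header row.

```python
import io

import pandas as pd

# Hypothetical discharge events with full-precision timestamps.
records = pd.DataFrame(
    {
        "patient_id": ["p001", "p002"],
        "timestamp": pd.to_datetime(
            ["2020-03-02 14:35:10", "2020-03-05 09:00:00"]
        ),
        "disposition": [1, 0],  # 1: survived, 0: died
    }
)

# Render timestamps as text in the expected format (seconds are dropped).
records["timestamp"] = records["timestamp"].dt.strftime("%Y%m%d %H:%M")

# Write to an in-memory buffer here; pass a file path to write a real file.
buffer = io.StringIO()
records.to_csv(buffer, index=False)
csv_lines = buffer.getvalue().splitlines()
```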

diagnosis_records*.csv 

Diagnosis events with codes and provisional flags

Column Name	Data Type	Non-null	Description
patient_id	str	✓	Unique identifier for each patient
timestamp	datetime	✓	Diagnosis timestamp (format: %Y%m%d %H:%M)
item_code	str	✓	Diagnosis code (e.g., ICD-10)
provisional	int	✓	Provisional flag: 1 if provisional, 0 otherwise

prescription_order_records*.csv 

Medication orders

Column Name	Data Type	Non-null	Description
patient_id	str	✓	Unique identifier for each patient
timestamp	datetime	✓	Prescription order timestamp (format: %Y%m%d %H:%M)
item_code	str	✓	Medication code (e.g., ATC)

injection_order_records*.csv 

Injectable medication orders

Column Name	Data Type	Non-null	Description
patient_id	str	✓	Unique identifier for each patient
timestamp	datetime	✓	Injection order timestamp (format: %Y%m%d %H:%M)
item_code	str	✓	Medication code (e.g., ATC)

laboratory_test_results*.csv 

Laboratory test results including numeric and categorical values

Warning

Each laboratory code should be associated with a single unit.
If a code is reported with multiple units (e.g., mg/dL and mg/L), users are encouraged to map them to a common unit whenever possible.
Although optional, standardizing units is highly recommended for efficient model training.

Note

Fill exactly one of numeric or nonnumeric in each row.
Do not fill both, and do not leave both empty.

Column Name	Data Type	Non-null	Description
patient_id	str	✓	Unique identifier for each patient
timestamp	datetime	✓	Laboratory test timestamp (format: %Y%m%d %H:%M)
item_code	str	✓	Laboratory test code (e.g., LOINC, JLAC10)
numeric	float		Numeric test result
unit	str		Unit associated with numeric result
nonnumeric	str		Non-numeric test result
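The two rules for this file (exactly one of numeric or nonnumeric per row, and a single unit per laboratory code) can be checked as follows. This is a sketch on hypothetical rows with made-up codes LAB_A and LAB_B, not a check performed by Watcher itself.

```python
import pandas as pd

# Hypothetical laboratory rows using the documented column names.
labs = pd.DataFrame(
    {
        "patient_id": ["p001", "p001", "p002", "p002"],
        "timestamp": ["20200302 14:35", "20200302 14:35",
                      "20200305 09:00", "20200305 09:00"],
        "item_code": ["LAB_A", "LAB_B", "LAB_A", "LAB_B"],
        "numeric": [1.2, None, 12.0, 5.0],
        "unit": ["mg/dL", None, "mg/L", "mmol/L"],
        "nonnumeric": [None, "positive", None, "high"],
    }
)

has_numeric = labs["numeric"].notna()
has_text = labs["nonnumeric"].notna()

# XOR is True when exactly one of the two is filled; the rows where it is
# False have either both values or neither.
invalid_rows = labs[~(has_numeric ^ has_text)]

# Count distinct non-null units per code; more than one means mixed units.
units_per_code = (
    labs.dropna(subset=["unit"]).groupby("item_code")["unit"].nunique()
)
mixed_unit_codes = units_per_code[units_per_code > 1].index.tolist()

print(invalid_rows.index.tolist())  # [3]
print(mixed_unit_codes)             # ['LAB_A']
```

Here the last row is flagged because it fills both numeric and nonnumeric, and LAB_A is flagged because it appears with both mg/dL and mg/L.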