Table definitions
Simulation Results
The simulator models (e.g., watcher.models.Simulator.simulate()) return a pandas.DataFrame containing the simulation results.
The DataFrame has the following columns:
Note
By default, the age column represents patient age in minutes, relative to the date of birth (time of birth assumed to be 00:00). Thus, the hours and minutes of age match the event's clock time.
With watcher.models.Simulator.simulate(age_as_timestamp=True), the age column holds the timestamp of the event instead of the age.
The simulations are sorted by timeline length, from shortest to longest (patient_id='simulation0' is the shortest).
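The default age convention can be illustrated with plain datetime arithmetic. This is a minimal sketch with made-up dates, not the Watcher implementation: it takes the date of birth at 00:00, subtracts it from an event time, and shows that the hour and minute components of the resulting age equal the event's clock time.

```python
from datetime import datetime

# Hypothetical values, for illustration only.
date_of_birth = datetime(1980, 5, 17)       # time of birth assumed to be 00:00
event_time = datetime(2024, 3, 2, 14, 35)   # an event recorded at 14:35

# Age as a timedelta since the (midnight) date of birth.
age = event_time - date_of_birth

# Hours and minutes left over once whole days are removed.
age_hours, remainder = divmod(age.seconds, 3600)
age_minutes = remainder // 60

print(age_hours, age_minutes)
```

Because the date of birth carries no time of day, the leftover hours and minutes of the age are exactly the clock time of the event.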
| Column Name | Data Type | Description |
|---|---|---|
| patient_id | str | Pseudo-ID such as simulation0, simulation1, …, simulation255 |
| type | int | 0: demographic, 1: admission, 2: discharge, 3: diagnosis, 4: prescription/injection order, 6: laboratory test result |
| age | str | Patient age as a timedelta from the date of birth, or the event timestamp when age_as_timestamp=True |
| code | str | Medical code or token (e.g., [DSC] for discharge, [ADM] for admission) |
| text | str | Human-readable label for the event (e.g., 'discharge' for [DSC], or the disease name for an ICD-10 code) |
| result | str | Associated value (e.g., lab result, discharge outcome) |
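As a rough sketch of how a results frame with these columns might be explored, the example below builds a small mock DataFrame and filters it by event type. The rows, the code LAB001, and the helper name EVENT_TYPES are invented for illustration; a real frame would come from simulate(). The age column is left out because its string format is not specified here.

```python
import pandas as pd

# Event type codes as listed in the column table.
EVENT_TYPES = {
    0: "demographic",
    1: "admission",
    2: "discharge",
    3: "diagnosis",
    4: "prescription/injection order",
    6: "laboratory test result",
}

# Mock rows standing in for simulation output; every value is invented.
results = pd.DataFrame(
    {
        "patient_id": ["simulation0", "simulation0", "simulation1", "simulation1"],
        "type": [1, 6, 1, 2],
        "code": ["[ADM]", "LAB001", "[ADM]", "[DSC]"],
        "text": ["admission", "hypothetical lab test", "admission", "discharge"],
        "result": ["", "13.5", "", ""],
    }
)

# Attach a readable name for each event type.
results["type_name"] = results["type"].map(EVENT_TYPES)

# Keep only laboratory test results (type 6).
labs = results[results["type"] == 6]

# Count events per simulated patient.
events_per_patient = results.groupby("patient_id").size()

print(labs[["patient_id", "code", "result"]])
print(events_per_patient)
```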
Clinical Records
Please prepare all the CSV files listed below. These are uploaded into the PostgreSQL database. If a CSV file name contains a wildcard (*), the data can be split across multiple files whose names match that pattern. Make sure all the CSV files are in the same directory.
Note
Ideally, the CSV files for medical codes (dx_codes.csv, med_codes.csv, lab_codes.csv) should cover all codes in your dataset. The model can still work with partial code lists, but the readability of inference results may be reduced (some codes may remain untranslated).
If you cannot separate medication orders into prescription and injection categories, you may place all records into either of the two CSV files. (Internally, these will be aggregated.)
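One way to produce split files whose names match a wildcard pattern is sketched below. The helper write_in_chunks, the `_<n>` suffix scheme, and the choice to repeat the header row in every file are assumptions made for illustration; the only stated requirement is that file names match the pattern and sit in the same directory.

```python
import csv
import fnmatch
import tempfile
from pathlib import Path


def write_in_chunks(rows, header, directory, stem, chunk_size):
    """Write rows to <stem>_<n>.csv files holding at most chunk_size rows each."""
    paths = []
    for n, start in enumerate(range(0, len(rows), chunk_size)):
        path = Path(directory) / f"{stem}_{n}.csv"
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)  # assumption: each file carries its own header
            writer.writerows(rows[start:start + chunk_size])
        paths.append(path)
    return paths


# Invented admission rows (patient_id, timestamp).
rows = [
    ("p001", "20240101 09:30"),
    ("p002", "20240102 13:05"),
    ("p001", "20240210 22:45"),
]

out_dir = tempfile.mkdtemp()
paths = write_in_chunks(rows, ["patient_id", "timestamp"], out_dir, "admission_records", 2)

# Check that every generated name matches the documented pattern.
names = sorted(p.name for p in paths)
matches = all(fnmatch.fnmatch(name, "admission_records*.csv") for name in names)
print(names, matches)
```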
dx_codes.csv
A list of diagnosis codes in your clinical dataset.
| Column Name | Data Type | Non-null | Description |
|---|---|---|---|
| item_code | str | ✓ | Diagnosis code (e.g., ICD-10) |
| item_name | str | ✓ | Human-readable label for the code (e.g., disease name) |
med_codes.csv
A list of medication codes in your clinical dataset.
| Column Name | Data Type | Non-null | Description |
|---|---|---|---|
| item_code | str | ✓ | Medication code (e.g., ATC) |
| item_name | str | ✓ | Human-readable label for the code (e.g., drug name) |
lab_codes.csv
A list of laboratory test codes in your clinical dataset.
| Column Name | Data Type | Non-null | Description |
|---|---|---|---|
| item_code | str | ✓ | Laboratory test code (e.g., LOINC, JLAC10) |
| item_name | str | ✓ | Human-readable label for the code (e.g., test name) |
patients.csv
Basic demographic information for each patient
Note
First and last names are not used by the Watcher AI model. The model does not learn or memorize patient names.
Patient name columns exist only for display in the digital-twin EHR app.
Therefore, patient names can be deidentified, random, or left empty.
Dates of birth are used only to compute patient age at each event. The model does not learn or memorize DOBs.
| Column Name | Data Type | Non-null | Description |
|---|---|---|---|
| patient_id | str | ✓ | Unique identifier for each patient |
| sex | str | ✓ | One of ‘M’ (male), ‘F’ (female), ‘O’ (other), ‘U’ (unknown), ‘A’ (ambiguous), ‘N’ (not applicable) |
| first_name | str | | First name (Not used for training) |
| last_name | str | | Last name (Not used for training) |
| date_of_birth | datetime | ✓ | Date of birth (expected format: %Y%m%d) |
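A row for patients.csv could be assembled along the following lines. This is a sketch only: the helper patient_row and the sample values are hypothetical, and the column order simply follows the table. It formats the date of birth as %Y%m%d, rejects sex codes outside the listed set, and leaves the optional name columns empty.

```python
import csv
import io
from datetime import date

ALLOWED_SEX = {"M", "F", "O", "U", "A", "N"}
FIELDS = ["patient_id", "sex", "first_name", "last_name", "date_of_birth"]


def patient_row(patient_id, sex, date_of_birth, first_name="", last_name=""):
    """Build one patients.csv row, validating the sex code."""
    if sex not in ALLOWED_SEX:
        raise ValueError(f"unexpected sex code: {sex!r}")
    return {
        "patient_id": patient_id,
        "sex": sex,
        "first_name": first_name,   # optional; may stay empty
        "last_name": last_name,     # optional; may stay empty
        "date_of_birth": date_of_birth.strftime("%Y%m%d"),
    }


# Write to an in-memory buffer; a real script would open patients.csv instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerow(patient_row("p001", "F", date(1980, 5, 17)))

csv_text = buffer.getvalue()
print(csv_text)
```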
admission_records*.csv
Hospital admissions
| Column Name | Data Type | Non-null | Description |
|---|---|---|---|
| patient_id | str | ✓ | Unique identifier for each patient |
| timestamp | datetime | ✓ | Admission timestamp (format: %Y%m%d %H:%M) |
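The timestamp format %Y%m%d %H:%M is shared by all the event record files. The sketch below shows how a value might be formatted and checked with the standard library before writing the CSVs; the helper is_valid_timestamp is a hypothetical name, not part of Watcher.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y%m%d %H:%M"


def is_valid_timestamp(text):
    """Return True if text parses with the expected timestamp format."""
    try:
        datetime.strptime(text, TIMESTAMP_FORMAT)
    except ValueError:
        return False
    return True


# Format an invented admission time for the CSV.
formatted = datetime(2024, 3, 2, 14, 35).strftime(TIMESTAMP_FORMAT)
print(formatted)

# An ISO-style string with dashes does not match the expected format.
print(is_valid_timestamp(formatted), is_valid_timestamp("2024-03-02 14:35"))
```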
discharge_records*.csv
Hospital discharge events
| Column Name | Data Type | Non-null | Description |
|---|---|---|---|
| patient_id | str | ✓ | Unique identifier for each patient |
| timestamp | datetime | ✓ | Discharge timestamp (format: %Y%m%d %H:%M) |
| disposition | int | ✓ | Survival status: 1 (survived), 0 (died) |
diagnosis_records*.csv
Diagnosis events with codes and provisional flags
| Column Name | Data Type | Non-null | Description |
|---|---|---|---|
| patient_id | str | ✓ | Unique identifier for each patient |
| timestamp | datetime | ✓ | Diagnosis timestamp (format: %Y%m%d %H:%M) |
| item_code | str | ✓ | Diagnosis code (e.g., ICD-10) |
| provisional | int | ✓ | Provisional flag: 1 if provisional, 0 otherwise |
prescription_order_records*.csv
Medication orders
| Column Name | Data Type | Non-null | Description |
|---|---|---|---|
| patient_id | str | ✓ | Unique identifier for each patient |
| timestamp | datetime | ✓ | Prescription order timestamp (format: %Y%m%d %H:%M) |
| item_code | str | ✓ | Medication code (e.g., ATC) |
injection_order_records*.csv
Injectable medication orders
| Column Name | Data Type | Non-null | Description |
|---|---|---|---|
| patient_id | str | ✓ | Unique identifier for each patient |
| timestamp | datetime | ✓ | Injection order timestamp (format: %Y%m%d %H:%M) |
| item_code | str | ✓ | Medication code (e.g., ATC) |
laboratory_test_results*.csv
Laboratory test results including numeric and categorical values
Warning
Each laboratory code should be associated with a single unit.
If a code is reported with multiple units (e.g., mg/dL and mg/L), users are encouraged to map them to a common unit whenever possible.
Although optional, standardizing units is highly recommended for efficient model training.
Note
Fill either numeric or nonnumeric, not both.
One of numeric or nonnumeric must be present.
| Column Name | Data Type | Non-null | Description |
|---|---|---|---|
| patient_id | str | ✓ | Unique identifier for each patient |
| timestamp | datetime | ✓ | Laboratory test timestamp (format: %Y%m%d %H:%M) |
| item_code | str | ✓ | Laboratory test code (e.g., LOINC, JLAC10) |
| numeric | float | | Numeric test result |
| unit | str | | Unit associated with numeric result |
| nonnumeric | str | | Non-numeric test result |
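Before uploading, the two rules for laboratory rows (exactly one of numeric or nonnumeric filled, and a single unit per code) can be checked with a small script. The following is a sketch under assumptions: check_lab_rows is a hypothetical helper, empty CSV cells are treated as empty strings, and the codes and values are invented.

```python
from collections import defaultdict


def check_lab_rows(rows):
    """Return (indices of rows breaking the numeric/nonnumeric rule,
    codes reported with more than one unit)."""
    bad_rows = []
    units_by_code = defaultdict(set)
    for i, row in enumerate(rows):
        has_numeric = row.get("numeric") not in (None, "")
        has_nonnumeric = row.get("nonnumeric") not in (None, "")
        # Both filled or both empty violates the rule.
        if has_numeric == has_nonnumeric:
            bad_rows.append(i)
        if row.get("unit"):
            units_by_code[row["item_code"]].add(row["unit"])
    mixed_units = {
        code: sorted(units) for code, units in units_by_code.items() if len(units) > 1
    }
    return bad_rows, mixed_units


# Invented rows using hypothetical codes.
rows = [
    {"item_code": "CRP", "numeric": "0.8", "unit": "mg/dL", "nonnumeric": ""},
    {"item_code": "CRP", "numeric": "12", "unit": "mg/L", "nonnumeric": ""},
    {"item_code": "UPRO", "numeric": "", "unit": "", "nonnumeric": "negative"},
    {"item_code": "UPRO", "numeric": "1.0", "unit": "", "nonnumeric": "positive"},
    {"item_code": "UPRO", "numeric": "", "unit": "", "nonnumeric": ""},
]

bad_rows, mixed_units = check_lab_rows(rows)
print(bad_rows)      # rows with both or neither value filled
print(mixed_units)   # codes that appear with more than one unit
```

Rows flagged in bad_rows need one of the two value columns corrected, and codes listed in mixed_units are candidates for conversion to a common unit.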