Abstract Type : Oral presentation
Abstract Submission No.: A-0623
Abstract Topic : Dialysis

Real-Time Optimization of Ultrafiltration and Blood Flow Rate in Hemodialysis Using Offline Reinforcement Learning

Min Woo Kang
Department of Internal Medicine-Nephrology, Korea University Guro Hospital, Korea, Republic of

Objectives : Intradialytic hypotension (IDH) and overly aggressive ultrafiltration (UF) remain key barriers to safe, individualized hemodialysis. Using the HEMOBP dataset, we developed an offline reinforcement learning (RL) agent that recommends real-time adjustments of UF and blood flow rate (BFR).
Methods : We formulated hemodialysis management as an offline Markov decision process with 10-minute decision intervals over a 4-hour horizon (24 decision steps per session). The state combined hemodynamic and dialysis-machine time-series signals, aggregated within each 10-minute bin, with static patient- and session-level characteristics. An Implicit Q-Learning (IQL) policy with a recurrent encoder and twin critics was trained to output bounded continuous actions for UF and BFR. Off-policy evaluation (OPE) used weighted importance sampling (WIS)-based estimators and fitted Q evaluation (FQE), and uncertainty was summarized as 95% confidence intervals (CIs) derived from the OPE procedures.
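The Methods name Implicit Q-Learning with twin critics and bounded continuous actions but do not state its objectives. As a minimal sketch of the standard IQL losses, assuming scalar critic and value outputs have already been produced by some network (the recurrent encoder is omitted), the functions below show the expectile value loss, the temporal-difference critic target, and the clipped advantage weights used for policy extraction. The function names, the hyperparameter values (tau = 0.7, beta = 3.0, gamma = 0.99), and the tanh squashing used to bound actions are illustrative assumptions, not settings reported in this abstract.

```python
import math
from typing import List, Sequence


def expectile_loss(diffs: Sequence[float], tau: float = 0.7) -> float:
    """Mean asymmetric squared loss |tau - 1(u < 0)| * u**2 over residuals u = Q - V."""
    total = 0.0
    for u in diffs:
        # With tau > 0.5, residuals where V lies below Q get the larger weight,
        # which pushes V toward an upper expectile of Q.
        weight = (1.0 - tau) if u < 0 else tau
        total += weight * u * u
    return total / len(diffs)


def value_loss(q1: Sequence[float], q2: Sequence[float], v: Sequence[float], tau: float = 0.7) -> float:
    """Fit V(s) to min(Q1, Q2) at dataset actions only (twin critics curb overestimation)."""
    diffs = [min(a, b) - c for a, b, c in zip(q1, q2, v)]
    return expectile_loss(diffs, tau)


def td_loss(q: Sequence[float], rewards: Sequence[float], v_next: Sequence[float],
            dones: Sequence[float], gamma: float = 0.99) -> float:
    """Critic regression toward r + gamma * V(s'); no max over unseen actions is taken."""
    errs = [(qi - (r + gamma * (1.0 - d) * vn)) ** 2
            for qi, r, vn, d in zip(q, rewards, v_next, dones)]
    return sum(errs) / len(errs)


def awr_weights(q1: Sequence[float], q2: Sequence[float], v: Sequence[float],
                beta: float = 3.0, max_weight: float = 100.0) -> List[float]:
    """Advantage-weighted regression weights exp(beta * A), capped at max_weight."""
    cap = math.log(max_weight)  # clip the exponent so math.exp cannot overflow
    return [math.exp(min(beta * (min(a, b) - c), cap)) for a, b, c in zip(q1, q2, v)]


def squash_action(raw: float, low: float, high: float) -> float:
    """Map an unbounded policy output into [low, high] with tanh (hypothetical bounding choice)."""
    return low + 0.5 * (math.tanh(raw) + 1.0) * (high - low)
```

For example, with hypothetical BFR bounds of 200 to 400 mL/min, `squash_action(0.0, 200.0, 400.0)` gives 300.0, the midpoint of the range. Because the value loss only queries Q at actions present in the logged data, IQL avoids evaluating the critic on UF/BFR settings clinicians never used.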
Results : The cohort comprised 924 patients with 126,466 hemodialysis sessions, split at the patient level into training/validation/test sets of 648/92/184 patients (89,461/12,826/24,179 sessions, respectively). On the test set, the observed IDH incidence under the behavior (clinician) policy was 54.25% (95% CI 51.95–56.25), whereas the estimated IDH incidence under the IQL policy was 46.00% (95% CI 41.94–50.13), an absolute reduction of 8.25 percentage points. The residual gap to the dry-weight target was comparable between the behavior and IQL policies (0.45 kg [95% CI 0.42–0.49] vs 0.45 kg [0.40–0.51], respectively), and mean total UF was the same (2.53 L vs 2.53 L). FQE estimated a higher expected return for the IQL policy than for the behavior policy (−57.55 [−58.43 to −56.67] vs −85.44 [−86.66 to −84.22]), consistent with lower IDH-related penalties.
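The IQL incidence is an off-policy estimate rather than an observed rate. As a minimal sketch of one estimator family named in the Methods, assuming per-step log-densities of each logged action are available under both the evaluation policy and an estimated behavior policy (with continuous UF/BFR actions the behavior density must itself be modeled), the code below computes a per-session weighted importance sampling estimate of IDH incidence and a percentile bootstrap CI. The function names, the binary per-session IDH outcome, and the bootstrap settings are assumptions; resampling patients rather than sessions would better respect within-patient correlation.

```python
import math
import random
from typing import List, Sequence, Tuple


def trajectory_log_ratio(logp_eval: Sequence[float], logp_behav: Sequence[float]) -> float:
    """Log importance ratio of one session: sum_t [log pi_e(a_t|s_t) - log pi_b(a_t|s_t)]."""
    return sum(e - b for e, b in zip(logp_eval, logp_behav))


def wis_estimate(log_ratios: Sequence[float], outcomes: Sequence[float]) -> float:
    """Weighted (self-normalized) importance sampling estimate of the mean outcome under pi_e."""
    m = max(log_ratios)  # shift by the largest log-weight so math.exp cannot overflow
    weights = [math.exp(lr - m) for lr in log_ratios]
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)


def bootstrap_ci(
    log_ratios: Sequence[float],
    outcomes: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI for the WIS estimate, resampling whole sessions."""
    rng = random.Random(seed)
    n = len(outcomes)
    stats: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(wis_estimate([log_ratios[i] for i in idx], [outcomes[i] for i in idx]))
    stats.sort()
    lo_idx = max(0, int(round(alpha / 2 * n_boot)))
    hi_idx = min(n_boot - 1, int(round((1 - alpha / 2) * n_boot)) - 1)
    return stats[lo_idx], stats[hi_idx]
```

In this sketch, each entry of `log_ratios` would come from `trajectory_log_ratio` over a session's 24 decision steps, and each entry of `outcomes` would be 1 if that session had IDH and 0 otherwise. Self-normalization keeps the estimate within [0, 1] for a binary outcome, at the cost of a small bias relative to ordinary importance sampling.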
Conclusions : In offline evaluation on HEMOBP, an IQL-based agent learned clinically plausible UF/BFR control and was estimated to lower IDH risk without compromising dry-weight targeting on the test set. Prospective validation and safety-constrained deployment strategies are warranted before clinical translation.

[Table 1]