Embedding OR in Product

From Dashboards to Decisions

There is a spectrum of ambition for how operations research appears inside a healthcare product. At one end, OR informs a metric on a screen — utilization displayed as a number, wait time shown as a trend. This is passive. The product reports what the queueing model would predict; the operator interprets and acts. At the other end, OR drives a decision — the system auto-adjusts the schedule, reallocates staff, or reroutes patients without human intervention. This is active. Between these poles lies the territory where most healthcare products should operate: OR-derived intelligence presented with enough context that operators make better decisions, but the system does not act autonomously on their behalf.

The question for product owners is not “how much OR can we embed?” but “how much OR should we embed, given the trust environment, integration complexity, and failure consequences of this domain?” The answer, for healthcare in 2026, is less than technologists want and more than operators expect. This page ranks the three highest-value OR capabilities for healthcare products, explains why they should be built in a specific order, and identifies the failure modes that accompany each level of ambition.

The Spectrum: Passive Display to Active Control

Consider five points along the spectrum:

Level 1 — Raw metric display. The product shows utilization percentage, average wait time, throughput count. No OR model is involved. The numbers come from operational data; interpretation is entirely the operator’s job. This is where most healthcare dashboards sit today.

Level 2 — OR-informed metrics. The product displays utilization on the delay curve — not just “utilization is 87%” but “utilization is 87%, which at your measured variability corresponds to the steep part of the curve.” The Kingman approximation (Module 2) is running behind the display, but the operator still decides what to do about it. This is passive OR: the model informs the presentation, not the action.

Level 3 — Threshold alerting. The product fires an alert when a queueing-model-derived threshold is crossed. Not “utilization is above 85%” (an arbitrary number) but “utilization has entered the zone where the VUT equation predicts wait-time acceleration given your current variability profile.” The OR model determines when to alert; the operator determines what to do. This is the first level of active OR — the system initiates attention.

Level 4 — Scenario recommendation. The product runs what-if analyses and presents the results with reasoning. “If you lose one provider at Site B, wait times increase from 12 days to 31 days. Adding 0.5 FTE of telehealth coverage holds wait times to 18 days.” The OR model generates alternatives; the operator selects. This is analytical OR — the system extends human reasoning capacity.

Level 5 — Autonomous optimization. The product adjusts the schedule, reassigns patients, or modifies overbooking rules without human approval. The OR model acts. This is the level that generates the most excitement and the most risk, and for most healthcare contexts, it is premature.

Healthcare products should aim for Level 3 as the baseline, Level 4 as the standard for staffing and planning tools, and Level 5 only for narrowly scoped, reversible decisions where the model’s assumptions are well-validated and the consequences of error are bounded.

Capability 1: Threshold Alerting from Queueing Models

Value: High. Complexity: Low. Integration requirement: Minimal.

This is the OR capability with the best ratio of value to effort: it delivers high value and is the least technically demanding of the three. The concept: use Erlang-C and the Kingman approximation to calculate utilization and wait-time thresholds that are operationally meaningful, rather than round numbers chosen by committee, and fire alerts when the system crosses them.

Most alerting in healthcare operations is based on static thresholds set by administrative convention. “Alert when ED occupancy exceeds 90%.” “Flag when wait time exceeds 30 minutes.” These numbers are not wrong, but they are context-blind. A 90% occupancy threshold makes sense for an ED with low arrival variability and fast turnover. For an ED with high variability and complex case mix, the delay curve’s knee may sit at 78%. The static threshold fires after the system is already in trouble.

OR-derived thresholds are dynamic. They account for the system’s measured variability profile (coefficient of variation of arrivals and service times), the number of servers (providers, beds, treatment spaces), and the current service rate. The Kingman approximation (Module 2, utilization-delay curve) provides the formula:

W_q ≈ ((c_a² + c_s²) / 2) × (ρ / (1 - ρ)) × (1/μ)

In this single-server form, ρ is utilization, μ is the service rate (so 1/μ is the mean service time), and c_a and c_s are the coefficients of variation of interarrival and service times. A product can calibrate this to a specific operational context. Measure c_a and c_s from historical data. Set the alert threshold at the utilization level where predicted wait time exceeds the organization’s access target. For a primary care practice with a 48-hour access target, the threshold utilization is different from that of an ED with a 30-minute target, and both differ from a behavioral health intake with a 14-day target.

Healthcare example. A regional health system operates three urgent care sites. Site A has low arrival variability (appointments plus predictable walk-in volume) with c_a of 0.6. Site C has high arrival variability (mostly walk-in, near a transit hub) with c_a of 1.3. Using the Kingman approximation, the product calculates that Site A can tolerate 88% utilization before predicted wait time exceeds 25 minutes, while Site C hits the same wait-time threshold at 72% utilization. A static “alert at 85%” rule would over-alert for Site A, firing before wait times are at risk, and under-alert for Site C, firing only after the wait-time target has already been breached. The OR-derived threshold is calibrated to each site’s actual dynamics.
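The site-specific thresholds in this example can be sketched by inverting the Kingman formula for utilization. This is a minimal illustration, not a production calibration: the function names are hypothetical, and the service-time variability (c_s = 0.6) and mean service time (9.5 minutes) are assumptions chosen so that the output roughly matches the figures quoted above. The source does not state either value.

```python
def kingman_wait(rho, ca, cs, mean_service):
    """Kingman (single-server) approximation of mean queue wait.

    The result is in the same time unit as mean_service.
    """
    if not 0 <= rho < 1:
        raise ValueError("utilization must be in [0, 1)")
    return ((ca**2 + cs**2) / 2) * (rho / (1 - rho)) * mean_service


def threshold_utilization(target_wait, ca, cs, mean_service):
    """Utilization at which the Kingman wait equals target_wait.

    Solving W = K * rho / (1 - rho) * tau for rho gives
    rho = W / (W + K * tau), with K = (ca^2 + cs^2) / 2 and tau = mean_service.
    """
    k_tau = ((ca**2 + cs**2) / 2) * mean_service
    return target_wait / (target_wait + k_tau)


# Assumed inputs (not given in the example): c_s = 0.6, mean service 9.5 min.
sites = {"Site A": 0.6, "Site C": 1.3}  # arrival CV per site, from the example
for name, ca in sites.items():
    rho_star = threshold_utilization(target_wait=25, ca=ca, cs=0.6, mean_service=9.5)
    print(f"{name}: alert above {rho_star:.0%} utilization")
# Site A: alert above 88% utilization
# Site C: alert above 72% utilization
```

Under these assumptions the same 25-minute target yields two different alert thresholds, which is the point of calibrating per site rather than using one static number.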

Why this comes first. Three reasons. First, the analytical machinery is simple: calibrated formulas from well-understood queueing models, not optimization solvers or simulation engines. Second, it requires no integration with scheduling or workflow systems; it reads utilization and volume data that most EHRs and practice management systems already generate. Third, it demands no change in operator behavior. The operator still makes decisions; the alert just makes invisible dynamics visible before they produce a crisis. The framework for types and levels of automation in Parasuraman, Sheridan, and Wickens (2000) classifies this as automation of information acquisition and information analysis, the stages that extend human perception without selecting or executing an action.

Warning signs of misapplication. Alert fatigue from thresholds set too aggressively. If the system alerts every time utilization crosses the threshold — and it crosses frequently because the threshold is at the operational norm — operators will ignore it. The threshold must be set at the level where action is both warranted and feasible, not at the theoretical onset of degradation. Calibrate to action capacity, not just model output.

Capability 2: Scenario Testing

Value: High. Complexity: Medium. Integration requirement: Moderate (data access, not workflow integration).

Scenario testing answers what-if questions using queueing models, Monte Carlo simulation (Module 6), or simplified discrete-event simulation. It extends an operator’s reasoning capacity from “I think losing a provider would be bad” to “losing one provider at Site B increases mean wait from 12 days to 31 days, and 23% of patients would exceed the 30-day access threshold.”

The analytical methods range in sophistication:

Queueing model scenarios: Adjust arrival rate, service rate, or server count in an Erlang-C or M/M/c model and display the resulting wait-time and throughput changes. Fast, transparent, and sufficient for many staffing questions.
Monte Carlo scenarios: Sample from empirical distributions of demand, service time, no-show rates, and staffing levels to produce probability distributions of outcomes. More realistic than closed-form models, particularly when distributions are non-standard. Module 6 covers the methodology.
Simplified DES: Build a lightweight simulation of patient flow through a specific pathway (e.g., ED arrival to disposition) and run intervention scenarios. More computationally intensive but captures interactions between queues that closed-form models cannot.
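The first method in the list above, a queueing model scenario, can be sketched with the standard Erlang-C formula for an M/M/c queue. The arrival and service rates below are hypothetical; the point is only to show how changing the server count changes predicted wait.

```python
def erlang_c(servers, offered_load):
    """Probability that an arrival must wait in an M/M/c queue (Erlang-C).

    offered_load is a = arrival_rate / service_rate; requires a < servers.
    """
    if offered_load >= servers:
        raise ValueError("unstable: offered load must be below server count")
    # Build Erlang-B iteratively (numerically stable), then convert to Erlang-C.
    b = 1.0
    for k in range(1, servers + 1):
        b = offered_load * b / (k + offered_load * b)
    rho = offered_load / servers
    return b / (1 - rho * (1 - b))


def mean_wait(servers, arrival_rate, service_rate):
    """Mean time in queue for M/M/c, in the time unit of the rates."""
    a = arrival_rate / service_rate
    return erlang_c(servers, a) / (servers * service_rate - arrival_rate)


# Hypothetical scenario: 10 patients/hour arrive, each provider serves 3/hour.
for providers in (4, 5):
    wait_minutes = 60 * mean_wait(providers, arrival_rate=10, service_rate=3)
    print(f"{providers} providers: mean wait {wait_minutes:.1f} min")
```

Because the model is closed-form, a product can recompute it instantly as the operator moves a staffing slider, which is what makes this the fast and transparent option.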

Healthcare example. A federally qualified health center (FQHC) is preparing its annual budget and considering whether to fund a 0.5 FTE behavioral health provider. The product runs three scenarios against the current queueing model:

Baseline: 4 BH providers, 30% no-show rate, 14-day mean wait for intake. Current utilization: 91%.
Scenario A (lose 0.5 FTE to turnover): Utilization rises to 97%. Predicted mean wait: 42 days. Abandonment model predicts 35% of referred patients will never schedule.
Scenario B (add 0.5 FTE): Utilization drops to 82%. Predicted mean wait: 6 days. Abandonment drops to 12%.

The scenario does not make the budget decision. It quantifies the operational consequences of each option, converting a political negotiation (“do we value behavioral health enough?”) into an operational trade-off with measurable parameters.

Why this comes second. Scenario testing requires more analytical infrastructure than threshold alerting: the product must maintain calibrated models that can be parameterized with user-specified inputs. It also requires operators who are analytically literate enough to interpret probabilistic results and understand that scenarios are predictions, not guarantees. But it does not require integration with scheduling or workflow systems. It reads operational data and produces analysis. The operator acts on the analysis through existing channels. In the Parasuraman, Sheridan, and Wickens framework, this is support for decision selection: the system suggests alternatives; the human selects.

Implementation note: The explainability requirement is critical at this level. A scenario result that says “wait time increases to 31 days” without explaining why — without showing that the Erlang-C model predicts queue buildup because utilization crosses 95% with a c_s of 1.1 — is a black box. Operators will not trust it, and they should not. Every scenario output must trace to its assumptions and model parameters. The product must show its reasoning, not just its answer.

Warning signs of misapplication. Precision theater — presenting scenario results to two decimal places when the input data has 20% uncertainty. Over-reliance on a single scenario rather than a range. Failure to stress-test the model’s own assumptions (Module 6, scenario stress testing).

Capability 3: Scheduling Optimization

Value: High. Complexity: High. Integration requirement: High (bidirectional integration with scheduling systems, real-time data feeds).

Scheduling optimization uses OR models to directly improve appointment templates, staff rosters, overbooking rules, or resource allocation. It is the capability with the greatest direct operational impact — and the greatest implementation difficulty.

Three forms, ordered by increasing complexity:

Optimized overbooking rules. Use predictive no-show models (Module 5, no-shows) to set slot-level overbooking. Instead of blanket 20% overbooking, the product calculates that Monday 8 AM intake slots should double-book (predicted no-show: 45%) while Wednesday 2 PM follow-ups should not (predicted no-show: 8%). The Bailey-Welch rule provides the structural logic; patient-level predictive models provide the calibration.

Demand-matched shift templates. Analyze historical arrival patterns by hour, day-of-week, and season. Optimize staff schedules to match demand profiles, subject to labor law constraints, fatigue limits (Human Factors Module 2), and skill mix requirements (Workforce Module 3). This is a constrained optimization problem (Module 3) where the objective is minimizing the gap between staffing levels and demand while respecting hard constraints on hours, rest periods, and role coverage.

Real-time resource allocation. Reassign patients to providers, rooms, or treatment spaces based on current system state. This is the most sophisticated form — it requires real-time data feeds, fast optimization solvers, and tight integration with operational workflow. For most healthcare settings in 2026, this is aspirational.
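The demand-matched shift template form described above can be illustrated as a tiny constrained optimization. This sketch enumerates every staffing assignment by brute force, which only works at toy scale; a real product would use an integer programming solver. The demand figures, shift definitions, and constraint values are all hypothetical.

```python
from itertools import product

# Hypothetical demand, in providers needed per block: morning, midday, evening.
demand = [2, 4, 3]
# Each shift covers one or more demand blocks (by index).
shifts = {"early": [0, 1], "late": [1, 2], "mid": [1]}
MAX_PER_SHIFT = 4   # search bound per shift
MAX_TOTAL = 6       # hard constraint: available headcount
MIN_COVER = 1       # hard constraint: at least one provider in every block


def coverage(assignment):
    """Providers on duty in each block for a given staff count per shift."""
    cover = [0] * len(demand)
    for blocks, count in zip(shifts.values(), assignment):
        for block in blocks:
            cover[block] += count
    return cover


def best_template():
    """Feasible assignment minimizing total |staffing - demand| across blocks."""
    best, best_gap = None, None
    for assignment in product(range(MAX_PER_SHIFT + 1), repeat=len(shifts)):
        if sum(assignment) > MAX_TOTAL:
            continue
        cover = coverage(assignment)
        if min(cover) < MIN_COVER:
            continue
        gap = sum(abs(c - d) for c, d in zip(cover, demand))
        if best_gap is None or gap < best_gap:
            best, best_gap = assignment, gap
    return dict(zip(shifts, best)), best_gap


template, gap = best_template()
print(template, "total gap:", gap)
```

The structure mirrors the text: an objective (the gap between staffing and demand) minimized subject to hard constraints. Labor-law, fatigue, and skill-mix rules would enter as additional feasibility checks.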

Healthcare example. A multi-site primary care network implements optimized appointment templates. Using 18 months of historical data — arrival patterns, visit durations by type, no-show rates by patient risk tier — the product generates templates that vary by day-of-week. Monday templates front-load complex visits (when providers are fresh and no-show risk is highest for simple follow-ups). Friday templates shift toward shorter visits and telehealth slots (matching measured demand patterns). The templates are generated by a mixed-integer programming model that maximizes expected throughput subject to provider preference constraints, visit-type minimums, and a maximum daily cognitive load proxy derived from visit complexity weights.

Result: third-next-available appointment decreases from 11 days to 6 days. Provider overtime drops 22%. Patient wait-in-clinic time decreases 15%. The model does what no human scheduler could: it optimizes across multiple constraints simultaneously, using more data than any individual could hold in working memory.

Why this comes third. Not because the value is lower — it may be the highest. Because the prerequisites are steep. Scheduling optimization requires: (1) reliable historical data on arrivals, service times, and no-shows, which threshold alerting and scenario testing force the organization to collect and validate; (2) bidirectional integration with EHR or practice management scheduling systems, which is technically complex and vendor-dependent; (3) operator trust in model-generated recommendations, which must be built through successful experience with alerting and scenario testing first. You cannot start here. You earn the right to optimize by proving the models work at lower-stakes levels.

The trust calibration problem. Parasuraman and Riley (1997) identified two failure modes in human-automation interaction: automation disuse (the operator does not trust the system and ignores its recommendations) and automation misuse (the operator trusts the system too much and stops monitoring for errors). Both are present in OR-embedded scheduling.

Disuse is the more common initial failure. An operator who has been building schedules for 15 years will not cede that judgment to an algorithm without evidence that the algorithm understands the nuances — the provider who cannot see patients before 9 AM, the patient population that clusters on Mondays, the room that is unavailable every other Wednesday. If the optimization model does not account for these constraints — or cannot explain that it has — the operator will override it, and eventually stop consulting it.

Misuse emerges later, after the system has been accurate long enough that operators stop questioning it. This is automation bias (Mosier and Skitka, 1996): the tendency to accept automated recommendations without independent verification. In scheduling, this means the operator stops noticing when the model’s assumptions have drifted, for example when the no-show rate has shifted due to a new patient population, or when a provider’s service time has increased due to a new documentation requirement. The model optimizes for yesterday’s parameters while today’s reality has changed.

The product design response to both failure modes is the same: explainability. Every optimization recommendation must come with its reasoning. Not “schedule 2 patients at 8 AM” but “schedule 2 patients at 8 AM because the predicted no-show rate for this slot is 38% based on patient history and day-of-week patterns, and an empty first slot costs approximately $180 in idle provider time.” The operator who sees the reasoning can verify whether the assumptions match their current reality. The operator who sees only the answer cannot.
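A recommendation that carries its own reasoning, as described above, might be generated along these lines. This is a sketch of one possible decision rule (an expected-cost comparison), not the Bailey-Welch rule and not a method taken from the source. The $180 idle cost comes from the example above; the $100 collision cost (both patients show) and the assumption of two independent patients with equal no-show probability are hypothetical simplifications.

```python
def recommend_overbooking(slot, p_no_show, idle_cost=180.0, collision_cost=100.0):
    """Compare single- vs double-booking one slot by expected cost.

    Assumes two patients with the same, independent no-show probability.
    collision_cost is a hypothetical cost when both patients arrive.
    """
    single = p_no_show * idle_cost
    double = (p_no_show**2) * idle_cost + ((1 - p_no_show) ** 2) * collision_cost
    action = "double-book" if double < single else "single-book"
    reason = (
        f"{slot}: {action}. Predicted no-show rate {p_no_show:.0%}; "
        f"expected cost ${single:.2f} single-booked vs ${double:.2f} double-booked "
        f"(idle slot ${idle_cost:.0f}, collision ${collision_cost:.0f})."
    )
    return action, reason


for slot, p in [("Mon 8 AM intake", 0.38), ("Wed 2 PM follow-up", 0.08)]:
    action, reason = recommend_overbooking(slot, p)
    print(reason)
```

Every number in the returned string is an input or an intermediate result, so an operator can see which assumption to challenge when the recommendation looks wrong.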

What NOT to Automate

Three categories of decisions should remain human even when the OR model could technically generate an answer:

Decisions with significant human judgment components. Which patients get the last available appointment slot when two are clinically urgent. How to communicate a schedule change to a provider who is already stressed. Whether to keep a site open during a weather event. These decisions involve context, relationships, and judgment that the model cannot access.

Politically sensitive allocation. How to distribute scarce specialty access across sites when each site’s leadership advocates for their patients. How to allocate grant-funded positions across programs. These are optimization problems with well-defined objectives, but the objective function itself is contested. Automating the optimization without resolving the political question of what to optimize for produces answers that are mathematically correct and organizationally unacceptable.

Situations where the model’s assumptions are fragile. A scheduling optimization model calibrated on 18 months of pre-pandemic data should not auto-adjust schedules during a demand surge that violates every distributional assumption in the model. When the world changes faster than the model can recalibrate, human judgment must override. The product should surface model confidence indicators — “this recommendation is based on data that matches current conditions” vs. “current arrivals are 2.3 standard deviations above the training distribution” — so operators know when to trust and when to question.
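The model confidence indicator described above can be sketched as a z-score of current arrivals against the training data. The function names, the 2.0 cutoff, and the sample counts are illustrative assumptions; a real product would likely monitor several inputs, not arrivals alone.

```python
from statistics import mean, stdev


def drift_z_score(training_counts, current_count):
    """How many standard deviations current arrivals sit from the training mean."""
    sigma = stdev(training_counts)
    if sigma == 0:
        raise ValueError("training data has no variation; z-score is undefined")
    return (current_count - mean(training_counts)) / sigma


def confidence_message(z, limit=2.0):
    """Operator-facing message; `limit` is a hypothetical cutoff."""
    if abs(z) <= limit:
        return "This recommendation is based on data that matches current conditions."
    direction = "above" if z > 0 else "below"
    return (
        f"Current arrivals are {abs(z):.1f} standard deviations {direction} "
        "the training distribution; review before acting."
    )


# Hypothetical daily arrival counts from the calibration period.
training = [40, 44, 38, 42, 46, 41, 43]
print(confidence_message(drift_z_score(training, 42)))
print(confidence_message(drift_z_score(training, 50)))
```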

Phased Implementation: An 18-Month Product Roadmap

The ordering of the three capabilities is not just a theoretical ranking. It is a build sequence. Each phase produces the data infrastructure, operator trust, and analytical validation that the next phase requires.

Phase 1: Threshold Alerting (Months 1-6)

Build:

Ingest utilization, volume, and wait-time data from EHR/practice management system (read-only integration)
Calculate coefficients of variation for arrival and service time distributions from historical data
Implement Erlang-C and Kingman approximation to compute site-specific and service-line-specific utilization thresholds
Build alert engine: notify operators when utilization enters a zone where the model predicts wait-time acceleration
Display utilization on the delay curve — visual representation of current position relative to the knee
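The second build item, calculating coefficients of variation from historical data, is small enough to sketch directly. The timestamps and durations below are made up, and the helper names are hypothetical; in practice the inputs would come from the read-only EHR feed.

```python
from statistics import mean, stdev


def interarrival_gaps(arrival_minutes):
    """Gaps between consecutive arrivals, given arrival times in minutes."""
    ordered = sorted(arrival_minutes)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def coefficient_of_variation(samples):
    """Sample standard deviation divided by the mean (needs at least 2 samples)."""
    avg = mean(samples)
    if avg == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return stdev(samples) / avg


# Hypothetical check-in times (minutes after opening) and visit durations.
arrivals = [0, 4, 21, 25, 58, 61, 90]
service_minutes = [9, 12, 8, 15, 10, 7, 11]

c_a = coefficient_of_variation(interarrival_gaps(arrivals))
c_s = coefficient_of_variation(service_minutes)
print(f"c_a = {c_a:.2f}, c_s = {c_s:.2f}")
```

Perfectly regular arrivals give a coefficient of 0, and burstier arrivals push it higher. Missing or inconsistent timestamps distort the result directly, which is why data quality surfaces as the real prerequisite in this phase.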

Operator outcome: Operations leaders see, for the first time, which sites and services are operating past the knee of the utilization-delay curve. Alerts fire before wait-time crises become visible in patient complaints or access reports.

What this phase proves: The queueing models, when calibrated to local data, predict wait-time behavior accurately. This is the validation step that earns credibility for more ambitious capabilities.

Organizational prerequisite: Data quality. This phase will expose gaps in utilization tracking, inconsistent timestamp capture, and missing volume data. Fixing these gaps is the real Phase 1 deliverable — the alerting is almost a side effect.

Phase 2: Scenario Testing (Months 7-12)

Build:

Parameterizable queueing models: operators can adjust staffing levels, demand assumptions, no-show rates, and service mix
Monte Carlo engine for probabilistic scenario analysis: “What is the probability that wait times exceed 14 days if we lose one provider?” with confidence intervals
Scenario library: pre-built scenarios for common questions (provider vacancy, demand surge, site closure, service line expansion)
Comparison view: side-by-side display of baseline vs. scenario with explicit assumptions listed
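The Monte Carlo question in the list above ("what is the probability that wait times exceed a target if we lose one provider?") can be sketched as follows. This is a deliberately simplified model: it pools all providers into one fast server and applies the Kingman approximation to each sampled scenario. Every distribution and parameter is a hypothetical assumption, and the one-day target is chosen only to make the toy numbers informative.

```python
import random


def exceedance_probability(providers, target_wait_days, trials=10_000, seed=0):
    """Share of sampled scenarios in which predicted wait exceeds the target."""
    rng = random.Random(seed)  # fixed seed so repeated runs are comparable
    c_a, c_s = 1.0, 1.1        # assumed variability coefficients
    exceed = 0
    for _ in range(trials):
        arrivals = max(0.0, rng.gauss(20.0, 2.0))  # referrals per day (assumed)
        no_show = rng.uniform(0.25, 0.35)          # no-show rate (assumed range)
        capacity = providers * 8 * (1 - no_show)   # completed visits per day
        rho = arrivals / capacity
        if rho >= 1:
            exceed += 1  # overloaded: backlog grows without bound
            continue
        wait = ((c_a**2 + c_s**2) / 2) * (rho / (1 - rho)) / capacity
        if wait > target_wait_days:
            exceed += 1
    return exceed / trials


baseline = exceedance_probability(providers=4, target_wait_days=1.0)
one_fewer = exceedance_probability(providers=3, target_wait_days=1.0)
print(f"P(wait > 1 day): baseline {baseline:.1%}, one provider fewer {one_fewer:.1%}")
```

Reusing the same seed for both runs means the two scenarios face identical sampled demand, so the difference in probabilities reflects the staffing change rather than sampling noise.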

Operator outcome: Staffing decisions, budget requests, and service-line changes are informed by quantitative scenario analysis instead of intuition and precedent.

What this phase proves: Operators will use model-based analysis when the interface is accessible and the results are explainable. This validates the trust relationship needed for Phase 3.

Organizational prerequisite: Analytical literacy. Not OR expertise — but the ability to interpret probabilistic results and understand that “31-day predicted wait with 80% confidence interval of 22-45 days” is more honest and more useful than “about a month.”

Phase 3: Scheduling Optimization (Months 13-18)

Build:

Predictive no-show model using patient history, visit type, scheduling lead time, and day-of-week
Optimized overbooking rules by slot, updated weekly as the model recalibrates
Demand-matched appointment templates generated by constrained optimization
Provider preference and constraint engine (availability windows, visit-type preferences, maximum daily complexity load)
Override tracking: when operators override the model, capture the reason and the outcome to improve future recommendations

Operator outcome: Schedules are designed by the optimization model, reviewed and approved by human schedulers, and continuously improved through feedback. Third-next-available decreases. Provider overtime decreases. Overbooking consequences decrease.

What this phase proves: OR can drive operational decisions in healthcare when the foundation of data quality, model validation, and operator trust has been built through Phases 1 and 2.

Organizational prerequisite: Bidirectional system integration and change management. The scheduling system must accept model-generated templates. Schedulers must be trained on the new workflow. Leadership must commit to a feedback cycle where overrides are analyzed, not just permitted.

Integration Hooks

Human Factors Module 6 (Product Design). Every OR capability described here is also a human factors problem. Threshold alerts must follow progressive disclosure — the alert appears first; the queueing model explanation is available on demand, not forced on every operator. Scenario testing interfaces must manage cognitive load — presenting three well-chosen scenarios is more useful than offering infinite parameterization. Scheduling optimization must calibrate automation level to trust — start with recommendations that require explicit approval, and only move toward auto-adjustment after sustained accuracy. The automation bias literature (Parasuraman, Sheridan, and Wickens, 2000; Mosier and Skitka, 1996) predicts that operators will over-rely on optimization outputs once they have been accurate for a period. The product must be designed to keep operators engaged — showing reasoning, requiring periodic review, and flagging when model confidence drops.

Public Finance Module 8 (Grant Product Design). Grant program dashboards face the identical embedding question. Phase 1 for a grant management product: alert when burn rate diverges from milestone completion rate (the grant-program equivalent of utilization crossing the delay curve’s knee). Phase 2: scenario testing for budget reallocation — “What if we shift 15% of travel budget to personnel to cover a vacancy?” Phase 3: optimization of milestone sequencing and resource allocation across concurrent grants. The three-capability framework and the phased build sequence apply directly. The OR is the same; the domain objects change from patients and providers to milestones and budget line items.

Product Owner Lens

What is the operational problem? Healthcare products either display raw operational metrics (which operators cannot interpret through the lens of queueing theory) or attempt full optimization (which operators do not trust and systems cannot integrate). The middle ground — OR-informed intelligence that extends operator reasoning — is underbuilt.

What mechanism explains it? OR models (Erlang-C, Kingman, M/M/c, constrained optimization) provide precise, validated predictions of system behavior under varying conditions. But these predictions have value only if they are embedded in products at the right level of automation for the organization’s trust maturity and integration capacity.

What intervention levers exist? Three, in order: threshold alerting (makes invisible dynamics visible), scenario testing (extends reasoning to counterfactuals), scheduling optimization (directly improves operational decisions). Each requires the prior capability as foundation.

What should software surface? Phase 1: utilization displayed on the delay curve with OR-derived alert thresholds. Phase 2: parameterizable what-if scenarios with explicit assumptions and confidence intervals. Phase 3: model-generated scheduling recommendations with full reasoning and override tracking.

What metric reveals degradation earliest? The rate at which operators override model recommendations without documented reasoning. Rising override rates signal either model drift (the assumptions no longer match reality) or trust erosion (the operator has lost confidence). Either requires immediate investigation — the first is a model problem, the second is a product design problem, and both will destroy the value of the OR capability if left unaddressed.
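The early-warning metric named above, overrides without documented reasoning, could be tracked with a structure like this. The record fields and weekly grouping are hypothetical; the source specifies only that overrides, reasons, and outcomes should be captured.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recommendation:
    week: int                     # reporting week the recommendation was issued
    overridden: bool              # did the operator reject the model's output?
    reason: Optional[str] = None  # free-text reason, if the operator gave one


def undocumented_override_rate(recs):
    """Fraction of recommendations overridden with no stated reason."""
    if not recs:
        return 0.0
    undocumented = sum(
        1 for r in recs if r.overridden and not (r.reason or "").strip()
    )
    return undocumented / len(recs)


def weekly_rates(recs):
    """Undocumented override rate per week, in week order."""
    by_week = {}
    for r in recs:
        by_week.setdefault(r.week, []).append(r)
    return {week: undocumented_override_rate(rs) for week, rs in sorted(by_week.items())}


log = [
    Recommendation(1, False),
    Recommendation(1, True, "provider unavailable before 9 AM"),
    Recommendation(2, True),
    Recommendation(2, True, "  "),
]
print(weekly_rates(log))  # {1: 0.0, 2: 1.0}
```

A rising series from this function is the signal to investigate. Telling model drift apart from trust erosion still requires looking at the documented overrides and their outcomes.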

Summary

The question is not whether to embed OR in healthcare products. It is how deeply, in what order, and with what safeguards. The answer is a phased approach that matches analytical ambition to organizational readiness: threshold alerting first, because it is high-value and low-risk; scenario testing second, because it extends operator reasoning without requiring workflow integration; scheduling optimization third, because it requires the data quality, model validation, and operator trust that the first two phases build.

The trust calibration problem is real and persistent. Parasuraman and Riley’s insight — that automation can fail through both disuse and misuse — applies with particular force in healthcare, where the consequences of both under-reaction and over-reaction to model recommendations affect patient access and safety. The product design discipline is not just building the model. It is building the interface between the model and the human — an interface that makes reasoning visible, keeps operators engaged, and degrades gracefully when the model’s assumptions no longer hold.

Start with alerts. Build toward optimization. Show your work at every level.