Systems as Models

Every operations research analysis begins with a model. Not a dashboard, not a dataset, not a process map pinned to a conference room wall — a model. A disciplined simplification of a real system that preserves the dynamics you need to reason about and deliberately discards the rest. The quality of every downstream decision — staffing, scheduling, capacity planning, resource allocation — depends on whether the model captures the right structure at the right level of abstraction. Get the model wrong and optimization is just expensive precision applied to the wrong problem.

What a Model Is (and Is Not)

A model is a formal representation of a system’s structure and behavior, built for a specific analytical purpose. That last clause matters. A model is not a replica. It is not a complete description. It is a purpose-built instrument, and its value is judged entirely by whether it supports the decisions it was built to inform.

George Box’s canonical formulation — “all models are wrong, but some are useful” (Box and Draper, Empirical Model-Building and Response Surfaces, 1987) — is widely quoted and widely misunderstood. It is not a license for sloppiness. It is a precise epistemological claim: because a model is an abstraction, it necessarily omits detail. The discipline lies in choosing which detail to omit. A model that includes everything is not a model — it is a copy of the system, equally opaque and equally expensive to operate. A model that omits the wrong structure is not useful — it produces answers that feel rigorous but mislead.

The operational definition: a model is a set of explicit assumptions about what entities exist in a system, how they interact, what drives their behavior, and what constraints bound their performance. Every assumption is a choice. Every omission is a choice. The modeler’s job is to make those choices deliberately, document them transparently, and test whether the resulting structure reproduces the behaviors that matter for the decision at hand.

This distinguishes OR modeling from two adjacent activities it is often confused with:

Process mapping documents what happens in sequence. It does not capture why the system behaves the way it does under load, variability, or constraint shifts. A process map of an emergency department shows triage, registration, assessment, treatment, and disposition. It does not show why a 5% increase in daily arrivals can produce a 40% increase in wait times. That requires a model with queueing structure.
Data analysis describes what has happened. It can identify patterns, trends, and correlations in historical performance. It cannot answer counterfactual questions — what would happen if we added a second provider on Tuesday afternoons, or if no-show rates dropped from 18% to 12%. That requires a model with causal structure.
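
The jump from a 5% rise in arrivals to a 40% rise in waits is the kind of result that queueing structure produces. A minimal sketch, assuming the simplest textbook case (an M/M/1 queue: one server, Poisson arrivals, exponential service times). That model and the rates below are illustrative assumptions, not a description of any real emergency department; the starting utilization of 5/6 is chosen only because it reproduces the figures quoted above.

```python
def mm1_queue_wait(arrival_rate: float, service_rate: float) -> float:
    """Mean time waiting in queue for an M/M/1 system: Wq = rho / (mu - lambda)."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable: arrival rate must be below service rate")
    rho = arrival_rate / service_rate
    return rho / (service_rate - arrival_rate)


service_rate = 6.0                    # patients per hour (hypothetical)
base_arrivals = 5.0                   # utilization = 5/6, about 83%
more_arrivals = base_arrivals * 1.05  # 5% more arrivals

base_wait = mm1_queue_wait(base_arrivals, service_rate)
new_wait = mm1_queue_wait(more_arrivals, service_rate)
wait_increase = new_wait / base_wait - 1.0
print(f"wait increase: {wait_increase:.0%}")
```

A process map has no term equivalent to the 1 / (mu - lambda) factor, which is why it cannot show this behavior.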

The Anatomy of a System Model

Every OR model, regardless of specific technique, shares a common anatomy. Understanding this anatomy is what allows an operator to evaluate whether a model is fit for purpose — or whether it has been built to answer the wrong question.

Entities are the things that flow through or populate the system. In healthcare operations, the primary entities are patients, but also referrals, lab orders, prior authorization requests, claims, and information packets. Each entity has attributes — acuity level, insurance type, referring provider, time of arrival — that determine how the system processes it.

Resources are the constrained capacities that entities compete for: providers, exam rooms, infusion chairs, operating rooms, inpatient beds, care coordinators. The defining characteristic of a resource in operations research terms is that it has finite capacity and that demand for it can exceed supply, creating queues. A model that does not identify which resources are capacity-constrained has not yet found the problem.

Processes are the rules that govern how entities move through the system, consume resources, and exit. These include arrival patterns (when do patients show up, and with what variability?), service rules (how long does each step take, and what drives the variation?), routing logic (what determines whether a patient goes to imaging or directly to treatment?), and priority rules (who gets seen first when the queue is full?).

State variables track the system’s condition at any point in time — current queue length, number of occupied beds, providers currently available, patients currently in treatment. These are not historical metrics. They are the real-time description of where the system stands, and they determine what happens next.

Performance measures are the outputs you care about — wait time, throughput, utilization, length of stay, abandonment rate, cost per encounter. A model without explicit performance measures is not yet a model. It is a description waiting for a question.

How Modeling Goes Wrong

The failure modes of system modeling in healthcare are predictable, and most of them trace to the same root cause: the model’s abstraction choices do not match the decision it is supposed to inform.

Wrong boundary. The model includes too little or too much of the system. A common example: modeling an ED in isolation when the actual constraint is inpatient bed availability. The ED model will show adequate throughput capacity, but patients board in the ED because there are no beds to admit them to. The model is valid within its boundary and useless for the actual problem. Churchman, Ackoff, and Arnoff, in their foundational Introduction to Operations Research (1957), identified system boundary definition as the first and most consequential modeling decision, a point that remains true and is routinely ignored.

Wrong level of detail. Adding granularity does not automatically improve a model. A scheduling model that tracks individual patient acuity scores across 47 categories may be less useful than one that groups patients into three acuity tiers, if the staffing decision it informs operates at the tier level. Robinson’s conceptual modeling framework (2008) formalizes this as the “simplest model that achieves the modeling objectives” — not the most detailed model the data can support.

Static treatment of dynamic behavior. Many healthcare capacity plans use average daily census to determine bed need, or average appointment duration to set schedule templates. These static calculations systematically underestimate required capacity because they ignore variability. A system that averages 80% utilization does not sit at 80%: depending on how variable arrivals and service times are, it may swing between something like 60% and 100%, and the peaks are where patients wait, staff burn out, and errors concentrate. Any useful model of a healthcare system must represent variability, not just central tendency.
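
One way to see why averages understate bed need is to ask how often demand exceeds capacity when the average looks comfortable. The sketch below assumes daily census follows a Poisson distribution, a common simplification that the text does not state, and uses hypothetical numbers (average census of 20 against 25 beds, so 80% average utilization).

```python
import math


def prob_census_exceeds(beds: int, mean_census: float) -> float:
    """P(census > beds) when census ~ Poisson(mean_census)."""
    cdf = sum(
        math.exp(-mean_census) * mean_census**k / math.factorial(k)
        for k in range(beds + 1)
    )
    return 1.0 - cdf


mean_census = 20.0   # hypothetical average daily census
beds = 25            # hypothetical capacity; 20 / 25 = 80% average utilization

p_over = prob_census_exceeds(beds, mean_census)
print(f"average utilization: {mean_census / beds:.0%}")
print(f"share of days demand exceeds beds: {p_over:.1%}")
```

Under these assumptions demand exceeds capacity on the order of one day in ten, even though the static calculation reports 20% slack.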

Missing feedback. Sterman’s work on system dynamics (Business Dynamics, 2000) demonstrates that feedback loops and delays are the structures that produce counterintuitive system behavior. In healthcare, feedback is everywhere: long wait times cause patients to leave without being seen, which reduces measured demand, which reduces the urgency for staffing increases, which maintains long wait times. A model that treats demand as exogenous — arriving independently of system performance — misses these dynamics entirely. This is particularly dangerous in behavioral health, where patients who cannot access services do not simply wait longer; they disappear from the system and present later in crisis at higher acuity and cost.

Confusing the model with the system. Once a model is built and calibrated, there is a persistent temptation to treat its outputs as facts about the system rather than as implications of the model’s assumptions. Every model output carries the qualifier “if our assumptions hold.” When a simulation says adding a provider will reduce wait times by 20%, the correct interpretation is: given our assumptions about arrival rates, service times, routing logic, and staffing, the model predicts a 20% reduction. If any assumption is materially wrong, the prediction is wrong. This is why Law and Kelton’s verification and validation framework (Simulation Modeling and Analysis, multiple editions since 1982) distinguishes verification (does the model do what we intended?) from validation (does the model’s behavior match the real system’s behavior for our purpose?). Both are required. Neither is optional.

A Healthcare Example: Modeling Behavioral Health Intake at a Rural FQHC

Consider a Federally Qualified Health Center in eastern Washington serving a three-county rural catchment area. The behavioral health program has two licensed clinical social workers (LCSWs) and one psychiatric nurse practitioner. The intake process — from initial screening call to completed assessment — takes an average of 14 days but shows high variability (standard deviation of 9 days). The no-show rate for intake appointments is 28%. The program receives approximately 15 new referrals per week but is losing 4-5 prospective patients per week who never complete intake — they either cancel, no-show repeatedly, or never call back after being told the next available appointment is three or more weeks out.

The program director’s instinct is to request funding for a third LCSW. A grant application is being prepared. But is a third clinician the right intervention?

Building a model forces the question into structure. The system has:

Entities: Patients seeking behavioral health services, arriving at ~15/week with weekly variability (coefficient of variation approximately 0.6)
Resources: 2 LCSWs (each available 32 clinical hours/week), 1 psychiatric NP (16 hours/week for intake-related work), 2 assessment rooms, 1 telehealth line
Process: Phone screen (15 min, done by front desk) -> scheduling (delay: average 12 days to next available slot) -> intake assessment (90 min with LCSW) -> treatment plan (30 min, may require NP co-signature within 5 days)
State variables: Current intake queue length, days-to-next-available for each provider, number of patients in intake-but-not-yet-assessed status
Performance measures: Days from referral to completed assessment, intake completion rate, abandonment rate (patients who exit before completing intake)
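
Writing the structure down as data makes each assumption something that can be inspected and challenged. A minimal sketch, using only figures from the example; the class and field names are hypothetical, and the model deliberately leaves out the NP, rooms, and telehealth line to stay short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntakeModel:
    referrals_per_week: float     # entity arrival rate
    lcsw_count: int               # resource: number of LCSWs
    lcsw_hours_per_week: float    # clinical hours per LCSW
    assessment_hours: float       # process: intake assessment length
    no_show_rate: float           # share of intake appointments missed

    def intake_hours_demanded(self) -> float:
        """Weekly assessment hours needed if every referral is assessed once."""
        return self.referrals_per_week * self.assessment_hours

    def nominal_lcsw_hours(self) -> float:
        """Total weekly LCSW clinical hours, across intake and ongoing care."""
        return self.lcsw_count * self.lcsw_hours_per_week


fqhc = IntakeModel(
    referrals_per_week=15,
    lcsw_count=2,
    lcsw_hours_per_week=32,
    assessment_hours=1.5,
    no_show_rate=0.28,
)

demand = fqhc.intake_hours_demanded()
supply = fqhc.nominal_lcsw_hours()
print(f"intake demand: {demand} h/week of {supply} nominal LCSW h/week")
```

The text does not say what share of the 64 nominal hours is reserved for intake, so this comparison is not a utilization figure. It only shows that the parameters are now explicit enough to question.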

With this structure explicit, several things become visible that narrative reasoning obscures:

First, the scheduling delay (12 days average) is not primarily a capacity problem — it is a variability and batching problem. The LCSWs batch their intake slots into two half-days per week each. If a patient calls on Wednesday and the next intake half-day is the following Tuesday, the minimum scheduling delay is 6 days regardless of capacity. Redistributing intake slots across more days of the week — even without adding staff — could reduce the average scheduling delay.
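
The calendar effect of batching can be checked with simple arithmetic. The sketch below assumes calls arrive evenly across Monday to Friday, that same-day intake is not offered, and that a slot is always open on the next intake day, so it measures only the delay imposed by the schedule's shape. The two schedules are hypothetical; the Tuesday-only case is chosen because it reproduces the Wednesday-to-Tuesday example above.

```python
def days_to_next_intake(call_day: int, intake_days: set[int]) -> int:
    """Days from a call to the next intake day, never the same day.

    Days are 0=Monday ... 6=Sunday. Assumes a slot is always open on
    the next intake day, so this is a lower bound set by the calendar.
    """
    for delay in range(1, 8):
        if (call_day + delay) % 7 in intake_days:
            return delay
    raise ValueError("intake_days must contain at least one day from 0 to 6")


def average_delay(intake_days: set[int]) -> float:
    """Mean structural delay for calls spread evenly over Monday-Friday."""
    weekdays = range(5)
    return sum(days_to_next_intake(d, intake_days) for d in weekdays) / 5


tuesday_only = {1}                # hypothetical batched schedule
every_weekday = {0, 1, 2, 3, 4}   # hypothetical redistributed schedule

print(days_to_next_intake(2, tuesday_only))   # a Wednesday call
print(average_delay(tuesday_only))
print(average_delay(every_weekday))
```

The same number of weekly slots produces a very different minimum delay depending on how they are spread, which is the sense in which the delay is structural rather than a matter of capacity.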

Second, the 28% no-show rate compounds the scheduling problem. Each no-show wastes a 90-minute slot that cannot be recovered, reducing effective capacity. But the no-show rate is itself partly caused by the long scheduling delay — patients referred in acute distress who are told to wait three weeks are more likely to not show up. This is a feedback loop: long waits drive no-shows, no-shows reduce effective capacity, reduced capacity drives longer waits.
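
The capacity half of this loop is plain arithmetic. A minimal sketch, assuming each no-show consumes a full slot with no backfill, as described above; the 15% comparison rate is hypothetical.

```python
def completed_per_week(slots: float, no_show_rate: float) -> float:
    """Intakes actually completed when a share of booked slots go unused."""
    return slots * (1.0 - no_show_rate)


def slots_needed(referrals_per_week: float, no_show_rate: float) -> float:
    """Booked slots required so completed intakes match referrals."""
    if not 0.0 <= no_show_rate < 1.0:
        raise ValueError("no_show_rate must be in [0, 1)")
    return referrals_per_week / (1.0 - no_show_rate)


referrals = 15                               # from the example
current = slots_needed(referrals, 0.28)      # observed no-show rate
improved = slots_needed(referrals, 0.15)     # hypothetical lower rate

print(f"slots needed at 28% no-show: {current:.1f}")
print(f"slots needed at 15% no-show: {improved:.1f}")
```

This does not model the feedback itself. Representing how the no-show rate responds to wait time would require local data that the example does not provide.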

Third, the NP co-signature step introduces a hidden delay. If the NP is only on-site two days per week and a treatment plan arrives on Wednesday, it may sit unsigned until the following Monday. This does not appear in any wait-time metric because the patient is technically “in treatment planning,” but the delay extends the total intake cycle and may delay medication starts.

A model of this system — even a simple spreadsheet-based queueing model — reveals that the binding constraint is not raw clinician hours but the interaction between scheduling batching, no-show feedback, and NP availability synchronization. A third LCSW would add capacity, but if scheduling structure and no-show dynamics remain unchanged, much of that capacity will be absorbed by the same inefficiencies. The model suggests a sequenced intervention: first, redistribute intake slots across the week and implement same-day or next-day intake for acute referrals; second, address no-shows with appointment reminders and motivational pre-engagement calls; third, synchronize NP availability with LCSW assessment days. Only after those structural changes is the incremental capacity of a third LCSW fully realized.

This is what a model does that intuition does not: it separates the structural problem from the capacity problem and identifies the sequence in which interventions should be deployed.

Product Design Implications

Software that supports healthcare operations should embed the logic of system modeling, not just report historical metrics.

What to surface: The system state variables that drive future performance, namely current queue depth, days-to-next-available by provider type, utilization rate by resource, and the ratio of completed to scheduled appointments (which falls as no-shows and abandonment rise). These are leading indicators. By the time average wait time degrades in a monthly report, the system has been failing for weeks.

What to compute: Counterfactual scenarios. If a product can answer “what happens to intake wait times if we add two intake slots on Thursdays?” using embedded queueing logic, it shifts decision-making from retrospective analysis to prospective planning. This does not require a full simulation engine — even Kingman’s approximation (the G/G/1 formula relating utilization, variability, and wait time) implemented in a scheduling tool provides decision support that most clinics currently lack.

What to alert on: Threshold crossings that indicate the system is entering a degraded regime. For the FQHC example: when the intake queue exceeds a length where the expected wait time (via Little’s Law: wait = queue length / throughput rate) crosses the threshold at which no-show rates historically spike, the system should flag it. This is not a simple metric alert — it is a model-informed alert that connects current state to predicted future behavior.
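
A model-informed alert of this kind could be as small as the sketch below. The function names, queue figures, and the three-week threshold are hypothetical; in practice the threshold would come from the clinic's own data on where no-show rates rise.

```python
def expected_wait_weeks(queue_length: float, throughput_per_week: float) -> float:
    """Little's Law rearranged: W = L / lambda."""
    if throughput_per_week <= 0:
        raise ValueError("throughput must be positive")
    return queue_length / throughput_per_week


def should_alert(queue_length: float, throughput_per_week: float,
                 wait_threshold_weeks: float) -> bool:
    """True when the model-implied wait reaches the no-show spike threshold."""
    return expected_wait_weeks(queue_length, throughput_per_week) >= wait_threshold_weeks


# Hypothetical values: 10 completed intakes per week, threshold of 3 weeks.
print(expected_wait_weeks(30, 10))
print(should_alert(30, 10, wait_threshold_weeks=3.0))
print(should_alert(20, 10, wait_threshold_weeks=3.0))
```

The alert fires on queue length, which is observable today, rather than on realized wait time, which is only known after patients have already waited.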

The Metric That Reveals Degradation Earliest

In most healthcare operations, the ratio of demand arrival rate to effective service rate (not nominal utilization, not average wait time, not throughput) is the earliest degradation signal. As this ratio approaches 1.0, waits grow nonlinearly; once it reaches or exceeds 1.0, the system is at or beyond capacity and queues will grow without bound in the long run. But the ratio must be computed using effective service rate, which accounts for no-shows, cancellations, rework, and scheduling inefficiencies that reduce nominal capacity to actual throughput. A system that looks 75% utilized on paper may be functionally saturated because 25% of its nominal capacity is consumed by waste that does not appear in utilization calculations: demand of 0.75 against an effective capacity of 0.75 is a ratio of 1.0.
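
The calculation can be sketched as follows. The function and parameter names are hypothetical, and waste is treated as a single fraction of nominal capacity for simplicity.

```python
def effective_load_ratio(arrival_rate: float, nominal_capacity: float,
                         waste_fraction: float) -> float:
    """Demand divided by capacity that actually produces completed service."""
    if not 0.0 <= waste_fraction < 1.0:
        raise ValueError("waste_fraction must be in [0, 1)")
    effective_capacity = nominal_capacity * (1.0 - waste_fraction)
    return arrival_rate / effective_capacity


# The 75% / 25% case from the paragraph above, on a nominal capacity of 100.
nominal_utilization = 75 / 100
ratio = effective_load_ratio(arrival_rate=75, nominal_capacity=100, waste_fraction=0.25)
print(f"nominal utilization {nominal_utilization:.0%}, effective load ratio {ratio:.2f}")
```

In a real deployment the waste fraction would be built up from measured no-shows, cancellations, and rework rather than entered as one number.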

Tracking this ratio weekly, decomposed by resource type, gives operators a leading indicator that precedes visible queue growth by days to weeks — enough time to intervene before patients experience the failure.

Integration Hooks

Human Factors. A model of a healthcare system that omits human operator behavior is incomplete in a specific and dangerous way. Clinicians are not machines with fixed service rates — their throughput, error rates, and decision quality degrade under sustained high utilization (see Human Factors Module 2 on fatigue and decision degradation). A model that treats provider service time as a fixed distribution will overestimate system capacity under stress, precisely when accurate capacity estimates matter most. The practical requirement: models used for staffing or scheduling decisions must include utilization-dependent service time functions, or at minimum, must flag when modeled utilization exceeds the threshold (typically 80-85% sustained) where human performance degradation becomes material.

Public Finance. Grant program logic models — the theory-of-change documents required by HRSA, SAMHSA, and other federal funders — are a form of system modeling, though they are rarely treated as such. A logic model that says “hire 2 BH providers -> increase access -> improve outcomes” is an implicit model with implicit assumptions about capacity, demand, and the relationship between staffing and access. Making these assumptions explicit — formalizing the logic model as an operational model with arrival rates, service times, and capacity constraints — converts a narrative grant application into a testable operational plan. This is the bridge between OR modeling and grant program execution described in Public Finance Module 4.

Warning Signs of Misapplication

The model is more complex than the decision it informs. If a scheduling decision requires a 50,000-line simulation when a queueing formula would suffice, the modeling effort is misallocated. Start with the simplest model that answers the question. Escalate complexity only when the simple model demonstrably fails.
The model has never been validated against observed system behavior. A model that has not been compared to reality is a hypothesis, not a tool. At minimum, a model should reproduce known historical patterns (average wait times, throughput rates, utilization levels) before being used to predict the effects of changes.
Stakeholders cannot state the model’s key assumptions. If the people using the model’s outputs cannot name its three most important assumptions, the model is a black box. Black-box models erode trust and produce decisions that no one can defend or revisit when conditions change.
The model is treated as permanent. Systems change. Demand patterns shift, staffing turns over, policies evolve. A model calibrated to 2024 data may mislead in 2026. Models require periodic recalibration — not rebuilding, but checking whether key parameters still reflect the current system.
Sensitivity analysis is skipped. Every model has parameters that are uncertain. If no one has asked “what happens to our conclusions if this parameter is 20% higher or lower than we assumed?”, the model’s outputs carry unknown risk. Sensitivity analysis is not optional rigor — it is the mechanism that distinguishes a robust recommendation from a fragile one.
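
A one-parameter sensitivity check takes only a few lines. The sketch below uses an M/M/1 queue wait as a stand-in for whatever model is actually in use; the model choice, function names, and weekly rates are all hypothetical.

```python
from typing import Callable


def mm1_queue_wait(arrival_rate: float, service_rate: float) -> float:
    """Stand-in model: mean M/M/1 queue wait, infinite if overloaded."""
    if arrival_rate >= service_rate:
        return float("inf")
    return (arrival_rate / service_rate) / (service_rate - arrival_rate)


def sensitivity(model: Callable[..., float], base: dict[str, float],
                parameter: str, swing: float = 0.20) -> dict[str, float]:
    """Re-run the model with one parameter moved down and up by `swing`."""
    results = {}
    for label, factor in (("low", 1 - swing), ("base", 1.0), ("high", 1 + swing)):
        params = dict(base)
        params[parameter] = base[parameter] * factor
        results[label] = model(**params)
    return results


# Hypothetical weekly rates: 15 arrivals against a service rate of 20.
base = {"arrival_rate": 15.0, "service_rate": 20.0}
result = sensitivity(mm1_queue_wait, base, "arrival_rate")
for label, wait in result.items():
    print(f"{label}: {wait:.3f} weeks")
```

In this stand-in, a 20% rise in arrivals triples the modeled wait, so any recommendation that leans on the arrival estimate would be fragile.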