AI-Driven Shift Planning for Emergency Services

This study explores the use of Large Language Models (LLMs), heuristic algorithms, and communication platforms like WhatsApp to automate shift planning for volunteer firefighters, highlighting the technical and operational challenges in solving NP-difficult rostering problems.

Shift planning, or rostering, is a deceptively complex operational challenge. Though commonly regarded as an administrative task, it represents a class of NP-difficult problems, particularly in domains such as healthcare, industrial manufacturing, and emergency services. Here, we address this challenge by exploring the integration of artificial intelligence – specifically Large Language Models (LLMs) – with algorithmic optimization and user-centered interfaces to streamline rostering in a high-stakes, volunteer-based context.

The study focuses on the shift planning operations of a fire and emergency response organization in Portugal, where shifts were still being organized manually using WhatsApp messages and paper forms. Our goal is not merely to digitize this process, but to demonstrate that combining natural language interfaces with optimization algorithms can substantially reduce planning time, minimize errors, and improve fairness and coverage – all while maintaining adaptability for a highly variable workforce.

The shift planning problem is a known computational challenge, particularly when constraints around roles, availability, and workload balance are introduced. Traditional approaches to rostering, including cyclic and non-cyclic models, have long been studied in operations research. Cyclic rostering, suitable for stable environments, can be resolved in polynomial time. In contrast, non-cyclic rostering – common in emergency services – demands flexibility and grows exponentially in complexity as constraints increase.

Current solutions often fall into two categories: exact algorithms and heuristic-based systems. Exact approaches, such as Constraint Programming (CP), offer guaranteed optimality but are computationally infeasible for large problem spaces. Heuristic and metaheuristic methods – like Tabu Search or Simulated Annealing – trade precision for scalability, delivering "good enough" solutions quickly.

Hybrid approaches like Constraint Bound Rostering (CBR) seek to balance both worlds by differentiating between hard (rigid) and soft (flexible) constraints. Despite these advances, scaling remains a challenge, particularly when last-minute changes or limited availability disrupt initial plans.

Recent developments in artificial intelligence offer a promising new angle. Large Language Models have demonstrated early-stage potential in interpreting loosely structured input, making them suitable for communication-driven tasks. Their applicability to combinatorial optimization is still being evaluated, but their use in parsing natural language requests and converting them into structured data formats introduces significant UX advantages in rostering workflows.

Our study explores how these models can be practically integrated with existing algorithms and communication platforms to improve usability without compromising solution quality.

We employed GPT-4o as our primary LLM, leveraging its capabilities in few-shot prompting and natural language understanding. The model parses informal, unstructured user messages (e.g., "I'm available next weekend during the day") and translates them into structured availability formats. Despite its linguistic prowess, we observe critical limitations in consistency and context preservation, particularly as problem complexity increases. These constraints lead us to position the LLM as an auxiliary interface rather than a decision engine.

Given the NP-difficulty of the rostering problem, we developed a custom heuristic scheduling algorithm that prioritizes compliance with rigid constraints (e.g., skill requirements, team sizes, and rest periods). The algorithm calculates a weighted score for each firefighter-turn pairing using four factors: availability coverage, role seniority, skill matching, and rest interval compliance. This method allows us to generate near-optimal solutions quickly and consistently, even in the presence of sparse or skewed availability data.

We implemented custom data encoding methods to reduce prompt size and maintain model coherence. Availability is modeled using irregular 2D matrices, while firefighter skills and roles are encoded in compact binary formats. We also employ modular decomposition techniques to isolate computationally intensive sub-problems and resolve them incrementally.

Study Details

The goal of this study is to replace a labor-intensive, error-prone scheduling process used by the Bombeiros Voluntários da Figueira da Foz with an automated, AI-supported system. At its core, the challenge lies in addressing a real-world instance of the rostering problem under high constraint density and low predictability – characteristics that typify NP-difficult problems.

We established three primary objectives for the study: enable intuitive submission of availabilities via natural language, generate optimized shift allocations that comply with strict operational rules, and automate the end-to-end scheduling workflow with minimal human intervention. To meet these goals, we design and validate a composite system that blends language processing, constraint reasoning, and system orchestration.

Our methodology consists of two parallel development streams: one focusing on user interaction via WhatsApp and LLMs, and another dedicated to shift allocation using heuristics.

The first component enables firefighters to submit availability using informal text messages. These messages are received by a middleware layer that connects to the WhatsApp Business API. The message is then processed by a Large Language Model (GPT-4o), which extracts temporal references and converts them into structured availability matrices. To ensure robustness, we implement a pipeline that includes whitelisting, time-based access gating, and ambiguity detection. Messages like "Available first week of the month, daytime only" are correctly interpreted and mapped to the appropriate turn slots.

The second component deals with generating the monthly shift plan. Given the limited utility of LLMs in high-complexity optimization, we implement a heuristic scheduling engine. This engine traverses each turn, applying weighted selection criteria to determine the most suitable firefighter based on availability, skill requirements, rest intervals, and leadership roles. The engine uses a scoring function ω(t,b), which aggregates four dimensions: availability ratio, career level penalty, skill match bonus, and rest period compliance.

We distinguish between rigid constraints, which must always be met (e.g., one health technician per shift), and flexible constraints, which improve solution quality but may be relaxed when necessary (e.g., spacing of shifts). When a shift cannot be filled under current rules, the system iteratively reduces constraint sensitivity and re-attempts allocation, allowing graceful degradation rather than outright failure.

Automation plays a critical role in system orchestration. The scheduling cycle includes a pre-defined set of states – open, draft, and final – triggered by calendar-based events. The system sends availability requests 10 days before the end of the month, issues daily reminders to those who have not responded, generates a draft plan 2 days before month-end, and finalizes the roster on the last day. This eliminates human oversight at every stage except the final validation.

Findings

During the implementation phase, we tested the system across a range of synthetic scenarios with varying availability rates and workforce sizes. These simulations assess both the viability of the planning algorithm and the consistency of LLM output.

The LLM-based scheduling approach, despite optimizations in data structure and prompt engineering, proved insufficient. Even with compressed representations and minimal context overload, the model fails to maintain consistency when exposed to a high number of constraints. However, its performance in parsing and structuring availability messages remains strong. This asymmetry leads us to use LLMs exclusively for interface tasks and reserve scheduling logic for deterministic computation.

In contrast, the heuristic algorithm achieves complete or near-complete coverage of shifts in all tested scenarios. For a 36-shift schedule with 100 firefighters at 40% random availability, the system consistently produces full rosters. When availability drops to 10%, the system adapts by distributing the load across fewer members, with reduced team sizes but preserved rule compliance. The average shift generation time remains below 0.1 seconds on development-grade hardware, with end-to-end processing under one minute – even with datasets of up to 1,000 firefighters.

This performance contrasts sharply with the previous manual workflow, which consumed up to six hours and frequently required rework. Post-deployment, users report a shift in engagement: the firefighter’s role is reduced to sending two brief WhatsApp messages in the best case, while the system handles all downstream complexity.

This study provides a concrete implementation of AI-enhanced scheduling in a domain where traditional methods are no longer sustainable. It validates a hybrid architecture where language models support human interaction, while deterministic algorithms ensure rule compliance and consistency. Most importantly, it reduces operational overhead in an environment where human attention is a scarce and valuable resource.