Optimizing First-Attempt Parcel Delivery Using Explainable Machine Learning

A study exploring how machine learning and explainable AI can improve first-attempt delivery rates in last-mile logistics, using real-world data

The efficiency of last-mile delivery has become a decisive factor for operational success in the logistics sector. In this study, we investigate the application of machine learning (ML) and explainable artificial intelligence (XAI) to forecast the success of parcel deliveries on the first attempt. Our objective is to identify the conditions that influence delivery outcomes and build predictive tools that route planners can use in real time to minimize failed delivery attempts.

As part of our collaboration with a major player in the parcel logistics market, we analyze delivery data to develop predictive models tailored to their operational context. The ultimate goal is to enhance operational efficiency by improving the First Attempt Delivery Rate (FADR), a key performance indicator that directly affects cost, reputation, and customer satisfaction.

The global parcel delivery market has experienced steady growth, driven by e-commerce expansion and cross-border transactions. In this context, first-attempt delivery is a critical operational metric. Although industry benchmarks aim for FADR values above 90%, real-world rates vary and frequently fall short, often ranging from 80% to 95%. The consequences of failed deliveries include additional operational costs, reduced customer satisfaction, and logistical inefficiencies.

Traditional optimization techniques have proved insufficient to tackle the complexity of last-mile delivery, especially considering the heterogeneity of influencing factors such as recipient behavior, route design, and urban infrastructure. Recent academic studies and industry reports highlight the emergence of ML-based solutions for delivery prediction. Among them, gradient boosting methods like XGBoost and LightGBM have demonstrated strong performance, especially when combined with structured feature engineering and optimization formulations such as the Vehicle Routing Problem (VRP).

At the same time, there is growing concern over the interpretability of ML models, particularly when used in operational decision-making. XAI techniques, especially SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations), have been introduced to provide transparency into the decision-making logic of models, making them more suitable for integration into business-critical applications.

Despite promising advances, we observe a gap in the literature when it comes to the application of XAI specifically in predicting first-attempt delivery success. Most existing studies focus on optimization or route planning but do not address the need for interpretable predictive tools for delivery reliability.

To build the predictive system, we leverage a combination of advanced ML models, scalable data engineering pipelines, and XAI frameworks. Our approach begins with access to real-world operational data provided by our parcel logistics partner. These data include detailed event logs related to parcel movements across millions of delivery instances. We process and engineer these datasets to extract structured features that capture sender and recipient identity, delivery timing, route information, and package characteristics.

For model development, we adopt a comparative approach using two families of ML models: ensemble decision trees and neural networks. Specifically, we implement:

  • XGBoost: Chosen for its high predictive accuracy, its built-in regularization against overfitting, and the relative transparency of tree-based models. It serves as a strong baseline due to its effectiveness on tabular data and its compatibility with SHAP for post-hoc explainability.
  • Deep Neural Networks (DNNs): Selected for their ability to capture complex, non-linear patterns in high-dimensional data. These models are less interpretable by design but gain transparency through integration with SHAP.

We handle high-cardinality categorical variables (e.g., recipient and sender IDs) using target encoding, a method that preserves interpretability and avoids feature explosion typical in one-hot or binary encoding. For cyclical temporal features like time-of-day and day-of-week, we apply trigonometric encoding to maintain the continuity of circular time relationships.
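
The two encodings described above can be sketched as follows. This is a minimal illustration rather than the study's implementation: the function names, the additive smoothing scheme, and the `smoothing` value are assumptions introduced here, and the toy data is invented.

```python
import math
from collections import defaultdict


def fit_target_encoding(categories, targets, smoothing=10.0):
    """Return (mapping, global_mean): smoothed mean target per category.

    `smoothing` is a hypothetical hyperparameter, not taken from the study.
    It pulls rarely seen categories toward the global mean.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    mapping = {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }
    return mapping, global_mean


def apply_target_encoding(categories, mapping, global_mean):
    """Encode categories; unseen ones fall back to the global mean."""
    return [mapping.get(cat, global_mean) for cat in categories]


def cyclical_encode(value, period):
    """Map a cyclical value (e.g. hour with period 24) to (sin, cos)."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


# Toy training data: 1 = first attempt failed, 0 = delivered.
recipients = ["r1", "r1", "r1", "r2", "r2", "r3"]
failed = [1, 1, 0, 0, 0, 1]
mapping, prior = fit_target_encoding(recipients, failed, smoothing=2.0)
encoded = apply_target_encoding(["r1", "r2", "r9"], mapping, prior)

# 23:00 and 00:00 end up close together on the circle, unlike raw 23 vs 0.
late = cyclical_encode(23, 24)
midnight = cyclical_encode(0, 24)
gap = math.dist(late, midnight)
```

Each ID becomes a single numeric column regardless of how many distinct IDs exist, which is what avoids the feature explosion of one-hot encoding. The mapping should be fitted on training data only, since it is derived from the target and would otherwise leak label information into the evaluation.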

For model explainability, we use SHAP values to measure the contribution of each feature to a model's prediction, both globally (across the dataset) and locally (for individual predictions). This integration of SHAP lets the models be used as "glass-box" systems: transparent enough for route planners and operations managers to trust and act on their outputs.
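
To make the idea of a per-feature contribution concrete, the sketch below computes exact Shapley values by brute force for a toy scoring function. It illustrates the definition that SHAP builds on, not the SHAP library itself, which relies on far more efficient algorithms. The scoring function, feature names, and baseline are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of `predict` at `x` relative to `baseline`.

    Features outside a coalition are set to their baseline value. The cost is
    exponential in the number of features, so this only suits toy examples.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_failure_score(features):
    """Hypothetical scoring function standing in for a trained model."""
    recipient_rate, route_rate, evening = features
    return 0.6 * recipient_rate + 0.3 * route_rate + 0.2 * recipient_rate * evening


x = [0.8, 0.5, 1.0]         # the delivery being explained
baseline = [0.2, 0.2, 0.0]  # reference input, e.g. dataset averages
phi = shapley_values(toy_failure_score, x, baseline)
```

The contributions in `phi` sum to the difference between the score at `x` and the score at `baseline`, which is the additive property that makes a local explanation readable feature by feature. A global view is commonly obtained by averaging the absolute contributions over many predictions.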

The combination of interpretable feature engineering, scalable model training, and explainable outputs positions this system not just as a predictive engine but as a decision-support tool ready for operational deployment.

Study Details

Our study focuses on the development of predictive models to optimize first-attempt parcel delivery outcomes using real operational data. The collaboration with our logistics partner gives us access to a large-scale, real-world dataset and allows us to investigate how machine learning can be used as a decision-support tool for route planning.

The primary goal of the study is to build predictive models capable of estimating the probability that a parcel will be successfully delivered on the first attempt. These predictions are intended to support route planners in decision-making, improving delivery success rates and operational efficiency.

To address this challenge, we first conduct a business analysis to understand key operational constraints and requirements. We define the First Attempt Delivery Rate (FADR) as the target metric and reframe the problem as a binary classification task using machine learning.

The dataset contains over 7 million delivery events from 2023. We preprocess the data using structured feature engineering, focusing on the following aspects:

  • Label generation: A binary “Failure” label is created to identify whether a delivery failed on the first attempt, based on event codes indicating return-to-warehouse movements.
  • Feature selection and transformation: High-cardinality features (e.g., recipient names, route IDs) are encoded using target encoding. Temporal features (hour, day of week, etc.) are encoded trigonometrically to reflect cyclical patterns.
  • Data partitioning: The dataset is split into 75% training and 25% testing, using stratified sampling to maintain class distribution.
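
The labeling and partitioning steps above can be sketched as follows. The event codes, the rule for reading an event log, and the helper names are assumptions made for illustration; the partner's actual codes are not given here.

```python
import random
from collections import defaultdict

# Hypothetical event codes; the real codes are not given in the text.
RETURN_TO_WAREHOUSE_CODES = {"RTW", "RETURNED_TO_DEPOT"}
DELIVERED_CODE = "DELIVERED"


def first_attempt_failed(events):
    """Return 1 if a return-to-warehouse event precedes delivery, else 0.

    `events` is the chronologically ordered list of event codes for one parcel.
    """
    for code in events:
        if code in RETURN_TO_WAREHOUSE_CODES:
            return 1
        if code == DELIVERED_CODE:
            return 0
    return 0  # assumption: no return event observed means no recorded failure


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Split indices so each class keeps roughly the same share in both sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy event logs: 80 parcels delivered directly, 20 returned first.
parcels = [["OUT_FOR_DELIVERY", "DELIVERED"]] * 80
parcels += [["OUT_FOR_DELIVERY", "RTW", "OUT_FOR_DELIVERY", "DELIVERED"]] * 20
labels = [first_attempt_failed(events) for events in parcels]

fadr = 1 - sum(labels) / len(labels)  # share delivered on the first attempt
train_idx, test_idx = stratified_split(labels, test_fraction=0.25)
test_failure_rate = sum(labels[i] for i in test_idx) / len(test_idx)
```

Splitting within each class keeps the failure rate of the test set equal to that of the full toy dataset, which matters when failures are the minority class. In practice a library routine such as scikit-learn's `train_test_split` with its `stratify` argument would typically replace the hand-written split.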

We explore two ML model families:

  1. XGBoost, for its performance and interpretability.
  2. Deep Neural Networks (DNNs), for their capacity to capture complex patterns.

For XGBoost, we perform hyperparameter tuning using the Hyperopt library to optimize metrics such as AUC-PR. For DNNs, we test various architectures using TensorFlow/Keras, adding dropout layers to reduce overfitting, a particular risk given the class imbalance in the dataset.

The trained models are evaluated using accuracy, precision, recall, F1-score, AUC-ROC, and AUC-PR. Importantly, we assess the confusion matrix with a focus on reducing false positives (successful deliveries wrongly flagged as failures), as they have the most direct negative impact on service quality.
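
As a rough illustration of how these metrics relate to the confusion matrix, the sketch below computes them from scratch on invented toy data. It assumes that the positive class is a failed first attempt and that a 0.5 threshold turns scores into predictions; neither the threshold nor the function names come from the study.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN with 1 = failed first attempt (positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Derive the headline metrics from the confusion-matrix counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_positive_share": fp / len(y_true),  # FP as a share of all parcels
    }


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve (no tie handling)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    total_pos = sum(y_true)
    hits, ap = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            ap += hits / rank  # precision at each recalled positive
    return ap / total_pos if total_pos else 0.0


# Toy data: true labels and predicted failure probabilities.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical threshold

metrics = classification_metrics(y_true, y_pred)
ap = average_precision(y_true, scores)
```

Accuracy alone can look strong on imbalanced data, which is why the precision-recall view and the false-positive count are examined alongside it.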

Once models are validated, we apply SHAP to both XGBoost and DNN to interpret their behavior and identify which features most influence predictions.

The models trained during this study demonstrate reliable predictive performance. The XGBoost model achieves an AUC-ROC of 86% and an AUC-PR of 73%, while the DNN model achieves 84% and 69%, respectively. These results show that both models can distinguish between successful and failed delivery attempts with reasonable confidence.

The confusion matrix for XGBoost shows an 84% accuracy rate, with false positives at 6%. This conservative behavior, which keeps false alarms rare, matches practical needs: unnecessarily flagging a successful delivery as failed may erode trust in the system.

Model interpretability, achieved through SHAP values, reveals that the recipient’s name is the most influential feature, followed by the delivery route ID. This finding is consistent across both XGBoost and DNN models, indicating a strong underlying pattern in the data. For example, certain recipients may frequently be unavailable during delivery hours, or certain routes may face recurrent logistical issues. These insights are actionable and allow route planners to proactively adjust delivery strategies.

We also developed a graphical user interface that lets users input new delivery scenarios and receive predictions in real time. The system provides both the prediction and its explanation, empowering route managers to make informed, data-driven decisions.

Technical and Business Relevance

From a technical standpoint, the study demonstrates that ML and XAI can be successfully integrated into delivery logistics systems. By combining data-driven predictions with explainability, we provide a foundation for intelligent routing tools that adapt to real-world complexities.

From a business perspective, improving the FADR can significantly reduce re-delivery costs and customer complaints. The ability to anticipate delivery failures and reorganize routes accordingly enhances customer satisfaction and reduces operational friction.

While the models provide a solid proof-of-concept, the study highlights areas for further development:

  • Data expansion: Incorporating external data such as weather, traffic conditions, and public holidays could improve prediction accuracy.
  • Model ensemble: Future versions may combine XGBoost and DNN into ensemble models to leverage the strengths of both.
  • Dynamic integration: Embedding predictions into real-time VRP solvers would close the loop between prediction and execution.

This study provides a practical and effective blueprint for applying machine learning and explainable AI to improve first-attempt delivery success in logistics operations. By demonstrating how predictive modeling can be grounded in interpretable, actionable outputs, we contribute to the growing body of applied AI in operational management. The approach and tools developed here can be extended to other logistics partners, offering a scalable framework for delivery optimization.