The persistent challenge of balancing operational expenses with complex architectural demands in cloud-native environments has reached a turning point with the recent introduction of AWS Lambda Durable Functions. For many years, the industry relied on orchestration layers that, while powerful, introduced significant overhead in both cost and cognitive load for developers. This research evaluates the measurable efficiency and architectural differences between this newly released code-first orchestration model and the traditional declarative approach of AWS Step Functions. By examining a common yet demanding use case, long-running ETL pipelines with human intervention, the study explores whether stateful serverless computing can finally move toward a model that is both economically sustainable and developer-friendly.
The research specifically addresses the “wait-time” bottleneck, a scenario where a process must stop and await external input before proceeding. Historically, these pauses were expensive or difficult to manage without constant polling or active compute usage. This study investigates the newly introduced checkpointing mechanism in Lambda, which allows a function to save its progress and shut down completely while waiting for a callback. The focus remains on determining if this architectural shift can replace the multi-service orchestration of traditional state machines, particularly when the workload is heavily centered on Lambda-based logic.
Furthermore, the investigation seeks to understand the trade-offs involved in moving from a visual, service-integrated orchestration model to one defined entirely by programming code. While cost is a primary driver, the research also considers operational factors such as deployment complexity, observability, and the learning curve associated with new Software Development Kits. By providing a side-by-side comparison, this article aims to equip engineering leaders with the data necessary to make informed decisions about their serverless strategy for the next several years of cloud development.
The Evolution of AWS Orchestration and Human-in-the-Loop Pipelines
Before the arrival of Lambda Durable Functions at re:Invent 2025, developers were often forced into a difficult choice when building complex, multi-step workflows. AWS Step Functions served as the gold standard for state management, providing a robust visual interface and native integrations with over 200 AWS services. However, the reliance on Amazon States Language—a JSON-based declarative language—frequently created friction for development teams more accustomed to standard programming languages like Python or TypeScript. Moreover, the pricing model for Step Functions, which charges per state transition, often led to unexpectedly high bills for high-volume applications that required granular logging and many intermediary steps.
The necessity for human-in-the-loop approvals in document processing and data validation has always been a particular point of architectural pain. In these systems, a workflow might extract and transform data in seconds, but then sit dormant for minutes, hours, or even days while a human reviewer verifies the output. Managing this “pause” state required either long-lived state machines or complex custom logic involving DynamoDB and manual polling mechanisms. The emergence of durable execution within the Lambda runtime itself represents a shift toward consolidating these patterns into a single, cohesive execution environment that mirrors the experience of writing local code while benefiting from the scalability of the cloud.
This research is particularly significant because it addresses the economic disparity between declarative state machines and code-first durable execution. As serverless architectures mature, organizations are looking for ways to scale their throughput without seeing a proportional, or sometimes steeper, increase in orchestration fees. Understanding how Lambda Durable Functions manage state transitions internally, and why these transitions do not incur the same costs as Step Functions, is crucial for any enterprise aiming to optimize its cloud budget. This shift marks a move toward a more integrated serverless experience, where the “glue” between tasks is as efficient as the tasks themselves.
Research Methodology, Findings, and Implications
Methodology
To ensure a rigorous and fair comparison, the study employed a side-by-side architectural implementation using the AWS SAM (Serverless Application Model). Two distinct stacks were deployed: one utilizing Lambda Durable Functions and the other utilizing AWS Step Functions for identical ETL tasks. Both systems operated on the Python 3.14 runtime, leveraging the latest performance enhancements available in the modern cloud environment. The core workflow involved an automated trigger from an S3 bucket upload, followed by three distinct stages of data processing: extraction of raw CSV data, transformation and cleaning of the records, and loading the processed data back into a destination bucket. A shared API Gateway and DynamoDB backend were implemented to handle the human-in-the-loop approval phase, ensuring that the human interaction experience remained consistent across both tests.
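The three processing stages described above can be pictured with a minimal, local sketch. It is not the study's code: the function names, the sample data, and the cleaning rule (trim whitespace, drop rows with an empty field) are assumptions made for illustration, and the S3 reads and writes are replaced by in-memory strings.

```python
import csv
import io


def extract(raw_csv: str) -> list[dict]:
    """Extraction stage: parse raw CSV text into row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows: list[dict]) -> list[dict]:
    """Transformation stage (assumed rule): trim whitespace, drop incomplete rows."""
    cleaned = []
    for row in rows:
        stripped = {
            key.strip(): (value or "").strip()
            for key, value in row.items()
            if key is not None  # ignore overflow fields from malformed rows
        }
        if all(stripped.values()):
            cleaned.append(stripped)
    return cleaned


def load(rows: list[dict]) -> str:
    """Load stage: serialize rows to CSV text (stand-in for the destination bucket)."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample input: the second row is missing a name and is dropped.
RAW = "id,name,amount\n1, Alice ,10.5\n2,,7\n3,Bob, 3 \n"
OUTPUT = load(transform(extract(RAW)))
print(OUTPUT)
```

In both stacks these stages would run between the S3 trigger and the approval step; only the component that sequences them differs.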
The experiment was conducted by executing 1,000 unique document processing workflows through each system over a period of several days. To mimic real-world conditions, each workflow included a forced delay of 20 minutes to represent the average time taken for a human reviewer to interact with the system. Data was collected through a combination of Amazon CloudWatch metrics, AWS X-Ray for tracing, and detailed AWS Billing reports. The metrics focused specifically on the number of Lambda invocations, the total compute duration, and the volume of state transitions. By using identical compute resources, such as 1024 MB of memory for the orchestrators and 512 MB for the worker functions, the methodology minimized variables that could skew the cost comparison, focusing purely on the orchestration overhead.
Moreover, the study utilized the ARM64 (Graviton) architecture for all Lambda functions to maximize cost efficiency at the compute layer. The Step Functions implementation used Standard Workflows to accommodate the long-running nature of the human approval phase, as Express Workflows are limited to five minutes of execution. This choice ensured that both systems were compared at their most appropriate configuration for the specific use case. The data extraction process was automated to pull from the Cost and Usage Reports (CUR), providing a granular view of every micro-cent spent during the 1,000-run cycle.
Findings
The data revealed a staggering 79% reduction in total operational costs when using Lambda Durable Functions compared to Step Functions. Specifically, the total cost for 1,000 workflow executions under the Durable Functions model was approximately $0.044, whereas the same 1,000 runs using Step Functions cost $0.207. The primary driver of this disparity was the elimination of state transition charges, which accounted for more than 84% of the total cost in the Step Functions implementation. Because Durable Functions handle state through a checkpoint and replay mechanism managed internally by the Lambda service, they do not incur the per-step fee that is fundamental to the Step Functions pricing model.
While the orchestration costs dropped significantly, the study observed a slight increase in Lambda duration charges for the Durable Functions orchestrator. This was attributed to the higher memory requirement (1024 MB) needed to manage the internal state and the SDK overhead, as well as the fact that the orchestrator function must replay its logic after every checkpoint. Specifically, the duration cost for Durable Functions was $0.031 compared to the $0.018 accrued by the smaller, more modular functions used in the Step Functions implementation. However, this 72% increase, about $0.013 per 1,000 runs, was small compared with the $0.175 per 1,000 runs saved by eliminating transition fees.
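The percentages quoted in the findings can be reproduced directly from the reported dollar amounts. The short check below uses only the figures stated in the text; the variable names are illustrative.

```python
# Reported costs per 1,000 workflow executions, in USD.
DURABLE_TOTAL = 0.044
STEP_FUNCTIONS_TOTAL = 0.207
TRANSITION_FEE = 0.175      # Step Functions state transition charges
DURABLE_DURATION = 0.031    # Lambda duration cost, Durable Functions
STEP_DURATION = 0.018       # Lambda duration cost, Step Functions stack

# Overall reduction when moving from Step Functions to Durable Functions.
total_reduction_pct = (
    (STEP_FUNCTIONS_TOTAL - DURABLE_TOTAL) / STEP_FUNCTIONS_TOTAL * 100
)

# Share of the Step Functions bill that came from state transitions.
transition_share_pct = TRANSITION_FEE / STEP_FUNCTIONS_TOTAL * 100

# Relative increase in duration cost for the durable orchestrator.
duration_increase_pct = (DURABLE_DURATION - STEP_DURATION) / STEP_DURATION * 100

# Absolute saving per 1,000 runs.
net_saving = STEP_FUNCTIONS_TOTAL - DURABLE_TOTAL

print(f"total reduction:   {total_reduction_pct:.1f}%")
print(f"transition share:  {transition_share_pct:.1f}%")
print(f"duration increase: {duration_increase_pct:.1f}%")
print(f"net saving:        ${net_saving:.3f} per 1,000 runs")
```

The extra $0.013 in duration cost is roughly one-thirteenth of the transition fee it replaces, which is why the overall reduction stays close to 79%.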
Additionally, the research confirmed the effectiveness of the wait_for_callback pattern. During the 20-minute approval window, both systems recorded zero active compute charges, demonstrating that Lambda Durable Functions successfully suspend execution and release all resources while waiting for external signals. An interesting technical discovery was the invocation count; the Durable Functions required 1,788 invocations for the 1,000 runs, nearly double the expected count. This was determined to be a direct result of the replay mechanism, which requires the function to restart and re-run its logic from the beginning to rebuild its state every time it resumes from a checkpoint.
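The link between replay and the higher invocation count is easier to see in a toy model. The sketch below is not the Lambda SDK: `ToyContext`, `Suspend`, and `run_once` are hypothetical names, and the checkpoint store is a plain dictionary. It only illustrates the general checkpoint-and-replay idea, in which a resumed workflow re-runs from the top and skips steps whose results were already recorded.

```python
class Suspend(Exception):
    """Ends the current invocation while the workflow waits for a callback."""


class ToyContext:
    """Hypothetical stand-in for a durable context; not the real SDK."""

    def __init__(self, checkpoints: dict, callbacks: dict):
        self.checkpoints = checkpoints  # survives across invocations
        self.callbacks = callbacks      # results delivered by external callers
        self.executed = []              # steps that actually ran this invocation

    def step(self, name, fn):
        # On replay, return the recorded result instead of re-running the step.
        if name in self.checkpoints:
            return self.checkpoints[name]
        result = fn()
        self.checkpoints[name] = result
        self.executed.append(name)
        return result

    def wait_for_callback(self, name):
        # No result yet: stop the invocation so no compute is consumed.
        if name not in self.callbacks:
            raise Suspend(name)
        return self.callbacks[name]


def workflow(ctx):
    rows = ctx.step("extract", lambda: ["a", "b"])
    cleaned = ctx.step("transform", lambda: [r.upper() for r in rows])
    approved = ctx.wait_for_callback("approval")
    if not approved:
        return "rejected"
    return ctx.step("load", lambda: f"loaded {len(cleaned)} rows")


def run_once(checkpoints, callbacks):
    """One invocation: returns (status, result, steps executed this time)."""
    ctx = ToyContext(checkpoints, callbacks)
    try:
        return "done", workflow(ctx), ctx.executed
    except Suspend:
        return "suspended", None, ctx.executed


checkpoints, callbacks = {}, {}
first = run_once(checkpoints, callbacks)   # runs extract and transform, then suspends
callbacks["approval"] = True               # the human reviewer approves
second = run_once(checkpoints, callbacks)  # replays from the top, runs only load
print(first)
print(second)
```

In this model a single workflow with one approval pause needs two invocations, which is the same mechanism the study cites for the invocation count exceeding the number of runs.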
Implications
The implications of these findings are significant for organizations processing high volumes of long-running tasks. At a scale of 100,000 workflows per day, the measured difference of $0.163 per 1,000 runs works out to roughly $16 per day, or about $5,950 per year, assuming costs scale linearly with volume. This financial incentive suggests that for Lambda-centric workflows, where the primary logic is already written in a programming language, switching to Durable Functions is a highly effective optimization strategy. The move away from Amazon States Language also implies a potential increase in developer velocity, as teams can use familiar Python or TypeScript constructs, including loops, conditional logic, and standard error handling, rather than learning a proprietary JSON-based syntax.
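The annual figure at 100,000 workflows per day follows from the per-1,000-run costs reported in the findings. The extrapolation below assumes that costs scale linearly with volume and ignores free-tier allowances and regional price differences, neither of which the study discusses.

```python
# Reported totals per 1,000 workflow executions, in USD.
STEP_FUNCTIONS_TOTAL = 0.207
DURABLE_TOTAL = 0.044

WORKFLOWS_PER_DAY = 100_000
DAYS_PER_YEAR = 365

saving_per_1000 = STEP_FUNCTIONS_TOTAL - DURABLE_TOTAL
daily_saving = saving_per_1000 * WORKFLOWS_PER_DAY / 1000
annual_saving = daily_saving * DAYS_PER_YEAR

print(f"daily saving:  ${daily_saving:,.2f}")
print(f"annual saving: ${annual_saving:,.2f}")
```

Under these assumptions the saving is in the low thousands of dollars per year, so the financial case grows with volume and with the number of states per workflow.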
However, the research also highlights a significant trade-off in observability and maintenance. Step Functions provide a rich visual execution graph that allows even non-technical stakeholders to see exactly where a process is stuck or why it failed. In contrast, Durable Functions rely more heavily on traditional logging and X-Ray traces, which can be more difficult to parse in complex, multi-step scenarios. This suggests that the choice between the two services should not be based solely on cost. For mission-critical workflows that require frequent visual auditing and high-level service integration, the “premium” price of Step Functions may still be justified.
Furthermore, the study indicates that code-first orchestration introduces a new set of responsibilities for developers, specifically regarding code determinism. Because the function replays multiple times, any non-deterministic logic—such as generating a random ID or fetching a current timestamp—can cause the state to drift and the workflow to fail. This requirement for “deterministic” coding practices means that while the syntax is familiar, the underlying execution model requires a deeper understanding of how the SDK checkpoints state. Organizations must decide whether their development teams are ready to manage these technical nuances in exchange for the massive cost savings offered by the durable model.
Reflection and Future Directions
Reflection
The research process uncovered several unexpected hurdles that underscore the complexity of transitioning to a durable execution model. One of the most challenging aspects was managing non-deterministic behavior within the Python environment. During the initial testing phases, several workflows failed during the replay phase because a timestamp generated outside of a context.step() block produced a different value upon resumption. This experience highlighted the fact that while Durable Functions allow for a “code-first” approach, they do not allow for “code-anyway” practices. Every external interaction and every variable that influences the execution path must be explicitly managed within the SDK’s provided wrappers to ensure consistency.
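The timestamp problem described above can be reproduced in a toy model. The sketch below does not use the Lambda SDK: `RecordingContext` and the fake clock are hypothetical, and the model only shows the value drifting between the first run and a replay, not the SDK's failure behavior.

```python
import itertools


class RecordingContext:
    """Hypothetical stand-in for a durable context; not the real SDK."""

    def __init__(self, checkpoints: dict):
        self.checkpoints = checkpoints  # persisted between invocations

    def step(self, name, fn):
        # Run the function once, then reuse the recorded result on replay.
        if name not in self.checkpoints:
            self.checkpoints[name] = fn()
        return self.checkpoints[name]


# Fake clock: every read returns a later value, like real wall-clock time.
_clock = itertools.count(1000)


def now() -> int:
    return next(_clock)


def fragile_workflow(ctx):
    stamp = now()  # read outside a step, so it is re-evaluated on every replay
    return f"report-{stamp}.csv"


def safe_workflow(ctx):
    stamp = ctx.step("timestamp", now)  # recorded once, reused on replay
    return f"report-{stamp}.csv"


checkpoints = {}
fragile_first = fragile_workflow(RecordingContext(checkpoints))
fragile_replay = fragile_workflow(RecordingContext(checkpoints))
safe_first = safe_workflow(RecordingContext(checkpoints))
safe_replay = safe_workflow(RecordingContext(checkpoints))

print(fragile_first, fragile_replay)  # two different file names
print(safe_first, safe_replay)        # the same file name twice
```

The same reasoning applies to random IDs and any other value that differs between runs: if it influences the execution path or an output name, it needs to be captured in a checkpointed step.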
Another area of reflection involves the consolidation of logic into a single orchestrator function. While this simplifies the deployment of the stack by reducing the number of individual Lambda functions to manage, it creates a concentrated demand on Lambda concurrency. In the Step Functions model, the workload was distributed across multiple small functions, which naturally smoothed out the concurrency usage. With Durable Functions, the single orchestrator carries the full burden of every stage of the ETL process. This realization suggested that for very large-scale systems, managing reserved concurrency and account-level limits becomes a much more critical task when using the durable model than it was under the distributed state-machine model.
The study could have been further enriched by exploring the performance impact of different memory configurations on the replay mechanism. It was noted that as the execution history of a durable function grows, the time taken for the replay phase increases slightly. While this did not significantly impact the cost in this specific ETL use case, it could become a factor in workflows with hundreds of steps. The research also demonstrated that the current monitoring tools for Durable Functions, while functional, still lag behind the mature visual ecosystem of Step Functions, which remains a primary consideration for teams that prioritize operational transparency over absolute cost efficiency.
Future Directions
Looking ahead, there are several avenues for further investigation that could expand the understanding of serverless orchestration economics. A primary area of interest is the recently announced Java support for Lambda Durable Functions. Given Java’s prevalence in enterprise environments and its different cold-start characteristics, a similar benchmarking study would provide valuable insights for large-scale corporate migrations. Additionally, investigating the performance of TypeScript in the same ETL scenario would help determine if the cost savings are consistent across all supported runtimes or if certain languages offer better efficiency during the replay phase.
Another critical path for future research is the evaluation of hybrid architectures. There is significant potential in a model where AWS Step Functions are used for high-level, cross-service orchestration, while Lambda Durable Functions handle the granular, logic-heavy sub-tasks within that broader workflow. Determining the “sweet spot” for this hybrid approach—where one service hands off to the other—could provide the best of both worlds: the visual observability of state machines and the cost-efficiency of durable code. Research into the latency impact of these hand-offs would be essential for real-time or near-real-time data processing systems.
Finally, the long-term maintainability of durable codebases versus declarative state machines remains an open question. As a workflow evolves over several years, the ease of updating a Python script versus a JSON state machine definition may change. Investigating how teams handle versioning and in-flight executions, where an older version of code must finish running while a new version is deployed, will be crucial for the adoption of Durable Functions in production environments. The community would also benefit from further studies on the security implications of storing execution state internally within the Lambda service versus the externalized state management of traditional orchestration tools.
Redefining Serverless Economics for Long-Running Workflows
The comprehensive analysis of Lambda Durable Functions against the established Step Functions model provided a clear confirmation that code-first orchestration offers a significant economic advantage for specific serverless workloads. By achieving a 79% reduction in total costs, the study demonstrated that the elimination of state transition fees was the single most impactful factor in optimizing cloud spend for long-running ETL processes. The successful implementation of the zero-cost waiting mechanism showed that developers no longer need to sacrifice simplicity or budget to handle human-in-the-loop approvals. The transition from a declarative, multi-function architecture to a unified, stateful function proved that the complexity of serverless orchestration can be effectively minimized without losing the benefits of horizontal scalability.
The findings indicated that the trade-offs between these two services were not purely financial, but rather centered on the balance between observability and developer flexibility. While Step Functions remained the superior choice for visual monitoring and complex multi-service integrations, Durable Functions provided a more natural environment for Python-centric logic. The team observed that the requirement for deterministic code was the primary technical hurdle, yet once mastered, it allowed for more dynamic and expressive workflow definitions. Ultimately, the research established that Lambda Durable Functions are a transformative tool for high-volume, Lambda-heavy applications where cost and code-native development are prioritized. As cloud architectures continue to evolve toward 2030 and beyond, this shift toward integrated, stateful serverless execution is likely to redefine the standard for how organizations build and scale their most critical business processes.
