The rapid expansion of global financial corridors requires technological infrastructure that can move money instantaneously while managing the complexity of multi-currency portfolios. Airwallex has emerged as a cornerstone of the modern fintech sector by fundamentally reimagining how data serves as the lifeblood of international commerce. Originally founded in Melbourne and now maintaining a heavy operational presence in major financial centers like Singapore and San Francisco, the organization has faced the immense pressure of scaling its platform to meet the needs of a borderless economy. As transaction volume surged, traditional data warehousing and a reliance on fragmented open-source tooling began to show signs of strain, threatening the agility that defined the company’s early success. This friction necessitated a comprehensive strategic pivot toward a unified data and AI ecosystem that could harmonize disparate information streams into a single, actionable source of truth.
To navigate this transition, Airwallex moved away from an architectural landscape dominated by a mix of BigQuery and various open-source software such as Airflow and Spark. While these technologies provided a baseline for initial growth, they eventually created a maintenance-heavy environment where engineering talent was frequently bogged down by infrastructure management rather than product innovation. The primary objective became the creation of a seamless backbone capable of processing both structured financial logs and unstructured product data with fintech-grade security. By streamlining these workflows, the organization sought to eliminate the silos that hampered real-time visibility and delayed the deployment of critical financial services. This evolution was not merely about upgrading hardware or software but about establishing a robust foundation that could support the sophisticated machine learning workloads and automated decision-making processes required in the current high-stakes financial landscape.
Overcoming Structural Friction and Latency
Identifying Operational Bottlenecks
The operational burden of maintaining a legacy open-source stack often means that highly skilled data engineers spend the majority of their time “keeping the lights on” rather than developing high-value features. At Airwallex, the need for manual infrastructure tuning of components like Spark and Airflow created a significant bottleneck that slowed the pace of innovation across the entire product suite. This reliance on manual intervention meant that every expansion of the data environment required a proportional increase in engineering effort, an unsustainable model for a company operating at global scale. Furthermore, the fragmented tooling often led to inconsistencies in data handling, where different teams might interpret the same financial metrics through slightly different lenses, complicating internal reporting and making it difficult to maintain a unified view of the company’s global health.
Beyond the internal resource drain, the most pressing technical challenge was the persistent data latency that hampered real-time financial oversight. The existing architecture suffered from a 60-minute “source-to-gold” delay, meaning that an hour could pass before raw transaction data was fully processed, cleaned, and made available for high-level analysis. In the fast-paced world of global finance, where currency fluctuations and fraud risks can change in seconds, a one-hour gap is a substantial liability that impacts everything from risk management to customer support. This latency created a “blind spot” that prevented the organization from reacting instantaneously to market shifts or operational anomalies. Consequently, the push for a new architecture was driven by the urgent need to close this gap and provide the business with the real-time insights necessary to maintain its reputation for reliability and speed in cross-border payments.
Managing Complexity and Governance
Managing approximately 1,600 batch pipelines created an environment of staggering complexity that made scaling a labor-intensive and error-prone process. This massive volume of workflows was a byproduct of a fragmented data strategy where individual services and product lines often developed their own independent data paths. Such a decentralized approach made it nearly impossible to implement universal governance standards or to ensure that every dashboard across the global organization was reflecting the same reality. For a fintech company operating across multiple jurisdictions, this complexity also posed a regulatory risk, as tracking data lineage and ensuring compliance with privacy laws like GDPR or local financial regulations became increasingly difficult as the number of pipelines grew. The need for a more declarative and consolidated approach to data engineering was clear, as the current trajectory threatened to overwhelm the team’s ability to maintain a secure and auditable environment.
Effective data democratization in a regulated industry requires a delicate balance between providing widespread access and maintaining strict security protocols. Airwallex recognized that as it scaled, more stakeholders—from marketing analysts to product managers—needed direct access to data to drive growth and improve user experiences. However, the existing siloed structure made it difficult to grant this access without risking data over-exposure or non-compliance with strict financial standards. The challenge was to find a solution that could centralize role-based access control and provide a transparent audit trail of who accessed what data and when. Establishing this “secure-by-design” framework was essential to fostering a culture of data-driven decision-making where employees could explore insights independently, confident that they were working within a governed and protected environment that met the highest fintech requirements.
Implementing the Unified Lakehouse Architecture
Streamlining Workflows with Databricks and Google Cloud
The decision to integrate Databricks with Google Cloud represented a strategic shift toward a unified Lakehouse architecture, designed to merge the best features of data lakes and warehouses. By standardizing on Delta Lake as the primary storage layer, Airwallex was able to adopt a “medallion architecture” that systematically organizes data into Bronze, Silver, and Gold layers based on its level of refinement and business readiness. This structured approach allowed the engineering team to consolidate their sprawling infrastructure, effectively reducing the number of batch pipelines from 1,600 down to just 60. This 96% reduction in pipeline volume was achieved through the use of declarative programming models that simplified how data is ingested, transformed, and managed. By removing the need for manual tuning and repetitive code, the organization freed its engineers to focus on building sophisticated financial products rather than managing the plumbing of the data ecosystem.
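The Bronze/Silver/Gold layering described above can be illustrated with a minimal sketch in plain Python. The field names, cleaning rules, and aggregation are invented for this example; a production pipeline would operate on Delta Lake tables through a declarative pipeline framework rather than on in-memory lists.

```python
# Illustrative medallion (Bronze -> Silver -> Gold) flow.
# Field names and rules are hypothetical; real implementations use Delta Lake
# tables and declarative pipeline definitions, not in-memory lists.
from collections import defaultdict


def to_silver(bronze_rows):
    """Clean raw (Bronze) records: drop malformed rows, normalize currency codes."""
    silver = []
    for row in bronze_rows:
        if row.get("amount") is None or row.get("currency") is None:
            continue  # discard records missing required fields
        silver.append({
            "txn_id": row["txn_id"],
            "amount": float(row["amount"]),
            "currency": row["currency"].upper(),
        })
    return silver


def to_gold(silver_rows):
    """Aggregate cleaned (Silver) records into business-ready (Gold) totals."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["currency"]] += row["amount"]
    return dict(totals)


bronze = [
    {"txn_id": 1, "amount": "100.0", "currency": "usd"},
    {"txn_id": 2, "amount": None, "currency": "usd"},  # malformed, dropped
    {"txn_id": 3, "amount": "250.5", "currency": "aud"},
]
gold = to_gold(to_silver(bronze))  # {"USD": 100.0, "AUD": 250.5}
```

The point of the layering is that each stage has one declared responsibility, which is what makes a declarative consolidation of many ad-hoc pipelines possible.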
This architectural overhaul yielded immediate dividends in terms of operational efficiency and the overall reliability of the data platform. The transition to a unified environment on Google Cloud Storage (GCS) enabled seamless scaling of compute and storage resources, ensuring that the platform could handle sudden spikes in transaction volume without performance degradation. Moreover, the move to a Lakehouse model eliminated the need to shuttle data between disparate systems for different use cases, such as copying information out of a data lake for machine learning or into a warehouse for business intelligence. This consolidation not only reduced cloud egress costs and storage redundancies but also ensured that all teams were working from the exact same dataset. The resulting environment is one where data integrity is maintained throughout its lifecycle, providing a stable and scalable foundation that can support the company’s aggressive growth targets in the coming years.
Enhancing Speed and Financial Oversight
One of the most transformative outcomes of the Lakehouse implementation was the drastic reduction in data latency, which directly enhanced the company’s financial oversight capabilities. By slashing the “source-to-gold” latency from 60 minutes to just 15 minutes, Airwallex enabled its teams to monitor global money movement with unprecedented precision. This 75% improvement in data freshness means that risk engines, fraud detection algorithms, and customer-facing dashboards are now powered by information that is nearly current. In the context of international banking, this speed allows for more proactive management of liquidity and exposure, ensuring that the platform remains resilient even during periods of high market volatility. The ability to see and act upon data in near-real-time has become a critical differentiator, allowing the organization to provide its customers with a level of transparency and responsiveness that traditional banking institutions often struggle to match.
The implementation of the Databricks Unity Catalog provided the necessary governance layer to manage this high-velocity data environment securely. By centralizing access controls and auditing within a single interface, Airwallex could finally achieve the goal of safe data democratization across its global offices. This centralized catalog ensures that sensitive financial information is only accessible to authorized personnel while providing a clear, automated record for regulatory compliance. The integration with Google Cloud’s broader security ecosystem further bolstered this framework, creating a multi-layered defense strategy that protects against both internal misconfigurations and external threats. This balance of rapid data availability and rigorous governance has transformed the data platform from a technical utility into a strategic asset, enabling the company to scale its operations while maintaining the trust of its global client base and regulatory partners.
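The governance pattern described here can be sketched conceptually: a single source of truth for grants, with every access attempt recorded for audit. The roles, table names, and audit-record format below are invented for illustration; Unity Catalog manages this declaratively at the platform level rather than in application code.

```python
# Conceptual sketch of centralized role-based access control with an audit
# trail. Roles, tables, and the audit format are hypothetical examples.
from datetime import datetime, timezone

GRANTS = {
    "risk_analyst": {"gold.transactions", "gold.fx_rates"},
    "marketing_analyst": {"gold.campaign_metrics"},
}

AUDIT_LOG = []


def can_read(role, table):
    """Check a grant and record the attempt for compliance auditing."""
    allowed = table in GRANTS.get(role, set())
    AUDIT_LOG.append({
        "ts": datetime.now(timezone.utc).isoformat(),
        "role": role,
        "table": table,
        "allowed": allowed,
    })
    return allowed
```

With this shape, a call like `can_read("marketing_analyst", "gold.transactions")` is both denied and logged, giving compliance teams the “who accessed what, and when” record the article describes.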
Driving Innovation Through AI and Self-Service
Empowering Stakeholders and Building Future Capabilities
The democratization of data through the Lakehouse platform has fundamentally changed how internal stakeholders interact with information, moving away from a world of bottlenecked requests to one of self-service exploration. By leveraging integrated AI assistants and natural language processing tools, non-technical team members in departments like marketing and operations can now query complex datasets without needing to write SQL. For instance, a marketing manager can instantly analyze the ROI of a specific international campaign by asking the system questions in plain English, receiving accurate metrics derived directly from the Gold layer of the Lakehouse. This shift has not only accelerated the pace of business decision-making but has also reduced the daily ticket volume for the data engineering team, allowing them to remain focused on high-level architecture and the development of advanced machine learning models for the core banking platform.
For the technical side of the house, the new environment has significantly shortened the development lifecycle for machine learning and predictive analytics. Developers now have access to collaborative notebooks and integrated model-tracking tools that allow for high-velocity experimentation and seamless deployment. This productivity boost is evident in the way Airwallex now serves insights directly to its customers through its web and mobile applications; these features are no longer powered by stale, batch-processed data but by a secure and reliable pipeline that reflects the most recent transactions. By providing engineers with a unified playground for data science and production-grade engineering, the company has fostered an environment where new AI-driven features can be prototyped, tested, and rolled out to a global audience in a fraction of the time it previously took, ensuring that the platform remains at the cutting edge of fintech innovation.
Architecting the Era of Autonomous Agents
The forward-looking strategy of Airwallex is increasingly centered on the development of “AI Agents” that can autonomously synthesize unstructured data to drive product evolution. These agents are designed to ingest and analyze vast quantities of customer feedback, support tickets, and market trends to provide product managers with actionable insights into emerging user needs. Instead of manually sifting through thousands of qualitative inputs, the organization can use these intelligent systems to identify specific friction points in the user journey or to spot high-demand features that have not yet been implemented. This capability allows the company to remain hyper-responsive to the market, ensuring that its development roadmap is always aligned with the actual requirements of businesses moving money across borders. This shift toward agentic AI represents a transition from descriptive analytics—knowing what happened—to prescriptive intelligence, where the system helps determine what should happen next.
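The synthesis step such an agent performs can be reduced to a toy sketch: turn a pile of unstructured tickets into a ranked list of friction points. The keyword matching below is a stand-in for the LLM-based classification a real agent would use, and the categories and ticket text are invented for illustration.

```python
# Toy sketch: synthesize unstructured feedback into ranked friction points.
# Keyword matching stands in for LLM classification; categories and ticket
# text are hypothetical examples.
from collections import Counter

CATEGORIES = {
    "onboarding": ("kyc", "verification", "sign up"),
    "payouts": ("payout", "transfer delayed", "settlement"),
    "fx": ("exchange rate", "conversion", "fx"),
}


def rank_friction_points(tickets):
    """Tag each ticket by category keywords and rank categories by frequency."""
    counts = Counter()
    for text in tickets:
        lowered = text.lower()
        for category, keywords in CATEGORIES.items():
            if any(k in lowered for k in keywords):
                counts[category] += 1
    return counts.most_common()


tickets = [
    "My payout to Singapore was a transfer delayed by two days",
    "KYC verification keeps rejecting my documents",
    "The exchange rate shown differs from the settled conversion",
    "Another payout stuck in settlement",
]
ranked = rank_friction_points(tickets)  # payouts ranked first
```

The output gives a product manager an ordered view of where users struggle, which is the prescriptive signal the paragraph above describes.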
Maintaining a multi-model AI strategy is a cornerstone of this future-proof foundation, ensuring that Airwallex can utilize the best tool for every specific task without becoming dependent on a single provider. By integrating Google’s Gemini models alongside other prominent large language models, the organization maintains the flexibility to switch or combine technologies as the AI landscape evolves. This approach is applied across various domains, from improving log observability and system health monitoring to personalizing the customer experience through hyper-targeted growth strategies. By building an architecture that is model-agnostic and data-rich, Airwallex has positioned itself to lead the industry into an era where financial services are not just digital, but truly intelligent. The transition to this unified data and AI ecosystem has successfully removed the technical friction that once hindered growth, creating a blueprint for how modern fintechs can leverage a Lakehouse architecture to achieve global scale and operational excellence.
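A model-agnostic architecture of the kind described above can be sketched as a thin routing layer: tasks map to interchangeable backends so a model can be swapped without touching callers. The task names and model stubs below are hypothetical; in practice each backend would wrap a provider SDK.

```python
# Sketch of a model-agnostic routing layer. Task taxonomy and model names
# are illustrative; real backends would wrap provider SDKs.
from typing import Callable, Dict


def gemini_stub(prompt: str) -> str:
    """Placeholder standing in for a call to a Gemini model."""
    return f"[gemini] {prompt}"


def other_llm_stub(prompt: str) -> str:
    """Placeholder standing in for a call to an alternative LLM provider."""
    return f"[other-llm] {prompt}"


# Central route table: changing a backend is a one-line config edit.
ROUTES: Dict[str, Callable[[str], str]] = {
    "log_observability": gemini_stub,
    "customer_personalization": other_llm_stub,
}


def route(task: str, prompt: str,
          default: Callable[[str], str] = gemini_stub) -> str:
    """Dispatch a prompt to whichever model is configured for the task."""
    return ROUTES.get(task, default)(prompt)
```

Because callers depend only on `route`, the organization can switch or combine providers as the AI landscape evolves, which is the flexibility the multi-model strategy is meant to preserve.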
