How Can Generative AI Simplify Large-Scale Notebook Migration?

May 20, 2026

Grace Morain, Digital Transformation Consultant

The massive scale of data operations at financial institutions like Deutsche Börse Group often creates complex technical debt that resurfaces during mandatory platform upgrades. As the StatistiX platform manages approximately ninety-five percent of clearing and trading data, maintaining seamless access for hundreds of business users remains a top operational priority in 2026. For years, these users relied on Zeppelin notebooks within a Cloudera ecosystem, but the decommissioning of these tools by 2027 necessitated a rapid transition to Databricks. This migration presented a significant challenge, involving over two thousand users and a massive volume of notebooks deeply integrated into daily workflows. Manual rewriting would have consumed years of engineering resources, prompting a search for a more automated, intelligent solution. The decision to leverage generative AI transformed this looming bottleneck into a manageable, scalable transition project that redefined how the organization handles large-scale technical migrations.

1. Strategic Shifts in Data Infrastructure

The core difficulty in notebook migration lies in the inherent complexity of the files themselves, which are far more than simple scripts or static code blocks. These Zeppelin notebooks contained intricate SQL and Python logic, custom interpreters, and references to legacy Oracle and HDFS data systems that have been refined over several years of active use. Because the logic within these notebooks was highly heterogeneous and specific to various business departments, a traditional rule-based translation engine proved entirely impractical for the task. Instead, the strategy shifted toward a cleaner design that separated the deterministic structural elements from the variable business logic that required deeper understanding. By recognizing that large language models are exceptionally skilled at structural conversion, the team was able to automate the formatting while leaving the nuances of the code to a more sophisticated, AI-driven reconstruction process that maintains the original intent of the developers.

Building this solution on the Databricks Apps framework allowed the team to create a specialized environment tailored to the group's migration workflow. The resulting converter handles the tedious structural aspects, such as mapping Zeppelin paragraphs to Databricks cells and translating interpreter directives like PySpark or SQL into their Databricks equivalents. It reformats the notebook metadata into valid JSON while preserving the original content exactly as it was written in the legacy system. By not attempting to rewrite the logic during the initial conversion phase, the tool avoids a common pitfall of automated systems, which often introduce subtle errors in complex code. This separation of concerns ensures that the foundation of the notebook is solid before any AI intervention occurs, which improves the success rate in the final stages of the migration, where business logic is reconstructed and validated.
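The structural step described above can be illustrated with a minimal sketch. It is not the group's actual converter: it assumes the commonly seen Zeppelin note layout (a top-level "paragraphs" list whose entries carry a "text" field that starts with an interpreter directive) and emits a Jupyter nbformat 4 document. The function names and the directive mapping are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical mapping from Zeppelin interpreter directives to Databricks cell magics.
# An empty string means the cell runs in the notebook's default language (assumed Python).
INTERPRETER_TO_MAGIC = {
    "%pyspark": "",
    "%python": "",
    "%sql": "%sql",
}


def split_directive(text):
    """Split paragraph text into (directive, body); directive is '' when absent."""
    stripped = text.lstrip()
    if not stripped.startswith("%"):
        return "", text
    first_line, _, rest = stripped.partition("\n")
    parts = first_line.split(None, 1)
    directive = parts[0]
    inline = parts[1] if len(parts) > 1 else ""
    # Code may follow the directive on the same line or start on the next line.
    body = inline + ("\n" + rest if inline and rest else rest)
    return directive, body


def paragraph_to_cell(paragraph):
    """Map one Zeppelin paragraph to one notebook cell without touching its logic."""
    text = paragraph.get("text") or ""
    directive, body = split_directive(text)
    if directive == "%md":
        return {"cell_type": "markdown", "metadata": {}, "source": body}
    magic = INTERPRETER_TO_MAGIC.get(directive)
    if magic is None:
        # No directive or an unknown interpreter: keep the text as-is for later review.
        source = text
    elif magic:
        source = magic + "\n" + body
    else:
        source = body
    return {
        "cell_type": "code",
        "metadata": {},
        "source": source,
        "outputs": [],
        "execution_count": None,
    }


def zeppelin_to_ipynb(note):
    """Build an nbformat 4 notebook dict from a parsed Zeppelin note."""
    return {
        "cells": [paragraph_to_cell(p) for p in note.get("paragraphs", [])],
        "metadata": {"zeppelin_name": note.get("name", "")},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


if __name__ == "__main__":
    demo = {"name": "demo", "paragraphs": [{"text": "%sql\nSELECT 1"}]}
    print(json.dumps(zeppelin_to_ipynb(demo), indent=2))
```

Note that the sketch only moves text between containers. Paragraphs with an interpreter it does not recognize are carried over unchanged rather than guessed at, which mirrors the article's point that logic is left for the later AI-assisted phase.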

2. Implementation of the Conversion Workflow

The process for business users was designed to be as frictionless as possible, allowing individuals to manage their own migrations without constant intervention from the IT department. The workflow begins when the user downloads the existing Zeppelin notebook in its native JSON format directly from the legacy environment. The user then uploads that JSON file to the custom-built Databricks App, which serves as the primary interface for the transformation, and selects the option to convert the file in the application’s dashboard. When processing is complete, the user downloads the resulting .ipynb file, which is compatible with the modern analytics platform. This sequence removes the technical barriers that usually discourage non-technical business users from participating in large-scale platform migrations.

Following the initial conversion, the user imports the notebook into the Databricks environment, opens it, and starts Genie to insert the AI prompt created by the app. This prompt is critical because it contains specific environmental context, such as custom data sources and configuration patterns that are unique to the organization’s infrastructure. Finally, the user refines the code, answering Genie's follow-up questions to finalize the reconstruction of the internal logic. This interactive phase allows the AI to address the specific nuances of the original script, so that the final output is not just syntactically correct but functionally equivalent to the original version. By delegating the variable parts of the code to an intelligent assistant that can ask for clarification, the system avoids the brittleness of static rules. The result is a high-quality migration that respects the complexity of the original work while modernizing the codebase.
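To make the idea of a context-rich prompt concrete, the following sketch assembles one from a small description of the local environment. Everything here is an assumption for illustration: the EnvironmentContext class, the build_migration_prompt function, the wording of the prompt, and the example mappings are hypothetical and do not reflect the prompt the actual app generates.

```python
from dataclasses import dataclass, field


@dataclass
class EnvironmentContext:
    """Hypothetical container for organization-specific migration context."""
    data_sources: dict  # legacy reference -> description of its Databricks target
    conventions: list = field(default_factory=list)  # free-text configuration patterns


def build_migration_prompt(notebook_name, legacy_refs, context):
    """Return a prompt string that embeds the local environment details."""
    lines = [
        f"You are helping migrate the notebook '{notebook_name}' from Zeppelin to Databricks.",
        "Preserve the business logic. Ask a clarifying question when the intent is unclear.",
        "",
        "Known data source mappings:",
    ]
    for ref in legacy_refs:
        target = context.data_sources.get(ref)
        if target:
            lines.append(f"- {ref} -> {target}")
        else:
            # Do not guess a mapping; make the gap explicit so the assistant asks.
            lines.append(f"- {ref} -> UNKNOWN (ask the user)")
    if context.conventions:
        lines.append("")
        lines.append("Configuration patterns to follow:")
        lines.extend(f"- {rule}" for rule in context.conventions)
    return "\n".join(lines)


if __name__ == "__main__":
    ctx = EnvironmentContext(
        data_sources={"hdfs://legacy/trades": "catalog.schema.trades"},
        conventions=["Read tables with spark.table instead of file paths."],
    )
    print(build_migration_prompt("daily_report", ["hdfs://legacy/trades"], ctx))
```

Marking unmapped references as unknown, rather than inventing a target, is one way to steer the assistant toward the clarifying questions the workflow relies on.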

3. Preservation of Critical Business Intelligence

A significant component of the migration strategy involved identifying which elements should be intentionally left untouched by the automated conversion tool to ensure high levels of accuracy. The system does not rewrite business-critical SQL or Python logic, nor does it attempt to modify existing data visualizations or interactive widgets that users have come to rely on. Furthermore, any references to specific data sources like Oracle or HDFS are preserved exactly as they appeared in the source material to prevent unauthorized or incorrect data mapping. Scheduling configurations and custom business code also remain intact, as these elements often contain institutional knowledge that is difficult for a simple script to interpret correctly. By keeping these components original, the tool fosters user trust and ensures that the most sensitive parts of the analytics workflow are reviewed by human experts and sophisticated AI tools.
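One way to preserve data source references while still surfacing them for review is to scan each cell and report what it finds without editing the text. The sketch below assumes that legacy references appear as hdfs:// URIs or Oracle JDBC connection strings. The pattern table and the function name are hypothetical, and a real notebook may reference these systems in other ways that this scan would miss.

```python
import re

# Assumed shapes of legacy references; real notebooks may use other forms.
LEGACY_PATTERNS = {
    "hdfs": re.compile(r"hdfs://[^\s'\"]+"),
    "oracle": re.compile(r"jdbc:oracle:[^\s'\"]+"),
}


def find_legacy_references(source):
    """Return (kind, reference) pairs found in a cell; the source is never modified."""
    found = []
    for kind, pattern in LEGACY_PATTERNS.items():
        for match in pattern.finditer(source):
            found.append((kind, match.group(0)))
    return found


if __name__ == "__main__":
    cell = (
        "df = spark.read.parquet('hdfs://nameservice1/data/trades')\n"
        "url = 'jdbc:oracle:thin:@//dbhost:1521/SVC'\n"
    )
    for kind, ref in find_legacy_references(cell):
        print(kind, ref)
```

The output of such a scan could feed the review step, leaving the decision about each reference to the user and the assistant.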

This hybrid approach acknowledges that while structural conversion is a solved problem, logic reconstruction requires a degree of judgment and context that only a human or a specialized AI can provide. When these preserved elements are later addressed during the Genie interaction phase, the system uses the context-aware prompts generated by the app to provide a roadmap for the reconstruction. This ensures that the migration process is not just a blind copy-paste operation but a thoughtful transition that takes into account the specific goals of each notebook. Because the conversion tool makes no assumptions about business-specific logic, it reduces the risk of introducing hallucinations or logical errors that could lead to incorrect financial reporting or trading insights. Maintaining this high standard of data integrity is essential for a clearinghouse and trading group where precision is paramount. This strategy ultimately balances the speed of automation with the reliability of human oversight and intelligent logic interpretation.

4. Lessons Learned and Future Directions

The transition from legacy systems to Databricks demonstrated that efficiency gains are most significant when organizations focus on simplicity rather than overengineering their AI architectures. The project team discovered that a straightforward user interface paired with a clean backend was far more effective than a complex agentic system that added unnecessary overhead to the process. By reducing the time required to migrate a single notebook from several hours to roughly fifteen to twenty minutes, the organization successfully transformed a massive hurdle into a scalable workflow. This improvement allowed business users to take ownership of their own migration paths, significantly reducing the burden on specialized engineering teams. The success of this initiative highlighted the importance of high-quality prompts that include specific details about the local environment. Generic AI instructions often produce generic results, but context-rich inputs ensure that the migrated code remains functional.

In the end, the project showed that generative AI can manage the heterogeneous nature of large-scale notebook migrations when the handoff between automation and intelligence is designed carefully. The team concluded that the most effective way to handle diversity in logic was to delegate variable tasks to models capable of asking clarifying questions rather than relying on brittle, rule-based rewriting. They prepared for a wider rollout by validating the tool across various business entities and refining prompt definitions to further improve accuracy. This implementation established a repeatable framework for future cloud transformations, turning what was once a resource-intensive manual task into a streamlined digital process. Moving forward, the organization is looking to extend this methodology to other legacy migrations so that data accessibility remains uncompromised during infrastructure shifts. These steps provide a clear roadmap for any enterprise seeking to leverage artificial intelligence for technical debt reduction.