The long-theorized potential for artificial intelligence to proactively uncover complex security vulnerabilities has been compellingly demonstrated in a real-world assessment of Coolify, a popular open-source, self-hosted platform. In a deliberate move away from theoretical discussions, a security firm set its proprietary AI pentesting system against the mature and widely-used application to evaluate its efficacy in a practical scenario. The AI was provided with no prior information about Coolify’s security history, forcing it to analyze the application’s behavior and codebase from a completely fresh perspective. This rigorous test yielded significant results, uncovering seven distinct security flaws so severe that they were each assigned Common Vulnerabilities and Exposures (CVE) identifiers. The discoveries ranged from weaknesses in authentication mechanisms to critical vulnerabilities that permitted complete system compromise through remote code execution and privilege escalation, showcasing the tangible power of AI in modern cybersecurity.
A New Frontier in Security Testing
The assessment underscored the effectiveness of a hybrid security model that strategically combines the strengths of autonomous AI agents with the nuanced oversight of human experts. While the AI system independently identified and surfaced a number of exploitable issues, human security researchers played an indispensable role in the process. Their involvement was crucial for verifying the exploitability and true impact of the AI’s findings, as well as for identifying potential gaps in the AI’s analytical logic. This collaborative relationship created a powerful feedback loop; the insights gained from manual analysis were immediately used to train and refine the next generation of AI agents. This iterative process of discovery and improvement continuously enhances the depth, coverage, and sophistication of automated security testing, proving that the most effective approach is not a competition between human and machine intelligence but a synergistic partnership that leverages the best of both.
To achieve its comprehensive results, the AI pentesting solution employed a multi-faceted and layered strategy that targeted a specific build of the platform deployed in a standard cloud environment. The system initiated its analysis with automated black-box testing, where it interacted with the application’s exposed endpoints and user-facing workflows just as an external attacker would, probing for common vulnerabilities without any knowledge of the internal source code. This was complemented by an AI-driven white-box analysis, which involved a direct examination of the Coolify source code. This allowed the system to identify potential flaws in security-sensitive code paths that are not visible from the outside. A key capability demonstrated was the AI’s continuous cross-functional reasoning, which enabled it to connect seemingly disparate pieces of information from authentication, authorization, and command execution modules to identify complex, multi-step attack chains that would likely be missed by a single-technique approach.
From Theory to Impact: Uncovering Critical Flaws
Among the seven vulnerabilities discovered, the most critical were two separate command injection flaws that provided a direct path to Remote Code Execution (RCE). The first of these critical issues, identified as CVE-2025-64419, stemmed from the system’s failure to properly sanitize user-provided parameters within docker-compose.yaml files. An attacker could craft a malicious repository containing a specially configured Docker Compose file, and when a user deployed it through Coolify, the unsanitized parameters would be interpreted and executed as system commands. A similar vulnerability, CVE-2025-64424, arose from inadequate sanitization of user input in the Git source configuration fields. In both scenarios, an attacker with access to these configuration areas could inject malicious shell commands. The impact of both flaws was a full compromise of the host server, as the injected commands were executed with root privileges, granting the attacker complete and unrestricted control over the system.
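The root cause in both injection flaws is a classic pattern: user-controlled values flowing into a shell command string. The sketch below is illustrative only (the function names, paths, and payload are assumptions, not Coolify's actual code), contrasting the vulnerable interpolation pattern with the safer argument-list approach:

```python
import shlex

def build_clone_command_unsafe(repo_url: str) -> str:
    # VULNERABLE pattern (illustrative): user input is interpolated
    # directly into a shell string, so shell metacharacters such as
    # ';' or '$(...)' in the value become part of the command line.
    return f"git clone {repo_url} /tmp/app"

def build_clone_command_safe(repo_url: str) -> list[str]:
    # Safer pattern: pass arguments as a list so no shell ever parses
    # them; "--" also blocks option injection via URLs starting with "-".
    return ["git", "clone", "--", repo_url, "/tmp/app"]

malicious = "https://example.com/repo.git; touch /tmp/pwned"
print(build_clone_command_unsafe(malicious))  # ';' starts a second command
print(shlex.quote(malicious))                 # quoting neutralizes the ';'
print(build_clone_command_safe(malicious))    # URL stays a single argument
```

When the vulnerable string is handed to a shell running as root, as was the case here, the injected command executes with full privileges, which is what turned these sanitization gaps into complete host compromise.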
Another avenue for complete system takeover was discovered in CVE-2025-64420, a critical vulnerability that resulted from the improper exposure of the root user’s private SSH key. A low-privileged user with only basic authenticated access to the Coolify instance was able to view and retrieve this highly sensitive credential directly through the application’s interface. In the world of server administration, possession of the root user’s private SSH key is tantamount to having the master password for the entire system. This flaw effectively allowed any low-level attacker to bypass all application-level security controls and gain full administrative access to the underlying host machine. The discovery of such a fundamental security misconfiguration highlights the AI’s ability to identify not only complex code-based vulnerabilities but also critical operational security gaps that can lead to a swift and total compromise of the infrastructure.
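The underlying mistake in this class of exposure is serializing a server record for the UI without filtering privileged fields by role. A minimal sketch, assuming a hypothetical `root_private_key` field and role names (none of which are taken from Coolify's code):

```python
def serialize_server_unsafe(server: dict) -> dict:
    # Flawed pattern (illustrative): every stored field, including the
    # root private key, is returned to any authenticated viewer.
    return dict(server)

def serialize_server_safe(server: dict, requester_role: str) -> dict:
    # Fix: strip privileged fields unless the requester is an admin.
    out = dict(server)
    if requester_role != "admin":
        out.pop("root_private_key", None)
    return out

server = {
    "host": "203.0.113.10",
    "root_private_key": "-----BEGIN OPENSSH PRIVATE KEY-----...",
}
print(serialize_server_unsafe(server))          # key leaks to any user
print(serialize_server_safe(server, "member"))  # key withheld
```

Defense in depth would go further: the application should ideally never hold a root key it can echo back at all, using a dedicated deploy user or an agent-based model instead.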
Exploiting Trust and Bypassing Defenses
The AI-driven assessment also exposed multiple pathways for privilege escalation by exploiting weaknesses in the platform’s user management and invitation logic. One high-severity flaw, CVE-2025-64421, allowed a logged-in user with minimal permissions to manipulate the invitation process. They could generate an administrator-level invitation and send it to an email address under their control, accept it, and subsequently gain full administrative privileges over the Coolify instance. A related issue, CVE-2025-64423, revealed that pending administrator invitation links were visible to existing low-privileged users. This meant an attacker already inside the system could monitor for new admin invitations and use the link to register their own account before the intended recipient, effectively hijacking the privileged role. Both vulnerabilities demonstrated the AI’s capacity to understand and exploit logical flaws within an application’s trust model, moving beyond simple code scanning to analyze how features and permissions interact.
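The invitation flaw boils down to a missing authorization check on who may mint an invitation and for what role. The following sketch is a simplified illustration (the data model and role names are assumptions, not Coolify's implementation):

```python
from dataclasses import dataclass

@dataclass
class User:
    email: str
    role: str  # e.g. "member" or "admin"

def create_invitation_unsafe(requester: User, invite_role: str) -> dict:
    # Flawed logic (illustrative): any authenticated user may mint an
    # invitation for ANY role, including "admin", and address it to an
    # email they themselves control.
    return {"invited_by": requester.email, "role": invite_role}

def create_invitation_safe(requester: User, invite_role: str) -> dict:
    # Fix: check the requester's privileges before issuing the invite.
    if requester.role != "admin":
        raise PermissionError("only admins may invite users")
    return {"invited_by": requester.email, "role": invite_role}

attacker = User("attacker@example.com", "member")
print(create_invitation_unsafe(attacker, "admin"))  # escalation succeeds
```

The companion flaw, leaking pending invitation links to low-privileged viewers, is the same category of error on the read path: listing endpoints must apply the same role check as the creation endpoint.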
Further investigation by the AI system uncovered vulnerabilities that undermined standard user account security and defensive measures. A high-severity account takeover vulnerability, CVE-2025-64425, was identified in the password reset functionality, which was susceptible to Host header injection. An attacker could initiate a password reset for a victim and manipulate the Host header in the request, causing the application to generate a reset link that pointed to an attacker-controlled domain. If the victim then clicked the link, the secret reset token would be delivered to the attacker's server, allowing them to set a new password and seize the account. Finally, the platform's defenses against brute-force attacks were found to be inadequate. While the login endpoint implemented a rate limit, CVE-2025-64422 demonstrated that this protection could be easily circumvented. An attacker could bypass the limit by simply changing the X-Forwarded-For HTTP header with each login attempt, enabling them to perform unlimited password guessing attacks and significantly increasing the likelihood of gaining unauthorized access to user accounts.
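The rate-limit bypass works because the limiter keys its counters on a client-supplied header rather than the actual connection address. A minimal sketch of the flawed keying (the limiter itself is a toy counter, not Coolify's middleware):

```python
from collections import defaultdict

LIMIT = 5
attempts = defaultdict(int)

def client_key_unsafe(headers: dict, remote_addr: str) -> str:
    # Flawed pattern (illustrative): blindly trusting X-Forwarded-For
    # lets the client choose its own rate-limit bucket.
    return headers.get("X-Forwarded-For", remote_addr)

def allow_attempt(key: str) -> bool:
    # Toy fixed-window limiter: allow at most LIMIT attempts per key.
    attempts[key] += 1
    return attempts[key] <= LIMIT

# The attacker rotates the header on every request, so no single bucket
# ever reaches the limit, even after 100 password guesses.
blocked = 0
for i in range(100):
    headers = {"X-Forwarded-For": f"10.0.0.{i}"}
    if not allow_attempt(client_key_unsafe(headers, "203.0.113.5")):
        blocked += 1
print(blocked)  # 0 — every guess was allowed
```

The standard remedy is to key the limiter on the socket peer address, honoring X-Forwarded-For only when the request demonstrably arrived via a trusted reverse proxy.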
The Evolving Symbiosis of AI and Human Expertise
The comprehensive security assessment of Coolify served as a powerful case study on the tangible effectiveness of AI-driven penetration testing in today’s complex software environments. The successful identification of seven distinct CVEs, including multiple critical vulnerabilities that led directly to full system compromise, demonstrated that these advanced systems have moved beyond conceptual promise to become practical and highly effective tools. This exercise proved that AI can meticulously analyze production-grade applications and uncover significant, high-impact security flaws that may have eluded traditional testing methods. The synergy between artificial intelligence and human expertise represents the current state of the art in security validation. While the AI excelled at automated discovery and analysis at a scale and speed unattainable by human testers alone, human oversight remained indispensable for validating the real-world impact of findings, understanding nuanced contextual issues, and strategically guiding the AI’s continuous learning process. This collaborative feedback loop was the key that unlocked the full potential of the assessment, paving the way for a future where autonomous security testing becomes increasingly comprehensive, efficient, and scalable.
