Monday morning in a development team at a mid-sized fintech. A senior developer finishes a payment function that GitHub Copilot generated in 15 minutes instead of the planned two hours. Code review passes without comments—the code looks clean, unit tests pass, the PR gets merged to main. Two weeks later, the security team discovers the function is vulnerable to an injection attack. An audit reveals that Copilot copied a pattern from an outdated repository where the same vulnerability was patched three years ago.

This is not a hypothetical scenario. According to the Veracode 2025 GenAI Code Security Report, which analyzed code produced by over 100 LLM models across 80 real programming tasks, generative AI introduces security vulnerabilities in 45% of cases. For Java—the most popular enterprise language—this rate exceeds 70%.

Meanwhile, JetBrains research from October 2025 shows that 85% of nearly 25,000 surveyed developers regularly use AI coding tools. Google reports similarly: 90% of software development professionals have adopted AI. We’re dealing with mass adoption of technology that produces code with security vulnerabilities in nearly half of cases.

This article analyzes the causes of this situation, presents specific vulnerability categories most commonly introduced by AI, and provides practical risk minimization strategies for teams that—rightfully—don’t intend to abandon the productivity benefits of AI-assisted development.

Where does the 45% vulnerability rate data come from?

The Veracode 2025 GenAI Code Security Report is the most comprehensive study of AI-generated code security conducted to date. The methodology included analyzing code produced by over 100 different large language models (LLMs), including GPT-4, Claude, Gemini, CodeLlama, and dozens of smaller specialized models.

Researchers assigned models 80 real programming tasks representing typical enterprise development use cases: form handling, input validation, file operations, database queries, user authentication, session management. Each generated code fragment was then scanned with SAST (Static Application Security Testing) tools for known vulnerability categories according to CWE (Common Weakness Enumeration) classification.

The results surprised even the pessimists. 45% of all generated code samples contained at least one security vulnerability—even with the newest, most advanced models. Moreover, many samples contained multiple vulnerabilities, often from different categories.

The results for individual languages are particularly alarming. Java—dominant in enterprise, banking, and insurance applications—showed the highest failure rate, exceeding 70%. Python, C#, and JavaScript fell in the 38–45% range. These numbers mean that, statistically, every second or third AI-generated code fragment requires security fixes before production deployment.

The Georgetown CSET (Center for Security and Emerging Technology) study identifies three broad risk categories: models generating dangerous code, models themselves vulnerable to attacks and manipulation, and downstream cybersecurity impacts such as feedback loops in training future AI systems. In other words—the problem isn’t limited to individual code fragments but has cascading potential.

Which vulnerability categories does AI-generated code most commonly introduce?

The Veracode report reveals clear patterns in the types of vulnerabilities generated by LLMs. Two categories overwhelmingly dominate the statistics.

Cross-Site Scripting (XSS), classified as CWE-80, affects 86% of examined samples. AI models consistently generate code that doesn’t properly sanitize input data before rendering it in an HTML context. This is a fundamental vulnerability known for two decades, yet LLMs haven’t “learned” to avoid it.
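The fix is mechanical but has to be applied consistently. A minimal Python sketch of the pattern (the vulnerability is language-agnostic; `html.escape` stands in here for whatever escaping your template engine provides):

```python
import html

def render_comment_unsafe(user_input: str) -> str:
    # Vulnerable pattern typical of generated code: user input is
    # interpolated directly into HTML, so a payload like
    # <script>...</script> executes in the victim's browser.
    return f"<div class='comment'>{user_input}</div>"

def render_comment_safe(user_input: str) -> str:
    # html.escape neutralizes <, >, &, and quotes before rendering,
    # turning the markup into inert text.
    return f"<div class='comment'>{html.escape(user_input)}</div>"

payload = "<script>alert('xss')</script>"
assert "<script>" in render_comment_unsafe(payload)   # would execute in a browser
assert "<script>" not in render_comment_safe(payload) # rendered as plain text
```

Modern template engines (Jinja2, Thymeleaf, JSX) escape by default; the vulnerability typically appears when generated code bypasses them with raw string building, as above.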

Log Injection (CWE-117) appears in 88% of samples. Generated code writes user data directly to logs without validation, enabling attackers to manipulate log files, hide attack traces, or inject false entries complicating forensic analysis.
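The mitigation is equally simple; a Python sketch that encodes line breaks before a user-controlled value reaches the log:

```python
import logging

logging.basicConfig(format="%(levelname)s %(message)s")
log = logging.getLogger("auth")

def sanitize_for_log(value: str) -> str:
    # CWE-117: raw newlines in user-controlled values let an attacker forge
    # extra log lines; encode them so one event stays on one line.
    return value.replace("\r", "\\r").replace("\n", "\\n")

def log_failed_login(username: str) -> None:
    # Generated code typically logs the raw value; sanitize first.
    log.warning("Failed login for user: %s", sanitize_for_log(username))

# A payload that would otherwise inject a fake entry implicating "admin":
forged = "alice\nWARNING Failed login for user: admin"
assert "\n" not in sanitize_for_log(forged)
```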

Injection attacks generally—SQL injection, command injection, LDAP injection—constitute the third largest category. LLMs regularly generate SQL queries through string concatenation instead of using parameterized queries, even though this practice has been considered an antipattern for over 15 years.
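Both variants are easy to contrast in a few lines. A Python/sqlite3 sketch—the same principle applies to `PreparedStatement` in JDBC:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'user'), ('bob', 'admin')")

def find_user_unsafe(name: str):
    # The antipattern LLMs still emit: string concatenation lets
    # "' OR '1'='1" rewrite the query logic.
    return conn.execute(
        "SELECT name, role FROM users WHERE name = '" + name + "'"
    ).fetchall()

def find_user_safe(name: str):
    # Parameterized query: the driver binds the value, so the payload
    # is treated as a literal string, never as SQL.
    return conn.execute(
        "SELECT name, role FROM users WHERE name = ?", (name,)
    ).fetchall()

payload = "' OR '1'='1"
assert len(find_user_unsafe(payload)) == 2  # injection returns every row
assert find_user_safe(payload) == []        # payload matched as plain text
```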

Memory management vulnerabilities appear particularly often in C and C++ code. Buffer overflows, use-after-free, null pointer dereferences—classic errors that are impossible in garbage-collected languages but remain critical in low-level programming.

Authentication and session management problems include: hardcoded credentials, weak password hashing algorithms, predictable session tokens, lack of token validation. LLMs often generate code that “works” from a functional perspective but is completely unsafe in the context of real-world attacks.
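A minimal Python sketch of the safe counterparts, using only the standard library (PBKDF2 is used here for self-containedness; bcrypt or Argon2 via a third-party library are equally valid choices):

```python
import hashlib
import hmac
import secrets

def hash_password(password: str) -> tuple[bytes, bytes]:
    # Random salt plus a high iteration count, instead of the single
    # unsalted MD5/SHA-1 round generated code often reaches for.
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    # compare_digest is constant-time, closing the timing side channel
    # that a plain == comparison leaves open.
    return hmac.compare_digest(candidate, digest)

def new_session_token() -> str:
    # Cryptographically random token instead of a predictable
    # counter- or timestamp-derived value.
    return secrets.token_urlsafe(32)

salt, digest = hash_password("correct horse battery staple")
assert verify_password("correct horse battery staple", salt, digest)
assert not verify_password("hunter2", salt, digest)
```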

Insecure deserialization—a particularly dangerous vulnerability in Java—appears when LLMs generate code deserializing data from untrusted sources without validation. Attackers can exploit this vulnerability for remote code execution.
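The canonical case is Java's ObjectInputStream, but Python's pickle is a direct analogue and makes the risk easy to demonstrate in a few lines:

```python
import json
import pickle

# Analogue of the ObjectInputStream problem: pickle.loads runs
# attacker-controlled constructors, so deserializing untrusted bytes
# can execute arbitrary code.
class Exploit:
    def __reduce__(self):
        # On unpickling this calls list("pwned") -- a harmless stand-in
        # for the os.system(...) call a real payload would make.
        return (list, ("pwned",))

malicious = pickle.dumps(Exploit())
result = pickle.loads(malicious)   # attacker code ran: we got a list back
assert result == list("pwned")

# Safer alternative for untrusted input: a data-only format plus explicit
# validation of the fields you expect.
def load_order(raw: bytes) -> dict:
    data = json.loads(raw)
    if not isinstance(data, dict) or set(data) != {"item", "quantity"}:
        raise ValueError("unexpected order structure")
    return data

assert load_order(b'{"item": "widget", "quantity": 2}') == {"item": "widget", "quantity": 2}
```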

Why does Java have the highest failure rate exceeding 70%?

Java dominates enterprise development for good reason—stability, mature ecosystem, rich standard library, strong typing. Paradoxically, these same characteristics make AI-generated Java code particularly vulnerable to security problems.

The first factor is ecosystem complexity. Java has dozens of web frameworks (Spring, Jakarta EE, Struts, Play), each with its own security patterns. An LLM trained on code from different frameworks may “mix” patterns, generating code that looks correct but uses deprecated or unsafe APIs. For example, a model might generate code using old Spring Security API with known vulnerabilities because that code appeared frequently in training data.

The second factor is Java’s backward compatibility. Code written in Java 6 still compiles and runs in Java 21. This means LLMs have enormous amounts of outdated code in their training data—code that was secure when written but contains vulnerabilities discovered later. The model doesn’t “know” that a specific pattern was deemed unsafe in 2019.

The third factor is serialization—a Java-specific feature that’s the source of countless CVEs. Insecure deserialization in Java enables remote code execution and has been exploited in high-profile attacks (Apache Struts, WebLogic). LLMs regularly generate code deserializing ObjectInputStream without validation because such patterns are common in training data.

The fourth factor is JDBC and SQL. Java was one of the first languages with native database support through JDBC. As a result, training data contains millions of examples of SQL through string concatenation—a practice common in the 90s and early 2000s, and today the definition of SQL injection vulnerability.

For enterprise teams using Java, these statistics mean every AI-generated code fragment requires particularly thorough security review. SAST automation is essential but not sufficient—many vulnerabilities require contextual understanding that goes beyond pattern matching.

What is the “hallucinated dependencies” problem and why is it dangerous?

“Hallucinated dependencies” is a relatively new threat that emerges as a side effect of how LLMs work. When generating code, the model sometimes “invents” package or library names that don’t exist—but look plausible.

The attack mechanism is simple and elegant from an attacker’s perspective. The LLM generates code with an instruction like “npm install some-plausible-name” or “pip install useful-sounding-package”. The developer, trusting the AI assistant, executes the command. If the package doesn’t exist, installation fails—but an attacker can register a package with that name and place malicious code in it. The next developer who receives the same suggestion from the LLM will install the malicious package.

This is a variant of an attack known as “dependency confusion” or “typosquatting,” but with a new distribution vector. Instead of waiting for someone to make a typo when typing a package name, attackers can actively “poison” LLM suggestions by creating repositories with names that models might “hallucinate.”

The problem escalates in the context of package managers with global namespaces (npm, PyPI). A name just needs to sound plausible—react-utils-helper, django-security-fix, spring-boot-utils—and a developer will install it. Some of these malicious packages were downloaded thousands of times before detection.

Defense requires a multi-layered approach. At the process level: always verify package existence and reputation before installation. At the tools level: use dependency scanning (Snyk, Dependabot, OWASP Dependency-Check) to detect suspicious packages. At the infrastructure level: consider a private registry as a proxy to public repositories with a whitelist of approved packages.
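The process-level check can be partly automated. A hedged Python sketch: the metadata dict below is a stand-in for what you would actually fetch from PyPI’s JSON API or a private registry proxy, and the thresholds are illustrative, not recommendations:

```python
def vet_package(name: str, registry: dict, min_age_days: int = 90,
                min_downloads: int = 10_000) -> tuple[bool, str]:
    # Baseline heuristics against hallucinated/squatted dependencies:
    # existence, age, and adoption. Real pipelines should also check the
    # source repository and maintainer history.
    meta = registry.get(name)
    if meta is None:
        return False, "package does not exist -- likely hallucinated"
    if meta["age_days"] < min_age_days:
        return False, "very new package -- possible squatting on an LLM suggestion"
    if meta["monthly_downloads"] < min_downloads:
        return False, "almost no users -- verify maintainer and source repo manually"
    return True, "passes baseline checks (still run SCA before shipping)"

# Illustrative registry snapshot (real data would come from the registry API):
fake_registry = {
    "requests": {"age_days": 4000, "monthly_downloads": 300_000_000},
    "django-security-fix": {"age_days": 3, "monthly_downloads": 40},
}

assert vet_package("requests", fake_registry)[0] is True
assert vet_package("django-security-fix", fake_registry)[0] is False  # too new
assert vet_package("react-utils-helper", fake_registry)[0] is False   # does not exist
```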

How do AI Pull Requests compare to human-written code?

Comparative studies between AI code and human code yield consistent results: AI generates more quality issues per unit of code.

Pull Request analysis reveals that PRs containing AI-generated code have an average of 10.83 issues per PR, while PRs with human-written code have 6.45 issues. That’s 1.7x more problems requiring attention during code review. This translates to longer review time and increased risk that some problems will slip into production.

More importantly, PRs with AI code contain 1.4x more critical issues and 1.7x more major issues. So it’s not just cosmetic problems or style convention violations—AI generates proportionally more serious defects that can affect application security, performance, or stability.

These statistics don’t mean AI is “worse” than humans at programming. They mean current models optimize for functional correctness (does the code do what was asked?) at the expense of other quality dimensions (does it do it securely? efficiently? in a maintainable way?). A human developer writing code automatically considers project context, security requirements, team conventions. An LLM generates code in isolation from this context.

Practical implication: teams adopting AI-assisted development must invest in more rigorous code review processes. Paradoxically, time saved on writing code may be consumed by longer reviews—unless we automate part of this process through SAST and linting tools integrated with CI/CD.

Can you trust AI code in business-critical applications?

The question of trusting AI code in critical applications requires a nuanced answer. Complete abandonment of AI-assisted development means losing the productivity advantage. Unquestioning trust means accepting unacceptable risk. The solution is conditional trust based on verification processes.

Georgetown CSET in their report on AI-generated code cybersecurity risks states it clearly: “Developers must treat AI-generated code as potentially vulnerable and apply security testing and review processes just as they would for any human-written code.” This is a fundamental principle, but in practice often ignored under deadline pressure.

Guidelines should differentiate scrutiny levels depending on context. AI-generated code for a prototype or internal tool requires less verification than code for a payment system or healthcare. Not every code fragment carries the same risk—but every fragment should go through minimum baseline checks.

A conditional trust framework might look like this:

For low-risk code (internal tools, prototypes): automatic SAST scanning, basic code review.

For medium-risk code (customer-facing features without sensitive data): SAST plus DAST, extended code review with security checklist.

For high-risk code (authentication, payments, personal data): full security review by a specialist, penetration testing, external audit for critical components.

For critical code (infrastructure, cryptography, compliance): AI-generated code as a starting point requiring rewriting by a security-trained developer, formal verification where possible.

What tools help detect vulnerabilities in AI code?

The code security testing tool ecosystem is mature, and most solutions work just as well on AI-generated code as on human-written code. The key is integrating them systematically into the development workflow.

Static Application Security Testing (SAST) analyzes source code without running it. Tools like SonarQube, Checkmarx, Veracode Static Analysis, Snyk Code identify vulnerability patterns through pattern matching and data flow analysis. SAST is ideal for catching problems early—ideally in the developer’s IDE or as a git hook before commit.

Dynamic Application Security Testing (DAST) tests a running application by simulating attacks. Tools like OWASP ZAP, Burp Suite, Acunetix send malicious payloads and observe application responses. DAST detects vulnerabilities that SAST might miss—especially those dependent on runtime configuration.

Software Composition Analysis (SCA) scans dependencies for known vulnerabilities and license compliance. Snyk, Dependabot, OWASP Dependency-Check, WhiteSource compare dependency trees with CVE databases and alert on vulnerable library versions. In the context of “hallucinated dependencies,” SCA can also flag unknown or suspicious packages.

Interactive Application Security Testing (IAST) combines SAST and DAST by instrumenting the application during functional tests. Contrast Security and Hdiv are examples of IAST tools that detect vulnerabilities in the context of actual data flow.

AI-augmented security tools are the newest category. Tools like Snyk Code AI, GitHub Advanced Security with Copilot use ML models to detect vulnerabilities with greater precision than traditional pattern matching. Ironically, AI helps fix problems caused by AI.

At ARDURA, we recommend a minimum stack for teams: SAST in CI/CD (blocking on critical/high findings), SCA with automatic PRs for security updates, and DAST in the staging pipeline. This baseline catches most problems without significant overhead on velocity.

How do you integrate security into the AI-assisted development workflow?

Integrating security with the AI-assisted development workflow requires rethinking the traditional “security at the end” model. When AI accelerates code writing, the volume of code requiring review proportionally increases—and the traditional bottleneck at the security review stage becomes critical.

The “shift-left security” approach means moving security verification as early as possible in the development cycle. In the context of AI-assisted development, this means:

At the IDE level: configuring linters and SAST to work in real-time, flagging potential problems before code is committed. Some tools (Snyk, SonarLint) offer integration with popular IDEs and can analyze Copilot-generated code the moment it’s accepted.

At the pre-commit level: git hooks running quick security checks before each commit. Blocking commits with critical vulnerabilities forces the developer to fix the problem immediately, not postponing it for later.

At the PR level: automatic PR scanning by SAST/SCA integrated with GitHub Actions, GitLab CI, Azure Pipelines. Requiring approval from a security-trained reviewer for changes in sensitive areas (auth, payments, data handling).

At the CI/CD level: full security scan as part of the build pipeline. Fail build for critical/high vulnerabilities, warning for medium/low. DAST on staging environment before promotion to production.

At the runtime level: monitoring security metrics in production, alerting on suspicious patterns, automatic blocking of known attack vectors through WAF.
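The CI/CD gate described above can be sketched as a small script. The findings format here is a generic stand-in—adapt the parsing to whatever your SAST tool (SonarQube, Snyk Code, etc.) actually emits:

```python
import json
import sys

# "Fail build for critical/high, warning for medium/low."
BLOCKING = {"critical", "high"}

def gate(findings: list[dict]) -> int:
    blocking = [f for f in findings if f["severity"] in BLOCKING]
    for f in findings:
        level = "ERROR" if f["severity"] in BLOCKING else "WARNING"
        print(f"{level}: [{f['severity']}] {f['rule']} at {f['file']}:{f['line']}")
    # A non-zero exit code fails the pipeline step.
    return 1 if blocking else 0

if __name__ == "__main__":
    # Expects a JSON report path, e.g. produced by the SAST step before this one.
    report = json.load(open(sys.argv[1]))
    sys.exit(gate(report["findings"]))
```

Wired in as a pipeline step after the scanner, this makes the critical/high policy enforceable rather than advisory.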

This multi-layered model assumes no single layer is perfect. Defense in depth—if a problem slips through the IDE, pre-commit will catch it. If it slips through pre-commit, PR review will catch it. And so on.

What does responsibility for AI code security mean according to regulators?

The question of legal liability for vulnerabilities in AI-generated code is actively discussed by regulators, but the direction of interpretation can already be identified.

Georgetown CSET in their report states: “Responsibility for ensuring the security of AI-generated code should not rest solely with individual users but also with AI developers and organizations.” This points to a shared responsibility model—but in practice, until regulations arrive, risk rests with the organization using AI tools.

The EU AI Act, which comes into full force in 2026, classifies AI systems by risk. Code-generating tools will probably fall into the “limited risk” category requiring transparency (users must know code was generated by AI)—unless they’re used in the context of high-risk applications (healthcare, critical infrastructure), where requirements are much stricter.

NIS2 and DORA for the financial sector impose obligations regarding security by design and resilience testing. Organizations using AI to generate code in systems covered by these regulations must demonstrate that the development process includes adequate security controls—regardless of whether the code was written by a human or machine.

Practical implication: document the AI code security verification process. In case of an incident, the ability to demonstrate that the organization applied reasonable security practices may be crucial for limiting liability. “We trusted Copilot” is not a defense; “The code went through our standard security pipeline including SAST, code review, and DAST”—is.

How do you train teams to use AI code assistants securely?

Team training is an often-overlooked element of AI-assisted development adoption. Most developers start using Copilot or ChatGPT for coding without any preparation—and develop habits that are hard to change later.

A training program should cover several areas:

AI limitations awareness: developers must understand that LLMs don’t “understand” security. The model generates statistically probable code based on patterns in training data—it doesn’t analyze security context. This awareness shifts the approach from “AI generated it, so it’s OK” to “AI generated it, I need to verify.”

Red flag recognition: training in identifying patterns that often indicate security problems. String concatenation in SQL, direct rendering of user input in HTML, deserialization without validation, hardcoded credentials—these patterns are relatively easy to detect even without tools.
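These red flags are regular enough that even a crude scanner catches many of them. A Python sketch—deliberately rough heuristics for training exercises, not a SAST replacement:

```python
import re

# Rough textual heuristics for the red flags listed above.
RED_FLAGS = {
    "possible SQL built by string concatenation":
        re.compile(r"""(SELECT|INSERT|UPDATE|DELETE)\b[^\n]*["']\s*\+""", re.I),
    "possible hardcoded credential":
        re.compile(r"""(password|secret|api_key)\s*=\s*["'][^"']+["']""", re.I),
    "unsafe deserialization of untrusted data":
        re.compile(r"\bpickle\.loads?\(|\byaml\.load\((?!.*SafeLoader)"),
}

def scan(source: str) -> list[str]:
    return [name for name, pattern in RED_FLAGS.items() if pattern.search(source)]

snippet = 'query = "SELECT * FROM users WHERE id=" + user_id\npassword = "hunter2"'
hits = scan(snippet)
assert len(hits) == 2  # SQL concatenation and hardcoded credential
```

Such patterns also work well as a pre-commit sanity check, independent of the full SAST run in CI.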

Secure prompting: how prompts are formulated affects the quality of generated code. The prompt “write an authentication function” will give a worse result than “write an authentication function with bcrypt password hashing, rate limiting, and timing attack protection.” Explicit security requirements in the prompt improve output security.

Verification workflow: practical exercises in using SAST tools, interpreting results, fixing found vulnerabilities. A developer should be able to independently scan code and understand the report without security team support.

Security champions: designating someone with deeper security knowledge in each team who can support colleagues and escalate unusual situations. A security champion doesn’t replace professional audit but raises the baseline knowledge in the team.

At ARDURA, we offer secure coding workshops adapted to the context of AI-assisted development. We combine theory with practical exercises on real examples of vulnerabilities in code generated by popular AI tools.

Strategic table: Security checklist for AI-generated code

| Area | Control Question | Action if NO | Priority |
| --- | --- | --- | --- |
| Pre-generation | Does the prompt contain explicit security requirements? | Extend prompt with security requirements | High |
| Pre-generation | Is the trust context of input data known? | Determine source and trust level of input | High |
| Post-generation | Did the code pass through SAST? | Run scan before commit | Critical |
| Post-generation | Were all critical/high findings addressed? | Fix or document accepted risk | Critical |
| Post-generation | Do dependencies exist and are they trusted? | Verify in npm/PyPI/Maven, check age/popularity | High |
| Post-generation | Do dependencies have no known vulnerabilities? | Run SCA, update or find alternative | High |
| Code Review | Is the reviewer aware the code comes from AI? | Mark PR as AI-assisted | Medium |
| Code Review | Does the code handle edge cases and error handling? | Add defensive coding | Medium |
| Code Review | Do sensitive operations have adequate logging? | Add audit trail | Medium |
| Integration | Is the code covered by security-focused tests? | Add tests for known attack vectors | High |
| Integration | Was DAST run on the integrated functionality? | Include DAST in staging pipeline | High |
| Production | Will monitoring detect exploitation of this functionality? | Configure alerts on suspicious patterns | Medium |
| Production | Is there a rollback plan for this change? | Prepare quick rollback procedure | Medium |

Usage: Go through the checklist for each significant AI-generated code fragment. Not every point must be met for every piece of code—but every “NO” should be a conscious decision, not an oversight.

How does ARDURA support organizations in secure development?

ARDURA Consulting has specialized in application testing and software security for over a decade. Our experience includes projects for the financial sector, healthcare, and enterprise, where security requirements are particularly rigorous.

In the context of AI-assisted development, we offer:

Security Code Review — our experts conduct manual code review with special attention to patterns typical of AI-generated code. We combine automatic scanning with contextual analysis that goes beyond SAST tool capabilities.

DevSecOps Implementation — we help integrate security tools with the CI/CD pipeline. We configure SAST, DAST, SCA in a way that doesn’t block team velocity but catches critical problems before production.

Security Training — workshops for development teams covering secure coding practices, vulnerability recognition, and safe use of AI code assistants. We adapt training to the technology stack and team skill level.

Penetration Testing — penetration testing of applications with special emphasis on areas where AI-generated code is used. We simulate attacks on real systems, identifying vulnerabilities that escaped automatic tools.

For teams using Java enterprise, we also offer support from our Flopsar Suite product for performance monitoring and diagnostics—complementary to security tools in identifying anomalies that may indicate exploitation.

Summary: security as an enabler, not blocker of AI adoption

The statistics are clear: 45% of AI-generated code contains security vulnerabilities, and for Java this rate exceeds 70%. At the same time, 85% of developers already use AI for coding. Ignoring this tension is not an option.

The solution is not abandoning AI-assisted development—the productivity benefits are too significant. The solution is treating AI code with the same skepticism as code from an unknown source: verification through automatic tools, conscious code review, security testing before production.

Key takeaways:

  1. Every AI code fragment should pass through SAST before commit—this is a non-negotiable baseline.

  2. Java requires special attention due to the 70% failure rate—consider additional security review for critical components.

  3. “Hallucinated dependencies” is a new attack vector—always verify package existence and reputation before installation.

  4. Document the verification process—in case of an incident, demonstrating reasonable practices limits liability.

  5. Training the team in secure AI use is an investment, not a cost—an unaware developer is the weakest link.

If your team is adopting AI-assisted development and needs support building secure development practices—contact ARDURA. We’ll help balance productivity with security without sacrificing either dimension.