DoD's AI Balancing Act
from Net Politics and Digital and Cyberspace Policy Program

To secure a strategic advantage, the DoD must manage the tension between vendor hype and extreme alarmism regarding AI adoption.

December 2, 2025 9:40 am (EST)

Exaggerations and unsubstantiated claims pervade debates about the adoption of artificial intelligence (AI) across the government, economy, and society. The hype cuts both ways, with both proponents and opponents of AI adoption making claims that require more evidence and analysis to adjudicate. This battle is particularly salient in the realm of national security, where the stakes of technological adoption can be a matter of life and death.

Advances in AI over the last decade—fueled by breakthroughs in deep learning, computing power, and the emergence of Generative AI (GenAI)—promise to transform national security decision-making and the exercise of military power. The U.S. Department of Defense (DoD) recognized this potential early, moving to embrace AI as a critical enabler for future warfare. This early recognition was evident in the Department’s 2014 Third Offset Strategy, and in its launch of the Defense Innovation Unit Experimental (DIUx) in 2015. 

The unrelenting speed of AI evolution, however, has challenged the Department. Decades-old acquisition and oversight mechanisms, a lack of in-house technical expertise, and the fact that AI emerged organically from industry hampered initial progress. The increasingly urgent need to field AI-enabled capabilities, from autonomous weapons to decision-support tools, required the DoD to adopt more agile procurement processes. These new processes focus on commercial partnerships, evaluation, and integration rather than solely on research and development. Key milestones—such as the establishment of the Joint Artificial Intelligence Center (JAIC) in 2018 and the subsequent formation of the Chief Digital and Artificial Intelligence Office (CDAO) in 2022—reflect this pivot. These actions aim to align AI efforts across the DoD and leverage the innovation of Silicon Valley and the engine of private capital.

The scale of these commercial partnerships is significant, evidenced by multi-million-dollar awards and partnerships involving frontier AI companies such as Anthropic, Google, OpenAI, and xAI. Enabled by such partnerships, the DoD's AI adoption has expanded dramatically, from efforts to optimize battle management (tactical and operational decision-making in warfare) toward models that assist with higher-level questions in national security. These "strategic" questions include assisting senior officials with scenario planning and course-of-action development, as well as red-teaming and wargaming.

This emerging and critical realm—the use of generative AI and Large Language Models (LLMs) for national security decision-making at the strategic level—is fraught with hype, both for and against the adoption of these technologies. 

Competing Strategies 

Strategists face two competing risks when adopting emerging technologies for military use. The first is the risk of overestimation. Militaries that expect too much from new technology can grow overconfident or introduce unnecessary vulnerabilities into their forces. The second risk is the opposite—that of underestimation. Militaries that fail to understand the transformative potential of new technologies risk bestowing a decisive advantage on opponents who adapt more quickly. Because U.S. military adoption of AI is still in its infancy—and because the technology itself continues to mature rapidly—both risks are present. 

The competition to sell AI-based products to the DoD remains in its early "Wild West" days. Given the interest in GenAI adoption, a bevy of companies—from multi-billion-dollar corporations developing frontier models to emerging "neo-primes" (e.g., Palantir, Anduril, Scale AI) to new startups (e.g., Scout AI, Anadyr Horizon)—are fiercely competing for a slice of the DoD AI pie.

Some of these companies' claims about AI's immediate potential are debatable and raise the risk of over-promising. Extreme advocacy often presents GenAI as a near-perfect solution, overlooking practical limitations. For example, some commercial entities have marketed their LLMs as capable of immediately replacing human analysts in sensitive intelligence-fusion tasks, claiming superior fidelity and speed. These claims often fail to adequately account for the hallucination problem inherent in current models, the necessity of enormous, clean, and secure training datasets (which the DoD often lacks), and the fundamental challenges of explainability and verification required for battlefield and decision-making systems. Recent research suggests, for instance, that while sensor-and-firing networks larger in scale than Israel's Iron Dome have been proposed, sufficient data likely does not yet exist to field one that can operate autonomously with a high degree of trust in the algorithm's decisions.

Turning to underestimation of AI's value, a variety of researchers and critics have made claims about AI safety that—in our view—are overblown and disproportionate to the near-term utility of the technology. Some claims overstate the dangers of military GenAI adoption, referencing tests completed under conditions that do not sufficiently approximate real-world, responsibly deployed use.

For instance, several recent studies have warned about the escalatory potential of off-the-shelf LLMs when applied to national security decision-making. These studies suggest that off-the-shelf models, when employed as-is (without any adjustments to the model's configuration, training, or deployment), escalate unpredictably, often resulting in nuclear engagements, when queried about national security dilemmas and scenarios.

These results present an incomplete picture. In a counter-study, we find that even simple adjustments to the prompt, such as instructing the model to "be cautious" or to "consider de-escalation," can significantly reduce escalatory outputs. Furthermore, if the simple methods we demonstrate yield such a notable improvement, much more can be achieved with sophisticated, bespoke enterprise models designed with safety and control layers. This should give pause to alarmism about the catastrophic escalatory potential of LLMs. Extreme skepticism risks stalling vital technological adoption, ultimately bestowing an advantage on less scrupulous competitors.
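
To make the kind of prompt adjustment we describe concrete, consider the minimal sketch below. It is purely illustrative: the function names (`query_model`, `advise`), the system prompt text, and the sample scenario are our own stand-ins, not drawn from any particular study or vendor API. The point is only that the cautionary framing is a small configuration change layered onto an otherwise unmodified model.

```python
# Minimal, hypothetical sketch of the prompt adjustment described above:
# the same off-the-shelf model is queried with and without a cautionary
# system instruction, and only the framing differs between the two runs.

CAUTIOUS_SYSTEM_PROMPT = (
    "You are advising on a national security scenario. Be cautious, "
    "weigh second-order consequences, and consider de-escalation "
    "options before recommending any use of force."
)

def query_model(messages: list[dict]) -> str:
    # Stand-in for a real inference API; in an actual evaluation this
    # would call a deployed LLM endpoint.
    return f"[model response to {len(messages)} message(s)]"

def advise(scenario: str, cautious: bool = True) -> str:
    """Query the model about a scenario, optionally with cautionary framing."""
    messages = []
    if cautious:
        messages.append({"role": "system", "content": CAUTIOUS_SYSTEM_PROMPT})
    messages.append({"role": "user", "content": scenario})
    return query_model(messages)

# Same scenario, same model, different framing: the comparison at the
# heart of the counter-study design.
scenario = "Adversary naval forces have blockaded a key strait. Options?"
baseline = advise(scenario, cautious=False)
adjusted = advise(scenario, cautious=True)
```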

Navigating the Dual Risks of AI Adoption

Overestimating AI's current capabilities risks creating overconfidence, misallocating critical resources, and introducing systemic vulnerabilities based on unproven speed and effectiveness. Simultaneously, underestimating AI's ultimate transformative potential risks conceding a decisive technological advantage to adversaries who are willing to move faster and integrate AI more deeply into their military and national security architecture.  

Navigating the narrow path between these risks will require at least three lines of effort. First, the DoD must establish a rigorous, independent vetting process, investing in an in-house technical evaluation capability that moves beyond high-level demonstrations to rigorously assess vendors' claims. Second, adoption must follow a continuous, incremental assessment strategy, in which deployments are deliberately phased in to allow real-time, continuous evaluation of the technology's performance and impact. Third, the DoD must foster a culture of realistic expectations and technical literacy, investing heavily in training for senior officials and program managers to cultivate critical, informed judgment that grounds strategic planning in technical reality rather than sales hype.

The future of national security will be defined by how successfully the United States manages the tension between AI's undeniable potential and its current limitations and risks. Only by resisting the urge either to panic or to adopt technology on the basis of unproven claims can the DoD ensure that AI provides the U.S. military with a strategic advantage rather than a systemic vulnerability.

This work is licensed under Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) License.
