What is Artificial Intelligence? A Primer

(This is the first part of a two-part series on understanding AI in defense.)

The problem with Artificial Intelligence (AI) is that nobody is quite sure what it is.

Yet, the infusion of all things “AI” in defense makes some understanding of the technology imperative. This article offers that understanding through a high-level review of the dominant approaches in the field of AI, relating the major benefits and shortcomings of each approach to its suitability in mission-critical domains.

Symbolic AI

The first burst of AI euphoria was in the 1980s over symbolic systems. These systems are hand-coded—directly engraved by human programmers—with knowledge that humans themselves have generated. This knowledge is represented within the system as symbols (e.g., words, rules, etc.). When provided with an input, the system manipulates these symbols according to the human-provided rules that define the relationships between them, and thereby generates an output.

This is sometimes called Good Old-Fashioned AI, or GOFAI.
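To make this concrete, the sketch below shows a toy symbolic system in Python. The facts and rules are hypothetical placeholders invented for illustration; a fielded system would encode far more of them, but the mechanics of applying human-written rules to explicit symbols would be the same.

```python
# A minimal sketch of a symbolic (rule-based) system. The facts and rules are
# hypothetical placeholders; the point is only that all knowledge is hand-coded
# by humans and manipulated as explicit symbols.

RULES = [
    # (premises, conclusion): if every premise is already known, derive the conclusion.
    ({"emits_tracking_radar", "airborne"}, "aircraft_threat"),
    ({"aircraft_threat", "no_iff_response"}, "unidentified_threat"),
]

def infer(facts: set) -> set:
    """Apply the hand-coded rules repeatedly until no new symbols can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in RULES:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

# Given an input (a set of observed symbols), the system produces an output
# whose derivation can be traced back rule by rule.
print(infer({"emits_tracking_radar", "airborne", "no_iff_response"}))
```

Because every step is a rule written by a person, the path from input to output can be reconstructed exactly, a property that matters for the benefits discussed below.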

Symbolic AI captured the dominant feeling among AI researchers in the mid-to-late twentieth century: for them, human intelligence is, at its essence, logical reasoning. This was evidenced in part by the fascination with constructing a computer that could defeat a human Chess grandmaster—such a system must, some thought, be intelligent if it managed to pull this off.

This turned out to be incorrect. In 1997, IBM’s Deep Blue defeated Chess grandmaster Garry Kasparov (following a loss to Kasparov in 1996). Deep Blue was impressive, but not intelligent. So, too, for other strategy games like Go, where AlphaGo defeated South Korean professional Lee Sedol in 4 out of 5 games in 2016.

Despite some expectations among defense analysts in the 2010s (including certain high-level People’s Liberation Army officials), AlphaGo did not in fact portend the rise of general strategizing agents.

Major Benefits in Production

Performance Guarantees. Symbolic systems provide performance guarantees. They use their human-defined knowledge to solve problems reliably and deterministically. While they cannot function robustly beyond this knowledge, when applied sufficiently narrowly, the operator need not worry about hallucinated content.

Explainable / Human-Interpretable. Symbolic systems are also explainable. Because they are built on a foundation of human knowledge, and because they manipulate symbols (which are inherently open to evaluation), the output of the system can be traced back to its input. Humans can interpret these systems.

Compute-Efficient. Symbolic systems tend to be compute-efficient relative to today’s AI systems. They do not require internet-scale datasets on which to train, nor do they require inordinate amounts of energy to run.

Major Shortcomings in Production

Rigid and Inflexible. Symbolic systems are tightly bound to their base of human-defined knowledge. While that restrictiveness aids their ability to provide performance guarantees, it likewise means that their application in real-world situations must be deliberately and narrowly tailored.

Lack of Scalability. As a corollary of their restrictiveness, symbolic systems are not scalable; expanding their base of human-defined knowledge does not result in exponential capability increases.

Symbolic Systems as the Original Artificial General Intelligence

It is not uncommon today to find in the United States calls for a new “Manhattan Project-like program” for AI. The spirit is commendable. Yet, the metaphor is ahistorical.

It is ahistorical because the U.S. already had its Manhattan Project for AI – and it failed.

The U.S. Defense Advanced Research Projects Agency (DARPA), amid the burgeoning enthusiasm for converging technological advancements in the early 1980s, established the Strategic Computing Initiative (SCI) in 1983. The SCI took as its ultimate aim the creation of “generic software systems that will be substantially independent of particular applications.” If it were established today, the founding document would likely replace the word “generic” with “general intelligence.” This was a vision of Artificial General Intelligence (AGI) in defense.

The SCI failed to achieve its ultimate aims, as symbolic AI could not meet expectations. Criticisms of the Initiative from the time are worth noting, as they are a kind of historical foretelling of today’s AI critiques. In 1984, some analysts wrote in the Bulletin of the Atomic Scientists that the use of AI “creates a false sense of security” because such systems “act inappropriately in unanticipated situations” due to a “fundamental limit on their reliability.”

Critiques of contemporary AI models indeed share the same spirit, targeting deficiencies in reliability, the ability to adapt to new contexts, and the dangers of offloading tasks without proper human supervision.

DARPA’s SCI is therefore a cautionary tale, one that is useful amid enthusiasm for symbolic AI’s successor: machine learning.

Machine (and Deep) Learning

To be sure, machine learning is as old as symbolic AI, though it was less appreciated in the last century. The core idea behind machine learning is to build artificial neural networks that (crudely) emulate the interactivity between neurons in a biological brain. Rather than being provided with human-defined knowledge, these networks learn from their training data (e.g., they learn to detect and classify the tracking or control radar signals of anti-aircraft missile systems).

The most dominant subfield in machine learning today is deep learning. The difference is relatively straightforward: older neural networks were shallow, consisting of a single layer of artificial neurons between the input and the output. Today’s neural networks, however, can consist of hundreds of such layers, increasing the sophistication of the model’s eventual outputs.
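As a rough illustration of what “depth” means, the sketch below stacks several layers of artificial neurons between an input and an output. The layer sizes and weights here are arbitrary placeholders (random, not trained); real deep networks learn their weights from data and are vastly larger.

```python
import numpy as np

# Illustrative only: a "deep" network is many layers of artificial neurons
# stacked between input and output. The weights below are random placeholders,
# not learned parameters.

rng = np.random.default_rng(0)
layer_sizes = [16] + [32] * 8 + [4]   # one input layer, eight hidden layers, one output layer
weights = [rng.normal(size=(m, n)) for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x: np.ndarray) -> np.ndarray:
    """Pass an input through every layer in turn (ReLU nonlinearity between layers)."""
    for w in weights[:-1]:
        x = np.maximum(0, x @ w)      # each hidden layer transforms the previous layer's output
    return x @ weights[-1]            # the final layer produces the output

print(forward(rng.normal(size=16)).shape)  # (4,)
```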

Deep learning is responsible for the current AI boom. It is exemplified by a particular type of artificial neural network known as the Generative Pre-trained Transformer, or GPT. Chatbots like OpenAI’s ChatGPT are based on GPTs.

GPTs themselves are a type of Large Language Model, or LLM. They were not originally developed as chatbots, however; they began as experimental research conducted by Google and then OpenAI. Today’s GPT-based chatbots are complicated affairs, with the GPT serving as the core of a multi-part system.

The learning done by a neural network is statistical. All this means is that the neural network tracks the correlations between data points in a dataset. (How does one data point relate to another?) This occurs during training. Contemporary neural networks can track many billions of such correlations, provided that the dataset is large enough and the network is run on sufficiently powerful hardware. Once training is complete, the network possesses a kind of internal model of that dataset: a representation of the statistical relationships between those data points. When used in production, the model draws on this representation to respond to the input.

Consider what this entails for an LLM (the base GPT, not a chatbot). One could ask an LLM: Who are some of the largest defense contractors in the United States in 2025? From the LLM’s perspective, this is not so much a question as a series of values that have a statistically likely follow-up.

So, the LLM may output: Some of the largest defense contractors in the United States in 2025 are RTX, Boeing, and Lockheed Martin. Crucially, however, it only outputs this because, according to what it learned during training (which is all it has), those words are the most statistically likely values that would follow the user’s input. This is because the LLM was trained on internet-crawled data, where humans can be found saying similar things; the closer in proximity the user’s question is to what the model was trained on, the better the performance is likely to be.
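A crude way to see “statistically likely continuation” in action is the toy sketch below, which merely counts which word tends to follow which in a tiny made-up corpus and then extends a prompt with the most frequent follow-up. A real LLM does this with a deep neural network over internet-scale data and sub-word tokens rather than raw counts over a few sentences, but the underlying idea of continuing the input with what is statistically likely is the same.

```python
from collections import Counter, defaultdict

# Toy illustration (not a real LLM): "training" here just counts which word
# tends to follow which, and "generation" picks the statistically most likely
# continuation. The corpus is a made-up stand-in for internet-scale data.

corpus = (
    "the largest defense contractors include lockheed martin . "
    "the largest defense contractors include rtx . "
    "the largest defense contractors include boeing ."
).split()

follows = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    follows[current_word][next_word] += 1      # track correlations between adjacent words

def continue_text(prompt: str, steps: int = 4) -> str:
    """Extend the prompt, one word at a time, with the most likely follow-up."""
    words = prompt.split()
    for _ in range(steps):
        candidates = follows.get(words[-1])
        if not candidates:
            break
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

print(continue_text("the largest defense"))
# -> "the largest defense contractors include lockheed martin"
```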

LLM-based interfaces like chatbots – and other floated defense applications such as smart glasses or predictive vehicle maintenance – are not quite the same beast. These applications fundamentally derive their capabilities from the base GPT, but the base GPT has either been post-trained in ways that tune its outputs for the application in question, or its outputs are fed into a separate model before an answer is returned to the user (or both). The conversational style of popular chatbots, for instance, is only achieved through substantial post-training.
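The sketch below gestures at that multi-part structure. Every function name and behavior here is a hypothetical stand-in (none corresponds to any vendor’s actual API); the point is only that the user-facing answer is the product of a pipeline built around the base model, not the raw output of the base model itself.

```python
# Hypothetical sketch of a chatbot as a multi-part system built around a base
# GPT. All names and behaviors are illustrative stand-ins, not a real API.

def base_gpt(prompt):
    """Stand-in for the pre-trained base model: a raw statistical continuation."""
    return "raw continuation of: " + prompt

def post_trained_gpt(prompt):
    """Stand-in for the same model after post-training for conversational style."""
    return "Here is a direct, conversational answer to: " + prompt

def output_filter(text):
    """Stand-in for a separate model that screens or rewrites the draft output."""
    return text if "disallowed" not in text else "[response withheld]"

def chatbot(user_message):
    # The user never sees the base model's raw output; the pipeline shapes it.
    draft = post_trained_gpt(user_message)
    return output_filter(draft)

question = "Who are some of the largest defense contractors in the United States in 2025?"
print("Base GPT:", base_gpt(question))
print("Chatbot: ", chatbot(question))
```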

Major Benefits in Production

Flexibility of Application. Deep learning offers a flexibility of application that is unprecedented in AI’s history. These neural networks can, in principle, be applied to any well-defined problem for which data (in sufficient quantities) exists.

Scalability. Concomitant with this, neural networks are scalable. Both their internal capacity and the amount of data they are trained on can be scaled up, ultimately to datasets approaching all the data on the internet itself. The “reasoning” LLMs, for instance, are pre-trained on internet-scale data but are post-trained in part with data annotated by human subject-matter experts privately contracted by AI firms.

Natural Language Interaction (LLM-Specific). The GPT architecture affords an unprecedented capability to interact with GPT-based applications in flexible natural (human) language, rather than through rigid programming languages known only to coders. This capability can be conceived in part as a complement to the scalability of neural networks and in part as a consequence of the GPT architecture’s own design.

Major Shortcomings in Production

Lack of Performance Guarantees. Though versatile, neural networks do not offer performance guarantees. Unlike symbolic systems, which engage in a kind of narrow logical reasoning, neural networks produce outputs that reflect a probability distribution.

There are two sides to this. One is the level of inaccuracy. A neural network-based application that has even a 95% rate of accuracy is insufficient for those domains where the output must meet the five nines of reliability (99.999% accurate). Such systems can be applied to many different things, but they cannot be applied with an expectation of equal or sufficient performance within a given context or across contexts.

The other side is indeterminacy. A model’s output is nondeterministic, meaning the same input can lead to different outputs. To be sure, indeterminacy in systems like LLMs can be quelled. However, this has thus far come at the expense of the accuracy of the system’s outputs. It is unclear whether this tradeoff will be overcome.
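A stylized sketch of this indeterminacy is below. The candidate answers and their probabilities are invented for illustration; the point is that when a model’s answer is sampled from a probability distribution, the same input can return different outputs on different runs.

```python
import random

# Stylized sketch of output indeterminacy. The candidate answers and their
# probabilities are invented placeholders, not real model outputs.
candidate_outputs = {
    "Lockheed Martin": 0.45,
    "RTX": 0.30,
    "Boeing": 0.20,
    "a confidently stated but wrong name": 0.05,
}

def respond(rng):
    """Sample one answer from the model's probability distribution over outputs."""
    return rng.choices(
        population=list(candidate_outputs),
        weights=list(candidate_outputs.values()),
    )[0]

# The same input, asked three times, can yield three different answers.
rng = random.Random()
print([respond(rng) for _ in range(3)])

# Fixing the random seed makes the outputs repeatable across runs, but
# repeatability alone does not make the chosen answer any more accurate.
seeded = random.Random(0)
print([respond(seeded) for _ in range(3)])
```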

In any event, both sides of this shortcoming severely limit the use of neural network-based applications in mission-critical or otherwise sensitive environments.

Lack of Explainability / Human-Interpretability. Neural networks are not explainable. Sometimes dubbed “black boxes,” neural networks generate outputs in ways that – while explainable in principle – are practically indecipherable. This owes in part to the complexity of modern deep neural networks, whose layers between input and output encode billions of connections among neurons, all of which contribute to the model’s ability to represent its training dataset.

Compute- and Data-Intensiveness. Neural networks tend to become more compute-intensive and less data-efficient as they become more powerful. This is familiar from LLMs today: pre-training requires specialized (and expensive) hardware and a dataset the size of the internet; post-training requires additional data annotated by subject-matter experts; and running these models in production carries substantial energy costs.

A Cautionary Note on Machine Learning

The reception to artificial neural networks today bears some similarities to the reception of symbolic systems in the 1980s. There is a risk that efforts undertaken in the lineage of the Manhattan Project will go the way of the SCI: impressive but doomed to fail.

It is therefore worthwhile to close with a simple observation. In 2016, if one wanted to get a sense of where the capabilities of AlphaGo – originally built around two neural networks – were heading, the worst thing one could have done was to pay attention only to its four wins against Lee Sedol.

In fact, it was the system’s single loss that was ultimately more instructive, as this loss represented the Achilles’ heel of neural networks: the system simply could not deal with the unfamiliarity of a move played by Lee. Its performance collapsed. General strategizing agents never spawned from this system.

(In Part 2, we will cover some possible directions for AI and what they portend for AI in defense.)

Vincent Carchidi

Vincent Carchidi has a background in defense and policy analysis, specializing in critical and emerging technologies. He is currently a Defense Industry Analyst with Forecast International. He also maintains a background in cognitive science, with an interest in artificial intelligence.
