From LLMs to SLMs: Are Small Language Models the Future of Agentic AI?

Jocelyn Wang

February 6, 2026

Abstract

In the age of AI, Large Language Models (LLMs) power everything from chatbots to agentic AI systems. However, as AI data centers consume growing amounts of electricity and water, concerns about sustainability have intensified. Developers are therefore exploring more efficient alternatives such as Small Language Models (SLMs), which, due to their smaller size, reduce energy use and infrastructure costs and are better suited to repetitive, specialized tasks. This paper highlights the cost, design, and sustainability benefits of SLMs, examines their potential downsides, and discusses how hybrid approaches that integrate SLMs and LLMs can strategically optimize AI deployment.

Introduction to LLMs — Costs and Challenges 

According to OpenAI CEO Sam Altman, as of June 2025 ChatGPT receives approximately 1 billion queries daily, each using around 0.34 Wh of electricity; at that rate, query traffic alone consumes roughly 340 MWh per day (1 billion × 0.34 Wh). Training a single large model can require approximately 50 GWh of electricity, nearly equal to the yearly electricity usage of some developing countries. The International Energy Agency (IEA) estimates that since 2017, data center electricity use has risen about 12% annually, over four times the growth rate of overall electricity consumption (UNESCO, 2025). Beyond energy, training and running LLMs also strains water resources. Training GPT-3, for instance, consumed around 1,287 MWh of electricity, enough to power 120 US homes for a year, and Microsoft’s water usage rose 34% in 2022 (King, 2025). By 2027, global water use for AI operations by major companies could triple, potentially reaching 4.2 to 4.6 billion cubic meters, exceeding Denmark’s entire annual water consumption (UNESCO, 2025).

These environmental costs are compounded by global inequities in access. Generative AI is concentrated in regions with advanced digital infrastructure, leaving billions of people offline, particularly in the Global South and across the African continent, where just 5% of AI talent has sufficient computing resources (UNESCO, 2025). Privacy risks further underscore the limitations of LLMs. The European Data Protection Board (EDPB) notes that privacy risks arise throughout the lifecycle of LLMs, including the exposure of sensitive data, unintended inferences about individuals, and the amplification of biases embedded in training data (EDPB, 2025).

Together, these sustainability, accessibility, and privacy challenges expose systemic shortcomings of Large Language Models, underscoring the need for more efficient alternatives. To evaluate whether SLMs represent a viable path forward, it is first necessary to define how LLMs and SLMs differ in scale and function.

Defining LLMs and SLMs

LLMs are massive neural network models, often containing hundreds of billions of parameters; GPT-4, for instance, is estimated to exceed 175 billion (AI News Hub, 2025). In agentic systems, these models serve as enabling agents: they make strategic decisions about when and how to use available tools, manage the sequence of operations required to complete tasks, and decompose complex problems into smaller subtasks when necessary. Put simply, Large Language Models serve as the “brain” of agentic AI systems (Belcak et al., 2025).

A Small Language Model, by contrast, is a language model that prioritizes specialization and is designed to run on standard consumer devices, delivering inference with low enough latency (the delay between a user’s request and the model’s generated output) to efficiently handle an individual user’s agentic requests (Belcak et al., 2025). Like LLMs, SLMs are built on the neural network architecture known as the transformer, which is central to natural language processing (NLP) and forms the foundation of models like the generative pre-trained transformer, better known as GPT. To make SLMs smaller and faster, developers use techniques such as pruning (removing redundant parameters), quantization (using lower-precision data types), low-rank factorization (simplifying weight matrices), and knowledge distillation (transferring knowledge from a larger model to a smaller one). These methods produce efficient SLMs that can still perform on par with LLMs on many tasks (IBM, 2024).
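To make these techniques concrete, the sketch below applies two of them, pruning and dynamic quantization, to a toy model using PyTorch’s built-in utilities. The layer sizes and the 30% pruning ratio are illustrative assumptions, not a production recipe.

```python
# A minimal sketch of two compression techniques described above.
# The toy model and settings are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A toy "language model" stand-in: a small stack of linear layers.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))

# Pruning: zero out the 30% of weights with the smallest magnitude
# in each linear layer (the "removing redundant parameters" step).
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the pruning permanent

# Quantization: convert linear-layer weights to 8-bit integers
# (the "lower-precision data types" step), shrinking memory use
# and often speeding up CPU inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

print(quantized)
```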

Beyond these technical characteristics, SLMs offer practical benefits. They can be trained and run using only 30 to 40% of the computational power needed for LLMs, making them far easier to adopt on edge devices and by smaller businesses (Kumar, Davenport, & Bean, 2025). With only a few billion parameters, compared to the hundreds of billions or even trillions in LLMs, SLMs benefit from lower computational demands, faster training, simpler deployment, and greater efficiency in domain-specific applications (Kumar, Davenport, & Bean, 2025).

These advantages have prompted many analysts to predict a gradual industry shift toward SLMs. For example, Gartner, a consulting firm, projects that by 2027, small, task-specific AI models will be adopted at three times the rate of general-purpose LLMs (Forbes Tech Council, 2025).

Advantages of SLMs

While the costs of LLMs raise questions about long-term sustainability, SLMs offer a compelling alternative. Unlike LLMs, which require thousands of GPUs, Small Language Models need little or no parallelization across GPUs and nodes, making them more affordable and practical to train. This efficiency enables rapid iteration and adaptation, allowing systems to respond more quickly to user needs, emerging behaviors, and changing local regulations (Belcak et al., 2025).

In terms of speed and efficiency, SLMs can be deployed locally on devices such as smartphones and home appliances, offering faster response times. Because SLMs are trained on a more focused set of data, they tend to be more dependable for domain-specific tasks and less likely to produce irrelevant responses (Kumar, Davenport, & Bean, 2025).
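As an illustration of such local deployment, the sketch below runs a small open model entirely on-device with the Hugging Face transformers pipeline. The checkpoint name is an assumption; any similarly sized SLM that fits in the device’s memory could be substituted.

```python
# A minimal sketch of local SLM inference; runs on CPU by default,
# so no server or cloud API is involved. The checkpoint is an assumed
# ~0.5B-parameter instruct model.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-0.5B-Instruct",  # assumption: swap in any small model
)

# The prompt never leaves the machine, which is also the privacy
# benefit discussed later in this paper.
result = generator(
    "Summarize in one sentence: small language models run on consumer hardware.",
    max_new_tokens=50,
)
print(result[0]["generated_text"])
```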

At the same time, choosing between LLMs and SLMs involves a clear tradeoff. LLMs excel at complex reasoning and nuanced contextual understanding across diverse domains, making them well suited to tasks such as strategic planning and sophisticated customer service; yet they often fall short in highly specialized, industry-specific tasks (Forbes Tech Council, 2025). These differences are already guiding industry practice. Some companies are beginning to develop SLMs targeted at specific industry and business problems; Infosys, for example, has deployed customizable SLMs tailored to banking and IT operations (Kumar, Davenport, & Bean, 2025). Because of their targeted training data, SLMs can be developed, fine-tuned, and enhanced more quickly and cost-effectively than LLMs. A concrete example comes from Bayer, a multinational pharmaceutical and biotech company, which developed an SLM trained on its crop protection knowledge to help front-line personnel answer farmers’ questions; in initial testing, the model was found to be 40% more accurate than a large model (Kumar, Davenport, & Bean, 2025).

Furthermore, SLMs play a key role in promoting equitable access to AI technology. Because they require fewer resources, SLMs are accessible to a wider range of users, particularly in the Global South, where countries can leverage them to cultivate home-grown AI tools and create jobs while keeping pace with technological development. SLMs can also encourage international cooperation, helping to shape a more just global AI landscape that leaves no one behind (UNESCO, 2024). NVIDIA researchers describe this increase in accessibility as the democratization of AI: by enabling more individuals and organizations to develop language models for agentic systems, the population of deployed models is more likely to reflect diverse perspectives and societal needs. This diversity helps mitigate systemic bias, fosters competition, and drives faster innovation across the field (Belcak et al., 2025).

Finally, because SLMs can be deployed locally on consumer devices, they limit the transfer and storage of personal data in large, centralized infrastructures. As the EDPB notes, such an approach strengthens data minimization and reduces the risk of data misuse (EDPB, 2025).

Potential Barriers 

While many remain optimistic about the potential of SLMs to change the AI landscape, concerns persist about their cost and their ability to compete with larger models. Significant upfront investment already sunk into centralized LLM infrastructure, combined with limited public awareness of AI technology generally and SLMs specifically, means that real barriers to adoption remain.

A common argument is that LLM generalists will always outperform SLMs on language tasks due to scaling laws and possible “semantic hubs,” which allow for cross-domain knowledge transfer (Belcak et al., 2025). However, SLMs can be fine-tuned cheaply for specific tasks, and semantic hubs add limited value for the repetitive subtasks that dominate agentic applications (Belcak et al., 2025).
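As a sketch of how cheap such task-specific fine-tuning can be, the snippet below attaches LoRA adapters to a small base model with the Hugging Face peft library, so only a tiny fraction of the parameters are ever trained. The checkpoint name and hyperparameters are illustrative assumptions.

```python
# A minimal sketch of parameter-efficient fine-tuning (LoRA) for an SLM.
# Checkpoint and hyperparameters are assumptions, not a tested recipe.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")  # assumed SLM

config = LoraConfig(
    r=8,                                  # low-rank adapter dimension
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
# Only the small adapter matrices are trainable; the frozen base model
# is what keeps task-specific fine-tuning cheap.
model.print_trainable_parameters()
# From here, a standard training loop (or the Trainer API) fine-tunes
# the adapters on the domain-specific dataset.
```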

Emerging techniques suggest these challenges may be temporary. Advances in inference scheduling optimize how AI queries are processed, reducing delays and resource strain, while modularization structures models into smaller components that are easier to update and customize. Together with falling infrastructure costs, these developments mitigate many of the concerns above (Belcak et al., 2025).

Practical Recommendations

SLMs can power customer service chatbots efficiently thanks to their low latency and conversational capabilities. Models such as Llama 3.2 (1B and 3B) and Gemini Nano can summarize content, and with multimodal capabilities, SLMs can run on a vehicle’s onboard computers to combine voice commands with image classification (IBM, 2024). Agentic systems naturally allow for mixing different models, with LLMs handling core reasoning and SLMs performing specialized subtasks. Interactions within these systems generate structured, high-quality data that can be logged and used to fine-tune expert SLMs, enabling continual improvement of agent systems and supporting the gradual replacement of generalist LLMs with more specialized, cost-effective SLMs (Belcak et al., 2025).
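The sketch below illustrates that data-collection loop under a hypothetical agent interface: each model call is appended as a structured JSON record, and records for one recurring task type later become a fine-tuning set for a specialist SLM.

```python
# A minimal sketch of logging agent interactions for later SLM fine-tuning,
# in the spirit of Belcak et al. (2025). File layout and field names are
# hypothetical.
import json
import time

LOG_PATH = "agent_interactions.jsonl"  # assumed log location

def log_interaction(task_type: str, prompt: str, response: str) -> None:
    """Append one model call as a JSON record, one per line."""
    record = {
        "timestamp": time.time(),
        "task_type": task_type,  # e.g. "summarize", "classify", "extract"
        "prompt": prompt,
        "response": response,
    }
    with open(LOG_PATH, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

def load_finetuning_set(task_type: str) -> list[dict]:
    """Collect all logged records of one task type as training examples."""
    records = []
    with open(LOG_PATH, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            if record["task_type"] == task_type:
                records.append(record)
    return records
```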

Model orchestration in agentic AI ensures efficient collaboration between SLMs and LLMs: an orchestration layer determines which model is needed for a given task, directs inputs to the appropriate components, and integrates their outputs into a single, cohesive response (EDPB, 2025).
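A minimal sketch of such an orchestrator follows, with hypothetical model handles and hand-written routing rules; a production system would more likely route on a learned classifier or confidence scores.

```python
# A minimal sketch of SLM/LLM orchestration: route each subtask to the
# cheaper specialist when possible, then integrate the outputs.
from typing import Protocol

class Model(Protocol):
    def generate(self, prompt: str) -> str: ...

def route(task_type: str, slm: Model, llm: Model) -> Model:
    """Send routine, well-scoped subtasks to the SLM; open-ended reasoning to the LLM."""
    slm_tasks = {"summarize", "classify", "extract", "format"}  # assumed task taxonomy
    return slm if task_type in slm_tasks else llm

def orchestrate(subtasks: list[tuple[str, str]], slm: Model, llm: Model) -> str:
    """Dispatch each (task_type, prompt) pair, then merge results into one response."""
    outputs = [route(task, slm, llm).generate(prompt) for task, prompt in subtasks]
    return "\n".join(outputs)  # integration step
```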

Ultimately, the rise of SLMs marks a turning point in AI adoption. Their reduced environmental footprint, faster deployment, smaller size, and greater accessibility make them attractive to a wide range of users. While LLMs continue to dominate, SLMs are becoming increasingly popular and offer a practical set of tradeoffs. These tradeoffs suggest that the future of AI will be defined not by a single model but by hybrid systems in which LLMs and SLMs complement one another to strategically address user needs, shaping both industry practices and the everyday use and impact of AI.

References

AI News Hub. (2025). Small language models (SLMs): The future of accessible AI in 2025. AI News Hub. https://www.ainewshub.org/post/small-language-models-slms-the-future-of-accessible-ai-in-2025

Belcak, P., Heinrich, G., Diao, S., Fu, Y., Dong, X., Muralidharan, S., Lin, Y. C., & Molchanov, P. (2025). Small language models are the future of agentic AI. NVIDIA Research. https://research.nvidia.com/labs/lpr/slm-agents/

European Data Protection Board. (2025, April). AI privacy risks and mitigations in LLMs. https://www.edpb.europa.eu/system/files/2025-04/ai-privacy-risks-and-mitigations-in-llms.pdf

Forbes Tech Council. (2025, July 14). SLM or LLM agents? The trade-offs, the risks and the rewards. Forbes. https://www.forbes.com/councils/forbestechcouncil/2025/07/14/slm-or-llm-agents-the-trade-offs-the-risks-and-the-rewards/

IBM. (n.d.). What are large language models? IBM. https://www.ibm.com/think/topics/large-language-models

King, C. (2025, August 8). LLMs vs SLMs: How AI models impact sustainability. Sustainability Magazine. https://sustainabilitymag.com/news/llms-vs-slms-how-ai-models-impact-sustainability

Kumar, A., Davenport, T. H., & Bean, R. (2025, September 8). The case for using small language models. Harvard Business Review. https://hbr.org/2025/09/the-case-for-using-small-language-models

UNESCO. (2024, March 18). Small language models (SLMs): A cheaper, greener route into AI. UNESCO. https://www.unesco.org/en/articles/small-language-models-slms-cheaper-greener-route-ai

UNESCO. (2025). AI large language models: New report shows small changes can reduce energy use 90%. UNESCO. https://www.unesco.org/en/articles/ai-large-language-models-new-report-shows-small-changes-can-reduce-energy-use-90