AI-Driven Task-Specific Hardware Optimization: Profiling Llama 3.2’s CPU Performance in Multi-Prompt Inference Workflows
Abstract
This research presents an in-depth hardware compatibility evaluation of Meta's Llama 3.2 model across ten varied tasks—from long-context summarization and code generation to multi-turn dialogue and structured output generation—to identify performance trends and computational bottlenecks. Results show that hardware demands differ widely by task type: long-context processing (1,915 input tokens) places the heaviest demand on memory bandwidth (34% utilization), whereas code generation (659 tokens) and creative storytelling (2,321 tokens) saturate Central Processing Unit (CPU) parallelism (355% utilization on 4 cores). Multi-turn dialogue shows growing latency (45 → 58 seconds) and increasing token output (115 → 846 tokens) due to the compounding overhead of context retention, highlighting inefficiency in recurrent attention mechanisms. Repetitive generation (1,000 "A"s) achieves high token throughput (373 tokens in 47 seconds), whereas a mathematical calculation (the square root of 123,456,789 × π) consumes disproportionate CPU time (382% utilization) for a trivial output, indicating a Large Language Model (LLM) weakness in numerical accuracy. Structured output (JSON) and translation tasks further reveal formatting-related CPU overhead (355–382%), while a tax-policy question (1,110 input tokens) exposes context-parsing latency (53 seconds to process 148 tokens), highlighting inefficiencies in semantic extraction from information-dense texts. These results advance LLM work on three fronts: (1) Hardware Optimization, favoring task-specific configurations (multi-core CPUs for code and math, ample RAM for context-intensive workloads); (2) Model Architecture, calling for improved context handling and dedicated arithmetic modules; and (3) Deployment Strategy, emphasizing resource allocation matched to the use case (e.g., memory for legal and text analysis versus clock speed for creative applications). By correlating task types with hardware profiles, this research offers a framework for optimizing LLM scalability, reducing inference costs, and informing future research on energy-efficient designs. The research highlights the need for interdisciplinary collaboration between Artificial Intelligence (AI) researchers and systems engineers to close the gap between algorithmic innovation and hardware capabilities, enabling sustainable large-scale LLM adoption.
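As an illustration of how such per-task measurements can be collected, the sketch below wraps an arbitrary text-generation callable and records per-prompt latency, process CPU utilization (values above 100% indicate multi-core use, as in the 355–382% figures cited above), and a rough token-throughput estimate. This is a minimal example under stated assumptions, not the authors' actual harness: the psutil dependency, the profile_prompt helper, and the whitespace-based token count are illustrative placeholders.

```python
# Minimal per-prompt profiling sketch (assumed setup, not the study's harness).
# Requires: pip install psutil
import time
import psutil

def profile_prompt(generate, prompt: str) -> dict:
    """Run one prompt through a generation callable and collect coarse metrics."""
    proc = psutil.Process()
    proc.cpu_percent(interval=None)        # prime the per-process CPU counter
    start = time.perf_counter()
    output = generate(prompt)              # e.g. a llama.cpp or transformers call
    latency = time.perf_counter() - start
    cpu = proc.cpu_percent(interval=None)  # % of one core; >100% means multi-core use
    tokens = len(output.split())           # crude proxy for generated tokens
    return {
        "latency_s": round(latency, 2),
        "cpu_percent": cpu,
        "output_tokens": tokens,
        "tokens_per_s": round(tokens / latency, 2) if latency else 0.0,
    }

if __name__ == "__main__":
    # Stand-in generator so the sketch runs without a model installed.
    demo = lambda p: " ".join(["token"] * 100)
    print(profile_prompt(demo, "Summarize the attached tax policy document."))
```

In practice, the demo callable would be replaced by a local Llama 3.2 inference call, and the resulting metrics logged per task category to reproduce the kind of task-versus-hardware profile described in the abstract.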
Article Details

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.