Rapid adoption of machine learning (ML) technologies has led to a surge in power consumption across diverse systems, from tiny IoT devices to massive datacenter clusters. Benchmarking the energy efficiency of these systems is crucial for optimization, but presents novel challenges due to the variety of hardware platforms, workload characteristics, and system-level interactions. This paper introduces MLPerf® Power, a comprehensive benchmarking methodology with capabilities to evaluate the energy efficiency of ML systems at power levels ranging from microwatts to megawatts. Developed by a consortium of industry professionals from more than 20 organizations, coupled with insights from academia, MLPerf Power establishes rules and best practices to ensure comparability across diverse architectures. We use representative workloads from the MLPerf benchmark suite to collect reproducible results.
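As a rough illustration of the kind of figure of merit such a methodology reports (not the MLPerf Power harness itself), the sketch below integrates a timestamped power trace into energy and derives an inferences-per-joule metric; the synthetic trace, function names, and metric choice are assumptions for illustration only.

```python
import numpy as np

def energy_joules(timestamps_s, power_w):
    """Integrate instantaneous power (W) over time (s) with the trapezoidal rule."""
    t = np.asarray(timestamps_s, dtype=float)
    p = np.asarray(power_w, dtype=float)
    return float(np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(t)))

def inferences_per_joule(num_inferences, timestamps_s, power_w):
    """Throughput-per-energy figure of merit for one measured run."""
    return num_inferences / energy_joules(timestamps_s, power_w)

# Example (synthetic): a 10 s run sampled at 1 Hz, ~250 W draw, 50,000 inferences.
t = np.arange(0.0, 11.0, 1.0)        # 11 samples spanning 10 s
p = 250.0 + 5.0 * np.sin(t)          # made-up power trace in watts
print(f"{inferences_per_joule(50_000, t, p):.2f} inferences/J")
```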
Rigorous Evaluation of Computer Processors with Statistical Model Checking
Filip Mazurek, Arya Tschand, Yu Wang, Miroslav Pajic, Daniel Sorin
2023 IEEE/ACM International Symposium on Microarchitecture (MICRO)
Experiments with computer processors must account for the inherent variability in executions. Prior work has shown that real systems exhibit variability and that random effects must be injected into simulators to account for it. We can therefore run multiple executions of a given benchmark and generate a distribution of results. However, the standard statistical techniques used in prior work are not always suitable: the result distributions can take arbitrary forms that are unknown a priori, yet many works naively assume they are Gaussian, which can be far from the truth. To allow rigorous evaluation for arbitrary result distributions, we introduce statistical model checking (SMC) to the world of computer architecture. SMC provides a rigorous mathematical methodology that employs experimental sampling for probabilistic evaluation of system performance.
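To make the idea concrete, here is a minimal sketch of an SMC-style query, not the paper's exact procedure: estimate, from repeated executions, the probability that a run meets a latency bound, using a distribution-free Chernoff-Hoeffding sample bound rather than a Gaussian assumption. The run_benchmark() stub, the bound, and the error/confidence parameters are illustrative assumptions.

```python
import math
import random

def run_benchmark() -> float:
    # Stand-in for one noisy execution (real use: run the simulator or hardware).
    return random.lognormvariate(0.0, 0.3)   # deliberately non-Gaussian

def smc_estimate(bound: float, epsilon: float = 0.02, delta: float = 0.01):
    """Chernoff-Hoeffding bound: with n >= ln(2/delta) / (2*epsilon^2) samples,
    the empirical probability is within +/- epsilon of the true probability
    with confidence at least 1 - delta, regardless of the distribution's shape."""
    n = math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))
    hits = sum(run_benchmark() <= bound for _ in range(n))
    return hits / n, n

p_hat, samples = smc_estimate(bound=1.5)
print(f"P(runtime <= 1.5) ~ {p_hat:.3f} (+/- 0.02 at 99% confidence, {samples} runs)")
```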
QuArch: A Question-Answering Dataset for AI Agents in Computer Architecture
Shvetank Prakash, Andrew Cheng, Jason Yik, Arya Tschand, Radhika Ghosal, Ikechukwu Uchendu, Jessica Quaye, Jeffrey Ma, Shreyas Grampurohit, Sofia Giannuzzi, Arnav Balyan, Fin Amin, Aadya Pipersenia, Yash Choudhary, Ankita Nayak, Amir Yazdanbakhsh, Vijay Janapa Reddi
2025 IEEE Computer Architecture Letters (CAL)
We introduce QuArch, a dataset of 1,500 human-validated question-answer pairs designed to evaluate and enhance language models' understanding of computer architecture. The dataset covers areas including processor design, memory systems, and performance optimization. Our analysis highlights a significant performance gap: the best closed-source model achieves 84% accuracy, while the top small open-source model reaches 72%. Models struggle most with questions on memory systems and interconnection networks. Fine-tuning with QuArch improves small-model accuracy by up to 8%, establishing a foundation for advancing AI-driven computer architecture research. The dataset and leaderboard are available at https://quarch.ai/.
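For illustration only, a minimal sketch of how one might compute accuracy on a question-answering set like QuArch; the JSON schema (question/choices/answer fields) and the ask_model() stub are assumptions, not the released dataset format or evaluation harness.

```python
import json

def ask_model(question: str, choices: list[str]) -> str:
    # Placeholder: in practice, query a language model and parse its chosen option.
    return "A"

def accuracy(path: str) -> float:
    """Fraction of items where the model's answer matches the gold answer."""
    with open(path) as f:
        items = json.load(f)   # assumed: list of {"question", "choices", "answer"}
    correct = sum(
        ask_model(item["question"], item["choices"]) == item["answer"]
        for item in items
    )
    return correct / len(items)

# Hypothetical usage:
# print(f"accuracy = {accuracy('quarch_eval.json'):.1%}")
```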