Modeling performance and power matrix of disparate computer systems using machine learning techniques (Modeling Computer Systems Selection)
Abstract
In the last couple of decades, there has been exponential growth in the processor, cache, and memory features of computer systems. These hardware features play a vital role in determining the performance and power of a software application when it is executed on different computer systems. Furthermore, even minor alterations to the hardware features or to the application can change performance and power consumption. Compute-intensive (compute-bound) applications depend more heavily on processor features, while data-intensive (memory-bound) applications depend more heavily on memory features. To meet customized performance and power budgets, it is therefore essential to select computer systems with appropriate hardware features (processor, cache, and memory). Selecting systems that adhere to user-specific budgets normally requires access to physical systems on which performance and power utilization data can be gathered. Expecting a user to have such access is prohibitive in cost, so it becomes essential to develop a virtual model that obviates the need for physical systems.

Researchers have used system-level simulators for decades to build simulated computer systems from processor, cache, and memory features and to obtain estimates of performance and power. One approach, building virtual systems with a full-system simulator (FSS), provides the estimate of performance and power closest to that of a physical system. In the recent past, machine learning algorithms have been trained on such accurate FSS models to predict performance and power for varying features in similar systems, achieving fairly accurate results. However, building multiple computer systems in a full-system simulator is complex and extremely slow.
The problem is compounded by the fact that access to such accurate simulators is limited. An alternative approach is to use the open-source gem5 simulator in its emulation mode to build simulated systems rapidly. Unfortunately, this compromises the accuracy of the performance and power measurements compared to FSS models, and when these results are used to train a machine learning algorithm, its predictions are somewhat less accurate than those of a model trained on FSS data. To make this approach useful, one needs to reduce the prediction inaccuracy introduced by the nature and design of gem5's emulation mode and, as a consequence, the variation introduced by the type of application, whether compute-intensive or data-intensive.

This dissertation undertakes the above-mentioned challenge: can one effectively combine the speed of the open-access gem5 simulator with the accuracy of a physical system to obtain accurate machine learning predictions? If this challenge is met, a user would be able to select a system, either in the cloud or in the real world, on which to run applications within one's power and performance budget.

In our proposed methodology, we first created several gem5 models in emulation mode for available systems with varying features, such as the type of processor (instruction set architecture, speed, and cache configuration) and the type, speed, and size of memory. We executed compute-intensive and data-intensive benchmark applications on these models to obtain performance results. In the second step, 80% of the models generated with the gem5 simulator in emulation mode were used to train machine learning algorithms (linear, support vector, Gaussian, tree-based, and neural network); the remaining 20% were used for performance prediction.
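A minimal sketch of this 80/20 step follows. The feature names, the synthetic dataset, and the choice of scikit-learn models are illustrative assumptions, not the dissertation's actual data or code; the Gaussian-process and neural-network members of the comparison are omitted for brevity.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_percentage_error

rng = np.random.default_rng(0)

# Illustrative hardware features per simulated configuration:
# CPU clock (GHz), L2 size (KB), DRAM speed (MT/s).
X = np.column_stack([
    rng.uniform(1.0, 4.0, 200),
    rng.choice([256.0, 512.0, 1024.0, 2048.0], 200),
    rng.choice([1600.0, 2133.0, 2400.0, 3200.0], 200),
])
# Synthetic runtime target, standing in for gem5-measured performance.
y = 100.0 / X[:, 0] + 5000.0 / X[:, 1] + 20000.0 / X[:, 2] + rng.normal(0, 0.3, 200)

# 80% of the simulated models train each algorithm; 20% are held out for prediction.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "linear": LinearRegression(),
    "support vector": SVR(),
    "tree-based": DecisionTreeRegressor(random_state=0),
}
# Fit each model on the training split and score it on the held-out split.
errors = {name: mean_absolute_percentage_error(y_te, m.fit(X_tr, y_tr).predict(X_te))
          for name, m in models.items()}
print(min(errors, key=errors.get), errors)
```

On the dissertation's real dataset this comparison is what identified the tree-based model as the most accurate predictor.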
It was found that the tree-based algorithm predicted performance values closest to the simulated systems' results obtained with the above-mentioned gem5 models. We subsequently took the hardware configuration and application execution statistics generated by the gem5 model and fed them to the Multicore Power, Area, and Timing (McPAT) modeling tool to estimate power usage.

To check the accuracy of the gem5 simulator results, the above-mentioned benchmark applications were run on real systems with identical features. The application code was modified to invoke Performance Application Programming Interface (PAPI) functions to measure power consumption. There was a sizeable difference between the results of the gem5 model and those of the real system in terms of both performance and power.

We conceptualized the use of scaling and transfer learning to bridge the difference between predicted and actual values. We propose a scaling technique that establishes an application-specific scaling factor using the correlation coefficient between hardware features and performance/power. This scaling factor captures the difference and is applied to a set of predicted values to make them conform to those of the physical system. The results demonstrate that, for the selected benchmark applications, the scaling technique achieves a prediction accuracy of 75%-90% for performance and 60%-95% for power. The accuracy of these results validates that the scaling technique effectively brings predicted performance and power values closer to those of physical systems, enabling the selection of appropriate computer system(s).

Another method to achieve better predictions is a model based on the existing transfer learning technique. To use transfer learning, we train the decision tree algorithm on two sets of data: one from a simulated system and the second from a closely matching physical system.
Using the trained models, we attempt to predict the performance and power of the target physical system, which is different from the source physical system used to train the machine learning algorithm. This model uses performance and power from the source physical system during training to bring the predicted values closer to those of the target system. For the selected benchmark applications, the transfer learning technique yields a mean prediction accuracy for the different target systems of between 10% and 50%.

In this work, we have demonstrated that our proposed techniques, scaling and transfer learning, are effective in estimating fairly accurate performance and power values for a physical system using the predictions of a machine learning model trained on a dataset of gem5-simulated systems. These techniques therefore provide a method to estimate performance and power values for physical computer systems with known hardware features, without requiring access to those systems. With the estimated performance and power values coupled with the hardware features of the physical systems, we can select system(s) based on user-provided performance and power budgets.
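The two correction techniques summarized above can be sketched as follows. The abstract does not give the exact formulas, so this is a hedged stand-in: the dissertation derives its scaling factor from correlation coefficients between hardware features and the measured metric, whereas the sketch below uses a simple mean physical-to-predicted ratio; the synthetic data, feature columns, and `bias` gap are all illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)

def make_domain(n, bias):
    # Synthetic (features, runtime) pairs: columns are CPU clock (GHz) and
    # L2 size (KB); `bias` stands in for the simulation-vs-hardware gap.
    X = rng.uniform([1.0, 256.0], [4.0, 2048.0], size=(n, 2))
    return X, bias * (100.0 / X[:, 0] + 5000.0 / X[:, 1])

X_sim, y_sim = make_domain(150, bias=0.8)   # gem5 (emulation mode) results
X_src, y_src = make_domain(40, bias=1.0)    # source physical system measurements
X_tgt, y_tgt = make_domain(40, bias=1.05)   # target physical system

# --- Technique 1: scaling -------------------------------------------------
# Train on simulated data only, then correct the predictions with an
# application-specific factor derived from a few physical measurements.
sim_tree = DecisionTreeRegressor(random_state=0).fit(X_sim, y_sim)
factor = np.mean(y_src / sim_tree.predict(X_src))   # mean physical/predicted ratio
scaled_pred = factor * sim_tree.predict(X_tgt)

# --- Technique 2: transfer learning ---------------------------------------
# Train one tree on simulated + source-physical data together, then predict
# the (different) target physical system directly.
tl_tree = DecisionTreeRegressor(random_state=0).fit(
    np.vstack([X_sim, X_src]), np.concatenate([y_sim, y_src]))
tl_pred = tl_tree.predict(X_tgt)

for name, pred in [("scaled", scaled_pred), ("transfer", tl_pred)]:
    mape = float(np.mean(np.abs(pred - y_tgt) / y_tgt))
    print(name, round(mape, 3))
```

Both routes share the same goal: a model trained largely on cheap gem5 data, corrected with a small amount of physical-system data, so that no access to the target hardware is required at prediction time.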