A General Guide to Applying Machine Learning to Computer Architecture

  • Daniel Nemirovsky Barcelona Supercomputing Center (BSC)
  • Tugberk Arkose BSC
  • Nikola Markovic Microsoft
  • Mario Nemirovsky ICREA, BSC
  • Osman Unsal UPC, BSC
  • Adrian Cristal UPC, BSC
  • Mateo Valero UPC, BSC

Abstract

The resurgence of machine learning since the late 1990s has been enabled by significant advances in computing performance and the growth of big data. The ability of these algorithms to detect complex patterns in data that would be extremely difficult to identify manually helps to produce effective predictive models. Whilst computer architects have been accelerating the performance of machine learning algorithms with GPUs and custom hardware, there have been few implementations leveraging these algorithms to improve computer system performance. The work that has been conducted, however, has produced considerably promising results. The purpose of this paper is to serve as a foundational base and guide for future computer architecture research seeking to make use of machine learning models to improve system efficiency. We describe a method that highlights when, why, and how to utilize machine learning models to improve system performance, and provide a relevant example showcasing the effectiveness of applying machine learning in computer architecture. We describe a process of generating data at every execution quantum and of parameter engineering. This is followed by a survey of a set of popular machine learning models. We discuss their strengths and weaknesses and evaluate implementations of a workload performance predictor for the different core types of an x86 processor. The predictions can then be exploited by a scheduler for heterogeneous processors to improve system throughput. The algorithms of focus are stochastic gradient descent based linear regression, decision trees, random forests, artificial neural networks, and $k$-nearest neighbors.
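The model comparison the abstract describes can be illustrated with a minimal sketch. This is not the authors' implementation: the features (per-quantum statistics such as IPC and miss rates) and the target (performance on another core type) are synthetic placeholders, and the scikit-learn estimators stand in for the five algorithm families the paper evaluates.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
# Hypothetical per-quantum execution statistics: IPC on the current core,
# L2 miss rate, branch misprediction rate (all illustrative).
X = rng.random((500, 3))
# Synthetic target: the workload's performance on a different core type,
# modeled as a noisy function of the statistics above.
y = 2.0 * X[:, 0] - 0.5 * X[:, 1] - 0.3 * X[:, 2] + rng.normal(0, 0.05, 500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# One estimator per algorithm family surveyed in the paper.
models = {
    "sgd_linear": make_pipeline(StandardScaler(),
                                SGDRegressor(max_iter=1000, random_state=0)),
    "decision_tree": DecisionTreeRegressor(max_depth=6, random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "knn": KNeighborsRegressor(n_neighbors=5),
    "neural_net": make_pipeline(StandardScaler(),
                                MLPRegressor(hidden_layer_sizes=(32,),
                                             max_iter=2000, random_state=0)),
}

# Fit each model and report its R^2 on held-out quanta; a scheduler would
# use the best model's predictions to map threads to core types.
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
for name, r2 in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: R^2 = {r2:.3f}")
```

In practice the trade-off the paper examines is not only accuracy but also prediction latency and model size, since the predictor must run inside a scheduler's time budget.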

Author Biographies

Daniel Nemirovsky, Barcelona Supercomputing Center (BSC)
Dr. Daniel Alexander Nemirovsky has a background in computer engineering and political philosophy, having studied at the University of Michigan, the University of Manchester, the Polytechnic University of Catalonia, and Pompeu Fabra University. He graduated with a Ph.D. in October 2017, receiving cum laude honors. Highlights from his 7+ years of research include pioneering work on heterogeneous architectures and applying machine learning to increase system efficiency. He currently resides in San Francisco, CA, and his research interests include IoT and applying machine learning in the field of computer architecture.
Tugberk Arkose, BSC
Tugberk is a Ph.D. student with a passion for distributed computing, machine learning, and systems engineering.
Nikola Markovic, Microsoft
Experienced Software Design Engineer with a demonstrated history of working in the computer software industry. Skilled in Computer Science, Computer Architecture, C++, Algorithms, Big Data. Strong engineering professional with a Doctor of Philosophy (Ph.D.) focused in Computer Architecture from Universitat Politècnica de Catalunya.
Mario Nemirovsky, ICREA, BSC
Mario Nemirovsky is an ICREA Research Professor at the Barcelona Supercomputing Center, where he has been since 2007. He received his PhD in ECE from the University of California, Santa Barbara, in 1990. Presently he is conducting pioneering work in the areas of IoT (fog as a platform for IoT and HEB, Hierarchical Emergent Behaviors), disaggregated computing, HPC, memory systems, and cloud computing. He holds 65 U.S. patents. During his tenure with the University of California, Santa Barbara, Mario co-authored seminal works on simultaneous multithreading. Mario has made key contributions to other areas of computer architecture, including high-performance, real-time, and network processors. He founded Miraveo Inc., Vilynx Inc., ConSentry Networks Inc., Flowstorm Inc. and XStream Logic Inc. Before that, he was a chief architect at National Semiconductor, a PI Researcher at Apple Computer, and a chief architect at Weitek Inc. and at Delco Electronics, General Motors (GM).
Osman Unsal, UPC, BSC
Osman Sabri Ünsal is co-leader of the Architectural Support for Programming Models group at the Barcelona Supercomputing Center. Dr. Ünsal was also a researcher at the BSC-Microsoft Research Centre from 2006 to 2014. He holds BS, MS, and PhD degrees in electrical and computer engineering from Istanbul Technical University, Brown University, and the University of Massachusetts, Amherst, respectively.
Adrian Cristal, UPC, BSC
Adrián Cristal received the "licenciatura" in Computer Science from the Universidad de Buenos Aires (FCEN) in 1995 and the Ph.D. degree in Computer Science in 2006 from the Universitat Politècnica de Catalunya (UPC), Spain. From 1992 to 1995 he lectured on neural networks and compiler design, and at UPC, from 2003 to 2006, he lectured on computer organization. Since 2006 he has been a researcher in the Computer Architecture group at BSC, where he is currently co-manager of the "Computer Architecture for Parallel Paradigms" group. His research interests cover the areas of microarchitecture, multicore architectures, and programming models for multicore architectures. He has published around 60 papers on these topics and has participated in several research projects with other universities and industry, within the framework of European Union programmes or in direct collaboration with technology-leading companies.
Mateo Valero, UPC, BSC
Mateo Valero obtained his Telecommunication Engineering Degree from the Technical University of Madrid (UPM) in 1974 and his Ph.D. in Telecommunications from the Technical University of Catalonia (UPC) in 1980. He has been a professor in the Computer Architecture Department at UPC in Barcelona since 1974, and a full professor since 1983. His research interests focus on high-performance architectures. He has published approximately 500 papers, has served in the organization of more than 300 international conferences, and has given more than 300 invited talks. He is the director of the Barcelona Supercomputing Center, the National Centre of Supercomputing in Spain. In December 1994, Professor Valero became a founding member of the Royal Spanish Academy of Engineering. In 2005 he was elected Corresponding Academic of the Spanish Royal Academy of Science, and in 2006, member of the Royal Spanish Academy of Doctors and member of the Academia Europaea, the Academy of Europe. He is a Fellow of the IEEE, a Fellow of the ACM, and an Intel Distinguished Research Fellow.

References

Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., Kudlur, M., Levenberg, J., Monga, R., Moore, S., Murray, D.G., Steiner, B., Tucker, P.A., Vasudevan, V., Warden, P., Wicke, M., Yu, Y., Zhang, X.: Tensorflow: A system for large-scale machine learning. CoRR abs/1605.08695 (2016), http://arxiv.org/abs/1605.08695, accessed: 2018-03-01

Breiman, L.: Bagging predictors. Machine Learning 24(2), 123–140 (Aug 1996), DOI: 10.1007/bf00058655

Carlson, T.E., Heirman, W., Eeckhout, L.: Sniper: exploring the level of abstraction for scalable and accurate parallel multi-core simulation. In: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC. ACM Press (2011), DOI: 10.1145/2063384.2063454

Fedorova, A., Vengerov, D., Doucette, D.: Operating system scheduling on heterogeneous core systems. In: Proceedings of the Workshop on Operating System Support for Heterogeneous Multicore Architectures (2007), http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.369.7891

Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Teh, Y.W., Titterington, M. (eds.) Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 13–15 May 2010, Chia Laguna Resort, Sardinia, Italy. Proceedings of Machine Learning Research, vol. 9, pp. 249–256. PMLR (2010), http://proceedings.mlr.press/v9/glorot10a.html, accessed: 2018-03-01

Helmy, T., Al-Azani, S., Bin-Obaidellah, O.: A machine learning-based approach to estimate the CPU-burst time for processes in the computational grids. In: 2015 3rd International Conference on Artificial Intelligence, Modelling and Simulation (AIMS). IEEE (Dec 2015), DOI: 10.1109/aims.2015.11

Henning, J.L.: SPEC CPU2006 benchmark descriptions. ACM SIGARCH Computer Architecture News 34(4), 1–17 (Sep 2006), DOI: 10.1145/1186736.1186737

Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: Convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM International Conference on Multimedia. pp. 675–678. MM ’14, ACM, New York, NY, USA (2014), DOI: 10.1145/2647868.2654889

Jimenez, D.A., Lin, C.: Dynamic branch prediction with perceptrons. In: Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture. pp. 197–206. IEEE Comput. Soc (2001), DOI: 10.1109/HPCA.2001.903263

Jimenez, D.A., Teran, E.: Multiperspective reuse prediction. In: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture - MICRO-50. pp. 436–448. ACM Press (2017), DOI: 10.1145/3123939.3123942

LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (May 2015), DOI: 10.1038/nature14539

Li, C.V., Petrucci, V., Mosse, D.: Predicting thread profiles across core types via machine learning on heterogeneous multiprocessors. In: 2016 VI Brazilian Symposium on Computing Systems Engineering (SBESC). IEEE (Nov 2016), DOI: 10.1109/sbesc.2016.017

Linden, G., Smith, B., York, J.: Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Computing 7(1), 76–80 (Jan 2003), DOI: 10.1109/mic.2003.1167344

Louppe, G., Geurts, P.: Ensembles on random patches. In: Machine Learning and Knowledge Discovery in Databases, pp. 346–361. Springer, Berlin, Heidelberg (2012), DOI: 10.1007/978-3-642-33460-3_28

Misra, J., Saha, I.: Artificial neural networks in hardware: A survey of two decades of progress. Neurocomputing 74(1-3), 239–255 (Dec 2010), DOI: 10.1016/j.neucom.2010.03.021

Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., Riedmiller, M.A.: Playing atari with deep reinforcement learning. CoRR abs/1312.5602 (2013), http://arxiv.org/abs/1312.5602, accessed: 2018-03-01

Negi, A., Kumar, P.: Applying machine learning techniques to improve linux process scheduling. In: TENCON 2005 - 2005 IEEE Region 10 Conference. pp. 1–6. IEEE (Nov 2005), DOI: 10.1109/tencon.2005.300837

Nemirovsky, D., Arkose, T., Markovic, N., Nemirovsky, M., Unsal, O., Cristal, A.: A machine learning approach for performance prediction and scheduling on heterogeneous CPUs. In: 2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD). pp. 121–128. IEEE (Oct 2017), DOI: 10.1109/sbac-pad.2017.23

Nemirovsky, D., Arkose, T., Markovic, N., Nemirovsky, M., Unsal, O., Cristal, A., Valero, M.: A deep learning mapper (DLM) for scheduling on heterogeneous systems. In: Communications in Computer and Information Science, pp. 3–20. Springer International Publishing (Dec 2017), DOI: 10.1007/978-3-319-73353-1_1

Ovtcharov, K., Ruwase, O., Kim, J.Y., Fowers, J., Strauss, K., Chung, E.S.: Toward accelerating deep learning at scale using specialized hardware in the datacenter. In: 2015 IEEE Hot Chips 27 Symposium (HCS). IEEE (Aug 2015), DOI: 10.1109/hotchips.2015.7477459

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research 12, 2825–2830 (Nov 2011), http://dl.acm.org/citation.cfm?id=1953048.2078195, accessed: 2018-03-01

Polyak, B.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (Jan 1964), DOI: 10.1016/0041-5553(64)90137-5

Rai, J.K., Negi, A., Wankar, R., Nayak, K.D.: A machine learning based meta-scheduler for multi-core processors. International Journal of Adaptive, Resilient and Autonomic Systems 1(4), 46–59 (Oct 2010), DOI: 10.4018/jaras.2010100104

Shulga, D.A., Kapustin, A.A., Kozlov, A.A., Kozyrev, A.A., Rovnyagin, M.M.: The scheduling based on machine learning for heterogeneous CPU/GPU systems. In: 2016 IEEE NW Russia Young Researchers in Electrical and Electronic Engineering Conference (EICon-RusNW). IEEE (Feb 2016), DOI: 10.1109/eiconrusnw.2016.7448189

Teran, E., Wang, Z., Jimenez, D.A.: Perceptron learning for reuse prediction. In: 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE (Oct 2016), DOI: 10.1109/micro.2016.7783705

Woo, S., Ohara, M., Torrie, E., Singh, J., Gupta, A.: The SPLASH-2 programs: characterization and methodological considerations. In: Proceedings 22nd Annual International Symposium on Computer Architecture. pp. 24–36. ACM (Jun 1995), DOI: 10.1109/isca.1995.524546

Yeh, T.Y., Patt, Y.N.: Two-level adaptive training branch prediction. In: Proceedings of the 24th annual international symposium on Microarchitecture - MICRO 24. ACM Press (1991), DOI: 10.1145/123465.123475
Published: 2018-04-23