MPEC: Distributed Matrix Multiplication Performance Modeling on a Scale-out Cloud Environment for Data Mining Jobs

Many data mining workloads are being analyzed in large-scale distributed cloud computing environments which provide nearly infinite resources with diverse hardware configurations. To maintain cost-efficiency in such environments, understanding the characteristics and estimating the overheads of a di...

Main author: Kim, Jeongchul
Co-author: Son, Myungjun
Format: BB
Language: English
Publication info: IEEE Xplore 2020
Subjects:
Online access: http://tailieuso.tlu.edu.vn/handle/DHTL/9968
id oai:localhost:DHTL-9968
record_format dspace
spelling oai:localhost:DHTL-9968 2020-12-23T07:44:47Z MPEC: Distributed Matrix Multiplication Performance Modeling on a Scale-out Cloud Environment for Data Mining Jobs Kim, Jeongchul Son, Myungjun Lee, Kyungyong Big data analytics Distributed matrix multiplication Performance modeling Cloud computing Many data mining workloads are being analyzed in large-scale distributed cloud computing environments which provide nearly infinite resources with diverse hardware configurations. To maintain cost-efficiency in such environments, understanding the characteristics and estimating the overheads of a distributed matrix multiplication task that is a core computation kernel in many machine learning algorithms are essential. This study aims to propose a Matrix Multiplication Performance Estimator on Cloud (MPEC) algorithm. The proposed algorithm predicts the latency incurred when executing distributed matrix multiplication tasks of various input sizes and shapes with diverse instance types and a different number of worker nodes on cloud computing environments. To achieve this goal, we first analyze the characteristics of distributed matrix multiplication tasks. With characteristics generated from qualitative analysis, we propose to apply an ensemble of non-linear regression algorithms to predict the execution time of arbitrary matrix multiplication tasks. Thorough experimental results reveal that the proposed algorithm demonstrates higher accuracy than a state-of-the-art machine learning task performance estimation engine, Ernest, by cutting the Mean Absolute Percentage Error (MAPE) in half. https://doi.org/10.1109/TCC.2019.2950400 2020-12-23T07:43:47Z 2020-12-23T07:43:47Z 2019 BB http://tailieuso.tlu.edu.vn/handle/DHTL/9968 en IEEE Transactions on Cloud Computing, (2019), pp 18 application/pdf IEEE Xplore
institution Trường Đại học Thủy Lợi
collection DSpace
language English
topic Big data analytics
Distributed matrix multiplication
Performance modeling
Cloud computing
spellingShingle Big data analytics
Distributed matrix multiplication
Performance modeling
Cloud computing
Kim, Jeongchul
MPEC: Distributed Matrix Multiplication Performance Modeling on a Scale-out Cloud Environment for Data Mining Jobs
description Many data mining workloads are being analyzed in large-scale distributed cloud computing environments which provide nearly infinite resources with diverse hardware configurations. To maintain cost-efficiency in such environments, understanding the characteristics and estimating the overheads of a distributed matrix multiplication task that is a core computation kernel in many machine learning algorithms are essential. This study aims to propose a Matrix Multiplication Performance Estimator on Cloud (MPEC) algorithm. The proposed algorithm predicts the latency incurred when executing distributed matrix multiplication tasks of various input sizes and shapes with diverse instance types and a different number of worker nodes on cloud computing environments. To achieve this goal, we first analyze the characteristics of distributed matrix multiplication tasks. With characteristics generated from qualitative analysis, we propose to apply an ensemble of non-linear regression algorithms to predict the execution time of arbitrary matrix multiplication tasks. Thorough experimental results reveal that the proposed algorithm demonstrates higher accuracy than a state-of-the-art machine learning task performance estimation engine, Ernest, by cutting the Mean Absolute Percentage Error (MAPE) in half.
author2 Son, Myungjun
author_facet Son, Myungjun
Kim, Jeongchul
format BB
author Kim, Jeongchul
author_sort Kim, Jeongchul
title MPEC: Distributed Matrix Multiplication Performance Modeling on a Scale-out Cloud Environment for Data Mining Jobs
title_short MPEC: Distributed Matrix Multiplication Performance Modeling on a Scale-out Cloud Environment for Data Mining Jobs
title_full MPEC: Distributed Matrix Multiplication Performance Modeling on a Scale-out Cloud Environment for Data Mining Jobs
title_fullStr MPEC: Distributed Matrix Multiplication Performance Modeling on a Scale-out Cloud Environment for Data Mining Jobs
title_full_unstemmed MPEC: Distributed Matrix Multiplication Performance Modeling on a Scale-out Cloud Environment for Data Mining Jobs
title_sort mpec: distributed matrix multiplication performance modeling on a scale-out cloud environment for data mining jobs
publisher IEEE Xplore
publishDate 2020
url http://tailieuso.tlu.edu.vn/handle/DHTL/9968
work_keys_str_mv AT kimjeongchul mpecdistributedmatrixmultiplicationperformancemodelingonascaleoutcloudenvironmentfordataminingjobs
_version_ 1768589931608801280
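
Note on the accuracy metric: the abstract in this record reports accuracy as Mean Absolute Percentage Error (MAPE). The record does not give the authors' exact formulation, so the following is only a minimal sketch of the standard textbook definition, the mean of |actual - predicted| / |actual| expressed as a percentage. The function name `mape` and the sample latencies are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Return the Mean Absolute Percentage Error, in percent.

    Standard definition (an assumption here, not the paper's own code):
    100 * mean(|actual_i - predicted_i| / |actual_i|).
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        # The ratio is undefined when a measured value is zero.
        raise ValueError("actual values must be non-zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical measured vs. predicted execution times, in seconds.
measured = [10.0, 20.0, 40.0]
estimated = [12.0, 18.0, 40.0]
print(f"MAPE: {mape(measured, estimated):.1f}%")
```

With these illustrative numbers the relative errors are 20%, 10%, and 0%, so the result is about 10%. Under this definition, halving MAPE means the estimator's average relative error against measured latency is half that of the baseline.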