MPEC: Distributed Matrix Multiplication Performance Modeling on a Scale-out Cloud Environment for Data Mining Jobs

Many data mining workloads are being analyzed in large-scale distributed cloud computing environments which provide nearly infinite resources with diverse hardware configurations. To maintain cost-efficiency in such environments, understanding the characteristics and estimating the overheads of a di...

Mô tả chi tiết

Lưu vào:
Hiển thị chi tiết
Tác giả chính: Kim, Jeongchul
Đồng tác giả: Son, Myungjun
Định dạng: BB
Ngôn ngữ:English
Thông tin xuất bản: IEEE Xplore 2020
Chủ đề:
Truy cập trực tuyến:http://tailieuso.tlu.edu.vn/handle/DHTL/9968
Từ khóa: Thêm từ khóa bạn đọc
Không có từ khóa, Hãy là người đầu tiên gắn từ khóa cho biểu ghi này!
Mô tả
Tóm tắt:Many data mining workloads are being analyzed in large-scale distributed cloud computing environments which provide nearly infinite resources with diverse hardware configurations. To maintain cost-efficiency in such environments, understanding the characteristics and estimating the overheads of a distributed matrix multiplication task that is a core computation kernel in many machine learning algorithms are essential. This study aims to propose a Matrix Multiplication Performance Estimator on Cloud (MPEC) algorithm. The proposed algorithm predicts the latency incurred when executing distributed matrix multiplication tasks of various input sizes and shapes with diverse instance types and a different number of worker nodes on cloud computing environments. To achieve this goal, we first analyze the characteristics of distributed matrix multiplication tasks. With characteristics generated from qualitative analysis, we propose to apply an ensemble of non-linear regression algorithm to predict the execution time of arbitrary matrix multiplication tasks. Thorough experimental results reveal that the proposed algorithm demonstrates higher accuracy than a state-of-the-art machine learning task performance estimation engine, Ernest, by decreasing the Mean Absolute Percentage Error (MAPE) in half.