CRISP-DM Twenty Years Later: From Data Mining Processes to Data Science Trajectories

CRISP-DM (CRoss-Industry Standard Process for Data Mining) has its origins in the second half of the nineties and is thus about two decades old. According to many surveys and user polls it is still the de facto standard for developing data mining and knowledge discovery projects. However, undoubtedl...

Main author: Martínez-Plumed, Fernando
Co-author: Contreras-Ochando, Lidia
Format: BB
Language: English
Publication information: IEEE Xplore 2020
Subjects:
Online access: http://tailieuso.tlu.edu.vn/handle/DHTL/9781
id oai:localhost:DHTL-9781
record_format dspace
spelling oai:localhost:DHTL-9781 2020-11-25T08:11:45Z CRISP-DM Twenty Years Later: From Data Mining Processes to Data Science Trajectories Martínez-Plumed, Fernando Contreras-Ochando, Lidia Ferri, Cesar Hernandez-Orallo, Jose Kull, Meelis Lachiche, Nicolas Ramírez-Quintana, María Jose Flach, Peter Data Science Trajectories Data Mining Knowledge Discovery Process Data-driven Methodologies CRISP-DM (CRoss-Industry Standard Process for Data Mining) has its origins in the second half of the nineties and is thus about two decades old. According to many surveys and user polls it is still the de facto standard for developing data mining and knowledge discovery projects. However, undoubtedly the field has moved on considerably in twenty years, with data science now the leading term being favoured over data mining. In this paper we investigate whether, and in what contexts, CRISP-DM is still fit for purpose for data science projects. We argue that if the project is goal-directed and process-driven the process model view still largely holds. On the other hand, when data science projects become more exploratory the paths that the project can take become more varied, and a more flexible model is called for. We suggest what the outlines of such a trajectory-based model might look like and how it can be used to categorise data science projects (goal-directed, exploratory or data management). We examine seven real-life exemplars where exploratory activities play an important role and compare them against 51 use cases extracted from the NIST Big Data Public Working Group. We anticipate this categorisation can help project planning in terms of time and cost characteristics https://doi.org/10.1109/TKDE.2019.2962680 2020-11-25T08:10:50Z 2020-11-25T08:10:50Z 2019 BB http://tailieuso.tlu.edu.vn/handle/DHTL/9781 en IEEE Transactions on Knowledge and Data Engineering, (2019), pp 14, issue 99 application/pdf IEEE Xplore
institution Trường Đại học Thủy Lợi
collection DSpace
language English
topic Data Science Trajectories
Data Mining
Knowledge Discovery Process
Data-driven Methodologies
spellingShingle Data Science Trajectories
Data Mining
Knowledge Discovery Process
Data-driven Methodologies
Martínez-Plumed, Fernando
CRISP-DM Twenty Years Later: From Data Mining Processes to Data Science Trajectories
description CRISP-DM (CRoss-Industry Standard Process for Data Mining) has its origins in the second half of the nineties and is thus about two decades old. According to many surveys and user polls, it is still the de facto standard for developing data mining and knowledge discovery projects. However, undoubtedly the field has moved on considerably in twenty years, with data science now the leading term being favoured over data mining. In this paper we investigate whether, and in what contexts, CRISP-DM is still fit for purpose for data science projects. We argue that if the project is goal-directed and process-driven, the process model view still largely holds. On the other hand, when data science projects become more exploratory, the paths that the project can take become more varied, and a more flexible model is called for. We suggest what the outlines of such a trajectory-based model might look like and how it can be used to categorise data science projects (goal-directed, exploratory or data management). We examine seven real-life exemplars where exploratory activities play an important role and compare them against 51 use cases extracted from the NIST Big Data Public Working Group. We anticipate this categorisation can help project planning in terms of time and cost characteristics.
author2 Contreras-Ochando, Lidia
author_facet Contreras-Ochando, Lidia
Martínez-Plumed, Fernando
format BB
author Martínez-Plumed, Fernando
author_sort Martínez-Plumed, Fernando
title CRISP-DM Twenty Years Later: From Data Mining Processes to Data Science Trajectories
publisher IEEE Xplore
publishDate 2020
url http://tailieuso.tlu.edu.vn/handle/DHTL/9781