Data Mining Project Proposal

       This page provides guidelines for writing a data mining project proposal. Data mining is a leading research field pursued by researchers across many countries. Our experienced research experts can prepare a well-structured research proposal for you. A research proposal is a major milestone in your research career, so it deserves dedicated time. Writing a data mining project proposal is difficult and complex for new researchers because of the field's many open issues (complexity, security, privacy, cost, etc.). We are here for that too: we prepare your project proposal around unique, novel, and original ideas. Every project proposal we prepare covers the following items:

Our Proposal Structure

  • Title of Project
  • Introduction/brief overview of your research field of data mining
  • Significance and Background
  • Study Objectives
  • Problem statement/potential pitfalls
  • Literature survey
  • Research Methodology/Proposed Work

                          -Data mining Tasks/Operations

                          -Datasets /Database

                          -Methods and Models

                          -Algorithms and Pseudocode

                          -Mathematical Formulation

  • Overall architecture
  • Simulation/Development of Software Application
  • Intended Results and also Applications
  • Timeline for Implementation
  • Scope and Conclusion

Mining Project Proposal

       We prepare Data Mining Project Proposals mainly for final-year students and research scholars. Data mining spans a variety of research subfields, including Text Mining, Temporal Mining, Stream Mining, Spatial and Geographical Mining, Utility Mining, Web Mining, Distributed Data Mining, Ubiquitous Data Mining, Hypertext and Hypermedia Data Mining, Multimedia Data Mining, Time Series Data Mining, Constraint-Based Data Mining, Phenomenal Data Mining, and more.

We have 150+ world-class engineers working on data mining concepts, tasks and operations, software tools, and their applications. Our experts completed their doctoral degrees with distinction at the world's top universities. If smart work is your weapon, success will be your slave. Reach out to us for a happy ending.

Current Trends in Data Mining

  • Software engineering with Data Mining
  • Visual Data Mining
  • Interactive and scalable methods for Data Mining
  • Application Exploration
  • Biological Data Mining
  • New methods for complex Data Mining
  • Standardization of query languages in Data Mining
  • Multi-database and Multi-relational Data Mining
  • Information Security and Privacy Protection in Data Mining
  • Big Data Analytics integrated with Cloud Computing

                                         -Hadoop

                                         -MapReduce

                                         -Apache Spark

                                         -Amazon EC2 and S3
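The MapReduce model behind several of the tools above can be illustrated with a minimal pure-Python sketch. The three-phase split and the toy documents are illustrative only; a production job would run on Hadoop Streaming or Spark rather than in a single process:

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in one document.
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(mapped):
    # Shuffle: group the intermediate pairs by key (the word).
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts collected for each word.
    return {word: sum(counts) for word, counts in groups.items()}

documents = ["data mining on big data", "big data analytics"]
mapped = list(chain.from_iterable(map_phase(d) for d in documents))
counts = reduce_phase(shuffle_phase(mapped))
print(counts["data"])  # 3
```

The same map/shuffle/reduce decomposition is what lets Hadoop and Spark distribute the work: each phase is independently parallelizable across machines.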

Steps in Data Mining

  • Understand the application and relevant prior knowledge
  • Create a target dataset for discovery
  • Preprocess and clean the data
  • Find invariant representations and reduce the number of data variables
  • Select one of the following data mining tasks

                                -Regression

                                -Clustering

                                -Classification

                                -Association Rules

                                -Data visualization

                                -Feature Extraction and Selection

                                -Anomaly Detection

                                -Statistical data analysis

                                -Multidimensional analysis

  • Apply the data mining algorithm
  • Search for patterns
  • Discover knowledge
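The cleaning, task-selection, and pattern-search steps above can be sketched end-to-end in a few lines of Python. The toy records and the nearest-centroid classification rule are illustrative assumptions, not a prescribed method:

```python
import math

# Toy records: (feature1, feature2, label); None marks a missing value.
raw = [(1.0, 1.2, "A"), (0.9, None, "A"), (1.1, 0.8, "A"),
       (5.0, 5.2, "B"), (4.8, 5.1, "B"), (None, 4.9, "B")]

# Step: data preprocessing and cleaning -- drop records with missing fields.
clean = [r for r in raw if None not in r]

def centroid(points):
    # Mean of each of the two feature coordinates.
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(2))

# Step: apply a data mining task (classification) via per-class centroids.
centroids = {label: centroid([r[:2] for r in clean if r[2] == label])
             for label in {"A", "B"}}

def classify(x):
    # Assign x to the label of the nearest class centroid.
    return min(centroids, key=lambda lab: math.dist(x, centroids[lab]))

print(classify((1.0, 1.0)))  # A
print(classify((5.0, 5.0)))  # B
```

Dropping incomplete records is only one cleaning strategy; imputation is the common alternative when data are scarce.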

Specific Models Used in Data Mining

  • Decision Tree
  • Non-negative Matrix Factorization
  • K-Means and O-Cluster
  • Naïve Bayes Algorithm
  • Support Vector Machines
  • Apriori and Hashing Techniques
  • Neural networks and expert systems
  • Intelligent agents
  • Soft Sets for machine learning and data mining
  • Genetic Algorithms
  • Artificial Neural Networks
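As a concrete example of one model from the list above, a categorical Naïve Bayes classifier can be written from scratch in a few lines. The weather-style toy dataset and the Laplace smoothing constant of 1 are illustrative assumptions:

```python
import math
from collections import Counter, defaultdict

def train_nb(rows):
    """Train a categorical Naive Bayes model.
    rows: list of (feature_tuple, label) pairs."""
    label_counts = Counter(label for _, label in rows)
    # (feature_index, label) -> Counter of observed feature values
    feat_counts = defaultdict(Counter)
    for feats, label in rows:
        for i, v in enumerate(feats):
            feat_counts[(i, label)][v] += 1
    return label_counts, feat_counts, len(rows)

def predict_nb(model, feats):
    label_counts, feat_counts, n = model
    best, best_score = None, float("-inf")
    for label, count in label_counts.items():
        # log P(label) + sum_i log P(feature_i | label), Laplace-smoothed.
        score = math.log(count / n)
        for i, v in enumerate(feats):
            c = feat_counts[(i, label)]
            score += math.log((c[v] + 1) / (count + len(c) + 1))
        if score > best_score:
            best, best_score = label, score
    return best

data = [(("sunny", "hot"), "no"), (("sunny", "mild"), "no"),
        (("rain", "mild"), "yes"), (("rain", "cool"), "yes"),
        (("overcast", "hot"), "yes")]
model = train_nb(data)
print(predict_nb(model, ("rain", "mild")))  # yes
```

Working in log space avoids the numeric underflow that multiplying many small probabilities would otherwise cause.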

Sample Data Mining Project Proposal Topics

  • A framework for real-time, country-level location classification of worldwide tweets
  • A review of differentially private data publishing and data analysis
  • Scalable and flexible algorithms for CQA post voting prediction
  • Semi-supervised clustering solutions using adaptive ensembling
  • Question routing for community question answering services based on a multi-objective optimization approach
  • Random-tree-based classification for streaming emerging new classes in data mining
  • Heterogeneous event matching with patterns using data mining approaches
  • A keyword search mechanism over temporal graphs
  • An efficient framework for keyword-aware representative travel route recommendation

We are all top inventors, each setting out on a journey of knowledge discovery, assisted by a private chart of which there is no duplicate. Related pages and services we offer:

Mathematical proof

Pseudo code

Conference Paper

Research Proposal

System Design

Literature Survey

Data Collection

Thesis Writing

Data Analysis

Rough Draft

Paper Collection

Code and Programs

Paper Writing

Course Work



Writing a Research Proposal

A research proposal describes what you will investigate, why it's important, and how you will conduct your research. Your paper should include the topic, research question and hypothesis, methods, predictions, and results (if not actual, then projected).

Research Proposal Aims

Show your reader why your project is interesting, original, and important.

The format of a research proposal varies between fields, but most proposals will contain at least these elements:

  • Introduction
  • Literature review
  • Research design
  • Reference list

While the sections may vary, the overall objective is always the same. A research proposal serves as a blueprint and guide for your research plan, helping you get organized and feel confident in the path forward you choose to take.

Proposal Format

The proposal will usually have a  title page  that includes:

  • The proposed title of your project
  • Your supervisor’s name
  • Your institution and department

Introduction

The first part of your proposal is the initial pitch for your project. Make sure it succinctly explains what you want to do and why. Your introduction should:

  • Introduce your  topic
  • Give necessary background and context
  • Outline your problem statement and research questions

To guide your introduction, include information about:
  • Who could have an interest in the topic (e.g., scientists, policymakers)
  • How much is already known about the topic
  • What is missing from this current knowledge
  • What new insights will your research contribute
  • Why you believe this research is worth doing

As you get started, it’s important to demonstrate that you’re familiar with the most important research on your topic. A strong  literature review  shows your reader that your project has a solid foundation in existing knowledge or theory. It also shows that you’re not simply repeating what other people have done or said, but rather using existing research as a jumping-off point for your own.

In this section, share exactly how your project will contribute to ongoing conversations in the field by:

  • Comparing and contrasting the main theories, methods, and debates
  • Examining the strengths and weaknesses of different approaches
  • Explaining how you will build on, challenge, or synthesize prior scholarship

Research design and methods

Following the literature review, restate your main  objectives . This brings the focus back to your project. Next, your  research design  or  methodology  section will describe your overall approach, and the practical steps you will take to answer your research questions. Write up your projected, if not actual, results.

Contribution to knowledge

To finish your proposal on a strong note, explore the potential implications of your research for your field. Emphasize again what you aim to contribute and why it matters.

For example, your results might have implications for:

  • Improving best practices
  • Informing policymaking decisions
  • Strengthening a theory or model
  • Challenging popular or scientific beliefs
  • Creating a basis for future research

Lastly, your research proposal must include correct citations for every source you have used, compiled in a reference list. To create citations quickly and easily, you can use free APA citation generators like BibGuru. Many databases also have a citation button you can click to see a ready-made citation; check it carefully and re-format it if needed, as automatically generated citations often contain mistakes.



Adaptations of data mining methodologies: a systematic literature review

Associated data.

The following information was supplied regarding data availability:

SLR Protocol (also shared via online repository), corpus with definitions and mappings are provided as a Supplemental File .

The use of end-to-end data mining methodologies such as CRISP-DM, KDD process, and SEMMA has grown substantially over the past decade. However, little is known as to how these methodologies are used in practice. In particular, the question of whether data mining methodologies are used ‘as-is’ or adapted for specific purposes, has not been thoroughly investigated. This article addresses this gap via a systematic literature review focused on the context in which data mining methodologies are used and the adaptations they undergo. The literature review covers 207 peer-reviewed and ‘grey’ publications. We find that data mining methodologies are primarily applied ‘as-is’. At the same time, we also identify various adaptations of data mining methodologies and we note that their number is growing rapidly. The dominant adaptations pattern is related to methodology adjustments at a granular level (modifications) followed by extensions of existing methodologies with additional elements. Further, we identify two recurrent purposes for adaptation: (1) adaptations to handle Big Data technologies, tools and environments (technological adaptations); and (2) adaptations for context-awareness and for integrating data mining solutions into business processes and IT systems (organizational adaptations). The study suggests that standard data mining methodologies do not pay sufficient attention to deployment issues, which play a prominent role when turning data mining models into software products that are integrated into the IT architectures and business processes of organizations. We conclude that refinements of existing methodologies aimed at combining data, technological, and organizational aspects, could help to mitigate these gaps.

Introduction

The availability of Big Data has stimulated widespread adoption of data mining and data analytics in research and in business settings ( Columbus, 2017 ). Over the years, a number of data mining methodologies have been proposed, and these are being used extensively in practice and in research. However, little is known about which data mining methodologies are applied and how; the question has been neither widely researched nor discussed. Further, there is no consolidated view on what constitutes quality of methodological process in data mining and data analytics, how data mining and data analytics are applied in organizational contexts, and how application practices relate to each other. This motivates the need for a comprehensive survey of the field.

There have been surveys, quasi-surveys, and summaries conducted in related fields. Notably, there have been two systematic literature reviews; a Systematic Literature Review (hereinafter, SLR) is the most suitable and widely used research method for identifying, evaluating, and interpreting research on a particular research question, topic, or phenomenon ( Kitchenham, Budgen & Brereton, 2015 ). These reviews concerned Big Data Analytics, but not general-purpose data mining methodologies. Adrian et al. (2004) executed an SLR on the implementation of Big Data Analytics (BDA), specifically the capability components necessary for BDA value discovery and realization. The authors identified BDA implementation studies, determined their main focus areas, and discussed in detail BDA applications and capability components. Saltz & Shamshurin (2016) published an SLR on Big Data team process methodologies. The authors identified a lack of standards regarding how Big Data projects are executed, and highlighted the growing research in this area and the potential benefits of such a process standard. Additionally, they synthesized a list of the 33 most important success factors for executing Big Data activities. Finally, there are studies that surveyed data mining techniques and applications across domains; however, they focus on data mining process artifacts and outcomes ( Madni, Anwar & Shah, 2017 ; Liao, Chu & Hsiao, 2012 ), not on end-to-end process methodology.

A number of surveys have been conducted in domain-specific settings such as the hospitality, accounting, education, manufacturing, and banking fields. Mariani et al. (2018) focused on an SLR of Business Intelligence (BI) and Big Data in the hospitality and tourism context. Amani & Fadlalla (2017) explored the application of data mining methods in accounting, while Romero & Ventura (2013) investigated educational data mining. Similarly, Hassani, Huang & Silva (2018) addressed data mining application case studies in banking and explored them along three dimensions: topics, applied techniques, and software. All of these studies were performed by means of systematic literature reviews. Lastly, Bi & Cochran (2014) undertook a standard literature review of Big Data Analytics and its applications in manufacturing.

Apart from domain-specific studies, there have been very few general-purpose surveys offering a comprehensive overview of existing data mining methodologies and classifying and contextualizing them. A valuable synthesis was presented by Kurgan & Musilek (2006) as a comparative study of the state of the art of data mining methodologies. The study was not an SLR; it focused on a comprehensive comparison of the phases, processes, and activities of data mining methodologies, while the application aspect was summarized only briefly as application statistics by industry and citations. Three more comparative, non-SLR studies were undertaken by Marban, Mariscal & Segovia (2009) , Mariscal, Marbán & Fernández (2010) , and, most recently and most closely related, Martnez-Plumed et al. (2017) . They followed the same pattern of systematizing existing data mining frameworks through comparative analysis. There, the purpose and context of consolidation was even more practical: to support the derivation and proposal of a new artifact, that is, a novel data mining methodology. The majority of these general surveys are more than a decade old and have natural limitations: (1) they are non-SLR studies, and (2) they are restricted to comparing methodologies in terms of phases, activities, and other elements.

The key common characteristic of all the studies above is that data mining methodologies are treated as normative and standardized ('one-size-fits-all') processes. A complementary perspective, not considered in those studies, is that data mining methodologies are not normative standardized processes but frameworks that need to be specialized to different industry domains, organizational contexts, and business objectives. In the last few years, a number of extensions and adaptations of data mining methodologies have emerged, which suggests that existing methodologies are not sufficient to cover the needs of all application domains. In particular, extensions of data mining methodologies have been proposed in the medical domain ( Niaksu, 2015 ), the educational domain ( Tavares, Vieira & Pedro, 2017 ), the industrial engineering domain ( Huber et al., 2019 ; Solarte, 2002 ), and software engineering ( Marbán et al., 2007 , 2009 ). However, little attention has been given to studying how data mining methodologies are applied and used in industry settings; so far, only non-scientific practitioner surveys provide such evidence.

Given this research gap, the central objective of this article is to investigate how data mining methodologies are applied by researchers and practitioners, both in their generic (standardized) form and in specialized settings. This is achieved by investigating if data mining methodologies are applied ‘as-is’ or adapted, and for what purposes such adaptations are implemented.

Guided by Systematic Literature Review method, initially we identified a corpus of primary studies covering both peer-reviewed and ‘grey’ literature from 1997 to 2018. An analysis of these studies led us to a taxonomy of uses of data mining methodologies, focusing on the distinction between ‘as is’ usage versus various types of methodology adaptations. By analyzing different types of methodology adaptations, this article identifies potential gaps in standard data mining methodologies both at the technological and at the organizational levels.

The rest of the article is organized as follows. The Background section provides an overview of key concepts of data mining and associated methodologies. Next, Research Design describes the research methodology. The Findings and Discussion section presents the study results and their associated interpretation. Finally, threats to validity are addressed in Threats to Validity while the Conclusion summarizes the findings and outlines directions for future work.

Background

This section introduces the main data mining concepts and provides an overview of existing data mining methodologies and their evolution.

Data mining is defined as a set of rules, processes, and algorithms designed to generate actionable insights, extract patterns, and identify relationships from large datasets ( Morabito, 2016 ). Data mining incorporates automated data extraction, processing, and modeling by means of a range of methods and techniques. In contrast, data analytics refers to techniques used to analyze and acquire intelligence from data (including 'big data') ( Gandomi & Haider, 2015 ) and is positioned as a broader field, encompassing a wider spectrum of methods that includes both statistical and data mining methods ( Chen, Chiang & Storey, 2012 ). A number of algorithms have been developed in the statistics, machine learning, and artificial intelligence domains to support and enable data mining. While statistical approaches predate the others, they inherently come with limitations, the best known being rigid data distribution assumptions. Machine learning techniques gained popularity as they impose fewer restrictions while deriving understandable patterns from data ( Bose & Mahapatra, 2001 ).

Data mining projects commonly follow a structured process or methodology as exemplified by Mariscal, Marbán & Fernández (2010) , Marban, Mariscal & Segovia (2009) . A data mining methodology specifies tasks, inputs, outputs, and provides guidelines and instructions on how the tasks are to be executed ( Mariscal, Marbán & Fernández, 2010 ). Thus, data mining methodology provides a set of guidelines for executing a set of tasks to achieve the objectives of a data mining project ( Mariscal, Marbán & Fernández, 2010 ).

The foundations of structured data mining methodologies were first proposed by Fayyad, Piatetsky-Shapiro & Smyth (1996a , 1996b , 1996c) , and were initially related to Knowledge Discovery in Databases (KDD). KDD presents a conceptual process model of computational theories and tools that support the extraction of information (knowledge) from data ( Fayyad, Piatetsky-Shapiro & Smyth, 1996a ). In KDD, the overall approach to knowledge discovery includes data mining as a specific step. As such, KDD, with its nine main steps (exhibited in Fig. 1 ), has the advantage of considering data storage and access, algorithm scaling, interpretation and visualization of results, and human-computer interaction ( Fayyad, Piatetsky-Shapiro & Smyth, 1996a , 1996c ). The introduction of KDD also formalized a clearer distinction between data mining and data analytics, as formulated, for example, in Tsai et al. (2015) : “…by the data analytics, we mean the whole KDD process, while by the data analysis, we mean the part of data analytics that is aimed at finding the hidden information in the data, such as data mining”.

[Figure 1. The main steps of the KDD process.]

The main steps of KDD are as follows:

  • Step 1: Learning the application domain: develop an understanding of the application domain and relevant prior knowledge, and identify the goal of the KDD process from the customer’s viewpoint.
  • Step 2: Dataset creation: select a dataset, focusing on a subset of variables or data samples on which discovery is to be performed.
  • Step 3: Data cleaning and preprocessing: perform basic operations to remove noise or outliers; collect the information necessary to model or account for noise; decide on strategies for handling missing data fields; and account for data types, schema, and the mapping of missing and unknown values.
  • Step 4: Data reduction and projection: find useful features to represent the data, depending on the goal of the task, and apply transformation methods to find an optimal feature set.
  • Step 5: Choosing the function of data mining: define the target outcome (e.g., summarization, classification, regression, clustering).
  • Step 6: Choosing the data mining algorithm: select method(s) to search for patterns in the data, decide which models and parameters are appropriate, and match a particular data mining method with the overall criteria of the KDD process.
  • Step 7: Data mining: search for patterns of interest in a particular representational form or a set of such representations: classification rules or trees, regression, or clustering.
  • Step 8: Interpretation: filter out redundant and irrelevant patterns, and interpret and visualize the relevant patterns so that the results are understandable to users.
  • Step 9: Using discovered knowledge: incorporate the results into the performance system, document and report them to stakeholders, and use them as a basis for decisions.
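Step 4 (data reduction and projection) can be illustrated with a minimal variance-based feature filter: features whose values barely vary carry little information for discovery and can be dropped. The threshold of 0.1 and the toy rows are arbitrary illustrative choices, not part of KDD itself:

```python
def variance(values):
    # Population variance of a list of numbers.
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

# Toy data: three features per row; feature 1 is nearly constant.
rows = [(1.0, 0.50, 3.1), (2.0, 0.51, 0.2), (3.0, 0.49, 5.7), (4.0, 0.50, 1.9)]
columns = list(zip(*rows))  # transpose to per-feature value lists

# Keep only features whose variance exceeds the threshold.
keep = [i for i, col in enumerate(columns) if variance(col) > 0.1]
reduced = [tuple(row[i] for i in keep) for row in rows]
print(keep)  # [0, 2] -- the near-constant feature 1 is dropped
```

Real projects typically combine such filters with projection methods (e.g., principal component analysis) rather than relying on a single threshold.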

The KDD process became dominant in industrial and academic domains ( Kurgan & Musilek, 2006 ; Marban, Mariscal & Segovia, 2009 ). Also, as the timeline-based evolution of data mining methodologies and process models shows ( Fig. 2 below), the original KDD data mining model served as the basis for other methodologies and process models, which addressed various gaps and deficiencies of the original KDD process. These approaches extended the initial KDD framework, but the degree of extension varied, ranging from process restructuring to a complete change in focus. For example, Brachman & Anand (1996) and later Gertosio & Dussauchoy (2004) (in the form of a case study) introduced practical adjustments based on the iterative and interactive nature of the process. The complete KDD process in their view was enhanced with supplementary tasks, and the focus changed to the user’s point of view (a human-centered approach), highlighting decisions that need to be made by the user in the course of the data mining process. In contrast, Cabena et al. (1997) proposed a different number of steps, emphasizing and detailing data processing and discovery tasks. Similarly, in a series of works, Anand & Büchner (1998) , Anand et al. (1998) , and Buchner et al. (1999) presented additional data mining process steps by concentrating on the adaptation of the data mining process to practical settings. They focused on cross-sales (the entire life-cycle of an online customer), with further incorporation of an internet data discovery process (web-based mining). Further, the Two Crows data mining process model is a consultancy-originated framework that defines the steps differently but is still close to the original KDD. Finally, SEMMA (Sample, Explore, Modify, Model and Assess), based on KDD, was developed by the SAS Institute in 2005 ( SAS Institute Inc., 2017 ). It is defined as a logical organization of the functional toolset of SAS Enterprise Miner for carrying out the core tasks of data mining. Compared to KDD, SEMMA is a vendor-specific process model, which limits its application in other environments. It also skips two steps of the original KDD process (‘Learning the Application Domain’ and ‘Using Discovered Knowledge’) that are regarded as essential for the success of a data mining project ( Mariscal, Marbán & Fernández, 2010 ). In terms of adoption, the new KDD-based proposals received limited attention in academia and industry ( Kurgan & Musilek, 2006 ; Marban, Mariscal & Segovia, 2009 ). Subsequently, most of these methodologies converged into the CRISP-DM methodology.

[Figure 2. Timeline-based evolution of data mining methodologies and process models.]

Additionally, only two non-KDD-based approaches have been proposed alongside the extensions to KDD. The first is the 5 A’s approach presented by De Pisón Ascacbar (2003) and used by the SPSS vendor. Its key contribution was the addition of an ‘Automate’ step, while its disadvantage was the omission of a ‘Data Understanding’ step. The second was Six Sigma, an industry-originated method to improve quality and customer satisfaction ( Pyzdek & Keller, 2003 ). It has been successfully applied to data mining projects in conjunction with the DMAIC performance improvement model (Define, Measure, Analyze, Improve, Control).

In 2000, in response to common issues and needs ( Marban, Mariscal & Segovia, 2009 ), an industry-driven methodology called the Cross-Industry Standard Process for Data Mining (CRISP-DM) was introduced as an alternative to KDD. It also consolidated the original KDD model and its various extensions. While CRISP-DM builds upon KDD, it consists of six phases that are executed in iterations ( Marban, Mariscal & Segovia, 2009 ). This iterative execution is the most distinguishing feature of CRISP-DM compared to the initial KDD, which assumes a sequential execution of its steps. CRISP-DM, much like KDD, aims at providing practitioners with guidelines to perform data mining on large datasets. However, CRISP-DM, with its six main phases and a total of 24 tasks and outputs, is more refined than KDD. The main phases of CRISP-DM, as depicted in Fig. 3 below, are as follows:

  • Phase 1: Business understanding: The focus of the first step is to gain an understanding of the project objectives and requirements from a business perspective and then convert these into data mining problem definitions. Presentation of a preliminary plan to achieve the objectives is also included in this first step.
  • Phase 2: Data understanding: This step begins with an initial data collection and proceeds with activities in order to get familiar with the data, identify data quality issues, discover first insights into the data, and potentially detect and form hypotheses.
  • Phase 3: Data preparation: The third step covers activities required to construct the final dataset from the initial raw data. Data preparation tasks are performed repeatedly.
  • Phase 4: Modeling phase: In this step, various modeling techniques are selected and applied followed by calibrating their parameters. Typically, several techniques are used for the same data mining problem.
  • Phase 5: Evaluation of the model(s): The fifth step begins with the quality perspective and then, before proceeding to final model deployment, ascertains that the model(s) achieves the business objectives. At the end of this phase, a decision should be reached on how to use data mining results.
  • Phase 6: Deployment phase: In the final step, the models are deployed to enable end-customers to use the data as basis for decisions, or support in the business process. Even if the purpose of the model is to increase knowledge of the data, the knowledge gained will need to be organized, presented, distributed in a way that the end-user can use it. Depending on the requirements, the deployment phase can be as simple as generating a report or as complex as implementing a repeatable data mining process.
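The iterative control flow that distinguishes CRISP-DM from the sequential KDD process can be sketched as follows. The function and phase names are illustrative; CRISP-DM itself prescribes no code, only the process:

```python
def run_crisp_dm(evaluate, max_iterations=3):
    """Sketch of CRISP-DM's iterative control flow: the evaluation phase
    decides whether to loop back to business understanding or to deploy."""
    phases = ["business understanding", "data understanding",
              "data preparation", "modeling", "evaluation"]
    history = []
    for iteration in range(1, max_iterations + 1):
        history.extend(phases)
        if evaluate(iteration):       # model meets the business objectives?
            history.append("deployment")
            return history
    return history                    # objectives not met within the budget

# Assume, for illustration, that the model only passes evaluation on the
# second iteration: the first five phases are then traversed twice.
trace = run_crisp_dm(lambda it: it == 2)
print(trace.count("modeling"), trace[-1])  # 2 deployment
```

The key design point is that deployment is gated on the evaluation phase, whereas KDD's nine steps run once in sequence.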

[Figure 3. The main phases of CRISP-DM.]

The development of CRISP-DM was led by an industry consortium. It is designed to be domain-agnostic ( Mariscal, Marbán & Fernández, 2010 ) and, as such, is now widely used by industry and research communities ( Marban, Mariscal & Segovia, 2009 ). These distinctive characteristics have led CRISP-DM to be considered the ‘de facto’ standard data mining methodology and a reference framework against which other methodologies are benchmarked ( Mariscal, Marbán & Fernández, 2010 ).

Similarly to KDD, a number of refinements and extensions of the CRISP-DM methodology have been proposed, in two main directions: extensions of the process model itself, and adaptations of, or mergers with, process models and methodologies from other domains. The extension direction is exemplified by Cios & Kurgan (2005) , who proposed the integrated Data Mining & Knowledge Discovery (DMKD) process model. It contains several explicit feedback mechanisms, modifies the last step to incorporate the application of discovered knowledge and insights, and relies on technologies for results deployment. In the same vein, Moyle & Jorge (2001) and Blockeel & Moyle (2002) proposed the Rapid Collaborative Data Mining System (RAMSYS) framework, which is both a data mining methodology and a system for remote collaborative data mining projects. RAMSYS attempts to combine a problem-solving methodology, knowledge sharing, and ease of communication. It is intended to allow the collaborative work of remotely located data miners in a disciplined manner with regard to information flow, while allowing the free flow of ideas for problem solving ( Moyle & Jorge, 2001 ). CRISP-DM modifications and integrations with other specific domains were proposed in industrial engineering (Data Mining for Industrial Engineering by Solarte (2002) ) and in software engineering by Marbán et al. (2007 , 2009) . Both approaches enhanced CRISP-DM with additional phases, activities, and tasks typical of engineering processes, addressing ongoing support ( Solarte, 2002 ) as well as project management, organizational, and quality assurance tasks ( Marbán et al., 2009 ).

Finally, a limited number of attempts to create independent or semi-dependent data mining frameworks were undertaken after the creation of CRISP-DM. These efforts were driven by industry players and comprised the KDD Roadmap by Debuse et al. (2001) for a proprietary predictive toolkit (Lanner Group), and a more recent effort by IBM, the Analytics Solutions Unified Method for Data Mining (ASUM-DM), in 2015 (IBM Corporation, 2016: https://developer.ibm.com/technologies/artificial-intelligence/articles/architectural-thinking-in-the-wild-west-of-data-science/). Both frameworks contributed additional tasks—for example, resourcing in the KDD Roadmap—or a hybrid approach, as in ASUM’s combination of agile and traditional implementation principles.

Table 1 below summarizes the reviewed data mining process models and methodologies by origin, basis, and key concepts.

Table 1:

| Name | Origin | Basis | Key concept | Year |
|---|---|---|---|---|
| Human-Centered | Academy | KDD | Iterative process and interactivity (user’s point of view and needed decisions) | 1996, 2004 |
| Cabena et al. | Academy | KDD | Focus on data processing and discovery tasks | 1997 |
| Anand and Buchner | Academy | KDD | Supplementary steps and integration of web-mining | 1998, 1999 |
| Two Crows | Industry | KDD | Modified definitions of steps | 1998 |
| SEMMA | Industry | KDD | Tool-specific (SAS Institute), elimination of some steps | 2005 |
| 5 A’s | Industry | Independent | Supplementary steps | 2003 |
| 6 Sigmas | Industry | Independent | Six Sigma quality improvement paradigm in conjunction with DMAIC performance improvement model | 2003 |
| CRISP-DM | Joint industry and academy | KDD | Iterative execution of steps, significant refinements to tasks and outputs | 2000 |
| Cios et al. | Academy | CRISP-DM | Integration of data mining and knowledge discovery, feedback mechanisms, usage of received insights supported by technologies | 2005 |
| RAMSYS | Academy | CRISP-DM | Integration of collaborative work aspects | 2001–2002 |
| DMIE | Academy | CRISP-DM | Integration and adaptation to Industrial Engineering domain | 2001 |
| Marban | Academy | CRISP-DM | Integration and adaptation to Software Engineering domain | 2007 |
| KDD roadmap | Joint industry and academy | Independent | Tool-specific, resourcing task | 2001 |
| ASUM | Industry | CRISP-DM | Tool-specific, combination of traditional CRISP-DM and agile implementation approach | 2015 |

Research Design

The main research objective of this article is to study how data mining methodologies are applied by researchers and practitioners. To this end, we use the systematic literature review (SLR) as our scientific method, for two reasons. Firstly, a systematic review is based on a trustworthy, rigorous, and auditable methodology. Secondly, an SLR supports structured synthesis of existing evidence, identifies research gaps, and provides a framework to position new research activities (Kitchenham, Budgen & Brereton, 2015). For our SLR, we followed the guidelines proposed by Kitchenham, Budgen & Brereton (2015). All SLR details have been documented in a separate, peer-reviewed SLR protocol (available at https://figshare.com/articles/Systematic-Literature-Review-Protocol/10315961).

Research questions

As suggested by Kitchenham, Budgen & Brereton (2015), we formulated the research questions and motivate them as follows. In the preliminary phase of the research, we discovered a very limited number of studies investigating data mining methodology application practices as such. Further, we found a number of surveys conducted in domain-specific settings and very few general-purpose surveys, but none of them considered application practices either. As a contrasting trend, the recent emergence of a limited number of adaptation studies clearly pinpointed the research gap in the area of application practices. Given this gap, in-depth investigation of the phenomenon led us to ask: “How are data mining methodologies applied (‘as-is’ vs adapted)?” (RQ1). Further, as we intended to investigate in depth the universe of adaptation scenarios, this naturally led to RQ2: “How have existing data mining methodologies been adapted?” Finally, if adaptations are made, we wish to explore the associated reasons and purposes, which in turn led to RQ3: “For what purposes are data mining methodologies adapted?”

Thus, three research questions were defined for this review:

  • Research Question 1: How are data mining methodologies applied (‘as-is’ versus adapted)? This question aims to identify data mining methodology application and usage patterns and trends.
  • Research Question 2: How have existing data mining methodologies been adapted? This question aims to identify and classify data mining methodology adaptation patterns and scenarios.
  • Research Question 3: For what purposes have existing data mining methodologies been adapted? This question aims to identify, explain, classify, and produce insights on the reasons for, and benefits achieved by, adaptations of existing data mining methodologies. Specifically, what gaps do these adaptations seek to fill, and what have been their benefits? Such systematic evidence and insights will be valuable input to a potentially new, refined data mining methodology, and will be of interest to practitioners and researchers.

Data collection strategy

Our data collection and search strategy followed the guidelines proposed by Kitchenham, Budgen & Brereton (2015). It defined the scope of the search, the selection of literature and electronic databases, the search terms and strings, and the screening procedures.

Primary search

The primary search aimed to identify an initial set of papers. To this end, the search strings were derived from the research objective and research questions. The term ‘data mining’ was the key term, but we also included ‘data analytics’ to be consistent with observed research practices. The terms ‘methodology’ and ‘framework’ were also included. Thus, the following search strings were developed and validated in accordance with the guidelines suggested by Kitchenham, Budgen & Brereton (2015) :

(‘data mining methodology’) OR (‘data mining framework’) OR (‘data analytics methodology’) OR (‘data analytics framework’)
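As an illustration, the OR-connected phrase query above can be applied programmatically when screening candidate records. The function and the sample titles below are illustrative assumptions for demonstration, not part of the review’s actual tooling:

```python
# Phrase list mirrors the review's boolean search string (OR-connected exact phrases).
SEARCH_PHRASES = [
    "data mining methodology",
    "data mining framework",
    "data analytics methodology",
    "data analytics framework",
]

def matches_query(text: str) -> bool:
    """True if any of the OR-connected phrases occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in SEARCH_PHRASES)

# Hypothetical candidate records, for demonstration only.
records = [
    "A Data Mining Framework for Intrusion Detection",
    "Statistical Process Control in Manufacturing",
]
hits = [r for r in records if matches_query(r)]  # retains only the first title
```

Database search engines apply additional tokenization and field scoping, so this sketch only captures the boolean OR semantics of the query.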

The search strings were applied to the indexed scientific databases Scopus and Web of Science (for peer-reviewed, academic literature) and to the non-indexed Google Scholar (for non-peer-reviewed, so-called ‘grey’ literature). The decision to cover ‘grey’ literature in this research was motivated as follows. As proposed in a number of information systems and software engineering publications (Garousi, Felderer & Mäntylä, 2019; Neto et al., 2019), SLR as a stand-alone method may not provide sufficient insight into the ‘state of practice’. It was also identified (Garousi, Felderer & Mäntylä, 2016) that ‘grey’ literature can provide substantial benefits in certain areas of software engineering, in particular when the research topic relates to industrial and practical settings. Taking into consideration the research objective—investigating data mining methodology application practices—we opted to include elements of a Multivocal Literature Review (MLR) 1 in our study. Kitchenham, Budgen & Brereton (2015) also recommend including ‘grey’ literature to minimize publication bias, as positive results and research outcomes are more likely to be published than negative ones. Following MLR practices, we also designed inclusion criteria for the types of ‘grey’ literature, reported below.

The selection of databases is motivated as follows. In the case of peer-reviewed literature sources, we aimed to avoid potential omission bias, which is discussed in IS research (Levy & Ellis, 2006) when research is concentrated in a limited set of disciplinary data sources. Thus, a broad selection of data sources, including multidisciplinary-oriented (Scopus, Web of Science, Wiley Online Library) and domain-oriented (ACM Digital Library, IEEE Xplore Digital Library) scientific electronic databases, was evaluated. Multidisciplinary databases were selected for their wider domain coverage, and it was validated and confirmed that they include publications originating from domain-oriented databases such as ACM and IEEE. Among the multidisciplinary databases, Scopus was selected for the widest possible coverage (it is the world’s largest database, covering approximately 80% of all international peer-reviewed journals), while Web of Science was selected for its longer temporal range; the two databases thus complement each other. The selected non-indexed source for ‘grey’ literature is Google Scholar, as it is a comprehensive source of both academic and ‘grey’ publications and is referred to as such extensively (Garousi, Felderer & Mäntylä, 2019; Neto et al., 2019).

Further, Garousi, Felderer & Mäntylä (2019) presented a three-tier categorization framework for types of ‘grey’ literature. In our study we restricted ourselves to 1st-tier ‘grey’ literature publications from a limited number of producers. In particular, from the list of producers (Neto et al., 2019) we focused on government departments and agencies; non-profit economic and trade organizations (‘think-tanks’) and professional associations; academic and research institutions; and businesses and corporations (consultancy companies and established private companies). The selected 1st-tier ‘grey’ literature items include: (1) government, academic, and private-sector consultancy reports 2, (2) theses (not lower than Master level) and PhD dissertations, (3) research reports, (4) working papers, and (5) conference proceedings and preprints. By restricting inclusion to 1st-tier ‘grey’ literature, we mitigate the quality-assessment challenge that is especially relevant for, and reported about, such literature (Garousi, Felderer & Mäntylä, 2019; Neto et al., 2019).

Scope and domains inclusion

As recommended by Kitchenham, Budgen & Brereton (2015), it is necessary to define the research scope at the outset. To clarify the scope, we defined what is out of the scope of this research. The following aspects are not included in the scope of our study:

  • Context of technology and infrastructure for data mining/data analytics tasks and projects.
  • Application of granular methods in the data mining process itself or to data mining tasks, for example, constructing business queries or applying regression or neural network modeling techniques to solve classification problems. Studies with granular methods are included in the primary texts corpus as long as the method application is part of an overall methodological approach.
  • Technological aspects of data mining, for example, data engineering, dataflows, and workflows.
  • Traditional statistical methods not directly associated with data mining, including statistical control methods.

Similarly to Budgen et al. (2006) and Levy & Ellis (2006), initial piloting revealed that the search engines retrieved literature from all major scientific domains, including ones outside the authors’ area of expertise (e.g., medicine). Even though such studies could be retrieved, it would be impossible for us to analyze and correctly interpret literature published outside our area of expertise. The search strategy was therefore adjusted by retaining domains closely associated with Information Systems and Software Engineering research. Thus, for the Scopus database the final set of included domains was limited to nine and comprised Computer Science; Engineering; Mathematics; Business, Management and Accounting; Decision Science; Economics, Econometrics and Finance; and Multidisciplinary, as well as Undefined studies. Excluded domains covered 11.5%, or 106 out of 925 publications; validation confirmed that these primarily focused on specific case studies in fundamental sciences and medicine 3. The domains included from Scopus were mapped to Web of Science to ensure a consistent approach across databases, and the correctness of the mapping was validated.

Screening criteria and procedures

Based on SLR practices (as in Kitchenham, Budgen & Brereton (2015) and Brereton et al. (2007)) and the defined SLR scope, we designed multi-step screening procedures (quality and relevancy) with an associated set of Screening Criteria and a Scoring System. The purpose of relevancy screening is to find relevant primary studies in an unbiased way (Vanwersch et al., 2011). Quality screening, on the other hand, aims to assess the relevant primary studies in terms of quality, likewise in an unbiased way.

The Screening Criteria consisted of two subsets: Exclusion Criteria, applied for initial filtering, and Relevance Criteria, also known as Inclusion Criteria.

The Exclusion Criteria were initial threshold quality controls aimed at eliminating studies with limited or no scientific contribution. They also address issues of understandability, accessibility, and availability. The Exclusion Criteria were as follows:

  • Quality 1: The publication item is not in English (understandability).
  • Quality 2: The publication is a duplicate, that is:
    - the same document is retrieved from two or all three databases;
    - different versions of the same publication are retrieved (i.e., the same study published in different sources)—following best practices, the decision rule is that the most recent paper, as well as the one with the highest score, is retained (Kofod-Petersen, 2014);
    - a publication appears both as a conference proceeding and as a journal article with the same name and the same authors, or as an extended version of the conference paper; in this case the latter is selected.
  • Quality 3: The publication is shorter than 6 pages—short papers do not have the space to expand and discuss the presented ideas in sufficient depth for us to examine.
  • Quality 4: The paper is not accessible in full length online through the university subscription of databases or via Google Scholar—lack of full availability prevents us from assessing and analyzing the text.

The initially retrieved list of papers was filtered based on the Exclusion Criteria. Only papers that passed all criteria were retained in the final studies corpus. The mapping of criteria to screening steps is exhibited in Fig. 4.
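The exclusion-criteria pass described above can be sketched as a simple filter. The field names and sample records below are assumptions for illustration; they do not reflect the review’s actual data model:

```python
def passes_exclusion_criteria(paper: dict) -> bool:
    """Apply the four exclusion criteria; a paper must pass all of them."""
    if paper["language"] != "English":       # Quality 1: understandability
        return False
    if paper["is_duplicate"]:                # Quality 2: duplicates eliminated
        return False
    if paper["pages"] < 6:                   # Quality 3: minimum length
        return False
    if not paper["full_text_available"]:     # Quality 4: availability
        return False
    return True

# Hypothetical records: the second fails the minimum-length criterion.
papers = [
    {"language": "English", "is_duplicate": False, "pages": 12, "full_text_available": True},
    {"language": "English", "is_duplicate": False, "pages": 4,  "full_text_available": True},
]
retained = [p for p in papers if passes_exclusion_criteria(p)]
```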

[Figure 4]

The Relevance Criteria were designed to identify relevant publications; they are presented in Table 2 below, while their mapping to the respective process steps is presented in Fig. 4. These criteria were applied iteratively.

Table 2:

| Relevance criteria | Criteria definition | Criteria justification |
|---|---|---|
| Relevance 1 | Is the study about a data mining or data analytics approach, and is it within the designated list of domains? | Exclude studies conducted outside the designated domain list. Exclude studies not directly describing and/or discussing data mining and data analytics |
| Relevance 2 | Is the study introducing/describing a data mining or data analytics methodology/framework, or modifying existing approaches? | Exclude texts considering only specific, granular data mining and data analytics techniques, methods, or traditional statistical methods. Exclude publications focusing on specific, granular data mining and data analytics process/sub-process aspects. Exclude texts where description and discussion of data mining methodologies or frameworks is manifestly missing |

As a final SLR step, a quality assessment of the full texts was performed with the constructed Scoring Metrics (in line with Kitchenham & Charters (2007)), presented in Table 3 below.

Table 3:

| Score | Criteria definition |
|---|---|
| 3 | Data mining methodology or framework is presented in full. All steps are described and explained, tests performed, results compared and evaluated. There is a clear proposal on usage, application, or deployment of the solution in the organization’s business process(es) and IT/IS system, and/or a prototype or full solution implementation is discussed. Success factors are described and presented |
| 2 | Data mining methodology or framework is presented; some process steps are missing, but they do not impact the holistic view and understanding of the performed work. The data mining process is clearly presented and described, tests performed, results compared and evaluated. There is a proposal on usage, application, or deployment of the solution in the organization’s business process(es) and IT/IS system(s) |
| 1 | Data mining methodology or framework is not presented in full; some key phases and process steps are missing. The publication focuses on one or a few aspects (e.g., method, technique) |
| 0 | Data mining methodology or framework is not presented as a holistic approach but on a fragmented basis; the study is limited to some aspects (e.g., method or technique discussion, etc.) |

Data extraction and screening process

The conducted data extraction and screening process is presented in Fig. 4. In Step 1, initial publication lists were retrieved from the pre-defined databases—Scopus, Web of Science, and Google Scholar. The lists were merged and duplicates eliminated in Step 2. Afterwards, texts shorter than 6 pages were excluded (Step 3). Steps 1–3 were guided by the Exclusion Criteria. In the next stage (Step 4), publications were screened by title against the pre-defined Relevance Criteria. Those that passed were evaluated for availability (Step 5). If a study was available, it was evaluated again using the same pre-defined Relevance Criteria, applied to the Abstract, Conclusion and, if necessary, Introduction (Step 6). The texts that passed this threshold formed the primary publications corpus, which was extracted from the databases in full. These primary texts were evaluated once more based on the full text (Step 7), applying the Relevance Criteria first and then the Scoring Metrics.

Results and quantitative analysis

In Step 1, 1,715 publications were extracted from the relevant databases, with the following composition: Scopus (819), Web of Science (489), Google Scholar (407). In terms of scientific publication domains, Computer Science (42.4%), Engineering (20.6%), and Mathematics (11.1%) accounted for approximately 74% of the Scopus-originated texts; the same applies to the Web of Science harvest. Application of the Exclusion Criteria produced the following results: in Step 2, after eliminating duplicates, 1,186 texts were passed for minimum-length evaluation, and 767 reached assessment against the Relevancy Criteria.

As mentioned, the Relevance Criteria were applied iteratively (Steps 4–6) and in conjunction with the availability assessment. As a result, only 298 texts were retained for full evaluation, with 241 originating from scientific databases while 57 were ‘grey’. These studies formed the primary texts corpus, which was extracted, read in full, and evaluated by the Relevance Criteria combined with the Scoring Metrics. The decision rule was set as follows: studies that scored “1” or “0” were rejected, while texts evaluated “3” or “2” were admitted to the final primary studies corpus. To this end, as the outcome of the SLR-based, broad, cross-domain publication collection and screening, we identified 207 relevant publications from peer-reviewed (156 texts) and ‘grey’ literature (51 texts). Figure 5 below exhibits the yearly numbers of published research, broken down by ‘peer-reviewed’ and ‘grey’ literature, starting from 1997.
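The corpus counts reported in this subsection are internally consistent, which can be verified with a few lines of arithmetic (the dictionary labels are ours, introduced only for this check):

```python
# Step 1 extraction totals per database.
extracted = {"Scopus": 819, "Web of Science": 489, "Google Scholar": 407}
assert sum(extracted.values()) == 1715   # total extracted publications

# Texts retained for full evaluation (Step 7 input).
full_evaluation = {"peer-reviewed": 241, "grey": 57}
assert sum(full_evaluation.values()) == 298

# Final primary studies corpus after scoring.
final_corpus = {"peer-reviewed": 156, "grey": 51}
assert sum(final_corpus.values()) == 207
```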

[Figure 5]

In terms of composition, the ‘peer-reviewed’ studies corpus is well balanced, with 72 journal articles and 82 conference papers, while book chapters account for only 4 instances. In contrast, in the ‘grey’ literature subset, articles in moderated and non-peer-reviewed journals are dominant (n = 34) compared to conference papers (n = 13), followed by a small number of technical reports and preprints (n = 4).

Temporal analysis of the texts corpus (as per Fig. 5 below) resulted in two observations. Firstly, we note that stable and significant research interest (in terms of numbers) in data mining methodology application started around a decade ago, in 2007; research efforts made prior to 2007 were relatively limited, with the number of publications below 10. Secondly, we note that research on data mining methodologies has grown substantially since 2007, an observation supported by the constructed 3-year and 10-year mean trendlines. In particular, the number of publications has roughly tripled over the past decade, hitting an all-time high of 24 texts released in 2017.

Further, there are also two distinct spike sub-periods, in 2007–2009 and 2014–2017, followed by a stable pattern with an overall higher number of publications released annually. This observation is in line with the trend of increased penetration of data mining methodologies, tools, cross-industry applications, and academic research.

Findings and Discussion

In this section, we address the research questions of the paper. Initially, as part of RQ1, we present an overview of data mining methodology ‘as-is’ application and adaptation trends. In addressing RQ2, we further classify the identified adaptations. Then, as part of the RQ3 subsection, each category identified under RQ2 is analyzed, with particular focus on the goals of the adaptations.

RQ1: How are data mining methodologies applied (‘as-is’ vs. adapted)?

The first research question examines the extent to which data mining methodologies are used ‘as-is’ versus adapted. Our review, based on 207 publications, identified two distinct paradigms for how data mining methodologies are applied. The first is ‘as-is’, where the data mining methodologies are applied as stipulated. The second is with ‘adaptations’; that is, methodologies are modified by introducing various changes to the standard process model when applied.

We aggregated the research by decade to differentiate application patterns between two time periods: 1997–2007, with limited, and 2008–2018, with more intensive data mining application. This cut was guided not only by the extracted publications corpus but also by earlier surveys. In particular, during the pre-2007 research, ten new methodologies were proposed, but since then only two new methodologies have been proposed. Thus, there is a distinct trend over the last decade of a large number of proposed extensions and adaptations versus entirely new methodologies.

We note that during the first decade of our time scope (1997–2007), the ratio of data mining methodologies applied ‘as-is’ was 40% (as presented in Fig. 6A). However, the same ratio for the following decade is 32% (Fig. 6B). Thus, in terms of relative shares, we note a clear decrease in using data mining methodologies ‘as-is’ in favor of adapting them to cater to specific needs. The trend is even more pronounced when comparing absolute numbers: adaptations more than tripled (from 30 to 106), while the ‘as-is’ scenario increased modestly (from 20 to 51). Given this finding, we continue by analyzing how data mining methodologies have been adapted under RQ2.
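The reported ‘as-is’ shares follow directly from the publication counts, as a short calculation shows (variable names are ours, for illustration):

```python
# 1997-2007: 20 'as-is' vs 30 adapted; 2008-2018: 51 'as-is' vs 106 adapted.
as_is_first, adapted_first = 20, 30
as_is_second, adapted_second = 51, 106

share_first = as_is_first / (as_is_first + adapted_first)      # 20/50 = 0.40
share_second = as_is_second / (as_is_second + adapted_second)  # 51/157 ≈ 0.32
```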

[Figure 6]

RQ2: How have existing data mining methodologies been adapted?

We identified that data mining methodologies have been adapted to cater to specific needs. To categorize the adaptation scenarios, we applied a two-level dichotomy, specifically the following decision tree:

  • Level 1 Decision: Has the methodology been combined with another methodology? If yes, the resulting methodology was classified in the ‘integration’ category. Otherwise, we posed the next question.
  • Level 2 Decision: Are any new elements (phases, tasks, deliverables) added to the methodology? If yes, we designate the resulting methodology as an ‘extension’ of the original one. Otherwise, we classify the resulting methodology as a modification of the original one.
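The two-level decision tree above can be sketched as a small classification function (the boolean parameter names are ours, chosen to mirror the two decisions):

```python
def classify_adaptation(combined_with_other_methodology: bool,
                        new_elements_added: bool) -> str:
    """Classify an adapted methodology per the two-level decision tree."""
    if combined_with_other_methodology:   # Level 1: combined with another methodology?
        return "integration"
    if new_elements_added:                # Level 2: new phases/tasks/deliverables added?
        return "extension"
    return "modification"                 # otherwise: changes within existing elements
```

Note that Level 1 takes precedence: a methodology that is both combined with another and extended with new elements is classified as an ‘integration’.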

Thus, three distinct types of adaptation scenarios can be distinguished:

  • Scenario ‘Modification’: introduces specialized sub-tasks and deliverables to address specific use cases or business problems. Modifications typically concentrate on granular adjustments to the methodology at the level of sub-phases, tasks, or deliverables within the stages of the existing reference frameworks (e.g., CRISP-DM or KDD). For example, Chernov et al. (2014), in a study in the mobile network domain, proposed an automated decision-making enhancement in the deployment phase; in addition, the evaluation phase was modified by using both conventional and own-developed performance metrics. Further, in a study performed within the financial services domain, Yang et al. (2016) present feature transformation and feature selection as sub-phases, thereby enhancing the data mining modeling stage.
  • Scenario ‘Extension’: primarily proposes significant extensions to reference data mining methodologies. Such extensions result in integrated data mining solutions, data mining frameworks serving as a component or tool of automated IS systems, or transformations to fit specialized environments. The main purposes of extensions are to integrate fully scaled data mining solutions into IS/IT systems and business processes and to provide broader context with useful architectures, algorithms, etc. Adaptations where extensions have been made elicit and explicitly present various artifacts in the form of system and model architectures, process views, workflows, and implementation aspects. A number of soft goals are also achieved, such as providing a holistic perspective on the data mining process and contextualizing it with organizational needs. There are also extensions in this scenario where data mining process methodologies are substantially changed and extended in all key phases to enable execution of the data mining life-cycle with new (Big) Data technologies and tools and in new prototyping and deployment environments (e.g., Hadoop platforms or real-time customer interfaces). For example, Kisilevich, Keim & Rokach (2013) presented extensions of traditional CRISP-DM data mining outcomes with a fully fledged Decision Support System (DSS) for the hotel brokerage business. The authors introduced spatial/non-spatial data management (extending data preparation) and analytical and spatial modeling capabilities (extending the modeling phase), and provided spatial display and reporting capabilities (enhancing the deployment phase). In the same work, domain knowledge was introduced in all phases of the data mining process, and usability and ease of use were also addressed.
  • Scenario ‘Integration’: combines a reference methodology, for example CRISP-DM, with: (1) data mining methodologies originating from other domains (e.g., software engineering development methodologies), (2) organizational frameworks (Balanced Scorecard, Analytics Canvass, etc.), or (3) adjustments to accommodate Big Data technologies and tools. Adaptations in the form of ‘Integration’ also typically introduce various types of ontologies and ontology-based tools, domain knowledge, and software engineering and BI-driven framework elements. Fundamental adjustments of the data mining process to new types of data and IS architectures (e.g., real-time data, multi-layer IS) are also presented. The key gaps addressed by such adjustments are the prescriptive nature and low degree of formalization of CRISP-DM, its obsolete nature with respect to tools, and its lack of integration with other organizational frameworks. For example, Brisson & Collard (2008) developed the KEOPS data mining methodology (CRISP-DM based), centered on domain knowledge integration. An ontology-driven information system was proposed, with integration into, and enhancements to, all steps of the data mining process. Further, integrated expert knowledge used in all data mining phases was shown to produce value in the data mining process.

To examine how the application scenarios of data mining methodology usage have developed over time, we mapped the peer-reviewed texts and ‘grey’ literature to the respective adaptation scenarios, aggregated by decade (as presented in Fig. 7 for peer-reviewed and Fig. 8 for ‘grey’ literature).

[Figure 7]

For peer-reviewed research, this temporal analysis resulted in three observations. Firstly, research effort in each adaptation scenario has been growing, and the number of publications more than quadrupled (128 vs. 28). Secondly, as noted above, the relative proportion of ‘as-is’ studies is diluted (from 39% to 33%) and primarily replaced by the ‘Extension’ paradigm (from 25% to 30%); in contrast, the relative gains of the ‘Modification’ and ‘Integration’ paradigms are modest. This finding is reinforced by a further observation: the most notable gap in terms of publication numbers remains in the ‘Integration’ category where, excluding the 2008–2009 spike, research efforts are limited and the number of texts is just 13. This is in stark contrast with prolific research in the ‘Extension’ category, though the latter is concentrated in recent years. We can hypothesize that existing reference methodologies do not accommodate and support the increasing complexity of data mining projects and IS/IT infrastructure, as well as certain domain specifics, and as such need to be adapted.

In the ‘grey’ literature, in contrast to peer-reviewed research, growth in the number of publications is less pronounced—29 vs. 22 publications, or 32%, comparing across the two decades (as per Fig. 8). The growth is solely driven by application of the ‘Integration’ scenario (13 vs. 4 publications), while both ‘as-is’ and the other adaptation scenarios are stagnating or in decline.

RQ3: For what purposes have existing data mining methodologies been adapted?

We address the third research question by analyzing what gaps the data mining methodology adaptations seek to fill and the benefits of such adaptations. We identified three adaptation scenarios, namely ‘Modification’, ‘Extension’, and ‘Integration’. Here, we analyze each of them.

Modification

Modifications of data mining methodologies are present in 30 peer-reviewed and 4 ‘grey’ literature studies. The analysis shows that modifications overwhelmingly consist of specific case studies. However, the major point differentiating them from ‘as-is’ case studies is the clear presence of specific adjustments to standard data mining process methodologies. Yet the proposed modifications and their purposes do not go beyond the traditional data mining methodology phases: they are granular, specialized, and executed at the task, sub-task, and deliverable level. With modifications, authors describe potential business applications and deployment scenarios at a conceptual level, but typically do not report or present real implementations in IS/IT systems and business processes.

Further, this research subcategory is best classified by the domains in which the case studies were performed and the data mining methodology modification scenarios executed. We identified four distinct domain-driven application areas, presented in Fig. 9.

[Figure 9]

IT, IS domain

The largest number of publications (14, or approximately 40%) addressed IT, IS security, software development, and specific data mining and processing topics. Authors address the intrusion detection problem in Hossain, Bridges & Vaughn (2003), Fan, Ye & Chen (2016), and Lee, Stolfo & Mok (1999); specialized algorithms for processing a variety of data types in Yang & Shi (2010), Chen et al. (2001), Yi, Teng & Xu (2016), and Pouyanfar & Chen (2016); and effective and efficient computer and mobile network management in Guan & Fu (2010), Ertek, Chi & Zhang (2017), Zaki & Sobh (2005), Chernov, Petrov & Ristaniemi (2015), and Chernov et al. (2014).

Manufacturing and engineering

The next most popular research area is manufacturing/engineering, with 10 case studies. The central topic here is high-technology manufacturing, for example the semiconductor-associated study of Chien, Diaz & Lan (2014), and various complex prognostics case studies in the rail and aerospace domains (Létourneau et al., 2005; Zaluski et al., 2011) concentrated on failure prediction. These are complemented by studies on equipment fault and failure prediction and maintenance (Kumar, Shankar & Thakur, 2018; Kang et al., 2017; Wang, 2017) as well as a monitoring system (García et al., 2017).

Sales and services, incl. financial industry

The third category comprises seven business application papers concerning customer service, targeting and advertising ( Karimi-Majd & Mahootchi, 2015 ; Reutterer et al., 2017 ; Wang, 2017 ), credit risk assessment in financial services ( Smith, Willis & Brooks, 2000 ), supply chain management ( Nohuddin et al., 2018 ), property management ( Yu, Fung & Haghighat, 2013 ), and similar topics.

As a consequence of this specialization, these studies concentrate on developing a state-of-the-art solution to the respective domain-specific problem.

The ‘Extension’ scenario was identified in 46 peer-reviewed and 12 ‘grey’ publications. We noted that extensions to existing data mining methodologies were executed with four major purposes:

  • Purpose 1: To implement a fully scaled, integrated data mining solution and a regular, repeatable knowledge discovery process: addressing model and algorithm deployment and implementation design (including architecture, workflows, and corresponding IS integration). A complementary goal is to tackle changes to business processes so as to incorporate data mining into organizational activities.
  • Purpose 2: To implement complex, specifically designed systems and integrated business applications with a data mining model/solution as a component or tool. Typically, this adaptation is also oriented towards Big Data specifics and is complemented by proposed artifacts such as Big Data architectures, system models, workflows, and data flows.
  • Purpose 3: To implement data mining as part of integrated/combined specialized infrastructures, data environments, and data types (e.g., IoT, cloud, mobile networks).
  • Purpose 4: To incorporate context-awareness aspects.

The specific list of studies mapped to each of these purposes is presented in the Appendix ( Table A1 ). The main purposes of the adaptations, the associated gaps and/or benefits, and related observations and artifacts are documented in Fig. 10 below.

Figure 10: Main adaptation purposes of ‘Extension’ studies and the associated publications (image: peerj-cs-06-267-g010.jpg).

In the ‘Extension’ category, studies executed with Purpose 1 propose fully scaled, integrated data mining solutions built around specific data mining models and the associated frameworks and processes. The distinctive trait of this research subclass is that it ensures repeatability and reproducibility of the delivered data mining solution across different organizational and industry settings. Both the results of the data mining use case and the deployment and integration into IS/IT systems and associated business process(es) are presented explicitly. Thus, this subclass is geared towards specific solution design, tackling a concrete business or industrial problem or addressing specific research gaps, and thereby resembles a comprehensive case study.

This direction is well exemplified by the expert finder system for research social network services proposed by Sun et al. (2015) , the data mining solution for functional test content optimization by Wang (2015) , and the time-series mining framework for estimating unobservable time-series by Hu et al. (2010) . Similarly, Du et al. (2017) tackle online log anomaly detection, automated association rule mining is addressed by Çinicioğlu et al. (2011) , software effort estimation by Deng, Purvis & Purvis (2011) , and visual discovery of network patterns by Simoff & Galloway (2008) . A number of studies address solutions in the IS security ( Shin & Jeong, 2005 ), manufacturing ( Güder et al., 2014 ; Chee, Baharudin & Karkonasasi, 2016 ), materials engineering ( Doreswamy, 2008 ), and business domains ( Xu & Qiu, 2008 ; Ding & Daniel, 2007 ).
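To illustrate the kind of task these solutions automate, association rule mining derives rules of the form A -> B from frequent co-occurrences in transactions. The following is a minimal, self-contained sketch on a toy dataset (illustrative only; it is not taken from any of the cited studies, which use far richer pipelines):

```python
from itertools import combinations
from collections import Counter

# Toy transaction database (hypothetical, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]

min_support = 0.5     # fraction of transactions an itemset must appear in
min_confidence = 0.6  # P(B | A) threshold for emitting a rule A -> B

n = len(transactions)
item_count = Counter(i for t in transactions for i in t)
pair_count = Counter(frozenset(p) for t in transactions
                     for p in combinations(sorted(t), 2))

# Emit rules A -> B whose support and confidence clear the thresholds.
rules = []
for pair, cnt in pair_count.items():
    if cnt / n < min_support:
        continue
    a, b = tuple(pair)
    for x, y in ((a, b), (b, a)):
        conf = cnt / item_count[x]
        if conf >= min_confidence:
            rules.append((x, y, cnt / n, conf))

for x, y, sup, conf in sorted(rules):
    print(f"{x} -> {y}  support={sup:.2f} confidence={conf:.2f}")
```

This is the frequent-pair core of Apriori-style mining; production systems additionally mine longer itemsets and prune candidates level by level.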

In contrast, ‘Extension’ studies executed for Purpose 2 concentrate on the design of complex, multi-component information systems and architectures. These are holistic systems and integrated business applications in which a data mining framework serves as a component or tool. Moreover, the data mining methodology in these studies is extended with systems integration phases.

For example, Mobasher (2007) presents a data mining application in a Web personalization system and the associated process; here, the data mining cycle is extended in all phases with the ultimate goal of leveraging multiple data sources and using the discovered models and corresponding algorithms in an automatic personalization system. The authors comprehensively address data processing, algorithm and design adjustments, and the respective integration into the automated system. Similarly, Haruechaiyasak, Shyu & Chen (2004) tackle the improvement of a Webpage recommender system by presenting an extended data mining methodology that includes the design and implementation of the data mining model. A holistic view on web mining, supporting the integration of all data sources, data warehousing, and data mining techniques, as well as multiple problem-oriented analytical outcomes with rich business application scenarios (personalization, adaptation, profiling, and recommendations) in the e-commerce domain, was proposed and discussed by Büchner & Mulvenna (1998) . Further, Singh et al. (2014) tackled a scalable implementation of a Network Threat Intrusion Detection System. In this study, the data mining methodology and resulting model are extended, scaled, and deployed as a module of a quasi-real-time system for capturing Peer-to-Peer Botnet attacks. A similar complex solution was presented in a series of publications by Lee et al. (2000 , 2001) , who designed a real-time data mining-based Intrusion Detection System (IDS). These works are complemented by the comprehensive study of Barbará et al. (2001) , who constructed an experimental testbed for intrusion detection with data mining methods. A detection model combining data fusion and mining, with respective components for Botnet identification, was developed by Kiayias et al. (2009) . A similar approach is presented by Alazab et al. (2011) , who proposed and implemented a zero-day malware detection system with an associated machine-learning-based framework. Finally, Ahmed, Rafique & Abulaish (2011) presented a multi-layer framework for fuzzy attacks in 3G cellular IP networks.

A number of authors have considered data mining methodologies in the context of Decision Support Systems and other systems that generate information for decision-making, across a variety of domains. For example, Kisilevich, Keim & Rokach (2013) executed a significant extension of the data mining methodology by designing an integrated Decision Support System (DSS) with six components, acting as a supporting tool for the hotel brokerage business to increase deal profitability. A similar approach is undertaken by Capozzoli et al. (2017) , focusing on improving the energy management of properties through the provision of occupancy pattern information and a reconfiguration framework. Kabir (2016) presented a data mining information service providing improved sales forecasting, which supported the solution of the under/over-stocking problem, while Lau, Zhang & Xu (2018) addressed sales forecasting with sentiment analysis on Big Data. Kamrani, Rong & Gonzalez (2001) proposed a GA-based Intelligent Diagnosis system for fault diagnostics in the manufacturing domain. The latter topic was tackled further by Shahbaz et al. (2010) with a complex, integrated data mining system for diagnosing and solving manufacturing problems in real time.

Lenz, Wuest & Westkämper (2018) propose a framework for capturing data analytics objectives and creating holistic, cross-departmental data mining systems in the manufacturing domain. This work is representative of a cohort of studies that aim at extending data mining methodologies in order to support the design and implementation of enterprise-wide data mining systems. In this same research cohort, we classify Luna, Castro & Romero (2017) , which presents a data mining toolset integrated into the Moodle learning management system, with the aim of supporting university-wide learning analytics.

One study addresses a multi-agent-based data mining concept. Khan, Mohamudally & Babajee (2013) developed a unified theoretical framework for data mining by formulating a unified data mining theory. The framework is tested by means of agent programming, proposing integration into a multi-agent system, which is attractive due to its scalability, robustness, and simplicity.

The subcategory of ‘Extension’ research executed with Purpose 3 is devoted to data mining methodologies and solutions in the specialized IT/IS, data, and process environments that have emerged recently as a consequence of the development of Big Data technologies and tools. Exemplary studies include IoT-associated environment research, for example the Smart City IoT application presented by Strohbach et al. (2015) . In the same domain, Bashir & Gill (2016) addressed IoT-enabled smart buildings, with the additional challenges of large amounts of high-speed real-time data and real-time analytics requirements, and proposed an integrated IoT Big Data Analytics framework. This research is complemented by the interdisciplinary study of Zhong et al. (2017) , where IoT and wireless technologies are used to create an RFID-enabled environment producing KPI analyses to improve logistics.

A significant number of studies address various mobile environments, sometimes complemented by, or combined with, cloud-based environments. Gomes, Phua & Krishnaswamy (2013) addressed mobile data mining executed on the mobile device itself; their framework proposes an innovative approach extending all aspects of data mining, including contextual data, end-user privacy preservation, data management, and scalability. Yuan, Herbert & Emamian (2014) and Yuan & Herbert (2014) introduced a cloud-based mobile data analytics framework with an application case study for a smart-home-based monitoring system. Cuzzocrea, Psaila & Toccu (2016) presented the innovative FollowMe suite, which implements a data mining framework for mobile social media analytics with several tools and the respective architecture and functionalities. An interesting paper was presented by Torres et al. (2017) , who addressed a data mining methodology and its implementation for congestion prediction in mobile LTE networks, also tackling feedback reactions that trigger network reconfigurations.

Further, Biliri et al. (2014) presented a cloud-based Future Internet Enabler: an automated social data analytics solution that also addresses Social Network Interoperability, supporting enterprises in interconnecting and utilizing social networks for collaboration. Data mining methodology and applications for real-time streamed social media data were extensively discussed by Zhang, Lau & Li (2014) , who proposed the design of a comprehensive ABIGDAD framework with seven main components implementing data-mining-based deceptive review identification. An interdisciplinary study tackling both of these topics was developed by Puthal et al. (2016) , who proposed an integrated framework and architecture for a disaster management system based on streamed data in a cloud environment ensuring end-to-end security. Additionally, key extensions to the data mining framework have been proposed, merging a variety of data sources and types, security verification, and data flow access controls. Finally, cloud-based manufacturing was addressed in the context of fault diagnostics by Kumar et al. (2016) .

Also, Mahmood et al. (2013) tackled Wireless Sensor Networks and the extensions required in the associated data mining framework. Interesting work was executed by Nestorov & Jukic (2003) , addressing the rarely covered topic of integrating data mining solutions within traditional data warehouses and actively mining the data repositories themselves.

Supported by a new generation of visualization technologies (including Virtual Reality environments), Wijayasekara, Linda & Manic (2011) proposed and implemented CAVE-SOM, a 3D visual data mining framework that offers interactive, immersive visual data mining with multiple visualization modes supported by a plethora of methods. An early version of a visual data mining framework was successfully developed and presented by Ganesh et al. (1996) .

Large-scale social media data is successfully tackled by Lemieux (2016) with a comprehensive framework accompanied by a set of data mining tools and an interface. Real-time data analytics was addressed by Shrivastava & Pal (2017) in the domain of enterprise service ecosystems. Image data was addressed by Huang et al. (2002) , who proposed a multimedia data mining framework and its implementation with user relevance feedback integration and instance learning. Further, the explosion of data diversity and the associated need to extend standard data mining are addressed by Singh et al. (2016) in a study devoted to object detection in video surveillance systems supporting real-time video analysis.

Finally, there is also a limited number of studies addressing context awareness (Purpose 4) that extend the data mining methodology with context elements and adjustments. In comparison with the ‘Integration’ category, these studies operate at a lower abstraction level, capturing and presenting concrete lists of adjustments. Singh, Vajirkar & Lee (2003) generate a taxonomy of context factors, develop an extended data mining framework, and propose a deployment including a detailed IS architecture. The context-awareness aspect is also addressed in papers reviewed above, for example Lenz, Wuest & Westkämper (2018) , Kisilevich, Keim & Rokach (2013) , Sun et al. (2015) , and other studies.

Integration

The ‘Integration’ scenario for data mining methodologies was identified in 27 peer-reviewed and 17 ‘grey’ studies. Our analysis revealed that, at a higher abstraction level, this adaptation scenario is typically executed with five key purposes:

  • Purpose 1: To integrate/combine with various ontologies existing in the organization.
  • Purpose 2: To introduce context-awareness and incorporate domain knowledge.
  • Purpose 3: To integrate/combine with frameworks, process methodologies, and concepts from other research or industry domains.
  • Purpose 4: To integrate/combine with other well-known organizational governance frameworks, process methodologies, and concepts.
  • Purpose 5: To accommodate and/or leverage newly available Big Data technologies, tools, and methods.

The specific list of studies mapped to each of these purposes is presented in the Appendix ( Table A2 ). The main purposes of the adaptations, the associated gaps and/or benefits, and related observations and artifacts are documented in Fig. 11 below.

Figure 11: Main adaptation purposes of ‘Integration’ studies and the associated publications (image: peerj-cs-06-267-g011.jpg).

As mentioned, a number of studies concentrate on proposing ontology-based integrated data mining frameworks accompanied by various types of ontologies (Purpose 1). For example, Sharma & Osei-Bryson (2008) focus on an ontology-based organizational view with Actors, Goals, and Objectives, which supports execution of the Business Understanding phase. Brisson & Collard (2008) propose the KEOPS framework, which is CRISP-DM compliant and integrates a knowledge base and ontology, with the purpose of building an ontology-driven information system (OIS) for the business and data understanding phases, while the knowledge base is used in the post-processing step of model interpretation. Park et al. (2017) propose and design the comprehensive ontology-based data analytics tool IRIS, with the purpose of aligning analytics and business. IRIS is based on the concept of connecting dots, i.e., analytics methods, or transforming insights into business value, and supports a standardized process for applying the ontology to match business problems and solutions.

Further, Ying et al. (2014) propose a domain-specific data mining framework oriented to the business problem of customer demand discovery. They construct an ontology for customer demand and the customer demand discovery task, which allows structured knowledge extraction in the form of knowledge patterns and rules. Here, the purpose is to facilitate business value realization and to support the actionability of extracted knowledge via marketing strategies and tactics. In the same vein, Cannataro & Comito (2003) presented an ontology for the data mining domain whose main goal is to simplify the development of distributed knowledge discovery applications. The authors offered domain experts a reference model for different kinds of data mining tasks, methodologies, and software capable of solving a given business problem and finding the most appropriate solution.
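The matching idea behind such ontology-driven tools can be pictured as a lookup from business-problem concepts to candidate mining tasks and methods. The toy ontology below is entirely hypothetical (it is not the KEOPS or IRIS ontology); it only sketches the mechanism of matching problems to solutions:

```python
# Toy 'ontology': business-problem concepts mapped to data mining tasks
# and candidate methods (hypothetical names, for illustration only).
ONTOLOGY = {
    "customer_churn":    {"task": "classification",
                          "methods": ["decision tree", "logistic regression"]},
    "customer_segments": {"task": "clustering",
                          "methods": ["k-means", "hierarchical clustering"]},
    "demand_discovery":  {"task": "association rule mining",
                          "methods": ["Apriori", "FP-Growth"]},
}

def recommend(problem: str) -> dict:
    """Return the mining task and candidate methods for a business problem."""
    entry = ONTOLOGY.get(problem)
    if entry is None:
        raise KeyError(f"no ontology entry for problem: {problem}")
    return entry

print(recommend("demand_discovery")["task"])  # association rule mining
```

Real ontology-based frameworks replace this flat dictionary with a formal ontology (classes, relations, reasoning), but the business-understanding benefit is the same: a documented, queryable mapping from problems to analytical solutions.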

Apart from ontologies, Sharma & Osei-Bryson (2009) propose in another study an IS-inspired data mining methodology driven by an Input-Output model, which supports formal implementation of the Business Understanding phase. This research exemplifies studies executed with Purpose 2. The goal of the paper is to tackle the prescriptive nature of CRISP-DM and to address how the entire process can be implemented. The study by Cao, Schurmann & Zhang (2005) is also exemplary in aggregating and introducing several fundamental concepts into the traditional CRISP-DM data mining cycle: context awareness, in-depth pattern mining, human–machine cooperative knowledge discovery (in essence, following the human-centricity paradigm in data mining), and a loop-closed iterative refinement process (similar to Agile-based methodologies in software development). Several concepts, such as data, domain, interestingness, and rules, are also proposed to tackle a number of fundamental constraints identified in CRISP-DM. These have been discussed and further extended by Cao & Zhang (2007 , 2008) and Cao (2010) into an integrated domain-driven data mining concept, resulting in the fully fledged D3M (domain-driven) data mining framework. Interestingly, the same concepts are investigated individually by other authors; for example, a context-aware data mining methodology is tackled by Xiang (2009a , 2009b) in the context of the financial sector. Pournaras et al. (2016) addressed the crucial topic of privacy preservation in the context of achieving an effective data analytics methodology. The authors introduced metrics and a self-regulatory (reconfigurable) information-sharing mechanism providing customers with controls for information disclosure.

A number of studies have proposed CRISP-DM adjustments based on existing frameworks, process models or concepts originating in other domains (Purpose 3), for example, software engineering ( Marbán et al., 2007 , 2009 ; Marban, Mariscal & Segovia, 2009 ) and industrial engineering ( Solarte, 2002 ; Zhao et al., 2005 ).

Meanwhile, Mariscal, Marbán & Fernández (2010) proposed a new refined data mining process based on a global comparative analysis of existing frameworks, while Angelov (2014) outlined a data analytics framework based on statistical concepts. Following a similar approach, some researchers suggest explicit integration with other areas and organizational functions, for example BI-driven data mining by Hang & Fong (2009) . Similarly, Chen, Kazman & Haziyev (2016) developed an architecture-centric agile Big Data analytics methodology and an architecture-centric agile analytics and DevOps model. Alternatively, several authors tackled data mining methodology adaptations in other domains, for example educational data mining by Tavares, Vieira & Pedro (2017) , decision support in learning management systems ( Murnion & Helfert, 2011 ), and accounting systems ( Amani & Fadlalla, 2017 ).

Other studies are concerned with actionability of data mining and closer integration with business processes and organizational management frameworks (Purpose 4). In particular, there is a recurrent focus on embedding data mining solutions into knowledge-based decision making processes in organizations, and supporting fast and effective knowledge discovery ( Bohanec, Robnik-Sikonja & Borstnar, 2017 ).

Examples of adaptations made for this purpose include: (1) integration of CRISP-DM with the Balanced Scorecard framework used for strategic performance management in organizations ( Yun, Weihua & Yang, 2014 ); (2) integration with a strategic decision-making framework for revenue management ( Segarra et al., 2016 ); (3) integration with a strategic analytics methodology ( Van Rooyen & Simoff, 2008 ); and (4) integration with a so-called ‘Analytics Canvas’ for managing portfolios of data analytics projects ( Kühn et al., 2018 ). Finally, Ahangama & Poo (2015) explored methodological attributes important for the adoption of data mining methodologies by novice users. This latter study uncovered factors that could help reduce resistance to the use of data mining methodologies. Conversely, Lawler & Joseph (2017) comprehensively evaluated factors that may increase the benefits of Big Data Analytics projects in an organization.

Lastly, a number of studies have proposed data mining frameworks (e.g., CRISP-DM) adaptations to cater for new technological architectures, new types of datasets and applications (Purpose 5). For example, Lu et al. (2017) proposed a data mining system based on a Service-Oriented Architecture (SOA), Zaghloul, Ali-Eldin & Salem (2013) developed a concept of self-service data analytics, Osman, Elragal & Bergvall-Kåreborn (2017) blended CRISP-DM into a Big Data Analytics framework for Smart Cities, and Niesen et al. (2016) proposed a data-driven risk management framework for Industry 4.0 applications.

Our analysis of RQ3, regarding the purposes of existing data mining methodology adaptations, revealed the following key findings. Firstly, adaptations of type ‘Modification’ are predominantly targeted at addressing problems that are specific to a given case study. The majority of modifications were made within the domain of IS security, followed by case studies in the domains of manufacturing and financial services. Secondly, and in clear contrast, adaptations of type ‘Extension’ are primarily aimed at customizing the methodology to take into account specialized development environments and deployment infrastructures, and to incorporate context-awareness aspects. Thirdly, a recurrent purpose of adaptations of type ‘Integration’ is to combine a data mining methodology either with existing ontologies in an organization or with other domain frameworks, methodologies, and concepts. ‘Integration’ is also used to instill context-awareness and domain knowledge into a data mining methodology, or to adapt it to specialized methods and tools, such as Big Data. The distinctive outcome and value (gaps filled) of ‘Integration’ stems from improved knowledge discovery, better actionability of results, closer combination with key organizational processes and domain-specific methodologies, and improved usage of Big Data technologies.

We discovered that the adaptations of existing data mining methodologies found in the literature can be classified into three categories: modification, extension, or integration.

We also noted that adaptations are executed either to address deficiencies, i.e., the lack of important elements or aspects in the reference methodology (chiefly CRISP-DM), or to improve certain phases, deliverables, or process outcomes.

In short, adaptations are made to:

  • improve key phases of reference data mining methodologies (for CRISP-DM, primarily the business understanding and deployment phases).
  • support knowledge discovery and actionability.
  • introduce context-awareness and a higher degree of formalization.
  • integrate the data mining solution more closely with key organizational processes and frameworks.
  • significantly update CRISP-DM with respect to Big Data technologies, tools, environments, and infrastructure.
  • incorporate the broader, explicit context of architectures, algorithms, and toolsets as integral deliverables or supporting tools for executing the data mining process.
  • expand towards a broader, unified perspective for incorporating and implementing data mining solutions in the organization, IT infrastructure, and business processes.
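The extension pattern summarized above can be pictured as the standard CRISP-DM phase sequence with additional hooks attached to the phases that adaptations most often target. The phase names below follow CRISP-DM; the hook names are our own illustrative shorthand, not terms from the surveyed studies:

```python
# Standard CRISP-DM phases plus the extension points the surveyed
# adaptations most often add (hook names are illustrative, not standard).
CRISP_DM_PHASES = [
    "business understanding",
    "data understanding",
    "data preparation",
    "modeling",
    "evaluation",
    "deployment",
]

EXTENSION_HOOKS = {
    "business understanding": ["context capture", "domain-knowledge intake"],
    "deployment": ["IS/IT integration", "business-process embedding",
                   "monitoring and maintenance"],
}

def run_pipeline(phases, hooks):
    """Yield each phase followed by any extension hooks attached to it."""
    for phase in phases:
        yield ("phase", phase)
        for hook in hooks.get(phase, []):
            yield ("hook", hook)

steps = list(run_pipeline(CRISP_DM_PHASES, EXTENSION_HOOKS))
print(len(steps))  # 11
```

The sketch makes the survey's point concrete: the reference cycle itself is unchanged, and the adaptations attach additional work items chiefly at the business understanding and deployment ends.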

Threats to Validity

Systematic literature reviews have inherent limitations that must be acknowledged. These threats to validity include subjective bias (internal validity) and incompleteness of search results (external validity).

The internal validity threat stems from the subjective screening and rating of studies, particularly when assessing the studies with respect to relevance and quality criteria. We have mitigated these effects by documenting the survey protocol (SLR Protocol), strictly adhering to the inclusion criteria, and performing significant validation procedures, as documented in the Protocol.

The external validity threat relates to the extent to which the findings of the SLR reflect the actual state of the art in the field of data mining methodologies, given that the SLR only considers published studies that can be retrieved using specific search strings and databases. We have addressed this threat to validity by conducting trial searches to validate our search strings in terms of their ability to identify relevant papers that we knew about beforehand. Also, the fact that the searches led to 1,700 hits overall suggests that a significant portion of the relevant literature has been covered.

In this study, we have examined the use of data mining methodologies by means of a systematic literature review covering both peer-reviewed and ‘grey’ literature. We have found that the use of data mining methodologies, as reported in the literature, has grown substantially since 2007 (four-fold increase relative to the previous decade). Also, we have observed that data mining methodologies were predominantly applied ‘as-is’ from 1997 to 2007. This trend was reversed from 2008 onward, when the use of adapted data mining methodologies gradually started to replace ‘as-is’ usage.

The most frequent adaptations have been in the ‘Extension’ category. This category refers to adaptations that imply significant changes to key phases of the reference methodology (chiefly CRISP-DM). These adaptations particularly target the business understanding, deployment and implementation phases of CRISP-DM (or other methodologies). Moreover, we have found that the most frequent purposes of adaptations are: (1) adaptations to handle Big Data technologies, tools and environments (technological adaptations); and (2) adaptations for context-awareness and for integrating data mining solutions into business processes and IT systems (organizational adaptations). A key finding is that standard data mining methodologies do not pay sufficient attention to the deployment aspects required to scale and transform data mining models into software products integrated into large IT/IS systems and business processes.

Apart from the adaptations in the ‘Extension’ category, we have also identified an increasing number of studies focusing on the ‘Integration’ of data mining methodologies with other domain-specific and organizational methodologies, frameworks, and concepts. These adaptations are aimed at embedding the data mining methodology into broader organizational aspects.

Overall, the findings of the study highlight the need to develop refinements of existing data mining methodologies that would allow them to seamlessly interact with IT development platforms and processes (technological adaptation) and with organizational management frameworks (organizational adaptation). In other words, there is a need to frame existing data mining methodologies as being part of a broader ecosystem of methodologies, as opposed to the traditional view where data mining methodologies are defined in isolation from broader IT systems engineering and organizational management methodologies.

Supplemental Information

Supplemental information 1.

Unfortunately, we were not able to upload the original graph PNG files. We constructed the graph files with the Overleaf-hosted PeerJ template, following the template examples, but could not determine why they did not fit; redoing them in new formats would change the text flow and the generated PDF file. We therefore submit the graphs in an archived file as part of the supplementary material, and we will redo them based on your instructions.

Supplemental Information 2

The file starts with a Definitions page, which lists and explains all column definitions as well as the SLR scoring metrics. The second page contains the "Peer reviewed" corpus, and the next one the "grey" literature corpus.

Funding Statement

The authors received no funding for this work.

Additional Information and Declarations

The authors declare that they have no competing interests.

Veronika Plotnikova conceived and designed the experiments, performed the experiments, analyzed the data, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Marlon Dumas conceived and designed the experiments, authored or reviewed drafts of the paper, and approved the final draft.

Fredrik Milani conceived and designed the experiments, authored or reviewed drafts of the paper, and approved the final draft.


PhD Research Proposal Topics for Data Mining


The rapid evolution of the data mining field has enabled substantial achievements and new developments in organizations. Because it extracts valid, understandable, novel, and useful knowledge, data mining has become a non-trivial process in the real world, valued for its broad applicability and its contribution to understanding and scientific progress. With tremendous improvements in technologies and growing complexity across fields, data mining must now cope with advanced network and computational resources, heterogeneous data formats, ever-increasing business challenges, disparate data sources, and diverse research and scientific fields. These advances have shaped current data mining applications, which integrate different data mining methods to cope with such challenges. Nowadays, ubiquitous data mining, short text mining, distributed data mining, multimedia data mining, and sequence and time-series data mining are the emerging trends.

  • Guidelines for Preparing a PhD Research Proposal

Latest Research Proposal Ideas in Data Mining

  • Research Proposal on Recommender Systems
  • Research Proposal on Data preprocessing Methods
  • Research Proposal on Graph Mining
  • Research Proposal on Pattern mining
  • Research Proposal on Stream Data Mining
  • Research Proposal on Time-Series Data Mining
  • Research Proposal on Multimedia Data Mining
  • Research Proposal on Social Network Analysis
  • Research Proposal on Spatial Data Mining
  • Research Proposal on Semantic Analysis
  • Research Proposal on Market Analysis
  • Research Proposal on Fraud Detection
  • Research Proposal on Data Mining in Healthcare
  • Research Proposal on Financial Analysis
  • Research Proposal on Stock Market Analysis
  • Research Proposal on Network Alignment Techniques
  • Research Proposal on Classification Algorithms
  • Research Proposal on Clustering Algorithms
  • Research Proposal on Association Rule Mining
  • Research Proposal on Text Mining
  • Research Proposal on Text Summarization
  • Research Proposal on Topic Modeling
  • Research Proposal on Natural Language Processing
  • Research Proposal on Information Retrieval
  • Research Proposal on Question Answering System
  • Research Proposal on Sentiment Analysis
  • Research Proposal Topic on Discourse Structure and Opinion based Argumentation Mining
  • Research Proposal in Aspect based Opinion Mining for Personalized Recommendation
  • Research Proposal in Utterances and Emoticons based Multi-Class Emotion Recognition
  • Research Proposal in Negation Handling with Contextual Representation for Sentiment Classification
  • Research Proposal in Semi-supervised Misinformation Detection in Social Network
  • Research Proposal in Personalized Recommendation with Contextual Pre-Filtering
  • Research Proposal in Time-series Forecasting using Weighted Incremental Learning
  • Research Proposal in Serendipity-aware Product Recommendation

Data Mining

G22.3033-002

Dr. Jean-Claude Franchitti

New York University

Computer Science Department

Courant Institute of Mathematical Sciences

Session 4: Proposal Sample

Course Title: Data Mining    Course Number: G22.3033-002

Instructor: Jean-Claude Franchitti    Session: 4

Title of Project

Group Member 1, Group Member 2

Abstract

The abstract should be one paragraph that summarizes what you will do for your project.

Introduction

Provide a brief overview of data mining. Describe what your proposal is about and the organization of the rest of the proposal. Include whether you will be performing data mining tasks, implementing a new algorithm in Weka (or another data mining tool), or modifying some other system to incorporate data mining features, etc. Basically, provide the nature of your project. This section should be a page or less in length.

Data Mining Task

Provide the specific tasks you will perform on the data set. Include specific questions you will investigate, and the goals for the tasks. This should be independent of the specific techniques you will use to achieve your goals. This section should be a page or less.

Data Set(s)

Describe the data set(s) you will be using in your project. Include the origin of the data set, an overview of the data set organization, attributes of the data, and challenges of the data set you've selected. Include any information you have about missing values in the data set. This should be one to two pages in length.

Methods and Models

Describe in detail the data mining methods and models you plan to employ to achieve the goals you set in the Data Mining Task section of your document. Include some mention of necessary data transformation. If you're implementing a technique, you should have some idea of how it will be implemented and incorporated into Weka (or some other data mining tool). If you are combining techniques, explain how you intend to use the output of one technique as input into another technique. This section should be up to 5 pages in length. Remember, be detailed, include how you will select the best model from the model space, etc.

Assessment

Discuss the assessment methodology you will use to validate that you have found meaningful patterns. Will you use n-fold cross-validation, confidence intervals for accuracy, etc.? How will you create your training and test sets? What baseline models will you use? This section should be about a page or two in length.
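
As a small executable illustration of the assessment methodology discussed above, the k-fold splitting idea can be sketched in a few lines of Python. The function name, dataset, and fold count here are illustrative placeholders, not part of the proposal template:

```python
import random

def k_fold_splits(data, k, seed=0):
    """Yield (training, held_out) partitions for k-fold cross-validation."""
    items = list(data)
    random.Random(seed).shuffle(items)        # deterministic shuffle
    folds = [items[i::k] for i in range(k)]   # k roughly equal folds
    for i in range(k):
        held_out = folds[i]
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, held_out

# Example: 10 samples split into 5 folds, each held-out fold has 2 samples
for training, held_out in k_fold_splits(range(10), k=5):
    assert len(held_out) == 2 and len(training) == 8
```

Each sample appears in exactly one held-out fold, so every data point is tested exactly once across the k rounds.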

Presentation and Visualization

Describe how your results will be presented and visualized in such a way to show meaningful patterns in the data. This should be up to a page in length.

Group Member Roles

In this section, discuss the roles that each group member will have in the project. One paragraph per group member is sufficient.

Schedule

The schedule is a table of dates and tasks that you plan to complete by those dates. Tasks to be done by the progress report must be listed, as well as any other dates you want to set for yourselves. Additional deadlines are highly recommended. Be sure to include when you will have data transformation, modeling, assessment, visualization, etc. completed.

??/??/10    Tasks completed by chosen date
??/??/10    Tasks to be completed by the progress report date
??/??/10    Tasks completed by the class presentation

Bibliography

This is where you list bibliographic information for any references you made throughout the proposal. You should have lots of references.


Applying Data Mining Research Methodologies on Information Systems

Grace L Samson

In this paper we considered several frameworks for data mining. These frameworks are based on different approaches, including inductive databases approach, the reductionist statistical approaches, data compression approach, constructive induction approach and some others. We considered advantages and limitations of these frameworks. We presented the view on data mining research as continuous and never-ending development process of an adaptive DM system towards the efficient utilization of available DM techniques for solving a current problem impacted by the dynamically changing environment. We discussed one of the traditional information systems frameworks and, drawing the analogy to this framework, we considered a data mining system as the special kind of adaptive information system. We adapted the information systems development framework for the context of data-mining systems development. Keywords: Data Mining, Information Systems, Knowledge Discovery in Databases



Data Mining Research Proposal Structure

Data mining refers to the process of extracting useful patterns and data out of large datasets for further analysis. This article provides complete information on the data mining research proposal. Data mining visualization is constrained by the huge amount of data and the display capacity of output devices.

Against this backdrop, visual data mining has taken shape recently. Visual data mining represents a novel approach in which very large datasets are explored by integrating traditional data mining methods with data visualization. Anyone can understand the importance of data mining methods once they see its applications in one of the most important areas of day-to-day life: the healthcare sector.

How is data mining useful in healthcare?

  • By employing data aggregation and data mining techniques, healthcare scientists, organizations, and researchers can reveal useful information that enhances clinical testing processes and reduces drug development time.
  • Data storage, warehousing, and mining are critical in assisting healthcare businesses with decision-making.
  • In the healthcare industry in particular, decisions must be based on evidence.
  • The use of data storage and data mining has led to more precise decision-making.
  • Healthcare organizations can now easily monitor, understand, and analyze client data from a variety of sources.

Knowing the importance of data mining for the present and the future, large numbers of students and research scholars from top universities are pursuing novelty in data mining by carrying out advanced studies. Let us first start with the steps involved in the data mining phases.

Steps in the Data Mining Phases

  • Identification of business objectives
  • Identification of data mining aims
  • Assessment of required documents
  • Gathering and understanding the data
  • Selection of required data
  • Cleaning and formatting the essential data
  • Selection of algorithms
  • Predictive model building
  • Training the model using a dataset sample
  • Testing, verification, and iteration
  • Verification of the final model
  • Preparation of visualization
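
As a minimal sketch of the modeling steps above (split the data, train a predictive model on a sample, then test it), the following pure-Python example uses an invented toy dataset and a deliberately simple 1-nearest-neighbor model purely for illustration:

```python
# Minimal sketch: holdout split, train a simple predictive model, test it.
# The (feature, label) dataset below is invented for illustration.

def nearest_neighbor_predict(training, x):
    """Predict the label of x from the closest training point (1-NN)."""
    closest = min(training, key=lambda pair: abs(pair[0] - x))
    return closest[1]

data = [(1.0, "low"), (1.2, "low"), (0.9, "low"),
        (5.0, "high"), (5.3, "high"), (4.8, "high")]

training, held_out = data[:4], data[4:]   # simple holdout split
predictions = [nearest_neighbor_predict(training, x) for x, _ in held_out]
accuracy = sum(p == y for p, (_, y) in zip(predictions, held_out)) / len(held_out)
print(accuracy)  # both held-out "high" points sit nearest to 5.0 -> 1.0
```

A real project would replace the toy model with one of the algorithms listed later and repeat the test/verify loop until the final model is confirmed.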

For all these steps of data mining project development, we have dedicated teams of experts, engineers, professionals, writers, and developers who are highly skilled and experienced in data mining research. As a result, we can provide you with a strategy for a chronological and organized approach to data mining project development. What are the steps in executing data mining projects? The following are the fundamental approaches and steps in the execution of data mining projects.

  • Selection of features
  • Feature representation
  • Data cleansing and integration
      - Data sources for integration include financial statements, internal controls, auditing reports, and account ledgers
  • Selection of data
  • Data transformation
  • Training and testing datasets
      - For example, fraud and non-fraud firms
  • Data mining tasks
      - Classification and clustering
      - Prediction and regression
      - Outlier detection and visualization
  • Algorithms
      - Neural networks and regression
      - Naive Bayes
      - Fuzzy logic and expert systems
      - Genetic algorithms
      - Decision tree and nearest neighbor
      - Bayesian belief networks
  • Evaluation of patterns
      - Trend and pattern post-processing
  • Evaluation of performance
      - Performance metrics such as error rate
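
To make the classification and Naive Bayes entries above concrete, here is a hedged sketch of a categorical Naive Bayes classifier on an invented fraud/non-fraud toy dataset; all records, feature names, and the helper function are hypothetical:

```python
from collections import Counter, defaultdict

# Invented toy records: (risk_level, audit_flag) -> fraud / nonfraud
records = [
    (("high", "flagged"), "fraud"),
    (("high", "flagged"), "fraud"),
    (("high", "clean"),   "fraud"),
    (("low",  "clean"),   "nonfraud"),
    (("low",  "clean"),   "nonfraud"),
    (("low",  "flagged"), "nonfraud"),
]

def naive_bayes_predict(records, features, alpha=1.0):
    """Pick the class maximizing P(class) * prod P(feature | class),
    with add-one (Laplace) smoothing on the per-class feature counts."""
    priors = Counter(label for _, label in records)
    counts = defaultdict(Counter)   # (position, class) -> feature counts
    for feats, label in records:
        for i, value in enumerate(feats):
            counts[(i, label)][value] += 1
    best_label, best_score = None, float("-inf")
    for label, n in priors.items():
        score = n / len(records)
        for i, value in enumerate(features):
            seen = counts[(i, label)]
            score *= (seen[value] + alpha) / (n + alpha * len(seen))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

print(naive_bayes_predict(records, ("high", "flagged")))  # -> fraud
```

The error rate mentioned in the evaluation step is then simply the fraction of test records whose predicted label disagrees with the true one.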

A proper explanation with advanced technical notes will be provided on all these steps and approaches once you reach out to us. With a massive amount of reliable research data drawn from updated sources in top journals and benchmark book references, we ensure that you receive the best support for your data mining research proposal development. What are the methods of data mining?

Data Mining Methods 

  • Naive Bayes algorithms and decision trees
  • K-means clustering and support vector machines
  • Clustering and nonnegative matrix factorization
  • Expert systems
  • Soft sets for data mining and machine learning
  • Intelligent agents and genetic algorithms
  • Hashing methods and Apriori techniques
  • Artificial neural networks
  • Expert systems and neural networks
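
As an illustration of the k-means clustering entry above, here is a minimal one-dimensional sketch. The data points and the choice of k are invented for demonstration; real applications would cluster multidimensional feature vectors:

```python
def k_means_1d(points, k, iterations=10):
    """Tiny 1-D k-means sketch: centroids start at the first k points."""
    centroids = list(points[:k])
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:                        # assignment step
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        for i, cluster in enumerate(clusters):  # update step
            if cluster:
                centroids[i] = sum(cluster) / len(cluster)
    return sorted(centroids)

# Two obvious groups around 1 and 10
print(k_means_1d([1.0, 1.5, 0.5, 9.5, 10.5, 10.0], k=2))  # -> [1.0, 10.0]
```

The alternating assignment/update loop is the core of k-means regardless of dimensionality; only the distance function changes for higher-dimensional data.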

Practical explanations and research demonstrations of all these methods will be provided to you by our technical experts. More explanation of these data mining methods is available on our website under data mining research proposal. The following are the steps involved in conventional methods of data mining.

  • Problem definition
  • Accurate results
  • Automation potential (analysis of large datasets)

By integrating conventional data mining methods and approaches with recent breakthroughs, data scientists have developed data visualization techniques. The following are the key points in data visualization.

  • Vague exploration of objectives
  • Noisy and inhomogeneous data handling
  • Little knowledge about the data involved
  • Direct involvement of the user in the process of exploration

Data visualization methods vary depending on the data type. Data can be a combination of univariate, bivariate, and multivariate. The following are the important points about multivariate data.

  • System of dynamic parallel coordinates
  • Icon and pixel methods for representation
  • Representation and multiple dimensions

For further clarification on data visualization methods, feel free to contact us at any time. Get in touch with us for any support in a data mining research project. Professionalism and reliable research guidance are assured to you as you enroll in our research guidance facility. Now we shall look at some of the major usage mining methods.

  • Pattern discovery
      - Methods involved include mode, median, and frequency determination (to project length, recent reviews, and pages)
      - Data is gathered as filtered data (the output of the preprocessing stage)
      - Session logs are the primary data store
      - Merit: extraction of significant data out of established pattern correlations
      - Important algorithms: the Fuzzy C-means algorithm and the K-means genetic algorithm
  • Pattern analysis
      - Methods used include rolling-up and drilling-down approaches
      - Data from the discovered patterns is gathered
      - The data store is a data cube, a multidimensional database
      - Merit: unwarranted patterns and rules are segregated
      - Languages: OLAP and SQL
  • Data preprocessing
      - The web status code method is used
      - Data is collected from web access logs, website data logs, user login data, cookies, and caches
      - The weblog is the primary data source
      - Merit: conversion of raw data into readable form
      - Records are interpreted in common log format (CLF) and extended CLF
      - Major algorithms: FP-growth and Apriori
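
As a sketch of the preprocessing step, the following parses a fabricated common log format (CLF) line with a regular expression. The example log line is invented, while the field layout follows standard CLF (host, ident, user, date, request, status, size):

```python
import re

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_clf(line):
    """Return the CLF fields of one weblog line as a dict, or None."""
    match = CLF_PATTERN.match(line)
    return match.groupdict() if match else None

# Fabricated example line for illustration
line = ('127.0.0.1 - alice [10/Oct/2010:13:55:36 -0700] '
        '"GET /index.html HTTP/1.0" 200 2326')
entry = parse_clf(line)
print(entry["status"], entry["request"])  # 200 GET /index.html HTTP/1.0
```

Once each raw line is converted into structured fields like this, frequent-pattern algorithms such as Apriori or FP-growth can be run over the resulting sessions.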

Advanced details, including real-time implementation examples of all these methods along with details of our successful projects, will be shared with you when you reach out to us. The world-class engineers with us are experts in the field of data mining research and development. We also provide complete writing support, with multiple revisions, formatting and editing assistance, and a full grammatical check. Let us now look at data mining datasets.

Datasets for Data Mining

  • Sports databases, which are collections of data on various sports such as hockey, football, baseball, and basketball
  • A huge number of datasets (more than ten thousand)
  • Data from government, business, education, and entertainment
  • Open data status assessment
  • Space exploration data from NASA
  • Documents related to life science, space, astrophysics, and solar physics
  • Press releases and statistical yearbooks
  • Reports and data from multiple websites (around seventy)
  • Data from regions such as Latin America, Africa, Asia, and Europe
  • The European Union PASCAL2 network data repository
  • Advanced graphical data collection
  • Scientific computing, machine learning, and social science networking data
  • Market data access

Usually, students and scholars reach out to us for help in handling these datasets effectively. We have been providing extraordinary research help and support with all kinds of technologically updated data and authentic references. Data mining research proposal writing and thesis writing become easy with the help and support of our qualified writers. So you can confidently check out our services for your data mining research. Let us now talk about the future scope of data mining.

What is the future of data mining?

Data mining is among the most extensively utilized ways of extracting data from multiple sources and organizing it for optimal use. Organizations are obliged to keep up with all the new advancements in the world of data mining, which is evolving at a quick pace. Therefore, knowing the emerging trends in data mining is highly important; we discuss them below.

  • Ubiquitous data mining
  • Multimedia data mining
  • Distributed data mining
  • Time series data mining
  • Visual data mining
  • Exploration of potential applications
  • Interactive methods for data mining research
  • Scalability
  • Data mining language standardization
  • Database and data mining integration with web and data warehousing

Since data mining research is developing at such a fast rate, taking up projects in this field will give you extensive knowledge, experience, awareness, and scope for future study. Completing a research project does not end with project execution; it extends to writing the thesis, proposals, and assignments, and submitting research papers. In this regard, let us look into the most important part of the literature, the research proposal. First of all, what is a scientific research proposal?

  • A scientific research proposal is a document that proposes a research topic, usually in the scientific disciplines or academia, and usually serves as a request for funding.
  • Proposals are assessed based on the suggested study's cost and likely influence, as well as the feasibility of the suggested plan for carrying it out. In practice, research proposals cover the following questions:
  • How will the research findings be assessed?
  • How will the scientific research questions be handled, and how should they be managed?
  • What previous study has been carried out on the subject?
  • How much time and money will the research require?

Scientific research proposals are usually created on the basis of the key steps in drafting a thesis, academic paper, or dissertation. They usually have an abstract, a literature review, a description of the research methodology and objectives, and conclusions, just like a research paper. This fundamental structure may differ between initiatives and disciplines, with each having its own set of criteria. For all these aspects, our expert teams are here to guide you completely. Let us now look into the structure of a data mining research proposal.

  • Project title
  • Abstract writing
  • Project introduction (data mining research overview)
  • Background and project significance
  • Objectives of the study
  • Problem under discussion
  • Possible pitfalls if any
  • Literature survey
  • Operations and tasks of data mining
  • Databases and different datasets
  • Models and methods involved
  • Mathematical and analytical formulations
  • Complete structure
  • Application development
  • Expected outcome
  • Potential areas of applications
  • Execution timeline
  • Scope for future research

Ultimately, these are all the components of a research proposal. With separate teams of highly qualified experts, we are here to provide you with total support for your data mining research proposal. Get in touch with us to get any of your queries resolved.

MILESTONE 1: Research Proposal

Finalize journal (indexing).

Before sitting down to write the research proposal, we need to decide on the exact journal indexing, e.g., SCI, SCI-E, ISI, or Scopus.

Research Subject Selection

For a doctoral student, subject selection is a big problem. Phdservices.org has a team of world-class experts experienced in assisting with all subjects. When you decide to work in networking, we assign our experts in your specific area for assistance.

Research Topic Selection

We help you with the right and perfect topic selection, one that sounds interesting to the other members of your committee. For example, if your interest is in networking, the research topic could be VANET, MANET, or any other.

Literature Survey Writing

To ensure the novelty of the research, we find research gaps in 50+ of the latest benchmark papers (IEEE, Springer, Elsevier, MDPI, Hindawi, etc.).

Case Study Writing

After the literature survey, we identify the main issue/problem that your research topic will aim to resolve and provide elegant writing support to establish the relevance of the issue.

Problem Statement

Based on the research gaps found and the importance of your research, we formulate an appropriate and specific problem statement.

Writing Research Proposal

Writing a good research proposal needs a lot of time. We take only a short span to cover all major aspects (reference paper collection, deficiency finding, drawing the system architecture, highlighting novelty).

MILESTONE 2: System Development

Fix implementation plan.

We prepare a clear project implementation plan that narrates your proposal step by step and contains the software and OS specifications. We recommend suitable tools/software that fit your concept.

Tools/Plan Approval

We get approval for the implementation tool, software, and programming language, and finally for the implementation plan, before starting the development process.

Pseudocode Description

Our source code is original since we write the code only after preparing the pseudocode, algorithms, and mathematical equation derivations.

Develop Proposal Idea

We implement your novel idea in the step-by-step process given in the implementation plan. We can help scholars with implementation.

Comparison/Experiments

We perform comparisons between the proposed and existing schemes in both quantitative and qualitative terms, since this is the most crucial part of any journal paper.

Graphs, Results, Analysis Table

We evaluate and analyze the project results by plotting graphs, computing numerical results, and providing a broader discussion of the quantitative results in tables.

Project Deliverables

For every project order, we deliver the following: reference papers, source code, screenshots, project video, and installation and running procedures.

MILESTONE 3: Paper Writing

Choosing right format.

We intend to write the paper in a customized layout. If you are interested in any specific journal, we are ready to support you; otherwise, we prepare it at IEEE Transactions level.

Collecting Reliable Resources

Before paper writing, we collect reliable resources such as 50+ journal papers, magazines, news articles, encyclopedias (books), benchmark datasets, and online resources.

Writing Rough Draft

We first create an outline of the paper and then write under each heading and sub-heading. It incorporates the novel idea and the collected resources.

Proofreading & Formatting

We proofread and format the paper to fix typesetting errors and to avoid misspelled words, misplaced punctuation marks, and so on.

Native English Writing

We check the communication of the paper by having it rewritten by native English writers who completed their English literature studies at the University of Oxford.

Scrutinizing Paper Quality

We examine the paper quality with top experts who can easily fix issues in journal paper writing and also confirm the level of the journal paper (SCI, Scopus, or normal).

Plagiarism Checking

We at phdservices.org give a 100% guarantee of original journal paper writing. We never use previously published works.

MILESTONE 4: Paper Publication

Finding apt journal.

We play a crucial role in this step, since it is very important for the scholar's future. Our experts will help you choose high impact factor (SJR) journals for publishing.

Lay Paper to Submit

We organize your paper for journal submission, which covers the preparation of the authors' biography, cover letter, highlights of novelty, and suggested reviewers.

Paper Submission

We upload the paper and submit all prerequisites required by the journal. We completely remove the frustration from paper publishing.

Paper Status Tracking

We track your paper status, answer the questions raised before the review process, and give you frequent updates on your paper as they are received from the journal.

Revising Paper Precisely

When we receive the decision to revise the paper, we prepare a point-by-point response to address all reviewers' queries and resubmit it to secure final acceptance.

Get Accept & e-Proofing

We receive the final acceptance confirmation letter, and the editors send e-proofing and licensing to ensure originality.

Publishing Paper

The paper is published online, and we inform you of the paper title, author information, journal name, volume, issue number, page numbers, and DOI link.

MILESTONE 5: Thesis Writing

Identifying university format.

We pay special attention to your thesis writing, and our 100+ thesis writers are proficient and clear in writing theses in all university formats.

Gathering Adequate Resources

We collect primary and adequate resources for writing a well-structured thesis using published research articles, 150+ reputed reference papers, a writing plan, and so on.

Writing Thesis (Preliminary)

We write the thesis chapter by chapter without any empirical mistakes, and we provide a completely plagiarism-free thesis.

Skimming & Reading

Skimming involves reading the thesis and looking at the abstract, conclusions, sections and sub-sections, paragraphs, sentences, and words, and writing the thesis in the chronological order of the papers.

Fixing Crosscutting Issues

This step is tricky when a thesis is written by amateurs. Proofreading and formatting are done by our world-class thesis writers, who avoid verbosity and brainstorm for significant writing.

Organize Thesis Chapters

We organize the thesis chapters by completing the following: elaborating each chapter, structuring chapters, ensuring flow of writing, correcting citations, etc.

Writing Thesis (Final Version)

We pay attention to the details of the thesis contribution, a well-illustrated literature review, sharp and broad results and discussion, and a relevant applications study.

How does PhDservices.org deal with significant issues?

1. Novel Ideas

Novelty is essential for a PhD degree. Our experts bring novel ideas to your particular research area. Novelty can be determined only after a thorough literature search (state-of-the-art works published in IEEE, Springer, Elsevier, ACM, ScienceDirect, Inderscience, and so on). SCI and Scopus journal reviewers and editors will always demand novelty in each published work. Our experts have in-depth knowledge of all major research fields and sub-fields to introduce new methods and ideas. MAKING NOVEL IDEAS IS THE ONLY WAY OF WINNING A PHD.

2. Plagiarism-Free

To ensure the quality and originality of our work, we strictly avoid plagiarism, since plagiarism is unacceptable to the editors and reviewers of every type of journal (SCI, SCI-E, or Scopus). We check each document's similarity score with anti-plagiarism software, using tools such as Viper and Turnitin, so students and scholars receive work with zero tolerance for plagiarism. DON'T WORRY ABOUT YOUR PHD; WE WILL TAKE CARE OF EVERYTHING.

3. Confidential Info

We keep your personal and technical information secret, since confidentiality is a basic concern for all scholars.

  • Technical Info: We never share your technical details with any other scholar, because we know the value of the time and resources scholars entrust to us.
  • Personal Info: Access to scholars' personal details is restricted; only our organization's leading team holds the basic information necessary to serve you.

CONFIDENTIALITY AND PRIVACY OF THE INFORMATION WE HOLD ARE OF VITAL IMPORTANCE AT PHDSERVICES.ORG. WE ARE HONEST WITH ALL CUSTOMERS.

4. Publication

Most PhD consultancy services end with paper writing, but PhDservices.org is different: we guarantee both paper writing and publication in reputed journals. With 18+ years of experience delivering PhD services, we meet all the requirements of journals (reviewers, editors, and editors-in-chief) for rapid publication. We lay the groundwork from the very beginning of paper writing. PUBLICATION IS THE ROOT OF A PHD DEGREE, AND WE ARE LIKE THE FRUIT, GIVING A SWEET FEELING TO ALL SCHOLARS.

5. No Duplication

After your work is complete, it is not retained in our library: we erase it once your PhD work is finished, so we never give duplicated content to scholars. This practice pushes our experts to keep producing new ideas, applications, methodologies, and algorithms, and keeps our work standard, high-quality, and universal. Everything we make is new for every scholar. INNOVATION IS THE ABILITY TO SEE ORIGINALITY. EXPLORATION IS THE ENGINE THAT DRIVES INNOVATION, SO LET'S ALL GO EXPLORING.

Client Reviews

I ordered a research proposal in the area of Wireless Communications, and it was as good as I could have hoped.

I wanted the implementation done with the latest software and tools but had no idea where to order it. My friend suggested this place, and it delivered exactly what I expected.

It is a really good platform for all PhD services, and I have used it many times because of the reasonable prices, excellent customer service, and high quality.

My colleague recommended this service to me, and I'm delighted with it. They guided me a lot and provided worthy content for my research paper.

I have never been disappointed by any of their services. I am still working with their professional writers and getting lots of opportunities.

- Christopher

Once I joined this organization I felt relaxed, because many of my colleagues and family members had suggested this service, and I received excellent thesis writing.

I recommend phdservices.org. They have professional writers for every type of writing (proposal, paper, thesis, assignment) at an affordable price.

You guys did a great job and saved me money and time. I will keep working with you, and I recommend you to others as well.

These experts are fast, knowledgeable, and dedicated to working under short deadlines. I got a good conference paper in a short span of time.

Guys! You are great, real experts at paper writing; the result exactly matched my requirements. I will approach you again.

I am fully satisfied with the thesis writing. Thank you for your faultless service; I will come back again soon.

You offer trustworthy customer service. I don't have any cons to report.

I was on the edge of missing my doctorate graduation because my thesis was a set of totally unconnected chapters. You people worked magic, and I got my complete thesis!!!

- Abdul Mohammed

A good, family-like environment with collaboration, and a hardworking team who genuinely share their knowledge through their PhD services.

I greatly enjoyed working with PhD services. I asked several questions about my system development and was impressed by their smoothness, dedication, and care.

I had not provided any specific requirements for my proposal, but you guys are awesome: I received a proper proposal anyway. Thank you!

- Bhanuprasad

I read my entire research proposal, and the concept suits my research issues. Thank you so much for your efforts.

- Ghulam Nabi

I am extremely happy with your project development support; the source code is easy to understand and execute.

Hi!!! You guys supported me a lot. Thank you; I am 100% satisfied with the publication service.

- Abhimanyu

I found this to be a wonderful platform for scholars, so I highly recommend this service to everyone. I ordered a thesis proposal and they covered everything. Thank you so much!!!

Related Pages

Data Mining Research Proposal

Data Mining Research Proposal experts guide you through every step of your research, from crafting an introduction to defining the problem statement, establishing the significance of your research, setting aims and objectives, conducting a literature review, formulating research questions, selecting research methods, developing hypotheses, creating an analytical framework, and gathering data from various sources. Our team at phdprojects.org is here to assist you throughout the process.

Writing an efficient research proposal is a fascinating yet somewhat complicated task that involves several major steps. As an extensive example, covering both issues and suggested solutions, we provide a research proposal concentrated on data mining in healthcare:

Research Proposal: Enhancing Predictive Analytics for Early Disease Detection in Healthcare Using Data Mining

  • Introduction

Context and Background: Healthcare systems generate huge amounts of data from sources such as medical imaging, electronic health records (EHRs), and patient monitoring systems. Analyzing this data effectively can reduce healthcare costs, enable early disease detection, and improve patient outcomes. However, the volume and complexity of healthcare data pose crucial obstacles to extracting meaningful insights.

Problem Description: Current predictive analytics systems for early disease detection face problems of data quality, scalability, and interpretability. These limitations hinder the effective use of data mining approaches in healthcare and lead to insufficient early diagnosis and treatment.

Research Objectives:

  • Develop effective data preprocessing approaches to manage data quality problems.
  • Develop scalable data mining methods capable of handling huge healthcare datasets.
  • Improve the interpretability of predictive models so that they are practical for healthcare providers.
  • Literature Review

Current State of Research:

  • Data Quality in Healthcare: Research highlights the prevalence of missing, noisy, and unreliable data in healthcare, which complicates predictive analytics.
  • Scalability Issues: Existing data mining systems fail to scale effectively with the growing size and complexity of healthcare data.
  • Model Interpretability: Many accurate predictive models, particularly those based on deep learning, lack transparency, making it hard for clinicians to trust and act on their findings.

Research Gaps:

  • Research on comprehensive data preprocessing frameworks appropriate for healthcare is insufficient.
  • Scalable methods are needed to process and analyze large-scale healthcare data effectively.
  • Effective approaches for improving the interpretability of complex predictive models are inadequate.
  • Research Questions
  • How can data preprocessing approaches be improved to solve common data quality problems in healthcare datasets?
  • What scalable data mining methods can be developed to handle large, complex healthcare datasets efficiently?
  • How can the interpretability of predictive models be improved to make them more useful for healthcare providers?
  • Proposed Methodology
  • Proposed Methodology

Data Preprocessing:

Issue: Healthcare data is often unreliable, incomplete, and noisy, which can adversely affect the performance of predictive models.

Suggested Solution: We construct a comprehensive data preprocessing framework covering the following factors:

  • Data Cleaning: Apply advanced imputation approaches, such as k-Nearest Neighbors (k-NN) imputation, to fill in missing data.
  • Noise Filtering: Employ anomaly detection techniques to detect and correct noisy data points.
  • Normalization and Standardization: Apply suitable normalization methods to ensure consistency across different data sources.

Approaches:

  • Imputation Algorithms: Expectation-Maximization, k-NN.
  • Noise Filtering: Robust Principal Component Analysis (PCA), Isolation Forest.
  • Normalization: Z-score Standardization, Min-Max Scaling.
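To make the three preprocessing steps above concrete, here is a minimal sketch, assuming Python with scikit-learn; the tiny patient matrix and its values are purely illustrative, not part of the proposal itself:

```python
# Minimal preprocessing sketch (illustrative data, assumes scikit-learn).
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import MinMaxScaler

# Toy patient matrix: two features, one missing value, one obvious outlier.
X = np.array([[70.0, 120.0],
              [80.0, np.nan],
              [75.0, 118.0],
              [300.0, 400.0]])

# 1. Data cleaning: fill the missing entry with k-NN imputation.
X_imputed = KNNImputer(n_neighbors=2).fit_transform(X)

# 2. Noise filtering: Isolation Forest marks anomalous rows with -1.
labels = IsolationForest(contamination=0.25, random_state=0).fit_predict(X_imputed)
X_clean = X_imputed[labels == 1]

# 3. Normalization: Min-Max scaling maps each feature into [0, 1].
X_scaled = MinMaxScaler().fit_transform(X_clean)
```

In a real pipeline the imputer, filter, and scaler would be fit on training data only and then applied unchanged to new records.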

Data Mining Algorithms:

Issue: Existing methods fail to scale efficiently with the size of healthcare datasets, which limits their use in real-world applications.

Suggested Solution: We create scalable data mining methods, concentrating on the aspects below:

  • Distributed Data Mining: Use frameworks such as Apache Spark to distribute the processing of huge datasets.
  • Incremental Learning: Apply methods that update models progressively as new data arrives, without requiring complete retraining.
  • Efficient Feature Selection: Create techniques for choosing the most significant features, which reduces the dimensionality of the data and significantly improves algorithm efficiency.
  • Distributed Algorithms: Hadoop MapReduce, Spark MLlib.
  • Incremental Learning: Incremental PCA, Online Gradient Descent.
  • Feature Selection: Lasso Regression, Recursive Feature Elimination (RFE).

Model Interpretability:

Issue: Complex predictive models, especially those employing deep learning, are difficult to interpret, which limits their adoption and approval by healthcare experts.

Suggested Solution: We improve model interpretability from the following perspectives:

  • Explainable AI Techniques: Apply approaches such as LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations) to offer insight into model forecasts.
  • Simplified Models: Construct simplified versions of complex models that trade a little accuracy for much greater interpretability.
  • Visualization Tools: Develop visualization tools that let healthcare experts investigate and interpret model outputs.
  • Explainable AI: Integrated Gradients, SHAP, LIME.
  • Simplified Models: Rule-Based Systems, Decision Trees.
  • Visualization: Model-specific visualization tools, Interactive dashboards.
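One way to picture the "simplified models" strategy above is a surrogate tree: fit a black-box model, then train a shallow decision tree to mimic its predictions. The sketch below assumes scikit-learn and synthetic data; it illustrates the general technique, not the proposal's exact method:

```python
# Surrogate-tree sketch of the "simplified models" idea (assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=6, random_state=0)

# A black-box predictive model standing in for a complex clinical model.
black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Interpretable surrogate: a shallow tree trained to imitate the black box.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

# Fidelity: fraction of inputs on which surrogate and black box agree.
fidelity = (surrogate.predict(X) == black_box.predict(X)).mean()
rules = export_text(surrogate)  # plain-text if/else rules a clinician can read
```

High fidelity means the readable rules are a faithful summary of the black box; low fidelity signals that a depth-3 tree is too simple for this model.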
  • Expected Outcomes
  • Improved Data Quality: An effective preprocessing framework capable of cleaning and preparing healthcare data for analysis.
  • Scalable Data Mining Algorithms: New or improved methods for handling huge healthcare datasets, yielding efficient and precise disease detection.
  • Enhanced Model Interpretability: Predictive models that are both accurate and intelligible, offering practical insight to healthcare experts.
  • Evaluation and Validation

Evaluation Metrics:

  • Data Quality Improvement: Assess parameters such as data completeness, consistency, and accuracy before and after preprocessing.
  • Model Performance: Evaluate the usual metrics (accuracy, precision, recall, and F1-score) as well as computational efficiency and scalability.
  • Model Interpretability: Measure the time healthcare experts need to interpret model outputs, along with quantitative interpretability criteria and user feedback.
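For concreteness, the standard performance metrics named above can be computed as follows; this is a minimal sketch assuming scikit-learn, with made-up label vectors for a binary disease-detection task:

```python
# Standard classification metrics (illustrative labels, assumes scikit-learn).
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]  # actual disease status
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]  # model predictions

accuracy = accuracy_score(y_true, y_pred)    # fraction of correct predictions
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall
```

With one false positive and one false negative in ten cases, all four metrics here equal 0.8.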

Validation Approach:

  • Data Quality Validation: Compare data quality parameters before and after applying the preprocessing approaches on real-world healthcare datasets.
  • Algorithm Validation: Assess the efficiency and scalability of the constructed methods on large healthcare datasets from resources such as MIMIC-III.
  • Interpretability Validation: Conduct user studies with healthcare experts to evaluate the usability and interpretability of the proposed models and visualization tools.

I have to do a final-year project on data mining for healthcare, but I am finding it difficult to get a dataset. Where can I find one?

Choosing an efficient, suitable dataset is challenging as well as intriguing. Below are a few reliable resources where you can find healthcare datasets:

Publicly Available Healthcare Datasets

  • Kaggle
  • Explanation: Kaggle hosts several datasets across different fields, including healthcare. It is an excellent environment for datasets and associated projects.
  • Instances of Datasets:
  • Diabetes Health Indicators Dataset
  • COVID-19 Open Research Dataset (CORD-19)
  • Heart Disease Data Set
  • URL: Kaggle Datasets
  • UCI Machine Learning Repository
  • Explanation: The UCI repository is one of the earliest and most popular sources of datasets, encompassing a variety of healthcare datasets.
  • Parkinson’s Disease Data Set
  • Breast Cancer Wisconsin (Diagnostic) Data Set
  • URL: UCI Machine Learning Repository
  • PhysioNet
  • Explanation: PhysioNet provides open access to a huge set of logged physiological signals and related data, making it ideal for data mining in healthcare.
  • ICU data for patient monitoring
  • MIMIC-III Clinical Database (Medical Information Mart for Intensive Care)
  • PhysioBank: Contains ECG, EEG, and other physiological data
  • URL: PhysioNet
  • National Institutes of Health (NIH)
  • Explanation: NIH offers a variety of healthcare and biomedical datasets that are publicly accessible for research use.
  • The Cancer Imaging Archive (TCIA)
  • Genomic Data Commons (GDC) Data Portal
  • National Cancer Institute (NCI) Genomic Data Commons
  • URL: NIH Data Sharing
  • MIMIC-III and MIMIC-IV (Medical Information Mart for Intensive Care)
  • Explanation: MIMIC-III and MIMIC-IV contain extensive data from ICU patients, including vital signs, demographics, and lab results.
  • Clinical notes and diagnostic codes
  • Patient data from intensive care units
  • Vital signs and laboratory measurements
  • URL: MIMIC-III, MIMIC-IV
  • The Cancer Imaging Archive (TCIA)
  • Explanation: TCIA offers a huge, publicly downloadable collection of medical images of cancer, which is helpful for building data mining applications in medical imaging.
  • Breast cancer screening images
  • Lung cancer screening images
  • Brain tumor images
  • The Health Data Repository from Google Cloud
  • Explanation: Google Cloud offers access to different healthcare datasets appropriate for analysis and model training.
  • Clinical trials data
  • COVID-19 Open Data
  • Genomics and cancer research data
  • URL: Google Cloud Public Datasets
  • OpenML
  • Explanation: OpenML is an openly available environment for exchanging datasets, methods, and machine learning experiments, and it includes healthcare datasets.
  • Diabetes classification dataset
  • Sepsis survivor data
  • Breast cancer diagnostic data
  • URL: OpenML Healthcare Datasets

Academic and Governmental Datasets

  • Centers for Disease Control and Prevention (CDC)
  • Explanation: The CDC offers a variety of datasets related to public health and healthcare.
  • National Hospital Ambulatory Medical Care Survey (NHAMCS)
  • National Health and Nutrition Examination Survey (NHANES)
  • Behavioral Risk Factor Surveillance System (BRFSS)
  • URL: CDC Data & Statistics
  • World Health Organization (WHO)
  • Explanation: WHO offers global health data that can be used for research in epidemiology and public health.
  • Disease incidence and mortality data
  • Global Health Observatory data repository
  • Health indicators and statistics
  • URL: WHO Global Health Observatory
  • European Union Open Data Portal
  • Explanation: The EU Open Data Portal hosts a wide range of data produced by EU institutions, including clinical and healthcare datasets.
  • Healthcare access and quality data
  • Eurostat health data
  • ECDC COVID-19 data
  • URL: EU Open Data Portal
  • ClinicalTrials.gov
  • Explanation: ClinicalTrials.gov maintains a database of publicly and privately sponsored clinical studies conducted around the world.
  • Intervention and control data
  • Data from completed and ongoing clinical trials
  • Study outcomes and patient demographics

Data Mining Research Proposal Topics & Ideas

Data Mining Research Proposal Topics & Ideas: Above, we offered a comprehensive example of a research proposal on data mining in healthcare, along with reliable sources for finding suitable healthcare datasets. The topics listed below should also be beneficial:

  • Research on Improved Data-Mining Algorithm Based on Strong Correlation
  • Application of data mining in the analysis of needs of university library users
  • Data mining with inference networks
  • Intelligent data mining principles with privacy preserving procedures
  • Diagnostics of bar and end-ring connector breakage faults in polyphase induction motors through a novel dual track of time-series data mining and time-stepping coupled FE-state space modeling
  • The use of independent component analysis as a tool for data mining
  • An Evolutionary Data Mining Model for Fuzzy Concept Extraction
  • Visual Data Mining of SARS Distribution Using Self-Organization Maps
  • An empirical study of applying data mining techniques to the prediction of TAIEX Futures
  • An Intelligent Traffic Monitoring Embedded System using Video Data Mining
  • Data Mining Used in Rule Design for Active Database Systems
  • Data mining and automatic OLAP schema generation
  • An intelligent framework for protecting privacy of individuals empirical evaluations on data mining classification
  • Efficient analysis of pharmaceutical compound structure based on pattern matching algorithm in data mining techniques
  • Mapping Rules Based Data Mining for Effective Decision Support Application
  • Data Mining in The NBA: An Applied Approach
  • Prediction of Tumor in Mammogram Images Using Data Mining Models
  • A Review on Privacy-Preserving Data Mining
  • An IoT inspired semiconductor Reliability test system integrated with data-mining applications
  • Optimizing Data Mining Efficiency in Professional Farmer Simulation Training System with Cloud-Edge Collaboration
  • Using Genetic Algorithm for Data Mining Optimization in an Image Database
  • Data Mining and Fusion of Unobtrusive Sensing Solutions for Indoor Activity Recognition
  • Datawarehouse design for educational data mining
  • Data Mining Application Based on Cloud Model in Spatial Decision Support System
  • Very Short-Term Estimation of Global Horizontal Irradiance Using Data Mining Methods
  • Data Mining Technology Assists in The Construction of The Influencing Factor Model of Learners’ Satisfaction in Offline Online and Offline Hybrid Golden Courses
  • The Neural Network Algorithm for Data-Mining in Dynamic Environments
  • Targeting customers with data mining techniques: Classification
  • Research on the application of data mining to customer relationship management in the mobile communication industry
  • Construction of “One Belt and One Road” Intelligent Analysis System Based on Cloud Model Data Mining Algorithm

Data Mining Research Proposal


A data mining research proposal is one that outlines the intended findings of a project using the tools provided by data mining. Data mining is a research activity that involves collecting data and then deriving the requisite findings from it. It can be of several types, such as business data mining or academic data mining.

Sample Data Mining Research Proposal

Proposal compiled by: Grocery Supermarket Chain Pvt. Ltd.

Nature of proposal: We used the data mining capabilities of SPSS and Oracle software to analyze market trends and buying patterns. Using this technique of data mining and analysis, we discovered several interesting phenomena:

  • Women are most likely to complete their grocery shopping on Tuesdays and Fridays.
  • Men are most likely to complete their grocery shopping on Fridays and Saturdays.

Data used: We analyzed customer buying patterns over a period of three years; this data was then fed into our software, which is how we identified these phenomena.

Benefits of such a data mining research proposal:

  • The results of this project give us insight into the buying patterns of men and women, which will prompt new considerations in marketing campaigns.
  • The findings support the theory of differential buying patterns on part of the two sexes.
  • The findings are interesting and make for amusing as well as entertaining reading.
  • The findings can be used for other statistical and data mining efforts.

Cost of data mining project: $2,090,000


Data Mining Project Proposal Template

  • Great for beginners
  • Ready-to-use, fully customizable
  • Get started in seconds


Are you ready to uncover valuable insights and make data-driven decisions? Look no further than ClickUp's Data Mining Project Proposal Template!

Crafting a winning data mining project proposal can be a daunting task, but with this template, you'll have everything you need to succeed.

With ClickUp's Data Mining Project Proposal Template, you can:

  • Clearly define project objectives, scope, and deliverables
  • Lay out a detailed timeline and allocate resources efficiently
  • Present your data mining approach and methodology with confidence

No matter the size or complexity of your data mining project, this template will guide you every step of the way. Get started today and turn data into actionable insights!

Benefits of Data Mining Project Proposal Template

Data mining is a powerful tool for extracting valuable insights from large datasets. By using the Data Mining Project Proposal Template, you can:

  • Clearly outline the objectives, scope, and timeline of your data mining project
  • Identify the specific data sources and variables that will be analyzed
  • Define the methodologies and techniques that will be used to extract and analyze the data
  • Present a comprehensive plan for data visualization and reporting
  • Ensure that your data mining project is aligned with your organization's goals and objectives
  • Streamline the proposal process and save time and effort in creating a professional and persuasive document.

Main Elements of Data Mining Project Proposal Template

ClickUp's Data Mining Project Proposal template is designed to help you efficiently plan and execute your data mining projects. Here are the main elements of this template:

  • Custom Statuses: Keep track of the progress of your data mining projects with two customizable statuses - Open and Complete.
  • Custom Fields: Utilize custom fields to capture important information related to your data mining projects, such as project objectives, data sources, target audience, and more.
  • Whiteboard View: Visualize your project proposal and brainstorm ideas using the Whiteboard view. Collaborate with your team, add sticky notes, and create a dynamic workspace for your data mining project.
  • Project Proposal View: Use the Project Proposal view to outline the scope, deliverables, timeline, and resources required for your data mining project. Easily share this view with stakeholders for approval and alignment.
  • Getting Started Guide View: Access a comprehensive guide that provides step-by-step instructions and best practices to help you get started with your data mining project. This view serves as a reference point for your team throughout the project lifecycle.

How to Use Project Proposal for Data Mining

If you're ready to dive into a data mining project, use these 6 steps to effectively utilize the Data Mining Project Proposal Template in ClickUp:

1. Define your project objectives

Start by clearly defining the objectives of your data mining project. What specific insights or outcomes do you hope to achieve? Whether it's identifying patterns, predicting trends, or optimizing processes, having a clear understanding of your goals will guide your entire project.

Use the Goals feature in ClickUp to outline and track your project objectives.

2. Gather and prepare your data

To begin your data mining project, you'll need to gather and prepare the necessary data. Identify the sources of data you'll be using and ensure that it is clean, organized, and relevant to your project objectives. This may involve data cleaning, data merging, or data transformation.

Utilize the Table view in ClickUp to organize and manage your data collection process.

3. Select the appropriate data mining techniques

Based on your project objectives and the nature of your data, choose the most appropriate data mining techniques to apply. This could include methods such as classification, clustering, regression, or association rule mining. Selecting the right techniques will ensure that you extract valuable insights from your data.

Create custom fields in ClickUp to track and document the specific data mining techniques you plan to use.
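As a rough illustration of two of the techniques named in this step, classification (labels known) versus clustering (labels discovered), here is a sketch assuming Python with scikit-learn and toy data; it is not tied to any particular ClickUp project:

```python
# Classification vs. clustering on toy data (assumes scikit-learn).
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

X, y = make_blobs(n_samples=300, centers=3, random_state=0)

# Classification: labels y are known; learn to predict them.
clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)
train_acc = clf.score(X, y)

# Clustering: ignore y and discover group structure from X alone.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
cluster_ids = km.labels_
```

Which family applies depends on whether your project objectives supply ground-truth labels; regression and association rule mining follow the same decision.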

4. Develop a project timeline

Creating a project timeline is crucial for keeping your data mining project on track. Break down your project into smaller tasks and assign realistic deadlines to each task. This will help you stay organized and ensure that you're making progress towards your objectives.

Use the Gantt chart view in ClickUp to visualize and manage your project timeline effectively.

5. Implement and evaluate your models

Now it's time to implement the selected data mining techniques and build your models. Apply the chosen algorithms and analyze the results to gain insights and make data-driven decisions. Evaluate the performance of your models and fine-tune them as needed to improve accuracy and effectiveness.

Leverage the Automations feature in ClickUp to streamline and automate repetitive tasks during the implementation and evaluation process.

6. Present your findings and recommendations

Finally, prepare a comprehensive report that presents your findings, insights, and recommendations based on the results of your data mining project. Clearly communicate the implications and potential benefits of your findings to stakeholders, and outline any further actions that should be taken based on your conclusions.

Use the Docs feature in ClickUp to create visually appealing and informative reports to share with your team and stakeholders.

By following these 6 steps and utilizing the Data Mining Project Proposal Template in ClickUp, you'll be well-equipped to embark on a successful data mining project and uncover valuable insights from your data.


Get Started with ClickUp's Data Mining Project Proposal Template

Data analysts and researchers can use this Data Mining Project Proposal Template to help everyone stay on the same page when it comes to planning and executing data mining projects.

First, hit “Get Free Solution” to sign up for ClickUp and add the template to your Workspace. Make sure you designate which Space or location in your Workspace you’d like this template applied.

Next, invite relevant members or guests to your Workspace to start collaborating.

Now you can take advantage of the full potential of this template to mine valuable insights from data:

  • Use the Project Proposal View to outline the goals, objectives, and scope of the data mining project
  • The Getting Started Guide View will provide step-by-step instructions on how to set up the project and gather relevant data
  • Organize tasks into two different statuses: Open and Complete, to keep track of progress
  • Update statuses as you complete each task to ensure transparency and accountability
  • Assign tasks to team members and set deadlines to ensure timely completion
  • Collaborate with team members and stakeholders to analyze and interpret the data
  • Monitor and analyze progress to ensure the project is on track and deliver actionable insights



T4Tutorials.com

Data Mining Research Topics for MS PhD


I am sharing some data mining research topics that you can choose for the research proposal of your MS or Ph.D. thesis work.

This tutorial categorizes the research into three areas: industry-based research in data mining, problem-based research in data mining, and topic-based research in data mining.

  • 900+ research ideas in data mining

A list of some well-known industries in the world for industry-based research in data mining:

  • Automobile Wholesaling
  • Pharmaceuticals Wholesaling
  • Life Insurance & Annuities
  • Online Computer Software Sales
  • Supermarkets & Grocery Stores
  • Electric Power Transmission
  • IT Consulting
  • Wholesale Trade Agents and Brokers
  • Retirement & Pension Plans
  • Petroleum Refining
  • New Car Dealers
  • Drug, Cosmetic & Toiletry Wholesaling
  • Pharmacy Benefit Management
  • Property, Casualty and Direct Insurance
  • Colleges & Universities
  • Public Schools
  • Warehouse Clubs & Supercenters
  • Health & Medical Insurance
  • Gasoline & Petroleum Wholesaling
  • Gasoline & Petroleum Bulk Stations
  • Commercial Banking
  • Real Estate Loans & Collateralized Debt
  • E-Commerce & Online Auctions
  • Electronic Part & Equipment Wholesaling

A list of some problems and study topics for research in data mining:

  • Crime Rate Prediction
  • Fraud Detection
  • Website Evaluation
  • Market Analysis
  • Financial Analysis
  • Customer trend analysis
  • Data Warehouse and DBMS
  • Multidimensional data model
  • OLAP operations
  • Example: loan data set
  • Data cleaning
  • Data transformation
  • Data reduction
  • Discretization and generating concept hierarchies
  • Installing Weka 3 Data Mining System
  • Experiments with Weka – filters, discretization
  • Task relevant data
  • Background knowledge
  • Interestingness measures
  • Representing input data and output knowledge
  • Visualization techniques
  • Experiments with Weka – visualization
  • Attribute generalization
  • Attribute relevance
  • Class comparison
  • Statistical measures
  • Experiments with Weka – using filters and statistics
  • Motivation and terminology
  • Example: mining weather data
  • Basic idea: item sets
  • Generating item sets and rules efficiently
  • Correlation analysis
  • Experiments with Weka – mining association rules
  • Basic learning/mining tasks
  • Inferring rudimentary rules: 1R algorithm
  • Decision trees
  • Covering rules
  • Experiments with Weka – decision trees, rules
  • The prediction task
  • Statistical (Bayesian) classification
  • Bayesian networks
  • Instance-based methods (nearest neighbor)
  • Linear models
  • Experiments with Weka – Prediction
  • Basic issues in clustering
  • First conceptual clustering system: Cluster/2
  • Partitioning methods: k-means, expectation-maximization (EM)
  • Hierarchical methods: distance-based agglomerative and divisive clustering
  • Conceptual clustering: Cobweb
  • Experiments with Weka – k-means, EM, Cobweb
  • Text mining: extracting attributes (keywords), structural approaches (parsing, soft parsing).
  • Bayesian approach to classifying text
  • Web mining: classifying web pages, extracting knowledge from the web
  • Data Mining software and applications
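Several of the techniques listed above are small enough to sketch directly. For example, the 1R algorithm mentioned under "Inferring rudimentary rules" picks the single attribute whose per-value majority-class rules make the fewest training errors. The sketch below is a minimal implementation; the toy rows are illustrative, loosely modeled on the classic weather dataset also referenced in the list.

```python
from collections import Counter, defaultdict

def one_r(rows, attributes, target):
    """Holte's 1R: choose the attribute whose value -> majority-class
    rules misclassify the fewest training rows."""
    best = None
    for attr in attributes:
        # tally class counts for each value of this attribute
        by_value = defaultdict(Counter)
        for row in rows:
            by_value[row[attr]][row[target]] += 1
        # rule for each value = majority class; errors = everything else
        rules = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(sum(c.values()) - c.most_common(1)[0][1]
                     for c in by_value.values())
        if best is None or errors < best[1]:
            best = (attr, errors, rules)
    return best  # (attribute, training errors, value -> class rules)

# Toy slice of weather-style data (illustrative, not the full dataset)
weather = [
    {"outlook": "sunny",    "windy": "false", "play": "no"},
    {"outlook": "sunny",    "windy": "true",  "play": "no"},
    {"outlook": "overcast", "windy": "false", "play": "yes"},
    {"outlook": "rainy",    "windy": "false", "play": "yes"},
    {"outlook": "rainy",    "windy": "true",  "play": "no"},
]
attr, errs, rules = one_r(weather, ["outlook", "windy"], "play")
print(attr, errs, rules)
```

On this toy data, "outlook" yields clean rules for sunny and overcast and errs only on the split rainy rows; Weka's `OneR` classifier implements the same idea with additional handling for numeric attributes.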




SlideTeam


Top 10 Big Data Proposal Templates With Samples and Examples


Divyendu Rai


Big data projects encompass several critical phases, including data collection, data modeling, model development, and testing and evaluation. In today's data-driven world, the successful execution of a big data project hinges on a well-crafted proposal: one that secures funding, aligns stakeholders, and guides your team toward successful implementation.

To assist you in this endeavor, we have compiled a list of the top 10 Big Data Proposal Templates, complete with samples and examples. These templates are designed to streamline the proposal creation process, ensuring that your proposal effectively conveys the scope, objectives, and methodology of your project. Whether you're an experienced data scientist or a newcomer to the field, these slides will serve as valuable resources to help you communicate your ideas and secure the support you need for your venture.

To get started on your journey to creating a winning Big Data proposal, explore these templates at the link.

Each template is a treasure trove of insights and guidance, offering you a roadmap to success in the realm of Big Data. The slides are content-ready and 100% editable to give you structure and convenience. In the following sections, we will delve into some of these templates, providing you with a glimpse of the comprehensive tools available to optimize your proposal development process.

Template 1: Big Data Project Proposal Report Sample Template Deck

This template deck is a valuable resource for IT project management companies seeking to meet the unique requirements of their clients and close more deals. Our template offers an in-depth project overview, complete with context and proposed solutions. It also outlines preliminary requirements and our approach to fulfilling your project needs. With a work breakdown structure detailing task names, durations, start and finish dates, we've got every detail covered. Worried about the budget? Our proposal includes a cost estimation for the entire project. Plus, it even covers contract terms and the sign-off page. It's the complete package for expanding your business and boosting your IT project management. Ready to gain that competitive edge in your market? Download our proposal sample today!
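A work breakdown structure of the kind described above is straightforward to model: each task carries a duration, and start/finish dates and a cost estimate fall out of simple date arithmetic. The sketch below is purely illustrative; all task names, dates, and rates are hypothetical, not taken from the template deck.

```python
from datetime import date, timedelta

# Hypothetical work breakdown structure:
# (task name, duration in days, assumed daily cost in USD)
wbs = [
    ("Project overview & context", 3, 800),
    ("Preliminary requirements",   5, 800),
    ("Model development",          10, 1200),
    ("Testing and evaluation",     4, 1000),
]

start = date(2024, 3, 4)          # illustrative project kickoff
schedule, total_cost = [], 0
for name, days, rate in wbs:
    finish = start + timedelta(days=days - 1)   # finish date is inclusive
    schedule.append((name, start, finish))
    total_cost += days * rate                   # naive cost estimation
    start = finish + timedelta(days=1)          # next task starts the next day

for name, s, f in schedule:
    print(f"{name}: {s} -> {f}")
print("Estimated cost:", total_cost)
```

A real WBS would account for weekends, dependencies, and parallel tasks, but even this naive chaining shows how durations alone determine the finish dates and budget line a proposal needs.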

Big Data Project Proposal

Download Now!

Template 2: Cover Letter for Big Data Project Proposal

Introducing the cover letter template – your key to crafting a compelling project pitch. This slide offers a clear structure for making a strong first impression, simplifying the task of conveying your big data project's significance. Just insert your details, and you're ready to impress stakeholders and potential collaborators. Whether you're an experienced professional or new to the game, this tool ensures that your proposal shines. Save time and energy, focus on your project, and let our template guide you in creating a persuasive cover letter. Elevate your big data project with a strong introduction – start with our template today!

Cover letter for big data project proposal

Template 3: Context and Solution for Big Data Project Proposal Template

Crafted for success, this template provides the essential structure for your project's context and proposed solutions. With this comprehensive resource at your disposal, you can streamline your proposal creation and secure the crucial support you need. With this template, you unlock the doors to funding, alignment of stakeholders, and the precise guidance required for a triumphant Big Data venture. It's your chance to elevate your project presentation, ensuring that your vision is not only heard but understood. Download it now and witness your project's potential come to life!

Context and solution for big data project proposal

Template 4: Our Approach for Big Data Project Proposal Template

Crafted with precision, this template simplifies the proposal process. It covers essential sections, from analysis to requirements development, model creation, thorough testing and evaluation, and seamless project delivery. With our template, you're equipped to impress stakeholders, secure funding, and streamline your Big Data project journey. Elevate your proposals with this time-saving, professional resource today!

Our approach for big data project proposal

Template 5: Activity Flowchart for Big Data Project Proposal Template

Elevate your Big Data project proposals with our Activity Flowchart template. This tool offers a structured roadmap through the crucial project phases: from defining business requirements, efficient data collection, and thorough data preparation, to seamless production, meticulous model testing, and the art of data modeling. It's a key ingredient in proposals that not only impress but also secure the support and resources your projects need. Streamline your Big Data endeavors, embrace a world of possibilities, and embark on a journey to success. The future of your projects starts here – get started today!

Activity flowchart for big data project proposal

Template 6: Company Overview for Big Data Project Proposal Template

This PPT Slide streamlines the proposal process, providing a clear framework for presenting your mission, vision, and project details. Our pre-made template empowers you to showcase your success so far. Maximize your impact and save valuable time with this comprehensive resource. Elevate your proposals and drive your projects forward with our template. 

Company overview for big data project proposal

Template 7: Our Team for Big Data Project Proposal Template

Introducing our "Team Dynamics" template. This invaluable tool simplifies the process of introducing your team, providing not just names but also designations and roles, ensuring that your proposal exudes professionalism and clarity. Impress stakeholders and elevate your project communication, enhancing your chances of sealing the deal. With our template, you'll stand out in the competitive Big Data landscape, showcasing your team's expertise and capabilities. Don't miss out on the opportunity to advance in the world of Big Data – give your proposals the edge they deserve and use our template today!

Our team for big data project proposal (1/2)

Template 8: Terms and Conditions for Big Data Project Proposal Template

Within its structured framework, this PPT Preset addresses pivotal elements including service terms, payment procedures, cancellation policies, and modification guidelines. This meticulous approach guarantees a level of clarity and legal safeguard that's invaluable to all involved stakeholders. With our template, managing project agreements becomes easy, removing the complexities that often surround such documentation. It empowers you to confidently navigate the intricacies of your proposals, fostering trust, and ensuring a strong foundation for successful collaborations.

Click the link to get your hands on the best app development project proposal to secure funding.

Terms & conditions for big data project proposal

Download Now

Template 9: Sign-off for Big Data Project Proposal Template

Ensure client agreement on deliverables, services, and contract terms with this PPT Layout. Presenting your proposal with this PPT Deck helps your client appreciate the working relationship with your company. Once signed, it officially approves the terms and conditions, aligning both parties for success. Tailor it to your specific project, add those crucial details, and complete the process with signatures. Download now for clarity, trust, and a smooth journey toward excellence.

Sign-off for big data project proposal

Template 10: Project Roadmap for Big Data Project Proposal Template

This dynamic resource lays out a solid structure, not just for the present, but for the years ahead as well.  With it, you can optimize your proposal, ensuring every element aligns with your project's core objectives. It serves as your guiding light through the complexities of Big Data projects, offering a structured path towards your goals. From inception to future milestones, this template is your key to unlocking success in the dynamic world of data. Don't wait – start charting your course to triumph today and make your data-driven dreams a reality!

Project roadmap for big data project proposal

Wrapping it Up!

In conclusion, the journey through our top 10 Big Data proposal templates has been an illuminating exploration into the world of data-driven projects. We've delved into the intricacies of crafting compelling proposals, a critical step in securing funding and garnering support for your data initiatives. The templates we've showcased provide a comprehensive framework for success in this realm.

These templates are invaluable resources for data scientists and project managers alike. They offer a structured path to articulate your project's scope, objectives, and methodology, ensuring that you effectively communicate your vision to stakeholders and decision-makers.

Now, as you embark on your journey to master the art of Big Data proposals, remember that success lies in the details. Each template is a blueprint for success, guiding you through the intricate process of project proposal creation.

For more insights and examples to bolster your proposal game, don't forget to check out our proposal cover letter templates by clicking on the following link. There, you'll discover additional resources that will enhance your proposal-writing skills and help you stand out in the world of data-driven projects. So, go ahead, explore, learn, and pave the way for your Big Data success.


