Unveiling the Secrets of Big Data Projects: Harnessing Machine Learning Algorithms and Maturity Domains to Predict Success
While existing literature has extensively explored the factors influencing the success of big data projects and has proposed big data maturity models, to our knowledge no study has used machine learning to predict project success and to identify the features that contribute most to it. The purpose of this paper is to address that gap by applying machine-learning algorithms to survey data on big data projects.
Previously, we introduced the Global Big Data Maturity Model (GBDMM), which encompasses various domains inspired by the success factors of big data projects. In this paper, we converted these maturity domains into a survey and collected responses from 90 big data experts across the Middle East, the Gulf, Africa, and Turkey about their own projects, so that the data reflect the firsthand experience of practitioners in the field.
To analyze the survey responses, we applied several algorithms suited to small datasets and categorical features. We used cross-validation and feature selection to mitigate overfitting and improve model performance. The best-performing algorithms in our study were the Decision Tree and the CatBoost classifier, each achieving an F1 score of 67%.
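To make the evaluation terms above concrete, the following is a minimal sketch of how an F1 score is computed and how a small labeled dataset can be split into class-balanced folds for cross-validation. It is not the study's code: the function names `f1_score` and `stratified_folds` are hypothetical, the labels are synthetic, and in practice a library implementation would normally be used.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 is the harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: report 0 instead of dividing by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def stratified_folds(labels, k):
    """Split sample indices into k folds, dealing each class out round-robin
    so every fold keeps roughly the same class balance as the full dataset."""
    folds = [[] for _ in range(k)]
    seen_per_class = {}
    for index, label in enumerate(labels):
        position = seen_per_class.get(label, 0)
        folds[position % k].append(index)
        seen_per_class[label] = position + 1
    return folds


# Synthetic labels for illustration only (1 = successful project, 0 = failed).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
score = f1_score(y_true, y_pred)  # precision = recall = 2/3, so F1 = 2/3

# Eight synthetic samples, four per class, split into two balanced folds.
folds = stratified_folds([1, 1, 1, 1, 0, 0, 0, 0], k=2)
```

In this toy case the classifier finds two of the three successful projects and raises one false alarm, which gives an F1 of about 0.67. Stratifying the folds matters on a dataset of this size because a random split could otherwise leave a fold with very few examples of one class.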
This research contributes to the study of big data projects by using machine-learning techniques to predict the success or failure of such projects and to identify the features that contribute most to their success. This gives companies a model for predicting the outcomes of their own big data projects.
Our analysis revealed that the strategy and data domains have the greatest influence on the success of big data projects. Companies should therefore prioritize these domains when undertaking such projects. The result is also an initial model for predicting project success or failure.
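One common way to see why some features matter more than others to a tree-based model is information gain, the reduction in label entropy obtained by splitting on a feature. The sketch below is an assumption-laden illustration, not the method reported in the study: the feature names `domain_a` and `domain_b` are placeholders, and the ratings and outcomes are synthetic.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after grouping
    the samples by the value of one categorical feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Synthetic survey-style data for illustration only (1 = success, 0 = failure).
success = [1, 1, 1, 0, 0, 0, 1, 0]
features = {
    # Placeholder feature whose rating lines up exactly with the outcome.
    "domain_a": ["high", "high", "high", "low", "low", "low", "high", "low"],
    # Placeholder feature whose rating is unrelated to the outcome.
    "domain_b": ["high", "low", "high", "low", "high", "low", "low", "high"],
}

gains = {name: information_gain(values, success) for name, values in features.items()}
ranked = sorted(gains, key=gains.get, reverse=True)  # most informative first
```

Here `domain_a` separates successes from failures perfectly and receives the full gain of one bit, while `domain_b` receives none. Real survey data fall between these extremes, and rankings derived from 90 responses should be read with corresponding caution.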
Based on our findings, we recommend that practitioners concentrate on developing robust strategies and prioritize data management to improve the outcomes of their big data projects. Practitioners can also use machine-learning techniques to estimate the likelihood that a project will succeed.
For further research in this field, we suggest exploring additional algorithms and techniques and refining existing models to improve the accuracy and reliability of success predictions for big data projects. Researchers may also further investigate the interplay between strategy, data, and project success.
By helping to improve the success rate of big data projects, our findings can help organizations create more efficient and impactful data-driven solutions across various sectors. This, in turn, supports informed decision-making, effective resource allocation, and improved operational efficiency and overall performance.
In the future, gathering feedback from a broader range of big data experts would help refine the prediction algorithm. Longitudinal studies analyzing the long-term success and outcomes of big data projects would also be beneficial. Furthermore, testing the applicability of our model across different regions and industries would provide further insights into the field.