Adhyayana Publications

Employee Performance Prediction

Authors

  • Taruna Aggarwal

    Galgotias University
    Author

Keywords:

employee performance, decision tree model

Abstract

In an increasingly data-driven corporate environment, employee performance is a central determinant of organizational success, influencing both operational efficiency and long-term strategic positioning. Traditional performance management systems are criticized as subjective, reactive, and inconsistent, so organizations are turning to predictive analytics for proactive, evidence-based insight into workforce behavior and outcomes. This study uses machine learning to predict employee performance from publicly available review data. It draws on a structured dataset of 20,995 anonymized employee reviews from Capgemini, sourced from Kaggle. The dataset captures employee sentiment across several workplace dimensions: career growth, skill development, work satisfaction, salary and benefits, job security, and work-life balance. These self-reported ratings serve as predictors, and the overall performance rating serves as the target variable. The research follows a predictive design built on a single supervised learning model, a Decision Tree Classifier, chosen for its interpretability, computational efficiency, and strong predictive performance. Before modeling, the dataset underwent cleaning, transformation, class labeling (High vs. Low performers), and exploratory data analysis. The model was trained on 70% of the dataset and validated on the remaining 30%. Performance was evaluated using standard classification metrics: accuracy, precision, recall, F1-score, and confusion matrix analysis. The model achieved an accuracy of 84.9% and an F1-score of 88.8%, indicating a robust ability to classify employee performance levels.
Feature importance rankings showed that career growth was the most influential predictor, followed by work satisfaction, salary and benefits, and skill development. Job type and job security had comparatively little influence. The findings have clear managerial implications: HR teams can use such predictions to identify high-potential talent, design targeted upskilling programs, and address early signs of disengagement. The explainability of the Decision Tree model also supports its integration into organizational decision-making, since it offers transparent justifications for employee-level classifications. Although the study is limited by the absence of demographic and internal performance metrics, it demonstrates the feasibility and value of using publicly available review data for predictive HR analytics. Future research could extend the model with textual sentiment analysis and expand the dataset to support multi-company comparisons. In conclusion, this study presents a practical and scalable framework for forecasting employee performance using machine learning. It reinforces the growing role of predictive analytics in human resource management and offers actionable insights that can help organizations build and sustain high-performing workforces.
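
The workflow described in the abstract (70/30 split, Decision Tree Classifier, classification metrics, feature importance ranking) can be sketched roughly as follows. This is an illustrative sketch only, not the author's implementation: it runs on synthetic ratings rather than the Kaggle dataset, and the column names, the High/Low threshold of 3.0, the tree depth, and the stratified split are all assumptions made for the example. Its scores will not match the reported 84.9% accuracy or 88.8% F1-score.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical column names for the six rating dimensions named in the abstract.
FEATURES = ["career_growth", "skill_development", "work_satisfaction",
            "salary_and_benefits", "job_security", "work_life_balance"]


def make_synthetic_reviews(n_rows=2000, seed=0):
    """Generate placeholder 1-5 ratings and a binary High/Low label."""
    rng = np.random.default_rng(seed)
    X = rng.integers(1, 6, size=(n_rows, len(FEATURES)))  # ratings 1..5
    # Stand-in for the overall rating: a noisy mean of the six ratings.
    overall = X.mean(axis=1) + rng.normal(0.0, 0.3, size=n_rows)
    # Assumed labeling rule: overall >= 3.0 is High (1), otherwise Low (0).
    y = (overall >= 3.0).astype(int)
    return X, y


def train_and_evaluate(X, y, seed=0):
    """Fit a decision tree on 70% of the rows and score it on the other 30%."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=seed)  # stratify: assumption
    model = DecisionTreeClassifier(max_depth=5, random_state=seed)  # depth: assumption
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    metrics = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }
    cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
    importances = dict(zip(FEATURES, model.feature_importances_))
    return metrics, importances, cm


if __name__ == "__main__":
    X, y = make_synthetic_reviews()
    metrics, importances, cm = train_and_evaluate(X, y)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
    print("confusion matrix (rows = actual, cols = predicted):")
    print(cm)
    # Rank features from most to least influential.
    for name, score in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}: {score:.3f}")
```

To apply the same steps to real review data, `make_synthetic_reviews` would be replaced by loading and cleaning the actual ratings. Because the synthetic features contribute equally by construction, the importance ranking printed here says nothing about the ranking reported in the study.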

Published

2025-06-12