Machine Learning Pipeline: Feature Selection and Adaptive Training for DDoS Detection to Improve Cloud Security

Yazed Alsaawy

Keywords: Distributed Denial-of-Service, Machine Learning, LightGBM, CatBoost, XGBoost, Adaptive Detection, Cloud Security, Cybersecurity, Minority Class Handling, Scalable Solutions.

Distributed Denial-of-Service (DDoS) attacks are a persistent concern in distributed and cloud environments and threaten any multi-cloud system. This research proposes a DDoS detection method based on adaptive machine learning techniques. The methodology combines algorithms such as LightGBM, CatBoost, and XGBoost, and achieves an overall accuracy of 99.32%, with 99% specificity and 99% sensitivity for most attack classes. It also addresses the challenge of minority classes: CatBoost reached a recall of 85% on previously under-represented attack classes. The results indicate that the proposed system is effective across different DDoS attack types and traffic patterns, making it a viable option for protecting cybersecurity infrastructure in multi-cloud systems.
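The metrics quoted above (overall accuracy, per-class sensitivity or recall, and per-class specificity) can all be derived from a multi-class confusion matrix. The following is a minimal sketch of that arithmetic, not the evaluation code used in this work. The function names and the three-class counts are hypothetical and chosen only to show how a minority class can have noticeably lower recall than the overall accuracy suggests.

```python
def per_class_metrics(confusion):
    """Return {class_index: (sensitivity, specificity)}.

    confusion[i][j] counts samples whose true class is i and whose
    predicted class is j (one-vs-rest view for each class k).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    metrics = {}
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                       # class k missed
        fp = sum(confusion[i][k] for i in range(n)) - tp  # others flagged as k
        tn = total - tp - fn - fp
        sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
        specificity = tn / (tn + fp) if (tn + fp) else 0.0
        metrics[k] = (sensitivity, specificity)
    return metrics


def overall_accuracy(confusion):
    """Fraction of all samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total if total else 0.0


# Hypothetical counts (NOT results from this paper):
# class 0 = benign, class 1 = a frequent attack, class 2 = a rare attack.
example = [
    [950, 30, 20],
    [10, 480, 10],
    [8, 12, 80],
]

if __name__ == "__main__":
    print("accuracy:", round(overall_accuracy(example), 4))
    for cls, (sens, spec) in per_class_metrics(example).items():
        print(f"class {cls}: sensitivity={sens:.3f} specificity={spec:.3f}")
```

In this illustrative matrix the rare class has the lowest sensitivity of the three, which is why per-class recall is reported alongside overall accuracy when classes are imbalanced.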
