The Optimization of Hyperparameters in Unsupervised Learning Algorithms for Anomaly Detection in Public Procurement in Paraguay
DOI: https://doi.org/10.62544/ucomscientia.v3i1.46
Keywords: Anomaly Detection, Machine Learning, Artificial Intelligence, Open Contracting Data Standard, Public Procurement
Abstract
This study focuses on hyperparameter optimization in unsupervised learning algorithms for anomaly detection in public procurement processes in Paraguay. The main objective is to develop a tool that identifies irregularities in procurement processes using open data published by the National Directorate of Public Procurement (Dirección Nacional de Contrataciones Públicas, DNCP). The methodology follows the CRISP-DM industry standard, covering data collection, transformation and preparation, and then applies the Isolation Forest, Local Outlier Factor and One-Class SVM algorithms. Hyperparameters are optimized with grid search and random search, and class imbalance is addressed with SMOTE oversampling. The results indicate that the high-recall model detects most anomalies but produces a significant number of false positives. In contrast, obtaining high-precision models requires balancing the dataset, which considerably reduces false positives at the cost of missing some anomalies. In conclusion, correct labeling and balancing of the training dataset are desirable to improve the accuracy and practical utility of the models.
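The abstract describes the tuning loop only in prose. The following is a minimal sketch of that loop under explicit assumptions, not the authors' implementation. It uses only the Python standard library and a simplified Local Outlier Factor (exactly `k` neighbours, distance ties ignored) as the detector. It treats `n_neighbors` and `contamination` (the fraction of records flagged as anomalous) as the hyperparameters being tuned, and it scores each configuration against known anomaly labels using precision or recall. The toy data, the search ranges and every function name (`lof_scores`, `precision_recall`, `grid_search`, `random_search`) are hypothetical. Isolation Forest, One-Class SVM and SMOTE balancing are not shown.

```python
import math
import random
from itertools import product


def lof_scores(points, k):
    """Local Outlier Factor of every point; values well above 1 suggest an outlier.

    Simplification (assumption): exactly k neighbours are used, ignoring distance ties.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    neighbours, k_distance = [], []
    for i in range(n):
        # k nearest neighbours of point i, excluding the point itself
        nn = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        neighbours.append(nn)
        k_distance.append(dist[i][nn[-1]])
    # Local reachability density: inverse of the mean reachability distance
    # (the tiny constant only guards against division by zero for duplicates)
    lrd = [
        1.0 / (sum(max(k_distance[j], dist[i][j]) for j in neighbours[i]) / k + 1e-12)
        for i in range(n)
    ]
    # LOF: mean density of the neighbours relative to the point's own density
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


def precision_recall(scores, labels, contamination):
    """Flag the top `contamination` fraction of scores and compare with labels."""
    n_flag = max(1, round(contamination * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = ranked[:n_flag]
    tp = sum(1 for i in flagged if labels[i])
    precision = tp / len(flagged)
    recall = tp / sum(labels) if sum(labels) else 0.0
    return precision, recall


def _evaluate(points, labels, k, contamination, metric):
    """Score one configuration; returns (score, k, contamination, precision, recall)."""
    p, r = precision_recall(lof_scores(points, k), labels, contamination)
    score = p if metric == "precision" else r
    return (score, k, contamination, p, r)


def grid_search(points, labels, grid, metric):
    """Try every (n_neighbors, contamination) pair; keep the first best one."""
    best = None
    for k, c in product(grid["n_neighbors"], grid["contamination"]):
        result = _evaluate(points, labels, k, c, metric)
        if best is None or result[0] > best[0]:
            best = result
    return best


def random_search(points, labels, metric, n_iter, seed=0):
    """Sample n_iter random configurations instead of the full grid."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_iter):
        k = rng.randint(3, 20)            # hypothetical search range
        c = rng.uniform(0.02, 0.2)        # hypothetical search range
        result = _evaluate(points, labels, k, c, metric)
        if best is None or result[0] > best[0]:
            best = result
    return best


# Hypothetical toy data: one dense cluster of "normal" records plus 4 far outliers
rng = random.Random(42)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(60)]
points += [(8, 8), (-8, 8), (8, -8), (-8, -8)]
labels = [0] * 60 + [1] * 4

grid = {"n_neighbors": [5, 10, 20], "contamination": [0.05, 0.1, 0.2]}
for metric in ("recall", "precision"):
    score, k, c, p, r = grid_search(points, labels, grid, metric)
    print(f"grid   {metric:9s}: k={k}, contamination={c}, precision={p:.2f}, recall={r:.2f}")
score, k, c, p, r = random_search(points, labels, "recall", n_iter=20)
print(f"random recall   : k={k}, contamination={c:.3f}, precision={p:.2f}, recall={r:.2f}")
```

On this toy data, tuning for recall selects a larger `contamination`, which flags every planted outlier together with some normal points, while tuning for precision selects a smaller one that misses part of them. This is a simplified illustration of the precision/recall trade-off reported above, not a reproduction of the study's results.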
License
Copyright (c) 2025 Matias Fabian Sanabria, Julio Manuel Paciello Coronel, Juan Ignacio Pane Fernández

This work is licensed under a Creative Commons Attribution 4.0 International License.
Revista Científica UCOM Scientia is distributed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license: https://creativecommons.org/licenses/by/4.0/deed.es