Computación evolutiva multi-objetivo para selección de atributos y clasificación interpretable

  1. Martinez Cortes, Carlos
Supervised by:
  1. Gracia Sánchez Carpena Director
  2. Fernando Jiménez Barrionuevo Director

Defence university: Universidad de Murcia

Defence date: 24 October 2019

Committee:
  1. Antonio Skarmeta Gómez Chair
  2. Fernando Terroso Sáenz Secretary
  3. Javier Prieto Tejedor Committee member

Type: Thesis

Abstract

In the context of supervised learning, this doctoral thesis develops multi-objective optimization models for the problems of feature selection and interpretable classification, together with multi-objective evolutionary algorithms to solve them. Feature selection is part of a more general process, the dimensionality reduction of data, which has become fundamental because of the ever larger volumes of data generated by the continuing development of information technologies. Interpretable classification (or prediction) also plays a crucial role today, since an automatic model is not always acceptable unless an expert can understand and validate it, especially in contexts where professional ethics requires it, such as medicine or business. Multi-objective evolutionary computation has proved to be a very powerful metaheuristic for both types of problems; although it does not guarantee optimal solutions, the solutions it finds can be more satisfactory than those provided by classical search, optimization and learning techniques.
The multi-objective evolutionary algorithms developed in this thesis have been implemented in the Weka machine learning platform under the names MultiObjectiveEvolutionarySearch and MultiObjectiveEvolutionaryFuzzyClassifier, respectively. For the feature selection problem, the search strategy MultiObjectiveEvolutionarySearch can be combined with different evaluators to configure both filter and wrapper feature selection methods, with different statistical measures, classifiers and evaluation metrics, which makes the technique very flexible and robust. The algorithms ENORA and NSGA-II have been implemented as search strategies, solving a Boolean optimization problem whose objectives are precision and attribute subset cardinality.
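The two-objective formulation of feature selection can be illustrated with a minimal Python sketch. It is not the thesis's Weka implementation, and it is neither ENORA nor NSGA-II; it only shows Pareto dominance over Boolean attribute masks, minimizing an error value (standing in for one minus the precision objective) and the subset cardinality. The function names and the error values of the candidates are hypothetical and chosen purely for illustration.

```python
from typing import List, Tuple

Mask = Tuple[bool, ...]      # one Boolean per attribute: True means "selected"
Scored = Tuple[Mask, float]  # (mask, error); the error is to be minimized


def objectives(candidate: Scored) -> Tuple[float, int]:
    """Return the two minimized objectives: (error, number of selected attributes)."""
    mask, error = candidate
    return (error, sum(mask))


def dominates(a: Scored, b: Scored) -> bool:
    """a dominates b if it is no worse in both objectives and differs in at least one."""
    fa, fb = objectives(a), objectives(b)
    return all(x <= y for x, y in zip(fa, fb)) and fa != fb


def pareto_front(candidates: List[Scored]) -> List[Scored]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical, made-up error values for four attribute subsets.
candidates = [
    ((True, True, True, True), 0.10),
    ((True, False, True, False), 0.12),
    ((True, True, False, False), 0.20),
    ((False, False, True, False), 0.25),
]
front = pareto_front(candidates)
```

In this toy example the third subset is dominated by the second one (same cardinality, lower error), so it drops out of the front. The remaining three subsets represent different trade-offs between error and number of attributes, which is the kind of solution set a multi-objective search returns instead of a single optimum.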
For the problem of interpretable classification, the classifier MultiObjectiveEvolutionaryFuzzyClassifier builds rule-based classifiers, both fuzzy (Gaussian) and crisp, from numerical and categorical data in multi-class classification problems, and allows different evaluators to be configured in the learning phase. ENORA and NSGA-II have again been implemented to construct the rule-based classifiers, in this case solving a constrained mixed combinatorial optimization problem with the objectives of precision and rule set complexity, subject to similarity constraints on the Gaussian fuzzy sets. Experiments were carried out in two main application areas: virtual screening for drug discovery, and the management of the professional skills of agents in a contact center, using data from the company GAP SRL in northern Italy. Public databases from the UCI Machine Learning Repository have also been used for reproducibility. The results have been analyzed following intelligent data analysis methodologies, and the conclusions are supported by statistical tests, which show excellent behavior of the proposed techniques, both for feature selection and for rule-based classification, in comparison with other widely established state-of-the-art techniques, algorithms and classifiers.
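As a rough illustration of what a similarity constraint on Gaussian fuzzy sets can look like, the following Python sketch defines a Gaussian membership function and a Jaccard-style similarity (intersection over union, approximated on a grid). The abstract does not state which similarity measure the thesis uses, so this measure, the function names and the parameter values are assumptions made only for the example.

```python
import math


def gaussian_membership(x: float, center: float, sigma: float) -> float:
    """Degree to which x belongs to a Gaussian fuzzy set (1.0 at the center)."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def similarity(c1: float, s1: float, c2: float, s2: float,
               lo: float, hi: float, steps: int = 1000) -> float:
    """Assumed measure: sum of pointwise minima over sum of pointwise maxima on [lo, hi]."""
    inter = 0.0
    union = 0.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        a = gaussian_membership(x, c1, s1)
        b = gaussian_membership(x, c2, s2)
        inter += min(a, b)
        union += max(a, b)
    return inter / union if union > 0.0 else 0.0


# Hypothetical fuzzy sets on the domain [0, 10].
same = similarity(5.0, 1.0, 5.0, 1.0, 0.0, 10.0)       # identical sets
far_apart = similarity(0.0, 1.0, 10.0, 1.0, 0.0, 10.0)  # barely overlapping sets
```

Under this assumed measure, a constraint of the form "similarity below some threshold" would reject rule sets whose fuzzy sets overlap so much that they can no longer be told apart linguistically, which is one common way to keep fuzzy rule bases interpretable.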