Abstract - This paper compares semantically meaningful machine learning algorithms with black-box models. The models are applied to a real-world wearable dataset for biometric identification of individuals. A semantically meaningful decision tree is compared with more accurate black-box models such as neural networks, random forests, and support vector machines. The paper further explores the possibility of using unsupervised learning that uses linear distances to separate the categories. Since the distance from the center is used to delineate the clusters, the centroids of the unsupervised clusters provide a semantic profile of the categories. The crisp K-means clustering is enhanced with evolutionary algorithms that use the distance from the center as the primary criterion but nudge the clustering towards the known classification using a semi-supervised penalty. Finally, the use of rough sets is shown to provide notable semantic information with the help of the three-way decision principle.
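To illustrate how cluster centroids can be read as a profile of each category, the following is a minimal sketch of crisp K-means in NumPy. It is not the paper's implementation: the function name `kmeans`, the two toy features, and the sample values are hypothetical, and the evolutionary and semi-supervised extensions are not modelled.

```python
import numpy as np

def kmeans(X, init_centroids, n_iter=50):
    """Plain (crisp) K-means; returns final centroids and hard labels."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(init_centroids, dtype=float).copy()
    for _ in range(n_iter):
        # Assign each sample to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its members;
        # an empty cluster keeps its previous centroid.
        new = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(len(centroids))
        ])
        converged = np.allclose(new, centroids)
        centroids = new
        if converged:
            break
    # Final assignment against the returned centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)

# Toy data with two hypothetical features per sample (illustrative values only).
X = np.array([[60.0, 2.0], [62.0, 3.0], [58.0, 1.0],
              [90.0, 10.0], [92.0, 11.0], [88.0, 9.0]])
centroids, labels = kmeans(X, init_centroids=X[[0, 3]])

# Each row is the mean feature vector of one cluster, i.e. its "profile".
for k, c in enumerate(centroids):
    print(f"cluster {k}: profile = {c}")
```

Because each centroid is a mean feature vector, it can be inspected directly in the units of the original attributes, which is what makes it interpretable.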
Abstract - The interaction between features, or attributes, of a dataset forms a major topic in machine learning and data mining. In particular, a wide range of methods have been established for feature selection, ranking, and grouping. Amongst these, fuzzy rough set based feature selection (FRFS) has been shown to be highly effective at reducing dimensionality for real-valued datasets while retaining attribute semantics. In fuzzy rough sets, the concept of crisp equivalence classes is extended by fuzzy similarity relations, and real-valued similarity measures can be captured between data instances in terms of their attribute values. Therefore, it is desirable to study the aggregation of fuzzy similarity relations to reflect the interactions between attributes. This paper presents an approach that employs OWA aggregation of fuzzy similarity relations to better perform FRFS. A high degree of modelling flexibility is provided by choosing the stress function in OWA. Experimental studies demonstrate that by using different stress functions, different features may be selected; and that given an appropriate stress function, the quality of selected features can improve over that achievable by the state-of-the-art FRFS, in performing classification tasks.
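The idea of aggregating per-attribute fuzzy similarity relations with an ordered weighted averaging (OWA) operator can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's method: the per-attribute similarity 1 - |a(x) - a(y)| / range(a) is one commonly used choice, the weights are obtained by numerically integrating a stress function over equal sub-intervals of [0, 1] and normalising, and all function names are hypothetical.

```python
import numpy as np

def owa_weights(stress, n, samples=200):
    """Approximate OWA weights from a non-negative stress function on [0, 1].

    Weight j is proportional to the integral of the stress function over
    the j-th of n equal sub-intervals (midpoint rule); weights sum to 1.
    """
    xs = (np.arange(n * samples) + 0.5) / (n * samples)
    vals = np.asarray(stress(xs), dtype=float).reshape(n, samples).sum(axis=1)
    return vals / vals.sum()

def owa(values, weights):
    """OWA along the last axis: sort descending, then take the weighted sum."""
    ordered = np.sort(values, axis=-1)[..., ::-1]
    return ordered @ weights

def attribute_similarity(X):
    """Per-attribute fuzzy similarity, shape (n_samples, n_samples, n_attrs).

    Assumed measure: 1 - |a(x) - a(y)| / (max(a) - min(a)).
    """
    X = np.asarray(X, dtype=float)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0  # constant attributes: all pairs fully similar
    return 1.0 - np.abs(X[:, None, :] - X[None, :, :]) / span

# Toy data: 3 instances, 3 real-valued attributes (illustrative values only).
X = np.array([[0.0, 1.0, 5.0],
              [1.0, 1.5, 5.0],
              [2.0, 3.0, 9.0]])
S = attribute_similarity(X)

# A stress function rising towards 1 emphasises the smallest similarities
# (closer to min); a constant stress function gives the plain average.
w_rising = owa_weights(lambda x: x, n=X.shape[1])
w_flat = owa_weights(lambda x: np.ones_like(x), n=X.shape[1])
print(owa(S, w_rising))
print(owa(S, w_flat))
```

Putting all the weight on the last position recovers the minimum of the per-attribute similarities, while other stress functions soften that aggregation to varying degrees.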