- 11 July 2017
- 01:30PM - 03:30PM
- Room: Auditorium
- Chairs: Jie Lu, Edwin Lughofer, Igor Skrjanc and Plamen Angelov
Data Streams and Evolving Fuzzy Systems
Abstract - This paper concerns the application of a cloud-based intelligent evolving method, namely a typicality- and eccentricity-based method for data analysis (TEDA), to predict monthly mean temperature in different cities of Brazil. Past values of maximum, minimum and mean monthly temperature, as well as past values of exogenous variables such as cloudiness, rainfall and humidity, were considered in the analysis. A non-parametric method based on Spearman correlation is proposed to rank and select the most relevant features and time delays for a more accurate prediction. The datasets were obtained from weather stations located in major cities such as Sao Paulo, Manaus, and Porto Alegre, which are known to have distinct weather characteristics. TEDA prediction results are compared with results provided by the evolving Takagi-Sugeno (eTS) and the extended Takagi-Sugeno (xTS) methods. In general, TEDA provided slightly more accurate predictions at the price of a higher computational cost.
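The feature and time-delay ranking mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it simply scores every (variable, lag) pair by the absolute Spearman correlation between the lagged variable and the target, and sorts the pairs. The function names and the `max_lag` parameter are assumptions made for illustration.

```python
def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if sd_x == 0.0 or sd_y == 0.0:
        return 0.0
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def rank_lagged_features(target, features, max_lag):
    """Rank (name, lag) pairs by |Spearman rho| with the target.

    `features` maps a variable name to a series aligned with `target`.
    This is an illustrative helper, not the method from the paper.
    """
    scores = []
    for name, series in features.items():
        for lag in range(1, max_lag + 1):
            past = series[:-lag]    # variable observed at time t - lag
            future = target[lag:]   # target observed at time t
            scores.append((abs(spearman(past, future)), name, lag))
    scores.sort(reverse=True)
    return [(name, lag, rho) for rho, name, lag in scores]
```

Because Spearman correlation works on ranks, a lagged variable that relates to the target monotonically but non-linearly still receives a high score, which is one reason a rank-based criterion may be preferred over a linear one.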
Abstract - Evolving fuzzy systems are widely recognized as being able to capture the non-stationary phenomena of data streams. Most existing algorithms for the parameter identification problem of evolving fuzzy systems rely on heuristic rather than optimal methods when the structure of the system changes, for example when rules are added, merged or removed. To address this issue, this paper proposes a new online learning algorithm with time-varying structure and parameters, derived from a parameter optimization point of view in which the influence between fuzzy rules is naturally considered, to identify evolving Mamdani fuzzy systems. Firstly, the methods for structure changing and parameter updating are derived so as to minimize the local error function and obtain the accurate (rather than heuristic) weighted recursive least squares estimation of the consequent parameters. Further, these methods are proved to lead to a new effective algorithm for optimum solutions. Moreover, for the proposed online learning approach, a special type of weighted recursive least squares updating formula for the consequent parameters is proposed. Numerical experiments and comparisons with other state-of-the-art algorithms demonstrate that the proposed algorithm achieves more accurate predictions than the other algorithms.
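For orientation, the textbook weighted recursive least squares update that such methods build on can be sketched as follows. This is the standard form only, assuming a single rule with a linear consequent and a positive sample weight `w` (for example a rule activation level); it is not the special updating formula proposed in the paper, and it does not handle structure changes.

```python
def wrls_update(theta, P, x, y, w):
    """One step of standard weighted recursive least squares.

    theta: current parameter estimate (list of floats)
    P:     current symmetric covariance-like matrix (list of lists)
    x:     regressor vector, y: observed output
    w:     positive sample weight (assumed here to be a rule activation)
    Returns the updated (theta, P).
    """
    if w <= 0.0:
        return theta, P  # a zero-weight sample carries no information
    n = len(x)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = 1.0 / w + sum(x[i] * Px[i] for i in range(n))
    gain = [v / denom for v in Px]
    error = y - sum(x[i] * theta[i] for i in range(n))
    new_theta = [theta[i] + gain[i] * error for i in range(n)]
    # P is symmetric, so the row vector x^T P equals (P x)^T.
    new_P = [[P[i][j] - gain[i] * Px[j] for j in range(n)] for i in range(n)]
    return new_theta, new_P


def fit_stream(samples, n_params, p0=1e6):
    """Run the update over (x, y, w) samples, starting from theta = 0."""
    theta = [0.0] * n_params
    P = [[p0 if i == j else 0.0 for j in range(n_params)] for i in range(n_params)]
    for x, y, w in samples:
        theta, P = wrls_update(theta, P, x, y, w)
    return theta
```

The large initial value `p0` expresses low confidence in the starting estimate, so the first few samples dominate the result.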
Abstract - Systems capable of generating data quickly and continuously, known as Data Streams, are a reality today and are becoming more common. Due to the nature of Data Streams, unsupervised learning, such as clustering, is appropriate. In addition, techniques derived from fuzzy set theory can be useful and add flexibility to the process. Fuzzy clustering algorithms for Data Streams found in the literature are based on chunks, which require the definition of several parameters and have the drawback of overly reducing the summarization of data. An approach to Data Stream clustering that overcomes some of the limitations of chunk-based algorithms is the Online-Offline Framework. This framework comprises two phases: summarization and clustering. To the best of our knowledge, there is no fuzzy version of this framework. The objective of this work is to propose a fuzzy version of the Online-Offline Framework, called FuzzStream, whose main component is a summarization structure and its corresponding maintenance algorithm, to be used in the online phase. The well-known Weighted Fuzzy C-Means clustering algorithm is used in the offline phase. Experiments show that our proposal is a promising approach to dealing with data streams and offers benefits relative to the classic version.
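The offline phase named in this abstract can be illustrated with a minimal sketch of Weighted Fuzzy C-Means, in which each input point stands for a summary and carries a weight. The assumption that a summary is reduced to a prototype point plus a weight, and all names and defaults below, are illustrative and not taken from FuzzStream.

```python
def weighted_fcm(points, weights, centers, m=2.0, iters=50):
    """Weighted Fuzzy C-Means on weighted points.

    points:  list of equal-length tuples (assumed summary prototypes)
    weights: one non-negative weight per point
    centers: initial cluster centers (list of tuples)
    m:       fuzzifier, must be greater than 1
    Returns (centers, memberships); memberships are those computed
    in the final iteration, one row per point.
    """
    memberships = []
    for _ in range(iters):
        # Membership update: inversely related to squared distance.
        memberships = []
        for p in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            if min(d2) == 0.0:
                # The point coincides with one or more centers.
                hits = [1.0 if d == 0.0 else 0.0 for d in d2]
                total = sum(hits)
                memberships.append([h / total for h in hits])
            else:
                inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
                total = sum(inv)
                memberships.append([v / total for v in inv])
        # Center update: mean weighted by weight * membership^m.
        new_centers = []
        for j, old in enumerate(centers):
            coef = [w * (u[j] ** m) for w, u in zip(weights, memberships)]
            total = sum(coef)
            if total == 0.0:
                new_centers.append(old)
                continue
            new_centers.append(tuple(
                sum(cf * p[k] for cf, p in zip(coef, points)) / total
                for k in range(len(old))
            ))
        centers = new_centers
    return centers, memberships
```

The only difference from plain Fuzzy C-Means is the extra factor `w` in the center update, which lets a heavy summary pull a center as strongly as the many raw points it represents.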
Abstract - Modelling technology integration in the teaching and learning environment is a complex, uncertain and dynamic practice. A large amount of student behaviour data has been gathered for different processing purposes. Yet many questions remain open because of the huge volume, diversity and uncertainty of the data. In this work, we implement a big-data analytical framework for online behaviour modelling, taking streaming data of students' online activity from their laptop usage as an illustrative example. The proposed framework covers details from accessing streaming records to storing heterogeneous data. Furthermore, the work demonstrates the use of a TF-IDF based feature generation and fuzzy representation strategy to discover critical patterns in this behaviour data. The accuracy of the modelling work is evaluated using students' scores on a nationwide test. Experimental results show that the employed TF-IDF feature is much more stable than other traditional features, thereby achieving better modelling performance. In summary, the simulation results demonstrate the flexibility and applicability of the proposed framework for processing complex behaviour data and revealing important patterns for decision making.
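A minimal sketch of TF-IDF feature generation over activity logs follows. It assumes each student's log has already been reduced to a list of event tokens; the token names in the usage note are hypothetical, and the plain `tf * log(N / df)` weighting is one common variant, not necessarily the one used in the paper.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute TF-IDF weights for a list of token lists.

    Each document is assumed to be one student's sequence of event
    tokens. Returns one dict per document mapping token -> weight,
    using tf = count / length and idf = log(N / document frequency).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each token once per document
    features = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        features.append({
            token: (count / total) * math.log(n_docs / doc_freq[token])
            for token, count in counts.items()
        })
    return features
```

For example, with the hypothetical logs `[["video", "quiz", "video"], ["video", "forum"], ["video", "quiz"]]`, the token `video` appears in every log and therefore receives weight zero, while `forum`, which appears in only one log, receives the largest weight in that log.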
Abstract - Concept drift, given the huge volume of high-speed data streams, requires traditional machine learning models to be self-adaptive. Techniques to handle drift are especially needed in regression cases for a wide range of applications in the real world. There is, however, a shortage of research on drift adaptation for regression cases in the literature. One of the main obstacles to further research is the resulting model complexity when regression methods and drift handling techniques are combined. This paper proposes a self-adaptive algorithm, based on a fuzzy kernel c-means clustering approach and a lazy learning algorithm, called FKLL, to handle drift in regression learning. Using FKLL, drift adaptation first updates the learning set using lazy learning, then fuzzy kernel c-means clustering is used to determine the most relevant learning set. Experiments show that the FKLL algorithm is better able to respond to drift as soon as the learning sets are updated, and is also suitable for dealing with reoccurring drift, when compared to the original lazy learning algorithm and other state-of-the-art regression methods.
Abstract - The aim of machine learning is to find hidden insights in historical data, and then apply them to forecast future data or trends. Machine learning algorithms optimize learning models for the lowest error rate based on the assumption that the historical data and the data to be predicted conform to the same knowledge pattern (data distribution). However, if the historical data is insufficient, or the knowledge pattern keeps changing (data uncertainty), this assumption becomes invalid. In data stream mining, this phenomenon of a changing knowledge pattern is called concept drift. To address this issue, we propose a novel fuzzy windowing concept drift adaptation (FW-DA) method. Compared to conventional windowing-based drift adaptation algorithms, FW-DA achieves higher accuracy by allowing the sliding windows to keep an overlapping period, so that the data instances belonging to different concepts can be determined more precisely. In addition, FW-DA statistically guarantees that the upcoming data conforms to the inferred knowledge pattern with a certain confidence level. To evaluate FW-DA, four experiments were conducted using both synthetic and real-world data sets. The experimental results show that FW-DA outperforms the other windowing-based methods, including state-of-the-art drift adaptation methods.