Session Program

 

  • 11 July 2017
  • 08:00AM - 10:00AM
  • Room: Auditorium
  • Chairs: Guangquan Zhang, Farookh Khadeer Hussain and Jie Lu

Handling Uncertainties in Big Data by Fuzzy Systems

Abstract - Due to the dynamic nature of cloud environments, the workload of virtual machines (VMs) fluctuates, leading to imbalanced loads and utilization of virtual and physical cloud resources. It is, therefore, essential that cloud providers accurately forecast VM performance and resource utilization so they can appropriately manage their assets to deliver better quality cloud services on demand. Current workload and resource prediction methods forecast the workload or CPU utilization pattern of given web-based applications based on their historical data. This gives cloud providers an indication of the number of resources (VMs or CPUs) these applications require, so that they can optimize resource allocation for software as a service (SaaS) or platform as a service (PaaS) and reduce their service costs. However, historical data cannot be used as the only data source for VM workload predictions, as it may not be available in every situation. Nor can historical data provide information about sudden and unexpected peaks in user demand. To solve these issues, we have developed a fuzzy workload prediction method that monitors both historical and current VM CPU utilization and workload to predict VMs that are likely to be performing poorly. This model can also predict the utilization of physical machine (PM) resources for virtual resource discovery.
Abstract - Cloud computing has been advancing at an impressive rate in recent years and is likely to keep growing in the near future. New services are being developed constantly, such as cloud infrastructure, security and platform as a service, to name just a few. Due to the vast pool of available services, review websites have been created to help customers make decisions for their business. Some reviewers take advantage of these tools to promote the providers that hire them or to discredit competitors. These reviewers can either act individually or cooperate with each other. Reviewers who collude to promote one product or defame another are called spammer groups. In this paper, we present an approach to identify spammer groups. First, a network-based method is used to identify individual spam reviewers. Then, a fuzzy k-means clustering algorithm is used to find the group that they belong to. A case study that suggests which group an incorrect review belongs to is provided to illustrate the new method.
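The clustering step mentioned in the abstract above can be pictured with a minimal sketch of generic fuzzy k-means (also known as fuzzy c-means). This is not the authors' implementation: the function names, the two-dimensional reviewer features, the fuzzifier value and the initial centers are all assumptions made for illustration.

```python
import math


def memberships(points, centers, m=2.0):
    """Return u[i][j], the degree to which point i belongs to cluster j."""
    u = []
    for p in points:
        # Clamp distances away from zero so a point sitting on a center is safe.
        d = [max(math.dist(p, c), 1e-12) for c in centers]
        u.append([
            1.0 / sum((dj / dk) ** (2.0 / (m - 1.0)) for dk in d)
            for dj in d
        ])
    return u


def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Generic fuzzy c-means; m > 1 is the fuzzifier (assumed to be 2 here)."""
    centers = [tuple(c) for c in centers]
    dim = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for j in range(len(centers)):
            # Each center is the mean of all points, weighted by u^m.
            w = [u[i][j] ** m for i in range(len(points))]
            total = sum(w)
            new_centers.append(tuple(
                sum(w[i] * points[i][k] for i in range(len(points))) / total
                for k in range(dim)
            ))
        centers = new_centers
    # Recompute memberships so they match the final centers.
    return memberships(points, centers, m), centers


# Hypothetical reviewer features, e.g. (rating deviation, review burstiness).
reviewers = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
u, centers = fuzzy_c_means(reviewers, [(0.0, 0.0), (10.0, 10.0)])
for point, row in zip(reviewers, u):
    print(point, [round(x, 3) for x in row])
```

Unlike hard k-means, every reviewer receives a membership degree in every group, and each row of memberships sums to one, so a reviewer who sits between two groups is not forced into a single one.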
Abstract - Query expansion is widely used in information retrieval to select additional words that are related to the original query words. In this paper, we present a novel query expansion method that jointly uses fuzzy rules and a word embedding similarity calculation. The expansion words are generated using a word embedding method and selected according to their semantic similarity to the original query. Fuzzy rules are used to enhance the word similarity calculations and reweight the expansion words. When measuring and ranking the relevance of a retrieved document, both the original query and the weighted expansion words are considered. We evaluate the method on document ranking tasks, and the experimental results show that it significantly outperforms state-of-the-art baseline methods.
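The general idea of selecting expansion words by embedding similarity and then reweighting them with a fuzzy membership can be sketched as follows. The abstract does not specify the actual fuzzy rules, so everything here is an assumption for illustration: the toy two-dimensional embeddings, the ramp-shaped "high similarity" membership function, its thresholds, and all function names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def high_similarity(s, low=0.5, high=0.9):
    """Assumed ramp membership: 0 below `low`, 1 above `high`, linear between."""
    if s <= low:
        return 0.0
    if s >= high:
        return 1.0
    return (s - low) / (high - low)


def expand_query(query, embeddings, top_k=2):
    """Return up to top_k (word, weight) pairs, highest weight first."""
    query_vecs = [embeddings[q] for q in query if q in embeddings]
    if not query_vecs:
        return []
    candidates = {}
    for word, vec in embeddings.items():
        if word in query:
            continue
        # Similarity to the closest original query word.
        sim = max(cosine(vec, qv) for qv in query_vecs)
        weight = high_similarity(sim)
        if weight > 0.0:
            candidates[word] = weight
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


# Toy embeddings (hypothetical values, not from a trained model).
toy_embeddings = {
    "car": (1.0, 0.0),
    "automobile": (0.9, 0.1),
    "vehicle": (0.8, 0.6),
    "banana": (0.0, 1.0),
}
print(expand_query(["car"], toy_embeddings))
```

In a ranking function, the returned weights would scale the contribution of each expansion word relative to the original query words, which keep full weight.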
Abstract - Protein complexes play important roles in protein-protein interaction networks. Recent studies reveal that many proteins have multiple functions and belong to more than one complex. To obtain a better division into complexes, we need to consider time-dependent information about the networks. However, only a few studies concentrate on detecting overlapping clusters in time-dependent networks. To solve this problem, we propose an integrated model of time-dependent networks (IM-TDN) to describe such networks. On the basis of this model, we propose a similarity-based dynamic fuzzy clustering (SDFC) algorithm to detect overlapping clusters. We apply the algorithm to synthetic data and to a real-world protein-protein interaction network dataset. The results show that our algorithm, using the proposed model, achieves better results than state-of-the-art baseline algorithms.
Abstract - As the age of big data approaches, methods of massive-scale data management are rapidly evolving. Traditional machine learning methods can no longer keep pace with the exponential growth of big data, because these data-driven methods commonly assume that the training data and the testing data follow the same distribution. A model built using today's data will not adequately address tomorrow's classification tasks if the distribution of the data item values has changed. Transfer learning is emerging as a solution to this issue, and many methods have been proposed. Few of the existing methods, however, explicitly address the case where the label distributions in the two domains are different. This work proposes fuzzy rule-based methods to deal with transfer learning problems where the discrepancy between the two domains appears in the label spaces. The presented methods are validated on both synthetic and real-world datasets, and the experimental results verify their effectiveness.
Abstract - Classification learning is a very complex process whose success and failure ratio depends on a large number of factors. One of them is the means of representation used for the data employed in the process. The granularity of the data used for classification learning can dramatically affect the success and failure ratio of the resulting classification. In this paper, multi-granular fuzzy linguistic modelling methods are applied to the classification learning data in order to modify their granularity and increase the classification success ratio. Thanks to multi-granular fuzzy linguistic modelling methods, it is possible to automatically modify the data granularity in order to determine which data representation provides the best classification results in the learning process.