Clustering algorithms are among the most widely used kernels for generating knowledge from large datasets. These algorithms group a set of data elements (e.g., images, points, patterns) into clusters to identify patterns or common features of a sample. However, they are computationally expensive, as they often rely on costly fitness functions that must be evaluated for every point in the dataset. The cost is even higher for fuzzy methods, where each data point may belong to more than one cluster. In this paper, we evaluate different parallelisation strategies on different heterogeneous platforms for fuzzy clustering algorithms commonly used in the state of the art, such as Fuzzy C-means (FCM), Gustafson–Kessel FCM (GK-FCM) and Fuzzy Minimals (FM). The experimental evaluation includes performance and energy trade-offs. Our results show that, depending on the computational pattern of each algorithm, its mathematical foundation and the amount of data to be processed, each algorithm performs better on a different platform.
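To make the cost argument concrete, here is a minimal sequential sketch of the textbook Fuzzy C-means update in plain Python. It is not the parallel implementation evaluated in the paper, and the function name, parameters and initialisation strategy are assumptions chosen for illustration. Every iteration touches every point for every cluster, which is the workload a parallel version has to distribute.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook Fuzzy C-means sketch; returns (centers, memberships).

    points: list of equal-length numeric tuples.
    m: fuzzifier (> 1); larger values give softer memberships.
    All names here are illustrative, not taken from the paper.
    """
    rng = random.Random(seed)
    # Assumption: initialise the centers with distinct points from the data.
    centers = [list(p) for p in rng.sample(points, n_clusters)]
    exponent = 2.0 / (m - 1.0)
    dim = len(points[0])
    memberships = []
    for _ in range(n_iter):
        # Membership update: row j holds the degree to which point j
        # belongs to each cluster; every row sums to 1.
        memberships = []
        for p in points:
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            memberships.append(
                [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
                 for d_i in dists]
            )
        # Center update: mean of the points weighted by membership ** m.
        new_centers = []
        for i in range(n_clusters):
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            new_centers.append(
                [sum(w * p[k] for w, p in zip(weights, points)) / total
                 for k in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # stop once the centers have stabilised
            break
    return centers, memberships
```

Calling `fuzzy_c_means(points, 2)` on two well-separated groups of 2-D points should return one center near each group, with each membership row summing to 1. The membership loop is independent per point, which is why this step is a natural candidate for data-parallel execution.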
The presence of emerging contaminants in water is steadily increasing. Antibiotics are of particular concern, both because they can lead to the emergence of resistant bacteria and because they can harm ecosystems and the organisms that inhabit them. Antibiotics used for human consumption end up reaching wastewater treatment plants (WWTPs), where it has been observed that they are only partially removed…
Developing an intelligent system for the prediction of soil properties with a portable mid-infrared instrument.
- Different machine learning techniques were tested to predict soil properties.
- The predicted soil properties are TC, TN, CEC, clay, silt and Na+.
- The best-performing machine learning technique was the Gaussian Process.
- The Gaussian Process outperforms the traditional PLSR technique.
- The Gaussian Process is the candidate for developing the intelligent system.
Air-Pollution Prediction in Smart Cities through Machine Learning Methods: A Case of Study in Murcia, Spain
Abstract: Air pollution is one of the main threats to developed societies. According to the World Health Organization (WHO), pollution is the main cause of death among children under five. Smart cities are called on to play a decisive role in reducing such pollution by first collecting, in real time, parameters such as SO2, NOx, O3, NH3, CO and PM10, to mention just a few, and then performing the subsequent data analysis and prediction. However, some machine learning techniques may be better suited than others to predicting pollution-like variables. In this paper, several machine learning methods are analyzed to predict the ozone (O3) level in the Region of Murcia (Spain). O3 is one of the main health hazards when it reaches certain levels. Indeed, accurate air-quality prediction models are a prerequisite for mitigation activities that may benefit people with respiratory diseases such as asthma, bronchitis or pneumonia in intelligent cities. Moreover, we identify the most significant variables for monitoring air quality in cities. Our results indicate a goodness of fit of 90% or higher for the proposed O3 prediction models and a root mean square error below 11 μg/m3 for the cities of the Region of Murcia involved in the study.
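The two figures quoted at the end of the abstract can be illustrated with a short sketch. It assumes that the reported fit corresponds to the coefficient of determination (R²), which the abstract does not state explicitly; the function names and the sample values are hypothetical and are not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the same units as the data (e.g. μg/m3)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical O3 readings and model predictions, for illustration only.
observed = [50.0, 60.0, 70.0, 80.0]
predicted = [52.0, 58.0, 71.0, 79.0]
print(round(rmse(observed, predicted), 3))       # 1.581
print(round(r_squared(observed, predicted), 3))  # 0.98
```

RMSE penalises large errors more heavily than small ones, while R² expresses how much of the variance in the observations the model explains, so the two are usually reported together.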
The determination of carbon is one of the most important measurements in soil analysis. However, traditional techniques are costly and time-consuming. In this manuscript we propose an alternative predictive approach based on portable mid-infrared spectroscopy data modeled with machine learning techniques. We evaluate the performance of different machine learning models and sample sizes for predicting soil carbon in 457 Australian soils. The results show good model performance, and all models are validated by statistical tests. The best-performing technique, at a 99% confidence level, is the Gaussian Process, which predicts soil carbon with 98% accuracy. Moreover, this technique is the most robust across the sample sizes tested. Compared with the commonly used Partial Least Squares Regression technique, the machine learning approaches provide more successful and balanced results.
Fuzzy clustering techniques can be applied in many different fields. This example is a collaboration that seeks to improve QSAR methods, a family of computational techniques for calculating the physicochemical properties of molecules.
Various methods are used to partition data sets for QSAR development and model validation. In this work we used fuzzy minimals partitioning and compared this methodology with other rational partitioning methods such as k-means clustering (KMS) and Minimal Test Set Dissimilarity (MTSD). Ordinary Least Squares (OLS) and Extreme Learning Machine (ELM) methods were used to develop the QSAR models. The generated QSAR equations were validated by the coefficient of determination of the internal leave-one-out (LOO) cross-validation, $Q^2_{LOO}$, and then the coefficient for the external test set, $Q^2_{ext}$, was compared between partition methods. The comparison showed that, for big and structurally diverse data sets, fuzzy minimals gave an applicability domain similar to KMS and more predictive models than both KMS and MTSD.
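The internal leave-one-out coefficient mentioned above can be sketched as follows. This uses the common definition Q² = 1 - PRESS / SS_tot and, to stay short, a single-descriptor OLS model; real QSAR models use several descriptors, and the function names are assumptions rather than anything taken from the paper.

```python
def fit_ols(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def q2_loo(xs, ys):
    """Leave-one-out Q^2: refit without sample i, then predict sample i."""
    mean_y = sum(ys) / len(ys)
    press = 0.0  # predictive residual sum of squares
    for i in range(len(xs)):
        slope, intercept = fit_ols(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot
```

Because every sample is predicted by a model that never saw it, this value is typically lower than the ordinary R² of the fitted model, and the gap between the two gives a rough indication of overfitting. The external coefficient follows the same pattern, with the residuals computed on the held-out test set produced by the partition method.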
The Segura River Basin is one of the most water-stressed basins in Mediterranean Europe. Since most climate change projections forecast significant decreases in water resource availability in the Mediterranean region, the current situation will become totally unsustainable. This study assessed the impact of climate change in the headwaters of the Segura River Basin using the Soil and Water Assessment Tool (SWAT) with bias-corrected precipitation and temperature data from two Regional Climate Models (RCMs) for the medium term (2041–2070) and the long term (2071–2100) under two emission scenarios (RCP4.5 and RCP8.5). Bias correction was performed using the distribution mapping approach. The fuzzy TOPSIS technique was applied to rank a set of nine GCM–RCM combinations, selecting the climate models with the highest relative closeness. The results show that SWAT performed satisfactorily for both the calibration (NSE = 0.80) and validation (NSE = 0.77) periods. Comparing the long-term and baseline (1971–2000) periods, precipitation showed a decrease of between 6% and 32%, whereas projected annual mean temperatures showed an estimated increase of 1.5–3.3 °C. Water resources were estimated to decrease by 2%–54%. These findings provide local water management authorities with very useful information in the face of climate change.
My contribution focuses on applying Fuzzy TOPSIS techniques to select the climate change models.
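The idea of ranking alternatives by relative closeness can be illustrated with a simplified sketch. Note that this is the classical crisp TOPSIS, not the fuzzy variant used in the study, which works with fuzzy numbers rather than plain scores. The function name, the criteria and the example values are all hypothetical.

```python
import math


def topsis_closeness(matrix, weights, benefit):
    """Crisp TOPSIS relative closeness for each alternative (row).

    matrix: one row per alternative, one column per criterion.
    weights: importance of each criterion.
    benefit: True where higher is better, False where lower is better.
    Assumes no criterion column is entirely zero.
    """
    n_crit = len(weights)
    # Vector-normalise each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
                for row in matrix]
    columns = list(zip(*weighted))
    # Ideal and anti-ideal solutions, criterion by criterion.
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    scores = []
    for row in weighted:
        d_plus = math.dist(row, ideal)   # distance to the ideal solution
        d_minus = math.dist(row, anti)   # distance to the anti-ideal solution
        scores.append(d_minus / (d_plus + d_minus))
    return scores


# Hypothetical scores for three climate models: a skill metric (higher is
# better) and an error metric (lower is better).
models = [[0.9, 1.0], [0.5, 3.0], [0.7, 2.0]]
scores = topsis_closeness(models, weights=[0.5, 0.5], benefit=[True, False])
ranking = sorted(range(len(models)), key=lambda i: scores[i], reverse=True)
```

A closeness of 1 means the alternative coincides with the ideal solution on every criterion and 0 means it coincides with the anti-ideal one, so the models with the highest values are the ones retained.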
Clustering aims to classify different patterns into groups called clusters. Many algorithms for both hard and fuzzy clustering have been developed for exploratory data analysis in contexts such as image processing and pattern recognition. However, we are in the era of big data computing, in which computing resources are becoming the main bottleneck when dealing with large datasets. In this context, sequential algorithms need to be redesigned, and even rethought, to fully leverage emerging massively parallel architectures. In this paper, we propose a parallel implementation of the fuzzy minimals clustering algorithm called Parallel Fuzzy Minimal (PFM). Our experimental results show that PFM achieves linear speed-up over its sequential counterpart while maintaining very good classification quality.