Air-Pollution Prediction in Smart Cities through Machine Learning Methods

Written by Jesús Soto on 23 Jun, 2018, in Teaching

We are in luck: the paper that had been under review for months has finally been accepted.

Air-Pollution Prediction in Smart Cities through Machine Learning Methods: A Case of Study in Murcia, Spain

Abstract: Air pollution is one of the main threats to developed societies. According to the World Health Organization (WHO), pollution is the main cause of death among children under five. Smart cities are called to play a decisive role in reducing such pollution by first collecting, in real time, different parameters such as SO2, NOx, O3, NH3, CO and PM10, to mention just a few, and then performing the subsequent data analysis and prediction. However, some machine learning techniques may be better suited than others to predicting pollution-like variables. In this paper, several machine learning methods are analyzed to predict the ozone level (O3) in the Region of Murcia (Spain). O3 is one of the main hazards to health when it reaches certain levels. Indeed, having accurate air-quality prediction models is a prerequisite for taking mitigation actions that may benefit people with respiratory diseases such as asthma, bronchitis or pneumonia in intelligent cities. Moreover, the most significant variables for monitoring air quality in cities are identified. Our results indicate a fit of 90% or higher for the proposed O3 prediction models and a root mean square error of less than 11 μg/m3 for the cities of the Region of Murcia involved in the study.
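The two figures quoted at the end of the abstract can be made concrete with a small sketch. It assumes that the "fit" is a coefficient of determination, which the abstract does not state, and the ozone readings below are hypothetical values chosen only for illustration.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical O3 readings (in μg/m3) and model predictions, not data from the paper.
o3_observed = [60.0, 75.0, 90.0, 110.0, 95.0]
o3_predicted = [62.0, 71.0, 93.0, 104.0, 99.0]

print("RMSE:", rmse(o3_observed, o3_predicted))
print("R^2:", r_squared(o3_observed, o3_predicted))
```

The RMSE is expressed in the same units as the measurements, which is why the abstract can report it directly in μg/m3.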


An unsupervised technique to discretize numerical values by fuzzy partitions

Written by Jesús Soto on 18 Jun, 2018, in Teaching


A new publication on techniques applied to data analysis.

An unsupervised technique to discretize numerical values by fuzzy partitions.


Numerical value discretization is a process performed in the data preprocessing phase of intelligent data analysis. The preprocessing phase is very relevant because the quality of the models obtained in the data mining step depends on it. Value discretization is an important preprocessing task because not all data mining techniques can handle continuous values. In this paper, an unsupervised technique to discretize continuous data values using fuzzy partitions is proposed. Specifically, a clustering technique that produces fuzzy partitions is presented. To evaluate the behavior of the proposed technique, a series of experiments was carried out using an Extreme Learning Machine classifier and a committee of Extreme Learning Machines, and the technique was also compared with the K-means discretization technique. The experiments were validated statistically, and the proposed approach obtained the best results.
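To give an idea of what discretizing with a fuzzy partition can look like, here is a minimal sketch. It is not the technique from the paper: it assumes the cluster centers are already known (for instance, produced by some clustering step) and it assumes triangular membership functions, which the abstract does not specify. The function names are hypothetical.

```python
def triangular_memberships(value, centers):
    """Membership of `value` in each fuzzy set of a triangular partition.

    `centers` must be strictly increasing and hold at least two entries.
    Each set peaks at its own center and falls linearly to zero at the
    neighbouring centers, so the memberships of any value sum to 1.
    """
    memberships = [0.0] * len(centers)
    if value <= centers[0]:
        memberships[0] = 1.0
        return memberships
    if value >= centers[-1]:
        memberships[-1] = 1.0
        return memberships
    for i in range(len(centers) - 1):
        left, right = centers[i], centers[i + 1]
        if left <= value <= right:
            memberships[i + 1] = (value - left) / (right - left)
            memberships[i] = 1.0 - memberships[i + 1]
            break
    return memberships


def discretize(value, centers):
    """Index of the fuzzy set with the highest membership (first one on ties)."""
    memberships = triangular_memberships(value, centers)
    return max(range(len(centers)), key=lambda i: memberships[i])


# Hypothetical centers for "low", "medium" and "high".
centers = [0.0, 10.0, 20.0]
for v in [2.0, 7.5, 18.0]:
    print(v, triangular_memberships(v, centers), discretize(v, centers))
```

Unlike a crisp cut point, the membership vector keeps the information that a value such as 7.5 is mostly "medium" but still partly "low"; taking the maximum is only one possible way to turn it into a discrete label.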


High-Throughput Infrastructure for Advanced ITS Services

Written by Jesús Soto on 11 Apr, 2018, in Teaching

Today we were notified of the publication of a new paper that we had been working on since last year.

High-Throughput Infrastructure for Advanced ITS Services: A Case Study on Air Pollution Monitoring.


Novel cooperative intelligent transportation systems (ITS) serve as the basis for the provision of a number of services for drivers, occupants, and third parties. The vast amount of information to be collected, especially in vehicle-to-infrastructure (V2I) communication services, requires new algorithms and hardware platforms to cope with real-time requirements; however, this combination is not properly addressed in the literature. In this paper, we introduce a high-throughput hardware-software infrastructure to gather information from vehicles and efficiently process it to provide novel ITS services. We propose a parallelization approach of a fuzzy clustering technique on heterogeneous servers based on a CPU and several GPUs, tailored to classification problems in V2I. The infrastructure is empirically tested to offer a geo-located pollution information service through the periodic collection of both vehicle position and status data. We offer a real service that correctly identifies highly polluting traffic areas and drivers. The results indicate good performance of the system under high loads, and our scalability analysis reveals good operation in ambitious real deployments thanks to the use of both the CPU and multiple GPUs, showing that our proposal can efficiently host cooperative services involving heavy processing in the ITS context.


A novel fuzzy clustering approach to regionalise watersheds

Written by Jesús Soto on 28 Nov, 2017, in Teaching

Another paper of ours has been published: A novel fuzzy clustering approach to regionalise watersheds with an automatic determination of optimal number of clusters. A new application of fuzzy classification algorithms.


One of the most important problems faced in hydrology is the estimation of flood magnitudes and frequencies in ungauged basins. Hydrological regionalisation is used to transfer information from gauged watersheds to ungauged watersheds. However, to obtain reliable results, the watersheds involved must have a similar hydrological behaviour. In this study, two different clustering approaches are used and compared to identify the hydrologically homogeneous regions. The Fuzzy C-Means algorithm (FCM), which is widely used for regionalisation studies, needs the calculation of cluster validity indices in order to determine the optimal number of clusters. The Fuzzy Minimals algorithm (FM), which presents an advantage compared with other fuzzy clustering algorithms, does not need to know the number of clusters a priori, so cluster validity indices are not used. A regional homogeneity test based on the L-moments approach is used to check the homogeneity of the regions identified by both cluster analysis approaches. The validation of the FM algorithm in deriving homogeneous regions for flood frequency analysis is illustrated through its application to data from the watersheds in Alto Genil (South Spain). According to the results, the FM algorithm is recommended for identifying the hydrologically homogeneous regions for regional frequency analysis.
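For readers who have not met Fuzzy C-Means, the following is a minimal sketch of the textbook algorithm, which alternates between updating memberships and updating centers. It is not the code used in the study, and it does not implement Fuzzy Minimals. The point it illustrates is that `n_clusters` has to be supplied by the caller, which is why FCM relies on validity indices to choose it. The sample points and parameter defaults are assumptions made for illustration.

```python
import math
import random


def update_memberships(points, centers, m):
    """u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1)) for every point k and cluster i."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        # A tiny floor avoids division by zero when a point sits on a center.
        dists = [max(math.dist(p, c), 1e-12) for c in centers]
        memberships.append(
            [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]
        )
    return memberships


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook FCM; returns (centers, memberships)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, n_clusters)]
    dim = len(points[0])
    for _ in range(n_iter):
        memberships = update_memberships(points, centers, m)
        new_centers = []
        for i in range(n_clusters):
            # Each center is the mean of all points weighted by u_ik^m.
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            new_centers.append(
                [sum(w * p[d] for w, p in zip(weights, points)) / total
                 for d in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, update_memberships(points, centers, m)


# Hypothetical 2-D descriptors forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, memberships = fuzzy_c_means(points, n_clusters=2)
print(centers)
```

Each row of `memberships` sums to 1, so a watershed can belong partly to several regions instead of being forced into a single one.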


Influence of multivariate modeling in the prediction of soil carbon by a portable infrared sensor

Written by Jesús Soto on 19 Sep, 2017, in Teaching

Contribution to the Workshop Proceedings of the 13th International Conference on Intelligent Environments, Seoul, Korea, August 2017.


The determination of carbon is one of the most important tasks in soil analysis. However, traditional techniques are costly and time-consuming. In this manuscript we propose an alternative predictive approach based on portable mid-infrared spectroscopy data modeled by machine learning techniques. We evaluate the performance of different machine learning models and sample sizes to predict soil carbon in 457 Australian soils. The results show a good performance of the models. All models are validated by statistical tests. The best-performing technique, at a 99% confidence level, is the Gaussian Process, which provides 98% accuracy for the prediction of soil carbon. Moreover, this technique is the most robust across the different sample sizes tested. When compared with the commonly used Partial Least Squares Regression technique, the machine learning approaches provide more successful and balanced results.


Big Data with MATLAB

Written by Jesús Soto on 27 Jun, 2017, in Teaching

An interesting webinar on how MATLAB handles large data sets

Big Data with MATLAB
Paz Tárrega, MathWorks

Overview

As our data grow in size and complexity, they become harder to work with, particularly when the data do not fit in memory. MATLAB offers a single environment for working with big data, making big data analysis and processing easy, convenient, and scalable.

In this webinar you will learn strategies and techniques for handling large volumes of data in MATLAB. It shows the new capabilities of MATLAB release 2016b, including tall arrays. With tall arrays you do not need to learn big-data-oriented programming or out-of-memory data handling techniques; you simply use the MATLAB code and syntax you already use.


Fuzzy clustering as rational partition method for QSAR

Written by Jesús Soto on 20 Jun, 2017, in Teaching

Fuzzy clustering techniques can be applied in many different fields. In this example we have a collaboration that seeks to improve QSAR methods, which are computational techniques related to the calculation of molecular physicochemical properties.


Various methods are used to partition data sets for QSAR development and model validation. In this work we used fuzzy minimals partitioning and compared this methodology with other rational partition methods such as k-means clustering (KMS) and Minimal Test Set Dissimilarity (MTSD). For the development of QSAR models, Ordinary Least Squares (OLS) and Extreme Learning Machine (ELM) methods were used. The generated QSAR equations were validated by the coefficient of determination of the internal leave-one-out (LOO) cross-validation method, Q²_LOO, and then the coefficient of the external test set, Q²_ext, was compared between partition methods. The results of this comparison showed that using fuzzy minimals for big and structurally diverse data sets gave an applicability domain similar to KMS and models with better predictive ability than both KMS and MTSD.
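The leave-one-out coefficient mentioned in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the procedure from the paper: it uses an OLS model with a single descriptor, and it takes the common definition Q²_LOO = 1 - PRESS / SS_tot, where PRESS sums the squared errors of predicting each compound from a model fitted without it. The descriptor and activity values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def q2_loo(xs, ys):
    """Leave-one-out cross-validated Q2 = 1 - PRESS / SS_tot."""
    mean_y = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        # Refit the model without sample i, then predict sample i.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot


# Hypothetical descriptor values and measured activities.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
activity = [2.1, 3.9, 6.2, 7.8, 10.1]
print("Q2_LOO:", q2_loo(descriptor, activity))
```

Because every prediction comes from a model that never saw the sample being predicted, Q²_LOO is never higher than the ordinary R² of the full OLS fit, and a large gap between the two is a warning sign of overfitting.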


Using SWAT and Fuzzy TOPSIS

Written by Jesús Soto on 22 Feb, 2017, in Desktop

We have just had a paper published: Using SWAT and Fuzzy TOPSIS to Assess the Impact of Climate Change in the Headwaters of the Segura River Basin (SE Spain).


The Segura River Basin is one of the most water-stressed basins in Mediterranean Europe. If we add to the current situation that most climate change projections forecast important decreases in water resource availability in the Mediterranean region, the situation will become totally unsustainable. This study assessed the impact of climate change in the headwaters of the Segura River Basin using the Soil and Water Assessment Tool (SWAT) with bias-corrected precipitation and temperature data from two Regional Climate Models (RCMs) for the medium term (2041–2070) and the long term (2071–2100) under two emission scenarios (RCP4.5 and RCP8.5). Bias correction was performed using the distribution mapping approach. The fuzzy TOPSIS technique was applied to rank a set of nine GCM–RCM combinations, choosing the climate models with a higher relative closeness. The study results show that the SWAT performed satisfactorily for both calibration (NSE = 0.80) and validation (NSE = 0.77) periods. Comparing the long-term and baseline (1971–2000) periods, precipitation showed a negative trend between 6% and 32%, whereas projected annual mean temperatures demonstrated an estimated increase of 1.5–3.3 °C. Water resources were estimated to experience a decrease of 2%–54%. These findings provide local water management authorities with very useful information in the face of climate change.

My contribution focuses on applying fuzzy TOPSIS techniques to the selection of the climate change models.
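The "relative closeness" used to rank the models can be illustrated with a simplified sketch. Note the simplification: this is classical (crisp) TOPSIS, whereas the paper uses a fuzzy variant in which ratings are fuzzy numbers. The criteria, weights and scores below are hypothetical and are not taken from the study.

```python
import math


def topsis_closeness(matrix, weights, benefit):
    """Relative closeness of each alternative to the ideal solution.

    matrix:  one row per alternative, one column per criterion.
    weights: importance of each criterion.
    benefit: True where larger is better, False where smaller is better.
    """
    n_crit = len(weights)
    # Vector-normalise each criterion, then apply its weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
                for row in matrix]
    columns = list(zip(*weighted))
    ideal = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    anti = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    scores = []
    for row in weighted:
        d_pos = math.dist(row, ideal)  # distance to the best case
        d_neg = math.dist(row, anti)   # distance to the worst case
        scores.append(d_neg / (d_pos + d_neg))
    return scores


# Hypothetical climate models scored on a skill metric (higher is better)
# and a bias metric (lower is better).
models = ["model A", "model B", "model C"]
scores = topsis_closeness(
    matrix=[[0.80, 12.0], [0.70, 5.0], [0.60, 20.0]],
    weights=[0.5, 0.5],
    benefit=[True, False],
)
for name, score in sorted(zip(models, scores), key=lambda t: -t[1]):
    print(name, round(score, 3))
```

A closeness of 1 means the alternative coincides with the ideal solution on every criterion, and 0 means it coincides with the worst case, so ranking by this value favours models that are both near the best and far from the worst.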



The accent problem in Kubuntu

Written by Jesús Soto on 13 Sep, 2016, in Desktop

During a recent installation of Kubuntu 16.04 LTS I ran into a strange problem: it would not let me type accented characters. After looking for a fix by changing the language, the locale settings, and installations that had to be repeated at every startup, I came across the solution: it is the "dead tilde" problem. The Spanish layouts include the option of using the variant Español (incluir tilde muerta), that is, Spanish (include dead tilde). That solves the nuisance. Everything is done from the window



Parallel implementation of fuzzy minimals clustering algorithm

Written by Jesús Soto on 5 Apr, 2016, in Teaching



Clustering aims to classify different patterns into groups called clusters. Many algorithms for both hard and fuzzy clustering have been developed to deal with exploratory data analysis in many contexts such as image processing, pattern recognition, etc. However, we are witnessing the era of big data computing where computing resources are becoming the main bottleneck to deal with those large datasets. In this context, sequential algorithms need to be redesigned and even rethought to fully leverage the emergent massively parallel architectures. In this paper, we propose a parallel implementation of the fuzzy minimals clustering algorithm called Parallel Fuzzy Minimal (PFM). Our experimental results reveal linear speed-up of PFM when compared to the sequential counterpart version, keeping very good classification quality.

Copyright © 2018 El blog de Jesús Soto All rights reserved.