Teaching

An unsupervised technique to discretize numerical values by fuzzy partitions

A new publication on techniques applied to data analysis.

An unsupervised technique to discretize numerical values by fuzzy partitions.

Abstract:

Numerical value discretization is a process performed in the data preprocessing phase of intelligent data analysis. The preprocessing phase is highly relevant because the quality of the models obtained in the data mining step depends on it. Value discretization is an important preprocessing task because not all data mining techniques can handle continuous values. In this paper, an unsupervised technique to discretize continuous data values using fuzzy partitions is proposed. Specifically, a clustering technique that produces fuzzy partitions is presented. In addition, to evaluate the behavior of the proposed technique, a series of experiments has been carried out using an Extreme Learning Machine classifier and a committee of Extreme Learning Machines, and the technique is also compared with the K-means discretization technique. These experiments have been validated statistically, with the proposed approach obtaining the best results.
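To make the idea of discretizing by a fuzzy partition concrete, here is a minimal sketch in Python. It is not the clustering technique from the paper: it assumes the cluster centers are already known (the values in `centers` are hypothetical) and builds triangular fuzzy sets around them, which is just one common way to define a fuzzy partition. The function names are illustrative as well.

```python
from bisect import bisect_right

def triangular_memberships(x, centers):
    """Membership degrees of x in a fuzzy partition built on `centers`.

    `centers` must be sorted and strictly increasing. Each fuzzy set peaks
    at one center and falls linearly to zero at the neighbouring centers,
    so the degrees of any value always sum to 1.
    """
    degrees = [0.0] * len(centers)
    if x <= centers[0]:
        degrees[0] = 1.0
    elif x >= centers[-1]:
        degrees[-1] = 1.0
    else:
        right = bisect_right(centers, x)  # first center strictly above x
        left = right - 1
        span = centers[right] - centers[left]
        degrees[right] = (x - centers[left]) / span
        degrees[left] = 1.0 - degrees[right]
    return degrees

def discretize(x, centers):
    """Crisp label: index of the fuzzy set with the highest membership."""
    degrees = triangular_memberships(x, centers)
    return max(range(len(centers)), key=degrees.__getitem__)

# Hypothetical centers; in practice a clustering step would supply them.
centers = [10.0, 20.0, 40.0]
degrees_25 = triangular_memberships(25.0, centers)
print(degrees_25)                 # [0.0, 0.75, 0.25]
print(discretize(25.0, centers))  # 1
```

Unlike a crisp discretization, the value 25 is not forced entirely into one interval: it belongs mostly to the second set and partly to the third, and a downstream classifier can use either the full membership vector or the single label.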

High-Throughput Infrastructure for Advanced ITS Services

Today we were notified of the publication of a new article that we had been working on since last year.

High-Throughput Infrastructure for Advanced ITS Services: A Case Study on Air Pollution Monitoring.

Abstract:

Novel cooperative intelligent transportation systems (ITS) serve as the basis for the provision of a number of services for drivers, occupants, and third parties. The vast amount of information to be collected, especially in vehicle-to-infrastructure (V2I) communication services, requires new algorithms and hardware platforms to cope with real-time requirements; however, this combination is not properly addressed in the literature. In this paper, we introduce a high-throughput hardware-software infrastructure to gather information from vehicles and efficiently process it to provide novel ITS services. We propose a parallelization approach of a fuzzy clustering technique on heterogeneous servers based on a CPU and several GPUs, tailored to classification problems in V2I. The infrastructure is empirically tested to offer a geo-located pollution information service through the periodic collection of both vehicles' position and status data. We offer a real service that correctly identifies highly polluting traffic areas and drivers. The results indicate good performance of the system under high loads, and our scalability analysis reveals good operation in ambitious real deployments thanks to the use of both the CPU and multiple GPUs, showing that our proposal can efficiently host cooperative services involving heavy processing in the ITS context.

A novel fuzzy clustering approach to regionalise watersheds

Another of our articles has been published: A novel fuzzy clustering approach to regionalise watersheds with an automatic determination of optimal number of clusters. A new application of fuzzy classification algorithms.

Abstract:

One of the most important problems faced in hydrology is the estimation of flood magnitudes and frequencies in ungauged basins. Hydrological regionalisation is used to transfer information from gauged watersheds to ungauged watersheds. However, to obtain reliable results, the watersheds involved must have similar hydrological behaviour. In this study, two different clustering approaches are used and compared to identify hydrologically homogeneous regions. The Fuzzy C-Means algorithm (FCM), which is widely used in regionalisation studies, requires the calculation of cluster validity indices in order to determine the optimal number of clusters. The Fuzzy Minimals algorithm (FM) has an advantage over other fuzzy clustering algorithms: it does not need to know the number of clusters a priori, so cluster validity indices are not used. A regional homogeneity test based on the L-moments approach is used to check the homogeneity of the regions identified by both cluster analysis approaches. The validation of the FM algorithm in deriving homogeneous regions for flood frequency analysis is illustrated through its application to data from the watersheds in Alto Genil (southern Spain). According to the results, the FM algorithm is recommended for identifying hydrologically homogeneous regions for regional frequency analysis.
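The dependence of FCM on a validity index can be illustrated with a short Python sketch. It runs a textbook Fuzzy C-Means for several candidate numbers of clusters and keeps the one with the highest partition coefficient. Several things here are assumptions rather than details from the paper: the choice of the partition coefficient as the index, the simplified deterministic initialisation, the toy two-dimensional `points`, and all function names. The Fuzzy Minimals algorithm is not shown.

```python
import math

def memberships(point, centers, m):
    """FCM membership degrees of one point in every cluster."""
    dists = [math.dist(point, c) for c in centers]
    zeros = [k for k, d in enumerate(dists) if d == 0.0]
    if zeros:  # the point sits exactly on one or more centers
        return [1.0 / len(zeros) if k in zeros else 0.0
                for k in range(len(centers))]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** power for dj in dists) for dk in dists]

def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6):
    """Textbook FCM; assumes 2 <= c <= len(points)."""
    n, dim = len(points), len(points[0])
    # Simplified deterministic start: c points spread evenly through the list.
    centers = [tuple(points[i * (n - 1) // (c - 1)]) for i in range(c)]
    for _ in range(max_iter):
        u = [memberships(p, centers, m) for p in points]
        new_centers = []
        for k in range(c):
            weights = [row[k] ** m for row in u]
            total = sum(weights)
            new_centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, [memberships(p, centers, m) for p in points]

def partition_coefficient(u):
    """Mean of squared memberships; closer to 1 means a crisper partition."""
    return sum(v * v for row in u for v in row) / len(u)

# Toy data (hypothetical): two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]

scores = {}
for c in (2, 3, 4):
    _, u = fuzzy_c_means(points, c)
    scores[c] = partition_coefficient(u)
best_c = max(scores, key=scores.get)
print(best_c)  # 2
```

The outer loop over candidate values of `c` is the extra step that an algorithm able to determine the number of clusters on its own would avoid. On real data the partition coefficient is only one of several possible indices, and different indices can disagree.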

Big Data with MATLAB

An interesting webinar on how MATLAB handles large data sets.

Big Data with MATLAB
Paz Tárrega, MathWorks

Overview

As our data grows in size and complexity, it becomes harder to work with, particularly when the data does not fit in memory. MATLAB offers a single environment for working with big data, making big data analysis and processing easy, convenient, and scalable.

In this webinar you will learn strategies and techniques for handling large volumes of data in MATLAB. It shows the new capabilities of MATLAB release 2016b, including tall arrays. With tall arrays you do not need to learn big-data-oriented programming or out-of-memory data handling techniques; you simply keep using the MATLAB code and syntax you already use.