Data Science Distinguished Speaker Series: Data Analytics for Heterogeneous Data by Julio J. Valdes (NRC)

Categories: Lectures and Seminars, Programs and Training

Wednesday, January 27, 2016

1:30 PM - 3:30 PM

5345 Herzberg Laboratories

1125 Colonel By Dr, Ottawa, ON

Contact Information

Kathryn Elliott, 613-520-2600 ext. 3244, kathryn.elliott@carleton.ca

Registration

Limited - Register Now

Cost

$0

About this Event

Host Organization: Carleton University Institute for Data Sciences

DATA ANALYTICS FOR HETEROGENEOUS DATA
By Julio J. Valdes, National Research Council

ABSTRACT

Heterogeneous data refers to objects described by features of different natures (e.g. mixtures of numeric, qualitative (nominal), ordinal and interval values, images, documents, signals, graphs, etc.). In addition to the complexity introduced by the heterogeneity of the attributes, the information is usually incomplete (missing values) and is obtained with different degrees and types of uncertainty. An example is a patient described by non-numeric variables (e.g. gender), ordinal variables (e.g. pain intensity), numeric variables (e.g. temperature, blood pressure), image variables (e.g. X-ray), document variables (e.g. a medical laboratory report), signal variables (e.g. ECG), etc. All of these variables provide information about an object as a single whole entity. A given dataset may contain hundreds, thousands or even millions of such objects.

Modern developments in sensor, communication and computer technologies have revolutionized data acquisition by increasing the amount of information obtained from a targeted problem (the ‘big data’ buzzword), which has received a lot of attention. However, another (overlooked) consequence has been the increasing degree of heterogeneity of the information obtained.

Most data analytic procedures are oriented to homogeneous data (mostly numeric data). Those that can handle missing information usually do so via imputation, and fewer accept plain data absence. When dealing with problems involving heterogeneous data, the usual approaches are i) to work with a (homogeneous) subset of the information and/or ii) to redefine the data attributes so that the resulting information is acceptable to the data processing procedure.

This presentation illustrates an approach to processing heterogeneous information sensu stricto, accepting data incompleteness and uncertainties. Real-world examples are presented for important operations in data analytics such as classification, regression and data visualization.

Register For this Event

70 spaces capacity, 66 spot(s) left.