University of Maryland

Dec 16, 10am, Carlos Scheidegger, AT&T Labs Research, “How do you look at a billion data points? Exploratory Visualization for Big Data”

December 4th, 2013

“How do you look at a billion data points? Exploratory Visualization for Big Data”
SPEAKER: Dr. Carlos Scheidegger, AT&T Labs Research
WHEN: 10:00am – 11:00am, Monday, December 16, 2013
LOCATION: Room 2120, A.V. Williams Building

Consider exploration of large multidimensional spatiotemporal datasets with billions of entries. Are certain attributes correlated spatially or temporally? How do we even look at data of this size? In this talk, I will present the techniques and algorithms to compute and query a nanocube, a data structure that enables interactive visualizations of data sources in the range of billions of elements.

Data cubes are widely used for exploratory data analysis. Although they are sometimes assumed to take a prohibitively large amount of space (and to consequently require disk storage), nanocubes fit in a modern laptop’s main memory, even for hundreds of millions of entries. I will present live demos of the technique on a variety of real-world datasets, together with comparisons to the previous state of the art with respect to memory, timing, and network bandwidth measurements.

Nanocubes merge database technology and visualization, and increase by two orders of magnitude the scale of data that can be quickly visualized. Time permitting, I will touch upon other work in the same spirit: a novel approximate numerical linear algebra algorithm that has applications in graph visualization, and a novel trajectory clustering and visualization algorithm.

Bio: Carlos Scheidegger finished his PhD at the Scientific Computing and Imaging Institute at the University of Utah in 2009. Since then, he has been a senior member of the technical staff in the Information Visualization department at AT&T Labs Research. He is interested in large-scale data visualization, exploratory data analysis, and provenance and data management issues in computational tasks. He has received best paper awards at VisWeek 2007 and SMI 2008, and is currently working on the next generation of data analysis and visualization infrastructure for the World Wide Web.