Ventocilla, E., & Riveiro, M. (2019). Visual Growing Neural Gas for Exploratory Data Analysis. In Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, Volume 3: IVAPP, Prague, Czech Republic (pp. 58–71). SciTePress.
This paper argues for the use of a topology learning algorithm, the Growing Neural Gas (GNG), to provide an overview of the structure of large, multidimensional datasets for use in exploratory data analysis. We introduce a generic, off-the-shelf library, Visual GNG, developed using the Big Data framework Apache Spark, which provides an incremental visualization of the GNG training process and enables user-in-the-loop interactions in which users can pause, resume, or steer the computation by changing optimization parameters. Nine case studies were conducted with domain experts from different areas, each working on a unique real-world dataset. The results show that Visual GNG contributes to understanding the distribution of multidimensional data; identifying which features are relevant to that distribution; estimating the number of clusters, k, to be used in traditional clustering algorithms such as K-means; and finding outliers.