We were confronted by a large, noisy dataset, but now we were sure that most of the data would not be useful. The problem was to process the data without first attempting to clean it up, since any such pre-cleaning would necessarily bias the results. DQC succeeded admirably in this analysis.

No pre-cleaning: The 95% of the data that was of little interest segregated itself into two large, extended structures.

Huge number of clusters: The number of interesting clusters in the remaining 5% of the data was very large: 669 in all. This number is much larger than one would use as input for conventional clustering methods like k-means.
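
To make the contrast with k-means concrete, here is a minimal sketch of Lloyd's algorithm in plain Python. It is not part of DQC or of the analysis described here; the function name `kmeans`, the toy data, and the parameters `iters` and `seed` are all assumptions made for illustration. The point it shows is that the cluster count `k` has to be supplied before the data is examined, and the method returns exactly that many centers whatever the real structure is.

```python
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm. Note that k is a required input."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        # Assign every point to its nearest center (squared Euclidean distance).
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
            groups[nearest].append(p)
        # Move each center to the mean of its group; an empty group keeps its center.
        new_centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break  # assignments are stable, so the centers have converged
        centers = new_centers
    return centers, groups


# Hypothetical toy data: three well-separated 2-D blobs of 30 points each.
data_rng = random.Random(1)
blobs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [
    (cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
    for cx, cy in blobs
    for _ in range(30)
]

# Whatever k we pass in is the number of clusters we get back,
# even though the data was generated from three blobs.
for k in (2, 3, 5):
    centers, groups = kmeans(points, k)
    print(k, len(centers), [len(g) for g in groups])
```

With hundreds of clusters, as in this dataset, one would have to guess a suitable `k` in advance, which is the step the analysis above did not need.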

The good data showed interesting structure: Examination of the average behavior of each cluster showed that half of them exhibited the correlations of interest, while the other half definitively identified problems with the apparatus. Further study of the problematic data revealed a hole that had been burned in the detector during a previous experiment. Thus, both parts of the data proved informative.

December 24th, 2017 | Examples