In fields like business, finance, and biology we collect enormous amounts of data. All too often, though, we measure what we are able to measure without knowing whether it is what we need to measure to predict the things we really want to know. DQC can settle this question, either positively or negatively. In other words, DQC can be used to validate the usefulness of the data. Two examples of this approach:
 
  • Alzheimer’s data: One such challenge was presented by a dataset that listed changes to the genome of each of 1500 patients at 250,000 locations. The question was “Could this data be used to predict the likelihood of a patient developing late-onset Alzheimer’s disease?” In addition to the genetic data, the Alzheimer’s status of each patient had been determined by autopsy. DQC analysis showed that the data did not contain enough information to predict which patients would develop the late-onset form of the disease. The total DQC time required for this study was four days. We later learned that the medical community came to the same conclusion after 10 years of work.
  • Amino Acids: This dataset consisted of aligned amino acid sequences for two different proteins that create holes in cell walls. The question was “Could we predict the biological function of the two proteins using only this information?” We not only predicted the biological function, we also conclusively identified the two sites along the amino acid chain most relevant to determining their behavior.
NOTE: It was not necessary to have any specific knowledge of biology to carry out either analysis.
December 24th, 2017 | Examples