National Research Council of Canada. NRC Institute for Biological Sciences
National Research Council of Canada. NRC Institute for Information Technology
data mining; genomics; gene identification; gene expression; Alzheimer's disease; microarray
Genome-wide transcription profiling is a powerful technique for studying the enormous complexity of cellular states. Moreover, when applied to disease tissue, it may reveal quantitative and qualitative alterations in gene expression that give information on the context or underlying basis for the disease and may provide a new diagnostic approach. However, the data obtained from high-density microarrays are highly complex and pose considerable challenges in data mining. The data require care in both pre-processing and the application of data mining techniques. This paper addresses the problem of dealing with microarray data that come from two known classes (Alzheimer's and normal). We have applied three separate techniques to discover genes associated with Alzheimer's disease (AD). The 67 genes identified in this study included a total of 17 genes that are already known to be associated with Alzheimer's or other neurological diseases. This number is higher than that reported in any previously published Alzheimer's study. Twenty known genes not previously associated with the disease have also been identified, as well as 30 uncharacterized Expressed Sequence Tags (ESTs). Given the success in identifying genes already associated with AD, we can have some confidence in the involvement of the latter genes and ESTs. From these studies we can attempt to define therapeutic strategies that would prevent the loss of specific components of neuronal function in susceptible patients, or that would stimulate the replacement of lost cellular function in damaged neurons. Although our study is based on a relatively small number of patients (4 AD and 5 normal), we think our approach sets the stage for a major step in using gene expression data for disease modelling (i.e., classification and diagnosis). It can also contribute to the future of gene function identification, pathology, toxicogenomics, and pharmacogenomics.
Artificial Intelligence in Medicine 31, no. 2: 137–154.