Analysing Subsets of Gene Expression Data to Find Putatively Co-regulated Genes

This project investigates whether analysing subsets of time series gene expression data yields additional information about putatively co-regulated genes, compared with using only the whole time series.

The original gene expression data set was partitioned into subsets, and similarity was computed for both the whole time series and the subsets. Pearson correlation was used as the similarity measure between gene expression profiles.
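The comparison described above can be sketched as follows. This is a minimal illustration, not the procedure used in the project: the contiguous, equally sized windows, the correlation threshold of 0.9, the restriction to positive correlation, the toy profiles and all function names are assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile has no defined correlation; treat as none
    return cov / sqrt(var_x * var_y)


def partition(n_points, n_subsets):
    """Split the time indices into contiguous windows (an assumed scheme)."""
    size = n_points // n_subsets
    return [
        (i * size, (i + 1) * size if i < n_subsets - 1 else n_points)
        for i in range(n_subsets)
    ]


def connections(profiles, start, stop, threshold):
    """Gene pairs whose correlation over [start, stop) reaches the threshold."""
    found = set()
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1][start:stop], profiles[g2][start:stop])
        if r >= threshold:
            found.add((g1, g2))
    return found


def subset_only_connections(profiles, n_subsets, threshold):
    """Return (whole-series connections, connections seen only in subsets)."""
    n_points = len(next(iter(profiles.values())))
    whole = connections(profiles, 0, n_points, threshold)
    in_subsets = set()
    for start, stop in partition(n_points, n_subsets):
        in_subsets |= connections(profiles, start, stop, threshold)
    return whole, in_subsets - whole


# Toy profiles (invented): geneA and geneB agree only in the first half.
profiles = {
    "geneA": [1, 2, 3, 4, 4, 3, 2, 1],
    "geneB": [1, 2, 3, 4, 1, 2, 3, 4],
    "geneC": [2, 4, 6, 8, 10, 12, 14, 16],
}
whole, subset_only = subset_only_connections(profiles, n_subsets=2, threshold=0.9)
```

In this toy data no pair passes the threshold over the whole series, while geneA and geneB are perfectly correlated over the first four time points, so the pair appears only in the subset analysis.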

The results indicate that analysing co-expression in subsets of gene expression data reveals true-positive connections, with respect to co-regulation, that are not detected when only the whole time series is used.

Unfortunately, with the data set, similarity measure and partitioning used here, randomly generated connections contain the same number of true positives as the connections derived by the applied analysis.
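One way such a comparison against random connections could be set up is sketched below. It is an assumed procedure, not necessarily the one used in the project: the reference set of known co-regulated pairs, the uniform sampling of gene pairs, the number of trials and all names are hypothetical.

```python
import random
from itertools import combinations


def true_positive_fraction(connections, known_coregulated):
    """Share of the given connections that are known co-regulated pairs."""
    if not connections:
        return 0.0
    return len(connections & known_coregulated) / len(connections)


def random_baseline(genes, n_connections, known_coregulated, n_trials=1000, seed=0):
    """Mean true-positive fraction of randomly drawn sets of gene pairs."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    all_pairs = list(combinations(sorted(genes), 2))
    total = 0.0
    for _ in range(n_trials):
        sample = set(rng.sample(all_pairs, n_connections))
        total += true_positive_fraction(sample, known_coregulated)
    return total / n_trials


# Invented example: four genes, one known co-regulated pair.
genes = ["a", "b", "c", "d"]
known = {("a", "b")}
derived = {("a", "b"), ("c", "d")}

derived_fraction = true_positive_fraction(derived, known)
baseline_fraction = random_baseline(genes, len(derived), known)
```

If the derived fraction is not clearly above the baseline, the derived connections are no more informative about co-regulation than random ones, which is the situation the result above describes.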

However, because gene regulation is multi-factorial, further analysis of subsets of gene expression data is worthwhile. For example, other similarity measures, data sets and ways of partitioning the data should be tried.
Source: University of Skövde
Author: Karjalainen, Merja
