Standard Data Analysis

Microarray studies generate enormous datasets that are richer and deeper than ever before. Deciphering patterns across the entire expression repertoire requires sophisticated statistical treatment and interpretation, so that the results can be put to practical use in understanding disease and improving the human condition.

Quality control filters allow data to be examined and re-examined quickly under different experimental conditions. Data can be filtered using a wide range of analyses based on raw values, control signal, and fold change in gene expression. The following quality control steps are performed to obtain significant data.

  • Data filtration
  • Improvement on missing values
  • Gene expression profile
  • Gene expression fold change

Data filtration

Data are filtered to obtain reliable, significantly differentially expressed genes. Genes are selected based on:

  • Confidence
  • Background cut-off value
  • Fold change cut-off

Based on confidence:

Identify differentially expressed genes in two conditions: Welch's t-test, which does not assume equal variances between the groups, is performed to identify genes that are differentially expressed between the two conditions under comparison. The p-value is corrected for multiple hypothesis testing. We apply a p-value cut-off of < 0.05.
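As a minimal sketch of the statistic behind this test, the Welch t statistic and its Welch-Satterthwaite degrees of freedom can be computed for a single gene as follows. The function name and the intensity values are illustrative assumptions, not part of the original workflow; the p-value would then be read from a t distribution with the returned degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(group_a), len(group_b)
    # Variance of each group mean; the sample variances are NOT pooled,
    # which is what frees the test from the equal-variance assumption.
    se_a = variance(group_a) / n_a
    se_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical log2 intensities for one gene in two conditions.
control = [7.1, 7.4, 6.9, 7.2]
treated = [8.3, 8.9, 8.1, 9.0]
t_stat, dof = welch_t(treated, control)
```

Each group needs at least two replicates and a non-zero variance for the calculation to be defined.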

Identify differentially expressed genes in three or more conditions: Welch's ANOVA is performed to compare the variation between and within groups and to identify differentially expressed genes that show significant effects across three or more conditions.
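The following is a hedged sketch of the Welch ANOVA F statistic for one gene, written from the standard textbook formula rather than from the pipeline itself. The function name is an assumption, and every group is assumed to have at least two replicates and a non-zero variance.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Return (F statistic, numerator df, denominator df) for k >= 2 groups."""
    k = len(groups)
    # Weight each group by n / s^2, so noisier groups count for less.
    weights = [len(g) / variance(g) for g in groups]
    means = [mean(g) for g in groups]
    total_w = sum(weights)
    grand_mean = sum(w * m for w, m in zip(weights, means)) / total_w
    # Weighted between-group variation.
    between = sum(
        w * (m - grand_mean) ** 2 for w, m in zip(weights, means)
    ) / (k - 1)
    # Correction term that accounts for unequal within-group variances.
    lam = sum(
        (1 - w / total_w) ** 2 / (len(g) - 1) for w, g in zip(weights, groups)
    )
    f_stat = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    return f_stat, k - 1, (k ** 2 - 1) / (3 * lam)
```

With exactly two groups, this F statistic equals the square of the Welch t statistic, which is a convenient sanity check.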

Identify false positive genes: The Benjamini and Hochberg method is used to adjust for multiple testing. It controls the false discovery rate (FDR), the expected proportion of false positives in a list of candidate genes, at the chosen p-value threshold.
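The adjustment can be sketched in a few lines. This is an illustrative implementation of the Benjamini and Hochberg step-up procedure, not the code used in the pipeline; the function name and the raw p-values are assumptions.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five genes.
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw)
# Genes whose adjusted p-value passes the 0.05 cut-off.
significant = [i for i, q in enumerate(adj) if q < 0.05]
```

In this example, two genes with raw p-values below 0.05 (0.039 and 0.041) no longer pass the cut-off after adjustment.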

Based on background cut-off value:

Identify highly regulated genes: Theory proposes that highly regulated genes are low in abundance. By applying background cut-off values, the Advanced Genomics expert team can detect rarely expressed but potentially important genes.

Based on fold change cut-off:

Identify reliably regulated genes: Fold changes or log ratios are calculated across various parameters to inspect the region of reliable data, intensity, linearity, noise within the data, and linear dynamic range. Multi-functional scatter plots can be generated to make it easy to select groups of genes for analysis. Genes that are up- or down-regulated two-fold are selected as significant in microarray gene expression.
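A small sketch of the two-fold selection follows, assuming background-corrected intensities on a linear scale and treating a change of exactly two-fold as passing the cut-off. The gene names, intensities, and function names are hypothetical.

```python
from math import log2


def log2_fold_change(treated, control):
    """log2 ratio of the mean treated to the mean control intensity."""
    return log2((sum(treated) / len(treated)) / (sum(control) / len(control)))


def classify(log_fc, threshold=1.0):
    # |log2 FC| >= 1 corresponds to a two-fold change in either direction.
    if log_fc >= threshold:
        return "up"
    if log_fc <= -threshold:
        return "down"
    return "unchanged"


# Hypothetical intensities: (treated replicates, control replicates).
genes = {
    "geneA": ([400.0, 420.0], [100.0, 105.0]),
    "geneB": ([50.0, 55.0], [210.0, 200.0]),
    "geneC": ([120.0, 118.0], [110.0, 115.0]),
}
calls = {name: classify(log2_fold_change(t, c)) for name, (t, c) in genes.items()}
```

Working on the log2 scale makes up- and down-regulation symmetric: a two-fold increase is +1 and a two-fold decrease is -1.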

Improvement on missing values:

Values lost to filtering, data corruption, or suspicious spots during the image analysis phase can be recovered using gene ontology in two steps. In the first step, a set of genes nearest to the gene with the missing value is selected. In the second step, the missing value is predicted from the observed values of the selected genes, using gene ontology along with the expression data.
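The two steps can be illustrated with a simplified nearest-neighbour sketch. It assumes that the candidate genes have already been restricted to those sharing a gene ontology annotation with the target gene, that candidates have complete profiles, and that a plain average is used for prediction; the actual method may weight or select genes differently. All names and values are hypothetical.

```python
from math import sqrt


def impute(target, candidates, k=2):
    """Fill None entries in `target` from the k nearest candidate profiles."""
    observed = [j for j, v in enumerate(target) if v is not None]

    def distance(profile):
        # Euclidean distance over the samples observed for the target gene.
        return sqrt(sum((target[j] - profile[j]) ** 2 for j in observed))

    # Step 1: select the k candidate genes closest to the target profile.
    nearest = sorted(candidates, key=distance)[:k]
    # Step 2: predict each missing value as the mean of the neighbours' values.
    return [
        v if v is not None else sum(p[j] for p in nearest) / len(nearest)
        for j, v in enumerate(target)
    ]


# Hypothetical candidates, assumed to share a GO annotation with the target.
same_go_term = [
    [1.0, 2.0, 3.0, 4.0],
    [1.2, 2.1, 2.9, 4.4],
    [5.0, 0.5, 0.2, 9.0],
]
filled = impute([1.1, 2.0, 3.0, None], same_go_term)
```

Here the third candidate is far from the target across the observed samples, so only the first two contribute to the predicted value.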

Gene expression profile:

Identify changes in gene expression: The changes in expression level of an individual gene are viewed in a line graph across a number of samples selected during data mining, to spot similar trends in the gene expression profiles under comparison.