
Why normalization

Preprocessing is the first step in any data analysis, and normalization is one part of it. The choice of measurement unit can affect the analysis: converting an attribute to a different unit (height from meters to centimeters, weight from kilograms to grams) may lead to very different results. Expressing an attribute in smaller units produces larger numeric values and a wider range, so that attribute tends to have a greater effect, or weight, on the result. In data preprocessing the term standardization is sometimes used instead of normalization. This document presents some methods for data normalization.
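The effect of the measurement unit can be illustrated with a small sketch. The heights and weights are made-up values, not data from this document, and z-score standardization with scale() is used only as one common way to remove the unit dependence.

```r
# Hypothetical measurements for three people (illustrative values only)
height_m  <- c(1.60, 1.80, 1.62)
weight_kg <- c(60, 61, 75)

# Same data with height expressed in meters and in centimeters
in_m  <- cbind(height = height_m,       weight = weight_kg)
in_cm <- cbind(height = height_m * 100, weight = weight_kg)

# Euclidean distances between people depend on the unit chosen
d_m  <- as.matrix(dist(in_m))
d_cm <- as.matrix(dist(in_cm))

# Nearest neighbour of person 1 under each unit (index into rows 2 and 3)
nearest_m  <- unname(which.min(d_m[1, -1])) + 1
nearest_cm <- unname(which.min(d_cm[1, -1])) + 1

# After z-score standardization both versions give the same distances
d_std_m   <- as.matrix(dist(scale(in_m)))
d_std_cm  <- as.matrix(dist(scale(in_cm)))
unit_free <- isTRUE(all.equal(d_std_m, d_std_cm))
```

With height in meters the weight column dominates the distance, and with height in centimeters the height column dominates, so the nearest neighbour of person 1 differs between the two versions. Standardizing each column removes that dependence.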

\(~\)

Raw data from a microarray experiment on colon cancer patients is stored in the rawDataMatrix object, which has 20000 rows and 53 columns. GSE18088 is the ID of this experiment in the GEO database.

dim(rawDataMatrix)
## [1] 20000    53
max(rawDataMatrix) - min(rawDataMatrix)
## [1] 65508
boxplot(rawDataMatrix , pch=".")

hist(rawDataMatrix[,2] , breaks = 100)

\(~\)

The MA plots below compare sample 1 with each of samples 2 to 7. Here A is the average and M is the difference of the raw intensities of the two samples. In every plot most of the points fall in a similar region.

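par(mfrow = c(2, 3))  # 2 x 3 grid, one panel per comparison

for (i in 2:7) {
  A <- (rawDataMatrix[, 1] + rawDataMatrix[, i]) / 2  # average intensity
  M <- rawDataMatrix[, 1] - rawDataMatrix[, i]        # intensity difference
  smoothScatter(A, M, ylim = c(-25000, 5000))
  abline(h = 0, col = "green")                        # reference line at M = 0
}
dev.off()  # close the graphics device, which also resets the layout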
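As a hedged sketch of the computation behind these plots, the helper below returns M and A for two intensity vectors. The function name ma_values, its log argument, and the simulated intensities are assumptions for illustration, not part of the original analysis. With log = FALSE it mirrors the raw-scale calculation used above, and with log = TRUE it applies the log2 transform that MA plots conventionally use.

```r
# Hypothetical helper: M (difference) and A (average) for two samples
ma_values <- function(x, y, log = FALSE) {
  if (log) {
    # Conventional MA plot: work on log2 intensities
    x <- log2(x)
    y <- log2(y)
  }
  list(A = (x + y) / 2, M = x - y)
}

# Simulated positive intensities standing in for two microarray samples
set.seed(1)
s1 <- rexp(1000, rate = 1 / 500) + 1
s2 <- s1 * 2 * exp(rnorm(1000, sd = 0.1))  # about twice as bright, with noise

raw <- ma_values(s1, s2)              # raw scale, as in the loop above
lg  <- ma_values(s1, s2, log = TRUE)  # log2 scale: M is centred near -1

# On the log2 scale a constant fold change appears as a horizontal band
plot(lg$A, lg$M, pch = ".", xlab = "A", ylab = "M")
abline(h = 0, col = "green")
```

On the raw scale the size of M depends on the intensity itself, whereas on the log2 scale a twofold difference between samples shows up as a shift of about one unit in M at every intensity.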

\(~\)

\(~\)

The Q-Q plots below compare the distribution of sample 1 with the distributions of samples 2 to 7.

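par(mfrow = c(2, 3))  # 2 x 3 grid, one panel per comparison
for (i in 2:7) {
  # quantiles of sample 1 (x axis) against quantiles of sample i (y axis)
  qqplot(rawDataMatrix[, 1], rawDataMatrix[, i], col = "darkgreen", ylim = c(0, 70000))
}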
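
The sketch below shows what a Q-Q plot compares, using simulated data rather than the microarray samples. For two samples of equal length, qqplot() pairs their sorted values, so points close to the line y = x indicate similar distributions. The variable names and the simulated distributions are assumptions for illustration.

```r
# Simulated samples standing in for two array columns (illustrative only)
set.seed(42)
a <- rexp(2000, rate = 1 / 500)
b <- 1.5 * rexp(2000, rate = 1 / 500)  # same shape, larger scale

# plot.it = FALSE returns the paired quantiles without drawing
qq <- qqplot(a, b, plot.it = FALSE)

# For equal-length samples the pairs are simply the sorted values
same_as_sorted <- isTRUE(all.equal(qq$x, sort(a))) &&
  isTRUE(all.equal(qq$y, sort(b)))

# Draw the plot with a y = x reference line; points lying above the
# line mean that quantile of b exceeds the matching quantile of a
qqplot(a, b, col = "darkgreen")
abline(0, 1, col = "red")
```

Systematic departures from the reference line, as in this simulated pair, are the kind of between-sample difference that normalization aims to reduce.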