
Tuesday, November 29, 2011

Difference between Covariance and Correlation

Both measure essentially the same thing: how two variables vary together. Covariance measures how the two variables are related; a positive value indicates a positive linear relationship. In R, it can be calculated with cov(x, y), where x and y are the two variables.
The correlation coefficient is the covariance divided by the product of the standard deviations of the two variables, so it is a normalized (unit-free) measure of the relationship between the two variables. A value near +1 means there is a strong positive linear relationship, a value near -1 a strong negative linear relationship, and a value of zero no linear relationship. In R, it can be calculated simply with cor(x, y).
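A minimal runnable sketch in R with simulated data (the variables x and y are illustrative, not from any real data set):

set.seed(1)
x <- rnorm(100, mean = 50, sd = 10)    # e.g. age
y <- 0.5 * x + rnorm(100, sd = 5)      # a variable positively related to x

cov(x, y)                      # covariance: depends on the units of x and y
cor(x, y)                      # Pearson correlation: always between -1 and +1
cov(x, y) / (sd(x) * sd(y))    # equals cor(x, y), illustrating the definition above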

Wednesday, September 28, 2011

Japanese Encephalitis reported for the first time in Delhi

A disease which was till now thought to be limited to Eastern UP has been reported from Delhi. Epidemiological investigations are ongoing. Laboratory reports from NCDC have confirmed JE antibodies in the samples sent from suspected patients.


If local transmission is confirmed, Delhi will have to gear up to take care of one more scourge: Japanese encephalitis, transmitted by the mosquito Culex tritaeniorhynchus. This will be in addition to the existing burden of other mosquito-borne diseases like malaria and dengue, which cause deadly outbreaks in Delhi. It is high time that sanitation is given a high priority in the capital city of India.

Thursday, July 14, 2011

Levene's test for equality of variances

In an earlier post, I discussed Bartlett's test for homogeneity of variances. However, Bartlett's test is very sensitive to the normality assumption: even if the data are slightly non-normal, it does not hold good. In that case, Levene's test for equality of variances becomes the test of choice.

In R, we can load library(car) and then apply Levene's test as follows:
leveneTest(age ~ sex): in this example, sex is a categorical variable with male and female as the two groups. This tests the null hypothesis that the variances of age in the male and female groups are equal. If p is less than 0.05, the null hypothesis is rejected, i.e. the variances are not equal in the two groups.
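
A minimal runnable sketch with simulated data (the variable names sex and age follow the example above; the numbers are made up for illustration):

library(car)    # provides leveneTest()

set.seed(1)
dat <- data.frame(
  sex = factor(rep(c("male", "female"), each = 50)),
  age = c(rnorm(50, mean = 40, sd = 10), rnorm(50, mean = 38, sd = 12))
)

leveneTest(age ~ sex, data = dat)    # H0: variance of age is equal in the two sex groups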

Friday, June 24, 2011

Test for homogeneity of variance: Bartlett test

First, an important assumption: the data should follow a normal distribution.
The null hypothesis for this test is that the variances are equal in the groups.
So if the p-value is less than 0.05, the null hypothesis is rejected, and the interpretation is that the variances are not equal (i.e. homoscedasticity is absent).
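
A minimal runnable sketch with simulated data (the group names and values are made up for illustration; bartlett.test() is part of base R):

set.seed(1)
dat <- data.frame(
  group = factor(rep(c("A", "B"), each = 50)),
  value = c(rnorm(50, sd = 1), rnorm(50, sd = 2))
)

bartlett.test(value ~ group, data = dat)    # H0: variances are equal across the groups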

In case the normality assumption is violated, we can go for Levene's test instead.

For further discussion using R: http://wiki.stdout.org/rcookbook/Statistical%20analysis/Homogeneity%20of%20variance

Test for normality: Shapiro-Wilk test

In order to test whether a variable follows a normal distribution, we can use the Shapiro-Wilk test. The null hypothesis is that there is no difference between the distribution of the given data and a normal distribution (in other words, the data are normally distributed).
So if the p-value is less than 0.05, the null hypothesis is rejected: there is a difference between the distribution of the given data and a normal distribution, i.e. the given data are not normally distributed.

For data of this type, which are not normally distributed, it is time to go for a non-parametric test.
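
A minimal runnable sketch with simulated data (shapiro.test() is part of base R; the two samples below are made up for illustration):

set.seed(1)
x_normal <- rnorm(100)    # drawn from a normal distribution
x_skewed <- rexp(100)     # clearly non-normal

shapiro.test(x_normal)    # large p-value expected: do not reject normality
shapiro.test(x_skewed)    # small p-value expected: reject normality, go non-parametric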

Note: The Shapiro-Wilk test can also be used to test the normality of the residuals of a linear regression model. In R it can be done as: shapiro.test(lm(age ~ case.factor)$residuals). Both shapiro.test() and lm() are in base R, so the epicalc package is not actually needed for this step. In this example, age is the numerical variable, case.factor is the categorical variable coding cases and controls, and lm stands for linear model.
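
A self-contained sketch of the residual check above, with simulated data (the variable names age and case.factor follow the example; the values are made up for illustration):

set.seed(1)
case.factor <- factor(rep(c("case", "control"), each = 50))
age <- c(rnorm(50, mean = 45, sd = 8), rnorm(50, mean = 40, sd = 8))

fit <- lm(age ~ case.factor)      # linear model with a two-level factor
shapiro.test(residuals(fit))      # H0: the residuals are normally distributed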