Chi-Square

Chi-Square is the statistic that is calculated here to determine the probability of a given table occurring by chance.

It works by comparing observed values with expected values.

Suppose you had 100 people in a room.

50 men and 50 women.

50 of them smokers and 50 of them non-smokers.

You could do a survey of all the people, note their sex and ask whether they were a smoker or not.

You would be able to complete the table below:

             MALE   FEMALE   TOTAL
SMOKER                          50
NON-SMOKER                      50
TOTAL          50       50     100

Whatever we might think about the nature of any association here, how many male smokers would you expect on the basis of NO ASSOCIATION?

If the 50 men were smokers and non-smokers in proportion to the overall numbers of smokers and non-smokers then you would have how many male smokers?

Half of the people are smokers. We would expect, therefore, half of the men to be smokers.

We have 50 men, therefore we would expect to have 25 male smokers - on the basis of no association.

Suppose we observed 26 male smokers. That is more than we would expect, but it could easily be chance at work.

Suppose all the men were smokers - obviously evidence of some sort of link.

The question is "where to draw the line?"

Chi-square works by comparing the observed and expected values for each cell in a table, and is computed from the differences between them.

Suppose we came up with the following results:

             MALE   FEMALE   TOTAL
SMOKER         34        9      43
NON-SMOKER     22       20      42
TOTAL          56       29      85

How many male smokers would we expect? OK, we have 34 of them, but how many would we expect?

On the basis of no association between the variables we would expect the proportion of men smoking to be the same as the proportion of PEOPLE smoking. That is we would expect:

(EXPECTED NUMBER OF MALE SMOKERS)/56 to equal 43/85

That being the case that leaves us expecting (43*56)/85 male smokers - or 28.329.

Thus expected values for a given cell can be calculated by taking:
(ROW TOTAL * COLUMN TOTAL)/GRAND TOTAL
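
If you'd rather let a computer do the multiplying, the rule above can be sketched in a few lines of Python. This is only an illustration: the function name expected_counts and the list-of-rows layout (rows are SMOKER / NON-SMOKER, columns are MALE / FEMALE) are choices made for this example, not part of the original method.

```python
# Observed counts from the example table.
# Rows: SMOKER, NON-SMOKER. Columns: MALE, FEMALE.
observed = [[34, 9],
            [22, 20]]

def expected_counts(table):
    """Expected counts under no association:
    (row total * column total) / grand total for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

expected = expected_counts(observed)
for row in expected:
    print([round(value, 3) for value in row])
# [28.329, 14.671]
# [27.671, 14.329]
```

The first figure printed is the 28.329 male smokers worked out by hand above.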

The process goes on, as illustrated below to yield the Chi statistic for the table.

Observed (O)   Expected (E)   (O-E)    (O-E)²   Chi component (O-E)²/E
     34           28.329       5.671   32.160   1.135
      9           14.671      -5.671   32.160   2.192
     22           27.671      -5.671   32.160   1.162
     20           14.329       5.671   32.160   2.244

Sum of chi components = Chi-Square = 6.734
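
The same working can be sketched in Python. The function name chi_square and the table layout are assumptions made for this example. One small wrinkle: the code carries full precision all the way through, so its total comes out at about 6.733, whereas the hand calculation squares the rounded difference 5.671 and lands on 6.734.

```python
# Observed counts. Rows: SMOKER, NON-SMOKER. Columns: MALE, FEMALE.
observed = [[34, 9],
            [22, 20]]

def chi_square(table):
    """Return (chi-square, components), where each component is (O-E)^2 / E."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    components = []
    for i, row in enumerate(table):
        component_row = []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected value
            component_row.append((o - e) ** 2 / e)
        components.append(component_row)
    total = sum(sum(component_row) for component_row in components)
    return total, components

chi, components = chi_square(observed)
for component_row in components:
    print([round(c, 3) for c in component_row])
print(round(chi, 3))
```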

"6.734?" I hear you shout. So what? Well normally you'd have to go clutching that figure to a printed table of chi-square significance values which would start asking awkward questions like "How many degrees of freedom have you got?" In this case the answer is 1.

Degrees of freedom = (number of rows -1) * (number of columns -1)
= (2-1)*(2-1)
= 1*1
= 1
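
As a one-line sketch in Python (degrees_of_freedom is just a name picked for this example):

```python
def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for a table: (rows - 1) * (columns - 1).
    Count only the data rows and columns, not the totals."""
    return (n_rows - 1) * (n_cols - 1)

print(degrees_of_freedom(2, 2))  # 1
```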

Let's cut to the chase....

The printed table tells us that:
a chi value of more than 2.706 would be significant at 10%,
a chi value of more than 3.841 would be significant at 5% and
a chi value of more than 6.635 would be significant at 1%.
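
If you don't have a printed table to hand, here is a rough sketch of the same lookup in Python. The names are invented for this example. The p-value function relies on the fact that, with 1 degree of freedom only, the chance of a chi value at least this big is erfc(sqrt(chi/2)); it is NOT valid for larger tables.

```python
import math

# Critical values for 1 degree of freedom, as quoted from the printed table.
CRITICAL_VALUES = {0.10: 2.706, 0.05: 3.841, 0.01: 6.635}

def strictest_level(chi):
    """Return the smallest significance level the chi value passes, or None."""
    passed = [level for level, critical in CRITICAL_VALUES.items()
              if chi > critical]
    return min(passed) if passed else None

def p_value_one_df(chi):
    """Probability of a chi value this large or larger by chance.
    Assumes exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(chi / 2))

print(strictest_level(6.734))         # 0.01
print(p_value_one_df(6.734) < 0.01)   # True
```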

So, our table is significant at 1% - but what is the story?

Look for the big chi components - that's where the gap between observed and expected values is largest relative to the expected value.

In this case it's the 20 female non-smokers we've got - more than the 14.329 that we would expect - that are making the biggest contribution to the overall chi value.
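
Hunting for the biggest component can be sketched in Python too. The labelled dictionary layout and the variable names here are just one convenient choice for this example.

```python
# Observed counts keyed by (row label, column label).
observed = {
    ("SMOKER", "MALE"): 34,
    ("SMOKER", "FEMALE"): 9,
    ("NON-SMOKER", "MALE"): 22,
    ("NON-SMOKER", "FEMALE"): 20,
}

# Build the row, column and grand totals from the cells.
row_totals, col_totals = {}, {}
for (row, col), count in observed.items():
    row_totals[row] = row_totals.get(row, 0) + count
    col_totals[col] = col_totals.get(col, 0) + count
grand_total = sum(observed.values())

# Chi component for every cell: (O - E)^2 / E.
components = {}
for (row, col), o in observed.items():
    e = row_totals[row] * col_totals[col] / grand_total
    components[(row, col)] = (o - e) ** 2 / e

# The cell with the largest component is the one to look at first.
largest = max(components, key=components.get)
print(largest, round(components[largest], 3))
# ('NON-SMOKER', 'FEMALE') 2.244
```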

Women are significantly less likely to smoke than men - according to this fictional data.

Enough for one day, but just note that in this example we had a 2x2 table and should really have applied Yates's correction - you can read up on that - which would have given a chi value of 5.598 making the table significant at only 5%.
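
For the curious, here is a minimal sketch of that corrected calculation in Python. It assumes the usual form of Yates's correction, which knocks 0.5 off the size of each (O-E) difference before squaring. The function name is made up for this example.

```python
# Observed counts. Rows: SMOKER, NON-SMOKER. Columns: MALE, FEMALE.
observed = [[34, 9],
            [22, 20]]

def chi_square_yates(table):
    """Chi-square for a 2x2 table with Yates's correction:
    sum of (|O - E| - 0.5)^2 / E over the four cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected value
            total += (abs(o - e) - 0.5) ** 2 / e
    return total

print(round(chi_square_yates(observed), 3))  # 5.598
```

That 5.598 is above the 5% line of 3.841 but below the 1% line of 6.635.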