Thursday, 21 June 2012

Introduction to Statistics

TYPES OF DATA
  • Categorical (qualitative)
    • Nominal (e.g. type of car)
    • Ordinal (e.g. stage of cancer; the categories have a rank order)
  • Numerical (quantitative)
    • Discrete (e.g. number of children)
      • Values are genuine numbers, so differences and ratios are meaningful (this holds for continuous data as well)
    • Continuous (e.g. cholesterol level)
The choice of statistical method for summarizing data depends on the type of data

Mean - the arithmetic average of the observations (used for numerical data only)
Median - the middle observation when the data are sorted (can be used for rank-ordered data; less sensitive to extreme values than the mean)
Mode - the most common value in a data set (useful only when the number of possible responses is small)
Geometric mean - used when data are heavily right-skewed (requires positive values)

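The relationship between the mean, median, and mode depends on the shape of the distribution of the data. For symmetric data the three are close together; for right-skewed data the mean is typically pulled above the median. In general, use the mean for symmetric numerical data and the median for skewed numerical or ordinal data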
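To make the four measures concrete, here is a minimal sketch using Python's standard statistics module. The sample values are made up for illustration (think of lengths of hospital stay in days, with one very long stay) and are not from the lecture.

```python
import statistics

# Hypothetical right-skewed sample: six short stays and one very long one
stays = [2, 3, 3, 4, 5, 6, 40]

mean = statistics.mean(stays)                # arithmetic average, pulled up by the 40
median = statistics.median(stays)            # middle value of the sorted data
mode = statistics.mode(stays)                # most common value
geo_mean = statistics.geometric_mean(stays)  # needs positive values (Python 3.8+)

print("mean:", mean)
print("median:", median)
print("mode:", mode)
print("geometric mean:", round(geo_mean, 2))
```

In this sample the single extreme value drags the mean well above the median, which is why the median is usually the better summary for skewed data. The geometric mean lands between the two.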

Measures of spread
  • lower (25%) and upper (75%) quartiles; the distance between them is the interquartile range
Standard deviation
  • Spread of data around the mean
  • useful for symmetric data
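  • summarizes how far the observations typically are from the mean
  • calculate each observation's deviation from the mean
    • then calculate the variance (the average of the squared deviations)
      • then calculate the standard deviation (the square root of the variance)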
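The three steps above can be sketched in Python as follows. The data are made up, and dividing by n - 1 (the sample variance) is an assumption on my part, since the notes do not say whether the sample or the population formula was intended.

```python
import math
import statistics

# Hypothetical sample, chosen so the arithmetic is easy to follow
values = [4, 8, 6, 5, 7]
n = len(values)
mean = sum(values) / n

# Step 1: deviation of each observation from the mean
deviations = [x - mean for x in values]

# Step 2: variance = sum of squared deviations divided by n - 1
# (assumption: sample variance; divide by n for the population variance)
variance = sum(d ** 2 for d in deviations) / (n - 1)

# Step 3: standard deviation = square root of the variance
sd = math.sqrt(variance)

print("deviations:", deviations)
print("variance:", variance)
print("standard deviation:", round(sd, 3))

# Cross-check against the standard library's sample standard deviation
print("statistics.stdev:", round(statistics.stdev(values), 3))
```

The deviations always sum to zero, which is why they are squared before averaging.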
Standard deviation is important because, when data follow a normal distribution (bell-shaped curve):
  • about 68% of data fall between mean -1 SD and mean +1 SD
  • about 95% of data fall between mean -2 SD and mean +2 SD
  • about 99.7% of data fall between mean -3 SD and mean +3 SD
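The percentages above can be checked against the normal distribution itself. This is a small sketch using NormalDist from Python's standard library (Python 3.8+); the helper name share_within is my own and purely illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, SD 1
std_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {share_within(k):.1%}")
```

The results round to 68.3%, 95.4% and 99.7%, so the figures in the list are approximations. Exactly 95% of a normal distribution lies within about 1.96 SD of the mean.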





