Understanding Descriptive Statistics
Introduction to Descriptive Statistics
Descriptive statistics describe the basic features of the data in a study. They provide simple summaries of the sample and its measures. Together with simple graphical analysis, they form the basis of virtually every quantitative analysis of data.
Measures of Central Tendency
These measures describe the center point or typical value of a dataset. The three most common measures are:
- Mean: The arithmetic average, calculated by summing all values and dividing by the number of values.
- Median: The middle value when data is ordered from smallest to largest.
- Mode: The most frequently occurring value in the dataset.
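The three measures above can be computed with Python's standard `statistics` module. This is a minimal sketch; the `scores` list is a hypothetical dataset chosen only for illustration.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
scores = [4, 8, 6, 5, 3, 8, 9]

mean = statistics.mean(scores)      # sum of the values divided by their count
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequently occurring value

print(f"mean={mean:.2f}, median={median}, mode={mode}")
```

If several values tie for most frequent, `statistics.multimode` returns all of them as a list, whereas `statistics.mode` returns only the first one encountered.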
Measures of Dispersion
These measures describe how spread out the data is:
- Range: The difference between the highest and lowest values.
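- Variance: The average of the squared differences from the mean. Dividing by the number of values n gives the population variance; dividing by n - 1 gives the sample variance.
- Standard Deviation: The square root of the variance, which expresses the typical distance from the mean in the same units as the data.
- Interquartile Range (IQR): The range of the middle 50% of the data, that is, the third quartile minus the first quartile (Q3 - Q1).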
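As a rough sketch, the dispersion measures listed above could be computed as follows. The `data` list is hypothetical, and the variance shown is the population form (dividing by n). Quartile conventions differ between tools, so the IQR here reflects only the `"inclusive"` method of `statistics.quantiles`; other software may report a slightly different value.

```python
import statistics

# Hypothetical dataset, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)     # highest value minus lowest value
variance = statistics.pvariance(data)  # population variance (divides by n)
std_dev = statistics.pstdev(data)      # square root of the population variance

# Quartiles split the sorted data into four parts; the IQR is Q3 - Q1.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(f"range={data_range}, variance={variance}, std_dev={std_dev}, IQR={iqr}")
```

For sample-based estimates, `statistics.variance` and `statistics.stdev` divide by n - 1 instead.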
Shape of the Distribution
The shape of a distribution is described by:
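- Skewness: Measures the asymmetry of the distribution. Positive values indicate a longer right tail, negative values a longer left tail.
- Kurtosis: Measures the "tailedness" of the distribution, that is, how heavy its tails are compared with those of a normal distribution.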
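One common way to quantify both shape measures is through standardized central moments. The sketch below assumes the simple population (moment-based) definitions; statistical packages often apply small-sample bias corrections, so their results may differ slightly. The helper names and both datasets are hypothetical.

```python
def central_moment(values, k):
    """Return the k-th central moment: the mean of (x - mean) ** k."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n


def skewness(values):
    """Third central moment divided by the variance raised to the power 1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth central moment over squared variance, minus 3 (the normal value)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# Hypothetical datasets, chosen only for illustration.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

print(skewness(symmetric))        # zero for a perfectly symmetric dataset
print(skewness(right_skewed))     # positive: the long tail is on the right
print(excess_kurtosis(symmetric))
```

Subtracting 3 in `excess_kurtosis` is a convention that makes a normal distribution score zero; some sources report the uncorrected value instead.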
When to Use Descriptive Statistics
Descriptive statistics are essential for:
- Summarizing large datasets with a few meaningful numbers
- Identifying patterns or anomalies in the data
- Providing a basis for further statistical analysis
- Communicating key features of the data to others
Example Calculation
Consider the following dataset: [12, 15, 18, 22, 25, 28, 28, 30]
- Mean: (12+15+18+22+25+28+28+30)/8 = 178/8 = 22.25
- Median: (22+25)/2 = 23.5 (average of the two middle numbers)
- Mode: 28 (appears twice, others appear once)
- Range: 30 - 12 = 18
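- Variance: The squared differences from the mean sum to 309.5, so the population variance is 309.5/8 = 38.6875 (the sample variance, dividing by n - 1 = 7, is about 44.21)
- Standard Deviation: √38.6875 ≈ 6.22 (about 6.65 using the sample variance)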
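The figures in this example can be checked with a short script. This is a sketch using Python's standard `statistics` module; it reports both the population form (dividing by n) and the sample form (dividing by n - 1) of the variance and standard deviation, since the two conventions give different numbers.

```python
import statistics

dataset = [12, 15, 18, 22, 25, 28, 28, 30]

mean = statistics.mean(dataset)       # 178 / 8 = 22.25
median = statistics.median(dataset)   # (22 + 25) / 2 = 23.5
mode = statistics.mode(dataset)       # 28
data_range = max(dataset) - min(dataset)  # 30 - 12 = 18

pop_var = statistics.pvariance(dataset)   # 309.5 / 8 = 38.6875
pop_sd = statistics.pstdev(dataset)       # about 6.22
samp_var = statistics.variance(dataset)   # 309.5 / 7, about 44.21
samp_sd = statistics.stdev(dataset)       # about 6.65

print(mean, median, mode, data_range)
print(round(pop_var, 4), round(pop_sd, 2))
print(round(samp_var, 2), round(samp_sd, 2))
```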
Limitations
While descriptive statistics are useful, they have limitations:
- They don't allow you to draw conclusions beyond the data analyzed
- They don't account for the reliability or validity of your data
- Different distributions can have similar descriptive statistics
- They can be misleading if used without visualization
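To illustrate the last two points, the sketch below builds two hypothetical datasets that share the same mean, median, and population standard deviation yet have clearly different shapes. The `text_histogram` helper is an illustrative stand-in for a real plot, not part of any library.

```python
import statistics
from collections import Counter

# Two hypothetical datasets, constructed to have matching summaries.
two_clusters = [3, 3, 3, 3, 7, 7, 7, 7]    # values split into two groups
peak_with_tails = [1, 5, 5, 5, 5, 5, 5, 9]  # one central peak, two extremes


def text_histogram(values):
    """Return a crude text histogram with one row per distinct value."""
    counts = Counter(values)
    return "\n".join(f"{v:>2} | {'#' * counts[v]}" for v in sorted(counts))


for name, values in [("two_clusters", two_clusters),
                     ("peak_with_tails", peak_with_tails)]:
    print(name,
          "mean:", statistics.mean(values),
          "median:", statistics.median(values),
          "sd:", statistics.pstdev(values))
    print(text_histogram(values))
```

The printed summaries are identical for both datasets, while the histograms show that one is split into two clusters and the other is concentrated at a single value.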
Key Terms
- Population: The complete set of items or events of interest
- Sample: A subset of the population used for analysis
- Parameter: A characteristic of a population
- Statistic: A characteristic of a sample
- Outlier: A data point significantly different from others