Machine Learning

Machine Learning - Mean Median Mode


Mean, Median, and Mode

Mean, median, and mode are measures of central tendency that help describe the characteristics of a data set. They provide useful insights into the data by identifying its average value (mean), middle value (median), and most frequent value (mode).

What can we learn from looking at a group of numbers?

In Machine Learning (and in mathematics) there are often three values that interest us:

  • Mean - The average value
  • Median - The mid point value
  • Mode - The most common value

Example: We have registered the speed of 13 cars:

speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]

What is the average, the middle, or the most common speed value?

Mean

The mean is the sum of all the values in the data set divided by the number of values in the data set. It is also called the arithmetic average, and it is denoted as x̅.

To calculate the mean, find the sum of all values, and divide the sum by the number of values:

(99+86+87+88+111+86+103+87+94+78+77+85+86) / 13 = 89.77
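The same arithmetic can be sketched in plain Python, without any library. This is only a minimal illustration using the speed list from the example above; the variable names total, count, and mean are chosen here for illustration.

```python
# Speeds of the 13 cars from the example above
speed = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

total = sum(speed)    # sum of all values: 1167
count = len(speed)    # number of values: 13
mean = total / count  # arithmetic mean

print(round(mean, 2))  # 89.77
```

Rounding to two decimals is done only to match the figure shown above; the unrounded quotient has more digits.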

The NumPy module has a function for this: numpy.mean(). Learn about the NumPy module in our NumPy Tutorial.

Example: Mean

import numpy

speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]

x = numpy.mean(speed)

print(x)

Median

The median is the middle value of sorted data. The data can be sorted in either ascending or descending order. The median divides the data into two halves.

The median value is the value in the middle, after you have sorted all the values:

77, 78, 85, 86, 86, 86, 87, 87, 88, 94, 99, 103, 111

The numbers must be sorted before you can find the median. Here the seventh of the 13 sorted values is the middle one, so the median is 87.

Example: Median

import numpy

speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]

x = numpy.median(speed)

print(x)

If the data set has an even number of values, there are two numbers in the middle. Add them and divide the sum by two:

77, 78, 85, 86, 86, 86, 87, 87, 94, 98, 99, 103

(86 + 87) / 2 = 86.5
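Both cases, an odd and an even number of values, can be sketched in a few lines of plain Python. The helper name median below is hypothetical and written only for illustration; it assumes a non-empty list of numbers and is not how NumPy implements numpy.median().

```python
def median(values):
    """Return the middle value of a non-empty list of numbers (illustrative helper)."""
    ordered = sorted(values)  # the values must be sorted first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]   # odd count: one middle value
    # even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_speeds = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]
even_speeds = [77, 78, 85, 86, 86, 86, 87, 87, 94, 98, 99, 103]

print(median(odd_speeds))   # 87
print(median(even_speeds))  # 86.5
```

Sorting inside the helper means the caller can pass the values in any order.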

Mode

The mode is the most frequent value or item in the data set. A data set can have one mode or several. A data set with one mode is called "unimodal", one with two modes is called "bimodal", and one with three modes is called "trimodal". Any data set with more than one mode is called "multimodal" (bimodal and trimodal data sets are both multimodal). If every value appears only once, the data set has no mode.

The mode value is the value that appears most often:

99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86

The value 86 appears three times, more than any other value, so the mode is 86.
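Counting occurrences by hand can be sketched with the standard library's collections.Counter. The helper name modes is hypothetical; it assumes a non-empty list and follows the convention described above, where a data set in which every value appears only once has no mode. Other libraries may use a different convention.

```python
from collections import Counter

def modes(values):
    """Return every value that shares the highest count (illustrative helper)."""
    counts = Counter(values)        # maps each value to how often it occurs
    highest = max(counts.values())
    if highest == 1:
        return []                   # every value appears only once: no mode
    return sorted(v for v, c in counts.items() if c == highest)

speed = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

print(modes(speed))            # [86]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  (bimodal)
print(modes([1, 2, 3]))        # []      (no mode)
```

Returning a list rather than a single value keeps the unimodal, multimodal, and no-mode cases distinguishable.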
Example: Mode

from scipy import stats

speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]

# stats.mode() returns a result object holding the mode and how often it occurs
x = stats.mode(speed)

print(x)

Chapter Summary

The mean, median, and mode are often used in Machine Learning, so it is important to understand the concepts behind them.