ExperienceMyServices reported that a typical American spends an average of 144 minutes (2.4 hours) per day accessing the Internet via a mobile device, with a standard deviation of 110 minutes.
To test the validity of this statement, you collected a sample of 30 observations from friends and family. The results for the time spent per day accessing the Internet via a mobile device (in minutes) are stored in "InternetMobileTime.csv".
Is there enough statistical evidence to conclude that the population mean time spent per day accessing the Internet via a mobile device is different from 144 minutes? Use the p-value approach and a level of significance of 0.05.
Note: We can assume that the samples are randomly selected, independent, and come from a normally distributed population.
# Import the required packages
import pandas as pd # Library used for data manipulation and analysis
import numpy as np # Library used for working with arrays
import matplotlib.pyplot as plt # Library for visualization
import seaborn as sns # Library for visualization
%matplotlib inline
import scipy.stats as stats # This library contains a large number of probability distributions as well as a growing library of statistical functions
mydata = pd.read_csv('InternetMobileTime.csv')
mydata.head()
| | Minutes |
|---|---|
| 0 | 72 |
| 1 | 144 |
| 2 | 48 |
| 3 | 72 |
| 4 | 36 |
mydata.shape
(30, 1)
The null hypothesis states that the mean time spent per day accessing the Internet via a mobile device, $\mu$, is equal to 144 minutes. The alternative hypothesis states that $\mu$ is not equal to 144 minutes.
Here, we are given that $\alpha = 0.05$.
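In symbols, the hypotheses and the significance level can be written as follows, where $\mu_0 = 144$ is notation introduced here for the hypothesized mean:

$$H_0:\ \mu = \mu_0 = 144 \qquad H_a:\ \mu \neq 144 \qquad \alpha = 0.05$$

Because $H_a$ allows departures in either direction, the test is two-tailed, and $H_0$ is rejected when the p-value is less than $\alpha$.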
print("The sample size for this problem is", len(mydata))
The sample size for this problem is 30
The population is normally distributed and the population standard deviation is known ($\sigma = 110$), so we can use the Z-test statistic.
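For reference, the one-sample Z-test statistic computed in the next cell combines the sample mean $\bar{x}$, the hypothesized mean $\mu_0 = 144$ (notation introduced here), the known population standard deviation $\sigma = 110$, and the sample size $n = 30$:

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}, \qquad \frac{\sigma}{\sqrt{n}} = \frac{110}{\sqrt{30}} \approx 20.08$$

Under $H_0$, $z$ follows the standard normal distribution, so the two-tailed p-value is $2\,P(Z \ge |z|)$.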
# Calculating the z-stat
sample_mean = mydata["Minutes"].mean()
n = len(mydata)  # sample size (30 observations)
mu = 144
sigma = 110
test_stat = (sample_mean - mu) / (sigma / np.sqrt(n))
test_stat
1.8157832663959144
from scipy.stats import norm
# Upper-tail probability beyond |z|: the one-tailed p-value
p_value1 = 1 - norm.cdf(abs(test_stat))
# The normal distribution is symmetric, so the two-tailed p-value is twice the one-tailed p-value
p_value_ztest = p_value1 * 2
print('The p-value is: {0} '.format(p_value_ztest))
The p-value is: 0.06940362517785204
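The two-tailed p-value can also be packaged as a small reusable function. This is a sketch: the helper name `two_tailed_z_pvalue` is hypothetical, and it uses `norm.sf` (the survival function, $1 - \Phi$) applied to $|z|$, so it gives the correct result whether the test statistic is positive or negative.

```python
import numpy as np
from scipy.stats import norm

def two_tailed_z_pvalue(z):
    """Two-tailed p-value for a standard-normal test statistic z (hypothetical helper)."""
    # norm.sf(x) equals 1 - norm.cdf(x) but avoids rounding error in the far tail
    return 2 * norm.sf(np.abs(z))

# Plugging in the z statistic printed above reproduces the reported p-value (about 0.0694)
p = two_tailed_z_pvalue(1.8157832663959144)
print(p)

# A negative statistic of the same magnitude gives the same p-value (normal symmetry)
print(two_tailed_z_pvalue(-1.8157832663959144))
```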
alpha_value = 0.05 # Level of significance
print('Level of significance: %.2f' %alpha_value)
if p_value_ztest < alpha_value:
    print('We have sufficient evidence to reject the null hypothesis as the p-value is less than the level of significance')
else:
    print('We do not have sufficient evidence to reject the null hypothesis as the p-value is greater than the level of significance')
Level of significance: 0.05
We do not have sufficient evidence to reject the null hypothesis as the p-value is greater than the level of significance
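As a cross-check of the p-value decision (not a replacement for it), the same conclusion follows from comparing $|z|$ with the two-tailed critical value $z_{\alpha/2}$. The sketch below hard-codes the z statistic printed above; the variable names are illustrative.

```python
from scipy.stats import norm

alpha = 0.05
z = 1.8157832663959144  # z statistic from the cell above

# Two-tailed critical value: the (1 - alpha/2) quantile of the standard normal
z_crit = norm.ppf(1 - alpha / 2)
print(f"critical value: {z_crit:.4f}")

# Rejecting when |z| > z_crit is equivalent to rejecting when the p-value < alpha
reject = abs(z) > z_crit
print("reject H0" if reject else "fail to reject H0")
```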
The z-statistic calculated above assumes that the population standard deviation is known, but in real life this assumption rarely holds. The t-test addresses this: it works like the z-test, except that it does not assume the population standard deviation is known and instead uses the sample standard deviation to calculate the test statistic.
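Concretely, the one-sample t statistic replaces $\sigma$ with the sample standard deviation $s$ (computed with an $n - 1$ denominator), where $\mu_0 = 144$ is the hypothesized mean; under $H_0$ it follows a t-distribution with $n - 1$ degrees of freedom:

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \qquad s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \text{df} = n - 1 = 29$$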
We will use scipy.stats.ttest_1samp, which calculates the t-test for the mean of one sample given the sample observations. By default, this function returns the t statistic and the p-value for a two-tailed t-test.
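To see what `ttest_1samp` computes, the sketch below evaluates the t statistic and the two-tailed p-value by hand and compares them with SciPy's result. The data array is made up for illustration only; it is not the contents of InternetMobileTime.csv.

```python
import numpy as np
from scipy import stats

# Hypothetical minutes-per-day values (NOT the data in InternetMobileTime.csv)
x = np.array([60, 150, 95, 210, 30, 180, 120, 300, 45, 240])
mu0 = 144
n = len(x)

# Sample standard deviation with the n - 1 denominator (ddof=1)
s = x.std(ddof=1)
t_manual = (x.mean() - mu0) / (s / np.sqrt(n))

# Two-tailed p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# SciPy's one-sample t-test on the same data
t_scipy, p_scipy = stats.ttest_1samp(x, popmean=mu0)
print(t_manual, t_scipy)
print(p_manual, p_scipy)
```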
t_statistic, p_value_ttest = stats.ttest_1samp(mydata, popmean = 144)
print('One sample t-test \nt statistic: {0} p value: {1} '.format(t_statistic, p_value_ttest))
One sample t-test
t statistic: [1.41131966] p value: [0.16878961]
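The square brackets in the output appear because the whole DataFrame was passed to `ttest_1samp`, so the test runs once per column and the results come back as length-1 arrays. Passing the single column (for example `mydata["Minutes"]`) should return plain scalars instead. The sketch below illustrates this on a small DataFrame with the same column name; its values are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical stand-in for mydata (values are made up)
df = pd.DataFrame({"Minutes": [60, 150, 95, 210, 30, 180, 120, 300, 45, 240]})

# Passing the DataFrame: one test per column, so results are arrays of shape (1,)
t_df, p_df = stats.ttest_1samp(df, popmean=144)

# Passing the Series: a single test, so results are scalars
t_s, p_s = stats.ttest_1samp(df["Minutes"], popmean=144)

print(np.shape(t_df), np.shape(t_s))
print(t_df[0], t_s)
```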
alpha_value = 0.05 # Level of significance
print('Level of significance: %.2f' %alpha_value)
if p_value_ttest < alpha_value:
    print('We have sufficient evidence to reject the null hypothesis as the p-value is less than the level of significance')
else:
    print('We do not have sufficient evidence to reject the null hypothesis as the p-value is greater than the level of significance')
Level of significance: 0.05
We do not have sufficient evidence to reject the null hypothesis as the p-value is greater than the level of significance
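The two p-values differ even though both tests use the same sample mean. Dividing the z statistic by the t statistic shows why: the formulas differ only in $\sigma$ versus $s$, so the printed statistics imply the sample standard deviation. This is a back-calculation from the rounded outputs above, not a value printed in the notebook:

$$\frac{z}{t} = \frac{s}{\sigma} \;\Rightarrow\; s = \sigma\,\frac{z}{t} \approx 110 \times \frac{1.8158}{1.4113} \approx 141.5 \text{ minutes}$$

Because $s \approx 141.5$ exceeds the assumed $\sigma = 110$, and because the t-distribution with 29 degrees of freedom has heavier tails than the standard normal, the t-test yields a smaller statistic and a larger p-value (about 0.169 versus 0.069). Both p-values exceed $\alpha = 0.05$, so the two tests lead to the same decision.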
Observation: