ANOVA

```python
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import scipy
from scipy.stats import f_oneway

# Print the library versions used to produce the outputs below
print("Pandas version: {}".format(pd.__version__))
print("Numpy version: {}".format(np.__version__))
print("Seaborn version: {}".format(sns.__version__))
print("Scipy version: {}".format(scipy.__version__))
print("Matplotlib version: {}".format(matplotlib.__version__))
```

    Pandas version: 2.2.2
    Numpy version: 1.26.4
    Seaborn version: 0.13.2
    Scipy version: 1.13.1
    Matplotlib version: 3.9.2

```python
# Load the "tips" example dataset that ships with seaborn
df = sns.load_dataset('tips')
df.head()
```

|   | total_bill | tip  | sex    | smoker | day | time   | size |
|---|-----------:|-----:|--------|--------|-----|--------|-----:|
| 0 | 16.99      | 1.01 | Female | No     | Sun | Dinner | 2    |
| 1 | 10.34      | 1.66 | Male   | No     | Sun | Dinner | 3    |
| 2 | 21.01      | 3.50 | Male   | No     | Sun | Dinner | 3    |
| 3 | 23.68      | 3.31 | Male   | No     | Sun | Dinner | 2    |
| 4 | 24.59      | 3.61 | Female | No     | Sun | Dinner | 4    |

# ANOVA

H0: "the mean tip is the same on every day"

```python
# Mean tip for each day of the week
df.groupby("day", observed=False)["tip"].mean()
```

    day
    Thur    2.771452
    Fri     2.734737
    Sat     2.993103
    Sun     3.255132
    Name: tip, dtype: float64

The means differ somewhat from day to day, but we need a hypothesis test before drawing any conclusions.

```python
# Collect the tips of each day into one list per group
df.groupby("day", observed=False)["tip"].apply(list)
```

    day
    Thur    [4.0, 3.0, 2.71, 3.0, 3.4, 1.83, 5.0, 2.03, 5....
    Fri     [3.0, 3.5, 1.0, 4.3, 3.25, 4.73, 4.0, 1.5, 3.0...
    Sat     [3.35, 4.08, 2.75, 2.23, 7.58, 3.18, 2.34, 2.0...
    Sun     [1.01, 1.66, 3.5, 3.31, 3.61, 4.71, 2.0, 3.12,...
    Name: tip, dtype: object

```python
# Unpack the per-day lists so that each one is passed to the ANOVA test as a separate group
f_oneway(*df.groupby("day", observed=False)["tip"].apply(list))
```

    F_onewayResult(statistic=1.6723551980998699, pvalue=0.1735885553040592)

# Summary

```python
print('H0: "the mean tip is the same on every day"')
print()

alpha = 0.02  # significance level

p_value = f_oneway(*df.groupby("day", observed=False)["tip"].apply(list)).pvalue

# Reject H0 only when the p-value falls below the significance level
if p_value < alpha:
    print("We have enough evidence to reject H0")
else:
    print("We do not have enough evidence to reject H0")
```

    H0: "the mean tip is the same on every day"

    We do not have enough evidence to reject H0
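
To make it clearer what the `statistic` returned by `f_oneway` measures, here is a minimal sketch of the one-way ANOVA F statistic computed by hand: the ratio of the between-group mean square to the within-group mean square. The helper name `one_way_f` and the `toy_groups` values are made up for illustration; they are not taken from the tips dataset, and the sketch does not compute the p-value.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists of numbers, one inner list per group.
    Hypothetical helper for illustration, not part of scipy.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up data: three small groups with means 2, 3 and 6
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_f(toy_groups)
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
# prints: F = 13.00 with (2, 6) degrees of freedom
```

Passing the per-day lists built above to this helper should give the same statistic as `f_oneway`, with (3, 240) degrees of freedom for the 4 days and 244 rows of the tips dataset. A large F means the group means are spread out relative to the noise within each group; the p-value is then read from the F distribution with those degrees of freedom.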