```python
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import scipy
from scipy.stats import ttest_ind

# Record the library versions used to produce the results below.
print("Pandas version: {}".format(pd.__version__))
print("Numpy version: {}".format(np.__version__))
print("Seaborn version: {}".format(sns.__version__))
print("Scipy version: {}".format(scipy.__version__))
print("Matplotlib version: {}".format(matplotlib.__version__))
```

    Pandas version: 2.2.2
    Numpy version: 1.26.4
    Seaborn version: 0.13.2
    Scipy version: 1.13.1
    Matplotlib version: 3.9.2

```python
# Load the "tips" example dataset that ships with seaborn.
df = sns.load_dataset('tips')
df.head()
```

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>total_bill</th>
      <th>tip</th>
      <th>sex</th>
      <th>smoker</th>
      <th>day</th>
      <th>time</th>
      <th>size</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>16.99</td>
      <td>1.01</td>
      <td>Female</td>
      <td>No</td>
      <td>Sun</td>
      <td>Dinner</td>
      <td>2</td>
    </tr>
    <tr>
      <th>1</th>
      <td>10.34</td>
      <td>1.66</td>
      <td>Male</td>
      <td>No</td>
      <td>Sun</td>
      <td>Dinner</td>
      <td>3</td>
    </tr>
    <tr>
      <th>2</th>
      <td>21.01</td>
      <td>3.50</td>
      <td>Male</td>
      <td>No</td>
      <td>Sun</td>
      <td>Dinner</td>
      <td>3</td>
    </tr>
    <tr>
      <th>3</th>
      <td>23.68</td>
      <td>3.31</td>
      <td>Male</td>
      <td>No</td>
      <td>Sun</td>
      <td>Dinner</td>
      <td>2</td>
    </tr>
    <tr>
      <th>4</th>
      <td>24.59</td>
      <td>3.61</td>
      <td>Female</td>
      <td>No</td>
      <td>Sun</td>
      <td>Dinner</td>
      <td>4</td>
    </tr>
  </tbody>
</table>
</div>

# TTest_ind

H0: "there is no difference between the mean tip left by men and by women"

```python
# observed=False keeps the current pandas behavior for categorical keys
# and silences the FutureWarning raised by pandas 2.2.
df.groupby("sex", observed=False)["tip"].describe()
```

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>count</th>
      <th>mean</th>
      <th>std</th>
      <th>min</th>
      <th>25%</th>
      <th>50%</th>
      <th>75%</th>
      <th>max</th>
    </tr>
    <tr>
      <th>sex</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>Male</th>
      <td>157.0</td>
      <td>3.089618</td>
      <td>1.489102</td>
      <td>1.0</td>
      <td>2.0</td>
      <td>3.00</td>
      <td>3.76</td>
      <td>10.0</td>
    </tr>
    <tr>
      <th>Female</th>
      <td>87.0</td>
      <td>2.833448</td>
      <td>1.159495</td>
      <td>1.0</td>
      <td>2.0</td>
      <td>2.75</td>
      <td>3.50</td>
      <td>6.5</td>
    </tr>
  </tbody>
</table>
</div>

```python
# Split the data into the two independent samples to compare.
df_male = df.query("`sex` == 'Male'")
df_female = df.query("`sex` == 'Female'")
```

```python
# Two-sided independent-samples t-test on the tip amounts.
ttest_ind(df_male["tip"], df_female["tip"])
```

    TtestResult(statistic=1.387859705421269, pvalue=0.16645623503456755, df=242.0)

# Summary

```python
print('H0: "there is no difference between the mean tip left by men and by women"')
print()

alpha = 0.02  # significance level
p_value = ttest_ind(df_male["tip"], df_female["tip"]).pvalue

# Reject H0 only when the p-value falls below the significance level.
if p_value < alpha:
    print("We have enough evidence to reject H0")
else:
    print("We do not have enough evidence to reject H0")
```

    H0: "there is no difference between the mean tip left by men and by women"

    We do not have enough evidence to reject H0
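
As a sanity check, the statistic reported by `ttest_ind` can be recomputed by hand from the `describe()` summary. The sketch below assumes the default pooled-variance (Student) form of the test, which is what `ttest_ind` uses when `equal_var` is left at `True`. The variable names are illustrative only, and the inputs are the rounded values from the table, so the result should match only to about three decimal places.

```python
import math

# Summary statistics copied from the describe() table (rounded values).
n_m, mean_m, std_m = 157, 3.089618, 1.489102   # Male
n_f, mean_f, std_f = 87, 2.833448, 1.159495    # Female

# Pooled variance: assumes both groups share a common variance.
df_pooled = n_m + n_f - 2
var_pooled = ((n_m - 1) * std_m**2 + (n_f - 1) * std_f**2) / df_pooled

# Standard error of the difference in means, then the t statistic.
std_err = math.sqrt(var_pooled * (1 / n_m + 1 / n_f))
t_stat = (mean_m - mean_f) / std_err

print(f"df = {df_pooled}, t = {t_stat:.3f}")
```

The degrees of freedom and statistic should agree with the `df=242.0` and `statistic` of roughly 1.388 shown in the `TtestResult`. Because the two standard deviations differ somewhat (about 1.49 versus 1.16), one might also try `ttest_ind(..., equal_var=False)`, which runs Welch's test and does not assume equal variances.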