# Libraries

```python
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
```

# Data

```python
df = sns.load_dataset("tips")
df.head()
```

|   | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |

# Multivariate analysis

## Discrete / Discrete

```python
pd.crosstab(df["sex"], df["smoker"], normalize=False)
```

| sex \ smoker | Yes | No |
|---|---|---|
| Male | 60 | 97 |
| Female | 33 | 54 |

For proportions instead of counts, use `normalize=True`: each cell is divided by the grand total (multiply by 100 to get percentages). `normalize="index"` or `normalize="columns"` normalizes within each row or each column instead.

When the variables have many categories, the contingency table becomes hard to read and a heatmap is clearer:

```python
sns.heatmap(pd.crosstab(df["size"], df["day"]), annot=True)
plt.show()
```

![png](analyse%20multivariée%20-%20MeP_9_1.png)

## Discrete / Continuous

```python
# observed=True keeps only the categories present in the data
# and silences the pandas FutureWarning about the default value of `observed`
df.groupby("sex", observed=True)["tip"].describe()
```

| sex | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Male | 157.0 | 3.089618 | 1.489102 | 1.0 | 2.0 | 3.00 | 3.76 | 10.0 |
| Female | 87.0 | 2.833448 | 1.159495 | 1.0 | 2.0 | 2.75 | 3.50 | 6.5 |

```python
# Bar chart of the means with pandas
df.groupby("sex", observed=True)["tip"].mean().plot(kind="bar")
plt.show()
```

![png](analyse%20multivariée%20-%20MeP_12_1.png)

```python
# Bar chart of the means with Seaborn (error bars show a 95% confidence interval by default)
sns.catplot(data=df, x="sex", y="tip", kind="bar", hue="sex")
plt.show()
```

![png](analyse%20multivariée%20-%20MeP_13_1.png)

```python
# Overlaid histograms (Seaborn version only)
sns.displot(data=df, x="tip", hue="sex")
plt.show()
```

![png](analyse%20multivariée%20-%20MeP_14_1.png)

```python
# Box plots (with Seaborn)
sns.catplot(data=df, x="tip", y="sex", kind="box", hue="sex")
plt.show()
```

![png](analyse%20multivariée%20-%20MeP_15_1.png)

## Continuous / Continuous

```python
# The natural plot for two continuous variables is a scatter plot
# With Matplotlib
plt.scatter(df["tip"], df["total_bill"])
plt.show()
```

![png](analyse%20multivariée%20-%20MeP_17_0.png)

```python
# With Seaborn
sns.scatterplot(data=df, x="tip", y="total_bill")
plt.show()
```

![png](analyse%20multivariée%20-%20MeP_18_0.png)
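The scatter plot can be complemented by a single number that summarizes the linear association between the two continuous variables: the Pearson correlation coefficient. The sketch below is an addition, not part of the original notebook. To run offline it uses a hypothetical toy frame `sample` built from the five rows displayed by `df.head()`, so the values are only illustrative; on the full dataset the equivalent call would be `df["tip"].corr(df["total_bill"])`. It also computes Spearman's rank correlation as the Pearson correlation of the ranks, which captures monotonic rather than strictly linear relationships.

```python
import pandas as pd

# Hypothetical toy frame: the five rows shown by df.head(),
# used instead of sns.load_dataset("tips") so the example runs offline
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
})

# Pearson correlation (the default method of Series.corr)
r = sample["tip"].corr(sample["total_bill"])

# Spearman rank correlation = Pearson correlation of the ranks
rho = sample["tip"].rank().corr(sample["total_bill"].rank())

print(f"Pearson r (5-row sample): {r:.3f}")
print(f"Spearman rho (5-row sample): {rho:.3f}")
```

With only five rows these coefficients say little about the whole dataset; they just show the mechanics.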