NumPy is a foundational library for data science in Python. It excels at handling multidimensional arrays and matrices and offers a rich collection of mathematical functions to operate on this data. This makes NumPy a powerful tool for manipulating data, running calculations, and efficiently extracting insights from raw data. Because of these capabilities, NumPy even serves as the underlying foundation for other popular data science libraries such as Pandas.
To truly unlock the insights revealed by NumPy’s data analysis, we can leverage the power of Matplotlib. This visualization library allows us to create informative plots, histograms, and other graphical representations of the data. By combining NumPy’s calculations with Matplotlib’s visualizations, we can gain a deeper understanding of the patterns and trends hidden within our data.
Background Story
In this tour, we’ll compare body weight between adult men and women. While it’s commonly believed that men tend to weigh more, we’ll use data analysis to check this. Let’s dive into the data and see if our assumption holds true!
Alright, first things first! Let’s collect some real data. The realer, the better! For the experiment, I asked my co-scientist, Gemini, to recruit some volunteers (with informed consent, of course) and measure their weight and height.
men’s list 👨
Rank | Name | Height (cm) | Weight (kg) |
---|---|---|---|
1 | John Smith | 175 | 81.5 |
2 | David Lee | 180 | 95.4 |
3 | Michael Kim | 172 | 77.1 |
4 | Ryan Jones | 178 | 75.22 |
5 | Charles Lee | 176 | 99.25 |
6 | William Chen | 182 | 80.15 |
7 | Andrew Brown | 170 | 78 |
8 | Daniel Miller | 179 | 89.1 |
9 | Kevin Garcia | 174 | 83.8 |
10 | Thomas Hall | 181 | 90.3 |
11 | Lev Lewinski | 173 | 69.3 |
women’s list 👩
Rank | Name | Height (cm) | Weight (kg) |
---|---|---|---|
1 | Alice Brown | 170 | 69.4 |
2 | Beatrice Lee | 162 | 70.88 |
3 | Chloe Garcia | 183 | 75.6 |
4 | Diana Miller | 178 | 68.55 |
5 | Emily Chen | 165 | 58.9 |
6 | Fiona Jones | 188 | 100.5 |
7 | Gloria Kim | 175 | 60.43 |
8 | Hannah Smith | 168 | 52.3 |
9 | Isabella Lee | 180 | 72.52 |
10 | Olivia Hall | 172 | 65.23 |
11 | Sarit Haddad | 168 | 79.87 |
Technical Analysis
We’ll start by importing NumPy, aliased as np for convenience. Then we can create three separate arrays to store the data for each sex.
Our goal is to analyze the characteristics of BMI (body mass index), which is:
A measure devised by the Belgian statistician Adolphe Quetelet in the 19th century: a person’s weight in kilograms divided by the square of their height in meters.
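Written out as a formula, with height converted from centimeters (as in the tables) to meters, and applied to the first row of the men’s table (John Smith, 175 cm, 81.5 kg) as a worked example:

$$
\mathrm{BMI} = \frac{w}{h^{2}}, \qquad w \text{ in kg},\; h \text{ in m}
$$

$$
\mathrm{BMI}_{\text{John Smith}} = \frac{81.5}{1.75^{2}} = \frac{81.5}{3.0625} \approx 26.61
$$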
After creating a BMI table for each sex, we will calculate the mean and median of each group and then compare the groups.
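A minimal sketch of that setup, assuming the three arrays per sex are names, heights (cm), and weights (kg), with the values copied from the tables above. The variable names are my own choice, not necessarily the ones used in the original code.

```python
import numpy as np

# Men's data, copied from the table above (heights in cm, weights in kg).
# Index i refers to the same person in all three arrays.
men_names = np.array([
    "John Smith", "David Lee", "Michael Kim", "Ryan Jones", "Charles Lee",
    "William Chen", "Andrew Brown", "Daniel Miller", "Kevin Garcia",
    "Thomas Hall", "Lev Lewinski",
])
men_heights = np.array([175, 180, 172, 178, 176, 182, 170, 179, 174, 181, 173])
men_weights = np.array([81.5, 95.4, 77.1, 75.22, 99.25, 80.15, 78, 89.1, 83.8, 90.3, 69.3])

# Women's data, same layout.
women_names = np.array([
    "Alice Brown", "Beatrice Lee", "Chloe Garcia", "Diana Miller", "Emily Chen",
    "Fiona Jones", "Gloria Kim", "Hannah Smith", "Isabella Lee", "Olivia Hall",
    "Sarit Haddad",
])
women_heights = np.array([170, 162, 183, 178, 165, 188, 175, 168, 180, 172, 168])
women_weights = np.array([69.4, 70.88, 75.6, 68.55, 58.9, 100.5, 60.43, 52.3, 72.52, 65.23, 79.87])
```

Keeping one array per column (instead of a list of per-person records) is what lets NumPy apply the BMI formula to a whole group in a single vectorized expression.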
Alright, with our data wrangled into arrays, let’s get down to calculating BMI! Just to satisfy my curiosity, I’ll print out the calculation results in a neat and organized way for each group.
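A sketch of the BMI calculation and a neatly aligned printout. The table data is repeated so the snippet runs on its own, and compute_bmi is a hypothetical helper name, not necessarily what the original code used.

```python
import numpy as np

# Table data repeated here so the snippet is self-contained.
men_names = np.array(["John Smith", "David Lee", "Michael Kim", "Ryan Jones",
                      "Charles Lee", "William Chen", "Andrew Brown", "Daniel Miller",
                      "Kevin Garcia", "Thomas Hall", "Lev Lewinski"])
men_heights = np.array([175, 180, 172, 178, 176, 182, 170, 179, 174, 181, 173])
men_weights = np.array([81.5, 95.4, 77.1, 75.22, 99.25, 80.15, 78, 89.1, 83.8, 90.3, 69.3])
women_names = np.array(["Alice Brown", "Beatrice Lee", "Chloe Garcia", "Diana Miller",
                        "Emily Chen", "Fiona Jones", "Gloria Kim", "Hannah Smith",
                        "Isabella Lee", "Olivia Hall", "Sarit Haddad"])
women_heights = np.array([170, 162, 183, 178, 165, 188, 175, 168, 180, 172, 168])
women_weights = np.array([69.4, 70.88, 75.6, 68.55, 58.9, 100.5, 60.43, 52.3, 72.52, 65.23, 79.87])


def compute_bmi(weights_kg, heights_cm):
    """Element-wise BMI: weight (kg) divided by height (m) squared."""
    heights_m = heights_cm / 100  # convert cm to m
    return weights_kg / heights_m ** 2


men_bmi = compute_bmi(men_weights, men_heights)
women_bmi = compute_bmi(women_weights, women_heights)

# Print one aligned row per person, grouped by sex.
for label, names, values in (("Men", men_names, men_bmi),
                             ("Women", women_names, women_bmi)):
    print(f"{label}:")
    for name, value in zip(names, values):
        print(f"  {name:<15} BMI = {value:5.2f}")
```

Because the arithmetic operators work element-wise on arrays, compute_bmi needs no explicit loop; the loop is only there for printing.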
If you prefer to work with sorted data, the following code snippet can help you achieve that:
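One way to do this, sketched here for the men’s data only (the women’s arrays sort the same way): np.argsort returns the order that sorts the BMI values, and reusing that same index array on the names keeps each name paired with its own BMI.

```python
import numpy as np

# Men's data from the table (heights in cm, weights in kg).
names = np.array(["John Smith", "David Lee", "Michael Kim", "Ryan Jones",
                  "Charles Lee", "William Chen", "Andrew Brown", "Daniel Miller",
                  "Kevin Garcia", "Thomas Hall", "Lev Lewinski"])
heights_cm = np.array([175, 180, 172, 178, 176, 182, 170, 179, 174, 181, 173])
weights_kg = np.array([81.5, 95.4, 77.1, 75.22, 99.25, 80.15, 78, 89.1, 83.8, 90.3, 69.3])
bmi = weights_kg / (heights_cm / 100) ** 2

# Indices that would sort bmi in ascending order.
order = np.argsort(bmi)

# Apply the same indices to every array so rows stay aligned.
sorted_names = names[order]
sorted_bmi = bmi[order]

for name, value in zip(sorted_names, sorted_bmi):
    print(f"{name:<15} {value:5.2f}")
```

For descending order, reverse the index array with order[::-1] before indexing. Calling np.sort(bmi) directly would also sort the values, but it would lose the link to the names.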
Now that the data is sorted by BMI, let’s plot a histogram to visualize it:
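A possible version of the histogram with Matplotlib, assuming both groups are drawn side by side on shared bins. The bin edges (18 to 34 in steps of 2) are my own choice, picked so every BMI value in the tables falls inside a bin.

```python
import matplotlib.pyplot as plt
import numpy as np

# Heights (cm) and weights (kg) from the tables.
men_heights = np.array([175, 180, 172, 178, 176, 182, 170, 179, 174, 181, 173])
men_weights = np.array([81.5, 95.4, 77.1, 75.22, 99.25, 80.15, 78, 89.1, 83.8, 90.3, 69.3])
women_heights = np.array([170, 162, 183, 178, 165, 188, 175, 168, 180, 172, 168])
women_weights = np.array([69.4, 70.88, 75.6, 68.55, 58.9, 100.5, 60.43, 52.3, 72.52, 65.23, 79.87])

men_bmi = men_weights / (men_heights / 100) ** 2
women_bmi = women_weights / (women_heights / 100) ** 2

# Shared bin edges keep the two distributions directly comparable.
bins = np.arange(18, 36, 2)  # 18, 20, ..., 34

fig, ax = plt.subplots()
# Passing a list of arrays draws one set of bars per group, side by side.
counts, edges, _ = ax.hist([men_bmi, women_bmi], bins=bins, label=["Men", "Women"])
ax.set_xlabel("BMI (kg/m²)")
ax.set_ylabel("Number of people")
ax.set_title("BMI distribution by sex")
ax.legend()
plt.show()
```

Note that ax.hist does its own binning, so the values do not strictly need to be sorted first; sorting mainly helps when reading the printed numbers.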
Resulting histogram:
Based on this initial look, the women in our sample tend to have lower BMI values than the men. Next, I’d like to analyze two more crucial details: the median and the mean (average).
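A sketch of the comparison using np.mean and np.median, again repeating the table data so the snippet runs on its own. The printed labels and formatting are my own.

```python
import numpy as np

# Heights (cm) and weights (kg) from the tables.
men_heights = np.array([175, 180, 172, 178, 176, 182, 170, 179, 174, 181, 173])
men_weights = np.array([81.5, 95.4, 77.1, 75.22, 99.25, 80.15, 78, 89.1, 83.8, 90.3, 69.3])
women_heights = np.array([170, 162, 183, 178, 165, 188, 175, 168, 180, 172, 168])
women_weights = np.array([69.4, 70.88, 75.6, 68.55, 58.9, 100.5, 60.43, 52.3, 72.52, 65.23, 79.87])

men_bmi = men_weights / (men_heights / 100) ** 2
women_bmi = women_weights / (women_heights / 100) ** 2

# Mean: sum divided by count. Median: middle value after sorting.
men_mean, men_median = np.mean(men_bmi), np.median(men_bmi)
women_mean, women_median = np.mean(women_bmi), np.median(women_bmi)

print(f"Men:   mean = {men_mean:.2f}, median = {men_median:.2f}")
print(f"Women: mean = {women_mean:.2f}, median = {women_median:.2f}")
```

With 11 people per group (an odd count), np.median simply returns the 6th value in sorted order. Looking at both statistics is useful because the median is less sensitive to a few extreme values than the mean.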
How about using bar charts to take a look at this from another angle?
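One possible bar chart, sketched under the assumption that it compares the mean and median BMI of the two groups as grouped bars. The bar width and labels are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np

# Heights (cm) and weights (kg) from the tables.
men_heights = np.array([175, 180, 172, 178, 176, 182, 170, 179, 174, 181, 173])
men_weights = np.array([81.5, 95.4, 77.1, 75.22, 99.25, 80.15, 78, 89.1, 83.8, 90.3, 69.3])
women_heights = np.array([170, 162, 183, 178, 165, 188, 175, 168, 180, 172, 168])
women_weights = np.array([69.4, 70.88, 75.6, 68.55, 58.9, 100.5, 60.43, 52.3, 72.52, 65.23, 79.87])

men_bmi = men_weights / (men_heights / 100) ** 2
women_bmi = women_weights / (women_heights / 100) ** 2

stats = ["Mean", "Median"]
men_stats = [np.mean(men_bmi), np.median(men_bmi)]
women_stats = [np.mean(women_bmi), np.median(women_bmi)]

x = np.arange(len(stats))  # one group of bars per statistic
width = 0.35               # width of each individual bar

fig, ax = plt.subplots()
# Shift each group's bars left/right of the tick so they sit side by side.
men_bars = ax.bar(x - width / 2, men_stats, width, label="Men")
women_bars = ax.bar(x + width / 2, women_stats, width, label="Women")
ax.set_xticks(x)
ax.set_xticklabels(stats)
ax.set_ylabel("BMI (kg/m²)")
ax.set_title("Mean and median BMI by sex")
ax.legend()
plt.show()
```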
Conclusions
This process has been a valuable learning experience for me.
It’s important to note that all the data used in this post is fictional. The names were generated with the assistance of Gemini, but the data itself was created specifically for educational purposes.
I hope you learned something too.
See you in the next post! 👋
Photo by Campaign Creators on Unsplash