Food quality
How can we combine nutritional values in order to acquire a metric that indicates the obesogenic level of each food item?
Socio-economic level and geographic characteristics
What criteria should be used to separate regions with different socio-economic environments?
Spectral clustering
Are the healthier food items associated with higher revenues?
What are the links between healthy food availability and socio-economic environment?
Geographic visualization
How are wealth and healthy products geographically distributed in France?
The goal of this project was to explore the relation between income and food availability in order to find an intermediate link between obesity and poverty. As we have shown, this problem is not without complexity.
Our Amazing Team
Icíar Lloréns Jover
Lead Thinker
Yassine Zouaghi
Lead Grapher
Guillaume Vizier
Lead Tinkerer
“If you think your data is clean, you haven’t looked at it hard enough.” ― Eben Hewitt
Introduction
Rising obesity rates in industrialized societies have been blamed on the increased consumption of foods high in refined sugar and fat. Minorities and the poor are at a disadvantage when it comes to adopting healthier eating habits. Moreover, low-income neighborhoods have been observed to attract more fast-food outlets and convenience stores than full-service supermarkets and grocery stores. Even when healthier and more nutritious options are available, their cost is often significantly higher and may be prohibitive for working-class people. The rapid rise in food prices has thus helped demonstrate that a healthier diet is no longer merely a matter of choice.
In this article, we aim to quantify the healthiness of the food items available in France, and to explore the links between the healthiness of these items and the socio-economic level of different subdivisions of the French territory.
For our analysis, we relied on data from OpenFoodFacts for the available food items and their nutritional information, and on data from the National Institute of Statistics and Economic Studies (INSEE, Institut National de la Statistique et des Études Économiques in French) for the distribution of income throughout France, along with several other economic indicators. All the data used is publicly available.
Food quality
Food grading is the assessment of foods with regard to their quality. We used four nutrition scores for each food item in our analysis: the nutrition score, the nutrition grade, the calorie density (kcal per serving) and the calorie deviation.
The nutrition score has values between -15 and 40, going from healthy to unhealthy. For each food item, points are added for unhealthy nutritional features such as high energy or high content of fat, sugar and sodium. Points are deducted for healthy nutritional features (high content of fruits, vegetables and nuts, fiber and proteins).
The nutrition grade derives from the nutrition score. It classifies products into five discrete categories, A to E (healthy to unhealthy). For more detailed information about the nutrition scores available on OpenFoodFacts, we encourage you to visit their page on nutrition scores.
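To illustrate how a grade can be derived from a score, here is a minimal sketch. The cut-offs are the ones commonly published for solid foods in the Nutri-Score scheme; they are an assumption and are not stated in this article (beverages use different cut-offs). The function names and the numeric mapping A=1 to E=5 are also hypothetical, chosen to match the numeric grades used in the histograms.

```python
def nutrition_grade(score: int) -> str:
    """Map a nutrition score (-15..40) to a grade, A (healthy) to E (unhealthy).

    The cut-offs are those commonly published for solid foods; they are an
    assumption here, not taken from the article.
    """
    if not -15 <= score <= 40:
        raise ValueError("score must lie between -15 and 40")
    # Each pair is (upper bound of the band, grade); bands are checked in order.
    for upper, grade in ((-1, "A"), (2, "B"), (10, "C"), (18, "D"), (40, "E")):
        if score <= upper:
            return grade


def grade_as_number(grade: str) -> int:
    """Numeric form of the grade (assumed mapping): A=1, B=2, ..., E=5."""
    return "ABCDE".index(grade) + 1
```

Under these assumed cut-offs, a score around 0 falls in grade B (2) and a score around 15 in grade D (4).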
Since obesity is often linked to the calorie density of the food ingested, we decided to introduce two additional nutrition scores: calorie density (the calories per serving) and calorie deviation (the deviation from the nutrient distribution expected of food items in a healthy diet). The calorie density metric describes how densely a product is packed with calories. The calorie deviation metric describes how far the consumption of a product pulls a diet away from a balanced one.
According to nutritional standards, 21% of the calories consumed in a day should come from proteins, 53% from carbohydrates and 26% from fat. The calorie deviation metric measures how far each product departs from this standard. It indicates whether a product follows a healthy calorie distribution.
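The article does not give the exact formula for the calorie deviation, so the following is only one plausible sketch. It assumes the usual Atwater factors (4 kcal per gram of protein or carbohydrate, 9 kcal per gram of fat) and measures the deviation as the Euclidean distance between a product's calorie shares and the 21/53/26 reference. The function names are hypothetical.

```python
import math

# Reference share of daily calories (from the text): protein, carbohydrate, fat.
REFERENCE = (0.21, 0.53, 0.26)
# Atwater factors: kcal provided by one gram of each macronutrient.
KCAL_PER_GRAM = (4.0, 4.0, 9.0)


def calorie_shares(protein_g, carbs_g, fat_g):
    """Fraction of a product's calories coming from each macronutrient."""
    kcal = [g * k for g, k in zip((protein_g, carbs_g, fat_g), KCAL_PER_GRAM)]
    total = sum(kcal)
    if total == 0:
        raise ValueError("product provides no calories from macronutrients")
    return tuple(k / total for k in kcal)


def calorie_deviation(protein_g, carbs_g, fat_g):
    """Euclidean distance to the reference shares (an assumed definition)."""
    shares = calorie_shares(protein_g, carbs_g, fat_g)
    return math.sqrt(sum((s - r) ** 2 for s, r in zip(shares, REFERENCE)))
```

With this definition, a product whose calories are split exactly 21/53/26 has a deviation of zero, while a product made of pure fat lies far from the reference.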
We plotted the distribution of the nutrition grades for all known products in France. The results are striking: available products are, in general, unhealthy!
The nutrition grade histogram shows that most products are labeled with a 3 or higher (grades A to E being numbered 1 to 5), which indicates that healthy products are rare.
The nutrition score histogram shows two peaks, one around 0 (i.e. a nutrition grade of 2) and the other around 15 (a nutrition grade of 4). Most products nevertheless score well above 0 (i.e. a nutrition grade of 2 or higher), which reinforces the finding that healthy products are scarce.
The calorie density and calorie deviation histograms follow an inverse power law: most products have a low calorie density and a low calorie deviation.
Socio-economic level and geographic characteristics
Several indicators are commonly used to measure poverty. First, comparing the median revenue of different subdivisions of a territory makes it possible to identify poorer areas. The average revenue can be used as well, although it tends to misrepresent the economic level of an area because a few wealthy inhabitants can inflate it.
Second, one can use the poverty rate, which is the percentage of people whose revenue is below 60% of the median revenue. Finally, a third indicator can be based on the inhabitants’ sources of income, namely professional activity, social aid, asset income and retirement pension.
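The first two indicators can be sketched in a few lines. The revenue figures below are made up for illustration, and `poverty_rate` is a hypothetical helper name; the 60% threshold is the one given in the text.

```python
from statistics import mean, median


def poverty_rate(revenues, threshold=0.6):
    """Percentage of people whose revenue is below `threshold` x the median."""
    cutoff = threshold * median(revenues)
    return 100 * sum(r < cutoff for r in revenues) / len(revenues)


# Made-up yearly revenues (in euros) for a small area with one very wealthy
# inhabitant.
revenues = [9_000, 11_000, 18_000, 20_000, 21_000, 22_000, 24_000, 500_000]

area_median = median(revenues)   # barely affected by the outlier
area_mean = mean(revenues)       # pulled upwards by the outlier
area_poverty = poverty_rate(revenues)
```

In this toy sample the single large revenue pulls the mean far above the median, which is why the median is the more robust marker.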
To determine whether these indicators are equivalent, we measured the dependency between them using Mutual Information (MI). MI is zero if and only if the two random variables are independent, and higher values indicate stronger dependency.
Comparing all the economic markers listed above, we observe a high dependency between them, so a single feature largely conveys the information carried by the others. We therefore retained the median revenue per city as our socio-economic marker.
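As a rough illustration of the quantity involved, MI between two numeric samples can be estimated by binning the values and comparing the joint frequencies with the product of the marginals. This is a simple histogram estimator under assumed equal-width bins, not necessarily the estimator used for the results above; `mutual_information` and its `bins` parameter are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys, bins=4):
    """Estimate MI (in nats) between two samples after equal-width binning."""
    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # avoid a zero width for constant data
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretize(xs), discretize(ys)
    n = len(xs)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # Sum of p(i, j) * log(p(i, j) / (p(i) * p(j))) over the observed cells.
    return sum(
        (c / n) * math.log(c * n / (px[i] * py[j]))
        for (i, j), c in joint.items()
    )
```

The estimate depends on the number of bins and on the sample size, so it is best used to compare pairs of markers with one another rather than read as an absolute value.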
Moreover, we need to take into account the territorial level at which the analysis is conducted. We focused on Metropolitan France, because some French territories are located overseas, for instance in the Pacific and the Caribbean. Since our analysis examines eating habits and economic disparities, working on a culturally homogeneous territory seems imperative.
There are four types of territorial subdivisions, from largest to smallest: regions, departments, arrondissements and communes. The analysis was conducted at each of these levels.
Spectral clustering
To check whether our dataset contains clusters, we represent it as a graph. Each node is a food item, and each edge is weighted by the nutritional similarity between the two items it connects (nutritional grade, sugar, fat, protein content and so on). Edge weights are computed from the nutritional features only; the economic marker (i.e. the median revenue) is kept aside as a label. The aim is to check whether products cluster by healthiness and, if so, whether the healthier clusters are associated with higher revenues.
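The similarity measure is not specified in this article. A common choice for building such a graph is a Gaussian (RBF) kernel on the feature vectors, which is what the sketch below assumes. The feature values, the `sigma` parameter and the function name are all hypothetical.

```python
import math


def similarity_graph(features, sigma=1.0):
    """Dense weight matrix W with W[i][j] = exp(-||xi - xj||^2 / (2 sigma^2))."""
    n = len(features)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
            w = math.exp(-sq_dist / (2 * sigma ** 2))
            weights[i][j] = weights[j][i] = w  # undirected graph, no self-loops
    return weights


# Hypothetical standardized nutritional features per product: sugar, fat, protein.
products = [(0.1, 0.2, 0.0), (0.2, 0.1, 0.1), (3.0, 2.5, -1.0)]
# The median revenue is kept aside as a label and never enters the weights.
labels = [21_000, 19_500, 20_200]

W = similarity_graph(products)
```

Here the first two products are nutritionally close and share a heavy edge, while the third is connected to them only weakly.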
The resulting graph is fully connected, but most nodes are packed into one tight cluster, with only a few food items standing apart.
This is confirmed by the results of the DBSCAN clustering, whose Silhouette Coefficient is highest when it finds only one cluster.
This finding shows that our food products are extremely similar to one another, meaning that they all have similar nutritional features. It also helps in interpreting the later results: if all of our food products have close nutritional characteristics, there is little chance that we will be able to differentiate them according to economic markers.
Correlations
A widely used method for assessing the linear relationship between two variables is the Pearson correlation test. We performed this test at each territorial level between each nutritional feature and the median revenue.
We plotted the correlation coefficients between the median revenue (€) and each of the nutritional features, along with their p-values. We set the confidence level at 95%, meaning that a correlation coefficient is considered significant only if its p-value is below 0.05. Only a few correlations were significant at this level.
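The sketch below computes the Pearson coefficient and a p-value for one pair of variables. The p-value here comes from a permutation test, which is a stand-in for the analytical test usually reported by statistics libraries and is not necessarily what was used for the results in this article. The data values are made up, and all names are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided p-value: share of shuffles with |r| at least the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Made-up values for eight subdivisions.
median_revenue = [17.5, 18.2, 19.0, 19.8, 20.5, 21.1, 22.4, 24.0]  # k euros
energy = [262, 258, 255, 251, 250, 244, 240, 236]  # kcal per 100 g

r = pearson_r(median_revenue, energy)
p = permutation_p_value(median_revenue, energy)
significant = p < 0.05  # 95% level, as in the text
```

Running the same test for many feature pairs and territorial levels increases the chance of a spurious significant result, which is worth keeping in mind when only a few pairs pass the threshold.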
Correlation analysis with arrondissement
At the arrondissement level, the only significant relation is between energy (in kcal per 100 g) and median revenue (in €). The correlation is strongly negative, which suggests that the food products available in poorer arrondissements are more energy-dense, and in that sense less healthy, than those available in wealthier ones.
At the department level (the second-largest subdivision of the French territory), we find no significant relation between the median revenue and any of the nutritional features.
Correlation analysis with department
Correlation analysis with region
At the region level, the median revenue has a strong negative correlation with calorie density and a strong positive correlation with nutrition grade. The first could indicate that poorer regions have more available food products that are densely packed with calories. The second suggests that poorer regions nevertheless have products with healthier grades. However, every region has a median nutrition grade of 3 except for one, whose median nutrition grade is 2. With so little variation in the grade, we considered this particular correlation uninformative.
Geographic visualization
Building on the analysis so far, we plotted our data on maps. Each French subdivision is assigned the median value of the evaluated feature. We plotted every nutritional and economic feature. However, the only pairs for which we found a correlation significant at the 95% level are:
Median revenue and calorie density at the region level
Median revenue and energy at the arrondissement level
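The per-subdivision medians shown on the maps could be computed along these lines. This is a minimal sketch assuming each product record carries a subdivision code; the record layout, the codes and the values are hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_by_subdivision(records, feature):
    """Group product records by subdivision code and take the median of `feature`."""
    grouped = defaultdict(list)
    for record in records:
        value = record.get(feature)
        if value is not None:  # skip products with a missing value
            grouped[record["subdivision"]].append(value)
    return {code: median(values) for code, values in grouped.items()}


# Hypothetical records: region codes and energy values are made up.
records = [
    {"subdivision": "R1", "energy": 250},
    {"subdivision": "R1", "energy": 310},
    {"subdivision": "R1", "energy": None},
    {"subdivision": "R2", "energy": 180},
    {"subdivision": "R2", "energy": 220},
    {"subdivision": "R2", "energy": 400},
]

energy_by_region = median_by_subdivision(records, "energy")
```

The resulting dictionary maps each subdivision code to one value, which can then be joined to the subdivision boundaries for plotting.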