Exploring the Essence: Insights from Perfume Data Visualization
Introduction
We’re stepping into the captivating world of fragrances, exploring a #TidyTuesday dataset that delves into the details of perfumes. The dataset, sourced from Parfumo, a vibrant community of perfume enthusiasts, was web-scraped by Olga G. and provides a comprehensive overview of perfumes, from their ratings and olfactory notes to the perfumers behind them and their year of release.
For this project, I focused on analyzing the top notes of perfumes, the first impression fragrances leave. After cleaning and transforming the data in R, I used D3.js to craft a beautiful beeswarm visualization, showcasing the most popular top notes. Let’s dive into the details, including how the data was prepared and visualized.
Understanding the Dataset
This dataset contains detailed information about 59,325 perfumes listed on Parfumo. It includes:
Perfume ratings
Olfactory notes: Top, middle, and base notes
Perfumers and the year of release
Other relevant characteristics
The data was cleaned to focus on top notes and explore their popularity. Below is the step-by-step R code used to clean and prepare the data.
Data Preparation in R
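# Load the packages used below: tidyverse (readr, dplyr, tidyr, stringr) and janitor
library(tidyverse)
library(janitor)
# Import the cleaned Parfumo dataset
parfumo_data_clean <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2024/2024-12-10/parfumo_data_clean.csv') |>
  clean_names() # Convert column names to snake_case for easier manipulation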
Rows: 59325 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (10): Number, Name, Brand, Concentration, Main_Accords, Top_Notes, Middl...
dbl (3): Release_Year, Rating_Value, Rating_Count
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Analyze brand and perfume counts
parfumo_data_clean |>
  count(brand, sort = TRUE)
# A tibble: 1,452 × 2
brand n
<chr> <int>
1 Avon 1000
2 Victoria's Secret 995
3 Zara 995
4 Bath & Body Works 972
5 Guerlain 586
6 Ensar Oud / Oriscent 534
7 Demeter Fragrance Library / The Library Of Fragrance 474
8 Al Haramain / الحرمين 434
9 Oriflame 394
10 Yves Rocher 381
# ℹ 1,442 more rows
The dataset covers 1,452 brands.
parfumo_data_clean |>
  count(name, sort = TRUE)
# A tibble: 55,120 × 2
name n
<chr> <int>
1 Chypre 37
2 Gardenia 36
3 Amber 34
4 Rose 26
5 Jasmin 24
6 Magnolia 20
7 The Fragrance Kitchen 20
8 Eau de 19
9 Black 18
10 Ambre 17
# ℹ 55,110 more rows
There are 55,120 unique perfume names. Some names, such as Chypre and Gardenia, are shared by several perfumes.
# Focus on top notes
# Select the "top_notes" column and separate the notes into individual entries
df <- parfumo_data_clean |>
  select(top_notes) |>
  mutate(top_notes = str_split(top_notes, ", ")) |> # Split the comma-separated notes into a list
  unnest_wider(top_notes, names_sep = "_") # Expand the list into columns top_notes_1, top_notes_2, ...
# Transform the data to a long format
df <- df |>
  pivot_longer(cols = everything(), # Use every top-note column instead of hard-coding positions 1:25
               names_to = "name", # New column for original column names
               values_to = "note_name", # New column for the actual note names
               values_drop_na = TRUE) # Drop the NA padding left by perfumes with fewer notes
There are a total of 2,430 unique top notes in the dataset.
Key Takeaways:
The dataset includes information on 1,452 brands and 59,325 perfumes, with 55,120 unique perfume names.
A total of 2,430 unique top notes were identified from the dataset.
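The tally behind these numbers is not shown in the code above. As a minimal sketch of how it could be reproduced on the JavaScript side before handing the data to D3, assume the long-format table arrives as an array of objects with a `note_name` field (matching the R column). The sample rows and the `tallyNotes` helper are hypothetical, written for illustration only, and are not the code used for the final chart.

```javascript
// Hypothetical sample of long-format rows: one row per (perfume, top note) pair.
const sampleRows = [
  { note_name: "Bergamot" },
  { note_name: "Lemon" },
  { note_name: "Bergamot" },
  { note_name: null },          // missing entry, skipped by the tally
  { note_name: " Mandarin " },  // stray whitespace, trimmed by the tally
  { note_name: "Mandarin" },
  { note_name: "Bergamot" },
];

// Count how often each note appears, ignoring empty or missing values.
function tallyNotes(rows) {
  const counts = new Map();
  for (const row of rows) {
    const note = typeof row.note_name === "string" ? row.note_name.trim() : "";
    if (note === "") continue;
    counts.set(note, (counts.get(note) || 0) + 1);
  }
  // Most frequent first; ties are broken alphabetically for a stable order.
  return [...counts.entries()]
    .map(([note, count]) => ({ note, count }))
    .sort((a, b) => b.count - a.count || a.note.localeCompare(b.note));
}

const noteCounts = tallyNotes(sampleRows);
console.log("unique notes:", noteCounts.length);
console.log(noteCounts);
```

The length of the result is the number of unique notes, and its first few entries are the most popular ones. Run on the full table, the same idea should yield the kind of figures quoted above, though the exact numbers depend on how whitespace and missing values are handled.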
Visualization with D3.js
Using the cleaned data from R, I transitioned to D3.js to create an interactive beeswarm visualization. The visual captures the most prominent top notes, revealing which notes dominate perfume creation. Each bubble represents a top note, with its size corresponding to its frequency.
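To make the sizing and placement idea concrete, here is a rough, dependency-free sketch rather than the chart's actual code. It assumes that bubble area (not radius) is proportional to frequency, which is what `d3.scaleSqrt` is typically used for, and that the horizontal position also encodes frequency. In a real D3 chart the placement is more commonly handled by `d3.forceSimulation` with `d3.forceCollide`; the greedy `beeswarm` helper below stands in for that step. The counts are made up for illustration and are not taken from the dataset.

```javascript
// Radius chosen so that bubble AREA scales with count (assumption).
function radiusFor(count, maxCount, maxRadius) {
  return maxRadius * Math.sqrt(count / maxCount);
}

// Greedy beeswarm: keep each bubble's x, then nudge it away from the centre
// line in small alternating steps until it no longer overlaps a placed bubble.
function beeswarm(nodes, padding = 1, step = 0.5) {
  const placed = [];
  for (const node of nodes) {
    const collides = (candidateY) =>
      placed.some(
        (p) => Math.hypot(p.x - node.x, p.y - candidateY) < p.r + node.r + padding
      );
    let offset = 0;
    let y = 0;
    while (collides(y)) {
      // Try +step, -step, +2*step, -2*step, ... around the centre line.
      offset = offset > 0 ? -offset : -offset + step;
      y = offset;
    }
    placed.push({ ...node, y });
  }
  return placed;
}

// Made-up counts for illustration only.
const counts = [
  { note: "Bergamot", count: 120 },
  { note: "Mandarin", count: 80 },
  { note: "Grapefruit", count: 75 },
  { note: "Lemon", count: 70 },
  { note: "Pink Pepper", count: 30 },
];

const width = 300;     // hypothetical chart width in pixels
const maxRadius = 40;  // hypothetical radius of the largest bubble
const maxCount = Math.max(...counts.map((d) => d.count));

const nodes = counts.map((d) => ({
  ...d,
  x: (d.count / maxCount) * width, // assumption: x encodes frequency
  r: radiusFor(d.count, maxCount, maxRadius),
}));

const layout = beeswarm(nodes);
for (const b of layout) {
  console.log(b.note, b.x.toFixed(1), b.y.toFixed(1), b.r.toFixed(1));
}
```

Each entry in `layout` carries the `x`, `y`, and `r` values a `<circle>` element would need. The square-root scaling keeps a note that is four times as frequent from looking sixteen times as large, which would happen if the radius were scaled linearly.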
Key Highlights:
The most popular top notes include Bergamot, Mandarin, Grapefruit, and Lemon.
These citrusy notes are widely used in perfumes for their fresh and vibrant appeal.
The visualization also integrates elegant design elements, such as soft gradients and a perfume bottle illustration, to evoke the essence of luxury and refinement.
Reflections
Exploring the world of perfumes through data has been both fascinating and rewarding. The combination of R for data cleaning and D3.js for visualization allowed me to uncover and present intriguing insights. This project highlights how data visualization can bring abstract concepts, such as fragrances, to life.
Feel free to explore the dataset and experiment with your own analyses. The olfactory journey is just beginning!