Defining Beauty on the Internet using Natural Language Processing

Goal:

Given a set of areas of interest, can we identify what internet communities in those areas are discussing with regard to beauty?

Methods and Tools:

Python:

  • BeautifulSoup/Selenium to scrape unstructured data from a list of subreddits, forums, op-eds, and scientific articles.

  • spaCy/NLTK to remove stop words, stem, and lemmatize text.

  • LDA and TF-IDF for clustering and topic modeling.
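As a rough illustration of the text-processing steps listed above, here is a minimal, standard-library-only sketch of stop-word removal followed by TF-IDF weighting. It is not the project's actual code: the stop-word list, the function names (tokenize, tf_idf, top_terms), and the sample posts are all hypothetical, and a real pipeline would more likely rely on spaCy/NLTK for preprocessing and on a library implementation of TF-IDF and LDA.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); spaCy and NLTK ship fuller ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "with", "i", "my"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document (plain tf * log(N / df))."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero on empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_terms(weights, k=3):
    """Highest-weighted k terms per document; ties are broken alphabetically."""
    return [sorted(w, key=lambda t: (-w[t], t))[:k] for w in weights]


# Hypothetical posts standing in for scraped text.
posts = [
    "Thrifting clothes is my first small step toward sustainability",
    "Drag makeup is a new way to appreciate beauty",
    "Reduce plastic packaging on beauty products",
]
scores = tf_idf(posts)
```

Terms that appear in many documents (here, "beauty") receive lower weights than terms unique to one document, which is what lets distinctive vocabulary stand out when documents are later grouped into clusters or topics.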

PowerBI:

  • Network graph visualization

Product:

*Note: Client logos were scrubbed from the dashboard.

Capture.PNG

Relevant Clusters:

  • Asian American and Trans Experience clusters discussing the lack of representation, as well as the feeling of not being viewed as attractive (e.g., Asian Americans don’t fit the beauty standards of either Eastern or Western societies).

  • Drag Culture as a different up-and-coming way to experience and appreciate beauty.

Relevant Clusters:

  • Complaints about the overuse of plastic, and calls to reduce the packaging of health and beauty products in ways similar to LUSH.

Interesting Clusters:

  • Secondhand economy of thrifted clothes.

  • Groups of people beginning the journey of sustainability through small steps.

Unfortunately, nothing within the music space fell in line with the desired results. However, interesting clusters that branched from beauty and music include:

  • Wedding First Dance

  • Nostalgia

  • Video Game Music

Relevant Clusters:

  • Wellness Tracking, Optimism, and Motivation all relate to mental health and the idea that being mentally healthy is just as important as being physically healthy. People in these clusters are either working toward or actively shaping their own definitions of beauty.

Ivan Sheng

nlp