Converse Products Webpage Scraping Using R

Ahsan Amri Rohman
2 min read · Aug 14, 2019

Hello, everybody. Before we start, let's get to know some of the libraries that will be used. The first is rvest, which helps you scrape information from web pages. It is designed to work with magrittr to make it easy to express common web scraping tasks, and it is inspired by libraries like Beautiful Soup. The next is ggplot2, a system for declaratively creating graphics based on The Grammar of Graphics.

First, load the rvest and dplyr libraries, then read the Converse Indonesia products page.

library(rvest)
library(dplyr)
url <- 'https://www.converse.id/products'
webpage <- read_html(url)

Use the browser's inspect element feature to find the CSS classes and HTML code. Highlight the product name and product price sections. For the price section, remove the "IDR" characters and the punctuation (dots) so that only numbers are left.

# get the item names
items_html <- html_nodes(webpage, '.product-item-inline-data p')
items <- html_text(items_html)
items <- as.character(items)
# get the price list
prices_html <- html_nodes(webpage, '.price p')
prices <- html_text(prices_html)
# remove the "IDR" currency label
prices <- gsub("IDR", "", prices)
# remove the dots used as thousands separators
prices <- gsub("\\.", "", prices)
prices <- as.numeric(prices)
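
Because the live page can change at any time, it may help to try the same selectors and cleaning steps on a small piece of HTML first. The sketch below is only an illustration: the markup in sample_html is a made-up assumption that mimics the class names used above, and the sample_ variable names are hypothetical. The real page structure may differ.

```r
library(rvest)

# Hypothetical markup that mimics the class names used above;
# the real Converse page may be structured differently.
sample_html <- '
<html><body>
  <div class="product-item-inline-data"><p>Intergalactic Tee</p></div>
  <div class="price"><p>IDR 199.000</p></div>
  <div class="product-item-inline-data"><p>LS Crew Tee Nova</p></div>
  <div class="price"><p>IDR 259.000</p></div>
</body></html>'

# read_html() also accepts a literal HTML string
sample_page <- read_html(sample_html)

# Same selectors as in the article
sample_items <- html_text(html_nodes(sample_page, ".product-item-inline-data p"))
sample_prices <- html_text(html_nodes(sample_page, ".price p"))

# Same cleaning steps: drop "IDR", drop the dots, then convert to numbers.
# trimws() removes the space left behind after "IDR" is deleted.
sample_prices <- gsub("IDR", "", sample_prices)
sample_prices <- gsub("\\.", "", sample_prices)
sample_prices <- as.numeric(trimws(sample_prices))

print(sample_items)
print(sample_prices)
```

If the two vectors come out with the expected names and numbers here, the same steps can be applied to the real page with more confidence.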

Convert the item names to uppercase.

items <- toupper(items)

Combine the items and prices into a dataset.

converse_pricelist <- data.frame(items, prices)
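
Combining the two vectors only makes sense when every item has exactly one price. As a minimal sketch, assuming toy vectors in place of the scraped ones (toy_items and toy_prices are hypothetical names and values), a length check before building the data frame could look like this:

```r
# Toy vectors standing in for the scraped results (hypothetical values)
toy_items <- c("INTERGALACTIC TEE", "LS CREW TEE NOVA")
toy_prices <- c(199000, 259000)

# Stop early if the selectors returned a different number of names and prices
stopifnot(length(toy_items) == length(toy_prices))

toy_pricelist <- data.frame(items = toy_items, prices = toy_prices)
print(toy_pricelist)
```

If the lengths differ on the real page, one of the CSS selectors is probably matching extra or missing elements.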

Because the dataset contains duplicate items, we need to remove the duplicates.

converse_pricelist1 <- unique(converse_pricelist)
head(converse_pricelist1, n=10)
                                  items prices
1                     INTERGALACTIC TEE 199000
3    CONVERSE REPEATED STAR CHEVRON TEE 199000
5     CONVERSE EMBROIDERED WORDMARK TEE 199000
7  CONVERSE LEFT CHEST STAR CHEVRON TEE 199000
9             ASYMMETRICAL SHINE SS TEE 299000
10             CONVERSE WORDMARK LS TEE 259000
13                     LS CREW TEE NOVA 259000
15                  LEFT CHEST LOGO TEE 199000
16                CENTER FRONT LOGO TEE 199000
19        CONVERSE ERX SHORT SLEEVE TEE 259000
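
The row numbers in the output skip values (1, 3, 5, ...) because unique() keeps the row names of the first occurrence of each row and drops the rest. The sketch below shows this on a toy data frame with made-up items and prices (the toy and toy_unique names are hypothetical), and then resets the row names so they run consecutively.

```r
# Toy data frame with duplicated rows (hypothetical values)
toy <- data.frame(
  items = c("TEE A", "TEE A", "TEE B", "TEE B", "TEE C"),
  prices = c(199000, 199000, 259000, 259000, 299000)
)

# unique() keeps the first occurrence of each row, with its original row name
toy_unique <- unique(toy)
print(toy_unique)

# Optional: reset the row names to a consecutive sequence
rownames(toy_unique) <- NULL
print(toy_unique)
```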

Thank You.

Amri Rohman.
Surabaya, East Java, ID.
