This function takes a dataframe of user reviews and removes predefined stopwords from the review text. Each word is first checked against the package's internal stopwords dataset (stopwords_tr); if no match is found there, the function falls back to the broader stopwords_iso list.

Usage

match_stopwords(df)

Arguments

df

Dataframe containing user reviews, with the required columns comment (review text) and rating (numeric score).

Value

A modified dataframe with an additional cleaned_text column containing stopword-free text.

Details

The function converts text to a standardized format by removing accents and special characters, transforming it into basic Latin characters, and making all letters lowercase. It then tokenizes the text, filters out stopwords, and returns the cleaned version.
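
The steps described above can be approximated in base R. The sketch below is illustrative only and is not the package's implementation: normalize_text(), remove_stopwords(), demo_primary and demo_fallback are hypothetical names, the two short word lists merely stand in for stopwords_tr and stopwords_iso, and the explicit character mapping is an assumption about how the transliteration might be done.

```r
# Hypothetical stand-ins for the package's stopwords_tr and stopwords_iso
# datasets; the real lists are not reproduced here.
demo_primary  <- c("bu", "ancak", "gibi")
demo_fallback <- c("cok", "ama")

# Map Turkish letters to basic Latin, lowercase, and drop everything
# that is not a basic Latin letter or whitespace.
normalize_text <- function(x) {
  x <- chartr("çğıöşüÇĞİÖŞÜ", "cgiosuCGIOSU", x)
  x <- tolower(x)
  gsub("[^a-z[:space:]]", " ", x)
}

# Tokenize on whitespace, then drop tokens found in the primary list or,
# failing that, in the fallback list.
remove_stopwords <- function(x, primary, fallback) {
  tokens <- strsplit(normalize_text(x), "[[:space:]]+")
  vapply(tokens, function(tok) {
    tok <- tok[nzchar(tok)]
    is_stop <- tok %in% primary | tok %in% fallback
    paste(tok[!is_stop], collapse = " ")
  }, character(1))
}

remove_stopwords("Bu ürün ancak fiyatı yüksek gibi", demo_primary, demo_fallback)
#> [1] "urun fiyati yuksek"
```

The mapping is applied before lowercasing so that the dotted capital İ becomes a plain i regardless of locale. The sketch assumes a UTF-8 session.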

Examples

reviews_sample <- tibble::tibble(
  comment = c("Bu ürün xs ancak fiyatı yüksek gibi",
              "Fiyat çok pahalı ama kaliteli iyi"),
  rating = c(4.5, 3.0)
)
match_stopwords(reviews_sample)
#> # A tibble: 2 × 3
#>   comment                             rating cleaned_text         
#>   <chr>                                <dbl> <chr>                
#> 1 Bu ürün xs ancak fiyatı yüksek gibi    4.5 urun fiyati yuksek   
#> 2 Fiyat çok pahalı ama kaliteli iyi      3   fiyat pahali kaliteli