New {explore} release!

{explore} 1.3.3 introduces use_data_wordle() and yyyymm_calc()

{explore} is an R package that simplifies exploratory data analysis. You can do interactive data exploration, generate an automated report or use AI to reveal hidden patterns in your data.

use_data_wordle()

With this release of {explore}
you can put to the test
the power of small data
of my daily WORDLE quest

WORDLE is a popular word-puzzle game. A friend and I started playing it daily and comparing our results. I collected the data and made it available in {explore}.

library(explore)
wordle <- use_data_wordle()
wordle |> describe()

# A tibble: 10 x 8
   variable type     na na_pct unique   min  mean   max
   <chr>    <chr> <int>  <dbl>  <int> <dbl> <dbl> <dbl>
 1 round    int       0      0     60     1 30.5     60
 2 word     chr       0      0     60    NA NA       NA
 3 language chr       0      0      1    NA NA       NA
 4 noun     int       0      0      2     0  0.9      1
 5 player   chr       0      0      2    NA NA       NA
 6 try      int       0      0      6     2  3.96     7
 7 aeiou    int       0      0      3     1  2.15     3
 8 unique   int       0      0      3     3  4.55     5
 9 common   int       0      0      4     2  3.85     5
10 rare     int       0      0      3     0  0.58     2

Explore it yourself, or read my blog post about my WORDLE journey: https://rolkra.github.io/wordle/

yyyymm_calc()

{explore} has a function new
yyyymm_calc(), a vision true?
Year and month, a swift command,
the right answer, close at hand.

In some datasets, dates are stored in the format yyyymm (e.g. 202411 for November 2024). Adding 6 months to such a value takes a lot of work: converting it into a real date, adding the months, and converting the result back to yyyymm. Now there is an easy way to do that.

Let’s say we have a database with a collection of Austrian beers. The year and month of production are stored in yyyymm format.

library(tidyverse)
library(explore)

# use beer data with random year/month produced 
beer <- use_data_beer() |>
  add_var_random_cat(
    name = "produced",
    cat = c(202210, 202211, 202212, 202301))

beer |> select(name, produced) |> head(12)

# A tibble: 12 x 2
   name                  produced
   <chr>                    <dbl>
 1 Puntigamer Maerzen      202212
 2 Puntigamer PR0,0ST      202210
 3 Puntigamer Panther      202211
 4 Puntigamer Winterbier   202301
 5 Puntigamer Zwickl       202212
 6 Goesser Maerzen         202212
 7 Goesser Helles          202301
 8 Goesser Naturgold       202210
 9 Goesser Spezial         202211
10 Goesser Gold            202212
11 Goesser Bock            202210
12 Goesser Stiftsbraeu     202211

Now we can add 6 months and 1 year easily:

# calculate produced + 6 months & produced + 1 year
beer |>
  select(name, produced) |>
  mutate(
    best = yyyymm_calc(produced, add_month = 6),
    check = yyyymm_calc(produced, add_year = 1)
  ) |>
  head(12)

# A tibble: 12 x 4
   name                  produced   best  check
   <chr>                    <dbl>  <dbl>  <dbl>
 1 Puntigamer Maerzen      202212 202306 202312
 2 Puntigamer PR0,0ST      202210 202304 202310
 3 Puntigamer Panther      202211 202305 202311
 4 Puntigamer Winterbier   202301 202307 202401
 5 Puntigamer Zwickl       202212 202306 202312
 6 Goesser Maerzen         202212 202306 202312
 7 Goesser Helles          202301 202307 202401
 8 Goesser Naturgold       202210 202304 202310
 9 Goesser Spezial         202211 202305 202311
10 Goesser Gold            202212 202306 202312
11 Goesser Bock            202210 202304 202310
12 Goesser Stiftsbraeu     202211 202305 202311
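
If you are curious what this kind of calculation involves, here is a minimal base-R sketch of the month arithmetic. It is only an illustration: the helper add_months_yyyymm() is a hypothetical name, it is not part of {explore}, and it is not necessarily how yyyymm_calc() works internally. Instead of going through a real date, it converts yyyymm into a running month count, adds the months, and converts back.

```r
# Hypothetical helper (not part of {explore}): shift a yyyymm value by whole months
add_months_yyyymm <- function(yyyymm, months = 0) {
  year  <- yyyymm %/% 100   # 202411 -> 2024
  month <- yyyymm %% 100    # 202411 -> 11

  # running month count with January = 0, so adding months is a plain sum
  total <- year * 12 + (month - 1) + months

  # convert the month count back to yyyymm
  (total %/% 12) * 100 + (total %% 12) + 1
}

add_months_yyyymm(202411, 6)                 # 202505
add_months_yyyymm(c(202210, 202301), 12)     # 202310 202401
```

The sketch is vectorised, so it would also work inside mutate(), and a negative value for months moves backwards in time.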

Cheers!

Written on November 14, 2024