New {explore} release!

{explore} 1.3.3 introduces use_data_wordle() and yyyymm_calc()

{explore} is an R package that simplifies exploratory data analysis. You can do interactive data exploration, generate an automated report or use AI to reveal hidden patterns in your data.

use_data_wordle()

With this release of {explore}
you can put to the test
the power of small data
of my daily WORDLE quest

WORDLE is a popular word-puzzle game. A friend and I started playing it daily and comparing our results. I collected the data and made it available in {explore}.

library(explore)
wordle <- use_data_wordle()
wordle |> describe()
# A tibble: 10 x 8
   variable type     na na_pct unique   min  mean   max
   <chr>    <chr> <int>  <dbl>  <int> <dbl> <dbl> <dbl>
 1 round    int       0      0     60     1 30.5     60
 2 word     chr       0      0     60    NA NA       NA
 3 language chr       0      0      1    NA NA       NA
 4 noun     int       0      0      2     0  0.9      1
 5 player   chr       0      0      2    NA NA       NA
 6 try      int       0      0      6     2  3.96     7
 7 aeiou    int       0      0      3     1  2.15     3
 8 unique   int       0      0      3     3  4.55     5
 9 common   int       0      0      4     2  3.85     5
10 rare     int       0      0      3     0  0.58     2

Explore it yourself or check out my blog post about my WORDLE journey: https://rolkra.github.io/wordle/
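As a possible starting point, one simple question is how many tries each player needs on average. The following is a minimal sketch, assuming {dplyr} is installed alongside {explore}; the summary names `mean_try` and `games` are chosen here for illustration and are not part of the dataset.

```r
library(explore)
library(dplyr)

# load the WORDLE data shipped with {explore}
wordle <- use_data_wordle()

# average number of tries and number of recorded games per player
# (mean_try and games are illustrative names, not columns of the data)
avg_try <- wordle |>
  group_by(player) |>
  summarise(
    mean_try = mean(try),
    games = n()
  )

avg_try
```

The columns `player` and `try` are taken from the describe() output above; any other variable listed there can be summarised the same way.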

yyyymm_calc()

{explore} has a function new
yyyymm_calc(), a vision true?
Year and month, a swift command,
the right answer, close at hand.

In some datasets, dates are stored in the format yyyymm (e.g. 202411 for November 2024). Adding 6 months to such a value takes a lot of work: converting it into a real date, adding the months, and converting the result back to yyyymm. Now there is an easier way to do that.

Let’s say we have a database with a collection of Austrian beers, where the year and month of production are stored in this format.

library(tidyverse)
library(explore)

# use beer data with random year/month produced 
beer <- use_data_beer() |>
  add_var_random_cat(
    name = "produced",
    cat = c(202210, 202211, 202212, 202301))

beer |> select(name, produced) |> head(12)
# A tibble: 12 x 2
   name                  produced
   <chr>                    <dbl>
 1 Puntigamer Maerzen      202212
 2 Puntigamer PR0,0ST      202210
 3 Puntigamer Panther      202211
 4 Puntigamer Winterbier   202301
 5 Puntigamer Zwickl       202212
 6 Goesser Maerzen         202212
 7 Goesser Helles          202301
 8 Goesser Naturgold       202210
 9 Goesser Spezial         202211
10 Goesser Gold            202212
11 Goesser Bock            202210
12 Goesser Stiftsbraeu     202211

Now we can add 6 months and 1 year easily:

# calculate produced + 6 months & produced + 1 year
beer |>
  select(name, produced) |>
  mutate(
    best = yyyymm_calc(produced, add_month = 6),
    check = yyyymm_calc(produced, add_year = 1)
  ) |>
  head(12)
# A tibble: 12 x 4
   name                  produced   best  check
   <chr>                    <dbl>  <dbl>  <dbl>
 1 Puntigamer Maerzen      202212 202306 202312
 2 Puntigamer PR0,0ST      202210 202304 202310
 3 Puntigamer Panther      202211 202305 202311
 4 Puntigamer Winterbier   202301 202307 202401
 5 Puntigamer Zwickl       202212 202306 202312
 6 Goesser Maerzen         202212 202306 202312
 7 Goesser Helles          202301 202307 202401
 8 Goesser Naturgold       202210 202304 202310
 9 Goesser Spezial         202211 202305 202311
10 Goesser Gold            202212 202306 202312
11 Goesser Bock            202210 202304 202310
12 Goesser Stiftsbraeu     202211 202305 202311
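To see what kind of calculation is involved, here is a minimal base R sketch that shifts a yyyymm value using integer arithmetic instead of a date round trip. It is not how {explore} implements yyyymm_calc(); the helper name `yyyymm_shift` is hypothetical and only serves to illustrate the idea.

```r
# Hypothetical helper, NOT part of {explore}:
# shift a numeric yyyymm value by a whole number of months
yyyymm_shift <- function(yyyymm, add_month = 0) {
  year  <- yyyymm %/% 100   # 202212 -> 2022
  month <- yyyymm %% 100    # 202212 -> 12

  # count months on a single zero-based scale, then shift
  total <- year * 12 + (month - 1) + add_month

  # split back into year and month (month is 1-based again)
  (total %/% 12) * 100 + (total %% 12) + 1
}

shifted <- yyyymm_shift(c(202212, 202210, 202301), add_month = 6)
shifted
# 202306 202304 202307
```

These values match the `best` column above for the same inputs. Adding a year is the same as adding 12 months, and because `%/%` and `%%` round towards minus infinity in R, negative values of `add_month` work as well.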

Cheers!

Written on November 14, 2024