Title: | Tools at the Intersection of 'purrr' and 'dplyr' |
---|---|
Description: | Some functions at the intersection of 'dplyr' and 'purrr' that formerly lived in 'purrr'. |
Authors: | Lionel Henry [aut, cre], Hadley Wickham [ctb], RStudio [cph] |
Maintainer: | Lionel Henry <[email protected]> |
License: | GPL-3 | file LICENSE |
Version: | 0.0.8.9000 |
Built: | 2024-11-19 05:56:18 UTC |
Source: | https://github.com/hadley/purrrlyr |
by_row() and invoke_rows() apply ..f to each row of .d. If the output of ..f is neither a data frame nor an atomic vector, a list-column is created. In all cases, by_row() and invoke_rows() create a data frame in tidy format.
by_row(.d, ..f, ..., .collate = c("list", "rows", "cols"), .to = ".out", .labels = TRUE)
invoke_rows(.f, .d, ..., .collate = c("list", "rows", "cols"), .to = ".out", .labels = TRUE)
.d |
A data frame. |
... |
Further arguments passed to ..f. |
.collate |
If "list", the results are returned as a list-column. Alternatively, if the results are data frames or atomic vectors, you can collate on "cols" or on "rows". Column collation requires vectors of equal length or data frames with the same number of rows. |
.to |
Name of output column. |
.labels |
If |
.f, ..f |
A function to apply to each row. If |
By default, the whole row is appended to the result to serve as an identifier (set .labels to FALSE to prevent this). In addition, if ..f returns a multi-row data frame or a non-scalar atomic vector, a .row column is appended to identify the row number in the original data frame.
invoke_rows() is intended to provide a version of pmap() for data frames. With "cols" collation, it is equivalent to mdply() from the plyr package. Note that invoke_rows() follows the signature pattern of the invoke family of functions and takes .f as its first argument.
The distinction between by_row() and invoke_rows() is that the former passes a data frame to ..f while the latter maps the columns to its function call. This is essentially like using invoke() with each row. Another way to view this is that invoke_rows() is equivalent to using by_row() with a function lifted to accept dots (see lift()).
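The difference between the two calling conventions can be pictured with a small base-R sketch. This is only an analogy under stated assumptions, not how the package is implemented: the toy data frame `df` and the helper `add_xy()` are hypothetical, and `lapply()` and `Map()` stand in for by_row() and invoke_rows().

```r
# Toy data frame (hypothetical; not taken from the package documentation).
df <- data.frame(x = 1:3, y = 4:6)

# by_row()-style: the function receives each row as a one-row data frame
# and has to pull the values out of it.
row_as_df <- lapply(seq_len(nrow(df)), function(i) {
  row <- df[i, , drop = FALSE]
  row$x + row$y
})

# invoke_rows()-style: each column is matched to an argument of the function,
# so the calls are add_xy(x = 1, y = 4), add_xy(x = 2, y = 5), and so on.
add_xy <- function(x, y) x + y
row_as_args <- do.call(Map, c(list(f = add_xy), df))

# Both conventions yield the same per-row values.
unlist(row_as_df)
unlist(row_as_args)
```

The second form only works when the function's argument names match the column names (or when it accepts dots), which is the constraint the text describes for invoke_rows().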
A data frame.
# ..f should be able to work with a list or a data frame. As it
# happens, sum() handles data frames, so the following works:
mtcars %>% by_row(sum)

# Other functions such as mean() may need to be adjusted with one
# of the lift_xy() helpers:
mtcars %>% by_row(purrr::lift_vl(mean))

# To run a function with invoke_rows(), make sure it is variadic (that
# it accepts dots) or that .f's signature is compatible with the
# column names
mtcars %>% invoke_rows(.f = sum)
mtcars %>% invoke_rows(.f = purrr::lift_vd(mean))

# invoke_rows() with cols collation is equivalent to plyr::mdply()
p <- expand.grid(mean = 1:5, sd = seq(0, 1, length.out = 10))
p %>% invoke_rows(.f = rnorm, n = 5, .collate = "cols")
## Not run:
p %>% plyr::mdply(rnorm, n = 5) %>% dplyr::as_tibble()
## End(Not run)

# To integrate the result as part of the data frame, use rows or
# cols collation:
mtcars[1:2] %>% by_row(function(x) 1:5)
mtcars[1:2] %>% by_row(function(x) 1:5, .collate = "rows")
mtcars[1:2] %>% by_row(function(x) 1:5, .collate = "cols")
by_slice() applies ..f on each group of a data frame. Groups should be set with slice_rows() or dplyr::group_by().
by_slice( .d, ..f, ..., .collate = c("list", "rows", "cols"), .to = ".out", .labels = TRUE )
.d |
A sliced data frame. |
..f |
A function to apply to each slice. If |
... |
Further arguments passed to ..f. |
.collate |
If "list", the results are returned as a list-column. Alternatively, if the results are data frames or atomic vectors, you can collate on "cols" or on "rows". Column collation requires vectors of equal length or data frames with the same number of rows. |
.to |
Name of output column. |
.labels |
If |
by_slice() provides functionality equivalent to dplyr's dplyr::do() function. In combination with map(), by_slice() is equivalent to dplyr::summarise_each() and dplyr::mutate_each(). The distinction between mutating and summarising operations is not as important as in dplyr because we do not act on the columns separately. The only constraint is that the mapped function must return the same number of rows for each variable mapped on.
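The split-apply-combine idea behind by_slice() can be sketched in base R. This is an analogy under stated assumptions rather than the package's implementation: `split()` plays the role of slicing, `lapply()` the role of applying ..f, and the final `data.frame()` call the role of collation with labels. The names `slices`, `fits` and `result` are illustrative.

```r
# Split mtcars into one data frame per value of cyl (the "slices").
slices <- split(mtcars, mtcars$cyl)

# Apply a function to each slice: fit a regression and keep its coefficients.
fits <- lapply(slices, function(d) coef(lm(mpg ~ disp, data = d)))

# Combine the per-slice results into one data frame, keeping the slice
# label as an identifier column (similar in spirit to .labels = TRUE).
result <- data.frame(cyl = names(fits), do.call(rbind, fits), row.names = NULL)
result
```

Each slice contributes exactly one row here because the applied function returns a fixed-length vector; a function returning several rows per slice would need row-wise binding instead.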
A data frame.
by_row(), slice_rows(), dmap()
# Here we fit a regression model inside each slice defined by the
# unique values of the column "cyl". The fitted models are returned
# in a list-column.
mtcars %>%
  slice_rows("cyl") %>%
  by_slice(purrr::partial(lm, mpg ~ disp))

# by_slice() is especially useful in combination with map().

# To modify the contents of a data frame, use rows collation. Note
# that unlike dplyr, mutating and summarising operations can be
# used interchangeably.

# Mutating operation:
df <- mtcars %>% slice_rows(c("cyl", "am"))
df %>% by_slice(dmap, ~ .x / sum(.x), .collate = "rows")

# Summarising operation:
df %>% by_slice(dmap, mean, .collate = "rows")

# Note that mapping columns within slices is best handled by dmap():
df %>% dmap(~ .x / sum(.x))
df %>% dmap(mean)

# If you don't need the slicing variables as identifiers, switch
# .labels to FALSE:
mtcars %>%
  slice_rows("cyl") %>%
  by_slice(purrr::partial(lm, mpg ~ disp), .labels = FALSE) %>%
  purrr::flatten() %>%
  purrr::map(coef)
dmap() is just like purrr::map() but always returns a data frame. In addition, it handles grouped or sliced data frames.
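The contrast between a plain map and a map that returns a data frame can be illustrated in base R. This is a rough analogy, not how dmap() is implemented: `lapply()` stands in for purrr::map(), and the variable names are illustrative.

```r
# A plain map over columns returns a list.
col_means <- lapply(mtcars[1:3], mean)
is.data.frame(col_means)

# Wrapping the result back into a data frame gives the shape that
# dmap() is documented to return: one column per mapped column.
col_means_df <- as.data.frame(col_means)
is.data.frame(col_means_df)
names(col_means_df)
```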
dmap(.d, .f, ...)
dmap_at(.d, .at, .f, ...)
dmap_if(.d, .p, .f, ...)
.d |
A data frame. |
.f |
A function, formula, or vector (not necessarily atomic). If a function, it is used as is. If a formula, e.g.
This syntax allows you to create very compact anonymous functions. If character vector, numeric vector, or list, it is
converted to an extractor function. Character vectors index by
name and numeric vectors index by position; use a list to index
by position and name at different levels. If a component is not
present, the value of |
... |
Additional arguments passed on to the mapped function. |
.at |
A character vector of names, positive numeric vector of
positions to include, or a negative numeric vector of positions to
exclude. Only those elements corresponding to |
.p |
A single predicate function, a formula describing such a
predicate function, or a logical vector of the same length as |
dmap_at() and dmap_if() recycle length-1 vectors to the group sizes.
# dmap() always returns a data frame:
dmap(mtcars, summary)

# dmap() also supports sliced data frames:
sliced_df <- mtcars[1:5] %>% slice_rows("cyl")
sliced_df %>% dmap(mean)
sliced_df %>% dmap(~ .x / max(.x))

# This is equivalent to the combination of by_slice() and dmap()
# with 'rows' collation of results:
sliced_df %>% by_slice(dmap, mean, .collate = "rows")
slice_rows() is equivalent to dplyr's dplyr::group_by() command but it takes a vector of column names or positions instead of capturing column names with special evaluation. unslice() removes the slicing attributes.
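A short sketch of the difference in calling style, assuming the purrrlyr and dplyr packages are installed. The variable names `cols`, `sliced`, `grouped` and `plain` are illustrative, and the comparison relies on the equivalence with dplyr::group_by() stated above.

```r
library(purrrlyr)

# Column names held in a character vector, for example computed at run time.
cols <- c("cyl", "am")

# slice_rows() accepts the vector directly ...
sliced <- slice_rows(mtcars, cols)

# ... whereas group_by() captures bare column names.
grouped <- dplyr::group_by(mtcars, cyl, am)

# Both should report the same grouping variables.
dplyr::group_vars(sliced)
dplyr::group_vars(grouped)

# unslice() removes the slicing again.
plain <- unslice(sliced)
dplyr::is_grouped_df(plain)
```

Taking a plain vector makes slice_rows() convenient inside functions, where the grouping columns are usually passed around as strings.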
slice_rows(.d, .cols = NULL)
unslice(.d)
.d |
A data frame to slice or unslice. |
.cols |
A character vector of column names or a numeric vector
of column positions. If |
A sliced or unsliced data frame.
by_slice() and dplyr::group_by()