Title: | Letter Value 'Boxplots' |
---|---|
Description: | Implements the letter value 'boxplot' which extends the standard 'boxplot' to deal with both larger and smaller number of data points by dynamically selecting the appropriate number of letter values to display. |
Authors: | Hadley Wickham [aut, cre], Heike Hofmann [aut] |
Maintainer: | Hadley Wickham |
License: | GPL (>= 2) |
Version: | 0.2.1.9000 |
Built: | 2024-12-04 03:10:17 UTC |
Source: | https://github.com/hadley/lvplot |
County level statistics based on the 1980 US Census.
census
A data frame with 10 variables
County name
FIPS county code
Geographic location of county centers
(normalized) Temperatures in January & July
(normalized) Sunshine measurement in January & July
Elevation above sea level
Population
Determine depth of letter values needed for n observations.
determineDepth(n, k = NULL, alpha = NULL, perc = NULL)
n |
number of observations to be shown in the LV boxplot |
k |
number of letter value statistics used |
alpha |
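if supplied, depth k is calculated such that (1-alpha)100% confidence intervals of an LV statistic do not extend into neighboring LV statistics |
perc |
if supplied, depth k is adjusted such that perc percent outliers are shown |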
Supply one of k, alpha or perc.
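As a minimal sketch of the three ways to call the function, the snippet below supplies k, alpha and perc in turn. The sample size and the argument values are hypothetical, and alpha = 0.95 is chosen only because it matches the default used by LVboxplot and lvtable elsewhere in this manual.

```r
library(lvplot)

n <- 10000  # hypothetical number of observations

# 1. Fix the depth directly
k_fixed <- determineDepth(n, k = 4)

# 2. Let a confidence-based rule choose the depth
k_alpha <- determineDepth(n, alpha = 0.95)

# 3. Let a target percentage of outliers choose the depth
k_perc <- determineDepth(n, perc = 1)

c(k = k_fixed, alpha = k_alpha, perc = k_perc)
```

The returned depth can then be passed as k to lvtable or LVboxplot.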
An extension of standard boxplots which draws k letter statistics.
Conventional boxplots (Tukey 1977) are useful displays for conveying rough
information about the central 50% of the data and the extent of the data.
For moderate-sized data sets ($n < 1000$), detailed estimates of tail
behavior beyond the quartiles may not be trustworthy, so the information
provided by boxplots is appropriately somewhat vague beyond the quartiles,
and the expected number of “outliers” and “far-out” values for a
Gaussian sample of size $n$ is often less than 10 (Hoaglin, Iglewicz,
and Tukey 1986). Large data sets ($n \approx 10{,}000$ to $100{,}000$) afford
more precise estimates of quantiles in the tails beyond the quartiles and
also can be expected to present a large number of “outliers” (about
$0.4 + 0.007n$).
The letter-value box plot addresses both these shortcomings: it conveys
more detailed information in the tails using letter values, only out to the
depths where the letter values are reliable estimates of their
corresponding quantiles (corresponding to tail areas of roughly
$2^{-i}$); “outliers” are defined as a function of the most extreme
letter value shown. All aspects shown on the letter-value boxplot are
actual observations, thus remaining faithful to the principles that
governed Tukey's original boxplot.
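To make the idea of letter values concrete, here is a base R sketch of Tukey's recursive depth rule: the median sits at depth (n + 1) / 2, and each further letter value sits at depth (floor(previous depth) + 1) / 2. The helper names letter_value_depths and letter_values are hypothetical and written for illustration only; they are not the package's internal code, which may differ in detail.

```r
# Depths of the first k letter values (median, fourths, eighths, ...)
# following Tukey's recursive definition.
letter_value_depths <- function(n, k) {
  depths <- numeric(k)
  depths[1] <- (n + 1) / 2
  for (i in seq_len(k)[-1]) {
    depths[i] <- (floor(depths[i - 1]) + 1) / 2
  }
  depths
}

# Lower and upper letter values of x for the first k depths.
# A half-integer depth averages the two neighbouring order statistics.
letter_values <- function(x, k) {
  y <- sort(x)
  n <- length(y)
  d <- letter_value_depths(n, k)
  lower <- (y[floor(d)] + y[ceiling(d)]) / 2
  upper <- (y[floor(n + 1 - d)] + y[ceiling(n + 1 - d)]) / 2
  data.frame(depth = d, tail_area = 2^(-seq_len(k)), lower = lower, upper = upper)
}

letter_values(1:15, k = 3)
```

For the 15 values above, the depths are 8, 4.5 and 2.5, so the fourths are 4.5 and 11.5 and the eighths are 2.5 and 13.5.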
geom_lv(
  mapping = NULL,
  data = NULL,
  stat = "lv",
  position = "dodge",
  outlier.colour = "black",
  outlier.shape = 19,
  outlier.size = 1.5,
  outlier.stroke = 0.5,
  na.rm = TRUE,
  varwidth = FALSE,
  width.method = "linear",
  show.legend = NA,
  inherit.aes = TRUE,
  ...
)

GeomLv

scale_fill_lv(...)

stat_lv(
  mapping = NULL,
  data = NULL,
  geom = "lv",
  position = "dodge",
  na.rm = TRUE,
  conf = 0.95,
  percent = NULL,
  k = NULL,
  show.legend = NA,
  inherit.aes = TRUE,
  ...
)

StatLv
mapping |
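Set of aesthetic mappings created by aes(). |
data |
The data to be displayed in this layer. There are three options: if NULL, the default, the data is inherited from the plot data as specified in the call to ggplot(); a data.frame, or other object, will override the plot data; a function will be called with a single argument, the plot data, and its return value is used as the layer data. |
position |
Position adjustment, either as a string naming the adjustment (e.g. "jitter"), or the result of a call to a position adjustment function. |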
outlier.colour |
Override aesthetics used for the outliers. Defaults come from geom_point(). |
outlier.shape |
Override aesthetics used for the outliers. Defaults come from geom_point(). |
outlier.size |
Override aesthetics used for the outliers. Defaults come from geom_point(). |
outlier.stroke |
Override aesthetics used for the outliers. Defaults come from geom_point(). |
na.rm |
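If FALSE, missing values are removed with a warning. If TRUE, missing values are silently removed. |
varwidth |
if FALSE (default), boxes are drawn at the same width for each group; if TRUE, box widths are adjusted according to the number of observations in each group. |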
width.method |
character, one of 'linear' (default), 'area', or 'height'. This parameter determines whether the width of the box for letter value LV(i) should be proportional to $i$ (linear), proportional to $2^{-i}$ (height), or whether the area of the box should be proportional to $2^{-i}$ (area). |
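show.legend |
logical. Should this layer be included in the legends? NA, the default, includes if any aesthetics are mapped. FALSE never includes, and TRUE always includes. |
inherit.aes |
If FALSE, overrides the default aesthetics, rather than combining with them. |
... |
Other arguments passed on to layer(). |
geom, stat |
Use to override the default connection between geom_lv and stat_lv. |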
conf |
confidence level |
percent |
numeric value: percent of data in outliers |
k |
number of letter values shown |
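The arguments above can be combined in the usual ggplot2 way. The following is a small sketch, assuming the mpg data shipped with ggplot2; the values k = 4 and width.method = "area" are arbitrary illustrative choices, not recommendations.

```r
library(ggplot2)
library(lvplot)

p <- ggplot(mpg, aes(class, hwy))

# Show a fixed number of letter values in every group
p_fixed_k <- p +
  stat_lv(aes(fill = after_stat(LV)), k = 4) +
  scale_fill_lv()

# Let box areas, rather than box widths, reflect the letter value
p_area <- p +
  geom_lv(aes(fill = after_stat(LV)), width.method = "area") +
  scale_fill_lv()

p_fixed_k
p_area
```

Leaving k and percent at NULL lets conf determine how many letter values are drawn for each group.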
An object of class GeomLv (inherits from Geom, ggproto, gg) of length 6.
An object of class StatLv (inherits from Stat, ggproto, gg) of length 5.
Number of Letter Values used for the display
Name of the Letter Value
width of the interquartile box
McGill, R., Tukey, J. W. and Larsen, W. A. (1978) Variations of box plots. The American Statistician 32, 12-16.
stat_quantile to view quantiles conditioned on a continuous variable.
library(ggplot2)
library(lvplot)

p <- ggplot(mpg, aes(class, hwy))
p + geom_lv(aes(fill = after_stat(LV))) + scale_fill_brewer()
p + geom_lv() + geom_jitter(width = 0.2)
p + geom_lv(aes(fill = after_stat(LV))) + scale_fill_lv()

# Outliers
p + geom_lv(varwidth = TRUE, aes(fill = after_stat(LV))) + scale_fill_lv()
p + geom_lv(fill = "grey80", colour = "black")
p + geom_lv(outlier.colour = "red", outlier.shape = 1)

# Plots are automatically dodged when any aesthetic is a factor
p + geom_lv(aes(fill = drv))

# varwidth adjusts the width of the boxes according to the number of observations
ggplot(ontime, aes(UniqueCarrier, TaxiIn + TaxiOut)) +
  geom_lv(aes(fill = after_stat(LV)), varwidth = TRUE) +
  scale_fill_lv() +
  scale_y_sqrt() +
  theme_bw()

# Day of the week as a number, 0 (Sunday) to 6 (Saturday)
ontime$DayOfWeek <- as.POSIXlt(ontime$FlightDate)$wday
ggplot(ontime, aes(factor(DayOfWeek), TaxiIn + TaxiOut)) +
  geom_lv(aes(fill = after_stat(LV))) +
  scale_fill_lv() +
  scale_y_sqrt() +
  theme_bw()
An extension of standard boxplots which draws k letter statistics. Conventional boxplots (Tukey 1977) are useful displays for conveying rough information about the central 50% of the data and the extent of the data.
LVboxplot(x, ...)

## S3 method for class 'formula'
LVboxplot(
  formula,
  alpha = 0.95,
  k = NULL,
  perc = NULL,
  horizontal = TRUE,
  xlab = NULL,
  ylab = NULL,
  col = "grey30",
  bg = "grey90",
  width = 0.9,
  width.method = "linear",
  median.col = "grey10",
  ...
)

## S3 method for class 'numeric'
LVboxplot(
  x,
  alpha = 0.95,
  k = NULL,
  perc = NULL,
  horizontal = TRUE,
  xlab = NULL,
  ylab = NULL,
  col = "grey30",
  bg = "grey90",
  width = 0.9,
  width.method = "linear",
  median.col = "grey10",
  ...
)
x |
numeric vector of data |
... |
passed onto |
formula |
a plotting formula of the form |
alpha |
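if supplied, depth k is calculated such that (1-alpha)100% confidence intervals of an LV statistic do not extend into neighboring LV statistics |
k |
number of letter value statistics used |
perc |
if supplied, depth k is adjusted such that perc percent outliers are shown |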
horizontal |
display horizontally (TRUE) or vertically (FALSE) |
xlab |
x axis label |
ylab |
y axis label |
col |
vector of colours to use |
bg |
background colour |
width |
maximum height/width of box |
width.method |
one of 'linear', 'height' or 'area'. Methods 'height' and 'area' ensure that these dimensions are proportional to the number of observations within each box. |
median.col |
colour of the line for the median |
For moderate-sized data sets ($n < 1000$), detailed estimates of tail
behavior beyond the quartiles may not be trustworthy, so the information
provided by boxplots is appropriately somewhat vague beyond the quartiles,
and the expected number of “outliers” and “far-out” values for a
Gaussian sample of size $n$ is often less than 10 (Hoaglin, Iglewicz,
and Tukey 1986). Large data sets ($n \approx 10{,}000$ to $100{,}000$) afford
more precise estimates of quantiles in the tails beyond the quartiles and
also can be expected to present a large number of “outliers” (about
$0.4 + 0.007n$).
The letter-value box plot addresses both these shortcomings: it conveys
more detailed information in the tails using letter values, only out to the
depths where the letter values are reliable estimates of their
corresponding quantiles (corresponding to tail areas of roughly
$2^{-i}$); “outliers” are defined as a function of the most extreme
letter value shown. All aspects shown on the letter-value boxplot are
actual observations, thus remaining faithful to the principles that
governed Tukey's original boxplot.
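The growth in flagged points with sample size is easy to see in a small simulation. This base R sketch counts how many points the conventional 1.5 × IQR rule (as implemented by boxplot.stats) marks as outliers in standard Gaussian samples; the sample sizes and the seed are arbitrary choices for illustration.

```r
set.seed(1)

# Number of points flagged by the conventional boxplot rule
count_boxplot_outliers <- function(x) length(boxplot.stats(x)$out)

sizes <- c(100, 1000, 10000, 100000)
flagged <- vapply(
  sizes,
  function(n) count_boxplot_outliers(rnorm(n)),
  integer(1)
)

data.frame(n = sizes, flagged = flagged)
```

Although every sample comes from the same well-behaved distribution, the largest one produces hundreds of flagged points, which is the clutter that the letter-value display is meant to avoid.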
library(lvplot)

# Compare conventional boxplots with LV boxplots
# for exponential samples of increasing size.
oldpar <- par(mfrow = c(4, 2), mar = c(3, 3, 3, 3))
for (i in 1:4) {
  x <- rexp(10 ^ (i + 1))
  boxplot(x, col = "grey", horizontal = TRUE)
  title(paste("Exponential, n = ", length(x)))
  LVboxplot(x, col = "grey", xlab = "")
}
par(oldpar)  # restore the previous graphics settings

with(ontime, LVboxplot(sqrt(TaxiIn + TaxiOut) ~ UniqueCarrier, horizontal = FALSE))
Compute table of k letter values for vector x
lvtable(x, k, alpha = 0.95)
x |
input numeric vector |
k |
number of letter values to compute |
alpha |
alpha-threshold for confidence level |
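A short usage sketch follows, assuming a simulated exponential sample; the seed, the sample size and the choice to take k from determineDepth are illustrative assumptions rather than requirements. The exact layout of the returned table is not described here, so it is simply printed.

```r
library(lvplot)

set.seed(42)
x <- rexp(1000)  # hypothetical data

# Choose the number of letter values from the sample size,
# then compute the corresponding table.
k <- determineDepth(length(x), alpha = 0.95)
lv <- lvtable(x, k = k, alpha = 0.95)

lv
```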
Data set detailing on-time performance of national US flights in January 2015. This data is a subset of the data provided by the US Department of Transportation. The full data as well as archived or more recent data is available for download from http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time.
ontime
A data frame consisting of the variables
a date variable of the day of the flight
factor variable of the carrier (using the two letter abbreviation)
numeric variable of the flight number
scheduled departure time in hhmm format
actual departure time in hhmm format
scheduled arrival time in hhmm format
actual arrival time in hhmm format
numeric variable of the taxi out time in minutes
numeric variable of the taxi in time in minutes
Arrival delay, in Minutes
Departure delay, in Minutes
Carrier Delay, in Minutes
Weather Delay, in Minutes
National Air System Delay, in Minutes
Security Delay, in Minutes
Late Aircraft Delay, in Minutes
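Times stored in hhmm format are awkward to plot or subtract directly, because 959 and 1000 are only one minute apart. The sketch below converts such values to minutes since midnight; the helper name hhmm_to_minutes and the example values are hypothetical, and the column names of the data set are not assumed.

```r
# Convert clock times in hhmm format (e.g. 1435 for 14:35)
# into minutes since midnight.
hhmm_to_minutes <- function(hhmm) {
  hhmm <- as.integer(hhmm)
  (hhmm %/% 100L) * 60L + hhmm %% 100L
}

dep_times <- c(5, 930, 1435, 2359)  # hypothetical hhmm values
hhmm_to_minutes(dep_times)
# 5 570 875 1439
```

The same function accepts character input such as "0930", since as.integer drops the leading zero.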
http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time
library(ggplot2)
library(lvplot)

# Total taxi time per carrier, on a square-root scale
ggplot(ontime, aes(UniqueCarrier, TaxiIn + TaxiOut)) +
  geom_lv(aes(fill = after_stat(LV))) +
  scale_fill_lv() +
  scale_y_sqrt() +
  theme_bw()