This function assesses the evidence for bounding effects in a scatter plot of x and y, based on the clustering of points at the upper edge of the scatter plot (Miti et al., 2024). It tests the hypothesis of greater clustering at the upper edge of the scatter plot against a null bivariate normal distribution with no bounding effect (random scatter at the upper edge). It returns the probability (p-value) of the observed clustering, given that the data are a realization of an unbounded bivariate normal distribution.
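The test described above is a Monte Carlo test. A minimal base R sketch of such a test follows; it is not the implementation of expl_boundary(). The helpers sim_bvn() and mc_pvalue() are hypothetical. The sketch assumes the null data sets share the sample size, means and covariance matrix of the observed data, and takes the p-value as the proportion of simulated statistics at or below the observed one. That choice assumes, as the example output suggests, that a smaller statistic indicates stronger clustering.

```r
# Hypothetical helper: one bivariate normal data set with the same size,
# means and covariance matrix as the observed (x, y).
sim_bvn <- function(x, y) {
  n <- length(x)
  mu <- c(mean(x), mean(y))
  S <- cov(cbind(x, y))
  z <- matrix(rnorm(n * 2), ncol = 2)  # independent standard normals
  xy <- z %*% chol(S)                  # impose the covariance structure
  sweep(xy, 2, mu, "+")                # shift each column to its mean
}

# Hypothetical Monte Carlo p-value for any scalar statistic stat(x, y).
# Assumption: a smaller statistic means stronger clustering, so the
# p-value is the share of null statistics at or below the observed one.
mc_pvalue <- function(x, y, stat, simulations = 1000) {
  observed <- stat(x, y)
  simulated <- replicate(simulations, {
    xy <- sim_bvn(x, y)
    stat(xy[, 1], xy[, 2])
  })
  mean(simulated <= observed)
}
```

Any statistic can be plugged in as stat, for example one based on the peels of the data.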
Arguments
- x
A numeric vector of values for the independent variable.
- y
A numeric vector of values for the response variable.
- shells
A numeric value indicating the number of boundary peels (default is 10).
- simulations
The number of simulated null bivariate normal data sets used to test the hypothesis (default is 1000).
- plot
If TRUE, a plot is included in the output; if FALSE, no plot is produced (default is TRUE).
- ...
Additional graphical parameters, as for the par() function.
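The shells argument refers to convex hull peeling (Eddy, 1982), in which the convex hull of the data is repeatedly removed so that each removal forms one peel. A minimal sketch using base R's chull() follows. hull_peels() is a hypothetical helper, not part of the package, and it stops early when fewer than three points remain.

```r
# Hypothetical helper: convex hull peeling (Eddy, 1982).
# Returns a list with one element per peel; each element holds the row
# indices (into the original x and y) of that peel's vertices.
hull_peels <- function(x, y, shells = 10) {
  remaining <- seq_along(x)
  peels <- list()
  while (length(peels) < shells && length(remaining) >= 3) {
    h <- grDevices::chull(x[remaining], y[remaining])  # hull of the points left
    peels[[length(peels) + 1]] <- remaining[h]         # map back to original rows
    remaining <- remaining[-h]                          # peel off this hull
  }
  peels
}
```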
Value
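A data frame giving, for the left and right sections of the data, the observed standard deviation (sd) of the Euclidean distances from the vertices in the upper peels to the center of the data, the mean of the corresponding standard deviations over the simulated data sets (Mean sd), and the p-value of obtaining the observed standard deviation under the null distribution (p_value).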
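A sketch of a statistic of the kind described above follows. upper_peel_sd() is hypothetical, and its definitions are assumptions not taken from expl_boundary(): the center is the mean of x and y, the upper vertices are those with y above the mean of y, and the left and right sections are split at the mean of x. It takes the peels as a list of row-index vectors, such as the output of a convex hull peeling routine.

```r
# Hypothetical statistic: sd of Euclidean distances from upper-peel
# vertices to the center, computed separately for the left and right
# sections. Center, "upper" and the left/right split are assumptions.
upper_peel_sd <- function(x, y, peels) {
  cx <- mean(x)
  cy <- mean(y)
  v <- unique(unlist(peels))               # all peel vertices
  v <- v[y[v] > cy]                        # keep vertices above the center
  d <- sqrt((x[v] - cx)^2 + (y[v] - cy)^2) # Euclidean distance to center
  left <- x[v] < cx                        # left section: x below the center
  c(Left = sd(d[left]), Right = sd(d[!left]))
}
```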
Details
It is recommended that any outlying observations, as identified by the bagplot() function of the aplpack package, be removed from the data before analysis. Outlier removal is also applied in the simulation step of the expl_boundary() function.
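A minimal sketch of the recommended outlier removal follows. drop_outliers() is a hypothetical helper. It assumes the outliers are supplied as a two-column matrix of (x, y) coordinates, which is the form of the pxy.outlier component returned by aplpack::bagplot(), or NULL when there are none.

```r
# Hypothetical helper: remove the points flagged as outliers.
# `outliers`: two-column matrix of (x, y) coordinates, or NULL.
drop_outliers <- function(x, y, outliers) {
  if (is.null(outliers) || NROW(outliers) == 0) {
    return(data.frame(x = x, y = y))
  }
  outliers <- matrix(outliers, ncol = 2)
  is_out <- rep(FALSE, length(x))
  for (i in seq_len(nrow(outliers))) {
    # flag observations that exactly match an outlier's coordinates
    is_out <- is_out | (x == outliers[i, 1] & y == outliers[i, 2])
  }
  data.frame(x = x[!is_out], y = y[!is_out])
}

# Usage with aplpack (installed separately):
# bp <- aplpack::bagplot(x, y, create.plot = FALSE)
# clean <- drop_outliers(x, y, bp$pxy.outlier)
```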
References
Eddy, W. F. (1982). Convex hull peeling. COMPSTAT 1982 Part I: Proceedings in Computational Statistics, 42-47. Physica-Verlag, Vienna.
Miti, C., Milne, A. E., Giller, K. E. and Lark, R. M. (2024). Exploration of data for analysis using boundary line methodology. Computers and Electronics in Agriculture, 219, 108794.
Examples
x <- evapotranspiration$`ET(mm)`
y <- evapotranspiration$`yield(t/ha)`
expl_boundary(x, y, shells = 10, simulations = 100) # more than 1000 simulations is recommended
#> Index Section value
#> 1 sd Left 52.26857
#> 2 sd Right 83.15138
#> 3 Mean sd Left 62.38809
#> 4 Mean sd Right 55.67064
#> 5 p_value Left 0.02000
#> 6 p_value Right 1.00000