Package 'MVET' reference manual

Title:	Multivariate Estimates and Tests
Description:	Multivariate estimation and testing; the package currently covers tests for parametric data. For such data it performs various multivariate normality tests and outlier detection, and visualizes the results using the 'ggplot2' package. Homogeneity tests for covariance matrices are also available, as are Hotelling's T-square test and multivariate analysis of variance (MANOVA). Additional tests and visualization techniques, such as profile analysis and the randomized complete block design, are being explored for future releases.
Authors:	Yeonseok Choi [aut, cre], Yong-Seok Choi [ctb]
Maintainer:	Yeonseok Choi <[email protected]>
License:	MIT + file LICENSE
Version:	0.1.0
Built:	2025-03-04 04:02:49 UTC
Source:	https://github.com/yeonseok-choi/mvet

Mean Value Parallel Coordinates Plot (Used by HT2test and VManova)

Description

Mean value parallel coordinates plot (used by HT2test and VManova).

Usage

.mean_parallel_plot(data, grp.name, scale = FALSE)

Arguments

`data`	A numeric matrix or data frame. In a data frame, the group (class) column can be a factor or a character string.
`grp.name`	The name, given as a string, of the column that identifies the groups (classes) in the input data.
`scale`	If `TRUE`, the data are scaled before the mean values are calculated and plotted. (default scale = `FALSE`)
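The values such a plot draws are the per-group means of each variable. The following is a minimal base-R sketch of that computation only, not the package's code; the helper name `group_means_sketch` is hypothetical and the built-in `iris` data stand in for the user's data.

```r
# Hypothetical helper: per-group mean of every non-group column (base R only).
group_means_sketch <- function(data, grp.name, scale = FALSE) {
  vars <- setdiff(names(data), grp.name)
  x <- data[, vars, drop = FALSE]
  # Optionally centre and scale each variable before averaging
  if (scale) x <- as.data.frame(scale(x))
  aggregate(x, by = list(group = data[[grp.name]]), FUN = mean)
}

means <- group_means_sketch(iris, "Species")
scaled_means <- group_means_sketch(iris, "Species", scale = TRUE)
print(means)
```

Each row of the result corresponds to one line in a parallel coordinates plot, with the variables on the horizontal axis.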

Value

Mean Value Parallel Coordinates Plot

Box's M-test

Description

Performs Box's M-test for homogeneity of covariance matrices of multivariate normal data grouped by a single classification factor. The test is based on the chi-square approximation.
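As an illustration of the chi-square approximation, the sketch below computes the textbook form of Box's M statistic in base R. It is an assumption-laden sketch, not the package's implementation: the function name `box_m_sketch` is hypothetical, `iris` is used as stand-in data, and `boxMtest` may differ in details.

```r
# Textbook Box's M statistic with the chi-square approximation (base R only).
box_m_sketch <- function(data, group) {
  x <- as.matrix(data)
  group <- as.factor(group)
  p <- ncol(x)                      # number of variables
  g <- nlevels(group)               # number of groups
  n_i <- as.vector(table(group))    # group sizes, in level order
  n <- sum(n_i)
  # Per-group covariance matrices and the pooled covariance matrix
  covs <- lapply(levels(group), function(l) cov(x[group == l, , drop = FALSE]))
  pooled <- Reduce(`+`, Map(function(s, m) (m - 1) * s, covs, n_i)) / (n - g)
  logdet <- function(s) as.numeric(determinant(s, logarithm = TRUE)$modulus)
  m <- (n - g) * logdet(pooled) -
    sum((n_i - 1) * vapply(covs, logdet, numeric(1)))
  # Scaling factor of the chi-square approximation
  c1 <- (sum(1 / (n_i - 1)) - 1 / (n - g)) *
    (2 * p^2 + 3 * p - 1) / (6 * (p + 1) * (g - 1))
  stat <- (1 - c1) * m
  df <- p * (p + 1) * (g - 1) / 2
  list(M.stat = stat, df = df,
       p.value = pchisq(stat, df, lower.tail = FALSE))
}

res_boxm <- box_m_sketch(iris[, 1:4], iris$Species)
print(res_boxm)
```

A small p-value suggests that the group covariance matrices are not homogeneous.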

Usage

boxMtest(data,
         group)

Arguments

`data`	A numeric matrix or data frame.
`group`	A vector or factor of group labels. Its length must equal the number of observations `n` in the data.

Value

`M.stat`	Box's M test statistic, which approximately follows a chi-square distribution.
`df`	The degrees of freedom associated with the test statistic.
`p.value`	The p-value of the test statistic.

Examples

data(wine)
class <- wine$class
winedata <- subset(wine, select = -class)
boxMtest(winedata, class)

Hotelling T Square Test

Description

Tests the mean vector (Hotelling's T-square test) for one sample or compares two samples, assuming the data satisfy multivariate normality and homogeneity of covariance matrices.
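For orientation, the one-sample statistic and its conversion to an F statistic can be sketched in base R as follows. This is the standard textbook formula (see Johnson and Wichern), not the package's code; the function name `ht2_one_sketch` is hypothetical and `iris` serves as stand-in data.

```r
# One-sample Hotelling T-square test, textbook form (base R only).
ht2_one_sketch <- function(data, mu0) {
  x <- as.matrix(data)
  n <- nrow(x)
  p <- ncol(x)
  d <- colMeans(x) - as.vector(mu0)          # deviation from hypothesised mean
  t2 <- as.numeric(n * t(d) %*% solve(cov(x), d))
  f <- (n - p) / (p * (n - 1)) * t2          # F transform with (p, n - p) df
  list(T2 = t2, F = f, df1 = p, df2 = n - p,
       p.value = pf(f, p, n - p, lower.tail = FALSE))
}

setosa <- iris[iris$Species == "setosa", 1:4]
res_zero <- ht2_one_sketch(setosa, rep(0, 4))             # H0: mean vector is 0
res_at_mean <- ht2_one_sketch(setosa, colMeans(setosa))   # H0 equals sample mean
print(res_zero)
```

When the hypothesised mean equals the sample mean, the statistic is zero and the p-value is one, which makes a convenient sanity check.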

Usage

HT2test(data1,
        data2,
        mu0 = NULL,
        sample = "two",
        plot.scale = FALSE)

Arguments

`data1`	A data frame or matrix containing only numeric values from a single group or class. It must not contain a column that identifies groups or classes.
`data2`	A data frame or matrix containing only numeric values from a single group or class. It must not contain a column that identifies groups or classes. `data2` is compared with `data1` and is not used in the one-sample test.
`mu0`	The hypothesized mean vector against which `data1` is tested. It is used only in the one-sample test.
`sample`	The number of samples to compare: `one` for the one-sample test and `two` for the two-sample test. (default sample = `two`)
`plot.scale`	If `TRUE`, the data are scaled before the mean values are calculated and plotted. Scaling affects only the plot, not the test, and applies only to the two-sample case. (default plot.scale = `FALSE`)

Value

`One.HT2`	The one-sample Hotelling T-square test result, showing the degrees of freedom of the F test, the Hotelling T-square statistic, the F test statistic, and the p-value.
`Mean.val.plot`	The mean value parallel coordinates plot, representing the two samples by the mean value of each variable.
`Two.HT2`	The two-sample Hotelling T-square test result, showing the degrees of freedom of the F test, the Hotelling T-square statistic, the F test statistic, and the p-value.

References

Johnson, R. A., & Wichern, D. W. (2007). Applied Multivariate Statistical Analysis (6th ed.). Pearson Prentice Hall.

Examples

data(wine)
class1.wine <- subset(wine, class == 1)[, -1]
class2.wine <- subset(wine, class == 2)[, -1]
modified.class2.wine <- outlier(class2.wine, lim = 0, level = 0.05, option = "all")$modified.data

## one sample
value <- 0
p <- ncol(class1.wine)
mu0 <- matrix(rep(value, p), nrow = p, ncol = 1)
HT2test(data1 = class1.wine, mu0 = mu0, sample = "one")

## two sample
HT2test(data1 = class1.wine, data2 = modified.class2.wine, sample = "two", plot.scale = TRUE)

Mardia Test for Multivariate Normality Test

Description

Performs a multivariate normality test using Mardia's skewness and kurtosis statistics. Multivariate normality is accepted only if both the skewness test and the kurtosis test are passed.
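The two statistics can be sketched in base R as below, following Mardia (1970). This is an illustrative sketch under stated assumptions, not the package's code: the function name `mardia_sketch` is hypothetical, the covariance matrix uses the divisor `n`, the kurtosis test is taken as two-sided, no small-sample correction is applied, and `iris` is stand-in data.

```r
# Mardia's multivariate skewness and kurtosis tests, textbook form (base R only).
mardia_sketch <- function(data, level = 0.05) {
  x <- as.matrix(data)
  n <- nrow(x)
  p <- ncol(x)
  xc <- scale(x, center = TRUE, scale = FALSE)
  s <- crossprod(xc) / n                 # covariance with divisor n (assumption)
  d <- xc %*% solve(s) %*% t(xc)         # Mahalanobis cross-products
  b1 <- sum(d^3) / n^2                   # multivariate skewness
  b2 <- sum(diag(d)^2) / n               # multivariate kurtosis
  skew_stat <- n * b1 / 6
  skew_df <- p * (p + 1) * (p + 2) / 6
  kurt_stat <- (b2 - p * (p + 2)) / sqrt(8 * p * (p + 2) / n)
  p_skew <- pchisq(skew_stat, skew_df, lower.tail = FALSE)
  p_kurt <- 2 * pnorm(abs(kurt_stat), lower.tail = FALSE)
  list(skew.stat = skew_stat, skew.df = skew_df, p.skew = p_skew,
       kurt.stat = kurt_stat, p.kurt = p_kurt,
       normal = (p_skew > level) && (p_kurt > level))
}

res_mardia <- mardia_sketch(iris[iris$Species == "setosa", 1:4])
print(res_mardia)
```

The final `normal` flag mirrors the rule described above: both tests must be passed at the chosen significance level.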

Usage

mardiatest(data,
           level = 0.05,
           showplot = FALSE,
           showoutlier = FALSE,
           outlieropt = "all",
           shownewdata = FALSE)

Arguments

`data`	A numeric matrix or data frame.
`level`	The significance level of the skewness and kurtosis statistics. (default = `0.05`)
`showplot`	If `TRUE`, shows a chi-square Q-Q plot drawn with `ggplot2`. If `showoutlier` is also `TRUE`, the outliers are marked on the plot. (default = `FALSE`)
`showoutlier`	If `TRUE`, shows the positions and the number of outliers. (default = `FALSE`)
`outlieropt`	The `option` argument passed to the `outlier` function. (default = `"all"`)
`shownewdata`	If `TRUE`, shows the new data with the outliers removed. (default = `FALSE`)

Value

`mult.nomality`	The statistics and p-values for skewness and kurtosis, together with the final decision on whether multivariate normality is satisfied.
`QQPlot`	The chi-square Q-Q plot.
`...`	Same as the result of `outlier`

References

Mardia, K. V. (1970), Measures of multivariate skewness and kurtosis with applications. Biometrika, 57(3), 519-530.

Mardia, K. V. (1974), Applications of Some Measures of Multivariate Skewness and Kurtosis in Testing Normality and Robustness Studies. Sankhya, 36, 115-128.

Examples

## Simple Mardia Test
data(wine)
class2.wine <- subset(wine, class == 2)[, -1]
mardiatest(class2.wine, level = 0.05, showplot = TRUE)

## Mardia Test and Outlier Detection
data(wine)
class2.wine <- subset(wine, class == 2)[, -1]
mardiatest(class2.wine, level = 0.05, showplot = TRUE,
           showoutlier = TRUE, outlieropt = "all", shownewdata = TRUE)

Outliers Detection

Description

Detects outliers with the Mardia test, based on skewness and kurtosis. By default, no more than half of the observations can be flagged as outliers; the limit can be modified with the `lim` option.

Usage

outlier(data,
        lim = 0,
        level = 0.05,
        option = "all")

Arguments

`data`	A numeric matrix or data frame.
`lim`	The maximum number of outliers to detect. If 0 is entered, up to half of the observations can be detected. (default = `0`)
`level`	The significance level of the skewness and kurtosis statistics used by the `mardiatest` function. (default = `0.05`)
`option`	`"skew"` refers to skewness, `"kurt"` refers to kurtosis, and `"all"` refers to both skewness and kurtosis. Outliers are detected until the corresponding result in `mardiatest` is “Accept”. (default = `"all"`)

Value

`modified.data`	The modified data without outliers.
`modified.mvn`	The modified Mardia test result without outliers.
`outlier.num`	The position of outliers.
`outlier.cnt`	Total number of outliers.

References

Jobson, J. D. (1992). Applied Multivariate Data Analysis. Springer-Verlag, New York.

Examples

data(wine)
class2.wine <- subset(wine, class == 2)[, -1]
outlier(class2.wine, lim = 0, level = 0.05, option = "all")

Probability Plot Correlation Coefficient(PPCC) Test for Multivariate Normality Test

Description

The correlation coefficient between the chi-square quantiles and the ordered squared Mahalanobis distances is compared with the critical value table of Filliben (1975) to test multivariate normality.
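The correlation itself can be sketched in base R as follows. This is an illustration under assumptions rather than the package's code: the function name `ppcc_sketch` is hypothetical, the plotting positions `(i - 0.5) / n` are one common choice and may differ from those used by `PPCCtest`, and Filliben's critical value table is not reproduced here.

```r
# Correlation between ordered squared Mahalanobis distances and chi-square
# quantiles (base R only).
ppcc_sketch <- function(data) {
  x <- as.matrix(data)
  n <- nrow(x)
  p <- ncol(x)
  d2 <- sort(mahalanobis(x, colMeans(x), cov(x)))   # ordered squared distances
  q <- qchisq((seq_len(n) - 0.5) / n, df = p)       # assumed plotting positions
  cor(d2, q)
}

r_ppcc <- ppcc_sketch(iris[iris$Species == "setosa", 1:4])
print(r_ppcc)
```

A value close to 1 indicates that the points of the chi-square Q-Q plot lie near a straight line; the decision is then made by comparing the value with the critical value for the given `n` and significance level.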

Usage

PPCCtest(data,
         level = 0.05)

Arguments

`data`	A numeric matrix or data frame.
`level`	The significance level used to select the critical value, either `0.01` or `0.05`. (default = `0.05`)

Value

`data.cnt`	The number of observations `n`.
`PPCC.value`	The value of the correlation coefficient.
`critical.value`	Critical value proposed by Filliben (1975), corresponding to `data.cnt` and `PPCC.value`.
`test.res`	Final result of multivariate normality.
`QQPlot`	Shows Chi-Square Q-Q plot.

References

Filliben, J. J. (1975), The Probability Plot Correlation Coefficient Test for Normality, Technometrics 17, 111-117.

Examples

data(wine)
class1.wine <- subset(wine, class == 1)[, -1]
PPCCtest(class1.wine, level = 0.05)

Srivastava Plot Correlation Coefficient(SPCC) Test for Multivariate Normality Test

Description

Principal component analysis is applied, and the number of principal components is chosen so that the cumulative proportion of variance explained by their eigenvalues reaches at least 70%. The score vectors of the selected components are then tested against the critical values defined by Filliben (1975). Users may instead choose the number of components themselves.
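The automatic choice of the number of components can be sketched with `prcomp` from base R. This is a sketch under assumptions, not the package's code: the function name `pick_k_sketch` is hypothetical, the analysis here is run on the unscaled covariance matrix (the package may standardize the data), and `iris` is stand-in data.

```r
# Smallest number of principal components whose cumulative explained
# variance ratio reaches a threshold (base R only).
pick_k_sketch <- function(data, threshold = 0.70) {
  pc <- prcomp(as.matrix(data))               # covariance-based PCA (assumption)
  explained <- pc$sdev^2 / sum(pc$sdev^2)     # explained variance ratios
  k <- which(cumsum(explained) >= threshold)[1]
  list(k = k, explain.ratio = explained,
       scores = pc$x[, seq_len(k), drop = FALSE])
}

sel <- pick_k_sketch(iris[, 1:4])
print(sel$k)
print(round(sel$explain.ratio, 3))
```

The score vectors in `scores` are what would then be checked for normality, one component at a time.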

Usage

SPCCtest(data,
         k = 0,
         level = 0.05)

Arguments

`data`	A numeric matrix or data frame.
`k`	The number of principal components to use. If 0 is entered, the function automatically selects the smallest `k` for which the cumulative explained variance ratio is at least 70%. (default = `0`)
`level`	The significance level used to select the critical value, either `0.01` or `0.05`. (default = `0.05`)

Value

`Srivastava.QQplot`	A chi-square Q-Q plot for each principal component, drawn with ggplot2.
`data.cnt`	The number of observations `n`.
`explain.ratio`	Displays all explained variance ratios.
`critical.value`	Critical value proposed by Filliben (1975), corresponding to `data.cnt` and `PPCC.value`.
`result`	Final result of multivariate normality.

References

Srivastava, M. S. (1984), A measure of skewness and kurtosis and a graphical method for assessing multivariate normality. Statistics & Probability Letters, 2(5), 263-267.

Filliben, J. J. (1975), The Probability Plot Correlation Coefficient Test for Normality, Technometrics 17, 111-117.

Examples

data(wine)
class1.wine <- subset(wine, class == 1)[, -1]
SPCCtest(class1.wine, k = 5, level = 0.05)

Various Multivariate Anova(VManova)

Description

Perform various types of multivariate analysis of variance (MANOVA) that satisfy tests of multivariate normality and homogeneity of covariance matrices.

Usage

VManova(data,
        grp1.name,
        grp2.name,
        way = "one",
        method = "all",
        plot.scale = FALSE)

Arguments

`data`	A numeric matrix or data frame. In a data frame, the group (class) column can be a factor or a character string.
`grp1.name`	The name of the first group (or class) column in the input data, specified as a `string`.
`grp2.name`	The name of the second group (or class) column in the input data, specified as a `string`. Used to represent the second group(class) in a two-way MANOVA.
`way`	The type of MANOVA to perform ("`one`" for one-way or "`two`" for two-way). (default = "`one`")
`method`	The method for MANOVA analysis. "`Wilks`" represents Wilks' lambda, "`LH`" represents Lawley-Hotelling trace, "`Pillai`" represents Pillai-Bartlett trace, "`Roy`" represents Roy's largest root, and "`all`" represents all methods. (default is "`all`")
`plot.scale`	If `TRUE`, the data will be scaled before calculating mean values and used in the plot. It has no direct effect on the MANOVA analysis itself. (default plot.scale = `FALSE`)
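The four statistics named under `method` are all functions of the eigenvalues of E⁻¹H, where H and E are the between-group and within-group sums-of-squares-and-cross-products matrices. The base-R sketch below shows the one-way case; it is not the package's code, the function name `manova_stats_sketch` is hypothetical, `iris` is stand-in data, and Roy's statistic is taken as the largest eigenvalue (some texts report it as λ/(1 + λ) instead).

```r
# One-way MANOVA test statistics from the eigenvalues of E^{-1} H (base R only).
manova_stats_sketch <- function(data, group) {
  x <- as.matrix(data)
  group <- as.factor(group)
  p <- ncol(x)
  grand <- colMeans(x)
  h <- matrix(0, p, p)   # between-group SSCP
  e <- matrix(0, p, p)   # within-group SSCP
  for (l in levels(group)) {
    xl <- x[group == l, , drop = FALSE]
    dev <- colMeans(xl) - grand
    h <- h + nrow(xl) * tcrossprod(dev)
    e <- e + crossprod(scale(xl, center = TRUE, scale = FALSE))
  }
  lambda <- Re(eigen(solve(e) %*% h, only.values = TRUE)$values)
  # Keep the s = min(p, g - 1) eigenvalues that can be non-zero
  lambda <- lambda[seq_len(min(p, nlevels(group) - 1))]
  c(Wilks = prod(1 / (1 + lambda)),
    LH = sum(lambda),
    Pillai = sum(lambda / (1 + lambda)),
    Roy = max(lambda))
}

stats_manova <- manova_stats_sketch(iris[, 1:4], iris$Species)
print(round(stats_manova, 4))
```

The same values can be cross-checked against `summary(manova(...))` from the `stats` package, which reports each statistic together with its approximate F test.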

Value

`Mean.val.plot`	Plot the mean value parallel coordinates, representing the two samples using the mean values for each variable.
`One.all`	Outputs the results of a one-way MANOVA test. It displays the degrees of freedom (Df1, Df2) of the F-distribution, statistics for Wilks, Lawley-Hotelling, Pillai, and Roy, the F-distribution test statistic, and the significance level in that order.
`Two.all`	Outputs the results of a two-way MANOVA test. It displays the degrees of freedom (Df1, Df2) of the F-distribution, statistics for Wilks, Lawley-Hotelling, Pillai, and Roy, the F-distribution test statistic, and the significance level in that order.

References

Rencher, A. C., & Christensen, W. F. (2002). Methods of Multivariate Analysis. John Wiley & Sons, Inc., New York.

Examples

data(wine)

## one way
VManova(wine, grp1.name = "class", way = "one", method = "all", plot.scale = TRUE)

## two way
newwine <- wine
# (1: low, 2: medium, 3: high)
newwine$v4 <- ifelse(wine$v4 <= 17, 1,
                     ifelse(wine$v4 <= 22, 2, 3))
VManova(newwine, grp1.name = "class", grp2.name = "v4",
        way = "two", method = "all", plot.scale = TRUE)

Wine Dataset

Description

These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines.

Usage

wine

Format

A data frame with 178 observations on the following 14 variables:

class: The class vector. The three different cultivars of wine are represented by the integers 1 to 3.
v1: Alcohol
v2: Malic acid
v3: Ash
v4: Alcalinity of ash
v5: Magnesium
v6: Total phenols
v7: Flavanoids
v8: Nonflavanoid phenols
v9: Proanthocyanins
v10: Color intensity
v11: Hue
v12: OD280/OD315 of diluted wines
v13: Proline

Source

http://archive.ics.uci.edu/ml/datasets/Wine.
