# Permutation methods for factor analysis and PCA

@article{Dobriban2017PermutationMF, title={Permutation methods for factor analysis and PCA}, author={E. Dobriban}, journal={arXiv: Statistics Theory}, year={2017} }

Researchers often have datasets measuring features $x_{ij}$ of samples, such as test scores of students. In factor analysis and PCA, these features are thought to be influenced by unobserved factors, such as skills. Can we determine how many components affect the data? This is an important problem, because it has a large impact on all downstream data analysis. Consequently, many approaches have been developed to address it. Parallel Analysis is a popular permutation method. It works by randomly…
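The abstract is cut off before it describes the procedure. As a rough illustration only, the following sketch assumes the commonly used permutation variant of parallel analysis: each column of the data matrix is permuted independently to break correlations between features, and a component is kept while its singular value exceeds a chosen percentile of the corresponding permuted-data singular values. The function name `parallel_analysis` and the parameters `n_perm`, `percentile`, and `seed` are hypothetical choices for this sketch, not taken from the paper.

```python
import numpy as np


def parallel_analysis(X, n_perm=20, percentile=95, seed=0):
    """Sketch of permutation parallel analysis (assumed standard variant).

    Returns the number of leading components whose singular values exceed
    the given percentile of the singular values of column-permuted data.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)  # center each feature
    n, p = X.shape

    # Singular values of the observed (centered) data, in decreasing order.
    sv = np.linalg.svd(X, compute_uv=False)

    # Null singular values: permute every column independently, which
    # keeps each feature's marginal distribution but destroys correlations.
    null_sv = np.empty((n_perm, sv.size))
    for b in range(n_perm):
        Xp = np.column_stack([rng.permutation(X[:, j]) for j in range(p)])
        null_sv[b] = np.linalg.svd(Xp, compute_uv=False)

    # Per-component threshold (assumption: upper percentile of the null).
    thresh = np.percentile(null_sv, percentile, axis=0)

    # Select components sequentially; stop at the first one that fails.
    k = 0
    while k < sv.size and sv[k] > thresh[k]:
        k += 1
    return k
```

Calling `parallel_analysis(X)` on an `n × p` array returns an integer estimate of the number of components. The sequential stopping rule and the 95th percentile are conventions that differ between implementations, so both should be treated as tunable assumptions.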

#### 31 Citations

Selecting the number of components in PCA via random signflips.

- Mathematics
- 2020

Dimensionality reduction via PCA and factor analysis is an important tool of data analysis. A critical step is selecting the number of components. However, existing methods (such as the scree plot,…

Deterministic parallel analysis: An improved method for selecting the number of factors and principal components

- Mathematics
- 2017

Factor analysis and principal component analysis (PCA) are used in many application areas. The first step, choosing the number of components, remains a serious challenge. Our work proposes improved…

Deterministic parallel analysis: an improved method for selecting factors and principal components

- Computer Science, Mathematics
- Journal of the Royal Statistical Society: Series B (Statistical Methodology)
- 2018

This work derandomizes parallel analysis, proposing deterministic PA, which is faster and more reproducible than PA. It also proposes deflation to counter shadowing and raises the decision threshold to improve estimation accuracy.

Factor analysis in high dimensional biological data with dependent observations

- Computer Science, Mathematics
- 2020

This work develops a novel statistical framework to perform factor analysis and interpret its results in data with dependent observations and factors whose signal strengths span several orders of magnitude, and shows that its estimator for the number of factors overcomes both the notorious "eigenvalue shadowing" problem and the biases due to the pervasive factor assumption.

Robust high dimensional factor models with applications to statistical machine learning.

- Medicine, Computer Science
- Statistical science : a review journal of the Institute of Mathematical Statistics
- 2021

It is shown that classical methods, especially principal component analysis (PCA), can be tailored to many new problems and provide powerful tools for statistical estimation and inference. Several applications illustrate how insights from these fields yield solutions to modern challenges.

Estimating Number of Factors by Adjusted Eigenvalues Thresholding

- Mathematics
- 2019

Determining the number of common factors is an important and practical topic in high dimensional factor models. The existing literature is mainly based on the eigenvalues of the covariance matrix.…

Likelihood Ratio Test in Multivariate Linear Regression: from Low to High Dimension

- Mathematics
- Statistica Sinica
- 2021

Multivariate linear regressions are widely used statistical tools in many applications to model the associations between multiple related responses and a set of predictors. To infer such…

Biwhitening Reveals the Rank of a Count Matrix

- Computer Science, Mathematics
- ArXiv
- 2021

This work proposes a simple procedure termed biwhitening that makes it possible to estimate the rank of the underlying data matrix without any prior knowledge on its structure, and extends it to other discrete distributions, such as the generalized Poisson, binomial, multinomial, and negative binomial.

Estimating and Accounting for Unobserved Covariates in High-Dimensional Correlated Data

- Mathematics, Computer Science
- 2018

CBCV and CorrConf are developed: provably accurate and computationally efficient methods to choose the number of and estimate latent confounding factors present in high dimensional data with correlated or nonexchangeable residuals.

Estimation of large block structured covariance matrices: Application to ‘multi‐omic’ approaches to study seed quality

- Mathematics, Computer Science
- Journal of the Royal Statistical Society: Series C (Applied Statistics)
- 2021

This work proposes a novel, efficient and fully data-driven approach for estimating large block structured sparse covariance matrices in the case where the number of variables is much larger than the number of samples, without being limited to block diagonal matrices.

#### References


Bi-cross-validation for factor analysis

- Mathematics
- 2015

Factor analysis is over a century old, but it is still problematic to choose the number of factors for a given data set. We provide a systematic review of current methods and then introduce a method…

Deterministic parallel analysis: An improved method for selecting the number of factors and principal components

- Mathematics
- 2017

Factor analysis and principal component analysis (PCA) are used in many application areas. The first step, choosing the number of components, remains a serious challenge. Our work proposes improved…

Deterministic parallel analysis: an improved method for selecting factors and principal components

- Computer Science, Mathematics
- Journal of the Royal Statistical Society: Series B (Statistical Methodology)
- 2018

This work derandomizes parallel analysis, proposing deterministic PA, which is faster and more reproducible than PA. It also proposes deflation to counter shadowing and raises the decision threshold to improve estimation accuracy.

How many principal components? stopping rules for determining the number of non-trivial axes revisited

- Mathematics, Computer Science
- Comput. Stat. Data Anal.
- 2005

Bartlett's test is used to test the significance of the first principal component, indicating whether or not at least two variables share common variation in the entire data set, and a two-step approach appears to be highly effective.

Remarks on Parallel Analysis.

- Mathematics, Medicine
- Multivariate behavioral research
- 1992

Evidence is given that quasi-inferential PA based on normal random variates (as opposed to data permutations) is surprisingly independent of distributional assumptions, and therefore enjoys certain nonparametric properties as well.

The Elements of Statistical Learning

- Computer Science, Mathematics
- Technometrics
- 2003


Confirmatory Factor Analysis for Applied Research

- Psychology
- 2008


Eigenvalue significance testing for genetic association.

- Mathematics, Medicine
- Biometrics
- 2018

A novel block permutation approach is introduced, designed to produce an appropriate null eigenvalue distribution by eliminating long-range genomic correlation while preserving local correlation, and a fast approach based on eigenvalue distribution modeling is proposed.

Finite sample approximation results for principal component analysis: a matrix perturbation approach

- Mathematics
- 2009

Principal component analysis (PCA) is a standard tool for dimensional reduction of a set of $n$ observations (samples), each with $p$ variables. In this paper, using a matrix perturbation approach,…

TESTING HYPOTHESES ABOUT THE NUMBER OF FACTORS IN LARGE FACTOR MODELS

- Mathematics
- 2009

In this paper we study high-dimensional time series that have the generalized dynamic factor structure. We develop a test of the null of $k_0$ factors against the alternative that the number of factors…