Publications


Escaping strict saddle points of the Moreau envelope in nonsmooth optimization (with D. Davis and D. Drusvyatskiy)
Preprint, 2021.
Abstract:

Recent work has shown that stochastically perturbed gradient methods can efficiently escape strict saddle points of smooth functions. We extend this body of work to nonsmooth optimization, by analyzing an inexact analogue of a stochastically perturbed gradient method applied to the Moreau envelope. The main conclusion is that a variety of algorithms for nonsmooth optimization can escape strict saddle points of the Moreau envelope at a controlled rate. The main technical insight is that typical algorithms applied to the proximal subproblem yield directions that approximate the gradient of the Moreau envelope in relative terms.
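
For reference, the objects in play are standard (these definitions are not specific to the paper): the Moreau envelope and proximal map of $f$ with parameter $\rho > 0$ are
$$ f_\rho(x) = \min_{y} \Big\{ f(y) + \tfrac{1}{2\rho}\|y - x\|^2 \Big\}, \qquad \mathrm{prox}_{\rho f}(x) = \operatorname*{argmin}_{y} \Big\{ f(y) + \tfrac{1}{2\rho}\|y - x\|^2 \Big\}, $$
and for weakly convex $f$ with $\rho$ small enough the envelope is differentiable with $\nabla f_\rho(x) = \rho^{-1}\big(x - \mathrm{prox}_{\rho f}(x)\big)$. A strict saddle is, informally, a stationary point admitting a direction of strict negative curvature.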

Practical Large-Scale Linear Programming using Primal-Dual Hybrid Gradient (with D. Applegate, O. Hinder, H. Lu, M. Lubin, B. O'Donoghue, and W. Schudy)
Preprint, 2021.
Abstract:

We present PDLP, a practical first-order method for linear programming (LP) that can solve to the high levels of accuracy that are expected in traditional LP applications. In addition, it can scale to very large problems because its core operations are matrix-vector multiplications. PDLP is derived by applying the primal-dual hybrid gradient (PDHG) method, popularized by Chambolle and Pock (2011), to a saddle-point formulation of LP. PDLP enhances PDHG for LP by combining several new techniques with older tricks from the literature; the enhancements include diagonal preconditioning, presolving, adaptive step sizes, and adaptive restarting. PDLP improves the state of the art for first-order methods applied to LP. We compare PDLP with SCS, an ADMM-based solver, on a set of 383 LP instances derived from MIPLIB 2017. With a target of $10^{-8}$ relative accuracy and a 1-hour time limit, PDLP achieves a 6.3x reduction in the geometric mean of solve times and a 4.6x reduction in the number of unsolved instances (from 227 to 49). Furthermore, we highlight standard benchmark instances and a large-scale application (PageRank) where our open-source prototype of PDLP, written in Julia, outperforms a commercial LP solver.
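
As a sketch of the core iteration: applied to the saddle-point formulation $\min_{x \ge 0} \max_{y} \; c^\top x - y^\top (Ax - b)$ of an equality-constrained LP, one step of vanilla PDHG with primal and dual stepsizes $\tau, \sigma > 0$ reads
$$ x^{k+1} = \operatorname{proj}_{\{x \ge 0\}}\big(x^k - \tau (c - A^\top y^k)\big), \qquad y^{k+1} = y^k + \sigma \big(b - A(2x^{k+1} - x^k)\big). $$
PDLP layers the enhancements listed above (preconditioning, presolve, adaptive stepsizes, restarts) on top of this basic scheme; the display is only meant to show why the per-iteration cost is dominated by products with $A$ and $A^\top$.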

Optimal Convergence Rates for the Proximal Bundle Method (with B. Grimmer)
Preprint, 2021.
Abstract:

We study convergence rates of the classic proximal bundle method for a variety of nonsmooth convex optimization problems. We show that, without any modification, this algorithm adapts to converge faster in the presence of smoothness or a Hölder growth condition. Our analysis reveals that with a constant stepsize, the bundle method is adaptive, yet it exhibits suboptimal convergence rates. We overcome this shortcoming by proposing nonconstant stepsize schemes with optimal rates. These schemes rely on function information, such as growth constants, which might be prohibitive to obtain in practice. We complete the paper with a new parallelizable variant of the bundle method that attains near-optimal rates without prior knowledge of function parameters. These results improve on the limited existing convergence rates and provide a unified analysis approach across problem settings and algorithmic details. Numerical experiments support our findings and illustrate the effectiveness of the parallel bundle method.
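
For context, the method in question iterates by minimizing a cutting-plane model plus a proximal term: with $z_k$ the current prox center, $\rho_k$ the proximal (stepsize) parameter, and $f_k$ a piecewise-linear lower model built from previously collected subgradients $g_i \in \partial f(x_i)$,
$$ x_{k+1} = \operatorname*{argmin}_{x} \Big\{ f_k(x) + \tfrac{\rho_k}{2}\|x - z_k\|^2 \Big\}, \qquad f_k(x) = \max_{i \in I_k} \big\{ f(x_i) + \langle g_i, x - x_i \rangle \big\}, $$
after which a descent ("serious" versus "null" step) test decides whether the prox center moves to $x_{k+1}$ or stays put. This is the standard description of the proximal bundle method, not anything specific to the paper's analysis.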

Infeasibility detection with primal-dual hybrid gradient for large-scale linear programming (with D. Applegate, H. Lu, and M. Lubin)
Preprint, 2021.
Abstract:

We study the problem of detecting infeasibility of large-scale linear programming problems using the primal-dual hybrid gradient method (PDHG) of Chambolle and Pock (2011). The literature on PDHG has mostly focused on settings where the problem at hand is assumed to be feasible. When the problem is not feasible, the iterates of the algorithm do not converge. In this scenario, we show that the iterates diverge at a controlled rate towards a well-defined ray. The direction of this ray is known as the infimal displacement vector. The first contribution of our work is to prove that this vector recovers certificates of primal and dual infeasibility whenever they exist. Based on this fact, we propose a simple way to extract approximate infeasibility certificates from the iterates of PDHG. We study three different sequences that converge to the infimal displacement vector: the difference of iterates, the normalized iterates, and the normalized average. All of them are easy to compute, and thus the approach is suitable for large-scale problems. Our second contribution is to establish tight convergence rates for these sequences. We demonstrate that the normalized iterates and the normalized average achieve a convergence rate of $O(1/k)$, improving over the known rate of $O(1/\sqrt{k})$. This rate is general and applies to any fixed-point iteration of a nonexpansive operator. Thus, it is a result of independent interest since it covers a broad family of algorithms, including, for example, ADMM, and can be applied to settings beyond linear programming, such as quadratic and semidefinite programming. Further, in the case of linear programming we show that, under nondegeneracy assumptions, the iterates of PDHG identify the active set of an auxiliary feasible problem in finite time, which ensures that the difference of iterates exhibits eventual linear convergence to the infimal displacement vector.
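
For orientation, write the PDHG update as a fixed-point iteration $z^{k+1} = T(z^k)$ with $T$ nonexpansive. The infimal displacement vector is the minimal-norm element
$$ v = \operatorname*{argmin}\big\{ \|z\| : z \in \overline{\operatorname{range}}(T - I) \big\}, $$
and the three sequences referred to above are the difference of iterates $z^{k+1} - z^k$, the normalized iterates $z^k / k$, and a suitably normalized running average of the iterates; all of them converge to $v$ (the exact scaling of the average follows the paper).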

Efficient Clustering for Stretched Mixtures: Landscape and Optimality (with K. Wang and Y. Yan)
NeurIPS, 2020.
Abstract:

This paper considers a canonical clustering problem where one receives unlabeled samples drawn from a balanced mixture of two elliptical distributions and aims for a classifier to estimate the labels. Many popular methods, including PCA and k-means, require individual components of the mixture to be somewhat spherical, and perform poorly when they are stretched. To overcome this issue, we propose a non-convex program that seeks an affine transform to turn the data into a one-dimensional point cloud concentrating around -1 and 1, after which clustering becomes easy. Our theoretical contributions are two-fold: (1) we show that the non-convex loss function exhibits desirable landscape properties as long as the sample size exceeds some constant multiple of the dimension, and (2) we leverage this to prove that an efficient first-order algorithm achieves near-optimal statistical precision even without good initialization. We also propose a general methodology for multi-class clustering tasks with flexible choices of feature transforms and loss objectives.
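
To picture the kind of program involved, here is an illustrative instance (the paper's exact loss and regularization may differ): given samples $X_1, \dots, X_n \in \mathbb{R}^d$, seek an affine map $x \mapsto \beta^\top x + b$ whose outputs concentrate near $\pm 1$, e.g.
$$ \min_{\beta \in \mathbb{R}^d,\; b \in \mathbb{R}} \ \frac{1}{n} \sum_{i=1}^{n} f\big(\beta^\top X_i + b\big), $$
where $f$ is a penalty that is small when its argument is close to $-1$ or $1$ (the quartic $f(t) = (t^2 - 1)^2$ is one toy choice); clustering then reduces to thresholding the one-dimensional values $\beta^\top X_i + b$ at zero.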

Low-rank matrix recovery with composite optimization: good conditioning and rapid convergence (with V. Charisopoulos, Y. Chen, D. Davis, L. Ding, and D. Drusvyatskiy)
Foundations of Computational Mathematics, 2020.
Abstract:

The task of recovering a low-rank matrix from its noisy linear measurements plays a central role in computational science. Smooth formulations of the problem often exhibit an undesirable phenomenon: the condition number, classically defined, scales poorly with the dimension of the ambient space. In contrast, we here show that in a variety of concrete circumstances, nonsmooth penalty formulations do not suffer from the same type of ill-conditioning. Consequently, standard algorithms for nonsmooth optimization, such as subgradient and prox-linear methods, converge at a rapid dimension-independent rate when initialized within constant relative error of the solution. Our framework subsumes such important computational tasks as phase retrieval, blind deconvolution, quadratic sensing, matrix completion, and robust PCA. Numerical experiments on these problems illustrate the benefits of the proposed approach.
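
Schematically, the nonsmooth formulations in question take a factored form such as
$$ \min_{X \in \mathbb{R}^{d_1 \times r},\; Y \in \mathbb{R}^{d_2 \times r}} \ \big\| \mathcal{A}(X Y^\top) - b \big\|_1, $$
where $\mathcal{A}$ is the linear measurement map and $b$ the observed data; the precise penalty and parameterization vary across the listed applications (e.g., symmetric factorizations for phase retrieval and quadratic sensing).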

Composite optimization for robust rank one bilinear sensing (with V. Charisopoulos, D. Davis, and D. Drusvyatskiy)
Information and Inference, 2020.
Abstract:

We consider the task of recovering a pair of vectors from a set of rank one bilinear measurements, possibly corrupted by noise. Most notably, the problem of robust blind deconvolution can be modeled in this way. We consider a natural nonsmooth formulation of the rank one bilinear sensing problem and show that its moduli of weak convexity, sharpness and Lipschitz continuity are all dimension independent, under favorable statistical assumptions. This phenomenon persists even when up to half of the measurements are corrupted by noise. Consequently, standard algorithms, such as the subgradient and prox-linear methods, converge at a rapid dimension-independent rate when initialized within a constant relative error of the solution. We complete the paper with a new initialization strategy, complementing the local search algorithms. The initialization procedure is both provably efficient and robust to outlying measurements. Numerical experiments, on both simulated and real data, illustrate the developed theory and methods.
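
Concretely, with measurements $y_i \approx \langle a_i, \bar{u} \rangle \langle b_i, \bar{v} \rangle$ for $i = 1, \dots, m$, a nonsmooth formulation of the kind studied is (up to normalization conventions)
$$ \min_{u, v} \ \frac{1}{m} \sum_{i=1}^{m} \big| \langle a_i, u \rangle \langle b_i, v \rangle - y_i \big|, $$
whose moduli of weak convexity, sharpness, and Lipschitz continuity are the quantities the abstract refers to.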

The nonsmooth landscape of blind deconvolution
Workshop on Optimization for Machine Learning, 2019.
Abstract:

The blind deconvolution problem aims to recover a rank-one matrix from a set of rank-one linear measurements. Recently, Charisopoulos et al. introduced a nonconvex nonsmooth formulation that can be used, in combination with an initialization procedure, to provably solve this problem under standard statistical assumptions. In practice, however, initialization is unnecessary. As we demonstrate numerically, a randomly initialized subgradient method consistently solves the problem. In pursuit of a better understanding of this phenomenon, we study the random landscape of this formulation. We characterize in closed form the landscape of the population objective and describe the approximate location of the stationary points of the sample objective. In particular, we show that the set of spurious critical points lies close to a codimension two subspace. In doing this, we develop tools for studying the landscape of a broader family of singular value functions; these results may be of independent interest.

Local angles and dimension estimation from data on manifolds (with A. Quiroz and M. Velasco)
Journal of Multivariate Analysis, to appear 2019.
Abstract:

For data living in a manifold $M\subseteq \mathbb{R}^m$ and a point $p\in M$, we consider a statistic $U_{k,n}$ which estimates the variance of the angle between pairs of vectors $X_i-p$ and $X_j-p$, for data points $X_i$, $X_j$ near $p$, and evaluate this statistic as a tool for estimating the intrinsic dimension of $M$ at $p$. Consistency of the local dimension estimator is established and the asymptotic distribution of $U_{k,n}$ is found under minimal regularity assumptions. Performance of the proposed methodology is compared against state-of-the-art methods on simulated data.
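
The underlying heuristic, stated loosely: if two directions are drawn independently and uniformly in $\mathbb{R}^d$, the angle between them concentrates around $\pi/2$ with variance roughly $1/d$ for large $d$, so an empirical estimate of the variance of the angles $\angle(X_i - p,\, X_j - p)$ over the $k$ nearest neighbors of $p$ carries information about the local dimension. A caricature of such a statistic (the paper's exact definition and normalization differ in details) is
$$ U_{k,n} = \binom{k}{2}^{-1} \sum_{1 \le i < j \le k} \big( \theta_{ij} - \bar{\theta} \big)^2, \qquad \theta_{ij} = \angle(X_i - p,\, X_j - p), \quad \bar{\theta} = \binom{k}{2}^{-1} \sum_{i < j} \theta_{ij}. $$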

In Search of Balance: The Challenge of Generating Balanced Latin Rectangles (with C. Gomes and R. Le Bras)
CPAIOR 2017, Fourteenth International Conference on Integration of Artificial Intelligence and Operations Research Techniques in Constraint Programming.
Abstract:

Spatially balanced Latin squares are combinatorial structures of great importance for experimental design. From a computational perspective they present a challenging problem, and there is a need for efficient methods to generate them. Motivated by a real-world application, we consider a natural extension of this problem: balanced Latin rectangles. Balanced Latin rectangles appear to be even more challenging to generate than balanced Latin squares, to such an extent that perfect balance may not be achievable for Latin rectangles. Nonetheless, for real applications, it is still valuable to have well-balanced Latin rectangles. In this work, we study some of the properties of balanced Latin rectangles, prove the nonexistence of perfect balance for an infinite family of sizes, and present several methods to generate the most balanced solutions.

Compressed sensing of data with known distribution (with M. Junca, F. Rincón, and M. Velasco)
Applied and Computational Harmonic Analysis, 2018.
Abstract:

Compressed sensing is a technique for recovering an unknown sparse signal from a small number of linear measurements. When the measurement matrix is random, the number of measurements required for perfect recovery exhibits a phase transition: there is a threshold on the number of measurements beyond which the probability of exact recovery quickly goes from nearly zero to nearly one. In this work we reduce this threshold by incorporating statistical information about the data we wish to recover. Our algorithm works by minimizing a suitably weighted $\ell_1$-norm, where the weights are chosen so that the expected statistical dimension of the corresponding descent cone is minimized. We also provide new discrete-geometry-based Monte Carlo algorithms for computing intrinsic volumes of such descent cones, allowing us to bound the failure probability of our methods.
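
In the noiseless setting the recovery program referred to above is, schematically,
$$ \min_{x \in \mathbb{R}^n} \ \sum_{i=1}^{n} w_i |x_i| \quad \text{subject to} \quad A x = A x^{\natural}, $$
where $x^{\natural}$ is the signal to be recovered, $A$ is the random measurement matrix, and the weights $w_i > 0$ are chosen, using the known distribution of the signal, to minimize the expected statistical dimension of the descent cone of the weighted norm at $x^{\natural}$.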