Machine Learning Latest Submitted Preprints | 2019-07-04
Rethinking Table Recognition using Graph Neural Networks (1905.13391v2)
Shah Rukh Qasim, Hassan Mahmood, Faisal Shafait
2019-05-31
Document structure analysis, such as zone segmentation and table recognition, is a complex problem in document processing and is an active area of research. The recent success of deep learning in solving various computer vision and machine learning problems has not been reflected in document structure analysis, since conventional neural networks are not well suited to the input structure of the problem. In this paper, we propose an architecture based on graph networks as a better alternative to standard neural networks for table recognition. We argue that graph networks are a more natural choice for these problems, and explore two gradient-based graph neural networks. Our proposed architecture combines the benefits of convolutional neural networks for visual feature extraction and graph networks for dealing with the problem structure. We empirically demonstrate that our method outperforms the baseline by a significant margin. In addition, we identify the lack of large-scale datasets as a major hindrance to deep learning research on structure analysis and present a new large-scale synthetic dataset for the problem of table recognition. Finally, we open-source our implementation of dataset generation and the training framework of our graph networks to promote reproducible research in this direction.
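A minimal, hypothetical sketch of the general recipe described above: sample CNN features at word positions, run one step of graph message passing over a word-adjacency graph, and classify vertex pairs (for instance as sharing a cell, row, or column). The layer sizes, neighbourhood graph, and pairwise classification head are illustrative assumptions, not the authors' architecture.

```python
# Hedged sketch of CNN features + graph message passing for table structure;
# all shapes and the pairwise head are assumptions, not the paper's model.
import torch
import torch.nn as nn

class TableGraphNet(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        self.cnn = nn.Sequential(                     # visual feature extractor
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, padding=1), nn.ReLU())
        self.msg = nn.Linear(2 * feat_dim, feat_dim)  # one message-passing step
        self.pair_head = nn.Linear(2 * feat_dim, 3)   # same cell / row / column logits

    def forward(self, image, positions, adj):
        # image: (1, 3, H, W); positions: (N, 2) integer pixel coords of word centres
        # adj: (N, N) 0/1 adjacency over words (e.g. k nearest neighbours)
        fmap = self.cnn(image)[0]                               # (feat_dim, H, W)
        x = fmap[:, positions[:, 1], positions[:, 0]].t()       # (N, feat_dim) per-word features
        a = adj.float()
        agg = a @ x / a.sum(1, keepdim=True).clamp(min=1)       # mean over graph neighbours
        x = torch.relu(self.msg(torch.cat([x, agg], dim=1)))
        n = x.size(0)
        idx = torch.arange(n)
        pairs = torch.cat([x[idx.repeat_interleave(n)], x[idx.repeat(n)]], dim=1)
        return self.pair_head(pairs)                            # (N*N, 3) pairwise predictions
```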
Benchmarking Model-Based Reinforcement Learning (1907.02057v1)
Tingwu Wang, Xuchan Bao, Ignasi Clavera, Jerrick Hoang, Yeming Wen, Eric Langlois, Shunshi Zhang, Guodong Zhang, Pieter Abbeel, Jimmy Ba
2019-07-03
Model-based reinforcement learning (MBRL) is widely seen as having the potential to be significantly more sample efficient than model-free RL. However, research in model-based RL has not been very standardized: it is fairly common for authors to experiment with self-designed environments, and the several separate lines of research are sometimes closed-source or not reproducible. Accordingly, it is an open question how these various existing MBRL algorithms perform relative to each other. To facilitate research in MBRL, in this paper we gather a wide collection of MBRL algorithms and propose over 18 benchmarking environments specially designed for MBRL. We benchmark these algorithms with unified problem settings, including noisy environments. Beyond cataloguing performance, we explore and unify the underlying algorithmic differences across MBRL algorithms. We characterize three key research challenges for future MBRL research: the dynamics bottleneck, the planning horizon dilemma, and the early-termination dilemma. Finally, to maximally facilitate future research on MBRL, we open-source our benchmark at http://www.cs.toronto.edu/~tingwuwang/mbrl.html.
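For readers unfamiliar with the recipe the benchmark targets, the following is a minimal sketch of model-based control via random-shooting MPC over a learned dynamics model; the `model` interface and all hyperparameters are illustrative assumptions, not any specific algorithm evaluated in the paper.

```python
# Hedged sketch of the generic MBRL planning step: roll candidate action
# sequences through a learned dynamics/reward model and execute the first
# action of the best sequence. Illustrative only.
import numpy as np

def random_shooting_mpc(state, model, act_dim, horizon=10, n_candidates=100,
                        act_low=-1.0, act_high=1.0):
    """model: callable (state, action) -> (next_state, reward), e.g. a learned network."""
    seqs = np.random.uniform(act_low, act_high, size=(n_candidates, horizon, act_dim))
    returns = np.zeros(n_candidates)
    for k in range(n_candidates):
        s = state
        for t in range(horizon):
            s, r = model(s, seqs[k, t])
            returns[k] += r
    return seqs[np.argmax(returns), 0]   # execute only the first planned action
```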
Variance Reduction for Matrix Games (1907.02056v1)
Yair Carmon, Yujia Jin, Aaron Sidford, Kevin Tian
2019-07-03
We present a randomized primal-dual algorithm that solves the problem $\min_{x}\max_{y} y^{\top}Ax$ to additive error $\epsilon$ in time $\mathrm{nnz}(A) + \sqrt{\mathrm{nnz}(A)\,n}/\epsilon$, for matrix $A$ with larger dimension $n$ and $\mathrm{nnz}(A)$ nonzero entries. This improves on Nemirovski's mirror-prox method by a factor of $\sqrt{\mathrm{nnz}(A)/n}$ and is faster than stochastic gradient methods in the accurate and/or sparse regime $\epsilon \le \sqrt{n/\mathrm{nnz}(A)}$. Our results hold for $x, y$ in the simplex (matrix games, linear programming) and for $x$ in an $\ell_2$ ball and $y$ in the simplex (perceptron / SVM, minimum enclosing ball). Our algorithm combines the mirror-prox method and a novel variance-reduced gradient estimator based on "sampling from the difference" between the current iterate and a reference point.
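A rough numpy sketch of the "sampling from the difference" idea for the x-side gradient $A^{\top}y$: keep the exact gradient at a reference point $y_0$ and correct it with an importance-sampled estimate of $A^{\top}(y - y_0)$. The sampling distribution and scaling below are simplified assumptions, not the paper's exact estimator.

```python
# Hedged sketch of a variance-reduced estimator of A^T y around a reference y0.
import numpy as np

def vr_gradient(A, y, y0, full_grad0, n_samples=32):
    """full_grad0 = A.T @ y0, computed once per reference point."""
    diff = y - y0
    total = np.abs(diff).sum()
    if total == 0:
        return full_grad0
    p = np.abs(diff) / total                              # sample rows proportional to |y_i - y0_i|
    idx = np.random.choice(len(y), size=n_samples, p=p)
    est = np.zeros(A.shape[1])
    for i in idx:
        est += diff[i] / (p[i] * n_samples) * A[i]        # importance-weighted row contribution
    return full_grad0 + est                               # unbiased estimate of A.T @ y
```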
Spatially-Coupled Neural Network Architectures (1907.02051v1)
Arman Hasanzadeh, Nagaraj T. Janakiraman, Vamsi K. Amalladinne, Krishna R. Narayanan
2019-07-03
In this work, we leverage advances in sparse coding techniques to reduce the number of trainable parameters in a fully connected neural network. While most works in the literature impose regularization, DropOut or DropConnect techniques to induce sparsity, our scheme uses feature importance as a criterion for allocating the trainable parameters (resources) efficiently in the network. Even though sparsity is ensured, regularization still requires training on all the resources of a deep neural network. The DropOut/DropConnect techniques reduce the number of trainable parameters in the training stage by dropping a random collection of neurons/edges in the hidden layers. However, neither technique pays heed to the underlying structure in the data when dropping neurons/edges. Moreover, these frameworks require storage space equivalent to the number of parameters in a fully connected neural network. We address the above issues with a more structured architecture inspired by spatially-coupled sparse constructions. The proposed architecture is shown to have performance akin to that of a conventional fully connected neural network with dropouts, while achieving a reduction in the number of trainable parameters. Extensive simulations are presented and the performance of the proposed scheme is compared against traditional neural network architectures.
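A minimal sketch of one way to realize such a structured sparse layer: a fully connected layer whose weight matrix is masked to a banded ("spatially coupled") pattern, so each output unit only trains weights for a local window of inputs. The window size and placement are assumptions for illustration, not the paper's construction.

```python
# Hedged sketch of a banded/masked linear layer; bandwidth is an assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BandedLinear(nn.Module):
    def __init__(self, in_dim, out_dim, bandwidth=16):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        mask = torch.zeros(out_dim, in_dim)
        for o in range(out_dim):
            c = int(o * in_dim / out_dim)                       # centre of the coupling window
            mask[o, max(0, c - bandwidth):min(in_dim, c + bandwidth)] = 1.0
        self.register_buffer("mask", mask)                      # fixed sparsity pattern

    def forward(self, x):
        return F.linear(x, self.linear.weight * self.mask, self.linear.bias)
```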
Minimally distorted Adversarial Examples with a Fast Adaptive Boundary Attack (1907.02044v1)
Francesco Croce, Matthias Hein
2019-07-03
The evaluation of robustness against adversarial manipulation of neural network-based classifiers is mainly tested with empirical attacks, as the methods for exact computation, even when available, do not scale to large networks. We propose in this paper a new white-box adversarial attack wrt the $\ell_p$-norms for $p \in \{1, 2, \infty\}$ aiming at finding the minimal perturbation necessary to change the class of a given input. It has an intuitive geometric meaning, yields high quality results already with one restart, minimizes the size of the perturbation so that the robust accuracy can be evaluated at all possible thresholds with a single run, and comes with almost no free parameters except the number of iterations and restarts. It achieves better or similar robust test accuracy compared to state-of-the-art attacks which are partially specialized to one $\ell_p$-norm.
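To illustrate the flavour of a minimal-perturbation attack, here is a hedged sketch that repeatedly linearizes the classifier, steps onto the linearized decision boundary, and biases the iterate back toward the clean input. It is an illustrative $\ell_2$ variant of this general idea, not the FAB attack itself.

```python
# Hedged sketch of an iterative boundary-projection attack (illustrative only).
import torch

def minimal_l2_attack(model, x, label, steps=50, alpha=0.9):
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits = model(x_adv)                                    # shape (1, num_classes)
        other = logits.detach().clone()
        other[0, label] = -float("inf")
        target = other.argmax(dim=1).item()                      # closest competing class
        margin = logits[0, label] - logits[0, target]
        grad = torch.autograd.grad(margin, x_adv)[0]
        with torch.no_grad():
            step = margin / grad.norm().clamp(min=1e-12) ** 2 * grad  # step onto linearized boundary
            x_boundary = x_adv - step
            x_adv = alpha * x_boundary + (1 - alpha) * x         # bias back toward the clean input
    return x_adv.detach()
```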
Doubly Sparse: Sparse Mixture of Sparse Experts for Efficient Softmax Inference (1901.10668v2)
Shun Liao, Ting Chen, Tian Lin, Denny Zhou, Chong Wang
2019-01-30
Computing the softmax function is expensive when the number of output classes is large. In this paper, we present a novel softmax inference speedup method, Doubly Sparse Softmax (DS-Softmax), that leverages a sparse mixture of sparse experts to efficiently retrieve the top-k classes. Unlike most existing methods, which approximate a fixed softmax, our method is learning-based and can adapt the softmax weights for a better inference speedup. In particular, our method learns a two-level hierarchy which divides the entire output class space into several partially overlapping experts. Each expert is sparse and contains only a subset of the output classes. To find the top-k classes, the sparse mixture lets us identify the most probable expert quickly, and the sparse expert lets us search within a small-scale softmax. We empirically evaluate the method on several real-world tasks, including neural machine translation, language modeling and image classification, and demonstrate that significant computation reductions can be achieved with no loss in performance.
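A small sketch of the two-level idea: a gate picks the most probable expert, then a small softmax is computed only over that expert's class subset. The expert construction and weights below are stand-ins, not the learned DS-Softmax.

```python
# Hedged sketch of top-k retrieval via a sparse mixture of sparse experts.
import numpy as np

def topk_via_sparse_experts(h, gate_W, expert_classes, expert_W, k=5):
    """h: hidden vector (d,); gate_W: (E, d); expert_classes[e]: class ids in expert e;
    expert_W[e]: (len(expert_classes[e]), d) softmax weights restricted to expert e."""
    e = int(np.argmax(gate_W @ h))                    # level 1: pick the most probable expert
    logits = expert_W[e] @ h                          # level 2: small softmax over its classes
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(-probs)[:k]
    return [(expert_classes[e][i], probs[i]) for i in order]
```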
Estimating Information-Theoretic Quantities with Random Forests (1907.00325v2)
Richard Guo, Cencheng Shen, Joshua T. Vogelstein
2019-06-30
Information-theoretic quantities, such as mutual information and conditional entropy, are useful statistics for measuring the dependence between two random variables. However, estimating these quantities in a non-parametric fashion is difficult, especially when the variables are high-dimensional, a mixture of continuous and discrete values, or both. In this paper, we propose a decision forest method, Conditional Forests (CF), to estimate these quantities. By combining quantile regression forests with honest sampling, and introducing a finite sample correction, CF reduces finite sample bias in a range of settings. We demonstrate through simulations that CF achieves smaller bias and variance in both low- and high-dimensional settings for estimating posteriors, conditional entropy, and mutual information. We then use CF to estimate the amount of information between neuron class and other cellular features.
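As background, a plug-in version of the forest idea can be sketched as follows: estimate $H(Y)$ from class frequencies and $H(Y|X)$ from a forest's held-out posterior estimates, then take $I(X;Y) = H(Y) - H(Y|X)$. This sketch omits the honest sampling and finite sample correction that Conditional Forests add, and the helper below is hypothetical.

```python
# Hedged plug-in sketch of forest-based mutual information estimation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

def forest_mutual_information(X, y, n_estimators=200):
    _, counts = np.unique(y, return_counts=True)
    freqs = counts / len(y)
    h_y = -np.sum(freqs * np.log(freqs))                         # marginal entropy H(Y)
    clf = RandomForestClassifier(n_estimators=n_estimators)
    post = cross_val_predict(clf, X, y, cv=5, method="predict_proba")
    post = np.clip(post, 1e-12, 1.0)                             # held-out posteriors P(Y|X)
    h_y_given_x = -np.mean(np.sum(post * np.log(post), axis=1))  # conditional entropy H(Y|X)
    return h_y - h_y_given_x                                     # I(X;Y) = H(Y) - H(Y|X)
```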
Causal Calculus in the Presence of Cycles, Latent Confounders and Selection Bias (1901.00433v2)
Patrick Forré, Joris M. Mooij
2019-01-02
We prove the main rules of causal calculus (also called do-calculus) for i/o structural causal models (ioSCMs), a generalization of a recently proposed general class of non-/linear structural causal models that allow for cycles, latent confounders and arbitrary probability distributions. We also generalize adjustment criteria and formulas from the acyclic setting to the general one (i.e. ioSCMs). Such criteria then allow one to estimate (conditional) causal effects from observational data that was (partially) gathered under selection bias and cycles. This generalizes the backdoor criterion, the selection-backdoor criterion and extensions of these to arbitrary ioSCMs. Together, our results thus enable causal reasoning in the presence of cycles, latent confounders and selection bias. Finally, we extend the ID algorithm for the identification of causal effects to ioSCMs.
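For context, the classical acyclic-case backdoor adjustment that these criteria generalize can be written as follows (a standard formula, not a result specific to this paper):

```latex
% Backdoor adjustment: if Z satisfies the backdoor criterion relative to (X, Y),
% the interventional distribution is identified from observational quantities:
P\bigl(y \mid \mathrm{do}(x)\bigr) \;=\; \sum_{z} P\bigl(y \mid x, z\bigr)\, P(z)
```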
VC Classes are Adversarially Robustly Learnable, but Only Improperly (1902.04217v2)
Omar Montasser, Steve Hanneke, Nathan Srebro
2019-02-12
We study the question of learning an adversarially robust predictor. We show that any hypothesis class with finite VC dimension is robustly PAC learnable with an improper learning rule. The requirement of being improper is necessary as we exhibit examples of hypothesis classes with finite VC dimension that are not robustly PAC learnable with any proper learning rule.
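For readers new to the setting, robust PAC learning asks for a learner whose output hypothesis has small robust risk with respect to a perturbation set $U(x)$ (e.g. a norm ball around $x$); a standard way to write this quantity is:

```latex
% Robust risk of a hypothesis h under data distribution D and perturbation set U(x):
R_{U}(h) \;=\; \Pr_{(x,y)\sim D}\bigl[\ \exists\, x' \in U(x) :\ h(x') \neq y\ \bigr]
```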
libconform v0.1.0: a Python library for conformal prediction (1907.02015v1)
Jonas Fassbender
2019-07-03
This paper introduces libconform v0.1.0, a Python library for the conformal prediction framework, licensed under the MIT license. libconform is not yet stable. This paper describes the main algorithms implemented, documents the API of libconform, and discusses some implementation details and planned changes for future versions.
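Since the library is not yet stable and its API may change, here is a from-scratch sketch of the split-conformal idea itself, using a nearest-neighbour nonconformity score; it is illustrative only and does not reflect libconform's actual API.

```python
# Hedged from-scratch sketch of split conformal prediction (not libconform's API).
import numpy as np

def split_conformal_predict(X_train, y_train, X_cal, y_cal, x_new, epsilon=0.1):
    """Return the set of labels whose conformal p-value exceeds significance level epsilon."""
    def score(x, label):
        same = X_train[y_train == label]
        if len(same) == 0:
            return np.inf
        return np.min(np.linalg.norm(same - x, axis=1))   # distance to nearest same-label example

    cal_scores = np.array([score(x, y) for x, y in zip(X_cal, y_cal)])
    prediction_set = []
    for label in np.unique(y_train):
        a = score(x_new, label)
        p_value = (np.sum(cal_scores >= a) + 1) / (len(cal_scores) + 1)
        if p_value > epsilon:
            prediction_set.append(label)
    return prediction_set
```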