publications
2024
- NeurIPS Spotlight: Improving robustness to corruptions with multiplicative weight perturbations. Trung Trinh, Markus Heinonen, Luigi Acerbi, and Samuel Kaski. Advances in Neural Information Processing Systems 38, 2024.
Deep neural networks (DNNs) excel on clean images but struggle with corrupted ones. Incorporating specific corruptions into the data augmentation pipeline can improve robustness to those corruptions but may harm performance on clean images and other types of distortion. In this paper, we introduce an alternative approach that improves the robustness of DNNs to a wide range of corruptions without compromising accuracy on clean images. We first demonstrate that input perturbations can be mimicked by multiplicative perturbations in the weight space. Leveraging this, we propose Data Augmentation via Multiplicative Perturbation (DAMP), a training method that optimizes DNNs under random multiplicative weight perturbations. We also examine the recently proposed Adaptive Sharpness-Aware Minimization (ASAM) and show that it optimizes DNNs under adversarial multiplicative weight perturbations. Experiments on image classification datasets (CIFAR-10/100, TinyImageNet and ImageNet) and neural network architectures (ResNet50, ViT-S/16) show that DAMP enhances model generalization performance in the presence of corruptions across different settings. Notably, DAMP is able to train a ViT-S/16 on ImageNet from scratch, reaching a top-1 error of 23.7%, which is comparable to a ResNet50 without extensive data augmentations.
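To make the idea of training under random multiplicative weight perturbations concrete, here is a minimal PyTorch-style sketch of one training step. It illustrates the general recipe rather than the paper's exact algorithm: the Gaussian noise centred at 1, the scale `sigma`, and the perturb-then-restore bookkeeping are assumptions for this example.

```python
import torch

def damp_train_step(model, loss_fn, x, y, optimizer, sigma=0.1):
    """One step of training under random multiplicative weight perturbations
    (a hypothetical sketch; sigma and the noise form are assumptions)."""
    originals = {}
    with torch.no_grad():
        for name, p in model.named_parameters():
            originals[name] = p.detach().clone()
            p.mul_(1.0 + sigma * torch.randn_like(p))  # multiplicative noise around 1

    loss = loss_fn(model(x), y)   # forward pass with perturbed weights
    optimizer.zero_grad()
    loss.backward()               # gradients evaluated at the perturbed weights

    with torch.no_grad():         # restore the unperturbed weights, then update them
        for name, p in model.named_parameters():
            p.copy_(originals[name])
    optimizer.step()
    return loss.item()
```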
- ICLR Spotlight: Input-gradient space particle inference for neural network ensembles. Trung Trinh, Markus Heinonen, Luigi Acerbi, and Samuel Kaski. In The Twelfth International Conference on Learning Representations, 2024.
Deep Ensembles (DEs) demonstrate improved accuracy, calibration and robustness to perturbations over single neural networks partly due to their functional diversity. Particle-based variational inference (ParVI) methods enhance diversity by formalizing a repulsion term based on a network similarity kernel. However, weight-space repulsion is inefficient due to over-parameterization, while direct function-space repulsion has been found to produce little improvement over DEs. To sidestep these difficulties, we propose First-order Repulsive Deep Ensemble (FoRDE), an ensemble learning method based on ParVI, which performs repulsion in the space of first-order input gradients. As input gradients uniquely characterize a function up to translation and are much smaller in dimension than the weights, this method guarantees that ensemble members are functionally different. Intuitively, diversifying the input gradients encourages each network to learn different features, which is expected to improve the robustness of an ensemble. Experiments on image classification datasets and transfer learning tasks show that FoRDE significantly outperforms the gold-standard DEs and other ensemble methods in accuracy and calibration under covariate shift due to input perturbations.
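As a rough illustration of repulsion in input-gradient space, the sketch below computes each ensemble member's input gradient and adds an RBF-kernel penalty that pushes the (normalised) gradients apart. This is a simplified penalty under assumed choices (cross-entropy loss, a fixed `bandwidth`), not the paper's full kernelised ParVI update.

```python
import torch
import torch.nn.functional as F

def loss_with_input_gradient_repulsion(members, x, y, bandwidth=1.0):
    """Average task loss plus a simple repulsion term on first-order input
    gradients (a hypothetical sketch, not the full FoRDE/ParVI update)."""
    grads, losses = [], []
    for net in members:
        xi = x.clone().requires_grad_(True)
        loss = F.cross_entropy(net(xi), y)
        (g,) = torch.autograd.grad(loss, xi, create_graph=True)
        grads.append(F.normalize(g.flatten(1), dim=1))  # unit-norm input gradients
        losses.append(loss)
    G = torch.stack(grads)                                          # (members, batch, dim)
    sq_dists = (G.unsqueeze(0) - G.unsqueeze(1)).pow(2).sum(-1)     # pairwise squared distances
    repulsion = torch.exp(-sq_dists / (2 * bandwidth ** 2)).mean()  # RBF similarity penalty
    return torch.stack(losses).mean() + repulsion
```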
2022
- ICML Oral: Tackling covariate shift with node-based Bayesian neural networks. Trung Trinh, Markus Heinonen, Luigi Acerbi, and Samuel Kaski. In Proceedings of the 39th International Conference on Machine Learning, 2022.
Bayesian neural networks (BNNs) promise improved generalization under covariate shift by providing principled probabilistic representations of epistemic uncertainty. However, weight-based BNNs often struggle with the high computational complexity of large-scale architectures and datasets. Node-based BNNs have recently been introduced as scalable alternatives, which induce epistemic uncertainty by multiplying each hidden node with latent random variables, while learning a point estimate of the weights. In this paper, we interpret these latent noise variables as implicit representations of simple and domain-agnostic data perturbations during training, producing BNNs that perform well under covariate shift due to input corruptions. We observe that the diversity of the implicit corruptions depends on the entropy of the latent variables, and propose a straightforward approach to increase the entropy of these variables during training. We evaluate the method on out-of-distribution image classification benchmarks, and show improved uncertainty estimation of node-based BNNs under covariate shift due to input perturbations. As a side effect, the method also provides robustness against noisy training labels.
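A minimal sketch of the node-based construction described above: a deterministic linear layer whose output nodes are multiplied by latent random variables with learnable parameters. The log-normal form of the noise and the initialisation values are assumptions for illustration, not the paper's exact parameterisation.

```python
import torch
import torch.nn as nn

class NodeNoisyLinear(nn.Module):
    """Linear layer with multiplicative latent noise on each output node
    (a hypothetical sketch of the node-based BNN idea)."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)        # point-estimated weights
        self.z_mu = nn.Parameter(torch.zeros(out_features))       # latent mean (log-space)
        self.z_log_sigma = nn.Parameter(torch.full((out_features,), -2.0))

    def forward(self, x):
        h = self.linear(x)
        eps = torch.randn_like(h)
        z = torch.exp(self.z_mu + eps * self.z_log_sigma.exp())   # log-normal multiplier per node
        return h * z
```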
2021
- Master thesis: Scalable Bayesian neural networks. Trung Trinh. Jun 2021.
The ability to output accurate predictive uncertainty estimates is vital to a reliable classifier. Standard neural networks (NNs), while being powerful machine learning models that can learn complex patterns from large datasets, do not possess this ability. Therefore, one cannot reliably detect when an NN makes a wrong prediction. This shortcoming prevents applying NNs in safety-critical domains such as healthcare and autonomous vehicles. Bayesian neural networks (BNNs) have emerged as one of the promising solutions combining the learning capacity of NNs with probabilistic representations of uncertainty. By treating its weights as random variables, a BNN produces a distribution over its outputs from which uncertainty can be quantified. As a result, a BNN can provide better predictive performance while being more robust against out-of-distribution (OOD) samples than a corresponding deterministic NN. Unfortunately, training large BNNs is challenging due to the inherent complexity of these models. Therefore, BNNs trained by standard Bayesian inference methods typically produce lower classification accuracy than their deterministic counterparts, thus hindering their practical applications despite their potential. This thesis introduces implicit Bayesian neural networks (iBNNs), which are scalable BNN models that can be applied to large architectures. This model treats the weights as deterministic parameters and augments the input nodes of each layer with latent variables as an alternative method to induce predictive uncertainty. To train an iBNN, we only need to infer the posterior distribution of these low-dimensional auxiliary variables while learning a point estimate of the weights. Through comprehensive experiments, we show that iBNNs provide competitive performance compared to other existing scalable BNN approaches and are more robust against OOD samples despite having smaller numbers of parameters. Furthermore, with minimal overhead, we can convert a pretrained deterministic NN to a corresponding iBNN with better generalisation performance and predictive uncertainty. Thus, we can use iBNNs with pretrained weights of state-of-the-art deep NNs as a computationally efficient post-processing step to further improve the performance of those models.
- Nested variational autoencoder for topic modelling on microtexts with word vectors. Trung Trinh, Tho Quan, and Trung Mai. Expert Systems, Jun 2021.
Most of the information on the Internet is represented in the form of microtexts, which are short text snippets such as news headlines or tweets. These sources of information are abundant, and mining these data could uncover meaningful insights. Topic modelling is one of the popular methods to extract knowledge from a collection of documents; however, conventional topic models such as latent Dirichlet allocation (LDA) are unable to perform well on short documents, mostly due to the scarcity of word co-occurrence statistics embedded in the data. The objective of our research is to create a topic model that achieves strong performance on microtexts while keeping the runtime small enough to scale to large datasets. To address the lack of information in microtexts, we allow our method to take advantage of word embeddings for additional knowledge of relationships between words. For speed and scalability, we apply autoencoding variational Bayes, an algorithm that can perform efficient black-box inference in probabilistic models. The result of our work is a novel topic model called the nested variational autoencoder, which defines a distribution that takes word vectors into account and is parameterized by a neural network architecture. For optimization, the model is trained to approximate the posterior distribution of the original LDA model. Experiments show the improvements of our model on microtexts as well as its runtime advantage.
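For intuition, here is a compact autoencoding-variational-Bayes topic model in the spirit described above: an encoder maps a bag-of-words vector to a document-topic distribution, and the topic-word distribution is built from word embeddings. This is not the paper's nested architecture; the layer sizes, the softmax parameterisation, and the `word_vectors` argument are assumptions for this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingTopicVAE(nn.Module):
    """Bag-of-words VAE topic model using word vectors (a hypothetical sketch,
    not the nested variational autoencoder of the paper)."""
    def __init__(self, vocab_size, num_topics, word_vectors):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(vocab_size, 256), nn.Softplus())
        self.mu = nn.Linear(256, num_topics)
        self.log_var = nn.Linear(256, num_topics)
        self.topic_emb = nn.Parameter(torch.randn(num_topics, word_vectors.shape[1]))
        self.word_emb = nn.Parameter(word_vectors.clone(), requires_grad=False)

    def forward(self, bow):
        h = self.encoder(bow)
        mu, log_var = self.mu(h), self.log_var(h)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()          # reparameterisation trick
        theta = F.softmax(z, dim=-1)                                    # document-topic mixture
        beta = F.softmax(self.topic_emb @ self.word_emb.T, dim=-1)      # topic-word distributions
        log_recon = torch.log(theta @ beta + 1e-10)
        nll = -(bow * log_recon).sum(-1)                                # reconstruction term
        kl = -0.5 * (1 + log_var - mu.pow(2) - log_var.exp()).sum(-1)   # KL to standard normal
        return (nll + kl).mean()
```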
2020
- Preprint: Scalable Bayesian neural networks by layer-wise input augmentation. Trung Trinh, Samuel Kaski, and Markus Heinonen. arXiv preprint arXiv:2010.13498, 2020.
We introduce implicit Bayesian neural networks, a simple and scalable approach for uncertainty representation in deep learning. The standard Bayesian approach to deep learning requires the impractical inference of the posterior distribution over millions of parameters. Instead, we propose to induce a distribution that captures the uncertainty over neural networks by augmenting each layer’s inputs with latent variables. We present appropriate input distributions and demonstrate state-of-the-art performance in terms of calibration, robustness and uncertainty characterisation over large-scale, multi-million parameter image classification tasks.
2018
- Lead engagement by automated real estate chatbot. Tho Quan, Trung Trinh, Dang Ngo, Hon Pham, and 6 more authors. In 2018 5th NAFOSTED Conference on Information and Computer Science (NICS), Jun 2018.
Recently, automated chatbots have been increasingly applied in the real estate industry. Even though chatbots cannot fully replace the traditional relationship between agents and home buyers, they can help to engage potential clients (or leads) in meaningful conversations, which is highly useful for lead capture. In this paper, we present an intelligent chatbot for this purpose. Various machine learning techniques, including a multi-task deep learning technique for intent identification and frequent itemset mining for conversation elaboration, have been employed in our system. Our chatbot has been deployed by CEO K35 GROUP JSC with daily updated real estate data for Hanoi and Ho Chi Minh City, Vietnam.