Second-order step-size tuning of SGD for non-convex optimization

Camille Castera; Cédric Févotte; Jérôme Bolte; Edouard Pauwels

doi:10.1007/s11063-021-10705-5

Journal article, Neural Processing Letters, Year: 2022

Camille Castera

Role: Author
PersonId : 175473
IdHAL : camille-castera
ORCID : 0000-0002-7384-6387

Signal et Communications

Centre National de la Recherche Scientifique

Cédric Févotte

Role: Author
PersonId : 184864
IdHAL : cedric-fevotte
ORCID : 0000-0003-3801-5534
IdRef : 083298460

Signal et Communications

Centre National de la Recherche Scientifique

Jérôme Bolte

Role: Author
PersonId : 995617

Toulouse School of Economics

Edouard Pauwels

Role: Author
PersonId : 12830
IdHAL : edouard-pauwels
ORCID : 0000-0002-8180-075X

Argumentation, Décision, Raisonnement, Incertitude et Apprentissage

Abstract

As a direct and simple improvement of vanilla SGD, this paper presents a fine-tuning of its step-sizes in the mini-batch case. To do so, curvature is estimated from a local quadratic model, using only noisy gradient approximations. The result is a new stochastic first-order method (Step-Tuned SGD), enhanced by second-order information, which can be seen as a stochastic version of the classical Barzilai-Borwein method. Our theoretical results ensure almost sure convergence to the critical set, and we provide convergence rates. Experiments on deep residual network training illustrate the favorable properties of our approach. For such networks we observe, during training, both a sudden drop of the loss and an improvement of test accuracy at medium stages, yielding better results than SGD, RMSprop, or ADAM.
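
For readers unfamiliar with the classical Barzilai-Borwein method mentioned in the abstract, the following is a minimal sketch of its deterministic form on a toy convex quadratic. It is not the paper's Step-Tuned SGD: there is no mini-batching or gradient noise, and the function names (`grad`, `dot`, `bb_descent`), the diagonal test problem, and the fallback step `first_step` are assumptions made purely for illustration. The step size is the standard BB ratio s^T s / s^T y, where s is the difference of successive iterates and y the difference of successive gradients, so s^T y / s^T s acts as a curvature estimate built from gradients only.

```python
# Illustrative sketch only: classical, deterministic Barzilai-Borwein (BB) steps
# on f(x) = 0.5 * sum(d_i * x_i**2). This is NOT the paper's Step-Tuned SGD.
# All names and constants below are hypothetical choices for the example.

def grad(x, diag):
    """Exact gradient of f(x) = 0.5 * sum(d_i * x_i**2)."""
    return [d * xi for d, xi in zip(diag, x)]


def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def bb_descent(x0, diag, n_iter=200, first_step=0.05, tol=1e-20):
    """Gradient descent whose step size is the BB ratio s^T s / s^T y."""
    x_prev = list(x0)
    g_prev = grad(x_prev, diag)
    # No previous iterate exists yet, so the first move uses a fixed step.
    x = [xi - first_step * gi for xi, gi in zip(x_prev, g_prev)]
    for _ in range(n_iter):
        g = grad(x, diag)
        if dot(g, g) < tol:  # squared gradient norm is small enough: stop
            break
        s = [a - b for a, b in zip(x, x_prev)]  # change in iterates
        y = [a - b for a, b in zip(g, g_prev)]  # change in gradients
        curvature = dot(s, y)  # equals s^T H s for a quadratic with Hessian H
        # Fall back to the fixed step if the curvature estimate is not positive.
        step = dot(s, s) / curvature if curvature > 0 else first_step
        x_prev, g_prev = x, g
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


if __name__ == "__main__":
    print(bb_descent([1.0, -2.0, 3.0], [1.0, 4.0, 10.0]))
```

The fallback to a fixed step when the curvature estimate is non-positive is one simple safeguard chosen for this sketch; in non-convex or noisy settings such as the one the paper studies, how that case is handled is a design decision that the abstract does not specify.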

Keywords

Non-convex optimization; Deep learning; Stochastic optimization; Adaptive methods; Mini-batch algorithms

Domains

Machine Learning [cs.LG]; Optimization and Control [math.OC]

Main file

2103.03570v2.pdf (1.47 MB)

Origin: Files produced by the author(s)

https://hal.science/hal-03161775

Submitted on: Tuesday, November 23, 2021, 09:09:21

Last modified on: Tuesday, March 19, 2024, 03:10:49

Dates and versions

hal-03161775 , version 1 (08-03-2021)

hal-03161775 , version 2 (23-11-2021)

Identifiers

HAL Id : hal-03161775 , version 2
ARXIV : 2103.03570
DOI : 10.1007/s11063-021-10705-5

Cite

Camille Castera, Cédric Févotte, Jérôme Bolte, Edouard Pauwels. Second-order step-size tuning of SGD for non-convex optimization. Neural Processing Letters, 2022, pp.1--26. ⟨10.1007/s11063-021-10705-5⟩. ⟨hal-03161775v2⟩

Collections

UNIV-TLSE2 CNRS EHESS UT1-CAPITOLE TDS-MACS INRAE IRIT IRIT-SC IRIT-ADRIA ANR ANITI IRIT-SI IRIT-IA CIMI-TOULOUSE IRIT-CNRS IRIT-UT3 INTERACTIFS TOULOUSE-INP UNIV-UT3 UT3-TOULOUSEINP

245 views

274 downloads
