Statistical Analysis of Functional Data: modeling and applications to environment and health

External organizer

Jean-François DUPUY

Affiliation external organizer

INSA Rennes

Country external organizer

France

Email external organizer

Jean-Francois.Dupuy@insa-rennes.fr

Local Organizer

Abderrazek KAROUI

Affiliation local organizer

Faculté des Sciences de Bizerte

Email local organizer

abderrazek.karoui@fsb.ucar.tn

Functional Data Analysis (FDA) has become an important field in statistics, facilitating the study of data that take the form of functions, curves, and surfaces. With applications spanning biomedicine, neuroscience, climate science, and engineering, FDA offers robust methods for modeling high-dimensional, spatio-temporal, and intricately structured data. While FDA is a widely recognized research field in the Global North, its presence is still limited across Africa. To bridge this gap, we are organizing the first CIMPA (Centre International de Mathématiques Pures et Appliquées) Research School focused on FDA in Tunisia.

Tentative scientific activities (the definitive programme will be posted on the webpage of the event)

Speaker : Cristian PREDA (University of Lille, France)

This course introduces Principal Component Analysis (PCA) in the framework of Functional Data Analysis (FDA), where observations are curves, functions, shapes. The course covers PCA for both continuous functional data (e.g., growth curves, spectra, EEG signals) and categorical functional data (e.g., sequences of categorical states, symbolic data, medical codes). Emphasis will be placed on theoretical foundations, practical implementation, and applications in real-world domains. At the end of the second part of this course, the students will be able to:

• Extend PCA concepts to categorical and mixed functional data using distances and kernels.
• Interpret functional principal components and scores for visualization and modeling.
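
As a rough illustration of the continuous case only, the sketch below computes functional principal components for curves observed on a common, equally spaced grid, using an SVD of the centered data matrix rescaled by the grid step. The function name `functional_pca`, the simulated curves, and the equal-spacing assumption are illustrative choices and are not taken from the course material. Categorical functional data would need dedicated tools such as those covered in the course.

```python
import numpy as np


def functional_pca(curves, grid, n_components=2):
    """FPCA sketch for curves sampled on a common, equally spaced grid.

    curves: array of shape (n_curves, n_points)
    grid:   array of shape (n_points,), assumed equally spaced
    """
    dt = grid[1] - grid[0]  # grid step, used as quadrature weight
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    # Right singular vectors of the centered data are the discretized
    # eigenfunctions of the sample covariance operator (up to scaling).
    _, sing_vals, vt = np.linalg.svd(centered, full_matrices=False)
    n_curves = curves.shape[0]
    eigenvalues = sing_vals[:n_components] ** 2 * dt / (n_curves - 1)
    # Rescale so each eigenfunction has unit L2 norm: sum(phi**2) * dt == 1.
    eigenfunctions = vt[:n_components] / np.sqrt(dt)
    # Scores are the L2 inner products <x_i - mean, phi_k>, by a Riemann sum.
    scores = centered @ eigenfunctions.T * dt
    return mean_curve, eigenvalues, eigenfunctions, scores


# Hypothetical example: 50 simulated curves driven by two random amplitudes.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 101)
amp_sin = rng.normal(scale=2.0, size=(50, 1))
amp_cos = rng.normal(scale=0.5, size=(50, 1))
curves = amp_sin * np.sin(2 * np.pi * grid) + amp_cos * np.cos(2 * np.pi * grid)

mean_curve, eigenvalues, eigenfunctions, scores = functional_pca(curves, grid)
print(eigenvalues)          # variance carried by each component
print(scores[:3].round(2))  # scores of the first three curves
```

With this scaling, the sample variance of each column of `scores` equals the corresponding eigenvalue, which is what makes the scores convenient for visualization and for downstream modeling.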

References:

[1] Hsing, T., & Eubank, R. (2015). Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators. Wiley.
[2] Preda, Cristian, Quentin Grimonprez, and Vincent Vandewalle. "cfda: an R Package for Categorical Functional Data Analysis." (2020).
[3] Saporta, Gilbert, and Ndeye Niang Keita. "Principal component analysis: application to statistical process control." Data analysis (2009): 1-23

Speaker : Michelle CAREY (University College Dublin, Ireland)

Modeling complex physical dynamics is a fundamental task in science and engineering. Traditional physics-based models are grounded in first principles, explainable, and sample-efficient. However, they often rely on strong modeling assumptions and require expensive numerical integration, which requires significant computational resources and domain expertise. Functional data analysis (FDA) offers efficient alternatives for modeling complex dynamics. Nevertheless, as a nonparametric approach, FDA provides only estimates rather than explicit information about the underlying behavior of the processes. Moreover, its predictions may violate governing physical laws and can be difficult to interpret. Physics-guided FDA seeks to integrate first-principles knowledge with data-driven methods, combining the strengths of both approaches. This paradigm is well positioned to address scientific challenges more effectively. In this course, we describe the learning pipeline, categorize state-of-the-art methods within this framework, and offer perspectives on open challenges and emerging opportunities.
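
One common way to make this combination concrete, in the spirit of the dynamic data analysis literature, is a penalized fitting criterion in which a smooth curve is asked both to follow the data and to stay close to a differential equation. The notation below (observations y_i at times t_i, curve x, right-hand side f with parameters θ, and trade-off parameter λ) is introduced here for illustration only and is not taken from the course material.

```latex
\hat{x} \;=\; \arg\min_{x} \;
  \sum_{i=1}^{n} \bigl( y_i - x(t_i) \bigr)^2
  \;+\; \lambda \int_{0}^{T}
  \Bigl( \frac{dx}{dt}(t) - f\bigl(x(t), t; \theta\bigr) \Bigr)^2 \, dt
```

The first term measures fidelity to the data, and the second penalizes departures from the assumed dynamics dx/dt = f(x, t; θ). A small λ gives an essentially data-driven smoother, while a large λ pushes the estimate toward a solution of the differential equation.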

References:

[1] Ramsay, James, and Giles Hooker. "Dynamic Data Analysis." Springer, New York, NY (2017).
[2] Raissi, Maziar, Paris Perdikaris, and George E. Karniadakis. "Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations." Journal of Computational Physics 378 (2019): 686-707.
[3] Raissi, Maziar, Paris Perdikaris, and George Em Karniadakis. "Physics Informed Deep Learning (Part I): Data-driven Solutions of Nonlinear Partial Differential Equations." arXiv preprint arXiv:1711.10561 (2017).
[4] Raissi, Maziar, Paris Perdikaris, and George Em Karniadakis. "Physics Informed Deep Learning (Part II): Data-driven Discovery of Nonlinear Partial Differential Equations." arXiv preprint arXiv:1711.10566 (2017).

Speaker : Cristian PREDA (University of Lille, France), Afif MASMOUDI (Faculty of Sciences of Sfax, Tunisia)

This introductory course is devoted to the mathematical preliminaries (linear algebra, spectral theory of random matrices, etc.) needed to develop PCA in the FDA framework.

Speaker : Gilbert SAPORTA (Conservatoire National des Arts et Métiers, France), Angelina ROCHE (University Paris Cité, France)

This introductory course is devoted to the mathematical preliminaries (linear algebra, spectral theory of random matrices, etc.) needed to develop PCA in the FDA framework.

Speaker : Gilbert SAPORTA (Conservatoire National des Arts et Métiers, France)

Functional data analysis (FDA) is concerned with the statistical analysis of observations that are functions defined on a continuous domain. In the past decade, boosted by technological innovations, FDA has been one of the fastest growing areas of statistics. Motivated by this increasing importance, this course will focus on functional regression, that is, on models where the response variable, the explanatory variables, or both are functions. The purpose of this course is to provide a basis for understanding the sophisticated FDA methods covered by the other courses.
The first part of the course will be devoted to regression modeling strategies for functional data. We will first review the main regression models for FDA, such as the so-called functional linear model (scalar response and functional covariate(s), a.k.a. "scalar-on-function" regression model), the function-on-scalar regression model (functional response and scalar covariates), and the function-to-function regression model (both the response and covariate(s) are functional). Some extensions, such as the generalized functional linear model, will also be introduced. Semi-parametric modeling and the basis function approach will be reviewed (kernel methods will be the topic for another course).
Statistical inference with functional data will be investigated in the second part of the course, which covers functional ANOVA, testing, and confidence intervals in FDA. Bayesian regression analysis for functional data will be one of the main topics of this second part. Focusing on the scalar-on-function regression model, prior elicitation will be discussed. Then, several inferential methods will be introduced (MCMC, variational methods).
The course will be illustrated on numerous real datasets (e.g., in the environment, energy, health domains), using R and Python. Detailed investigations of the topics covered in this course will be given as the subject of the group project.
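
To make the scalar-on-function model and the basis function approach mentioned above more tangible, here is a minimal sketch, under simplifying assumptions, that fits y_i = alpha + ∫ x_i(t) beta(t) dt + noise by expanding beta(t) in a small Fourier basis and solving an ordinary least-squares problem. The helper names `fourier_basis` and `fit_scalar_on_function`, the choice of basis, the simulated data, and the equally spaced grid on [0, 1] are all illustrative assumptions, not the methods taught in the course.

```python
import numpy as np


def fourier_basis(grid, n_basis):
    """Rows: 1, sqrt(2) sin(2 pi k t), sqrt(2) cos(2 pi k t), ... on [0, 1]."""
    rows = [np.ones_like(grid)]
    k = 1
    while len(rows) < n_basis:
        rows.append(np.sqrt(2) * np.sin(2 * np.pi * k * grid))
        if len(rows) < n_basis:
            rows.append(np.sqrt(2) * np.cos(2 * np.pi * k * grid))
        k += 1
    return np.vstack(rows)


def fit_scalar_on_function(curves, y, grid, n_basis=5):
    """Least-squares fit of y_i = alpha + int x_i(t) beta(t) dt + noise."""
    dt = grid[1] - grid[0]  # equally spaced grid assumed
    basis = fourier_basis(grid, n_basis)        # (n_basis, n_points)
    # Riemann-sum approximation of the integrals int x_i(t) psi_k(t) dt.
    design = curves @ basis.T * dt
    design = np.column_stack([np.ones(len(y)), design])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    intercept, basis_coef = coef[0], coef[1:]
    beta_hat = basis_coef @ basis               # beta evaluated on the grid
    return intercept, beta_hat


# Hypothetical simulated data: 80 curves built from the same Fourier basis.
rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 101)
dt = grid[1] - grid[0]
basis = fourier_basis(grid, 5)
curves = rng.normal(size=(80, 5)) @ basis
beta_true = 1.0 * basis[1] - 0.5 * basis[2]
y = 2.0 + curves @ beta_true * dt + rng.normal(scale=0.01, size=80)

intercept, beta_hat = fit_scalar_on_function(curves, y, grid, n_basis=5)
print(round(float(intercept), 3))
print(float(np.max(np.abs(beta_hat - beta_true))))
```

In practice the number of basis functions acts as a tuning parameter, and penalized or Bayesian versions of the fit replace the plain least-squares step.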

References:
[1] Hsing, T., & Eubank, R. (2015). Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators. Wiley.
[2] Preda, Cristian, Quentin Grimonprez, and Vincent Vandewalle. "cfda: an R Package for Categorical Functional Data Analysis." (2020).
[3] Saporta, Gilbert, and Ndeye Niang Keita. "Principal component analysis: application to statistical process control." Data analysis (2009): 1-23

Speaker : Afif MASMOUDI (Faculty of Sciences of Sfax, Tunisia), Angelina ROCHE (University Paris Cité, France)

The first part of this course is devoted to the mathematical statistics tools that will be used frequently throughout the course. In particular, we give a comprehensive review of the concepts and the main desirable properties of two families of popular kernels: the Parzen-Rosenblatt kernels and the positive-definite (or reproducing) kernels. These two types of kernels are widely used as the main tool for different regression estimators, including functional regression estimators. We also recall some tools from mathematical statistics, such as concentration inequalities, as well as some tools from the spectral analysis of kernel random matrices. These tools will be used to derive the convergence rates of the functional regression estimators studied.
The second part of this course is devoted to the design of various functional regression estimators based on Parzen-Rosenblatt kernels, positive-definite kernels, and random projections. In particular, we study the convergence rates of these functional regression estimators. The last part of this course is devoted to numerical simulations that illustrate the results studied.
Note that more extensive numerical simulations with real datasets, illustrating the performance of these FDA estimators, will be given as one of the group projects of the workshop.
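
As a small illustration of the first kernel family only, the sketch below implements a Nadaraya-Watson type estimator for a scalar response and a functional covariate: the prediction at a new curve is a weighted average of the training responses, with weights given by a Gaussian kernel applied to the L2 distance between curves. The function names `l2_distance` and `functional_nw`, the Gaussian kernel, the fixed bandwidth, and the simulated data are illustrative assumptions. Estimators based on positive-definite kernels or random projections are not shown.

```python
import numpy as np


def l2_distance(curves_a, curves_b, dt):
    """Pairwise L2 distances between two sets of discretized curves."""
    diff = curves_a[:, None, :] - curves_b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2) * dt)


def functional_nw(train_curves, train_y, new_curves, dt, bandwidth):
    """Kernel-weighted average of train_y, one prediction per new curve."""
    dist = l2_distance(new_curves, train_curves, dt)
    weights = np.exp(-0.5 * (dist / bandwidth) ** 2)  # Gaussian kernel
    return weights @ train_y / weights.sum(axis=1)


# Hypothetical simulated data: the response depends on the curve amplitude.
rng = np.random.default_rng(3)
grid = np.linspace(0.0, 1.0, 101)
dt = grid[1] - grid[0]
amplitudes = rng.uniform(-1.0, 1.0, size=200)
train_curves = amplitudes[:, None] * np.sin(2 * np.pi * grid)
train_y = amplitudes ** 2 + rng.normal(scale=0.05, size=200)

# Predict at a new curve with amplitude 0.5 (noise-free target 0.25).
new_curves = 0.5 * np.sin(2 * np.pi * grid)[None, :]
prediction = functional_nw(train_curves, train_y, new_curves, dt, bandwidth=0.05)
print(prediction)
```

The bandwidth controls the usual bias-variance trade-off, and its choice is one of the ingredients behind the convergence rates discussed in the course.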

References:

[1] Ramsay, J. O., & Silverman, B. W. (2005). Functional Data Analysis. Springer.
[2] Morris, J. S. (2015). Functional regression. Annual Review of Statistics and Its Application, 2, 321-359.
[3] Zhu, H., & Fan, J. (2014). A Bayesian approach to functional regression. Annals of Statistics, 42(1), 299-329.
[4] Blei, D. M., Kucukelbir, A., & McAuliffe, J. D. (2017). Variational inference: A review for statisticians. Journal of the American Statistical Association, 112(518), 859-877.
[5] Goldsmith, J., Crainiceanu, C., Cao, B., & Reich, D. (2011). Penalized functional regression. Journal of Computational and Graphical Statistics, 20(4), 830-851.
[6] Xu, Y., Wang, L., & Zhu, H. (2016). Bayesian functional regression using Gaussian processes. Biometrics, 72(1), 141-151.

Speaker : Anne-Françoise YAO (University of Clermont Auvergne, France)

The first part of this course is devoted to the mathematical statistics tools that will be used frequently throughout the course. In particular, we give a comprehensive review of the concepts and the main desirable properties of two families of popular kernels: the Parzen-Rosenblatt kernels and the positive-definite (or reproducing) kernels. These two types of kernels are widely used as the main tool for different regression estimators, including functional regression estimators. We also recall some tools from mathematical statistics, such as concentration inequalities, as well as some tools from the spectral analysis of kernel random matrices. These tools will be used to derive the convergence rates of the functional regression estimators studied.
The second part of this course is devoted to the design of various functional regression estimators based on Parzen-Rosenblatt kernels, positive-definite kernels, and random projections. In particular, we study the convergence rates of these functional regression estimators. The last part of this course is devoted to numerical simulations that illustrate the results studied.
Note that more extensive numerical simulations with real datasets, illustrating the performance of these FDA estimators, will be given as one of the group projects of the workshop.

References:

[1] A. BenSaber and A. Karoui. On some stable linear functional regression estimators based on random projections. Stat. Papers, 65, 4147-4178 (2024).
[2] S. Dabo-Niang and A.F. Yao. Kernel regression estimation for continuous spatial processes. Math. Meth. Stat., 16, 298-317 (2007).
[3] F. Ferraty and P. Vieu. Nonparametric Functional Data Analysis: Theory and Practice. Springer, New York (2006).
[4] M. Yuan and T. T. Cai. A Reproducing Kernel Hilbert Space Approach to Functional Linear Regression. Ann. Stat., 38 (6), 3412-3444 (2010).

Speaker : Jairo CUGLIARI DUHALDE (University Lumière 2, France)

This course introduces clustering methods for functional data. Classical clustering algorithms (such as k-means, hierarchical clustering, and mixture models) are extended to infinite-dimensional settings. Both continuous functional data (e.g., growth curves, EEG signals, temperature series) and categorical or symbolic functional data (e.g., state sequences, linguistic or medical trajectories) are considered. The course emphasizes theoretical foundations, practical algorithms, and applications. At the end of the course, students will be able to:
- Understand the challenges of clustering in Hilbert spaces and for discretized functional data.
- Apply distance-based and model-based clustering methods for functional data.
- Extend clustering methods to categorical and mixed functional data.
- Use clustering for exploratory analysis, feature extraction, and classification.
- Implement functional clustering methods with R or Python software.
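
As a simple instance of the distance-based approach listed above, the following sketch runs k-means directly on discretized curves, using the squared L2 distance approximated by a Riemann sum. The function name `functional_kmeans`, the deterministic farthest-first initialization, and the simulated two-group data are illustrative assumptions. Model-based clustering and methods for categorical functional data are not covered by this sketch.

```python
import numpy as np


def functional_kmeans(curves, dt, n_clusters, n_iter=50):
    """k-means on discretized curves with the (squared) L2 distance."""
    # Farthest-first initialization: start from the first curve, then add
    # the curve farthest from the centers chosen so far (one simple choice).
    centers = [curves[0]]
    for _ in range(1, n_clusters):
        sq_to_centers = [((curves - c) ** 2).sum(axis=1) * dt for c in centers]
        nearest = np.min(sq_to_centers, axis=0)
        centers.append(curves[np.argmax(nearest)])
    centers = np.array(centers)

    for _ in range(n_iter):
        # Assignment step: each curve goes to its closest center.
        diff = curves[:, None, :] - centers[None, :, :]
        sq_dist = (diff ** 2).sum(axis=2) * dt
        labels = sq_dist.argmin(axis=1)
        # Update step: each center becomes the mean curve of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            curves[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Hypothetical simulated data: two groups of noisy curves.
rng = np.random.default_rng(2)
grid = np.linspace(0.0, 1.0, 101)
dt = grid[1] - grid[0]
group_a = np.sin(2 * np.pi * grid) + rng.normal(scale=0.2, size=(30, 101))
group_b = np.cos(2 * np.pi * grid) + rng.normal(scale=0.2, size=(30, 101))
curves = np.vstack([group_a, group_b])

labels, centers = functional_kmeans(curves, dt, n_clusters=2)
print(labels)
```

Running the same algorithm on basis coefficients or on functional principal component scores, rather than on raw grid values, is a common variation when curves are noisy or observed on different grids.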

References:
[1] Jacques, J., & Preda, C. (2014). Model-based clustering for multivariate functional data. Computational Statistics & Data Analysis, 71, 92-106.
[2] Bouveyron, C., Jacques, J., Girard, S., & Karlis, D. (2011). Model-based clustering of time series in group-specific functional subspaces. Advances in Data Analysis and Classification, 5, 281-300.
[3] Cugliari, J., Goude, Y., & Poggi, J.-M. (2016). Disaggregated electricity forecasting using wavelet-based clustering of individual consumers. 2016 IEEE International Energy Conference (ENERGYCON). IEEE.

Speaker : Ibtissem HDHIRI (Faculty of Sciences of Gabès, Tunisia), Khalil MASMOUDI (Faculty of Sciences of Sfax, Tunisia), Mohamed JEBALIA (National Engineering School of Bizerte, Tunisia), Sameh KESSENTINI (Faculty of Sciences of Sfax, Tunisia)

This work will be a group activity devoted to research projects related to the various courses. It will be supervised by members of the teaching team.

Website of the school

https://cimpafda.sciencesconf.org

Info address

Faculté des Sciences de Bizerte

Country

Tunisia

Dates

8 November, 2027 - 20 November, 2027

Deadline

8 July, 2027

Language of the school

English

How to participate

To register and to apply for CIMPA financial support, carefully read the instructions given here. If you already know what to do, you can go directly to the application website, create an account (if necessary), and apply to the school of your choice. Be aware that you will be redirected to an external website.