Statistics for health

External organizer

Name: Valérie GARES
Affiliation: INRIA
Country: France
Email: valerie.gares@inria.fr

Local organizer

Name: Menedore Karimumuryango
Affiliation: University of Burundi
Country: Burundi
Email: kmenedore@gmail.com

<div class="tex2jax_process">This school focuses on modern statistical and data science methods to analyze and understand diseases. Topics include the identification of risk factors for a disease, its prevalence and incidence, statistical analysis of historical data, and prediction of its future spread.

Organized by the Institute of Applied Statistics of the University of Burundi, this CIMPA School provides both theoretical foundations and practical applications of statistical techniques.

The program covers a wide range of topics, including:

- Descriptive and inferential statistics,
- Survival analysis,
- Statistical learning (supervised and unsupervised),
- Statistical analysis of functional data,
- Time series modeling,
- Geographic information systems (GIS) and statistical analysis,
- Deep learning and Transformer models,
- and compartmental modeling in epidemiology.

The school is designed to:

- Provide a solid foundation in statistical methods for epidemiology,
- Equip participants with practical tools to analyze and model infectious diseases,
- Bridge the gap between statistics and medical research,
- Support research on Mpox and other public health challenges,
- Foster interdisciplinary collaboration.

Participants will gain both theoretical knowledge and practical skills using tools such as R, Python, and SageMath. Dedicated sessions will focus on real datasets and local public health challenges, encouraging collaboration and potential research outcomes in epidemiology.</div>

Tentative scientific activities (the definitive programme is, or will be, posted on the event webpage)

Speaker: Freedath DJIBRIL MOUSSA (Université d’Abomey-Calavi, Benin)

Time series are used to represent random phenomena that evolve over time in many fields, including public health and epidemiology. We will present time series models, with an emphasis on statistical estimation, forecasting methods, and dedicated tools in R. Applications will focus on health-related data, such as disease incidence trends, hospital admissions, or mortality rates.
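As a rough illustration of the estimation-then-forecasting workflow described above, the sketch below fits a first-order autoregressive model, AR(1), by ordinary least squares and iterates the fitted recursion to produce point forecasts. It is written in Python (one of the tools listed for the school) rather than R and uses only the standard library; the function names `fit_ar1` and `forecast_ar1` are hypothetical. It assumes a stationary series without trend or seasonality and does not compute prediction intervals.

```python
# Minimal AR(1) sketch: x_t = c + phi * x_{t-1} + e_t, fitted by least squares.
# Assumption: the series is stationary with no trend or seasonality; real
# incidence data usually need differencing or seasonal terms first.


def fit_ar1(series):
    """Return (c, phi) from an ordinary least-squares regression of x_t on x_{t-1}."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    x_prev = series[:-1]
    x_next = series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    cov = sum((a - mean_prev) * (b - mean_next) for a, b in zip(x_prev, x_next))
    var = sum((a - mean_prev) ** 2 for a in x_prev)
    phi = cov / var
    c = mean_next - phi * mean_prev
    return c, phi


def forecast_ar1(last_value, c, phi, horizon):
    """Iterate the fitted recursion to get point forecasts `horizon` steps ahead."""
    forecasts = []
    x = last_value
    for _ in range(horizon):
        x = c + phi * x
        forecasts.append(x)
    return forecasts


# Usage on a short noise-free synthetic series (illustrative numbers, not real data).
series = [10.0]
for _ in range(9):
    series.append(2.0 + 0.5 * series[-1])
c, phi = fit_ar1(series)
print(round(c, 3), round(phi, 3))  # → 2.0 0.5
print([round(v, 3) for v in forecast_ar1(series[-1], c, phi, 3)])
```

With real data the residuals are not zero, so the recovered coefficients are only estimates and should be checked with residual diagnostics before forecasting.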

Speaker: Audrey LAVENU (Université de Rennes 1, France)

After this course, students will be able to use the standard probabilistic tools of survival analysis and to select appropriate statistical models (parametric, semiparametric, nonparametric, and machine-learning-based) for survival data. We first introduce censored failure times and the probabilistic foundations of survival analysis (survival function, hazard rate, cumulative hazard). We then present nonparametric methods for estimating the survival curve and the cumulative hazard function (the Kaplan–Meier and Nelson–Aalen estimators, respectively). The log-rank test is presented as a way to compare survival distributions between groups. We next discuss semiparametric regression models for censored data, focusing on the Cox proportional hazards model. Beyond these classical approaches, the course introduces modern machine learning methods for survival analysis, including survival trees and random survival forests (RSF), boosting methods for survival prediction, support vector machines adapted to censored data, and neural network approaches (e.g., DeepSurv, DeepHit, Cox-nnet). All methods are illustrated with artificial or real datasets. Special emphasis is placed on applications in health, such as analyzing patient survival or disease progression in clinical and epidemiological studies. The practical sessions will be carried out in R, using dedicated packages for both classical and modern approaches.
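The two nonparametric estimators mentioned above can be computed by hand in a few lines. The following minimal Python sketch (the course itself uses R) returns, at each distinct event time t_j, the Kaplan–Meier estimate S(t) = product over t_j ≤ t of (1 - d_j/n_j) and the Nelson–Aalen estimate H(t) = sum over t_j ≤ t of d_j/n_j, where d_j is the number of events and n_j the number of subjects at risk at t_j. The function name `km_nelson_aalen` and the toy data are hypothetical.

```python
from collections import Counter


def km_nelson_aalen(times, events):
    """Kaplan–Meier survival and Nelson–Aalen cumulative hazard at each event time.

    times  : follow-up times
    events : 1 if the event (e.g. death) was observed, 0 if right-censored
    Returns a list of (t, S_hat(t), H_hat(t)) tuples.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    survival, cum_hazard = 1.0, 0.0
    table = []
    for t in sorted(deaths):
        # Risk set: subjects still under observation just before t
        # (a subject censored exactly at t is counted as at risk, the usual convention).
        n_at_risk = sum(1 for u in times if u >= t)
        d = deaths[t]
        survival *= 1.0 - d / n_at_risk  # Kaplan–Meier product-limit step
        cum_hazard += d / n_at_risk      # Nelson–Aalen increment
        table.append((t, survival, cum_hazard))
    return table


# Usage on a toy sample of six subjects (illustrative numbers only).
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
for t, s, h in km_nelson_aalen(times, events):
    print(f"t={t}  S={s:.3f}  H={h:.3f}")
```

Censored subjects never trigger a step, but they stay in the risk set until their censoring time, which is exactly how censoring enters both estimators.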

Speaker: Nkunzimana ATHANASE (University of Burundi, Burundi)

The course aims to provide theoretical and practical knowledge of the statistical and spatial analysis of datasets from various institutions, such as INSBU, IGEBU, ISABU, REGIDESO, ARB, and administrative services. GIS is now a key tool for spatial analysis and spatial management, and it is very useful for policymakers and decision-makers in all sectors. The course focuses mainly on data collection, display, manipulation, transformation, conversion, and visualization, and on the dissemination of information. Several statistical methods will be taught, such as spatial interpolation with different methods (IDW, spline, kriging, natural neighbour, etc.), autocorrelation, regression, suitability analysis, modelling, and simulation. Finally, learners will be taught how to use GIS statistics in land use and services management (land use and land cover). The second course aims to provide knowledge of the basic tools of applied statistics in public health, epidemiology, nutrition, agriculture, and the environment.
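Of the interpolation methods listed, inverse distance weighting (IDW) is the simplest to write down: the estimate at an unsampled location is a weighted mean of the observed values, with weights 1/d^p where d is the distance to each sample point. The Python sketch below (the course itself uses GIS software) assumes planar coordinates, the common default power p = 2, and hypothetical site data; it does not cover kriging, which additionally models spatial autocorrelation through a variogram.

```python
import math


def idw(samples, x, y, power=2.0):
    """Inverse distance weighting estimate at (x, y).

    samples: iterable of (xi, yi, value) measured at known locations (planar coordinates).
    power:   distance-decay exponent; 2 is a common default (an assumption here).
    """
    num = den = 0.0
    for xi, yi, value in samples:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return value  # query coincides with a sample point: return its value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


# Hypothetical sampling sites (coordinates in km) and measured values.
sites = [(0.0, 0.0, 12.0), (4.0, 0.0, 20.0), (0.0, 3.0, 8.0)]
print(round(idw(sites, 1.0, 1.0), 2))
```

Because the weights are positive and sum-normalized, IDW estimates always lie between the smallest and largest observed values; a larger `power` makes the surface more local around each site.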

Speaker: Valérie Monbet (Université de Rennes 1, France)

The objectives of this course are to familiarize students with classical tools for statistical learning and decision-making and with modern techniques for high-dimensional data. The following methods will be presented:

- Supervised learning: (generalized) linear regression, supervised and unsupervised clustering, discriminant analysis, decision trees, variable selection in high-dimensional settings (penalized methods), parametric regression using kernel, spline, and polynomial basis functions, model averaging, support vector machines (SVM), and neural networks.
- Unsupervised learning: K-means, density-based spatial clustering of applications with noise (DBSCAN), mixture models, and agglomerative hierarchical clustering.

All methods are illustrated with artificial or real datasets. The practical sessions will be carried out in R.
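To make one of the unsupervised methods above concrete, the following sketch implements K-means with Lloyd's algorithm: alternately assign each point to its nearest center and move each center to the mean of its assigned points, until the centers stop changing. It is a plain-Python illustration (the practical sessions use R) with a single random initialization controlled by a hypothetical `seed` parameter; production implementations typically run several restarts and use smarter seeding such as k-means++.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm for K-means on a list of coordinate tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise with k data points chosen at random
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[j].append(p)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: centers no longer move
            break
        centers = new_centers
    labels = [min(range(k), key=lambda i: math.dist(p, centers[i])) for p in points]
    return centers, labels


# Toy two-dimensional data with two well-separated groups (made-up numbers).
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(pts, 2)
print(labels)
```

Each iteration can only decrease the within-cluster sum of squared distances, so the loop terminates, but only at a local optimum that depends on the initial centers.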

Speaker: Remi Servien (INRAE, France)

This course offers an introduction to functional data analysis (FDA), with a particular focus on applications in health. Functional data, such as growth curves, biomedical signals, or longitudinal clinical measurements, are increasingly common in medical studies. The aim of the course is to provide students with fundamental tools to model, visualize, and interpret this type of data. The first part of the course will cover core concepts: functional representations, smoothing techniques, basis functions (Fourier, splines), and measures of similarity between curves. We will then explore key analysis methods, including functional principal component analysis (FPCA), functional regression, and classification or clustering techniques adapted to functional data. Practical work in R will allow students to apply these methods to real health datasets.
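The core idea of FPCA can be sketched numerically: when all curves are observed on the same grid, the leading functional principal component is approximately the top eigenvector of the sample covariance matrix of the discretized curves, and each curve's score is its projection onto that eigenvector. The Python sketch below makes exactly that simplifying assumption (equally spaced grid, no smoothing or basis expansion, no quadrature weights), finds the eigenvector by power iteration, and runs on synthetic curves; it is a simplified stand-in for the smoothing-based FPCA covered in the course, and `first_functional_pc` is a hypothetical name.

```python
import math


def first_functional_pc(curves, n_iter=200):
    """First functional principal component of curves sampled on a common grid.

    Simplification (assumption): each curve is treated as a vector of values on an
    equally spaced grid, so the functional inner product is replaced by the ordinary
    dot product (no quadrature weights, no basis smoothing).
    Returns (mean_curve, eigenfunction_values, scores).
    """
    n, m = len(curves), len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(m)]
    centered = [[c[j] - mean[j] for j in range(m)] for c in curves]
    # Sample covariance surface C(s, t) evaluated on the grid.
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(m)] for a in range(m)]
    # Power iteration for the leading eigenvector of the covariance matrix.
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # all curves identical: no variation to explain
        v = [x / norm for x in w]
    # Scores: projection of each centered curve onto the component.
    scores = [sum(r[j] * v[j] for j in range(m)) for r in centered]
    return mean, v, scores


# Synthetic curves: a common mean shape (here t itself) plus a random multiple of one mode.
grid = [j / 20 for j in range(21)]
mode = [math.sin(math.pi * t) for t in grid]
amplitudes = [-1.0, -0.5, 0.2, 0.4, 0.9]
curves = [[t + a * phi for t, phi in zip(grid, mode)] for a in amplitudes]
mean, pc1, scores = first_functional_pc(curves)
print([round(s, 3) for s in scores])
```

Because these synthetic curves vary along a single mode, the recovered component is that mode (up to normalization and sign) and the scores are proportional to the amplitudes.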

Speaker: Audrey LAVENU (Université de Rennes 1, France)

In epidemiology, Susceptible-Infectious-Removed (SIR) models provide a suitable framework for describing the transmission dynamics of infectious diseases. These models are an important tool for uncovering the mechanisms that drive observed disease dynamics, assessing potential control strategies, and predicting future epidemic outbreaks. The course provides a basic introduction to the development of SIR-type models and highlights common issues encountered when structuring and analyzing them. A general introduction to compartmental models (SI, SIS, SIRS, SEIR, etc.) will show that the model should be chosen according to the circumstances: the type of disease and population (multiple populations with different sensitivities to the disease or with different geographical distributions), lethal or curable diseases (with complete or temporary immunity), vector-borne diseases (e.g., transmitted by mosquitoes), and spatial dispersal or network structure. The objectives of the course are:

- To describe the main compartmental models in epidemiology.
- To simulate epidemics using SIR models.
- To interpret the parameters of the SIR model and its variants to guide public health decision-making.
- To translate hypotheses about epidemic spread into a system of differential equations.
- To simulate stochastic models and represent their average behavior.
- To simulate strategies to combat epidemics and measure their effectiveness.
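The SIR model translates into the differential equations dS/dt = -βSI/N, dI/dt = βSI/N - γI and dR/dt = γI, with basic reproduction number R0 = β/γ. The sketch below integrates them with a simple explicit Euler scheme in Python. The parameter values are illustrative only, the step size (`steps_per_day`) is an assumption, and the function `simulate_sir` is hypothetical; it covers the deterministic model, not the stochastic versions also listed in the objectives.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, steps_per_day=10):
    """Deterministic SIR model integrated with the explicit Euler scheme.

    dS/dt = -beta*S*I/N,  dI/dt = beta*S*I/N - gamma*I,  dR/dt = gamma*I
    beta: transmission rate per day, gamma: removal (recovery) rate per day.
    Returns one (day, S, I, R) tuple per day.
    """
    n = s0 + i0 + r0
    dt = 1.0 / steps_per_day
    s, i, r = float(s0), float(i0), float(r0)
    trajectory = [(0, s, i, r)]
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_removals = gamma * i * dt
            s -= new_infections
            i += new_infections - new_removals
            r += new_removals
        trajectory.append((day, s, i, r))
    return trajectory


# Illustrative parameters (not estimated from data): R0 = beta / gamma = 3.
traj = simulate_sir(beta=0.3, gamma=0.1, s0=9990, i0=10, r0=0, days=160)
peak_day, _, peak_i, _ = max(traj, key=lambda row: row[2])
print(peak_day, round(peak_i))
```

Each Euler step moves individuals between compartments without creating or destroying any, so S + I + R stays equal to N; when R0 < 1 the number of infectious individuals declines from the start.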

Speaker: Bachir KADDAR (University of Science and Technology of Oran, Algeria)

This course provides a cutting-edge introduction to the application of modern deep learning—especially Transformer-based models—in domains that are traditionally driven by biostatistics and epidemiological modeling. It addresses the increasing demand for sophisticated modeling techniques that can handle high-dimensional, non-linear, and temporally structured data, particularly in the contexts of disease surveillance, patient outcome prediction, and public health policy evaluation. The course begins with the fundamentals of time series forecasting using neural networks, transitioning from classical models (ARIMA, VAR) to recurrent neural networks (LSTM, GRU), and then to advanced architectures like Temporal Fusion Transformers (TFT) and Transformer encoders tailored for multivariate health time series. These models are applied to problems such as forecasting disease incidence, hospital admissions, and healthcare resource utilization. Students then explore how deep learning can enhance epidemiological models used to evaluate intervention strategies, such as vaccination campaigns or treatment programs. Neural approximations to classical models, including hybrid approaches where clinical and demographic constraints are embedded into deep networks, allow for faster simulations and greater adaptability to policy scenarios involving multiple interacting populations (patients, healthcare providers, government, community). The course also introduces deep survival analysis, starting from traditional tools such as Kaplan–Meier and Cox models, and progressing toward DeepSurv, DeepHit, and TransformerSurv architectures. These are applied to real-world use cases like patient survival, treatment effectiveness, and progression of chronic diseases. 
In parallel, students are introduced to the principles of statistical learning and interpretability, using attention-based models for tabular and structured health data (e.g., TabTransformer, SAINT) and explainability tools like SHAP and attention heatmaps to make model predictions transparent and clinically relevant. Later sessions connect this knowledge to population health and epidemiological statistics. Students learn how Transformer models and graph-based neural networks can be used for integrating heterogeneous health data sources, forecasting outbreaks, and supporting public health decision-making. Emphasis is placed on multimodal data fusion, uncertainty quantification, and using AI to support evidence-based health policies and resource allocation. The final part of the course focuses on deploying these models for decision-support dashboards, using tools such as FastAPI, Docker, and Streamlit, to provide health authorities with accessible, real-time insights. Throughout the course, practical exercises and a final project help students consolidate their skills. These include building deep learning models for disease incidence forecasting, survival prediction under treatment programs, or Transformer-based systems for monitoring health indicators. Tools used include TensorFlow, PyTorch, Hugging Face Transformers, PySurvival, and R for statistical baselines. This course is intended for graduate students, researchers, and professionals in public health, epidemiology, and biostatistics who want to bridge the gap between statistical foundations and powerful, explainable AI.
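The attention mechanism at the heart of the Transformer models mentioned above computes, for each query vector, a softmax-weighted average of value vectors, softmax(QKᵀ/√d_k)V; these weights are what attention heatmaps display. The standard-library Python sketch below shows a single attention head with Q = K = V taken directly from toy embeddings. Real models add learned projection matrices, multiple heads, and positional encodings, and would be built with PyTorch or TensorFlow rather than by hand; the function names here are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d_k)) V for lists of vectors (one row per time step)."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        w = softmax(scores)  # one row of the attention map
        weights.append(w)
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))])
    return outputs, weights


# Toy example: 3 time steps with 2-dimensional embeddings (made-up numbers).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = scaled_dot_product_attention(x, x, x)  # self-attention with Q = K = V = x
for row in attn:
    print([round(a, 2) for a in row])
```

Each row of the attention map sums to one, so every output is a convex combination of the value vectors.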

Speaker: Florence MUNEZERO (University of Burundi, Burundi)

The course aims to provide knowledge of the fundamental tools of applied statistics in public health, epidemiology, nutrition, agriculture, and the environment. It covers essential descriptive statistics and statistical tests for data analysis, as well as regression-based modeling approaches such as generalized linear models (GLMs). Applications will be illustrated using real-world datasets, with an emphasis on the interpretation and critical evaluation of results.
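As an example of a GLM, logistic regression models a binary outcome through logit P(Y = 1) = b0 + b1·x, and exp(b1) is the odds ratio for a one-unit increase in x. The minimal Python sketch below fits it by Newton-Raphson, which for the logit link coincides with the iteratively reweighted least squares used by standard GLM software. It handles a single covariate, reports no standard errors, and uses made-up data; `fit_logistic` is a hypothetical name.

```python
import math


def fit_logistic(x, y, n_iter=25):
    """Logistic regression logit P(Y=1) = b0 + b1*x, fitted by Newton-Raphson.

    For the canonical logit link this is the same as the IRLS / Fisher scoring
    algorithm used by GLM software. Assumes the data are not perfectly separable
    (otherwise the maximum-likelihood estimate does not exist).
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p  # score (gradient of the log-likelihood)
            g1 += (yi - p) * xi
            h00 += w      # information matrix X^T W X
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + (X^T W X)^{-1} X^T (y - p)
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Illustrative binary outcome vs. a numeric exposure (made-up values).
exposure = [0, 1, 2, 3, 4, 5, 6, 7]
outcome = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(exposure, outcome)
print(f"odds ratio per unit of exposure: {math.exp(b1):.2f}")
```

At convergence the score equations hold: observed and fitted numbers of events agree overall and when weighted by the covariate.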

Speaker: Freedath DJIBRIL MOUSSA (Université d’Abomey-Calavi, Benin), Florence MUNEZERO (University of Burundi, Burundi), Bachir KADDAR (University of Science and Technology of Oran), Remi Servien (INRAE, France), Valérie Monbet (Université ...

All the teachers will be involved in this project, working with the students on their real datasets and research problems. Together, they will identify the most appropriate analytical methods for the questions at hand, write the code needed to carry out the analysis, and take a first look at the results. For participants who do not have their own data, example datasets and problems will be provided. This project may also lead to collaborations and epidemiological papers addressing real research questions linked to local issues.

This course will address several ethical questions related to artificial intelligence algorithms. Ethical considerations in algorithmic decision-making will be explored through a guided “arpentage” reading (a collective reading method in which participants divide the book between them) of Cathy O’Neil’s book Weapons of Math Destruction. A practical session will then focus on evaluating the fairness of algorithms and on methodological approaches for ensuring fairness in algorithmic decision-making.
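One way to start evaluating the fairness of a classifier is to compare simple rates across groups. The sketch below computes two common criteria, chosen here as examples rather than taken from the course: the demographic parity difference (gap in positive-prediction rates between groups) and the equal opportunity difference (gap in true-positive rates). The function name `fairness_gaps` and the toy data are hypothetical.

```python
def fairness_gaps(y_true, y_pred, group):
    """Demographic-parity and equal-opportunity gaps between groups.

    y_true, y_pred: 0/1 labels and 0/1 predictions; group: group label per individual.
    Each gap is the largest minus the smallest per-group rate (0 means parity).
    """
    selection_rates, true_positive_rates = {}, {}
    for g in sorted(set(group)):
        idx = [k for k, gk in enumerate(group) if gk == g]
        # Share of the group that receives a positive prediction.
        selection_rates[g] = sum(y_pred[k] for k in idx) / len(idx)
        positives = [k for k in idx if y_true[k] == 1]
        if positives:  # the true-positive rate is undefined without actual positives
            true_positive_rates[g] = sum(y_pred[k] for k in positives) / len(positives)
    dp = max(selection_rates.values()) - min(selection_rates.values())
    eo = (max(true_positive_rates.values()) - min(true_positive_rates.values())
          if true_positive_rates else float("nan"))
    return {"demographic_parity_diff": dp, "equal_opportunity_diff": eo,
            "selection_rates": selection_rates, "true_positive_rates": true_positive_rates}


# Toy predictions for two hypothetical groups "A" and "B" (made-up data).
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
group = ["A"] * 4 + ["B"] * 4
print(fairness_gaps(y_true, y_pred, group))
```

The two criteria can disagree, and in general they cannot all be satisfied at once, which is one reason the choice of fairness metric is itself an ethical decision.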

Address: University of Burundi | Avenue de l’UNESCO n°2
Country: Burundi
Dates: -
Deadline:
Language of the school: English

How to participate

To register and to apply for CIMPA financial support, read the instructions given here carefully. If you already know what to do, you can go directly to the application website, create an account (if necessary), and apply to the school of your choice. Be aware that you will be redirected to an external website.