Topological Data Analysis

External organizer

Frederic CHAZAL

Affiliation external organizer

INRIA, Laboratoire de Mathématiques d'Orsay

Country external organizer

France

Email external organizer

frederic.chazal@inria.fr

Local Organizer

Local organizer

Haniya AZAM

Affiliation local organizer

Lahore University of Management Sciences

Country local organizer

Pakistan

Email local organizer

haniya.azam@lums.edu.pk

The main theme of this CIMPA School is Topological Data Analysis (TDA), an emerging field at the intersection of topology and data science that uses tools from algebraic topology to understand the “shape” of data. The school will introduce participants to both foundational and advanced topics in TDA, along with its integration with machine learning techniques. The program is designed to support participants from diverse backgrounds through a structured sequence of introductory and advanced courses. Core topics include introductions to TDA and machine learning, followed by advanced themes including, but not limited to, Algebraic Persistence, Morse theory, and Topological Deep Learning. The school will combine lectures with hands-on lab sessions that enable participants to apply the concepts in practice.

Tentative scientific activities (the definitive programme is/will be on the webpage of the event)

Speaker : Pawel DLOTKO (Institut Matematyczny PAN, University of Warsaw, Poland)

In this lecture series, we will explore the field of Topological Data Analysis (TDA), a cutting-edge set of tools designed to quantify and visualize the shape of data, which typically originate from high-dimensional spaces. TDA's versatility has led to impactful applications in disciplines such as mathematics, physics, biology, medicine, and the social sciences, bridging theoretical innovation with practical problem-solving. We will begin by introducing persistent homology, a cornerstone of TDA, which captures multi-scale topological features in data. This foundational concept will motivate several extensions, such as Euler characteristic curves and profiles, which provide more nuanced perspectives on the structure of data. The series will also explore applications of these topological tools in various domains, with a special focus on their role in statistical hypothesis testing, showcasing how the robustness of TDA allows it to handle complex, noisy datasets. Beyond theory, we will delve into methods for topological visualization, illustrating how concepts from algebraic topology can be leveraged to map and understand the structure of high-dimensional point clouds. Additionally, we will introduce Forman’s discrete Morse theory, a combinatorial counterpart to classical Morse theory that offers valuable techniques for analyzing topological spaces generated from data. The lectures will blend theoretical discussions with hands-on algorithmic demonstrations, featuring practical examples and real-world applications. Software tools will be presented to help bridge the gap between theory and practice, supported by case studies from diverse fields.
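To make the Euler characteristic curve concrete, here is a minimal Python sketch. It assumes a filtered simplicial complex is given as a dictionary that maps simplices (tuples of vertex ids) to filtration values. The function name and input format are illustrative; they are not taken from any software presented in the course. For each threshold t, the curve records the alternating count (vertices minus edges plus triangles, and so on) of the sublevel complex.

```python
def euler_characteristic_curve(filtration, thresholds):
    """Euler characteristic of each sublevel complex K_t = {s : f(s) <= t}.

    `filtration` maps each simplex (a tuple of vertex ids) to its filtration
    value. It is assumed to be a valid filtration, i.e. faces enter no later
    than the simplices that contain them.
    """
    curve = []
    for t in thresholds:
        # A simplex with k+1 vertices has dimension k and contributes (-1)**k.
        chi = sum((-1) ** (len(s) - 1) for s, value in filtration.items() if value <= t)
        curve.append(chi)
    return curve


# Hypothetical example: the vertices of a triangle appear at 0,
# its edges at 1, and its 2-simplex is filled in at 2.
triangle = {
    (0,): 0, (1,): 0, (2,): 0,
    (0, 1): 1, (0, 2): 1, (1, 2): 1,
    (0, 1, 2): 2,
}
print(euler_characteristic_curve(triangle, [0, 1, 2]))  # [3, 0, 1]
```

The curve reads as follows: three separate points (χ = 3), then a loop (χ = 0), then a contractible disk (χ = 1). This sketch recomputes the sum at every threshold for clarity. Sorting simplices by filtration value once and accumulating the signs would be faster.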

Speaker : Martina SCOLAMIERO (KTH Royal Institute of Technology, Sweden)

This is a second course on TDA. We will deepen the study of persistent homology by presenting several distances for comparing persistence modules (e.g., the bottleneck and Wasserstein distances). Persistent homology is a stable descriptor of data. More precisely, it is a 1-Lipschitz map from data, equipped with appropriate distances, to the space of persistence modules, equipped with the bottleneck or Wasserstein distances. In this course we will present some of these stability results. We will also describe common vectorizations and functional summaries of persistence (e.g., persistence landscapes and persistence images), together with their stability properties. Finally, we will investigate generalizations of persistent homology to the multi-parameter case, where multiple measurements are considered at the same time. We will present examples that lead to this finer representation of the data, as well as ways to encode multi-parameter persistence.
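The bottleneck distance can be illustrated with a brute-force Python sketch for tiny persistence diagrams. It assumes diagrams contain only finite (birth, death) points; essential classes with infinite death are ignored in this illustration. The sketch uses the standard construction in which each point is matched either to a point of the other diagram, at their L∞ distance, or to the diagonal, at half its persistence. Because it enumerates every bijection, its running time is exponential and it is purely didactic. Practical TDA libraries use much faster matching algorithms. The function name is hypothetical.

```python
from itertools import permutations


def bottleneck_distance(diag1, diag2):
    """Brute-force bottleneck distance between two small finite diagrams.

    Each diagram is a list of (birth, death) pairs. The problem is padded
    into a square assignment: rows are the points of diag1 followed by m
    diagonal slots, columns are the points of diag2 followed by n slots.
    """
    n, m = len(diag1), len(diag2)
    size = n + m
    if size == 0:
        return 0.0

    def to_diagonal(p):
        # L-infinity distance from (b, d) to the closest diagonal point.
        return (p[1] - p[0]) / 2.0

    def cost(i, j):
        if i < n and j < m:  # point-to-point match
            (b1, d1), (b2, d2) = diag1[i], diag2[j]
            return max(abs(b1 - b2), abs(d1 - d2))
        if i < n:  # point of diag1 sent to the diagonal
            return to_diagonal(diag1[i])
        if j < m:  # point of diag2 sent to the diagonal
            return to_diagonal(diag2[j])
        return 0.0  # diagonal matched with diagonal

    best = float("inf")
    for perm in permutations(range(size)):
        # Bottleneck cost of a matching is its largest single cost.
        worst = max(cost(i, perm[i]) for i in range(size))
        best = min(best, worst)
    return best


print(bottleneck_distance([(0.0, 2.0)], [(0.0, 2.5)]))  # 0.5
```

Shifting a death time by 0.5 moves the diagram by exactly 0.5 in bottleneck distance. This agrees with the 1-Lipschitz stability property described in the abstract.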

Speaker : Hassan MOHY-UD-DIN (Lahore University of Management Sciences, Pakistan)

In this course, we will begin by disambiguating key terms that are now a routine part of many conversations in academic circles and beyond: data science, artificial intelligence, machine learning, and deep learning. We will talk about developing computational learning algorithms from data, and about the efficacy, limitations, scalability, and applications of these algorithms. We will discuss a broad classification of learning methods, including supervised, semi-supervised, self-supervised, (completely) unsupervised, and reinforcement learning. We will also discuss a task-based classification of learning algorithms, i.e., regression vs. classification. We will discuss fundamental aspects of learning algorithms, including training and testing, cross-validation, the bias-variance trade-off, and regularization. Training a learning algorithm is an optimization problem, while testing or deploying it is a statistical problem. We will therefore also study gradient-based optimization routines and statistical techniques for evaluating learned models. Towards the end of the course, we will discuss open questions and prospective applications of learning algorithms. The course includes numerical simulations that illustrate concepts and processes, and a lab session that gives participants hands-on experience.
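The point that training is an optimization problem can be illustrated with a small Python sketch. It fits a one-feature linear model y ≈ w·x + b by gradient descent on the mean squared error plus an L2 (ridge) penalty λw² on the slope. The function name, learning rate, and step count are illustrative choices and are not taken from the course material. The penalty term shows regularization shrinking the fitted slope.

```python
def ridge_gradient_descent(xs, ys, lam=0.0, lr=0.05, steps=2000):
    """Fit y ~ w*x + b by minimizing mean((w*x + b - y)**2) + lam * w**2.

    Plain batch gradient descent. The intercept b is not penalized.
    """
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(steps):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of the regularized mean squared error.
        grad_w = 2.0 / n * sum(r * x for r, x in zip(residuals, xs)) + 2.0 * lam * w
        grad_b = 2.0 / n * sum(residuals)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # noise-free toy data: y = 2x + 1
print(ridge_gradient_descent(xs, ys))           # close to (2, 1)
print(ridge_gradient_descent(xs, ys, lam=1.0))  # slope shrunk towards 0
```

On this toy data, λ = 0 recovers w ≈ 2 and b ≈ 1. With λ = 1 the slope shrinks to about 4/3. This matches the closed-form ridge solution with an unpenalized intercept, w = S_xy / (S_xx + nλ) = 20 / 15. In practice, λ itself would be chosen by cross-validation.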

Speaker : Bastian RIECK (University of Fribourg, Switzerland)

As we enter the age of ever-larger machine learning models, we likewise observe an increasing need to understand their inner workings, as well as epiphenomena such as hallucinations, over-smoothing, or adversarial samples. Analyzing these and related epiphenomena requires a strong foundation. That foundation can be provided by concepts from geometry and topology, particularly those arising at the intersection of fields of an algebraic nature (algebraic topology) and fields with a distinct geometric flavor (differential geometry and differential topology). While these fields offer a rich tapestry of methods and concepts, there is no clear-cut entry point to them. As a result, papers making use of geometry and topology are often considered daunting to read, or require substantial effort to work through large amounts of literature. This course aims to change that by providing a principled introduction to state-of-the-art machine learning methods that leverage concepts from algebraic topology and differential geometry. It will focus primarily on representation learning, outlining novel concepts driving the analysis and classification of point clouds, graphs, and higher-order combinatorial complexes. To be comprehensive, it will also point out application areas beyond representation learning (e.g., recent advances in understanding generalization phenomena in machine learning models using tools from topology).

Speaker : Guo-Wei WEI (Michigan State University, United States)

Mathematics underpins fundamental theories in physics such as quantum mechanics, general relativity, and quantum field theory. Nonetheless, its success in modern biology, namely cellular biology, molecular biology, chemical biology, genomics, and genetics, has been quite limited. Artificial intelligence (AI), including deep learning, has fundamentally changed the landscape of science, engineering, and technology in the past decade and holds great promise for discovering the rules of life. However, AI-based biological discovery encounters challenges arising from the intricate complexity, high dimensionality, nonlinearity, and multiscale nature of biological systems. We tackle these challenges with a topological deep learning (TDL) paradigm. TDL integrates deep learning with algebraic topology-centered differential geometry, geometric topology, and combinatorial Laplacians to significantly enhance AI's ability to tackle biological challenges. Using our TDL approaches, my team has for years been the top winner in the D3R Grand Challenges, a worldwide annual competition series in computer-aided drug design. By further integrating TDL with millions of genomes isolated from patients, we uncovered the mechanisms of SARS-CoV-2 evolution and accurately forecast emerging dominant SARS-CoV-2 variants. The course will start with an introduction to Topological Deep Learning.
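The combinatorial Laplacian named in this abstract can be illustrated independently of any specific TDL architecture. The Python sketch below uses only the standard library, and all of its function names are hypothetical. It builds the signed boundary matrices B_k of a small simplicial complex and forms the Hodge Laplacian L_k = B_kᵀB_k + B_{k+1}B_{k+1}ᵀ. It then reads off the Betti number β_k as the dimension of the kernel of L_k, computed by exact rational elimination. This is a textbook construction, not the speaker's TDL pipeline.

```python
from fractions import Fraction


def boundary_matrix(simplices, k):
    """Signed boundary matrix B_k: rows are (k-1)-simplices, columns k-simplices."""
    faces, cells = simplices[k - 1], simplices[k]
    row_of = {face: i for i, face in enumerate(faces)}
    B = [[0] * len(cells) for _ in faces]
    for j, cell in enumerate(cells):
        for r in range(len(cell)):
            face = cell[:r] + cell[r + 1:]  # drop the r-th vertex
            B[row_of[face]][j] = (-1) ** r
    return B


def matmul(A, B, rows, inner, cols):
    return [[sum(A[i][t] * B[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(rows)]


def transpose(A, rows, cols):
    return [[A[i][j] for i in range(rows)] for j in range(cols)]


def hodge_laplacian(simplices, k):
    """L_k = B_k^T B_k + B_{k+1} B_{k+1}^T (missing terms at the ends are zero)."""
    n_k = len(simplices.get(k, []))
    L = [[0] * n_k for _ in range(n_k)]
    terms = []
    if k >= 1:
        n_down = len(simplices[k - 1])
        Bk = boundary_matrix(simplices, k)
        terms.append(matmul(transpose(Bk, n_down, n_k), Bk, n_k, n_down, n_k))
    if simplices.get(k + 1):
        n_up = len(simplices[k + 1])
        Bk1 = boundary_matrix(simplices, k + 1)
        terms.append(matmul(Bk1, transpose(Bk1, n_k, n_up), n_k, n_up, n_k))
    for T in terms:
        L = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(L, T)]
    return L


def rank(M):
    """Rank by exact Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0]) if M else 0):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                factor = M[i][c] / M[r][c]
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


def betti(simplices, k):
    """dim ker L_k, which equals the k-th Betti number over the reals."""
    L = hodge_laplacian(simplices, k)
    return len(L) - rank(L)


# Hollow triangle (a loop) and the same triangle with its 2-simplex filled in.
hollow = {0: [(0,), (1,), (2,)], 1: [(0, 1), (0, 2), (1, 2)]}
filled = {**hollow, 2: [(0, 1, 2)]}
print(betti(hollow, 1), betti(filled, 1))  # 1 0
```

For the hollow triangle, L_0 is the familiar graph Laplacian D − A, and β_0 = β_1 = 1. Filling in the 2-simplex kills the loop, so β_1 becomes 0.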

Speaker : Woojin KIM (Department of Mathematical Sciences, KAIST, Republic of Korea)

This lecture series will introduce the algebraic and combinatorial foundations of persistent homology, covering both the single-parameter and multi-parameter cases. We will explore analogues of barcodes and persistence diagrams in the multi-parameter setting, which capture topological features of data. Additionally, we will discuss their stability, vectorization methods, and applications. The course will also cover recent advances in the field, highlighting innovative applied topology techniques and their implications for data analysis.

Speaker : Vanessa ROBINS (Australian National University, Australia)

Discrete Morse theory provides a combinatorial approach to studying topological spaces by simplifying their structure without altering their essential features. It assigns discrete Morse functions to simplicial complexes, identifying critical simplices that capture the topology of the space. Through discrete gradient flows, we obtain reductions that simplify the computation of topological invariants such as homology. These ideas will be integrated into the framework of Topological Data Analysis, where they are interesting both from a mathematical point of view and in applications to the study of the shape of data. The latter part of the course will be about Mapper, another tool from Topological Data Analysis for capturing the shape of high-dimensional data. Mapper builds a simplicial complex from the data using a continuous filter function, such as a height function or a density estimator. Mapper and discrete Morse theory are connected through their ability to reduce and simplify data for topological analysis. Discrete Morse theory can be applied to the simplicial complex output by Mapper to further reduce its size and make subsequent computations, such as homology or persistence, more efficient. Finally, I will present applications to image analysis and image processing.
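The notion of a critical simplex can be made concrete with a minimal Python sketch of Forman's definition. It assumes the discrete Morse function is given as a dictionary that maps simplices (sorted vertex tuples) to values. It also assumes that function is already valid; the code does not check the Morse conditions. The function name is hypothetical.

```python
from itertools import combinations


def critical_simplices(f):
    """Critical simplices of a discrete Morse function in Forman's sense.

    `f` maps every simplex of a finite simplicial complex (sorted vertex
    tuples) to a real value and is assumed to be a valid discrete Morse
    function. A simplex s is critical when every facet t of s has
    f(t) < f(s) and every cofacet u of s has f(u) > f(s).
    """
    critical = []
    for s, value in f.items():
        # Facets: drop one vertex (vertices have no non-empty facets).
        facets = [t for t in combinations(s, len(s) - 1) if t in f]
        # Cofacets: simplices with one more vertex that contain s.
        cofacets = [u for u in f if len(u) == len(s) + 1 and set(s) <= set(u)]
        if all(f[t] < value for t in facets) and all(f[u] > value for u in cofacets):
            critical.append(s)
    return critical


# A single edge where f(edge) < f(v1): the edge and v1 form a gradient pair.
print(critical_simplices({(0,): 0, (1,): 2, (0, 1): 1}))  # [(0,)]
```

In the example, the edge and v1 are paired, so only v0 remains critical. The alternating count of critical simplices, 1, equals the Euler characteristic of the edge. If instead f(v1) = 1 and f(edge) = 2, all three simplices are critical, and the alternating count is still 2 − 1 = 1.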

Speaker : Ippei OBAYASHI (Okayama University, Japan)

This course will introduce persistent homology, a powerful tool for characterizing the shape of data using the mathematical concept of topology, together with its application to materials science. Participants will also learn persistent homology techniques that are useful in applications such as machine learning and inverse analysis. We will delve into the mathematical foundations of persistent homology, exploring its algebraic and topological properties. Emphasis will be placed on the theoretical underpinnings that make persistent homology a robust tool for analysing complex material structures. Because it provides a quantitative summary of the shape of data, this approach enables the investigation of correlations between geometric structures and physical properties of materials.
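The simplest case of persistent homology, degree 0 for the Vietoris-Rips filtration of a point cloud, fits in a short Python sketch. Every point is born at scale 0. Pairwise distances are then processed in increasing order (Kruskal's algorithm with union-find), and each edge that merges two components ends one bar. This sketch uses the edge length as the scale parameter; some texts use half of it instead. The function name is hypothetical. Higher-degree homology (loops and cavities), which material-structure analyses typically rely on, is not covered here.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Degree-0 persistence barcode of the Vietoris-Rips filtration.

    All points are born at scale 0. When an edge of length r first joins
    two components, one component dies at r; one bar never dies.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j) for i, j in combinations(range(n), 2)
    )
    bars = []
    for r, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # this edge merges two components
            parent[root_i] = root_j
            bars.append((0.0, r))
    if n:
        bars.append((0.0, math.inf))  # the last component persists forever
    return bars


print(h0_barcode([(0.0,), (1.0,), (3.0,)]))  # [(0.0, 1.0), (0.0, 2.0), (0.0, inf)]
```

For three points on a line at 0, 1, and 3, two components die at scales 1 and 2, and one bar is infinite. The bars that die are exactly the edge lengths of a minimum spanning tree.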

Speaker : Emi MINAMITANI (Osaka University, Japan)

In materials science, understanding the correlation between the complex structures of amorphous and glassy materials and their physical properties has been a long-standing challenge. There is a strong demand for techniques that not only capture the characteristics of these structures but can also serve as input to machine learning models. One such method that has gained attention in recent years is persistent homology. In this lecture, after introducing the basics of materials simulations, I will present examples of applications of persistent homology in materials science. This course will build on the foundations laid in Ippei Obayashi’s course. We will emphasize the relevant mathematical foundations of persistent homology, which provides a robust topological framework for analysing the geometric structures of materials and investigating correlations between these structures and their physical properties.

Speaker : Kelin XIA (School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore)

AI-based molecular sciences have begun to gain momentum thanks to great advances in experimental data, computational power, and learning models. However, a major issue that remains for all these AI-based learning models is efficient molecular representation and featurization. In this course, we will talk about recently developed, advanced mathematics-based molecular representations and featurization. Molecular structures and their interactions are represented by high-order topological and algebraic models (including the Rips complex, Alpha complex, Neighbourhood complex, Dowker complex, Hom-complex, Tor-algebra, etc.). Mathematical invariants (from persistent homology, persistent Ricci curvature, persistent spectral theory, R-torsion, etc.) are used as molecular descriptors for learning models. Further, we develop geometric and topological deep learning models that systematically incorporate high-order, multiscale, and periodic molecular information, and use them to analyze molecular data from chemistry, biology, and materials science.
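The Rips complex listed in this abstract can be built from atomic coordinates in a few lines. The minimal Python sketch below assumes Cartesian coordinates and a single distance cutoff. A set of atoms spans a simplex exactly when all of its pairwise distances are at most the cutoff. The enumeration is brute force, so it is meant only for small toy inputs. The function name is hypothetical, and the coordinates in the example are made up rather than taken from a real molecule.

```python
import math
from itertools import combinations


def rips_complex(coords, cutoff, max_dim=2):
    """Vietoris-Rips complex up to dimension `max_dim` at a given cutoff.

    Simplices are returned as sorted tuples of atom indices: first the
    vertices, then edges, then triangles, and so on.
    """
    n = len(coords)
    # Pairs of atoms within the cutoff distance (index pairs with i < j).
    close = {
        (i, j)
        for i, j in combinations(range(n), 2)
        if math.dist(coords[i], coords[j]) <= cutoff
    }
    simplices = [(i,) for i in range(n)]
    for size in range(2, max_dim + 2):  # size = number of vertices
        for s in combinations(range(n), size):
            if all(pair in close for pair in combinations(s, 2)):
                simplices.append(s)
    return simplices


# Hypothetical three-atom toy geometry: a right angle with unit legs.
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
print(rips_complex(atoms, 1.0))  # [(0,), (1,), (2,), (0, 1), (0, 2)]
```

Raising the cutoff above √2 ≈ 1.414 adds the edge (1, 2) and the triangle (0, 1, 2). Sweeping the cutoff in this way produces the filtration from which persistent homology descriptors are computed.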

Speaker : Guo-Wei WEI (Michigan State University, United States)

This exercise/lab session provides hands-on implementation for the course on the same topic, Topological Deep Learning, whose scientific content is described in the course abstract above.

Speaker : Kelin XIA (School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore), Guo-Wei WEI (Michigan State University, United States), Emi MINAMITANI (Osaka University, Japan)

This one-hour interactive panel discussion will be led by Guo-Wei Wei and will feature Kelin Xia and Emi Minamitani (online). The panelists were chosen to reflect a mix of genders and developing-world contexts. The panel will be moderated by the local organizer. Note: the application form only allowed the panelists to be added as separate entries, each for the same hour, so the activity appears as 3 hours. As shown in the timetable PDF, it is a one-hour activity.

Speaker : Emi MINAMITANI (Osaka University, Japan)

Emi Minamitani's lectures should be considered complementary to Ippei Obayashi's lectures. While her course addresses the same theme, the application of TDA to materials research, its content offers additional lectures from a different perspective. Ippei Obayashi will lead the exercise/lab session in her stead. This session is therefore not reflected in the timetable, and only two sessions appear under Emi Minamitani's name. In addition to her lectures, she will take part in a panel discussion and in a session on grant writing and research opportunities.
This entry is made only to satisfy the 6-hour minimum built into the application form.

Speaker : Hassan MOHY-UD-DIN (Lahore University of Management Sciences, Pakistan)

These sessions will allow participants from the region and the host country to showcase their own interests in short 15-minute presentations. To encourage maximum participation, they have been scheduled in each week. An open session in the first-week schedule can also be used for this purpose if interest exceeds the slots available in these two sessions.

Speaker : Emi MINAMITANI (Osaka University, Japan)

This session is aimed at participants at diverse career stages. Grant writing is part of any academic career, and we believe our excellent line-up of instructors can advise participants on this necessary skill. We also want to draw on our instructors' experience to guide participants in making use of opportunities in the field and excelling in it.

Website of the school

https://sites.google.com/view/tdalahore/home

Info address

Lahore University of Management Sciences (LUMS) | Opposite sector U, DHA, Lahore-Cantt

Country

Pakistan

Dates

25 January, 2027 - 5 February, 2027

Deadline

25 September, 2026

Language of the school

English

How to participate

For registration and application for CIMPA financial support, carefully read the instructions given here. If you already know what to do, you can also go directly to the application website, create an account (if necessary), and apply to the school of your choice. Be aware that you will be redirected to an external website.