STATISTICAL METHODS and TECHNIQUES for DATA ANALYSIS

Professor: Alexis Pompili
(University of Bari Aldo Moro & I.N.F.N.-Bari)

Mini-course / 6-13 May 2025 / NKUA Department of Physics (@ Zografou Campus)
Erasmus+ Teaching Mobility (Erasmus+ funds provided by the TUCEP consortium)

This mini-course is an introduction to statistical data analysis,
with examples of applications borrowed from High Energy Physics (HEP).

It cannot, of course, be an all-encompassing course, but concepts are introduced
and used coherently, and some selected topics are treated in detail with a hands-on approach.

For this reason the mini-course is addressed not only to Master's students
and students who are doing a research thesis in HEP, but also to Ph.D. students.

Hands-on exercises are carried out within a Jupyter Notebook executed in the Google Colab framework.

Local organizers: P. Sphicas


Compact description of contents and topics of the course

Basic concepts of probability theory. Axiomatic probability and the role of Bayes' theorem.
Histograms: sampling and binning. Histogram comparison: absolute and relative normalization, stacked plots, data-to-simulation comparison, data-to-data comparison.
Histogram ratios and their uncertainties.
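
As an illustration of the last item, here is a minimal Python sketch of a bin-by-bin histogram ratio with propagated uncertainties. It is not taken from the course material: the function name histogram_ratio and the toy Gaussian samples are hypothetical, and the error formula assumes that the two histograms are statistically independent and that each bin has a Poisson uncertainty (sqrt of the bin content). A ratio in which the numerator is a subset of the denominator, such as an efficiency, would need a binomial treatment instead.

```python
import numpy as np

def histogram_ratio(num_counts, den_counts):
    """Bin-by-bin ratio of two independent histograms (hypothetical helper).

    Assumes Poisson uncertainties on each bin, so that for r = a / b
    the propagated uncertainty is sigma_r = r * sqrt(1/a + 1/b).
    Bins where either histogram is empty are returned as NaN.
    """
    num = np.asarray(num_counts, dtype=float)
    den = np.asarray(den_counts, dtype=float)
    ratio = np.full_like(num, np.nan)
    error = np.full_like(num, np.nan)
    ok = (num > 0) & (den > 0)  # skip empty bins
    ratio[ok] = num[ok] / den[ok]
    error[ok] = ratio[ok] * np.sqrt(1.0 / num[ok] + 1.0 / den[ok])
    return ratio, error

# Toy example: two independent Gaussian samples standing in for
# "data" and "simulation", filled into 12 equal-width bins.
rng = np.random.default_rng(seed=1)
edges = np.linspace(-3.0, 3.0, 13)
data_counts, _ = np.histogram(rng.normal(0.0, 1.0, 5000), bins=edges)
sim_counts, _ = np.histogram(rng.normal(0.0, 1.0, 5000), bins=edges)

ratio, ratio_err = histogram_ratio(data_counts, sim_counts)
for lo, hi, r, e in zip(edges[:-1], edges[1:], ratio, ratio_err):
    print(f"[{lo:+.1f}, {hi:+.1f})  ratio = {r:.3f} +/- {e:.3f}")
```

Since both toy samples are drawn from the same distribution, the ratios should be compatible with 1 within the quoted uncertainties.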

Probability density functions and their features. Joint and conditional probabilities.
Dependence and correlation between observables. Covariance matrix. Variance propagation.

Generation of distributions. Binomial distribution and efficiency. Stochastic (Poissonian) processes and applicability of the Poissonian distribution.
Gaussian function and its role in the Central Limit Theorem. Gaussian resolution function.
Other important distributions (Crystal Ball, Breit-Wigner, chi-squared).

Point estimation theory. Maximum Likelihood fitting, binned and unbinned, extended.
Symmetric and asymmetric uncertainties, Profile Likelihood.
Fitting tasks within a Jupyter notebook. Background modelling with polynomials of different orders; sideband subtraction method.

Python framework and Jupyter notebook. Uproot and RDataFrame to handle big data.
Extraction of a physical signal from big data; evaluation of signal significance, signal purity and signal-to-noise ratio.
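
As an illustration of the three quantities named above, here is a minimal Python sketch computed from the signal and background yields in a signal window. It is not taken from the course material: the function name signal_figures_of_merit and the example yields are hypothetical, and S/sqrt(S+B) is only one common approximate definition of significance, assumed here for simplicity.

```python
import math

def signal_figures_of_merit(n_signal, n_background):
    """Simple figures of merit for a signal window (hypothetical helper).

    n_signal     : estimated signal yield S in the window
    n_background : estimated background yield B in the window (must be > 0)
    """
    if n_signal < 0 or n_background <= 0:
        raise ValueError("need n_signal >= 0 and n_background > 0")
    total = n_signal + n_background
    return {
        "purity": n_signal / total,                 # S / (S + B)
        "signal_to_noise": n_signal / n_background, # S / B
        "significance": n_signal / math.sqrt(total) # S / sqrt(S + B), an approximation
    }

# Made-up yields, e.g. as they might come out of a fit to a mass peak.
fom = signal_figures_of_merit(900.0, 1600.0)
for name, value in fom.items():
    print(f"{name}: {value:.4f}")
```

In practice the yields S and B would come from a fit or from sideband subtraction, and their uncertainties would need to be propagated to these quantities.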

Hypothesis testing: test statistic, discrimination of signal against background, ROC curve and choice of a suitable Working Point.

Note: all items are covered by hands-on examples/exercises - executed on the Google Colab platform - borrowed from High Energy Physics best practices.


Tentative Agenda

(work-in-progress; material will be added during the course)

- Tuesday 6 / 2 hours theory/seminar

- Wednesday 7 / 2 hours theory/seminar + 2 hours hands-on/exercises

- Thursday 8 / 2 hours theory/seminar + 2 hours hands-on/exercises

- Friday 9 / 2 hours theory/seminar + 2 hours hands-on/exercises

- Monday 12 / 2 hours theory/seminar + 2 hours hands-on/exercises

- Tuesday 13 / 4 (2+2) hours hands-on/exercises

In total, the mini-course consists of 22 hours (with 10 hours for seminars + 12 hours for hands-on activities)

Copyright: the material of this course may be used only with the permission of the author (pompili AT ba.infn.it) and with proper acknowledgment.