Machine learning for particle physics using R

Andrew John Lowe

3 September 2016

Introduction: about this talk

  • I’m a particle physicist, programmer, and aspiring data scientist
    • Worked for 10 years on the development of core software and algorithms for a multi-stage cascade classifier that processes massive data in real time (60 TB/s)
    • Previously a member of team that discovered the Higgs boson
    • I now work on using advanced machine learning techniques to develop classification algorithms for recognising subatomic particles based on their decay properties
  • I’m going to talk about how switching to R has made it easier for me to ask more complex questions of my data than I would have been able to otherwise

What is particle physics?

  • The study of subatomic particles and the fundamental forces that act between them
  • Present-day particle physics research represents man’s most ambitious and organised effort to answer the question: What is the universe made of?
  • We have an extremely successful model that was developed throughout the mid to late 20th century
  • But many questions still remain unanswered
    • Doesn’t explain: gravity, identity of dark matter, neutrino oscillations, matter/antimatter asymmetry of universe …
  • To probe these mysteries, we built the Large Hadron Collider

LHC data flow

  1. Detected by LHC experiment
  2. Online multi-level filtering (hardware and software)
  3. Transferred to the Worldwide LHC Computing Grid (WLCG), processed in stages:
    • Tier-0: CERN (Geneva) and Wigner RCP (Budapest)
    • Tier-1: about a dozen large data centres located worldwide
    • Tier-2: institute and university clusters
  4. Users run large analysis jobs on the Grid
  5. Data written to locally analysable files and copied to users’ PCs
  6. Turned into a plot in a paper
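The last step above can be sketched in a few lines of R. This is a toy illustration only: the file name, variable name, and the simulated mass values are hypothetical stand-ins for real Grid output, not anything from the talk.

```r
# Hypothetical sketch: names and values are illustrative, not from the talk.
set.seed(42)

# Stand-in for data read from a locally analysable file,
# e.g. candidates <- read.csv("candidates.csv")
candidates <- data.frame(mass = rnorm(1000, mean = 125, sd = 2))  # toy values in GeV

# Write a histogram of the toy invariant-mass distribution to a PDF,
# the kind of plot that ends up in a paper
pdf("mass_plot.pdf")
hist(candidates$mass, breaks = 50,
     xlab = "Invariant mass [GeV]",
     main = "Candidate mass distribution")
dev.off()
```

Once the data are in an ordinary data frame like this, the whole of the R ecosystem is available for the analysis.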