Universidad de Puerto Rico, Recinto de Río Piedras
Facultad de Ciencias Naturales, Departamento de Física
Title: Introduction to Machine Learning for Scientists
Code: PHYS 6510
Credits: 3
Professor: Julian Velev
Office: Natural Sciences II, C-346
Email: julian.velev@upr.edu
Hours: Tuesday & Thursday, 13:00-14:40, C-311
Office hours: By appointment
Prerequisites: Proficiency in
Python programming (e.g. PHYS 4041), a foundational knowledge of scientific
concepts and data analysis, and a solid grounding in mathematics and statistics.
Course description: Over the past decade, rapid advances in machine
learning (ML) have made it essential for productivity and innovation
across many domains. Despite this, the scientific community has
largely overlooked the potential of ML, owing to the preconception that it
is primarily relevant to business, commerce, media, and social networks. This
course aims to bridge this gap by building a thorough understanding of ML in
the context of scientific applications and underlining its relevance to
scientific inquiry. The course also places a strong emphasis on
hands-on experience, enabling students to develop proficiency in building
practical applications in physics, bioinformatics, and economics.
Textbooks (abbreviations in brackets are used in the Reading column of the schedule):
(1) Artificial Intelligence with Python – Joshi (2017) [J]
(2) Introduction to Machine Learning with Python – Müller & Guido (2017) [MG]
(3) Machine Learning with PyTorch & Scikit-Learn – Raschka, Liu, & Mirjalili (2022) [RLM]
(4) Hands-on Machine Learning with Scikit-Learn, Keras, & TensorFlow – Géron (2019) [G]
(5) Online documentation and resources related to machine learning libraries (e.g., numpy, scipy, pandas, scikit-learn, tensorflow)
Tentative schedule:
| # | Topic | Reading |
|---|---|---|
| 1 | Preliminaries · Syllabus · Course organization · Course projects | |
| 2 | Introduction to Machine Learning · Machine learning fundamentals · Setting up Python environment · Jupyter Notebooks · Visualization · ML libraries. Hands-on: Python, visualization, and ML libraries | MG-1; J-1; RLM-1; G-1,2 |
| 3 | Problems and datasets · Materials Science project · DNA sequencing · Financial forecasting. Hands-on: Accessing data sources and data exploration | |
| 4 | Data cleaning and feature engineering · Categorical data representation · Numerical data scaling and normalization · Numerical data transforms · Feature correlations · Feature selection (best features) · Feature engineering (dimensionality reduction). Hands-on: Toy datasets. Perovskite oxides dataset – feature engineering | MG-3,4; J-2; RLM-4,5,8 |
| | Assignment 1: Course project proposal and data source | |
| 5 | Unsupervised Learning: Clustering · K-Means Clustering · Hierarchical Clustering · DBSCAN Clustering · Gaussian Mixtures Clustering. Hands-on: Perovskite oxides dataset – band gap, magnetic moment | MG-3; J-4; RLM-10; G-9 |
| 6 | Supervised Learning: Classification · Logistic Regression · k-Nearest Neighbors · Decision Trees, Random Forest · Perceptron models · Multi-class classification · Model evaluation for classification. Hands-on: Perovskite oxides dataset – metals, semiconductors, insulators | MG-2; J-2,3; RLM-2,3,7; G-4,5,6,7 |
| 7 | Supervised Learning: Regression · Linear Regression · Polynomial Regression · Decision Tree Regression · Gradient Boosting Regression · Model evaluation metrics for regression. Hands-on: Perovskite oxides dataset – band gap prediction | MG-2; J-2,3; RLM-9; G-4,5,6,7 |
| 8 | Model Evaluation and Hyperparameter Tuning · Cross-validation · Bias-variance tradeoff · Overfitting and regularization · Hyperparameter optimization · Handling missing data. Hands-on: Oxide and DNA datasets – hyperparameter tuning | MG-5; RLM-6; G-6,7 |
| | Assignment 2: Course project data exploration and feature engineering | |
| 9 | Neural Networks · Introduction to Artificial Neural Networks (ANN) · Feedforward Neural Networks · Convolutional Neural Networks (CNN) · Encoder-decoder. Hands-on: Oxide dataset – band gap classification and regression. DNA base calling | J-14,16; RLM-11; G-10,11,14 |
| 10 | Special Topics: Time-Series Data Analysis · Time-series fundamentals · Autoregressive Integrated Moving Average (ARIMA) · Introduction to Recurrent Neural Networks (RNNs) · LSTM networks for time-series prediction. Hands-on: Volatility in financial markets | J-11; RLM-15; G-16 |
| | Assignment 3: Final project presentations | |
Grading: The grade will be based on a course project that concludes with an
in-class presentation. Collaboration on the assignments is not allowed unless
the project is explicitly assigned to a group. The project will require the
student to source the data, engineer features, train models, and
present the results. The grading scheme is A, B, C, D, F.
RIGHTS OF STUDENTS WITH DISABILITIES
UPR complies with all federal and state laws and regulations regarding
discrimination, including the Americans with Disabilities Act of 1990 (ADA) and
the Commonwealth of Puerto Rico Law 51. Students receiving Vocational
Rehabilitation services must contact the professor at the beginning of the
semester to plan for reasonable accommodation and any required
support equipment, according to the recommendations of the Oficina de Asuntos
para Personas con Impedimentos (OAPI) of the Dean of Students. Likewise,
students with special needs who require some type of accommodation must
contact the professor at the beginning of the semester.
ACADEMIC INTEGRITY
The University of Puerto Rico promotes the highest standards of academic and
scientific integrity. Article 6.2 of the UPR General Student Regulations
(Reglamento General de Estudiantes de la UPR, Certification No. 13, 2009-2010,
of the Board of Trustees) states that "academic dishonesty includes, but is not
limited to: fraudulent actions; obtaining grades or academic degrees through
false or fraudulent simulations; copying, in whole or in part, the academic
work of another person; plagiarizing, in whole or in part, the work of another
person; copying, in whole or in part, another person's answers to exam
questions; having or arranging for another person to take any oral or written
test or exam in one's name; and helping or enabling another person to engage in
such conduct". Any of these actions will be subject to disciplinary sanctions
in accordance with the disciplinary procedure established in the current UPR
General Student Regulations.
REASONABLE ACCOMMODATION
The University of Puerto Rico complies with all federal and state laws and
regulations concerning discrimination, including the Americans with
Disabilities Act (ADA) and Law 51 of the Commonwealth of Puerto Rico. Students
who receive vocational rehabilitation services must contact the professor at
the beginning of the semester to plan the reasonable accommodation and
necessary support equipment in accordance with the recommendations of the
Oficina de Asuntos para las Personas con Impedimento (OAPI) of the Dean of
Students Office. A request for reasonable accommodation does not exempt the
student from meeting the academic requirements of the course.
SEXUAL HARASSMENT
The University of Puerto Rico prohibits discrimination on the basis of sex and
gender in all its forms, including sexual harassment. Under the Institutional
Policy against Sexual Harassment at the University of Puerto Rico
(Certification No. 130, 2014-2015, of the Governing Board), a student who is
being or has been affected by conduct related to sexual harassment may go to
the Office of the Student Ombudsperson (Oficina de la Procuraduría
Estudiantil), the Dean of Students Office (Decanato de Estudiantes), or the
Title IX Compliance Coordinator for guidance and/or to file a complaint.