ID:
DSM08
Course type:
Compulsory
Duration (hours):
48
Credits (CFU):
6
Scientific-disciplinary sector (SSD):
STATISTICS
Url:
DATA SCIENCE AND MANAGEMENT/BASE Year: 2
Year:
2023
General Information
Teaching period
First Semester (11/09/2023 - 02/12/2023)
Syllabus
Learning Objectives
As datasets grow to petabyte scale, traditional analysis models and computation paradigms become obsolete. The course focuses on the fundamental algorithmic, statistical, and programming issues posed by big-data analytics, tackling major problems and techniques for extracting knowledge from massive amounts of data. By the end of the course, students will understand the theory and computational aspects of modern methods for big data analytics, with particular emphasis on advanced statistical methods and algorithms for mining massive and noisy datasets as well as rapidly changing streams of data.
Prerequisites
Basic knowledge of probability and statistics, fundamental algorithms, and computer programming skills (knowledge of Python is required for the project work).
Teaching methods
The course consists of traditional lectures complemented by hands-on lab sessions and industry testimonials, which will guide students in the use of good, industry-standard analytics practices.
Assessment
There will be three written intermediate tests (highly recommended) and a group software project with a final presentation. Students not taking the intermediate tests will be required to take a written exam in one of the standard sessions.
Textbooks
Mining of Massive Datasets. Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman. Second edition.
Apache Spark™ manual.
Lecture notes, research papers, and other course material made available on the e-learning platform.
Contents
Defining big data: the five V's. From big data to actionable insights: smart data.
Data sources, statistical features of big data.
Pattern discovery.
Sampling and estimators: traditional approaches, data stream algorithmics.
Predictive analytics, recommender systems.
Analytics at scale in Apache Spark™: batch and streaming data analytics, SQL analytics.
Expected Learning Outcomes
Knowledge and understanding:
Upon successful completion of the course, students will be familiar with data mining problems and techniques, advanced statistical methods, and computational models and frameworks for analyzing and extracting insights from massive, possibly distributed or rapidly changing data.
Applying knowledge and understanding:
After this course, students will be able to develop innovative big data solutions, based on sound statistical and algorithmic techniques, in different application domains. They will also be able to implement the proposed solutions on top of industry-standard frameworks, e.g., Apache Spark™, in order to tackle real-world problems such as those typically faced by big tech companies.
Making judgements:
Throughout the course, students will be invited to critically assess the strengths and weaknesses of the methods and tools presented in class. After this course, they will be able to analyze different solutions to big data problems and to demonstrate an in-depth, critical understanding of the scope and challenges of different data-driven analytics techniques.
Communication skills:
This course will give students the opportunity to learn the major terms and concepts needed to communicate effectively their ideas, findings, proposals, analyses, and critical reasoning in the area of data-driven analytics. Special emphasis will be placed on oral presentations and pitches in group project work.
Learning skills:
This course will provide students with the ability to learn cutting-edge design and analysis tools and to apply them to real-world data analytics problems. The method of study will enable students to break down complex problems arising in specific applications into manageable pieces and to apply different patterns in order to design rigorous and documentable solutions. Strong emphasis will be placed on the direct application of the techniques and tools covered in this course to complex problems that are typical of today’s data-driven industry.
Criteria for Assigning the Final Work
The final work will be assigned (upon specific request to the instructor) to students who demonstrate a serious and motivated interest in the course topics.
Courses
DATA SCIENCE AND MANAGEMENT
Master's Degree
2 years
People (3)
Other teaching staff
Tenured full professors
Holder of the Vetrya Chair in Machine Learning and Artificial Intelligence