Posted on January 12, 2024

Master's Program Profile: Data Science
俞梦怡 14723396
Program (track) name: Data Science
Degree: professional Master of Information and Data Science (MIDS)
Level: Master's
School: The UC Berkeley School of Information (I School)
University: University of California, Berkeley
Website: /
Program overview: Designed by I School faculty, our curriculum is multidisciplinary. You will
bring together a range of methods to define a research question; to gather,
store, retrieve, and analyze data; to interpret results; and to convey
findings effectively. Using the latest tools and practices, you will identify
patterns in and gain insights from complex data sets.
Program objective: Train leaders in the ever-evolving field of data science.
Training approach: The program focuses on problem solving, preparing you to creatively
apply methods of data collection, analysis, and presentation to solve
the world’s most challenging problems.
Student background requirements: 1. A bachelor’s degree
2. Test scores (GRE/GMAT/TOEFL)
3. A high level of quantitative ability
4. A problem-solving mindset
5. A working knowledge of fundamental concepts
6. The ability to communicate effectively
7. Programming proficiency
Credits: 27 credits (nine courses)
Time to completion: 5 terms, 20 months
Mode of delivery: The UC Berkeley School of Information’s Master of Information and Data
Science (MIDS) is a web-based program featuring immersive coursework
and live, online classes you can attend from anywhere in the world.
Delivered on a state-of-the-art learning platform, datascience@berkeley
facilitates collaboration and discussion to help you build a professional
network of faculty and peers from the start.
Students can access all datascience@berkeley content 24 hours a day, 7
days a week.
Curriculum structure: Below is a sample course schedule and the expected path
through the degree program. Students who are interested in
taking the program on an accelerated basis can complete their
coursework in 3 or 4 terms with approval from the School by
taking up to 3 courses in one or more terms.
Course descriptions:
1. Research Design and Application for Data and Analysis
Skills: Research design / Question formulation / Data and decision making /
Understanding cognitive bias / Data for persuasion and action /
Integrating data and domain knowledge / Storytelling with data
Course description: This course introduces students to the burgeoning data sciences
landscape, with a particular focus on learning how to apply data
science techniques to uncover, enrich, and answer questions facing
industries today. After an introduction to data sciences and an
overview of the program, students will explore how organizations
make decisions and the emerging role of big data in guiding both
tactical and strategic decisions. Lectures, readings, discussions, and
assignments will teach how to apply disciplined, creative methods to
ask better questions, gather data, interpret results, and convey
findings to various audiences in ways that change minds and change
behaviors. The emphasis throughout is on making practical
contributions to real decisions that organizations will and should
make. Industries and domains that we will explore include sports
management, finance, energy, journalism, intelligence, health care,
and media entertainment.
2. Exploring and Analyzing Data
Skills: Research design / Statistical analysis
Tools: R
Course description: The goal of this course is to provide students with an introduction to
many different types of quantitative research methods and statistical
techniques for analyzing data. We begin with a focus on measurement,
inferential statistics, and causal inference. Then, we will explore a
range of statistical techniques and methods using the open-source
statistics language, R. We will use many different statistics and
techniques for analyzing and viewing data, with a focus on applying
this knowledge to real-world data problems. Topics in quantitative
techniques include: descriptive and inferential statistics, sampling,
experimental design, parametric and non-parametric tests of
difference, ordinary least squares regression, and logistic regression.
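As a rough illustration of one topic named above, ordinary least squares regression with a single predictor can be sketched as follows. The course itself uses R; Python is used here only for illustration, and the function name `ols_fit` and the toy data are assumptions, not course material.

```python
from statistics import mean

def ols_fit(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (one predictor)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical toy data, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = ols_fit(xs, ys)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

The slope is the ratio of the x-y cross-deviations to the squared deviations of x, which is the closed-form solution that minimizes the sum of squared residuals.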
3. Storing and Retrieving Data
Skills: Data acquisition / Data cleaning and normalization / Building databases /
Data classification and indexing / Data warehousing
Tools: Python / Relational databases / Hadoop / MapReduce / Spark /
Cloud computing (AWS)
Course description: This course prepares students to deal with large-scale collections of data
as objects to be stored, searched over, selected, and transformed for
use. We examine both the background theory and practical application
of information retrieval, database design and management, data
extraction, transformation and loading for data warehouses, and
operational applications. We will examine traditional methods of
information retrieval and database management as well as new
approaches that use massively parallel computation
(MapReduce/Hadoop). Through readings, discussion, and hands-on
experimentation, students will be prepared to discuss, plan, and
implement storage, search and retrieval systems for large-scale
structured and unstructured information systems using a variety of
software tools. They will also be able to evaluate large-scale
information storage and retrieval systems in terms of both efficiency
and effectiveness in providing timely, accurate, and reliable access to
needed information.
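The MapReduce model mentioned in this course description can be sketched in a few lines of plain Python. This is a single-process toy that only mimics the map, shuffle, and reduce stages; it does not use Hadoop, and the function names and sample documents are assumptions made for illustration.

```python
from collections import defaultdict

def map_phase(documents):
    """Emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1

def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(documents):
    return reduce_phase(shuffle_phase(map_phase(documents)))

docs = ["big data big ideas", "data at scale"]  # hypothetical toy input
counts = word_count(docs)
print(counts)
```

In a real cluster the map and reduce functions run in parallel on many machines and the shuffle moves data across the network; the structure of the computation is the same.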
4. Applied Machine Learning
Skills: Experimental design / Working with machine learning algorithms /
Feature engineering / Prediction vs. explanation /
Network analysis / Collaborative filtering
Tools: Python / Python libraries for linear algebra, plotting, machine learning:
NumPy, Matplotlib, scikit-learn / GitHub for submitting project code
Course description: Machine learning is a rapidly growing field at the intersection of
computer science and statistics concerned with finding patterns in
data. It is responsible for tremendous advances in technology, from
personalized product recommendations to speech recognition in cell
phones. This course provides a broad introduction to the key ideas in
machine learning. The emphasis will be on intuition and practical
examples rather than theoretical results, though some experience with
probability, statistics, and linear algebra will be important.
5. Visualizing and Communicating Data
Skills: Exploratory data analysis / Effective written communication /
Effective visual presentation of data / Design for human perception
Tools: Tableau / JavaScript / D3 / Illustrator / R/ggplot2 /
Highcharts / Visit
Course description: Communicating clearly and effectively about the patterns we find in data
is a key skill for a successful data scientist. This course focuses on the
design and implementation of complementary visual and verbal
representations of patterns and analyses in order to convey findings,
answer questions, drive decisions, and provide persuasive evidence
supported by data. Assignments will give students hands-on
experience with designing and building data visualizations as well as
reporting their findings in prose.
6. Field Experiments
Skills: Experimental design / Statistical analysis / Communicating results /
Cleaning data / Mining and exploring data
Tools: R
Course description: This course introduces students to experimentation in the social
sciences. This topic has increased considerably in importance since
1995, as researchers have learned to think creatively about how to
generate data in more scientific ways, and developments in
information technology have facilitated the development of better data
gathering. Key to this area of inquiry is the insight that correlation
does not necessarily imply causality. In this course, we learn how to
use experiments to establish causal effects, and how to be
appropriately skeptical of findings from observational data.
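The contrast between experimental and observational evidence can be illustrated with a small simulation. This is a sketch under stated assumptions: the outcome model, the true effect of 2.0, and the function names are all hypothetical, and Python is used here although the course lists R as its tool.

```python
import random
from statistics import mean

def difference_in_means(assign, n_units=10_000, true_effect=2.0, seed=0):
    """Simulate outcomes and return mean(treated) - mean(control).

    `assign(baseline, rng)` decides whether a unit receives the treatment.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n_units):
        baseline = rng.gauss(10.0, 3.0)  # outcome the unit would have untreated
        if assign(baseline, rng):
            treated.append(baseline + true_effect)
        else:
            control.append(baseline)
    return mean(treated) - mean(control)

def randomized(baseline, rng):
    """Coin-flip assignment, independent of the unit's baseline."""
    return rng.random() < 0.5

def self_selected(baseline, rng):
    """Units with a high baseline opt in, so treatment is confounded."""
    return baseline > 10.0

experimental = difference_in_means(randomized)
observational = difference_in_means(self_selected)
print(f"experimental estimate:  {experimental:.2f}")
print(f"observational estimate: {observational:.2f}")
```

With random assignment the difference in means lands close to the true effect. With self-selection the same calculation is inflated, because the treated group already had higher baseline outcomes.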
7. Legal, Policy, and Ethical Considerations for Data Scientists
Skills: Ethical and legal frameworks / Policy analysis /
Oral and written presentation
Course description: This course provides an introduction to the legal, policy, and ethical
implications of data. The course will examine legal, policy, and
ethical issues that arise throughout the full life cycle of data science,
from collection to storage, processing, analysis, and use, including
privacy, surveillance, security, classification, discrimination,
decisional autonomy, and duties to warn or act. Case studies will be
used to explore these issues across various domains such as criminal
justice, national security, health, marketing, politics, education,
automotive, employment, athletics, and development. Attention will
be paid to legal and policy constraints and considerations that attach
to specific domains as well as particular data-types, collection
methods, and institutions. Technical, legal, and market approaches to
mitigating and managing discrete and compound sets of concerns
will be introduced, and the strengths and benefits of competing and
complementary approaches will be explored.
8. Scaling Up! Really Big Data
Skills: Working with data at scale
Tools: D-Streams / Apache Pig / OpenStack components and OpenStack
Heat specifically / CloudSoft Brooklyn / Apache Storm
Course description: This course provides a hands-on introduction to very large-scale
data and the practical issues surrounding how the data is stored,
processed and analyzed. Students will work with Cloud
Computing systems, large data collections and high velocity data
streams. The class material will be introduced gradually as it
helps students accomplish their projects and assignments
throughout the course. Hands-on activities will enable the
students to learn the practical toolkit of a big data specialist, e.g.,
Hadoop, Apache Spark, NoSQL databases, distributed file
systems, large-scale object storage systems and many others.
9. Synthetic Capstone Course
Skills: Project scoping, planning and management / Data acquisition and
analysis / Communication / Teamwork / Influence in organizations /
Design thinking for data science
Course description: In the capstone class, students will synthesize technical, analytic,
interpretive, and social dimensions to design and execute a full
data science project in which they develop and demonstrate their
skills at synthesis. The final project is designed to integrate all of
the core skills and concepts learned throughout the program and
prepare students to compete in the professional data science job
market. It provides experience in formulating and carrying out a
sustained, coherent, and significant course of work resulting in a
tangible data science analysis project with real-world data.
Students are evaluated on their ability to develop and present their
final data science analysis project in both written and oral form.
The capstone is completed as a group/team project (3-4 students),
and each project will focus on open, pre-existing secondary data.
A robust listing of open datasets will be made available before the
capstone course begins.
Program highlights: 1. A New Field Emerges
2. An Explosion of Data
3. A Challenge Identified
Faculty: The School of Information (I School) is made up of tenured faculty,
leading industry practitioners, and post-doctoral scholars. This diversity
among our faculty ensures that students gain the necessary theory and
skills development while also having access to cutting-edge data science
research. I School faculty members’ expertise includes, but is not limited
to, information science, computer science, applied statistics, social
science, research design, and information policy.