This is a brief account of the talk given on May 29, 2015 by Michael Franklin, professor and chair of computer science at UC Berkeley.
Advances in Data Science and Big Data Analysis:
Massively scalable processing and storage
Pay-as-you-go resources that are also scalable
Flexible schema on read
Data lakes for storage
Open source ecosystem driving innovation
He founded and runs the AMPLab at UC Berkeley. It develops:
Machines: cluster and cloud computing
Parts of the system are:
Berkeley Data Analytics Stack
Apache Spark processing engine
In-memory dataflow system
Genomics is one application of this stack.
At UCB, data science is coordinated at the Berkeley Institute for Data Science.
It is run by Saul Perlmutter in Physics, who shared the 2011 Nobel Prize in Physics for discovering the accelerating expansion of the universe, attributed to dark energy, through observations of Type Ia supernovae.
Data science draws on the overlapping expertise of computer scientists, statisticians, and domain experts.
It also requires inference, visualization, and communication.
Another consideration is the ethics of data collection and usage.
At UC Berkeley, 5,000 of the 6,000 freshmen learn about computing, including some Python programming.
At UCB, he is on the Rapid Action Committee on Data Science. Its report will come out soon.
It recommends a basic course for all freshmen, followed by connector classes that lead into each student’s major.
UCI has a major in data science.
I was especially interested in the references he pointed out for further information:
Big Data Analytics for Dummies, a free PDF from Alteryx.
Frontiers in Massive Data Analysis, another free PDF, from the National Academies Press.
He also referenced the Berkeley Institute for Data Science.
His lab also developed the free app Carat, which collects data and uses collaborative analysis to diagnose which apps are draining energy on your cell phone.
This was a very informative and non-technical talk.