class: center, middle, inverse, title-slide

# STAT3622 Data Visualization (Lecture 1)
## Introduction to Data Science
### Dr. Aijun Zhang
### The University of Hong Kong
### 21 January 2020

---

# What's covered in this lecture?

<img style="float: right; width: 360px; padding:50px 100px 0 0;" src="DataScienceVennDiagram.png">

- STAT3622 Course Outline
    - Course Objectives
    - Tentative Contents
    - Assessments
    - References

- RStudio and R Markdown

- Introduction to Data Science
    - DS Job Market
    - DS Venn Diagram
    - DS Workflow

---
class: center, middle

# 1. STAT3622 Course Outline

---

# Course Admin Information

<img src="LogoSaaSHKU.png" width="40%" style="display: block; margin: auto;" />

- Instructor: Dr. Aijun Zhang
    - Office: RR224
    - Email: ajzhang@hku.hk

- Lecture Hours:
    - Tuesday 1:30pm - 4:20pm (T5)

- Tutor: Mr. Yifeng Guo
    - Office: RR114
    - Email: gyf9712@hku.hk

---

# Course Websites

- http://stat3622.saas.hku.hk/
    * Weekly updates with various course materials
    <!-- * A public domain for also hosting your DataViz projects -->

- http://moodle.hku.hk/
    * Synced with lecture notes, assignments, etc.
    * Announcements, reminders, and surveys

<!-- - Rstudio server: http://stat3622.saas.hku.hk:8787/ -->

---

# Course Objectives

- This course (as part of data science) focuses on statistical graphics and interactive data visualization, as well as their applications in real case studies.

- **Programming:** R (primary), Python, D3.js

- **You will learn to:**
    - Choose the best chart that fits the data
    - Communicate effectively using statistical graphics
    - Create compelling visualizations via programming tools

- **Prerequisites:** STAT2602 (Probability & Statistics II) or STAT3902 (Statistical Models)

---

# Tentative Contents

.pull-left[
- Introduction to Data Science
- Exploratory data analysis
- Data manipulation
- Hans Rosling's Bubbles
- Interactive data visualization
- Shiny
- Web scraping
- Dynamic documentation
- Big data visualization
]

.pull-right[
1. Statistical Graphics
    * R base plot
    * R:lattice
    * R:ggplot2

1. Interactive Data Visualization
    * R:magick/animation
    * R:plotly (based on d3.js)
    * Rstudio:shiny

1. Selected Topics
    * Spatiotemporal DataViz
    * Map visualization
    * Web scraping, etc.
]

---

# Assessments

- No Final Exam!

- 40% Homework (2 sets) and in-class quizzes (2 sets);

- 60% Final project, consisting of
    - DataViz app: 30%
    - Oral presentation: 15%
    - Written report: 15%

---

# References

- Grolemund, G. and Wickham, H. (2017). *R for Data Science*. O'Reilly. http://r4ds.had.co.nz/

- Wickham, H. (2016). *ggplot2: Elegant Graphics for Data Analysis* (2nd). Springer. http://ggplot2.org/book/

- Rossant, C. (2015). *Learning IPython for Interactive Computing and Data Visualization* (2nd). Packt. http://ipython-books.github.io/minibook/

- Meeks, E. (2017). *D3.js in Action* (2nd). Manning. https://www.manning.com/books/d3js-in-action-second-edition

- Yau, N. (2011). *Visualize This: The FlowingData Guide to Design, Visualization, and Statistics*. Wiley. http://book.flowingdata.com/

- *RStudio Cheat Sheets*. https://www.rstudio.com/resources/cheatsheets/

---
class: center, middle

# RStudio and R Markdown

---

# RStudio IDE

- RStudio is a popular IDE (Integrated Development Environment) for R programming.
- It is a powerful editor for R coding and debugging.
- It is a powerful generator for HTML, PDF, dynamic documents and slide shows.
- RStudio can be run on both Desktop and Cloud.
<!-- In this course, we provide the RStudio Server: http://stat3622.saas.hku.hk:8787 -->

- Check out more nice features of RStudio at its [official website](https://www.rstudio.com/products/rstudio/features/)

---

# RStudio IDE

<img src="RStudio-Screenshot.png" width="60%" style="display: block; margin: auto;" />

---

# R Markdown (Demonstrated)

```r
knitr::kable(head(iris), format = 'html')
```

<table>
 <thead>
  <tr>
   <th style="text-align:right;"> Sepal.Length </th>
   <th style="text-align:right;"> Sepal.Width </th>
   <th style="text-align:right;"> Petal.Length </th>
   <th style="text-align:right;"> Petal.Width </th>
   <th style="text-align:left;"> Species </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 5.1 </td>
   <td style="text-align:right;"> 3.5 </td>
   <td style="text-align:right;"> 1.4 </td>
   <td style="text-align:right;"> 0.2 </td>
   <td style="text-align:left;"> setosa </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 4.9 </td>
   <td style="text-align:right;"> 3.0 </td>
   <td style="text-align:right;"> 1.4 </td>
   <td style="text-align:right;"> 0.2 </td>
   <td style="text-align:left;"> setosa </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 4.7 </td>
   <td style="text-align:right;"> 3.2 </td>
   <td style="text-align:right;"> 1.3 </td>
   <td style="text-align:right;"> 0.2 </td>
   <td style="text-align:left;"> setosa </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 4.6 </td>
   <td style="text-align:right;"> 3.1 </td>
   <td style="text-align:right;"> 1.5 </td>
   <td style="text-align:right;"> 0.2 </td>
   <td style="text-align:left;"> setosa </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 5.0 </td>
   <td style="text-align:right;"> 3.6 </td>
   <td style="text-align:right;"> 1.4 </td>
   <td style="text-align:right;"> 0.2 </td>
   <td style="text-align:left;"> setosa </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 5.4 </td>
   <td style="text-align:right;"> 3.9 </td>
   <td style="text-align:right;"> 1.7 </td>
   <td style="text-align:right;"> 0.4 </td>
   <td style="text-align:left;"> setosa </td>
  </tr>
</tbody>
</table>

- Dynamic documentation: report, table, graphics ...
- R packages by Yihui Xie: knitr, bookdown, xaringan, etc.

---

# R Markdown (Demonstrated)

```r
plot(iris, col=iris$Species)
```

<img src="index_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />

- Data-generated graphics that are reproducible

---

# R Markdown

<img src="Rmarkdown.png" width="60%" style="display: block; margin: auto;" />

- Click [here](https://rmarkdown.rstudio.com/lesson-1.html) to view a fantastic micro-video tutorial
- Browse [here](http://rmarkdown.rstudio.com/gallery.html) for a gallery of creative R Markdown works

---
class: center, middle

# Introduction to Data Science

---

# Data Scientist: The Sexy Job

<img src="HBR201210.png" width="75%" style="display: block; margin: auto;" />

- See also an old article by NYT (2009): [For Today’s Graduate, Just One Word: Statistics](https://www.nytimes.com/2009/08/06/technology/06stats.html)
- And another famous McKinsey 2011 report: [Big data: The next frontier for innovation, competition, and productivity](https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/big-data-the-next-frontier-for-innovation)

---

# What is a data scientist?

- Nate Silver ([FiveThirtyEight](https://fivethirtyeight.com/contributors/nate-silver/), author of *The Signal and the Noise*): "Data scientist is just a sexed up word for a statistician."

- "A data scientist is someone who knows more statistics than a computer scientist and more computer science than a statistician." (from [Joshua Blumenstock](http://www.jblumenstock.com/teaching/course=infx573))

<img src="DataScientistDict.png" width="75%" style="display: block; margin: auto;" />

---

# What is a data scientist?
<img src="DSPillow.jpg" width="50%" style="display: block; margin: auto;" />

<!-- --- -->
<!-- # Job Market of Data Scientist -->
<!-- ```{r echo=FALSE, fig.align="center", out.width = '70%'} -->
<!-- knitr::include_graphics("DataScientistJobHK.png") -->
<!-- ``` -->

<!-- --- -->
<!-- # Innovation Technologies -->
<!-- Big Data, Data Science, Machine Learning, Artificial Intelligence, Deep Learning, and Statistics -->
<!-- ```{r echo=FALSE, fig.align="center", out.width = '50%'} -->
<!-- knitr::include_graphics("InnovationTechnology.jpg") -->
<!-- ``` -->
<!-- - [A BMC Blog Post](https://www.bmc.com/blogs/machine-learning-data-science-artificial-intelligence-deep-learning-and-statistics/) about these innovative technology terms -->
<!-- - [Another blog post](https://www.newgenapps.com/blog/artificial-intelligence-vs-machine-learning-vs-data-science) about "Artificial Intelligence vs Machine Learning vs Data Science" -->

<!-- --- -->
<!-- # AI is Statistics -->
<!-- ```{r echo=FALSE, fig.align="center", fig.cap="HKU SAAS Orientation Day, 4/9/2018", out.width = '55%'} -->
<!-- knitr::include_graphics("AIisStat.jpg") -->
<!-- ``` -->

---

# From Big Data to Applied AI

<div class="figure" style="text-align: center">
<img src="AppliedAI.png" alt="HKU New BASc Programme in Applied AI (2019 onwards)" width="70%" />
<p class="caption">HKU New BASc Programme in Applied AI (2019 onwards)</p>
</div>

---

# Data Science Venn Diagram

<img src="DataScienceVennDiagram.png" width="45%" style="display: block; margin: auto;" />

---

# Data Science vs. Statistics

.pull-left[
<img src="TagCloudStatistics.png" width="90%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="TagCloudDataScience.png" width="90%" style="display: block; margin: auto;" />
]

---

# Data Science Workflow

<img src="DataScienceWorkflow1.png" width="50%" style="display: block; margin: auto;" />

---

# Data Science Workflow

<img src="DataScienceWorkflow2.jpg" width="70%" style="display: block; margin: auto;" />

---

# Roles of Data Visualization

- Role 1: Exploratory data analysis (pre-stage);
- Role 2: Visual presentation of results (post-stage).

- John W. Tukey (1977; *Exploratory Data Analysis*): "The greatest value of a picture is when it forces us to notice what we never expected to see."

.pull-left[
<img src="JohnTukey.png" width="47%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="JohnTukeyEDA.jpg" width="40%" style="display: block; margin: auto;" />
]

---
class: center, middle

## The best stats you've ever seen | Hans Rosling

<iframe width="560" height="315" src="https://www.youtube.com/embed/hVimVzgtD6w" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

Click [here](https://www.youtube.com/embed/hVimVzgtD6w) to view it on YouTube.

---
class: center, middle

# Thank you!

Q&A or Email ajzhang@umich.edu.