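Learning Data Science from Scratch (2), Beginner Stage: 4 Introductory Udemy Courses
        Completed 4 data-science-related Udemy courses in July and August, 93.5 hours in total.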
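II. Approach:
        July: go through every course once. August: pick out the unfamiliar parts and the areas that need reinforcement, and work on them.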
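        In July this averaged 23.4 hours per week, i.e. about 4.7 hours a day over five days (23.4 ÷ 5), plus 2+ hours of practice.
        (Courses and cumulative hours were tracked in Google Calendar.)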
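III. Principles:
       While watching each lecture, sort the material into: 1. Fundamentals (must understand), 2. the August refinement list, 3. concept-level familiarity is enough.
IV. Course Content: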
Course 1. Data Science A-Z: Real-Life Data Science Exercises (21 Hours)
This course will give you a full overview of the Data Science journey.
Upon completing this course you will know:
- How to clean and prepare your data for analysis
- How to perform basic visualisation of your data
- How to model your data
- How to curve-fit your data
- And finally, how to present your findings and wow the audience
In this course you will develop a good understanding of the following tools:
- SQL
- SSIS
- Tableau
- Gretl
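Two of the outcomes above, cleaning data and curve-fitting it, can be illustrated with a small stdlib-only Python sketch. It is not the course's workflow (the course uses the tools listed); the raw records, the hypothetical helpers `clean` and `fit_exponential`, and the choice of an exponential model fitted by least squares on ln(y) are all assumptions made for illustration.

```python
import math

# Hypothetical raw (x, y) records as strings: some blank, non-numeric or invalid.
RAW = [("1", "2.7"), ("2", " 7.4 "), ("", "5.0"), ("3", "20.1"),
       ("4", "n/a"), ("5", "148.4"), ("6", "-1")]

def clean(raw):
    """Keep only rows where both fields parse as numbers and y > 0."""
    rows = []
    for xs, ys in raw:
        try:
            x, y = float(xs.strip()), float(ys.strip())
        except ValueError:
            continue  # blank or non-numeric field: drop the row
        if y > 0:  # the log transform in the fit requires y > 0
            rows.append((x, y))
    return rows

def fit_exponential(rows):
    """Fit y = a * exp(k * x) by ordinary least squares on ln(y)."""
    xs = [x for x, _ in rows]
    logs = [math.log(y) for _, y in rows]
    n = len(rows)
    mean_x, mean_l = sum(xs) / n, sum(logs) / n
    k = (sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
         / sum((x - mean_x) ** 2 for x in xs))
    a = math.exp(mean_l - k * mean_x)  # intercept of the log-linear fit
    return a, k
```

Here `clean(RAW)` keeps four rows, and because they lie close to y = e^x, `fit_exponential` returns a and k both near 1.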
Key points:
1) Prioritize the overall concepts over learning individual tools
2) Grasp database-related concepts
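As a concrete instance of the database concepts in point 2, here is a minimal sketch using Python's built-in `sqlite3` module rather than the SQL Server setup used in the course. The `customers` and `orders` tables and their rows are hypothetical; the query shows a primary/foreign-key join followed by `GROUP BY` aggregation.

```python
import sqlite3

def revenue_by_country():
    """Join hypothetical orders to customers and aggregate revenue per country."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    cur = conn.cursor()
    cur.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),
            amount REAL NOT NULL
        );
        INSERT INTO customers VALUES (1, 'TW'), (2, 'TW'), (3, 'US');
        INSERT INTO orders VALUES (1, 1, 100.0), (2, 2, 50.0),
                                  (3, 3, 80.0), (4, 3, 20.0);
    """)
    rows = cur.execute("""
        SELECT c.country, SUM(o.amount) AS revenue, COUNT(o.id) AS n_orders
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id
        GROUP BY c.country
        ORDER BY revenue DESC
    """).fetchall()
    conn.close()
    return rows
```

Calling `revenue_by_country()` returns `[('TW', 150.0, 2), ('US', 100.0, 2)]`.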
Notes:
1) Tableau is a paid tool (Gretl is free, open-source software); treat both as hands-on trials rather than tools to master
Course 2. Machine Learning A-Z: Hands-on Python & R in Data Science (41 Hours)
Structure:
- Part 1 - Data Preprocessing (overlaps with Course 1)
- Part 2 - Regression: Simple Linear Regression, Multiple Linear Regression, Polynomial Regression, SVR, Decision Tree Regression, Random Forest Regression
- Part 3 - Classification: Logistic Regression, K-NN, SVM, Kernel SVM, Naive Bayes, Decision Tree Classification, Random Forest Classification
- Part 4 - Clustering: K-Means, Hierarchical Clustering
- Part 5 - Association Rule Learning: Apriori, Eclat
- Part 6 - Reinforcement Learning: Upper Confidence Bound, Thompson Sampling
- Part 7 - Natural Language Processing: Bag-of-words model and algorithms for NLP
- Part 8 - Deep Learning: Artificial Neural Networks, Convolutional Neural Networks
- Part 9 - Dimensionality Reduction: PCA, LDA, Kernel PCA
- Part 10 - Model Selection & Boosting: k-fold Cross Validation, Parameter Tuning, Grid Search, XGBoost
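The Upper Confidence Bound idea from Part 6 can be sketched as the standard UCB1 rule on a Bernoulli multi-armed bandit: after trying each arm once, always play the arm with the highest average reward plus an exploration bonus sqrt(2 ln t / n_i). The arm probabilities, the `ucb1` function and the constant 2 in the bonus are assumptions for illustration; the course's version may use a different constant and its own dataset.

```python
import math
import random

def ucb1(probs, rounds, seed=0):
    """Run UCB1 on Bernoulli arms with success rates `probs`; return pulls per arm."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    n_arms = len(probs)
    counts = [0] * n_arms      # n_i: times arm i was played
    sums = [0.0] * n_arms      # total reward collected from arm i
    for t in range(1, rounds + 1):
        if t <= n_arms:
            arm = t - 1        # initialization: play every arm once
        else:
            # Average reward plus exploration bonus; rarely played arms get a bigger bonus.
            arm = max(range(n_arms),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

With `ucb1([0.1, 0.5, 0.9], 2000)` the 0.9 arm ends up with the large majority of the 2000 pulls, while the weaker arms are still sampled occasionally.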
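Key points:
1) Get a first understanding of everything, and pick a few ML tools for the August refinement list
2) Grasp Deep Learning concepts
3) Learn the instructor's Python coding style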
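One Part 10 technique that could go on the August list is k-fold cross validation. This sketch shows the mechanics without any library: split the indices into k folds, hold each fold out once as the test set, and average the test error. `k_fold_indices`, `cross_validate_mean_baseline` and the trivial "predict the training mean" model are hypothetical stand-ins; in practice scikit-learn's `KFold` or `cross_val_score` would be used, usually with shuffling, which this sketch omits.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most 1."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate_mean_baseline(y, k):
    """Average test MSE over k folds for a model that predicts the training-fold mean."""
    scores = []
    for test_idx in k_fold_indices(len(y), k):
        test_set = set(test_idx)
        train = [y[i] for i in range(len(y)) if i not in test_set]
        pred = sum(train) / len(train)  # "fit" on the training folds only
        mse = sum((y[i] - pred) ** 2 for i in test_idx) / len(test_idx)
        scores.append(mse)
    return sum(scores) / k
```

For example, `k_fold_indices(10, 3)` gives `[[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]`; every sample is used for testing exactly once.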
Notes:
1) Set aside extra time to fill gaps in statistics and math
2) Skim the R material for now, without studying it deliberately
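An example of the math worth reviewing: the least-squares estimates behind Part 2's Simple Linear Regression follow from minimizing the sum of squared errors S and setting both partial derivatives to zero. This is the standard textbook derivation, written out as a study note.

```latex
S(\beta_0, \beta_1) = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2

\frac{\partial S}{\partial \beta_0} = -2 \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right) = 0
\;\Rightarrow\; \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}

\frac{\partial S}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i \left( y_i - \beta_0 - \beta_1 x_i \right) = 0
\;\Rightarrow\; \hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
```

The second result uses the first to eliminate the intercept, together with the identity that the deviations (y_i - ȳ) sum to zero.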
Course 3. Data Science and Machine Learning with Python - Hands On (9 Hours)
- Extract meaning from large data sets using a wide variety of machine learning, data mining, and data science techniques with the Python programming language.
- Perform machine learning on "big data" using Apache Spark and its MLLib package.
- Design experiments and interpret the results of A/B tests
- Visualize clustering and regression analysis in Python using matplotlib
- Produce automated recommendations of products or content with collaborative filtering techniques
- Apply best practices in cleaning and preparing your data prior to analysis
- Regression analysis
- K-Means Clustering
- Principal Component Analysis
- Train/Test and cross validation
- Bayesian Methods
- Decision Trees and Random Forests
- Multivariate Regression
- Multi-Level Models
- Support Vector Machines
- Reinforcement Learning
- Collaborative Filtering
- K-Nearest Neighbor
- Bias/Variance Tradeoff
- Ensemble Learning
- Term Frequency / Inverse Document Frequency
- Experimental Design and A/B Tests
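The Term Frequency / Inverse Document Frequency topic can be made concrete with a short pure-Python sketch. It uses the plain weighting tf(t, d) × log(N / df(t)) with lowercase whitespace tokenization; both choices are assumptions, and libraries such as scikit-learn's `TfidfVectorizer` apply a smoothed IDF and vector normalization by default, so their numbers will differ.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # document frequency: count each term once per doc
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens)
        weights.append({term: (count / total) * math.log(n_docs / df[term])
                        for term, count in tf.items()})
    return weights
```

For `tf_idf(["spark is fast", "python is fun"])`, "is" appears in every document and gets weight 0.0, while "spark" gets (1/3) × ln 2.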
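Key points:
1) Compare how this course explains things differently from Course 2, and pick a few ML tools for the August refinement list
2) Grasp Apache Spark concepts
3) Learn the instructor's Python coding style and compare it with Course 2's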
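Collaborative filtering from the topic list is one candidate for the August list. A minimal item-based sketch: represent each item by its users' ratings, compare items by cosine similarity (missing ratings treated as 0), and predict a user's score for an unseen item as a similarity-weighted average of that user's existing ratings. The `RATINGS` data and all helper names are hypothetical, and the course's own approach may differ.

```python
import math

# Hypothetical user -> {item: rating} data.
RATINGS = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5},
    "cat": {"A": 1, "C": 5},
}

def item_vector(item, ratings):
    """Sparse vector of the ratings every user gave to `item`."""
    return {user: r[item] for user, r in ratings.items() if item in r}

def cosine(u, v):
    """Cosine similarity of two sparse vectors (absent entries count as 0)."""
    dot = sum(u[k] * v[k] for k in set(u) & set(v))
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if dot else 0.0

def recommend(user, ratings):
    """Score each item the user has not rated; return (item, score) best-first."""
    all_items = {i for r in ratings.values() for i in r}
    seen = ratings[user]
    scores = {}
    for cand in all_items - set(seen):
        cand_vec = item_vector(cand, ratings)
        sims = [(cosine(cand_vec, item_vector(i, ratings)), r) for i, r in seen.items()]
        total = sum(s for s, _ in sims)
        if total > 0:
            scores[cand] = sum(s * r for s, r in sims) / total
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Here `recommend("bob", RATINGS)` scores only item "C". Because the score is a weighted average of bob's own ratings, it always falls between his lowest and highest rating.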
Notes:
1) Set aside extra time to fill gaps in statistics and math
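The statistics gap matters most for the Experimental Design and A/B Tests topic. One standard test worth knowing is a two-sided, two-proportion z-test on conversion counts, sketched here with only the standard library (the normal CDF comes from `math.erf`). The function name and the numbers are illustrative assumptions; the course may use a different test, such as a t-test, on its own data.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates, using pooled variance."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate under H0: no difference
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value
```

For 100/1000 conversions in A against 150/1000 in B, z is about 3.4 and the p-value is well below 0.01; identical rates give z = 0 and a p-value of 1.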
Course 4. Deep Learning A-Z: Hands-on Artificial Neural Networks (22.5 Hours)
- Understand the intuition behind Artificial Neural Networks
- Apply Artificial Neural Networks in practice
- Understand the intuition behind Convolutional Neural Networks
- Apply Convolutional Neural Networks in practice
- Understand the intuition behind Recurrent Neural Networks
- Apply Recurrent Neural Networks in practice
- Understand the intuition behind Self-Organizing Maps
- Apply Self-Organizing Maps in practice
- Understand the intuition behind Boltzmann Machines
- Apply Boltzmann Machines in practice
- Understand the intuition behind AutoEncoders
- Apply AutoEncoders in practice
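The intuition behind a single artificial neuron (weighted sum, threshold, adjust the weights when the output is wrong) can be sketched with the classic perceptron learning rule. This is a deliberately simple stand-in chosen for illustration, not the course's implementation; the AND-gate data, the `train_perceptron` and `predict` names, and the epoch count are assumptions.

```python
def predict(w, b, x):
    """Step activation on the weighted sum w . x + b."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron rule: nudge weights by lr * (target - prediction) * input."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in samples:
            err = target - predict(w, b, x)  # 0 when correct, +1 or -1 when wrong
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b

# Hypothetical training data: the logical AND of two binary inputs.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
```

AND is used because it is linearly separable, so the perceptron convergence theorem guarantees the updates stop; here `train_perceptron(AND)` settles within a few epochs and then classifies all four inputs correctly.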
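Key points:
1) Grasp the concepts of Artificial Neural Networks and their current state of development
2) Pick one area first and go deep on it
Quick concept references: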
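For point 1, the idea that separates a multi-layer network from a single neuron can be shown with a tiny 2-2-1 ReLU network computing XOR, which no single linear threshold unit can represent. The weights below are hand-picked (not trained) and the names are hypothetical; in a real ANN, backpropagation would search for weights like these.

```python
def relu(z):
    """Rectified linear activation."""
    return max(0.0, z)

# Hand-picked weights: hidden unit 1 computes x1 + x2, hidden unit 2 fires only when both are 1.
W_HIDDEN = [[1.0, 1.0], [1.0, 1.0]]
B_HIDDEN = [0.0, -1.0]
W_OUT = [1.0, -2.0]
B_OUT = 0.0

def forward(x):
    """One forward pass: input -> ReLU hidden layer -> linear output."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W_HIDDEN, B_HIDDEN)]
    return sum(w * h for w, h in zip(W_OUT, hidden)) + B_OUT
```

Evaluating `forward` on (0, 0), (0, 1), (1, 0), (1, 1) gives 0.0, 1.0, 1.0, 0.0: the hidden layer turns a problem that is not linearly separable into one the output unit can solve.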