Learning Data Science from Scratch (2): Getting Started with 4 Beginner Udemy Courses

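(This post records my own learning process and is not written as a tutorial.)
I. Goal:
        Complete 4 data-science-related Udemy courses in July and August, 93.5 hours in total.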

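II. Approach:
        Go through every course once in July; in August, pick out the parts I am unfamiliar with or need to strengthen and work on them.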

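        In July that averages 23.4 hours per week, i.e. about 4.7 hours per day over a five-day week, plus at least 2 hours of practice.
        (Courses and cumulative hours are tracked in Google Calendar.)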
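The weekly and daily averages follow from the course lengths listed in section IV. A quick check, assuming July is counted as four study weeks of five study days each (both are my assumptions for the arithmetic):

```python
# Course lengths in hours, as listed in section IV of this post
course_hours = {
    "Data Science A-Z": 21,
    "Machine Learning A-Z": 41,
    "Data Science and Machine Learning with Python": 9,
    "Deep Learning A-Z": 22.5,
}

total = sum(course_hours.values())
weeks_in_july = 4          # assumption: July treated as four study weeks
study_days_per_week = 5    # assumption: five study days per week

per_week = total / weeks_in_july
per_day = per_week / study_days_per_week
print(f"total={total} h, per week={per_week:.1f} h, per day={per_day:.1f} h")
```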

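III. Principles:
       While watching each lecture, sort its material into three groups: 1. Fundamentals (must understand), 2. the August improvement list, 3. concept-level familiarity is enough.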


IV. Course Contents:

Course 1. Data Science A-Z: Real-Life Data Science Exercises (21 Hours)

This course will give you a full overview of the Data Science journey. Upon completing this course you will know:
  • How to clean and prepare your data for analysis
  • How to perform basic visualisation of your data
  • How to model your data
  • How to curve-fit your data
  • And finally, how to present your findings and wow the audience
In this course you will develop a good understanding of the following tools:
  • SQL
  • SSIS
  • Tableau
  • Gretl
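The first item above, cleaning and preparing data, can be tried with plain SQL. The course uses its own tool stack (SQL plus SSIS); the sketch below instead uses Python's built-in sqlite3 module as a lightweight stand-in. The table raw_customers and its rows are hypothetical. The query trims whitespace, normalizes case, turns empty strings into NULL, casts ages to integers, and drops duplicate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical raw data with typical problems: stray spaces, mixed case,
# an empty age, and one verbatim duplicate row
cur.execute("CREATE TABLE raw_customers (name TEXT, country TEXT, age TEXT)")
cur.executemany(
    "INSERT INTO raw_customers VALUES (?, ?, ?)",
    [
        ("  Alice ", "taiwan", "34"),
        ("Bob", "Taiwan", ""),
        ("  Alice ", "taiwan", "34"),
        ("Carol", "JAPAN", "29"),
    ],
)

# Clean into a new table: trim, upper-case country, empty age -> NULL, cast, deduplicate
cur.execute(
    """
    CREATE TABLE customers AS
    SELECT DISTINCT
        TRIM(name) AS name,
        UPPER(TRIM(country)) AS country,
        CAST(NULLIF(TRIM(age), '') AS INTEGER) AS age
    FROM raw_customers
    """
)

rows = cur.execute("SELECT name, country, age FROM customers ORDER BY name").fetchall()
print(rows)  # [('Alice', 'TAIWAN', 34), ('Bob', 'TAIWAN', None), ('Carol', 'JAPAN', 29)]
conn.close()
```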

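Key points:
1) Prioritize understanding the overall workflow over learning individual tools
2) Get a solid grasp of database concepts

Notes:
1) Tableau is a paid tool (Gretl is free and open source); treat both as hands-on exposure only, without aiming for mastery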



Course 2. Machine Learning A-Z: Hands-on Python & R in Data Science (41 Hours)


Course structure:
  • Part 1 - Data Preprocessing (overlaps with Course 1)
  • Part 2 - Regression: Simple Linear Regression, Multiple Linear Regression, Polynomial Regression, SVR, Decision Tree Regression, Random Forest Regression
  • Part 3 - Classification: Logistic Regression, K-NN, SVM, Kernel SVM, Naive Bayes, Decision Tree Classification, Random Forest Classification
  • Part 4 - Clustering: K-Means, Hierarchical Clustering
  • Part 5 - Association Rule Learning: Apriori, Eclat
  • Part 6 - Reinforcement Learning: Upper Confidence Bound, Thompson Sampling
  • Part 7 - Natural Language Processing: Bag-of-words model and algorithms for NLP
  • Part 8 - Deep Learning: Artificial Neural Networks, Convolutional Neural Networks
  • Part 9 - Dimensionality Reduction: PCA, LDA, Kernel PCA
  • Part 10 - Model Selection & Boosting: k-fold Cross Validation, Parameter Tuning, Grid Search, XGBoost
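As a taste of Part 2, simple linear regression can be fitted by ordinary least squares in closed form: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², intercept = ȳ − slope · x̄. The pure-Python sketch below is my own illustration, not the course's code (which relies on library implementations); the function name and the toy data, which lie exactly on y = 2x + 1, are made up for the example.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of co-deviations and squared deviations around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # toy data on the line y = 2x + 1
slope, intercept = fit_simple_linear_regression(xs, ys)
print(slope, intercept)       # 2.0 1.0
print(slope * 6 + intercept)  # prediction for x = 6: 13.0
```

Real data will not sit exactly on a line, but the same two formulas give the best-fitting line in the least-squares sense.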

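Key points:
1) Get an initial understanding of each method, and pick a few ML techniques for the August improvement list
2) Grasp the core concepts of deep learning
3) Learn the instructor's Python coding style

Notes:
1) Set aside extra time to fill gaps in statistics and math
2) Skim the R material for now without studying it deliberately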





Course 3. Data Science and Machine Learning with Python - Hands On (9 Hours)

  • Extract meaning from large data sets using a wide variety of machine learning, data mining, and data science techniques with the Python programming language.
  • Perform machine learning on "big data" using Apache Spark and its MLlib package.
  • Design experiments and interpret the results of A/B tests
  • Visualize clustering and regression analysis in Python using matplotlib
  • Produce automated recommendations of products or content with collaborative filtering techniques
  • Apply best practices in cleaning and preparing your data prior to analysis
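For the A/B-test item, one common way to compare two conversion rates is a two-proportion z-test. This is a sketch of that standard test, not necessarily the method the course teaches (it may use a t-test instead); the function name and the visitor and conversion counts are hypothetical.

```python
import math


def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-sided z-test for a difference in conversion rates between variants A and B."""
    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    # Pooled rate under the null hypothesis that A and B convert equally
    p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 10% vs 12.5% conversion, 2000 visitors per variant
z, p_value = two_proportion_z_test(200, 2000, 250, 2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

With these made-up numbers z is about 2.5 and the p-value is roughly 0.012, below the conventional 0.05 threshold, so the difference would usually be called statistically significant.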


  • Regression analysis
  • K-Means Clustering
  • Principal Component Analysis
  • Train/Test and cross validation
  • Bayesian Methods
  • Decision Trees and Random Forests
  • Multivariate Regression
  • Multi-Level Models
  • Support Vector Machines
  • Reinforcement Learning
  • Collaborative Filtering
  • K-Nearest Neighbor
  • Bias/Variance Tradeoff
  • Ensemble Learning
  • Term Frequency / Inverse Document Frequency
  • Experimental Design and A/B Tests
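Train/test splitting and k-fold cross validation are easy to sketch by hand. Below is k-fold splitting without shuffling, where the first n_samples % k folds get one extra sample (to my knowledge the same sizing rule scikit-learn's KFold uses, but this is my own code). The "model" is a deliberately trivial mean predictor on made-up data so that the fold-to-fold spread in error is easy to see.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k-fold cross validation, without shuffling.

    The first n_samples % k folds get one extra sample so that every sample is used.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Toy target values; the "model" simply predicts the mean of its training folds
ys = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0]
scores = []
for train_idx, test_idx in k_fold_indices(len(ys), k=3):
    prediction = sum(ys[i] for i in train_idx) / len(train_idx)
    mse = sum((ys[i] - prediction) ** 2 for i in test_idx) / len(test_idx)
    scores.append(mse)

print(scores)                     # [37.0, 1.0, 37.0]
print(sum(scores) / len(scores))  # 25.0
```

Because the toy data is sorted and never shuffled, the middle fold scores far better than the outer ones; this is one reason data is commonly shuffled before splitting.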

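Key points:
1) Compare how the topics are explained here versus Course 2, and pick a few ML techniques for the August improvement list
2) Grasp the core concepts of Apache Spark
3) Learn the instructor's Python coding style and compare it with Course 2's

Notes:
1) Set aside extra time to fill gaps in statistics and math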




Course 4. Deep Learning A-Z: Hands-on Artificial Neural Networks (22.5 Hours)

  • Understand the intuition behind Artificial Neural Networks
  • Apply Artificial Neural Networks in practice
  • Understand the intuition behind Convolutional Neural Networks
  • Apply Convolutional Neural Networks in practice
  • Understand the intuition behind Recurrent Neural Networks
  • Apply Recurrent Neural Networks in practice
  • Understand the intuition behind Self-Organizing Maps
  • Apply Self-Organizing Maps in practice
  • Understand the intuition behind Boltzmann Machines
  • Apply Boltzmann Machines in practice
  • Understand the intuition behind AutoEncoders
  • Apply AutoEncoders in practice
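For intuition about the first two items, the simplest possible neural network is a single artificial neuron. The sketch below is a classic perceptron (step activation, perceptron learning rule) trained on logical AND. It is my own simplified illustration, not the course's code; modern ANNs such as those in the course typically use differentiable activations trained with backpropagation instead.

```python
def step(z):
    """Threshold activation: fire (1) when the weighted input is non-negative."""
    return 1 if z >= 0 else 0


def predict(weights, bias, x):
    """Weighted sum of inputs plus bias, passed through the step activation."""
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_perceptron(samples, labels, learning_rate=1, epochs=10):
    """Perceptron learning rule: nudge weights toward the target on each mistake."""
    weights = [0] * len(samples[0])
    bias = 0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            error = target - predict(weights, bias, x)
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


# Logical AND is linearly separable, so a single neuron can learn it
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]
weights, bias = train_perceptron(samples, labels)
print([predict(weights, bias, x) for x in samples])  # [0, 0, 0, 1]
```

A single neuron cannot learn a non-separable function such as XOR, which is what motivates stacking neurons into the multi-layer networks covered in the course.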

Key points:
1) Grasp the concepts behind artificial neural networks and the current state of the field
2) Pick one network type to study in depth first



Quick concept references:
