Introduction to Data Science with Python
Click here for the FDP Data Science Notebook
What you can expect from this notebook:
- Introduction to Data Science using Python
- What is Data Science?
- Why do we need Data Science?
- Brief overview of topics
- Big Data Analytics
- Machine Learning & Deep Learning
- Pythonic way of Data Science
- Brief Intro to Python Programming Language
- Python for Data Science
- Intro to Data Processing, Statistical analysis and Visualization libraries
- numpy, pandas, scipy
- matplotlib, seaborn, plotly
- Intro to Model Building and inference frameworks
- scikit-learn, TensorFlow, PyTorch
- Approaching a Tabular (Structured) Problem (Hands On)
- Understanding the Problem
- Understanding the problem type
- Class imbalances and necessary fixes
- Understanding features and their types
- Exploratory Data Analysis
- Missing Data Imputation
- Identifying correlation and collinearity among features
- Data Distribution and statistical analysis
- Outlier Analysis
- Data Preprocessing
- Dimensionality reduction - Curse of dimensionality
- Data Preprocessing
- Normalization, MinMax Scaler, Standardization
- Categorical Encoding - One-Hot Encoding
- Feature Engineering
- Combining Features
- Splitting Temporal features
- Feature Selection
- Removing features
- Choosing the right features to improve prediction power
- Model Building - A Machine Learning approach
- Hyperparameter tuning and Grid Search
- Logistic Regression
- Ensemble - Bagging and Boosting
- Gradient Boosting Classifier, Stochastic Gradient Boosting (SGB), XGBoost, Voting Classifier
- Choosing the best classifier
- Choosing the right classifier based on evaluation criteria
- Classifier Inference on example data
- Approaching a Text (NLP) Problem (Hands On)
- Why solving NLP problems matters
- Applications of NLP
- chatbots, sentiment analysis, translation, autocomplete, document search, etc.
- Intro to Text
- Tokens, Corpus, Tokenization, Stemming, Lemmatization, N-grams, etc.
- Brief Intro to basic text processing libraries
- NLTK, spaCy
- Solving a Real World Tweet Classification Problem
- Understanding the problem
- Basic EDA of tweets
- Class distribution, distribution of tweet lengths
- Common stopwords, words in tweets without stopwords, bigrams in tweets
- Word clouds of tweets
- Data Cleaning
- Handling stopwords, special characters, URLs, HTML, user handles, emoji
- Text Vectorization
- CountVectorizer, Bag of Words, TF-IDF
- Approaching a Vision Problem (Hands On)
- An introduction to computer vision
- What is Computer Vision?
- How is computer vision used today?
- Image Processing
- Point Operators
- Pixel Transforms
- Color Transforms
- Compositing and matting
- Histogram Equalization
- Linear Filtering
- Separable Filtering
- Band Pass and Steerable Filters
- More neighborhood operators
- Non-linear filtering
- Bilateral filtering
- Binary image processing
- Fourier Transforms
- Two-dimensional Fourier Transforms
- Pyramids and wavelets
- Interpolation
- Decimation
- Multi-resolution representations
- Wavelets
- Geometric transformations
- Parametric transformations
- Mesh-based warping
- OpenCV Library (Hands On)
- Introduction
- Changing colorspaces
- Geometric transformations of images
- Image thresholding
- Smoothing Images
- Morphological Transformations
- Image Gradients
- Canny Edge Detection
- Image Pyramids
- Contours
- Histograms
- Image Transforms