Topics to be covered
This course divides data science into two parts:
a) Data Analysis and Data Visualization
b) Predictive Modeling
A) Data Analysis and Visualization
1: NumPy: Working with N-dimensional arrays
- Overview
- Creating N-dimensional arrays
- Why do we need arrays?
- Numeric operations using NumPy
- Indexing and slicing
- Mathematical functions
- Generating random arrays
2: Pandas: Data analysis and Manipulation
- Pandas Overview
- Data Structures
- Series
- DataFrame
- Series and DataFrame operations
- Missing Data
- Categorical Data
- Working with datetime data
- Reading data from different file formats
- Merging and Grouping Data
- Other data operations using Pandas
3: Matplotlib / Seaborn: Data visualization
- Overview
- Scatter plot, line plot, bar plot
- Histogram
- Xlabel, Ylabel, Xticks, Yticks, title
- Marker style, type, and size
- Figure and Subplot
- Saving a Figure
- Heatmap, box plot
4: Text analysis using NLTK
- What is NLP?
- NLP libraries
- NLP Applications
- Cleaning text data
- Tokenization
- Stop-word removal
- Stemming and Lemmatization
- Part-of-speech (POS) tagging
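The text-cleaning topics listed above (cleaning, tokenization, stop-word removal) can be sketched in plain Python. This is only an illustration and not the NLTK API: the function name `clean_and_tokenize` and the tiny `STOP_WORDS` set are assumptions made for this example, and NLTK's own tokenizers and stop-word lists are far more complete.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only;
# a real project would use a fuller list such as the one shipped with NLTK.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in"}


def clean_and_tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    lowered = text.lower()
    # Cleaning + tokenization: keep only runs of letters, which also
    # discards punctuation and digits.
    tokens = re.findall(r"[a-z]+", lowered)
    # Stop-word removal: filter out very common words.
    return [token for token in tokens if token not in STOP_WORDS]


if __name__ == "__main__":
    sentence = "The cat is sitting in the garden, and the dog is barking!"
    print(clean_and_tokenize(sentence))
```

A regular expression is used here only to keep the sketch dependency-free; stemming, lemmatization, and POS tagging need a library such as NLTK.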
B) Predictive Modeling using scikit-learn
1: scikit-learn
- Regression
- Introduction
- Simple Linear Regression
- Multiple Linear Regression
- Polynomial Regression
- Evaluating the performance of a linear regression model
- Overfitting and underfitting
- Regularization
- Logistic Regression
- Logistic Regression theory
- Implementing logistic regression with scikit-learn
- Logistic Regression Parameters
- MNIST digit dataset with Logistic Regression
- Predictive modeling on adult income dataset
- Naive Bayes Classification
- Theory of the Naive Bayes algorithm
- Feature extraction
- CountVectorizer
- TF-IDF
- Email Spam filtering
- Sentiment analysis
- Decision Tree and Random Forest
- The theory behind the decision tree
- Implementing a decision tree with scikit-learn
- Decision tree parameters
- Combining multiple decision trees via random forest
- How random forest works
- Model Evaluation and Parameter Tuning
- Cross-validation via K-Fold
- Tuning hyperparameters via grid search
- Confusion matrix
- Recall and Precision
- ROC and AUC
- Clustering and Dimensionality Reduction
- K-means Clustering
- Elbow method
- Principal component analysis (PCA)
- PCA step by step
- Implementing PCA with scikit-learn
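As a small illustration of the model-evaluation topics above (confusion matrix, precision, and recall), the following sketch computes them by hand for a binary classifier. The function names and the toy labels are assumptions made for this example; in practice scikit-learn's `sklearn.metrics` module provides `confusion_matrix`, `precision_score`, and `recall_score` for the same purpose.

```python
def binary_confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) counts for a binary classification task."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # correctly predicted positive
        elif predicted == positive:
            fp += 1  # predicted positive, actually negative
        elif actual == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # correctly predicted negative
    return tp, fp, fn, tn


def precision_recall(y_true, y_pred, positive=1):
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    tp, fp, fn, _ = binary_confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy labels invented for illustration; not from any real dataset.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
    print(binary_confusion_counts(y_true, y_pred))
    print(precision_recall(y_true, y_pred))
```

Returning 0.0 when a denominator is zero is a design choice for this sketch; other conventions (such as raising an error or emitting a warning) are equally defensible.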
Target Audience
The course can be taken by:
Students: Students pursuing professional graduate or postgraduate courses in computer science or information technology.
Teachers/Faculty: Computer science and engineering teachers and faculty members.
Professionals: IT professionals who wish to acquire new skills or improve their existing skills.
Test & Evaluation
1. During the program, participants must complete all assignments given to them to reinforce their learning.
2. At the end of the program, a final assessment will be conducted.
Certification
1. All successful participants will be provided with a certificate of completion.
2. Participants who do not complete the course or who leave it midway will not be awarded a certificate.