Specialization Certification in Data Scientist

Learn to work with concrete datasets and analyze data. Develop programs to gather, clean, analyze, and visualize data.

100+ Hours of Learning | Specialization Certificate | Virtual Lab | Project Works

Limited no. of seats available | Delivery Mode: Recorded Lectures

For any query related to our course kindly mail at prutor.ai@gmail.com or WhatsApp on 8953463074

Skills you will gain

Data Mining
Data Processing
Association Rule Mining
Classification Basics
Decision Tree
Bayes Classifier
K nearest neighbor
Analysis of Data
Introduction to R
Linear Algebra
Product of Matrix
Algebraic View

About this Specialization

This Specialization builds on the success of the Data Scientist course and introduces fundamental R programming concepts, with a focus on practical learning rather than theory. You will work with concrete datasets and analyze data using the R programming language. In the project, you’ll use the technologies learned throughout the Specialization to design and create your own applications for data retrieval, processing, and visualization.

How the Specialization Works

Take Courses

A Prutor.ai Specialization is a series of courses that helps you master a skill. To begin, enroll in the Specialization directly, or review its courses and choose the one you'd like to start with. When you subscribe to a course that is part of a Specialization, you’re automatically subscribed to the full Specialization. Visit your learner dashboard to track your course enrollments and your progress.

Hands-on Project

Every Specialization includes a hands-on project. You'll need to successfully finish the project(s) to complete the Specialization and earn your certificate. If the Specialization includes a separate course for the hands-on project, you'll need to finish each of the other courses before you can start it.

Earn a Certificate

When you finish every course and complete the hands-on project, you'll earn a Certificate that you can share with prospective employers and your professional network.

Course Fee 30,000 + 18% GST

Program Syllabus in this Specialization

Data Science for Engineers

Topics to be covered

Module-1: Data Science for Engineers Course Philosophy and Expectation
In this module, we will see the course objectives and the expected outcomes of the course.
- What are the course objectives?
- What will not be covered?
- What are the course outcomes and objectives?
Module-2: Introduction to R
This module introduces R as a programming language for data analysis and gives a brief introduction to RStudio.
- What are R and RStudio and how to get started with them?
- How to write, save and execute R files?
Module-3: Introduction to R (Continued)
In this module, we will see how to add comments to an R file, clear the RStudio environment, and save the R workspace.
- How to add comments in an R file?
- How to clear the console and environment and how to save the data from the workspace?
Module-4: Variables and data types in R
In this module, we are going to see the rules for naming variables in R and the basic data types available in R. We are also going to look in detail at two basic R objects: vectors and lists.
- What are the rules for naming the variables and what are the basic data types in R?
- What are the basic objects in R?
Module-5: Data Frames
In this module, we are going to introduce the data frame objects of R and perform some operations on them.
- What is a data frame and how to create it?
- How to access the rows and columns of a data frame and how to edit it?
- How to add and delete extra rows and columns in a data frame?
Module-6: Recasting and joining of data frames
In this module, we look at what recasting a data frame means and why it is needed, and then at more sophisticated operations on data frames such as recasting and joining.
- What does Recasting of a data frame mean?
- How is the recasting of a data frame done?
- How to join two data frames?
Module-7: Arithmetic, Logical and Matrix operations in R
In this module, we are going to do Arithmetic, Logical, and Matrix operations in R.
- What are Arithmetic and Logical operations in R?
- How to create a Matrix and access its elements?
- How to access an entry in a matrix and how does the colon operator work?
- How to do matrix concatenation and perform algebraic operations on the matrix?
Module-8: Advanced Programming in R: Functions
We are going to introduce functions in R and explain how to load or source functions and how to call or invoke them. We are also going to see how to pass arguments to functions.
- What are functions in R and how to create and invoke them?
- How to pass arguments to a function and how are functions evaluated in R?
Module-9: Advanced Programming in R: Functions (Continued)
In this module, we will see functions with multiple inputs and multiple outputs (MIMO) and how to load and call a function. We will also look at inline functions and at looping over objects using commands such as apply, lapply, and tapply.
- What are the functions with Multiple Input and Multiple Output (MIMO) and Inline functions?
- How to loop over the objects?
Module-10: Control Structures
We are going to study the if-else-if family of constructs, the for loop, nested for loops, the for loop with break, and the while loop.
- What are the if-else family of constructs and the sequence function in R?
- What is for and Nested for loop in R?
- What is a while loop in R?
Module-11: Data Visualization in R: Basic Graphics
In this module, we are going to show how to generate basic graphics such as scatter plots, line plots, and bar plots using R, and give a brief idea of why more sophisticated graphics are needed.
- How to generate scatter, line and bar plots?
- Why is there a need for sophisticated graphics?
Module-12: Linear Algebra for Data Science
In this module, we will learn about linear algebra and matrices, and how to identify independent attributes and the linear relationships among attributes.
- What is linear algebra useful for and what is a matrix?
- How to represent data using matrices in data science?
- How to identify independent variables or attributes in the data matrix?
- How to identify linear relationships among variables or attributes in the data matrix?
Module-13: Solving Linear Equations
In this tutorial session, we will solve some matrix equation problems.
- What are the general considerations for solving matrix equations?
- How to solve a matrix equation for the case m = n, with examples?
- How to use an optimization perspective to find a solution to the matrix equation in the case m > n?
Module-14: Solving Linear Equations (Continued)
In this tutorial session, we will solve some more matrix equation problems.
- What is an example of solving a matrix equation using the optimization perspective for the case m > n?
- How to use optimization perspective to find a solution to the matrix equation in case of m < n?
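As a rough illustration of the optimization perspective for the case m > n, the sketch below fits a straight line to three points by solving the 2x2 normal equations for the least-squares solution. It is written in plain Python for illustration only (the course itself works in R); the helper name `least_squares_line` and the data are made up for this example.

```python
def least_squares_line(xs, ys):
    """Least-squares fit of y = c0 + c1 * x via the 2x2 normal equations.

    Each data point gives one equation c0 + c1 * x_i = y_i. With more
    points than unknowns (m > n) there is usually no exact solution, so
    we minimize the sum of squared errors instead.
    """
    m = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Normal equations: [[m, sx], [sx, sxx]] @ [c0, c1] = [sy, sxy]
    det = m * sxx - sx * sx
    if det == 0:
        raise ValueError("x values must not all be equal")
    c0 = (sy * sxx - sx * sxy) / det
    c1 = (m * sxy - sx * sy) / det
    return c0, c1


# Three equations, two unknowns (hypothetical data).
c0, c1 = least_squares_line([0.0, 1.0, 2.0], [1.0, 2.0, 2.0])
print(c0, c1)
```

No line passes through all three points here, so the result is the intercept and slope that make the squared residuals as small as possible.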
Module-15: Linear Algebra - Distance, Hyperplanes, and Halfspaces, EigenValues, Eigenvectors
In this module, we will learn about vectors and the notion of distance, and then about unit, orthogonal, orthonormal, and basis vectors, with examples.
- What is the concept of Vectors?
- What are Unit, Orthogonal and Orthonormal Vectors?
- What are the Basis Vectors?
- How to find basis vectors of the given set of vectors?
Module-16: Linear Algebra - Distance, Hyperplanes, and Halfspaces, EigenValues, Eigenvectors (Continued 1)
We are going to look at the geometric representation of lines and planes, the concept of projection with an example, and the generalization of projection.
- How are equations represented geometrically?
- What is the concept of projection?
- How to illustrate projections through an example and how is projection generalized?
Module-17: Linear Algebra - Distance, Hyperplanes, and Halfspaces, EigenValues, Eigenvectors (Continued 2)
In this module, we are going to be looking at Hyperplanes, Halfspace, Eigenvalues and Eigenvectors with their examples.
- What are Hyperplanes and what is the concept of Halfspace?
- What are the Eigenvalues and Eigenvectors (Part 1)?
- What are the Eigenvalues and Eigenvectors (Part 2)?
Module-18: Linear Algebra - Distance, Hyperplanes, and Halfspaces, EigenValues, Eigenvectors (Continued 3)
The objective of this module is to learn about Connections between eigenvectors, column space, and null space.
- What is the connection between eigenvectors, column space and null space (Part 1)?
- What is the connection between eigenvectors, column space and null space (Part 2)?
- What example is taken to explain the connection between eigenvectors, column space and null space?
Module-19: Statistical Modelling
We will characterize random phenomena: what they are and how probability can be used as a measure for describing them.
- What are random and discrete phenomena?
- What is Probability and what are Exclusive and Independent events?
- What are the different rules in Probability and what is Conditional Probability?
- How to illustrate Conditional Probability through an example?
Module-20: Random variables and Probability Mass/Density Function
In this module, we will introduce the notion of a random variable and the idea of probability mass and density functions. We will also see how to characterize these functions, the properties of a PDF, the computation of probabilities using R, and the multivariate normal distribution.
- What is a Random Variable (RV) and Probability Mass/Density Function (PDF)?
- What is the Binomial Mass Function and Gaussian or Normal Density Function?
- What is a Chi-square density function and what is the moment of a pdf?
- What are the properties of a Gaussian RV, how to compute probabilities using R and what are the other related functions in R?
- What is the joint pdf of two continuous RVs and what is Multivariate Normal Distribution?
Module-21: Simple Statistics
In this module, we will introduce a few statistical measures and how they are used in analysis.
- What is the need for sampling, what are its basic concepts and what are the two parts of statistical analysis?
- What are the mean, median and mode?
- What are the measures of spread and properties of sample mean and variance?
- What are the different types of plots for graphical analysis?
Module-22: Hypotheses Testing
In this module, we will introduce the basics of hypothesis testing, give some motivation for it, and look at some cases of hypothesis testing.
- What is the motivation behind hypothesis testing, what is hypothesis testing and what is its procedure?
- What are one-sided and two-sided tests?
- What are the different errors in hypothesis testing?
- How is hypothesis testing for the mean illustrated using an example?
- How is hypothesis testing for differences in mean illustrated using an example?
- How is hypothesis testing for differences in variance illustrated using an example?
Module-23: Optimization for Data Science
We will start with a general description of the optimization problem and point out why understanding optimization matters from a data science perspective. We will also introduce the various types of optimization problems and then focus on the univariate optimization problem.
- What is the concept of Optimization?
- What are the components and types of optimization problem?
- What is Univariate Optimization problem and what is the concept of Local and Global Optimum?
- What are the conditions for Local Optimum in Univariate Optimization Problem?
Module-24: Unconstrained Multivariate Optimization
This module covers the unconstrained multivariate optimization problem, the analytical conditions for a minimum in the multivariate case, and how the conditions in the univariate case translate to the multivariate case.
- What is Multivariate Optimization problem?
- What is the concept of Local and Global Optimum in Multivariate Optimization Problem?
- What are the conditions for Local Optimum in Univariate Optimization Problem?
Module-25: Unconstrained Multivariate Optimization (Continued)
In this module, we will learn Directional search for solving an Unconstrained multivariate optimization problem.
- How to use a directional search to solve a multivariate optimization problem?
- How to mathematically interpret the solution to the multivariate optimization problem?
- What are Steepest descent and optimum step size?
Module-26: Gradient (Steepest) Descent (OR) Learning Rule
This module presents a numerical example of how gradient descent works in optimization; in many cases this is also called the learning rule.
- What is the first step in the learning rule?
- What are the second and third steps in the learning rule?
- What is the fourth step in the learning rule?
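To make the learning rule concrete, here is a minimal gradient descent sketch in plain Python (for illustration only; the course itself works in R). The objective function, the starting point, and the fixed step size of 0.1 are assumptions made for this example, and a fixed step stands in for the optimum step size discussed in the course.

```python
def gradient_descent(grad, x0, step=0.1, iters=200):
    """Repeat the update x <- x - step * grad(x), starting from x0."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


# Hypothetical objective f(x, y) = (x - 3)^2 + 2 * (y + 1)^2,
# whose minimum is at (3, -1).
def grad_f(p):
    x, y = p
    return [2 * (x - 3), 4 * (y + 1)]


x_min = gradient_descent(grad_f, [0.0, 0.0])
print(x_min)
```

Each iteration moves against the gradient, so the iterates approach the minimizer as long as the step size is small enough for the function at hand.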
Module-27: Multivariate Optimization with Equality Constraints
In this module, we will study how to solve the multivariate optimization problem with equality constraints and the effect of equality constraints on the optimal solution.
- What is Multivariate optimization problem with equality constraints (part 1)?
- What is Multivariate optimization problem with equality constraints (part 2)?
- What is Multivariate optimization problem with equality constraints (part 3)?
Module-28: Multivariate Optimization with Inequality Constraints
In this module, we will study how to solve the Multivariate optimization problem with inequality constraints and the effect of inequality constraints on the optimal solution.
- What is Multivariate optimization problem with inequality constraints (part 1)?
- What is Multivariate optimization problem with inequality constraints (part 2)?
- What is Multivariate optimization problem with inequality constraints (part 3)?
- What is Multivariate optimization problem with inequality constraints (part 4)?
- What is Multivariate optimization problem with inequality constraints (part 5)?
- What is Multivariate optimization problem with inequality constraints (part 6)?
Module-29: Introduction to Data Science
The objective of this module is to learn about the various techniques in data science, types of problems and reasons for various techniques available in data science.
- What are the various techniques used for solving problems in Data Science?
- What are classification problems (part 1)?
- What are classification problems (part 2)?
- What are the functional approximation problems?
- Why are there many techniques for solving the two types of problems (part 1)?
- Why are there many techniques for solving the two types of problems (part 2)?
Module-30: Solving Data Analysis Problems - A Guided Thought Process
In this module, we are going to take a very simple example to illustrate how you should think about solving data science problems. At the end of it, we will come up with a useful flowchart.
- How to solve Data Analysis Problem (part 1)?
- How to solve Data Analysis Problem (part 2)?
- How to solve Data Analysis Problem (part 3)?
- What is the conceptual framework for solving Data Analysis Problems?
Module-31: Predictive Modeling
We are going to introduce the notion of correlation, its types, and what they are useful for.
- What is correlation and what are its various measures?
- What is Pearson's Correlation and how to apply it to Anscombe's data?
- What is Spearman Rank Correlation and how to apply it to Anscombe's data?
- What is Kendall Rank Correlation Coefficient and how to apply it to Anscombe's data?
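As a small illustration of two of these measures, the sketch below computes Pearson's correlation and the Spearman rank correlation for the first of Anscombe's four data sets. It uses plain Python for illustration only (the course itself works in R); the function names are made up for this example, and the simple ranking helper assumes there are no tied values.

```python
from math import sqrt


def pearson(xs, ys):
    """Covariance of xs and ys divided by the product of their spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """Rank each value from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = float(rank)
    return result


def spearman(xs, ys):
    """Spearman's rho is Pearson's correlation applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Anscombe's first data set.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
r = pearson(x, y)
rho = spearman(x, y)
print(round(r, 3), round(rho, 3))
```

Pearson's measure responds to linear association, while the rank-based measure only looks at the ordering of the values, which is why the two can disagree on data that is monotonic but not linear.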
Module-32: Linear Regression
In this module, we are going to introduce Regression and its process and also the method of linear regression technique for analyzing data and building models.
- What is regression and what are its types?
- What are the regression methods and what is the regression process?
- How is the Concept of Ordinary Least Squares (OLS) applied to Linear Regression Model?
- How is the Concept of Ordinary Least Squares (OLS) applied to Linear Regression Model (continued)?
- How to test the goodness of fit of OLS Model?
Module-33: Model Assessment
In this module, we are going to assess whether the linear model we have fitted is reasonably good and decide whether its coefficients are significant.
- What questions should be asked in the assessment of an OLS model?
- What are the properties of the estimates?
- What are the confidence intervals on regression coefficients and how to perform hypothesis tests on them?
- What are the definitions for Sum Squared Quantities and what is F-Test for selecting a model?
- How is the F-test applied to an example in R?
Module-34: Diagnostics to Improve Linear Model Fit
In this module, we will assess the linear model on the Anscombe data sets and look at another way of assessing whether a linear model is adequate: residual plots.
- What are the drawbacks of applying Linear Model to Anscombe's dataset?
- What are residual plots and how are they used for assessment of models?
- How are residuals used for checking normality of errors, non-uniform error variance, and outliers in data?
- How is outlier detection illustrated with the help of an example?
Module-35: Simple Linear Regression Model Building
In this module, we are going to implement simple linear regression in R. As part of this, we will load the data from a .txt file, plot the data, build the linear model, and interpret the summary of the model.
- How to load and view the data, what is its structure and how to visualize it?
- How to build a Simple Linear Regression Model?
Module-36: Simple Linear Regression Model Assessment
In this module, we are going to look at simple linear regression model assessment, including how to identify the significant coefficients in the linear model.
- What is the First Level Model Assessment?
Module-37: Simple Linear Regression Model Assessment (Continued)
This module covers the second level of model assessment. We are going to see whether we can improve the quality of the linear model and whether we can identify bad measurements (outliers) in the data.
- What are outliers and how to identify them by residual analysis?
- How to remove outliers, check for the need of refinement and build the refined model?
Module-38: Multiple Linear Regression
The objective of this module is to learn Multiple Linear Regression problems which consist of one dependent variable, but several independent variables, and solving multiple linear regression problem.
- What is the Multiple Linear Regression Problem?
- How to solve the Multiple Linear Regression Problem (part 1)?
- How to solve the Multiple Linear Regression Problem (part 2)?
- How to solve the Multiple Linear Regression Problem (part 3)?
- How to solve the Multiple Linear Regression Problem (part 4)?
Module-39: Cross-Validation
In this module, we will learn about cross-validation, which is very useful in model building, and use it on a validation data set to determine the optimal number of parameters.
- What is the motivation behind cross-validation and what is Bias-Variance trade-off on the test data set?
- What are Training and Validation Datasets and what is a Validation Set Approach and its example?
- How is sampling of small data sets done and what are Leave-one-out cross-validation (LOOCV) and k-Fold Cross Validation?
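The splitting step behind k-fold cross-validation and LOOCV can be sketched as follows, in plain Python for illustration only (the course itself works in R). The function names are hypothetical, and the folds are taken as contiguous blocks without shuffling, which is a simplification; in practice the observations are usually shuffled first.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    if not 1 < k <= n:
        raise ValueError("k must satisfy 1 < k <= n")
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_validation_splits(n, k):
    """Yield one (train, validation) pair of index lists per fold."""
    folds = k_fold_indices(n, k)
    for i, validation in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, validation


splits = list(train_validation_splits(10, 5))  # 5-fold on 10 observations
loocv = list(train_validation_splits(4, 4))    # k = n gives leave-one-out
print(splits[0])
```

Each observation lands in the validation set exactly once, and the model would be refitted on the remaining indices for every fold.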
Module-40: Multiple Linear Regression Model Building and Selection
We are going to build a multiple linear regression model, look at the model summary, identify insignificant variables, discard them, and rebuild the model. We will also look at model selection.
- How to load, read and view the data, and how to plot a pairwise scatter plot for it?
- How to build the Multiple Linear Regression Model?
Module-41: Classification
In this module, we will see the various classification problems and some characteristics of classification problems.
- What is classification and what are Binary and Multi-Class classification problems?
- What are Linearly Separable and Non Linearly Separable problems?
- How to solve classification problems?
Module-42: Logistic Regression
In this module, we will learn the basic idea of Logistic Regression.
- What is Logistic Regression and what are the various aspects of a Binary classification problem?
- What are Linear and Log models and what is the sigmoid function?
- How to estimate parameters?
- What is the Log-likelihood function?
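The sigmoid function and the log-likelihood can be written down in a few lines. The sketch below uses plain Python for illustration only (the course itself works in R); the data points and the two candidate parameter settings are made up for this example.

```python
from math import exp, log


def sigmoid(z):
    """Map any real number z to a probability in (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def log_likelihood(weights, bias, xs, ys):
    """Sum of y * log(p) + (1 - y) * log(1 - p), with p = sigmoid(w.x + b)."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = bias + sum(w * xi for w, xi in zip(weights, x))
        p = sigmoid(z)
        total += y * log(p) + (1 - y) * log(1 - p)
    return total


# Hypothetical one-feature data: small values are class 0, large are class 1.
xs = [[0.5], [1.5], [3.0], [4.0]]
ys = [0, 0, 1, 1]
ll_poor = log_likelihood([0.0], 0.0, xs, ys)     # predicts p = 0.5 everywhere
ll_better = log_likelihood([2.0], -4.5, xs, ys)  # boundary near x = 2.25
print(ll_poor, ll_better)
```

Estimating the parameters amounts to searching for the weights and bias that make this log-likelihood as large as possible.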
Module-43: Logistic Regression (Continued)
In this module, we will take a very simple example with several data points to show how logistic regression works in practice. We will also introduce the notion of regularization, which helps avoid overfitting when doing logistic regression.
- What is a Logit Model and what is an example problem for it?
- How is the problem solved using Logistic Regression?
- What is Regularization in Logistic Regression?
Module-44: Performance Measures
The objective of this module is to look at the typical performance measures used once a classifier is built, including the ROC curve.
- What is the result of running an R code for any classifier?
- How to measure performance?
- How are the performance parameters illustrated through an example?
- What is ROC?
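A few of the usual performance measures can be computed directly from the counts in a confusion matrix. The sketch below uses plain Python for illustration only (the course itself works in R); the labels are made up, the function names are hypothetical, and it assumes both classes occur in the data so that no denominator is zero.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (TP, FP, TN, FN) for a binary classifier."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def performance(actual, predicted, positive=1):
    """Accuracy, sensitivity and specificity from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(actual, predicted, positive)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical true and predicted labels.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 0, 1]
scores = performance(actual, predicted)
print(scores)
```

An ROC curve is obtained by recomputing sensitivity and (1 - specificity) while the classification threshold is varied.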
Module-45: Logistic Regression Implementation in R
In this module, we are going to look at a case study and the problem statement associated with it, and solve the case study using R.
- What is the Automotive Crash Testing problem?
- How to solve the Automotive Crash Testing problem using R?
- How to build Logistic Regression model and find the odds for the Automotive Crash Testing problem?
- How to plot the probabilities and what is the confusion matrix?
Module-46: K - Nearest Neighbors (kNN)
In this module, we are going to understand the very powerful classification algorithm called the k-nearest neighbors and also understand different things to consider before applying this algorithm.
- What is a k Nearest Neighbor (kNN)?
- What are the assumptions and algorithm for kNN?
- How is kNN illustrated?
- What are the different things to be considered before applying kNN algorithm and how to select parameters?
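The core of the kNN idea fits in a few lines. The sketch below uses plain Python for illustration only (the course itself uses R and its knn() function); the function name `knn_predict`, the Euclidean distance, the choice k = 3, and the data points are assumptions made for this example.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-class training data in two dimensions.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 5.0)]
labels = ["A", "A", "A", "B", "B", "B"]
pred_a = knn_predict(points, labels, (1.2, 1.4))
pred_b = knn_predict(points, labels, (6.2, 6.1))
print(pred_a, pred_b)
```

Because the prediction depends on raw distances, the scale of each feature and the value of k are among the things to settle before applying the method.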
Module-47: K - Nearest Neighbors implementation in R
In this module, we are going to look at a case study for implementing the k-NN algorithm and the problem statement associated with it, and solve the case study using R.
- What is the problem statement for the case study of Automotive Service Company?
- How to solve the case study problem of Automotive Service Company using R (Part 1)?
- How to solve the case study problem of Automotive Service Company using R (Part 2)?
- How to implement k-Nearest Neighbors using knn() function and how to apply knn algorithm on data?
- What are the results of applying the knn algorithm?
Module-48: K - means Clustering
The objective of this module is to illustrate the concept of K-means clustering and its disadvantages.
- What is K-means clustering and how is it described?
- How does the K-means clustering algorithm work, with an example?
- How to determine the number of Clusters (K) and what are the disadvantages of K-means?
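The alternating assign-and-update loop of K-means can be sketched as below, in plain Python for illustration only (the course itself uses R and its kmeans() function). The function name `k_means`, the data points, and the explicitly supplied starting centroids are assumptions made for this example; library implementations typically choose the starting centroids themselves.

```python
from math import dist


def k_means(points, centroids, iters=20):
    """Alternate between assigning points and updating centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data with two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]
centroids, clusters = k_means(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

The result depends on the starting centroids and on the chosen number of clusters, which is one of the disadvantages of the method.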
Module-49: K - means Implementation in R
In this module, we are going to look at a case study for implementing the K-means clustering algorithm and the problem statement associated with it, and solve the case study using R.
- What is the problem statement for the case study of Clustering of trips and its solution?
- How to implement k-means clustering using kmeans() function and its results?
Module-50: Data Science for Engineers - Summary
A quick summary of the course and the next logical step after completing it.
- What is the overall course summary and what is the next logical step after learning this course?
Data Science for Engineers - Final Quiz

R Programming – A Practical Approach

Topics to be covered

R- Basics
In this chapter, we start with a step-by-step installation of R and RStudio, a GUI-based IDE for the R language. We also explain package installation in R, built-in datasets in R, manual data entry, data importing, and tabular-to-row data conversion. Finally, we look at the default colors available in R and a more elaborate set of color options named "Colorbrewer".
- Steps to Install 'R'
  In this lesson, you will learn the steps to install R on your system.
- R-Studio Installation
  In this lesson, you will learn to install RStudio, which is a GUI-based Integrated Development Environment (IDE).
- Using R Materials
  In this lesson, you will learn to download R materials and then use them in RStudio.
- R-Studio Interface
  In this lesson, you will learn about the different parts of the RStudio interface, such as the R Script, Console, R Environment, and Graphical Output sections.
- Steps to Install Packages
  In this lesson, you will learn about various packages that are available in R and how to use them.
- Default Data-Sets in R
  In this lesson, you will learn about the default datasets that are already installed in R. These datasets ship in packages that are installed and loaded by default.
- Manual Data Entry
  R Programming provides different ways to enter the data manually. In this lesson, you will learn about manual data entry in R.
- Data Importing
  In R, there are different cases in which the data is required to be imported in order to use it. In this lesson, you will learn to import the data.
- Tabular to Row Data Conversion
  In R, the data has to be stored in a specific format so that it can be easily understood and used. In this lesson, you will learn to arrange the data in rows and columns.
- R - Colors
  We use colors in R when presenting data; with their help, graphical output looks a lot better. In this lesson, you will learn about colors in R.
- Overview - 'Colorbrewer'
  In this lesson, you will learn about an external package named RColorBrewer. Installing this package makes the ColorBrewer palettes available in R.
- Colors in R: Summary
  In this lesson, we will summarize what we have learned so far about colors in R and discuss other applications of color in R.
Introduction to Charts
This chapter covers the details of various charts in R. R has multiple libraries that can be used to create charts such as bar charts, pie charts, histograms, and box plots. A bar graph or bar chart represents data as bars. A pie chart, on the other hand, represents data or values as sectors within a circle, each shown in a different color to distinguish them. A box plot is used to get information about possible outliers in the data sample. Various ways to save plots as images are also explained in this unit.
- Bar Charts
  The R language is well known for graphical representation, and the bar chart is a very good example of this. In this lesson, you will learn about bar charts.
- Pie Charts
  In this lesson, you will learn about Pie Charts for graphical representation. A Pie Chart is also a very good source of data representation.
- Histograms
  The histogram is suitable for visualizing distribution of numerical data over a continuous interval, or a certain time period. In this lesson, you will learn about Histograms.
- Box-Plots
  When you need to identify possible outliers while analyzing data, box plots are used. In this lesson, you will learn about box plots.
- Customized Graphs
  In this lesson, you will learn to customize the graphs and also see the effect of customization on your graph.
- Images
  In this lesson, you will learn to present the data in the image format. In order to do that, you will first import the data in the image form and then present the data through the image.
- Layering Plots: Summary
  In this lesson, you will get the summary for the plotted datasets by using the Layered Plots.
Introduction to Statistics
This chapter covers basic statistical concepts in R programming, viz. frequencies, descriptives, hypothesis testing, and chi-square testing. The frequency distribution of a data variable is the count of observations falling within each of a collection of non-overlapping categories. Descriptive statistics give summary statistics of the data and are the basis for more advanced analysis. We then look at inferential statistics methods: single proportion testing, single mean testing, and chi-square testing, which are used to infer results based on the sample characteristics and hypothesized values. We also perform a univariate analysis to find patterns in the data.
- Frequencies
  In this lesson, you will now learn to calculate the frequency of data and analyze the data after changing it from frequency to density.
- Descriptives
  In this lesson, you will learn about descriptive statistics. These are the figures used to summarize data.
- Single Proportion Testing
  In this lesson, you will be introduced to inferential statistics through single proportion testing.
- Single Mean Testing
  In this lesson, you will learn about single mean testing, which tests the mean of a sample against a target value.
- Chi-Square Test
  In this lesson, you will learn about the chi-square test, which is used to determine the goodness of fit for a categorical variable.
- Univariate Analysis
  In this lesson, you will learn about univariate analysis, which is used to examine and present the sample of a single variable.
- Descriptive Statistics: Summary
  In this lesson, you will get the summary about Descriptive Statistics through a dataset.
Manipulating Data
This chapter covers the details of working with data. Data can contain outliers, and their treatment is explained. Outliers are observations that occur very infrequently and might be the result of errors in observation. Proper treatment of data is necessary for unbiased results. This might include subsetting, sorting, extracting unique observations, renaming variables, creating new variables, etc. Each of these tasks can be accomplished using a set of newly introduced packages.
- Outliers
  In this lesson, you will learn to treat outliers present in the data. To do this, you will use categorical data to understand outliers.
- Transformation of Variables
  In this lesson, you will learn to transform a variable so that it better fits the assumptions of data analysis.
- Composite Variables
  In this lesson, you will see how composite variables work, using random variables.
- Working with Missing Data
  In this lesson, you will learn to deal with missing data, which often occurs in practice. You need to treat missing values in such a way that your figures are not biased.
- Working with Outliers: Summary
  In this lesson, you will get the complete summary of whatever you have learned so far in outliers.
Managing Huge Data
In this chapter, we work with cases, subgroups, and files. A data set is a self-contained collection. It consists of cases, which are the objects in the collection, with each case having one or more attributes or qualities known as variables. This chapter also covers working with subgroups and merging files. Merging means that different datasets or files are combined into a single dataset or file, and R provides methods for doing so.
- Working with Cases
  In this lesson, you will learn the method of customizing your analysis for a particular parameter in a set of data.
- Working with Subgroups
  In this lesson, you will see a demonstration of how to obtain the descriptive statistics for all values of a particular variable at once.
- Working with Files - Merging
  In this lesson, you will learn a very useful method of combining different data into a single unit. This method is called merging.
- Working with Subgroups: Summary
  In this lesson, you will get a complete summary of all the analysis in this section on subgrouping.
Association: Presentation
In this chapter, bar charts, box plots and scatter plots have been demonstrated. A bar chart or bar graph represents data with the help of bars or rectangles. The value of each variable is shown by the height or length of its rectangle, whether vertical or horizontal. A box plot is an exploratory graphic that summarizes the features of quantitative variables. A scatter plot pairs up the values of two quantitative variables in a dataset and represents them as points in a Cartesian diagram.
- Bar Charts
  In this lesson, you will learn about the different ways to analyze your data with the help of bar charts.
- Box Plots
  In this lesson, you will make use of the iris dataset to summarize and present data with the help of Box Plots.
- Scatter Plots
  In this lesson, you will explore the quantitative relation between the variables with the help of a scatter plot using the iris dataset and the swiss dataset.
- Working with Plots: Summary
  In this lesson, you will get a summary of everything covered in this section on plots.
Associations: Statistics
In this chapter, statistical concepts such as correlation, regression and proportions have been covered in detail. Correlation is a statistical technique that shows whether pairs of variables are related and how strongly. Regression is a fundamental tool for statistical analysis that is frequently used in various research fields. Bivariate regression is the simplest linear regression procedure. The later part of the chapter also demonstrates a few tests, such as the t-test, one-factor analysis of variance and tests of proportions.
- Correlation
  In this lesson, you will learn about correlation. In mathematical terms, the correlation of two variables is their covariance divided by the product of their standard deviations.
- Bivariate Regression
  In this lesson, you will explore bivariate regression with the help of an appropriate regression line and vector equations.
- T-Test
  In this lesson, you will learn about T-Statistics by comparing the calculated values of two samples with the T-test using the Iris Dataset.
- Paired T-Test
  In this lesson, you will learn to examine the difference between two samples by creating two Random Variables, through Paired T-test.
- ANOVA
  In this lesson, you will learn to test the similarities of the content from two populations or groups with ANOVA test.
- Proportions
  In this lesson, you will learn to compare the categorical groups with the help of proportion.
- Chi-Square Test
  In this lesson, you will learn to use the Chi-Square test to test for independence between two specific variables.
- Statistics for Bivariate Associations
  In this lesson, you will learn about the statistics of Bivariate Associations using some packages available in R.
- Association Stats: Summary
  In this lesson, you will get a summary of all the tests and other statistics used in this section.
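The definition of correlation given in the Correlation lesson (covariance divided by the product of the two standard deviations) can be sketched in a few lines. The course works in R; this Python version, the function name `pearson_r`, and the sample data are assumptions made for illustration only.

```python
import statistics

def pearson_r(x, y):
    """Correlation = sample covariance / (stdev of x * stdev of y)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    # Sample covariance, using the same n - 1 divisor as statistics.stdev.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (statistics.stdev(x) * statistics.stdev(y))

# Hypothetical paired observations.
hours = [1, 2, 3, 4, 5]
marks = [52, 55, 61, 64, 70]
r = pearson_r(hours, marks)
```

A value of `r` close to 1 indicates a strong positive linear relation, a value close to -1 a strong negative one, and a value near 0 little or no linear relation.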
Advanced Charts
In this chapter, the methods of creating bar charts for means, scatter plots for grouped data, scatter plot matrices and a very interesting and visual 3D scatter plot have been covered in detail.
- Bar Charts for Mean
  In this lesson, you will learn about drawing the bar charts for multiple variables defined by different categories.
- Scatter Plots for Grouped Data
  In this lesson, you will learn to plot a Scatter Plot for multiple variables by loading the CSV file in R.
- Scatter Plot Matrices
  In this lesson, you will learn to plot a Scatter Plot by loading the matrix data.
- 3D Scatter Plots
  In this lesson, you will learn to plot grouped data with the help of a 3D scatter plot.
- Charts for Multiple Variables: Summary
  In this lesson, you will get the complete summary of various plotting techniques and Bar Charts.
Multiple Variable Statistics
In this chapter, some relatively advanced topics such as multiple regression, two-factor ANOVA, cluster analysis, and principal component and factor analysis have been covered in detail. These topics are very important, especially multiple regression, which is used extensively in research papers and industry to establish the relationship between variables.
- Multiple Regression
  In this lesson, you will learn about multiple regression, the most common tool in multiple variable statistics, with the help of a built-in dataset. This tool is used to analyze the data.
- Two-Factor ANOVA
  In this lesson, you will learn to carry out a two-factor ANOVA with the help of the ToothGrowth dataset, through which you will learn about the interaction between two categorical terms.
- Cluster Analysis
  In this lesson, you will learn about cluster analysis which creates clusters or groups based on the values of variables.
- Principal Component / Factor Analysis
  In this lesson, you will learn to use principal component analysis to find the components that explain the most variation in the data.
- Multiple Variable Statistics: Summary
  In this lesson, the course concludes with a summary of what has been covered in this section so far.
- R Programming Final Quiz
  In this lesson, there is a quiz containing a set of questions. This test lets candidates evaluate their own overall understanding of the course content. The course will not be considered successfully completed if this quiz is omitted or ignored.

Topics to be covered

Module_1: Introduction, Knowledge Discovery Process
Here, we have looked at data mining and its motivations, followed by the drawbacks of traditional data analysis. We have then discussed data and data mining functionalities, and studied the process of knowledge discovery and the issues in data mining. Finally, the typical architecture of data mining has been covered in detail.
- Why is Data Mining important?
- What is Data Mining, what are the drawbacks of Traditional Data Analysis?
- On what kinds of data is data mining done, and what are the functionalities of data mining?
- What is the process of Knowledge Discovery in Databases (KDD)?
- What are the major issues in Data Mining and what is the typical architecture of Data Mining?
Module_2: Data Preprocessing - I
Here, we have looked at data, the different types of attributes and their properties, and the different types of data sets.
- What is Data?
- What are the different types of attributes?
- What are the properties of attribute values?
- What are Discrete and Continuous Attributes and what are the various types of data sets?
- What are the different types of Data?
Module_3: Data Preprocessing - II
The main objective of this lecture is to understand the issues that are considered before performing the preprocessing along with some of the preprocessing techniques.
- What are the various data quality problems?
- What are the different kinds of preprocessing algorithms (part 1)?
- What are the different kinds of preprocessing algorithms (part 2)?
- What are the different kinds of preprocessing algorithms (part 3)?
Module_4: Association Rules
The focus of this lecture is on understanding association rule mining and the steps in discovering association rules.
- What is Association Rule Mining?
- What are the steps in discovering Association rules (part 1)?
- What are the steps in discovering Association rules (part 2)?
Module_5: Data Frames
Here, we have looked at frequent itemset generation, which is computationally expensive, and then covered the Apriori principle and its algorithm.
- What is the concept of Frequent Itemset Generation?
- What is the Apriori Principle and its Algorithm?
Module_6: Rule Generation
Here, we have continued to look at the Apriori algorithm and also covered rule generation for the Apriori algorithm, along with the evaluation of patterns in terms of their interestingness.
- What is the Apriori Algorithm (continued)?
- How to efficiently generate rules from frequent itemsets for the Apriori Algorithm?
- How to evaluate patterns and compute interestingness measures?
Module_7: Classification
Here, we have covered what classification means and the classification task, along with classification techniques.
- What is Classification (part 1)?
- What is Classification (part 2)?
Module_8: Decision Tree - I
The focus of this lecture is to understand Decision trees along with the study of the representation of rules in Decision trees.
- What is a Decision Tree and what is the Classification Task?
- What is a Decision Tree Algorithm?
- How to implement a Decision Tree?
Module_9: Decision Tree - II
Here, we have looked at obtaining a decision tree for a classification problem, including an example.
- How to create a Decision Tree (part 1)?
- How to create a Decision Tree (part 2)?
- How to create a Decision Tree (part 3)?
Module_10: Decision Tree - III
Here, we have continued looking at obtaining a decision tree and also covered the top-down construction rule for obtaining a decision tree for a classification problem.
- How to create a Decision Tree (part 4)?
- How to use the Top-Down Construction rule in Decision Tree Creation?
- What is the best attribute to split and what is the principle of Decision Tree Construction?
- What is Entropy?
Module_11: Decision Tree - IV
We have continued looking at obtaining a decision tree and also covered the problems with obtaining a decision tree, along with a few extensions of the basic tree algorithm and some of the advantages of decision trees.
- What is the concept of Decision Tree Pruning and Decision Tree Extensions?
Module_12: Bayes Classifier I
We have covered probability distributions through an example and looked at an important concept known as class conditional probabilities.
- Class Conditional Probabilities example (part 1)
- Class Conditional Probabilities example (part 2)
- Class Conditional Probabilities example (part 3)
Module_13: Bayes Classifier II
Here, we have looked at the posterior distribution for the previous example and covered the MAP representation of the Bayes classifier and the multiclass classifier, along with the multivariate Bayes classifier.
- What is Posteriori Probability?
- How is the MAP representation of the Bayes Classifier and the Multiclass classifier done?
- What is Multivariate Bayes Classifier?
Module_14: Bayes Classifier III
Here, we have continued to look at the Multivariate Bayes classifier and its special case.
- What is Multivariate Bayes Classifier (part 1)?
- What is Multivariate Bayes Classifier (part 2)?
- What is Multivariate Bayes Classifier (part 3)?
Module_15: Bayes Classifier IV
Here, we have looked at the types of distance measures between two distributions, and an example of a Bayes classifier along with the Naive Bayes classifier.
- What are the different types of distances between two distributions?
- Example of Bayes Classifier
- What is the Naive Bayes Classifier, with an example?
Module_16: Bayes Classifier V
We have continued to look at the Naive Bayes classifier and its example. We have also looked at conditional independence and an exercise on it, along with a comprehensive look at the directed acyclic graph (DAG).
- Example of a Naive Bayes Classifier (part 1)
- Example of a Naive Bayes Classifier (part 2)
- What is Conditional Independence?
- Exercise on Conditional Independence
Module_17: K Nearest Neighbor I
Here, we have covered the Classification algorithm called the K nearest neighbor classifier.
- Recap of Bayes Classifiers
- K Nearest Neighbour Classifiers
Module_18: K Nearest Neighbor II
The focus of this lecture is to understand the basics of K Nearest Neighbor and also understand the Voronoi diagram of the Nearest Neighbor. Also, we have looked at the distance-weighted K-NN followed by different issues of nearest-neighbor classifiers.
- What is the definition of nearest neighbors and Voronoi Diagram?
- What is Distance Weighted K Nearest Neighbour Rule and how to predict continuous values?
- What are the issues in Nearest Neighbour Classifiers?
Module_19: K Nearest Neighbor III
Here, we have looked at the K-nearest neighbor (KNN) classification technique and the computational complexity of KNN, followed by ways of reducing that complexity.
- Example of K Nearest Neighbour (KNN) Classifier (part 1)
- Example of K Nearest Neighbour (KNN) Classifier (part 2)
- What is the computational complexity in KNN Classifiers?
- What is Condensing and Condensed Nearest Neighbour?
Module_20: K Nearest Neighbor IV
Here, we have covered the reduction of computational complexity using high-dimensional search and also looked at the K-dimensional tree structure, along with some alternate terminologies in KNN.
- What is the concept of High Dimension Search?
- What is a KD-tree and how is it used for range search?
- What are the alternate terminologies in KNN?
Module_21: K Nearest Neighbor V
Here, we have looked at how to evaluate classification algorithms in order to know which one is better and which one should be chosen.
- How to evaluate a classifier?
- What are the metrics for performance evaluation?
- What are the methods for performance evaluation and model comparison?
Module_22: Support Vector Machine - I
The main objective of this lecture is to understand discriminant analysis and the case of linear discriminants: given 2 features and 2 classes, we want to draw a line that separates the classes.
- What is a Discriminant Analysis?
- What is Linear Discriminant Analysis and Design (part 1)?
- What is Linear Discriminant Analysis and Design (part 2)?
- What is Linear Discriminant Analysis and Design (part 3)?
Module_23: Support Vector Machine - II
Here, we have continued to look at linear discriminants (linear separators).
- What are Linear Separators (part 1)?
- What are Linear Separators (part 2)?
- What are Linear Separators (part 3)?
- What are Linear Separators (part 4)?
Module_24: Support Vector Machine - III
Here, we have again looked at linear discriminants (linear separators) and at bad and good decision boundaries, and then covered how to find the line with the largest margin, which gives the equation of the separating line.
- What are Linear Separators (part 5)?
- What are good and bad decision boundaries?
- How to choose the optimal linear separator (part 1)?
- How to choose the optimal linear separator (part 2)?
Module_25: Support Vector Machine - IV
Here, we have covered the primal and dual optimization problems, followed by the solution of the dual optimization problem. We have also covered the concept of quadratic programming (QP).
- What are the Primal Optimization Problem and the Dual Problem?
- What is Dual Optimization Problem (part 1)?
- What is Dual Optimization Problem (part 2)?
Module_26: Support Vector Machine - V
Here, we have covered the quadratic programming (QP) problem and looked at the Karush–Kuhn–Tucker (KKT) theorem, which can be used for solving QP problems.
- What is the Quadratic Programming (QP) problem?
- What is Karush–Kuhn–Tucker (KKT) theorem (part 1)?
- What is Karush–Kuhn–Tucker (KKT) theorem (part 2)?
- What is Karush–Kuhn–Tucker (KKT) theorem (part 3)?
Module_27: Kernel Machines
Here, we have introduced the concept of slack variables and soft and hard margins in separable cases, followed by the optimization problems. We have also looked at the kernel machine for solving the non-linearly separable class problem.
- What is the concept of Slack Variable?
- What are the hard and soft margins in an inseparable class and what is the optimization problem for non-separable classes?
- What is the Dual optimization problem for Soft Margin Hyperplane?
- What is the problem of non-linearly separable classes and how is it solved using a Kernel machine?
Module_28: Artificial Neural Networks I
The main objective of this lecture is to study neural networks, connectionism and the biological neuron. We have also covered the artificial neural network and its simplest model, the perceptron.
- What are Neural Networks, what is Connectionism and Biological Neuron?
- When should Artificial Neural Networks (ANNs) be considered and what is the Perceptron?
- What is the Perceptron (continued)?
- How does the Perceptron work as a Linear Discriminant?
Module_29: Artificial Neural Networks II
Here, we have looked at the mechanism of finding the correct set of weights to solve a prediction problem.
- What are the training rules to be followed for determining weights in ANNs?
- What is the Perceptron training rule?
- What is the Delta rule and Gradient Descent?
- What is the Gradient Descent Technique and Algorithm?
Module_30: Artificial Neural Networks III
Here, we have explained the difference between the perceptron and gradient descent algorithms, the logic gates that can be realized with the perceptron model, and the multilayer perceptron.
- Comparison between Perceptron and Gradient Descent algorithm
- How to realize logic gates using Perceptron?
- What are Multi-Layered Perceptrons (MLP)?
- What are MLPs (continued)?
Module_31: Artificial Neural Networks IV
Here, we have covered the sigmoid unit and the weight update rule for the multilayer perceptron, along with the issues with ANNs and extensions of ANNs.
- How to use a Sigmoid function in Multi-Layered Perceptron and its training rules?
- What is Forward and Back Propagation?
- What is the error gradient for a sigmoid unit?
- What is the procedure of backpropagation?
Module_32: Clustering I
Here, we have looked at the basics of clustering and the scatter coefficient for assessing the goodness of a clustering, including an understanding of hierarchical and partitional clustering.
- What is Clustering?
- How to find groups of similar objects?
- What is the distance measure?
- What is partitional clustering?
- What is Hierarchical clustering?
Module_33: Clustering II
Here, we have covered the desirable properties of a clustering algorithm, and hierarchical agglomerative and hierarchical divisive clustering. We have also looked at the three ways of measuring the distance between two clusters.
- What are the desirable properties of a Clustering algorithm and what is Hierarchical Agglomerative Clustering?
- What is Hierarchical Divisive Clustering and how to measure closeness between two clusters?
- What is Single Linkage Clustering, its advantages and disadvantages?
Module_34: Clustering III
Here, we have looked at the K-means clustering algorithm and an example of it.
- What is the K-Means Clustering Algorithm (part 1)?
- What is the K-Means Clustering Algorithm (part 2)?
- Example of K-Means Clustering Algorithm
Module_35: Clustering IV
The main objective of this lecture is to learn about the idea of density-based clustering and a density-based algorithm, DBSCAN (Density-Based Spatial Clustering of Applications with Noise).
- What is a Density-based Clustering algorithm?
- How to measure the density of a point?
- Some definitions related to Density-Based Clustering
- What is the DBSCAN Algorithm?
Module_36: Clustering V
Here, we have covered the hybrid clustering algorithm CLARA, followed by the evaluation of clustering algorithms.
- What is a Hybrid Clustering Algorithm?
- What is Cluster Validity, what are the different aspects of Cluster Validation and what are the measures of Cluster Validity?
- What is Scatter Coefficient and what are internal and external measures of Cluster Validity?
Module_37: Regression I
The focus of this lecture is to understand the concept of the regression problem, univariate and multivariate regression, and the regression model, along with the most common regression technique, linear regression.
- What is Regression?
- What is Univariate and Multivariate Regression?
- What is a Regression Model?
- What is Linear Regression?
Module_38: Regression II
Here, we have looked at the Linear Regression model.
- What is Linear Regression Model (part 1)?
- What is Linear Regression Model (part 2)?
- What is Linear Regression Model (part 3)?
- What is Linear Regression Model (part 4)?
Module_39: Regression III
Here, we have continued to look at the linear regression model and some of the limitations of linear regression, followed by non-linear regression.
- What is the Error in Linear Regression Model?
- How to solve Error in Linear Regression Model?
- What are the limitations of the Linear Regression model and what is Non-Linear Regression?
Module_40: Regression IV
Here, we have covered the problem of overfitting, the Occam's razor principle and the time series prediction problem along with its solution.
- What is the problem of Overfitting?
- How are Complexity and Goodness of Fit compared and what is the Occam's Razor Principle?
- What are Complexity and Generalization?
- What are Training, Validation and Test Data and what is the time series prediction problem?
Module_41: Dimensionality Reduction I
- What is the purpose of Dimensionality Reduction?
- What are the Dimensionality Reduction Techniques?
- What is the Evaluation Index, Kullback-Leibler Divergence?
- What are the search algorithms to find the best subset and what are the techniques of feature subset selection?
Module_42: Dimensionality Reduction II
- What is Feature Selection?
- What is the Feature Extraction Problem?
Module_43: Tutorial
- Basics of R programming (part 1)
- Basics of R programming (part 2)
- How to use Apriori Algorithm for accessing Association Rules from a dataset?
- How to Generate the Decision Trees?
- How to apply the R program to use K-Means Clustering?
- How to classify the data based on Naive Bayesian classification?
Data Mining - Final Quiz