Learn to deal with concrete datasets and data analysis. Develop programs to gather, clean, analyze, and visualize data.
Course Fee: 30,000 + 18% GST
Enroll and Pay Now
Become an Expert
Skills you will gain
Programming Languages, Tools & Libraries Covered
About this Specialization
This Specialization builds on the success of the Data Scientist course and introduces fundamental R programming concepts, focused on practical learning rather than theory: dealing with concrete datasets and analyzing data using the R programming language. In the Project, you'll use the technologies learned throughout the Specialization to design and create your own applications for data retrieval, processing, and visualization.
How the Specialization Works
Prutor.ai Specialization is a series of courses that help you master a skill. To begin, enroll in the Specialization directly, or review its courses and choose the one you'd like to start with. When you subscribe to a course that is part of a Specialization, you’re automatically subscribed to the full Specialization. It’s okay to complete just one course — you can pause your learning or end your subscription at any time. Visit your learner dashboard to track your course enrollments and your progress.
Every Specialization includes a hands-on project. You'll need to successfully finish the project(s) to complete the Specialization and earn your certificate. If the Specialization includes a separate course for the hands-on project, you'll need to finish each of the other courses before you can start it.
Earn a Certificate
When you finish every course and complete the hands-on project, you'll earn a Certificate that you can share with prospective employers and your professional network.
Courses in this Specialization
In this module, we will see the course objectives and the expected outcomes of the course.
We introduce R as a programming language for performing data analysis and give a brief introduction to RStudio.
In this module, we will see how to add comments to an R file, clear the RStudio environment, and save the R workspace.
In this module, we are going to see the rules for naming variables in R and the basic data types available in R, and we will look at two basic R objects, vectors and lists, in detail.
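The distinction between these two objects can be sketched in a few lines of R (the variable names and values here are purely illustrative):

```r
# A vector holds elements of a single type; a list can mix types.
v <- c(2, 4, 6)                      # numeric vector
names(v) <- c("a", "b", "c")         # vectors can carry element names
l <- list(id = 1L, name = "demo", scores = v)   # heterogeneous list

v2 <- v * 10                         # arithmetic on vectors is elementwise
third <- l$scores[["c"]]             # list components accessed by name
```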
In this module, we are going to introduce the data frame objects of R and perform some operations on data frames.
In this module, we look at more sophisticated operations on data frames, such as recasting (reshaping) and joining data frames.
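As a minimal base-R sketch of recasting, here is a made-up long-form sales data frame reshaped to wide form; dedicated packages are often used for this in practice, but base R's `reshape()` is enough to show the idea:

```r
sales <- data.frame(store = c("A", "A", "B", "B"),
                    qtr   = c("Q1", "Q2", "Q1", "Q2"),
                    amt   = c(10, 12, 7, 9))

# Recast from long form (one row per store-quarter) to wide form
# (one row per store, one column per quarter: amt.Q1, amt.Q2).
wide <- reshape(sales, idvar = "store", timevar = "qtr",
                direction = "wide")
```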
In this module, we are going to do Arithmetic, Logical, and Matrix operations in R.
We are going to introduce functions in R and explain how to load or source functions and how to call or invoke them; we are also going to see how to pass arguments to functions.
In this module, we will see functions with multiple inputs and multiple outputs (MIMO), loading and calling a function, inline functions, and looping over objects using commands such as apply, lapply, and tapply.
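A brief sketch of these ideas; the function name and data below are made up for illustration:

```r
# A function with multiple inputs whose multiple outputs are
# returned together as a named list.
range_stats <- function(x, trim = 0) {
  list(mean = mean(x, trim = trim), max = max(x))
}
res <- range_stats(c(1, 5, 9))

# Looping over objects without writing explicit loops.
m <- matrix(1:6, nrow = 2)                            # 2 x 3 matrix
col_sums <- apply(m, 2, sum)                          # over columns: 3, 7, 11
squares  <- lapply(list(1, 2, 3), function(x) x^2)    # returns a list
by_group <- tapply(c(4, 6, 10), c("a", "a", "b"), mean)  # per-group means
```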
We are going to study the if-else-if family of constructs, for loops, nested for loops, for loops with break, and while loops.
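These constructs can be sketched as follows (the example values are arbitrary):

```r
# if-else-if family
classify <- function(x) {
  if (x < 0) "negative" else if (x == 0) "zero" else "positive"
}

# for loop with break: sum 1..10 but stop after 5
total <- 0
for (i in 1:10) {
  if (i > 5) break
  total <- total + i        # accumulates 1 + 2 + 3 + 4 + 5
}

# while loop: keep doubling while the next value stays within 100
n <- 1
while (n * 2 <= 100) n <- n * 2
```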
In this module, we are going to show how to generate basic graphics such as scatter plots, line plots, and bar plots using R, and give a brief idea of the need for more sophisticated graphics.
In this module, we will learn about linear algebra and matrices, and also learn to identify independent attributes and linear relationships among attributes.
In this tutorial session, we will solve some matrix equation problems.
In this tutorial session, we will solve some more matrix equation problems.
In this module, we will learn about vectors and the notion of distance, and then learn about unit, orthogonal, orthonormal, and basis vectors through examples.
In this module, we are going to look at the geometric representation of lines and planes, the concept of projection with examples, and the generalization of projection.
In this module, we are going to look at hyperplanes, halfspaces, and eigenvalues and eigenvectors, with examples.
The objective of this module is to learn about Connections between eigenvectors, column space, and null space.
In this module, we go on to characterizing random phenomena: what they are and how probability can be used as a measure for describing them.
In this module, we introduce the notion of a random variable and the ideas of probability mass and density functions. We also see how to characterize these functions, the properties of PDFs, computation of probabilities using R, and the multivariate normal distribution.
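In R, each distribution comes with density (`d`), cumulative probability (`p`), quantile (`q`), and random sampling (`r`) functions; a small sketch for the standard normal:

```r
d0 <- dnorm(0)                 # density at 0: 1/sqrt(2*pi), about 0.3989
p  <- pnorm(1.96)              # P(X <= 1.96), about 0.975
q  <- qnorm(0.5)               # median of the standard normal: 0
p_1sd <- pnorm(1) - pnorm(-1)  # P(-1 <= X <= 1), about 0.683
```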
In this module, we will introduce a few statistical measures and how they are used in analysis.
In this module, we introduce the basics of hypothesis testing, give some motivation for it, and look at several cases of hypothesis testing.
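As a minimal sketch of one such case, a single mean test in R with made-up measurements:

```r
# Six made-up measurements; H0: the true mean equals 4.
scores <- c(4.8, 5.2, 5.1, 4.9, 5.0, 5.3)
tt <- t.test(scores, mu = 4)
# The sample mean (5.05) is far from 4 relative to its spread,
# so the p-value is very small and H0 is rejected.
```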
We will start with a general description of the optimization problem and point out the relevance of this field from a data science perspective. We will also introduce various types of optimization problems and focus on the univariate optimization problem.
In this module, we cover the unconstrained multivariate optimization problem, the analytical conditions for a minimum in the multivariate case, and how the conditions from the univariate case translate to the multivariate case.
In this module, we will learn Directional search for solving an Unconstrained multivariate optimization problem.
In this module, we work through a numerical example of how gradient descent works in optimization; in many cases the update step is also called the learning rule.
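For a one-dimensional function, the update rule can be sketched directly (the objective function, starting point, and learning rate here are illustrative):

```r
# Minimize f(x) = (x - 3)^2, whose gradient is f'(x) = 2 * (x - 3).
grad  <- function(x) 2 * (x - 3)
x     <- 0      # starting point
alpha <- 0.1    # learning rate
for (i in 1:100) {
  x <- x - alpha * grad(x)   # the "learning rule"
}
# x converges to the minimizer, 3
```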
In this module, we will study how to solve the Multivariate optimization problem with equality constraints and effect of equality constraints on the optimal solution.
In this module, we will study how to solve the Multivariate optimization problem with inequality constraints and the effect of inequality constraints on the optimal solution.
The objective of this module is to learn about the various techniques in data science, types of problems and reasons for various techniques available in data science.
In this module, we take a very simple example and illustrate how you should think about solving data science problems; at the end, we come up with a useful flowchart.
We are going to introduce the notion of correlation, its types, and what they are useful for.
In this module, we are going to introduce regression and its process, along with the linear regression technique for analyzing data and building models.
In this module, we are going to assess whether the linear model we have fitted is reasonably good, and decide whether the coefficients of the linear model are significant.
In this module, we will assess the linear model on the Anscombe data sets; another way of assessing whether a linear model is adequate, residual plots, is also covered.
In this module, we are going to implement simple linear regression in R. As part of this, we load the data from a .txt file, plot the data, build the linear model, and interpret the summary of the model.
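A self-contained sketch of this workflow; for brevity, the data are defined inline on a known line rather than loaded from a .txt file (for a file, `read.table()` would be used):

```r
# Noise-free data on the line y = 2x + 1, for illustration only.
x <- 1:10
y <- 2 * x + 1

fit   <- lm(y ~ x)           # build the linear model
coefs <- coef(fit)           # intercept ~ 1, slope ~ 2
pred  <- predict(fit, newdata = data.frame(x = 20))
# summary(fit) prints coefficients, R-squared, and significance tests
```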
In this module, we are going to look at simple linear regression model assessment; as part of this, we also identify the significant coefficients in the linear model.
This module covers the second level of model assessment: we see whether we can improve the quality of the linear model, and whether we can identify bad measurements in the data (outliers).
The objective of this module is to learn about the multiple linear regression problem, which consists of one dependent variable but several independent variables, and to solve the multiple linear regression problem.
In this module, we will learn about cross-validation, which is very useful in model building, and use cross-validation on a validation data set to determine the optimal number of parameters.
We are going to build a multiple linear regression model, look at the model summary, identify insignificant variables, discard them, and rebuild the model; we also look at model selection.
In this module, we will see various classification problems and some characteristics of classification problems.
In this module, we will learn the basic idea of Logistic Regression.
In this module, we take a very simple example with several data points to show how logistic regression works in practice. We also introduce the notion of regularization, which helps avoid overfitting when doing logistic regression.
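A toy sketch with made-up study-hours data; note that plain `glm()` performs no regularization (regularized logistic fits are typically done with an add-on package):

```r
hours <- c(0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5)
pass  <- c(0,   0, 0,   0, 1,   0, 1,   1, 1,   1)

# Logistic regression: model P(pass = 1) as a function of hours.
fit <- glm(pass ~ hours, family = binomial)

# Predicted probability of passing after 4 hours of study.
p4 <- predict(fit, newdata = data.frame(hours = 4), type = "response")
```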
The objective of this module is to look at the typical performance measures used once a classifier is built, including the ROC curve.
In this module, we are going to look at a case study and the problem statement associated with it, and solve the case study using R.
In this module, we are going to understand the very powerful classification algorithm called k-nearest neighbors, and also understand the different things to consider before applying this algorithm.
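The idea can be sketched with a minimal 1-nearest-neighbour classifier in base R (in practice, the `knn()` function from the `class` package is the usual tool; everything below is illustrative):

```r
# Classify a query point by the label of its nearest training point.
knn1 <- function(train, labels, query) {
  d <- sqrt(rowSums((t(t(train) - query))^2))  # Euclidean distances
  labels[which.min(d)]
}

train  <- rbind(c(1, 1), c(1, 2), c(8, 8), c(9, 8))
labels <- c("small", "small", "large", "large")
pred   <- knn1(train, labels, c(2, 1))   # nearest point is (1, 1)
```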
In this module, we are going to look at a case study to implement the k-NN algorithm and the problem statement associated with it, and solve the case study using R.
The objective of this module is to illustrate the concept of K-means clustering and its disadvantages.
In this module, we are going to look at a case study to implement the k-means clustering algorithm and the problem statement associated with it, and solve the case study using R.
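Before the case study, the algorithm itself can be sketched on synthetic data (two well-separated blobs; all values below are made up):

```r
set.seed(1)
pts <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),   # blob around (0, 0)
             matrix(rnorm(20, mean = 5), ncol = 2))   # blob around (5, 5)

# nstart runs k-means from several random starts and keeps the best fit.
km <- kmeans(pts, centers = 2, nstart = 10)
# km$cluster assigns each point to one of the two clusters
```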
This module gives a quick summary of the course and the next logical steps after completing it.
In this chapter, we have started with the step-by-step installation of R and RStudio, a GUI-based IDE for the R language. We have also explained package installation in R, built-in datasets in R, manual data entry, data importing, and tabular-to-row data conversion. We have also looked at the default colors available in R and a more elaborate color option, the "RColorBrewer" package.
In this lesson, you will learn some of the steps to install R in your system.
In this lesson, you will learn to install R Studio, which is a GUI based Integrated Development Environment (IDE).
In this lesson, you will learn to download R materials and then use these R Materials in R Studio.
In this lesson, you will learn about the different interfaces of the R Studio such as R Script, Console Section, R Environment and Graphical Output Section.
In this lesson, you will learn about various packages that are available in R and how to use them.
In this lesson, you will learn about the default datasets that come with R. These belong to the packages that are installed and loaded in R by default.
R Programming provides different ways to enter the data manually. In this lesson, you will learn about manual data entry in R.
In R, there are different cases in which the data is required to be imported in order to use it. In this lesson, you will learn to import the data.
In R, the data has to be stored in a specific format so that it can be easily understood and used. In this lesson, you will learn to arrange the data in rows and columns.
R provides colors for use in plots; with the right colors, graphical output looks much better. In this lesson, you will learn about colors in R.
In this lesson, you will learn about an external package named RColorBrewer. By installing this package, you can use its additional color palettes.
In this lesson, we will summarize what we have learned so far about R colors and discuss other applications of them.
This chapter covers various charts in R. R has multiple libraries that can be used to create charts such as bar charts, pie charts, histograms, box plots, etc. A bar graph or bar chart represents data as bars. A pie chart, on the other hand, represents data or values as sectors within a circle, each drawn in a different color to distinguish them. A box plot is used for getting information about possible outliers in a data sample. Various ways to save plots as images have also been explained in this unit.
The R language is well known for graphical representation, and a bar chart is a very good example of this. In this lesson, you will learn about bar charts.
In this lesson, you will learn about pie charts for graphical representation. A pie chart is also a very good form of data representation.
The histogram is suitable for visualizing distribution of numerical data over a continuous interval, or a certain time period. In this lesson, you will learn about Histograms.
When possible outliers need to be identified while analyzing the data, box plots are used. In this lesson, you will learn about box plots.
In this lesson, you will learn to customize the graphs and also see the effect of customization on your graph.
In this lesson, you will learn to present data in image format: you will first plot the data and then export the plot as an image.
In this lesson, you will get the summary for the plotted datasets by using the Layered Plots.
This chapter covers basic statistical concepts, viz. frequencies, descriptive statistics, hypothesis testing, and chi-square testing in R. The frequency distribution of a data variable is the count of data occurring within a collection of distinct categories. Descriptive statistics give summary statistics of the data and are the basis of more advanced analysis. We then look at inferential statistics methods: in this unit, we explain single proportion testing, single mean testing, and chi-square testing, which are used to infer results based on sample data characteristics and hypothesized values. We have also done a univariate analysis to find patterns in the data.
In this lesson, you will learn to calculate the frequency of data and analyze the data after converting it from frequencies to densities.
In this lesson, you will learn about descriptive statistics. These are the figures used for summarizing the data.
In this lesson, you will learn about inferential statistics, making use of single proportion testing.
In this lesson, you will learn about single mean testing. Single mean testing compares the mean of a sample against a hypothesized value.
In this lesson, you will learn about the chi-square test. This test is used to determine the goodness of fit for a categorical variable.
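A minimal goodness-of-fit sketch (the counts are made up): a die rolled 60 times, tested against the hypothesis that all six faces are equally likely.

```r
observed <- c(8, 9, 11, 12, 10, 10)   # counts for faces 1..6
ct <- chisq.test(observed)            # default: equal expected proportions
# Expected count is 10 per face; the statistic sums (O - E)^2 / E.
# A large p-value here means no evidence against a fair die.
```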
In this lesson, you will learn about univariate analysis, which is used to summarize and present a single variable in an informative way.
In this lesson, you will get the summary about Descriptive Statistics through a dataset.
This chapter covers the details of working with data. Data can contain outliers, and their treatment is explained. Outliers are observations that occur very infrequently and might be the result of errors while observing. Proper treatment of data is necessary for unbiased results. This might include subsetting, sorting, extracting unique observations, renaming variables, creating new variables, etc. Each of these tasks can be accomplished using the set of newly introduced packages.
In this lesson, you will learn to treat the outliers present in the data, using a categorical dataset to understand them.
In this lesson, you will learn to transform variables to better fit the assumptions of the data analysis.
In this lesson, the functionality of composite variables is demonstrated using random variables.
In this lesson, you will learn to deal with missing data, which is often present in real datasets. It needs to be treated in such a way that your results are not biased.
In this lesson, you will get a complete summary of what you have learned so far about outliers.
In this chapter, we have worked with cases, subgroups, and files. Any data set is a collection: it consists of cases, which are the objects in the collection, with each case having one or more attributes or qualities known as variables. This chapter covers working with subgroups and merging files. Merging means that different datasets or files are combined into a single dataset or file; R includes methods to merge files.
In this lesson, you will learn the method of customizing your analysis for a particular parameter in a set of data.
In this lesson, you will see a demonstration of how to obtain all the descriptive statistics for every value of a particular variable at once.
In this lesson, you will learn a very useful method of combining different data into the same unit. This method is called merging.
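A small sketch with made-up data frames, showing an inner merge and a left merge:

```r
students <- data.frame(id = c(1, 2, 3),
                       name = c("Asha", "Ravi", "Meena"))
marks    <- data.frame(id = c(1, 2, 4), score = c(88, 74, 91))

inner <- merge(students, marks, by = "id")               # matching ids only
left  <- merge(students, marks, by = "id", all.x = TRUE) # keep all students
# In the left merge, the unmatched student gets an NA score.
```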
In this lesson, you get a complete summary of all the analyses in this section involving subgrouping.
In this chapter, the Bar charts, Box plots and scatter plots have been demonstrated. A bar chart or a bar graph represents the data with the help of bars or rectangles. The values of the variables are determined by the height or length of the rectangle be it vertical or horizontal. A box plot is an exploratory graphic which enables us to encapsulate the features of quantitative variables. A scatter plot pairs up the values of two quantitative variables in a dataset and represent them as geometric points in the Cartesian diagram.
In this lesson you will learn about the different ways to analyze your data with the help of Bar charts.
In this lesson, you will make use of the iris dataset to summarize and present data with the help of Box Plots.
In this lesson, you will explore the quantitative relation between the variables with the help of a scatter plot using the iris dataset and the swiss dataset.
In this chapter, we summarize the material covered in the preceding section.
In this chapter, statistical concepts like correlation, regression, and proportions are covered in detail. Correlation is a statistical method for showing whether there is a relation between pairs of variables and how strongly they are related. Regression is a fundamental tool for statistical analysis, frequently used in various research fields; bivariate regression is the simplest linear regression procedure. In the later part of the chapter, we also demonstrate a few tests, such as the t-test, one-factor analysis of variance, and tests of proportions.
In this lesson, you will learn about correlation. In mathematical terms, correlation is the covariance of two variables divided by the product of the standard deviations of the two samples.
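That definition can be checked directly in R (the vectors below are illustrative):

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)               # y is perfectly linear in x

r  <- cor(x, y)                      # Pearson correlation: 1 here
r2 <- cov(x, y) / (sd(x) * sd(y))    # the definition from the text
```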
In this lesson, you will explore bivariate regression with the help of the fitted regression line and its equation.
In this lesson, you will learn about T-Statistics by comparing the calculated values of two samples with the T-test using the Iris Dataset.
In this lesson, you will learn to examine the difference between two samples by creating two Random Variables, through Paired T-test.
In this lesson, you will learn to test for differences between two or more populations or groups with the ANOVA test.
In this lesson, you will learn to compare categorical groups with the help of proportions.
In this lesson, you will learn to use the chi-square test to perform a test of independence between two specific variables.
In this lesson, you will learn about the statistics of Bivariate Associations using some packages available in R.
In this lesson, you will get a summary of all the tests and other statistics used in this section.
In this chapter, the methods of creating bar charts for means, scatter plots for grouped data, scatter plot matrices, and a very interesting and visual 3D scatter plot are covered in detail.
In this lesson, you will learn about drawing the bar charts for multiple variables defined by different categories.
In this lesson, you will learn to plot a Scatter Plot for multiple variables by loading the CSV file in R.
In this lesson, you will learn to plot a Scatter Plot by loading the matrix data.
In this lesson, you will learn to plot grouped data with the help of a 3D scatter plot.
In this lesson, you will get the complete summary of various plotting techniques and Bar Charts.
In this chapter, some relatively advanced topics such as multiple regression, two factor ANOVA, cluster analysis and principal component & factor analysis have been covered in detail. These topics are very important specially multiple regression which is used very extensively in research papers and industry to establish the relationship between variables.
In this lesson, you will learn about multiple-variable statistics using multiple regression, the most common such tool, with the help of a built-in dataset. This tool is used to analyze the data.
In this lesson, you will learn to carry out a two-factor ANOVA with the help of the ToothGrowth dataset, through which you will learn about the interaction between two categorical factors.
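A sketch with the built-in ToothGrowth dataset (dose is converted to a factor so that it is treated as categorical):

```r
data(ToothGrowth)
tg <- ToothGrowth
tg$dose <- factor(tg$dose)

# Two factors (supp and dose) plus their interaction term.
fit <- aov(len ~ supp * dose, data = tg)
tab <- summary(fit)[[1]]   # ANOVA table: supp, dose, supp:dose, Residuals
```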
In this lesson, you will learn about cluster analysis which creates clusters or groups based on the values of variables.
In this lesson, you will learn to find components using principal component analysis; these components explain most of the observed variation in the data.
In this lesson, the course will be concluded with the summary of whatever has been covered in this section so far.
In this lesson, there is a quiz containing a set of questions. The test is for self-evaluation of the candidate's overall understanding of the course content. The course will not be considered successfully completed if this quiz is omitted or ignored.
Here, we have looked at data mining and its motivations, followed by the drawbacks of traditional data analysis. We have then discussed data and data mining functionalities, along with the process of knowledge discovery and the issues in data mining. Finally, the typical architecture of a data mining system has been covered in detail.
Here, we have looked at data and the different types of attributes and their properties, including the different types of data sets.
The main objective of this lecture is to understand the issues that are considered before performing the preprocessing along with some of the preprocessing techniques.
The focus of this lecture is on understanding the association rule mining and the different steps of discovering association rules.
Here, we have looked at the Frequent itemset generation which is computationally expensive, then we have covered Apriori principle and its algorithm.
Here, we have continued to look at the Apriori algorithm and also covered rule generation for the Apriori algorithm, along with pattern evaluation, viewing these evaluations in terms of interestingness.
Here, we have covered what classification means and the classification task, along with classification techniques.
The focus of this lecture is to understand Decision trees along with the study of the representation of rules in Decision trees.
Here, we have looked at obtaining a decision tree for classification problem including an example of the same.
Here, we have continued looking at obtaining a decision tree and also covered the Top-down construction rule for obtaining a decision tree for classification problem.
We have continued looking at obtaining a decision tree, and have also covered the problems with obtaining a decision tree, a few extensions of the basic tree algorithm, and some of the advantages of decision trees.
We have covered probability distributions through an example and looked at an important concept known as class-conditional probabilities.
Here, we have looked at the posterior distribution for the previous example, and we have covered the MAP representation of the Bayes classifier, MAP with multiple classifiers, and the multivariate Bayes classifier.
Here, we have continued to look at the Multivariate Bayes classifier and its special case.
Here, we have looked at the types of distance measures between two distributions, and an example of a Bayes classifier along with the Naive Bayes classifier.
We have continued to look at the Naive Bayes classifier and an example of it; we have also looked at conditional independence and an exercise on it, along with a comprehensive look at directed acyclic graphs (DAGs).
Here, we have covered the Classification algorithm called the K nearest neighbor classifier.
The focus of this lecture is to understand the basics of K Nearest Neighbor and also understand the Voronoi diagram of the Nearest Neighbor. Also, we have looked at the distance-weighted K-NN followed by different issues of nearest-neighbor classifiers.
Here, we have looked at K-nearest neighbor classification technique (KNN) and the computational complexity of KNN followed by reduction of computational complexity.
Here, we have covered the reduction of computational complexity using High dimensional search and also looked at the K dimensional tree structure along with some alternate terminologies in KNN.
Here, we have looked at the classification algorithms to know which one is better and which one should be chosen.
The main objective of this lecture is to understand discriminant analysis and the case of linear discriminants: given 2 features and 2 classes, we want to draw a line that separates them.
Here, we have continued to look at linear discriminants (linear separators).
Here, we have again looked at linear discriminants (linear separators) and bad versus good decision boundaries, and then covered how to find the line with the highest margin, which gives the equation of the line.
Here, we have covered the Primal and Dual optimization problem followed by understanding the solution of the dual optimization problem. We have also covered the concept of quadratic programming (QP).
Here, we have covered the Quadratic programming (QP) problem and looked at Karush–Kuhn–Tucker (KKT) theorem that can be used for solving QP problem.
Here, we have introduced the concept of slack variable, soft and hard margin in separable cases followed by the optimizations problems. We have also looked at the kernel machine for solving the non-linearly separable class problem.
The main objective of this lecture is to study neural networks, connectionism, and the biological neuron. We have also covered the artificial neural network and its simplest model, the perceptron.
Here, we have looked at the mechanism of finding the correct set of weights to solve a prediction problem.
Here, we have explained the difference between the Perceptron and Gradient Descent algorithms, the logic gates that can be realized with the perceptron model, and the Multilayer Perceptron.
Here, we have covered Sigmoid unit, and weight update rule for the multilayer perceptron along with the issues with ANNs and extension of ANNs.
Here, we have looked at the basics of Clustering and Scatter coefficient for observing the goodness of clustering including an understanding of Hierarchical & Partitional clustering.
Here, we have covered the desirable properties of a clustering algorithm, along with Hierarchical Agglomerative and Hierarchical Divisive clustering. We have also looked at three ways of measuring the distance between two clusters.
Here, we have looked at the K-means clustering algorithm and an example of it.
The main objective of this lecture is to learn the idea of density-based clustering and a density-based algorithm, DBSCAN (Density-Based Spatial Clustering of Applications with Noise).
Here, we have covered the CLARA clustering algorithm, followed by how to evaluate clustering algorithms.
The focus of this lecture is to understand the concept of the regression problem and univariate and multivariate regression models, along with the most common regression technique, linear regression.
Here, we have looked at the Linear Regression model.
Here, we have continued to look at the Linear Regression model and some of the limitations of Linear regression followed by the Non-Linear regression.
Here, we have covered the problem of overfitting, the Occam's razor principle, and the time series prediction problem along with its solution.
What You Benefit from This Program
Frequently Asked Questions
This program intends to produce extremely well-rounded Data Scientist professionals with deep knowledge of Data Science, expertise in relevant tools/languages, and an understanding of cutting-edge algorithms and applications.
This program is designed for anyone looking to pick up skills in advanced concepts like Azure, Data Mining, R Programming along with Data Science for Engineers. This program demands consistent work and time commitment over the entire duration of 6 months.
The content will be a mix of asynchronous lectures from industry leaders as well as world-class faculty. Additionally, the program comprises some live lectures or hangout sessions dedicated to solving your academic queries and reinforcing learning.
After completing each course, a separate certificate will be issued (5 courses, 5 certificates) by Prutor.ai, IIT Kanpur.
I'm interested in This Program