2. Classification Techniques (Decision Tree, SVM, NB, Random Forest, Logistic Regression etc.)
A. Decision Tree in R
Decision trees use a flowchart-like structure to make decisions. Because these structures are easy to understand, we can use them where transparency is needed, such as in banks for loan approval.
A decision tree is a supervised learning algorithm that is mostly used for classification problems. It works with both categorical and continuous input and output variables. In this technique, the population is split into two or more homogeneous sets, based on the most significant splitter/differentiator among the input variables. Decision trees are powerful non-linear classifiers. They use a tree structure of branching decisions to model the relationships among the features and the potential outcomes.
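As a minimal sketch of the idea, the following fits a classification tree in R. It assumes the rpart package (one of several tree packages; the text does not name a specific one) and uses the built-in iris data set purely for illustration.

```r
# Minimal sketch: fit a classification tree with rpart on the built-in iris data.
# The choice of rpart and of iris is an assumption made for illustration.
library(rpart)

# Species is the categorical target; all other columns are input variables.
fit <- rpart(Species ~ ., data = iris, method = "class")

# Print the branching decisions as text; each line describes one node.
print(fit)

# Predict the class of the first few rows.
pred <- predict(fit, newdata = head(iris), type = "class")
print(pred)
```

Printing the fitted object lists the splits in the order in which the tree applies them, which is what makes the model easy to inspect.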
In classifying data, the decision tree follows the steps mentioned below:
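• It places all training examples at the root node.
• It divides the training examples based on selected attributes.
• It selects the attributes by using statistical measures, such as information gain or the Gini index.
• It continues the recursive partitioning until a stopping condition is met, for example when all examples in a node belong to the same class, no attributes remain, or no training examples remain.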
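The statistical measures used to select attributes can be sketched in base R. The functions below (gini, entropy, split_impurity) are hypothetical helpers written for this illustration, and the threshold of 2.45 on Petal.Length is an assumed example split on the built-in iris data, not a value taken from the text.

```r
# Gini impurity of a vector of class labels: 1 - sum(p_k^2).
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Entropy in bits: -sum(p_k * log2(p_k)), skipping classes with zero count.
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Weighted impurity of the two groups produced by a logical split condition.
# Assumes both groups are non-empty.
split_impurity <- function(labels, condition, measure = gini) {
  left <- labels[condition]
  right <- labels[!condition]
  n <- length(labels)
  (length(left) / n) * measure(left) + (length(right) / n) * measure(right)
}

# Impurity at the root versus after one candidate split.
root_gini <- gini(iris$Species)
after_split <- split_impurity(iris$Species, iris$Petal.Length < 2.45)
print(c(root = root_gini, after_split = after_split))
```

For this particular split the weighted Gini impurity drops from 2/3 at the root to 1/3, because one of the two groups contains a single class. A tree-growing algorithm evaluates many candidate splits in this way and keeps the one with the largest reduction.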
Important terminologies related to Decision Tree
• Root Node: It represents the entire population or sample, which is then divided into two or more homogeneous sets.
• Splitting: Process of dividing a node into two or more sub-nodes.
• Decision Node: When a sub-node splits into further sub-nodes, it is called a decision node.
• Leaf/Terminal Node: A node that does not split is called a leaf or terminal node.
• Pruning: Removing the sub-nodes of a decision node is called pruning. It is the opposite of splitting.
• Branch / Sub-Tree: A subsection of the entire tree is called a branch or sub-tree.
• Parent and Child Node: A node that is divided into sub-nodes is called the parent node of those sub-nodes, whereas the sub-nodes are the children of the parent node.
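These terms can be located on a fitted tree. The sketch below assumes the rpart package and the built-in iris data; the pruning value cp = 0.5 is an arbitrary illustrative choice, not a recommended setting.

```r
library(rpart)

fit <- rpart(Species ~ ., data = iris, method = "class")

# fit$frame holds one row per node; the first row is the root node.
nodes <- fit$frame
root_n <- nodes$n[1]  # number of training rows at the root

# Leaf (terminal) nodes are marked "<leaf>" in the var column;
# every other row is a node that splits further on the named variable.
is_leaf <- nodes$var == "<leaf>"
n_leaves <- sum(is_leaf)

# Pruning removes sub-nodes. A larger complexity parameter (cp) prunes more.
pruned <- prune(fit, cp = 0.5)
n_leaves_pruned <- sum(pruned$frame$var == "<leaf>")

print(c(root_rows = root_n, leaves = n_leaves, leaves_after_pruning = n_leaves_pruned))
```

The root holds every training row, and the pruned tree never has more leaves than the tree it was pruned from.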
Types of Decision Tree
a. Categorical (Classification) Variable Decision Tree: A decision tree that has a categorical target variable.
b. Continuous (Regression) Variable Decision Tree: A decision tree that has a continuous target variable.
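The two types differ only in the target variable. As a rough sketch, assuming the rpart package and the built-in iris and mtcars data sets (both chosen here for illustration), the type is selected with the method argument:

```r
library(rpart)

# Categorical target (Species) -> classification tree.
class_tree <- rpart(Species ~ ., data = iris, method = "class")

# Continuous target (mpg) -> regression tree.
reg_tree <- rpart(mpg ~ ., data = mtcars, method = "anova")

# A classification tree predicts class labels (a factor).
pred_class <- predict(class_tree, newdata = head(iris), type = "class")

# A regression tree predicts numeric values.
pred_reg <- predict(reg_tree, newdata = head(mtcars))

print(pred_class)
print(pred_reg)
```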
Advantages of Decision Tree in R
• Easy to Understand: No statistical knowledge is required to read and interpret a decision tree. Its graphical representation is intuitive, and users can easily relate it to their hypotheses.
• Less data cleaning required: It requires less data cleaning than some other modeling techniques.
• It can handle both numerical and categorical variables.
• It handles nonlinearity.
• It is possible to validate a model using statistical tests, which gives confidence that it will work on new data sets.
• It performs well even if the data deviate slightly from the model's assumptions.
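One simple way to check how a tree behaves on new data is to hold out part of the data and measure accuracy on it. This is a plain hold-out evaluation, not a formal statistical test, and it is only a sketch: the rpart package, the iris data, the 70/30 split, and the seed are all assumptions made for illustration.

```r
library(rpart)

set.seed(42)  # arbitrary seed so the split is reproducible

# Hold out 30% of the rows as unseen data.
train_idx <- sample(nrow(iris), size = round(0.7 * nrow(iris)))
train_set <- iris[train_idx, ]
test_set <- iris[-train_idx, ]

# Fit on the training rows only.
fit <- rpart(Species ~ ., data = train_set, method = "class")

# Compare predictions with the true classes on the held-out rows.
pred <- predict(fit, newdata = test_set, type = "class")
confusion <- table(predicted = pred, actual = test_set$Species)
accuracy <- sum(diag(confusion)) / sum(confusion)

print(confusion)
print(accuracy)
```

The diagonal of the confusion table counts the correctly classified rows, so the accuracy is the share of held-out rows that the tree labels correctly.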