ML - Part 2 - Classical Models
Logistic Regression:
- Logistic regression is a supervised learning algorithm used for binary classification tasks.
- It models the probability of the outcome using a logistic function and separates the classes based on a decision boundary.
- It is a linear model that handles numerical features directly and categorical features after encoding (e.g., one-hot).
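As a minimal sketch of the idea (synthetic 1-D data, plain Python, no ML library), the logistic function turns a linear score into a probability, and the 0.5 cutoff gives the decision boundary:

```python
import math

def sigmoid(z):
    # logistic function: maps any real score to a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# Tiny synthetic 1-D dataset: feature x, binary label y (illustrative only)
X = [0.5, 1.5, 2.0, 3.0, 3.5, 4.5]
y = [0, 0, 0, 1, 1, 1]

w, b, lr = 0.0, 0.0, 0.1
for _ in range(2000):                  # stochastic gradient descent on log-loss
    for xi, yi in zip(X, y):
        p = sigmoid(w * xi + b)        # predicted probability of class 1
        w -= lr * (p - yi) * xi        # gradient of log-loss w.r.t. w
        b -= lr * (p - yi)             # gradient of log-loss w.r.t. b

def predict(x):
    # decision boundary is where the predicted probability crosses 0.5
    return 1 if sigmoid(w * x + b) >= 0.5 else 0
```

In practice a library implementation (e.g. scikit-learn's `LogisticRegression`) would be used; the loop above only illustrates the probability model and boundary.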
Decision Trees:
- Decision trees are supervised learning algorithms used for classification and regression tasks.
- They partition the data based on feature conditions to create a tree-like model of decisions and their consequences.
- Decision trees are interpretable and can handle both numerical and categorical features.
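The core partitioning step can be sketched as a one-node "stump" that scans candidate thresholds on a single feature and picks the split minimizing Gini impurity (synthetic data, plain Python):

```python
def gini(labels):
    # Gini impurity of a binary label set: 2 * p * (1 - p)
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def best_split(xs, ys):
    # try each observed value as a threshold; keep the lowest weighted impurity
    best_t, best_score = None, float("inf")
    for t in sorted(set(xs)):
        left = [yi for xi, yi in zip(xs, ys) if xi <= t]
        right = [yi for xi, yi in zip(xs, ys) if xi > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_t, best_score = t, score
    return best_t

xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
ys = [0, 0, 0, 1, 1, 1]
print(best_split(xs, ys))  # a clean split between the two classes
```

A full tree applies this split search recursively to each partition until a stopping criterion (depth, purity, minimum samples) is met.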
Random Forest:
- Random Forest is an ensemble learning algorithm that combines multiple decision trees.
- It builds a collection of decision trees, each on a bootstrap sample of the data, and makes predictions by aggregating the individual trees’ predictions (majority vote for classification, averaging for regression).
- Random Forest is more robust to overfitting than a single decision tree and can handle high-dimensional data.
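The bagging-and-voting mechanism can be sketched with simple one-split "trees" (the stump's midpoint threshold is a crude stand-in for real tree training; data is synthetic):

```python
import random

random.seed(0)

X = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
y = [0, 0, 0, 1, 1, 1]

def train_stump(xs, ys):
    # crude one-split "tree": threshold at the midpoint of the two class means
    m0 = [x for x, t in zip(xs, ys) if t == 0]
    m1 = [x for x, t in zip(xs, ys) if t == 1]
    thr = (sum(m0) / len(m0) + sum(m1) / len(m1)) / 2
    return lambda x: 0 if x <= thr else 1

forest = []
for _ in range(25):
    # bootstrap: sample indices with replacement
    idx = [random.randrange(len(X)) for _ in range(len(X))]
    ys_b = [y[i] for i in idx]
    if len(set(ys_b)) < 2:      # skip degenerate resamples with only one class
        continue
    forest.append(train_stump([X[i] for i in idx], ys_b))

def predict(x):
    votes = [tree(x) for tree in forest]   # aggregate by majority vote
    return 1 if sum(votes) > len(votes) / 2 else 0
```

A real random forest also samples a random subset of features at each split, which this 1-D sketch cannot show.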
Support Vector Machines (SVM):
- SVM is a supervised learning algorithm used for classification and regression tasks.
- It separates classes by finding the hyperplane that maximizes the margin between them.
- SVM can handle both linear and non-linear decision boundaries using kernel functions.
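A linear SVM can be sketched as sub-gradient descent on the hinge loss with an L2 regularizer (a Pegasos-style loop on synthetic 2-D data; kernels are omitted):

```python
# Synthetic 2-D data; SVM convention uses labels in {-1, +1}
X = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5), (5.5, 6.0)]
y = [-1, -1, -1, 1, 1, 1]

w = [0.0, 0.0]
b = 0.0
lam, lr = 0.01, 0.05
for _ in range(500):
    for (x1, x2), yi in zip(X, y):
        margin = yi * (w[0] * x1 + w[1] * x2 + b)
        # regularizer always shrinks w, which is what maximizes the margin
        w[0] -= lr * lam * w[0]
        w[1] -= lr * lam * w[1]
        if margin < 1:
            # hinge-loss sub-gradient fires only for points inside the margin
            w[0] += lr * yi * x1
            w[1] += lr * yi * x2
            b += lr * yi

def predict(x1, x2):
    return 1 if w[0] * x1 + w[1] * x2 + b >= 0 else -1
```

Non-linear boundaries come from replacing the dot product with a kernel function (e.g. RBF), which library implementations handle via the dual formulation.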
Naive Bayes:
- Naive Bayes is a probabilistic supervised learning algorithm used for classification tasks.
- It is based on Bayes’ theorem and assumes independence between features given the class.
- Naive Bayes is efficient, particularly for high-dimensional data, but it makes a strong assumption of feature independence.
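The independence assumption can be illustrated with a tiny Bernoulli-style word-presence classifier (toy "spam" data invented for the example, with Laplace smoothing):

```python
import math
from collections import Counter

# Toy documents: (set of words, class) with class 1 = "spam"
docs = [({"win", "money"}, 1), ({"free", "money"}, 1),
        ({"meeting", "today"}, 0), ({"project", "today"}, 0)]
vocab = set().union(*(words for words, _ in docs))

counts = {0: Counter(), 1: Counter()}
n = {0: 0, 1: 0}
for words, c in docs:
    n[c] += 1
    for w in words:
        counts[c][w] += 1

def log_prob(words, c):
    lp = math.log(n[c] / sum(n.values()))    # class prior from Bayes' theorem
    for w in vocab:
        # each word is treated as an independent Bernoulli given the class
        p = (counts[c][w] + 1) / (n[c] + 2)  # Laplace smoothing avoids zeros
        lp += math.log(p if w in words else 1 - p)
    return lp

def classify(words):
    return 1 if log_prob(words, 1) > log_prob(words, 0) else 0
```

Working in log-space avoids underflow when the vocabulary (and hence the product of per-feature probabilities) gets large, which is exactly the high-dimensional setting where Naive Bayes shines.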
Regularization:
- Regularization is a technique used to prevent overfitting in machine learning models.
- It adds a penalty term to the model’s objective function, discouraging complex or extreme parameter values.
- Common regularization techniques include L1 (Lasso) and L2 (Ridge) regularization.
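The shrinking effect of the penalty can be seen in 1-D ridge regression, where the L2 term appears directly in the closed-form slope (intercept omitted for brevity; data is synthetic):

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]          # exact slope 2 with no penalty

def ridge_slope(xs, ys, lam):
    # minimizes sum((y - w*x)^2) + lam * w^2; the penalty adds lam to the
    # denominator, pulling the estimate toward zero
    return sum(x * yi for x, yi in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

print(ridge_slope(xs, ys, 0.0))    # unregularized least squares: 2.0
print(ridge_slope(xs, ys, 30.0))   # L2 penalty shrinks the slope to 1.0
```

L1 (Lasso) behaves differently: instead of shrinking all coefficients smoothly, it can drive some exactly to zero, which also performs feature selection.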
DBSCAN (Density-Based Spatial Clustering of Applications with Noise):
- DBSCAN is an unsupervised learning algorithm used for density-based clustering.
- It groups together points that lie in dense regions and marks points in low-density regions as noise.
- DBSCAN can identify clusters of arbitrary shape and handle noise points.
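A minimal DBSCAN sketch on 1-D points shows the core mechanics: eps-neighborhoods, core points, cluster expansion through density-reachable points, and noise labeling (parameters and data invented for the example):

```python
def dbscan(points, eps, min_pts):
    labels = {}                       # point index -> cluster id, -1 = noise
    cluster = 0

    def neighbors(i):
        return [j for j in range(len(points)) if abs(points[i] - points[j]) <= eps]

    for i in range(len(points)):
        if i in labels:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:       # not a core point: provisionally noise
            labels[i] = -1
            continue
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:                  # expand through density-reachable points
            j = queue.pop()
            if labels.get(j) == -1:
                labels[j] = cluster   # noise reachable from a core point -> border
            if j in labels:
                continue
            labels[j] = cluster
            jn = neighbors(j)
            if len(jn) >= min_pts:    # only core points extend the frontier
                queue.extend(jn)
        cluster += 1
    return [labels[i] for i in range(len(points))]

pts = [1.0, 1.1, 1.2, 5.0, 5.1, 5.2, 9.0]
print(dbscan(pts, eps=0.3, min_pts=2))  # two clusters plus one noise point
```

Because clusters grow by chaining neighborhoods rather than fitting a centroid, the same logic finds arbitrarily shaped clusters in higher dimensions once the distance function is swapped for a Euclidean one.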
Gaussian Mixture Model (GMM):
- GMM is a probabilistic unsupervised learning algorithm used for clustering and density estimation.
- It assumes that the data is generated from a mixture of Gaussian distributions.
- GMM assigns probabilities to each data point, allowing for soft clustering and handling overlapping clusters.