The Random Forest algorithm can be described in the following conceptual steps:

- Select *k* features randomly from the dataset, where *k < m* (the total number of features), and build a decision tree from those features.
- Repeat this *n* times in order to have *n* decision trees built from different random combinations of *k* features.
- Take each of the *n* built *Decision Trees*, pass the new record through it to predict the outcome, and store this outcome to get a total of *n* outcomes from the *n decision trees*.
- If the target variable is a *categorical variable*, each tree in the forest predicts the category to which the new record belongs, and the new record is assigned to the category that has the majority vote.
- If the target variable is a *continuous variable*, each tree in the forest predicts a value for the target variable, and the final value is calculated by taking the average of all the values predicted by the trees that are part of the forest.
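The last two steps above, majority vote for a categorical target and averaging for a continuous one, can be sketched in plain Python. The per-tree outputs below are invented illustrative values, not output from a real forest:

```python
from collections import Counter

# Hypothetical outputs of n = 5 trees for one new record (illustrative values).
class_votes = ["A", "B", "A", "A", "B"]   # categorical target
value_preds = [2.0, 2.4, 1.8, 2.2, 2.6]  # continuous target

# Categorical target: the record is assigned the majority-vote category.
majority = Counter(class_votes).most_common(1)[0][0]
print(majority)  # → A (3 votes vs. 2)

# Continuous target: the final value is the average of the tree predictions.
average = sum(value_preds) / len(value_preds)
print(average)  # → 2.2
```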

Using the scikit-learn package in Python, it is possible to use and tune a **Random Forest** model based on predefined conditions that instruct the algorithm on how to construct the trees that are part of the forest.
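A minimal sketch of fitting a scikit-learn Random Forest; the iris dataset and the train/test split ratio are arbitrary choices for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load a small bundled dataset and hold out 30% of it for evaluation.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Build a forest of 100 trees and fit it to the training data.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Each tree votes for a class; the prediction is the majority vote.
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```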

**Hyperparameter Tuning**

The scikit-learn library allows you to tune some important parameters of tree construction that can increase the predictive power of the model or make the model faster.

Parameters to increase the predictive power:

- *n_estimators*: represents the number of trees in the forest. In general, a higher number of trees increases the performance and makes the predictions more stable. The default value of this parameter is 10.
- *max_features*: this parameter reflects the maximum number of features from the dataset that the *Random Forest* is allowed to use in an individual tree when considering the best split. The default value of the parameter is sqrt(n_features), the square root of the number of features of the model.
- *min_samples_leaf*: the minimum number of observations in each *leaf node*. This parameter prevents further splitting when the number of observations in the node is less than its value. The default value of this parameter is 1.
- *max_depth*: the maximum depth of the tree. The depth of a *decision tree* is the length of the longest path from the *root* to a *leaf*. The default value of this parameter is None; in this case the tree is split until nodes contain fewer than *min_samples_split* samples.
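A sketch of setting these predictive-power parameters together; the dataset is synthetic and all parameter values here are arbitrary examples, not recommendations:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data purely for demonstration.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

model = RandomForestClassifier(
    n_estimators=200,     # more trees: more stable predictions
    max_features="sqrt",  # consider sqrt(n_features) features per split
    min_samples_leaf=5,   # stop splitting below 5 observations in a leaf
    max_depth=10,         # cap the longest root-to-leaf path
    random_state=0,
)
model.fit(X, y)
params = model.get_params()
print(params["n_estimators"], params["max_depth"])  # → 200 10
```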

Parameters to increase the model speed:

- *n_jobs*: tells the engine the number of jobs to run in parallel for fit and predict.
- *random_state*: this parameter makes the model reproducible, as it will output the same result every time the model is run on the same data with the same parameters.
- *oob_score*: Boolean value that allows estimating the model's performance on the out-of-bag samples, i.e. the portion of the data that each tree was not trained on.
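These three parameters can be sketched as follows; the synthetic dataset is again only for demonstration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=1)

model = RandomForestClassifier(
    n_estimators=100,
    n_jobs=-1,        # -1: use all available cores for fit and predict
    random_state=1,   # fixed seed: the same forest on every run
    oob_score=True,   # score each tree on its out-of-bag samples
)
model.fit(X, y)

# oob_score_ estimates generalization accuracy without a separate test set.
print(f"OOB score: {model.oob_score_:.2f}")
```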
