randomForest {RFO}    R Documentation

Classification with Random Forest

Description

randomForest implements Breiman's random forest algorithm (based on Breiman and Cutler's original Fortran code) for classification.

Usage

## S3 method for class 'formula'
randomForest(formula, data = NULL, ..., subset,
                               na.action = na.fail)
## Default S3 method:
randomForest(x, y, ntree = 500,
             mtry = floor(sqrt(ncol(x))),
             replace = TRUE, classwt = NULL, cutoff,
             sampsize = if (replace) nrow(x) else ceiling(.632*nrow(x)),
             nodesize = if (!is.null(y) && !is.factor(y)) 5 else 1,
             maxnodes = NULL, na.action = na.fail, internal = FALSE, ...)
## S3 method for class 'randomForest'
print(x, ...)

Arguments

data

an optional data frame containing the variables in the model. By default, the variables are taken from the environment from which randomForest is called.

subset

an index vector indicating which rows should be used. (NOTE: If given, this argument must be named.)

na.action

A function to specify the action to be taken if NAs are found. (NOTE: If given, this argument must be named.)

x, formula

a data frame or a matrix of predictors, or a formula describing the model to be fitted (for the print method, a randomForest object).

y

A response vector of factor type.

ntree

Number of trees to grow. This should not be set to too small a number, to ensure that every input row gets predicted at least a few times.
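To get a feel for how small "too small" is, the sketch below estimates how often a row is left out of a tree's sample. It assumes that "predicted" refers to out-of-bag predictions and that each tree draws n rows with replacement (the replace = TRUE default); the helper name expected_oob_count is hypothetical and not part of the package.

```r
# Expected number of out-of-bag (OOB) predictions per row.
# Assumption: each tree draws n rows with replacement from n rows.
expected_oob_count <- function(ntree, n) {
  p_oob <- (1 - 1 / n)^n   # chance that a given row is absent from one tree's sample
  ntree * p_oob
}

expected_oob_count(ntree = 500, n = 150)  # roughly 183 trees per row
expected_oob_count(ntree = 10,  n = 150)  # fewer than 4 trees per row
```

Under these assumptions, a forest of only a handful of trees leaves some rows with very few (possibly zero) such predictions.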

mtry

Number of variables randomly sampled as candidates at each split.
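The default shown in Usage, floor(sqrt(ncol(x))), can be evaluated directly in base R. The following is only an illustration of that expression; the helper name default_mtry is hypothetical and not a function of the package.

```r
# Default mtry from the Usage section: floor(sqrt(number of predictors)).
default_mtry <- function(x) floor(sqrt(ncol(x)))

default_mtry(iris[, -5])          # 4 predictors  -> 2
default_mtry(matrix(0, 100, 5))   # 5 predictors  -> 2
default_mtry(matrix(0, 100, 30))  # 30 predictors -> 5
```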

replace

Should sampling of cases be done with or without replacement?

classwt

Priors of the classes. Need not add up to one.

cutoff

A vector of length equal to number of classes. The ‘winning’ class for an observation is the one with the maximum ratio of proportion of votes to cutoff. Default is 1/k where k is the number of classes (i.e., majority vote wins).
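The rule described above can be sketched in a few lines of base R. This is an illustration of the stated rule only, not the package's internal code: the helper name winning_class is hypothetical, and ties are broken here by taking the first maximum, which may differ from what the package does.

```r
# Pick the class with the largest ratio of vote proportion to cutoff.
# Assumption: ties go to the first class (which.max behaviour).
winning_class <- function(votes, cutoff = rep(1 / length(votes), length(votes))) {
  stopifnot(length(votes) == length(cutoff))
  names(votes)[which.max(votes / cutoff)]
}

votes <- c(setosa = 0.10, versicolor = 0.50, virginica = 0.40)

winning_class(votes)                              # default 1/k: "versicolor"
winning_class(votes, cutoff = c(0.4, 0.4, 0.2))   # ratios 0.25, 1.25, 2: "virginica"
```

Lowering a class's cutoff makes that class easier to win, which is one way to favour a rare class.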

sampsize

Size(s) of sample to draw.
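The default in Usage depends on replace: all rows when sampling with replacement, otherwise about 63.2% of them. A minimal sketch of that expression follows; the helper name default_sampsize is hypothetical.

```r
# Default sampsize from the Usage section.
default_sampsize <- function(n, replace = TRUE) {
  if (replace) n else ceiling(0.632 * n)
}

default_sampsize(150)                   # 150
default_sampsize(150, replace = FALSE)  # ceiling(94.8) = 95
```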

nodesize

Minimum size of terminal nodes. Setting this number larger causes smaller trees to be grown (and thus take less time).

maxnodes

Maximum number of terminal nodes that trees in the forest can have. If not given, trees are grown to the maximum possible (subject to limits by nodesize). If set larger than the maximum possible, a warning is issued.

internal

For internal test only.

...

optional parameters to be passed to the low-level function randomForest.default.

Value

An object of class randomForest, which is a list with the following components:

call

the original call to randomForest.

type

classification.

classes

the classes of the target.

ntree

number of trees grown.

mtry

number of predictors sampled for splitting at each node.

forest

a list that contains the entire forest.

cutoff

the cutoff vector used to build the model.

ncat

the number of levels of the attributes.

attr.names

the names of the attributes.

xlevels

the levels of the attributes.

Note

For large data sets, especially those with a large number of variables, calling randomForest via the formula interface is not advised: there may be too much overhead in handling the formula.

Author(s)

Lei Zhang lei.c.zhang@oracle.com, Andy Liaw andy_liaw@merck.com and Matthew Wiener matthew_wiener@merck.com, based on original Fortran code by Leo Breiman and Adele Cutler.

References

Breiman, L. (2001), Random Forests, Machine Learning 45(1), 5-32.

Breiman, L. (2002), “Manual On Setting Up, Using, And Understanding Random Forests V3.1”, http://oz.berkeley.edu/users/breiman/Using_random_forests_V3.1.pdf.

See Also

predict.randomForest

Examples

## Classification:
##data(iris)
set.seed(71)
iris.rf <- randomForest(Species ~ ., data=iris)
print(iris.rf)

## "x" can be a matrix instead of a data frame:
set.seed(17)
x <- matrix(runif(5e2), 100)
y <- gl(2, 50)
(myrf <- randomForest(x, y))
(predict(myrf, x))

## Grow no more than 4 terminal nodes per tree:
rf <- randomForest(Species ~ ., data=iris, maxnodes=4, ntree=30)

[Package RFO version 4.6-10 Index]