But we are averaging hundreds of trees, each of which has been handicapped by the removal of multiple features and trained on only a fraction of the data. That makes overfitting sound hard to me: no single data point or feature contributes to every tree, so no one point can be memorized by the whole ensemble.
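To put a number on "no single data point contributes to every tree": under standard bootstrap sampling (sampling N points with replacement for each tree), a given point lands in a given tree's sample with probability about 1 - 1/e ≈ 0.632. A small sketch, assuming plain NumPy and arbitrary sizes for `n_points` and `n_trees`:

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_trees = 1000, 500

# For each tree, draw a bootstrap sample (with replacement) and
# count in how many trees point 0 appears at all.
appears = 0
for _ in range(n_trees):
    sample = rng.integers(0, n_points, size=n_points)
    if (sample == 0).any():
        appears += 1

frac = appears / n_trees
# Theory: P(point in sample) = 1 - (1 - 1/n)^n -> 1 - 1/e ~= 0.632
print(f"point 0 appears in {frac:.2f} of the trees")
```

So roughly a third of the trees never see any particular point, which is exactly the "handicap" the averaging relies on.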
False as these claims may be, I've seen them in at least two of the most commonly studied statistical learning textbooks, so given that the reasoning makes sense and that it appears in the textbooks, they don't seem obviously false to me. Someone else posted that a forest will overfit if too many features or data points are very similar, and that also makes sense. What you're saying doesn't. Clarification would be useful.
Deep trees will happily overfit your dataset.
Any binary tree of depth ⌈log2(P)⌉ has at least P leaves, so it can completely separate your P points (one point per leaf).