Every now and then I bump into the same piece of code (written by someone else) for running a grid search across several scikit-learn models. Every time, I end up adapting it to my needs and fixing it, because it relies on calls that have since been deprecated. I finally decided to post it here on my blog, so that I can find it quickly and share it with whoever needs it.

The idea is simple. You pass two dictionaries to a helper class, one with the models and one with their parameters. Then you call the fit() method and wait until everything runs. Finally, you call the score_summary() method to get a pandas DataFrame that reports the scores of each model for every parameter combination.

The credit for the code below goes to Panagiotis Katsaroumpas, who originally wrote it. I only fixed it, because it broke with newer versions of scikit-learn and also failed in Python 3. The original version is in this blog post.

import pandas as pd
import numpy as np

from sklearn.model_selection import GridSearchCV

class EstimatorSelectionHelper:

    def __init__(self, models, params):
        # every model needs a parameter grid stored under the same key
        if not set(models.keys()).issubset(set(params.keys())):
            missing_params = list(set(models.keys()) - set(params.keys()))
            raise ValueError("Some estimators are missing parameters: %s" % missing_params)
        self.models = models
        self.params = params
        self.keys = models.keys()
        self.grid_searches = {}

    def fit(self, X, y, cv=3, n_jobs=3, verbose=1, scoring=None, refit=False):
        # run one independent GridSearchCV per model and keep it for score_summary()
        for key in self.keys:
            print("Running GridSearchCV for %s." % key)
            model = self.models[key]
            params = self.params[key]
            gs = GridSearchCV(model, params, cv=cv, n_jobs=n_jobs,
                              verbose=verbose, scoring=scoring, refit=refit,
                              return_train_score=True)
            gs.fit(X, y)
            self.grid_searches[key] = gs

    def score_summary(self, sort_by='mean_score'):
        def row(key, scores, params):
            d = {
                 'estimator': key,
                 'min_score': min(scores),
                 'max_score': max(scores),
                 'mean_score': np.mean(scores),
                 'std_score': np.std(scores),
            }
            return pd.Series({**params,**d})

        rows = []
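        for k in self.grid_searches:
            params = self.grid_searches[k].cv_results_['params']
            scores = []
            # n_splits_ is the number of folds actually used; unlike the cv
            # argument, it is also valid when cv is None or a splitter object
            for i in range(self.grid_searches[k].n_splits_):
                key = "split{}_test_score".format(i)
                r = self.grid_searches[k].cv_results_[key]
                scores.append(r.reshape(len(params), 1))

            # one row per parameter combination, one column per fold
            all_scores = np.hstack(scores)
            for p, s in zip(params, all_scores):
                rows.append(row(k, s, p))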

        df = pd.concat(rows, axis=1).T.sort_values([sort_by], ascending=False)

        columns = ['estimator', 'min_score', 'mean_score', 'max_score', 'std_score']
        columns = columns + [c for c in df.columns if c not in columns]

        return df[columns]
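To see what score_summary() computes, the sketch below reproduces its core step on a hand-written stand-in for cv_results_. The dictionary and its scores are made up purely for illustration; only the split{i}_test_score key pattern comes from scikit-learn. Stacking one column per fold gives a (candidates × folds) array, and the per-row statistics correspond to the min_score, max_score, mean_score and std_score columns of the report. Here np.column_stack replaces the reshape plus np.hstack used in the class, and both should produce the same array.

```python
import numpy as np

# Hypothetical cv_results_ fragment: 2 candidates evaluated on 3 folds.
# The scores are invented for illustration only.
cv_results = {
    'params': [{'n_estimators': 16}, {'n_estimators': 32}],
    'split0_test_score': np.array([0.90, 0.95]),
    'split1_test_score': np.array([0.80, 0.85]),
    'split2_test_score': np.array([1.00, 0.90]),
}
n_splits = 3

# Same idea as score_summary(): one row per candidate, one column per fold
all_scores = np.column_stack(
    [cv_results['split{}_test_score'.format(i)] for i in range(n_splits)]
)
print(all_scores.shape)         # (2, 3)
print(all_scores.min(axis=1))   # per-candidate min_score
print(all_scores.max(axis=1))   # per-candidate max_score
print(all_scores.mean(axis=1))  # per-candidate mean_score
print(all_scores.std(axis=1))   # per-candidate std_score (population std, as np.std)
```

GridSearchCV already stores mean_test_score and std_test_score in cv_results_, so the helper's main addition is the per-candidate minimum and maximum across folds.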

The code above defines the helper class. Next, you pass it a dictionary of models and a dictionary with a parameter grid for each of those models, using the same keys in both.

from sklearn import datasets

breast_cancer = datasets.load_breast_cancer()
X_cancer = breast_cancer.data
y_cancer = breast_cancer.target

from sklearn.ensemble import ExtraTreesClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.svm import SVC

models1 = {
    'ExtraTreesClassifier': ExtraTreesClassifier(),
    'RandomForestClassifier': RandomForestClassifier(),
    'AdaBoostClassifier': AdaBoostClassifier(),
    'GradientBoostingClassifier': GradientBoostingClassifier(),
    'SVC': SVC()
}

params1 = {
    'ExtraTreesClassifier': { 'n_estimators': [16, 32] },
    'RandomForestClassifier': { 'n_estimators': [16, 32] },
    'AdaBoostClassifier':  { 'n_estimators': [16, 32] },
    'GradientBoostingClassifier': { 'n_estimators': [16, 32], 'learning_rate': [0.8, 1.0] },
    'SVC': [
        {'kernel': ['linear'], 'C': [1, 10]},
        {'kernel': ['rbf'], 'C': [1, 10], 'gamma': [0.001, 0.0001]},
    ]
}
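The candidate counts in the fit() log follow directly from these grids. A dictionary grid expands to the Cartesian product of its value lists. A list of dictionaries, as used for SVC, is the union of its sub-grids. The sketch below checks this with scikit-learn's ParameterGrid, the class GridSearchCV uses to expand its param_grid. The value cv = 3 matches the helper's default.

```python
from sklearn.model_selection import ParameterGrid

# Grids copied from params1 above
gbc_grid = {'n_estimators': [16, 32], 'learning_rate': [0.8, 1.0]}
svc_grid = [
    {'kernel': ['linear'], 'C': [1, 10]},
    {'kernel': ['rbf'], 'C': [1, 10], 'gamma': [0.001, 0.0001]},
]

# dict grid: Cartesian product, 2 x 2 = 4 candidates
n_gbc = len(ParameterGrid(gbc_grid))
# list of dicts: union of sub-grids, 2 + (2 x 2) = 6 candidates
n_svc = len(ParameterGrid(svc_grid))

cv = 3  # folds per candidate, as in fit()'s default
print(n_gbc, n_gbc * cv)  # 4 12
print(n_svc, n_svc * cv)  # 6 18
```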

You create an EstimatorSelectionHelper by passing it the models and the parameters, and then call its fit() method. Its arguments (cv, n_jobs, verbose, scoring, refit) mirror those of the original GridSearchCV object.

helper1 = EstimatorSelectionHelper(models1, params1)
helper1.fit(X_cancer, y_cancer, scoring='f1', n_jobs=2)
Running GridSearchCV for ExtraTreesClassifier.
Fitting 3 folds for each of 2 candidates, totalling 6 fits

Running GridSearchCV for RandomForestClassifier.
Fitting 3 folds for each of 2 candidates, totalling 6 fits

Running GridSearchCV for GradientBoostingClassifier.
Fitting 3 folds for each of 4 candidates, totalling 12 fits

Running GridSearchCV for AdaBoostClassifier.
Fitting 3 folds for each of 2 candidates, totalling 6 fits

Running GridSearchCV for SVC.
Fitting 3 folds for each of 6 candidates, totalling 18 fits
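Each iteration of fit() is a single GridSearchCV run. The sketch below performs one such run by hand for the random forest grid. Passing a StratifiedKFold object as cv, and fixing random_state, is an assumption made for this illustration, not something the helper does by default. It shows that n_splits_ still reports the number of folds when cv is not an integer, and that number determines how many split{i}_test_score entries score_summary() can read.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

X, y = load_breast_cancer(return_X_y=True)

# One model/grid pair, with the same options the helper passes to GridSearchCV
gs = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {'n_estimators': [16, 32]},
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),  # assumption: splitter object instead of an int
    scoring='f1',
    refit=False,
    return_train_score=True,
    n_jobs=1,
)
gs.fit(X, y)

# number of candidates and number of folds actually used
print(len(gs.cv_results_['params']), gs.n_splits_)  # 2 3
```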

After the experiments have run, you can inspect the results for each model and each parameter combination by calling the score_summary() method:

helper1.score_summary(sort_by='max_score')
    estimator                   min_score  mean_score  max_score  std_score   C    gamma   kernel  learning_rate  n_estimators
5   AdaBoostClassifier          0.962343   0.974907    0.991667   0.0123335   NaN  NaN     NaN     NaN            32
1   ExtraTreesClassifier        0.966387   0.973627    0.987552   0.00984908  NaN  NaN     NaN     NaN            32
4   AdaBoostClassifier          0.95279    0.966463    0.983333   0.0126727   NaN  NaN     NaN     NaN            16
3   RandomForestClassifier      0.958678   0.966758    0.979253   0.00896123  NaN  NaN     NaN     NaN            32
6   GradientBoostingClassifier  0.917031   0.947595    0.979253   0.025414    NaN  NaN     NaN     0.8            16
9   GradientBoostingClassifier  0.950413   0.962373    0.979079   0.0121747   NaN  NaN     NaN     1              32
7   GradientBoostingClassifier  0.95279    0.966317    0.975207   0.00972142  NaN  NaN     NaN     0.8            32
8   GradientBoostingClassifier  0.950413   0.962548    0.975207   0.0101286   NaN  NaN     NaN     1              16
10  SVC                         0.95122    0.961108    0.975207   0.0102354   1    NaN     linear  NaN            NaN
2   RandomForestClassifier      0.953191   0.960593    0.975      0.0101888   NaN  NaN     NaN     NaN            16
0   ExtraTreesClassifier        0.958678   0.96666     0.974359   0.00640498  NaN  NaN     NaN     NaN            16
11  SVC                         0.961373   0.963747    0.967213   0.00250593  10   NaN     linear  NaN            NaN
15  SVC                         0.935484   0.945366    0.955466   0.00815896  10   0.0001  rbf     NaN            NaN
13  SVC                         0.934959   0.946564    0.954733   0.00843008  1    0.0001  rbf     NaN            NaN
12  SVC                         0.926407   0.936624    0.94958    0.00965657  1    0.001   rbf     NaN            NaN
14  SVC                         0.918455   0.929334    0.940678   0.00907845  10   0.001   rbf     NaN            NaN
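Because score_summary() returns an ordinary DataFrame, it is easy to post-process. As one possible example, the sketch below keeps only the best configuration of each estimator, ranked by mean_score, which is less sensitive to a single lucky fold than max_score. The rows are a subset copied from the table above (only some of its columns), so this small DataFrame stands in for the real helper1.score_summary() output.

```python
import pandas as pd

# A few rows copied from the score_summary() output above
summary = pd.DataFrame(
    [
        ('AdaBoostClassifier', 0.974907, 0.991667, 32),
        ('ExtraTreesClassifier', 0.973627, 0.987552, 32),
        ('AdaBoostClassifier', 0.966463, 0.983333, 16),
        ('RandomForestClassifier', 0.966758, 0.979253, 32),
        ('RandomForestClassifier', 0.960593, 0.975, 16),
        ('ExtraTreesClassifier', 0.96666, 0.974359, 16),
    ],
    columns=['estimator', 'mean_score', 'max_score', 'n_estimators'],
)

# Sort by mean_score, then keep the first (i.e. best) row of each estimator
best = (summary.sort_values('mean_score', ascending=False)
               .drop_duplicates('estimator')
               .reset_index(drop=True))
print(best)
```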

The full code for this blog post is available in this notebook.