  • Applying scikit-learn TfidfVectorizer on tokenized text (28 Feb 2018)
    An example showing how to use scikit-learn's TfidfVectorizer class on text that is already tokenized, i.e., supplied as a list of tokens.

  • Hyperparameter optimization across multiple models in scikit-learn (23 Feb 2018)
    This blog post shows how to perform hyperparameter optimization across multiple models in scikit-learn. Using a helper class, one can tune several models at once and print a report with the results and parameter settings.

  • Document Classification (01 Apr 2017)
    An introduction to the Document Classification task, in this case in a multi-class and multi-label scenario. Proposed solutions include TF-IDF weighted vectors, an average of word2vec word embeddings, and a single vector representation of the document using doc2vec. Includes code using the Pipeline and GridSearchCV classes from scikit-learn.
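
To give a flavour of the first post in the list, here is a minimal sketch of feeding already-tokenized documents to TfidfVectorizer. It is not the code from the post itself: the toy documents and the `identity` helper are made up for illustration. The sketch assumes that passing a callable as `analyzer` is an acceptable way to bypass scikit-learn's built-in preprocessing and tokenization, so each list of tokens is used as-is.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus (hypothetical): each document is already a list of tokens.
tokenized_docs = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "chased", "the", "cat"],
    ["dogs", "and", "cats", "are", "pets"],
]


def identity(tokens):
    """Return the tokens unchanged, so no further splitting or lowercasing happens."""
    return tokens


# A callable analyzer receives each raw document (here, a token list)
# and must return the sequence of features to count.
vectorizer = TfidfVectorizer(analyzer=identity)
tfidf_matrix = vectorizer.fit_transform(tokenized_docs)

# One row per document, one column per distinct token.
print(tfidf_matrix.shape)
print(sorted(vectorizer.vocabulary_))
```

A named function is used instead of a lambda so that the fitted vectorizer can still be pickled.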
