  • Evaluation Metrics, ROC-Curves and imbalanced datasets (19 Aug 2018)
    This blog post describes some evaluation metrics used in NLP, points out where each one should be used, and covers the advantages and disadvantages of each.

  • Document Classification (01 Apr 2017)
    An introduction to the document classification task in a multi-class, multi-label scenario. Proposed solutions include TF-IDF weighted vectors, an average of word2vec word embeddings, and a single vector representation of the document using doc2vec. Includes code using the Pipeline and GridSearchCV classes from scikit-learn.

viterbi sequence-prediction pos-tags scikit-learn neural-networks conditional-random-fields NER word2vec syntactic-dependencies evaluation_metrics document-classification classification SyntaxNet NLTK word-embeddings tokenization tf-idf stanford-NER relationship-extraction named-entity-recognition naive-bayes multi-label-classification maximum-entropy-markov-models logistic-regression language-models information-extraction imbalanced_data hyperparameter-optimization hidden-markov-models grid-search glove gensim fasttext doc2vec dependency-graph deep-learning convolutional-neural-networks character-language-models character-embeddings PyData LSTM ELMo BERT