• Google's SyntaxNet - HTTP API for Portuguese (22 Jul 2017)
    How to set up a SyntaxNet HTTP endpoint for any language and submit text to be tagged through Python. This post shows an example for Portuguese, but it can easily be adapted to any other supported language.

  • Open Information Extraction in Portuguese (08 May 2017)
    An example of how to perform open relationship extraction for Portuguese using only part-of-speech tags; the rules are based on ReVerb.

  • Document Classification (01 Apr 2017)
    An introduction to the document classification task, in this case in a multi-class, multi-label scenario. Proposed solutions include TF-IDF weighted vectors, an average of word2vec word embeddings, and a single vector representation of the document using doc2vec. Includes code using the Pipeline and GridSearchCV classes from scikit-learn.

  • Google's SyntaxNet in Python NLTK (25 Mar 2017)
    This post shows how to load the output of SyntaxNet into the Python NLTK toolkit, specifically how to instantiate a DependencyGraph object from SyntaxNet's output.

pos-tags viterbi sequence-prediction scikit-learn syntactic-dependencies document-classification conditional-random-fields SyntaxNet NLTK NER word2vec tokenization tf-idf stanford-NER relationship-extraction named-entity-recognition naive-bayes multi-label-classification maximum-entropy-markov-models logistic-regression information-extraction hyperparameter-optimization hidden-markov-models grid-search gensim evaluation_metrics doc2vec dependency-graph deep-learning convolutional-neural-networks PyData