Text Summarisation Techniques

Text summarisation resumes the information contained in a text, into a shorter version, still capturing the original semantics. This can be achieved by selecting representative words or phrases from the original text and combining them to make a summary (extractive), or by writing a summary as a human would do, by generating a textual summary of the main content in the original text (abstractive). This post quickly reviews some of the recent works on NLP Text summarisation using both approaches.

Extractive

Extractive approaches select representative phrases from the original text and combine them to make a summary, in its essence, they score or measure some sort of saliency or ranking of the sentences contained in a text. It’s language and domain-agnostic, doesn’t require any training data and the generated summaries are grammatically and factually correct.

However, the generated summary may not cover all important content or be longer than necessary, containing unnecessary parts that may not be needed in the summary, moreover, it might have a reduction of semantic quality or cohesion and possible wrong connections between selected sentences. Here are some (old) popular approaches, and some more recent ones:

2004 - TextRank

TextRank: Bringing Order into Text
Inspired by PageRank
This page contains a very good resume of the algorithm: Textrank for Summarizing Text
It was part of gensim package - removed on 4.0.0 - still present in older releases

2011 - LexRank

LexRank: Graph-based Lexical Centrality as Salience in Text Summarization
A connectivity matrix based on intra-sentence cosine similarity is used as the adjacency matrix of the graph representation of sentences.
Uses eigenvector centrality to select sentences.

2019 - HIBERT

Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization
Relies on two encoders to obtain the representation of a document
A sentence encoder to transform each sentence into a vector
A document encoder learns sentence representations based on their surrounding context sentences

Figure 1 - The encoding and training mechanism of the HIBERT model.

2019 - BERTSumExt

Text Summarization with Pretrained Encoders
An encoder creates sentence representations and a classifier predicts which sentences should be selected as summaries
Has a document-level encoder based on BERT to obtain sentence representations
The modification mainly consists of surrounding each sentence of the text with:
- a [CLS] (which represents the entire sentence)
- and [SEP] (which represents the boundary between two sentences)
- assigning different segment embeddings for every pair of sentences
Sentence-level contextual representations fed to a classifier for binary classification

2020 - MatchSum

Extractive Summarization as Text Matching
The source document and candidate summaries will be extracted from the original text and matched in a semantic space
Siamese-BERT architecture to compute the similarity between the source document and the candidate summary
Leverages the pre-trained BERT in a Siamese network structure to derive semantically meaningful text embeddings that can be compared using cosine-similarity

Figure 3 - MatchSum approach of matching candidate summaries and original text in the same semantic space.

Abstractive

Generated a concise summary capturing the essence of the original text. This approach can decrease the amount of summary text, by removing redundancy and obtaining an expressive summary. But, it seeds a precise syntactic and semantic representation of text data and may contain factual errors due to the algorithm not grasping the text’s context well enough.

Earlier approaches would be templated-based or rule-based, the Seq2Seq (encoder-decoder) based on vanilla RNN/LSTM or augmented with Attention Mechanism was another popular approach exploring the generative capability of recurrent neural networks, and more recently the Encoder-Decoder mechanism based on the Transformer dominate most of the new approaches:

2019 - BERTSumAbs

Text Summarization with Pretrained Encoders
Adopts an encoder-decoder architecture, combining the same pre-trained BERT encoder and a randomly-initialised Transformer decoder
Training separates the optimisers of the encoder and the decoder in order to accommodate the fact that the encoder is pre-trained while the decoder must be trained from scratch
Propose a two-stage fine-tuning approach, where we first fine-tune the encoder on the extractive summarisation task and then fine-tune it on the abstractive summarisation task
Combine extractive and abstractive objectives, a two-stage approach: the encoder is fine-tuned twice, first with an extractive objective and subsequently on the abstractive summarisation task.

2019 T5 - Text-to-Text Transfer Transformer

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Pre-trained Transformer encoder-decoder unifying many tasks as the same text-to-text problem
Input of the encoder is a task description (e.g., “Summarize: ”) followed by task input (e.g., a sequence of tokens from an article), and the decoder predicts the task output (e.g., a sequence of tokens summarizing the input article)
T5 is trained to generate some target text conditional on input text to perform as text-to-text

Figure 4 - Text-to-Text Transfer Transformer (T5) approach.

2020 BART

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
Trained on reconstructing documents to which noise has been introduced
The noise may take multiple forms, ranging from removing tokens to permuting sentences

2020 PEGASUS

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
Specifically designed and pre-trained neural network for the automatic text summarisation task
Pre-training self-supervised objective - gap-sentence generation - for Transformer encoder-decoder models
Fine-tuned on 12 diverse summarisation datasets
Select and mask whole sentences from documents, and concatenate the gap sentences into a pseudo-summary.

Comparative Summarisation

Name	Method	Max. Input	Code	Pre-Trained models	Languages

TextRank (2004)	Extractive	-	gensim	-	-
LexRank (2011	Extractive	-	lexrank	-	-
BertSum (2019)	Both	?	PreSumm	see code	English
HIBERT (2019)	Extractive	?	HIBERT	see code	English
T5 (2019)	Abstractive	1024 - 16384	T5x	huggingface.co	several
MatchSum (2020)	Extractive	?	MatchSum	see code	English
BART (2020)	Abstractive	1024	fairseq	huggingface.co	several
PEGASUS (2020	Abstractive	?	pegasus	huggingface.co	several

Evaluation

ROUGE (Recall-Oriented Understudy for Gisting Evaluation) - Recall
- compare produced summary against reference summary by overlapping n-grams
- ROUGE-N: report with different n-grams size
- ROUGE-L: instead of n-gram overlap measures the Longest Common Subsequence
BLEU (Bilingual Evaluation Understudy) - Precision
- compare the reference summary against the produced summary
- how much the words (and/or n-grams) in the reference appeared in the machine summary
Limitations
- always need a reference summary
- just measuring string overlaps
- alternative is to have a human evaluation