Unsupervised Single Document Abstractive Summarization using Semantic Units
TL;DR: The paper argues that content frequency is an important signal for abstractive summarization and proposes a two-stage training framework in which the model learns the frequency of each semantic unit in the source text. The model is trained in an unsupervised manner and, during inference, identifies sentences with high-frequency semantic units to generate summaries. It outperforms other unsupervised methods on the CNN/Daily Mail summarization task and achieves competitive ROUGE scores with fewer parameters than pre-trained models. Because it can be trained under low-resource language settings, it is a potential solution for real-world applications where pre-trained models are not applicable.
Problems & Solutions
Lack of sufficient training pairs is a common issue in real-world applications of supervised summarization models.
The authors propose an unsupervised summarization method that exploits content frequency in the source text: the model automatically learns the frequency of semantic units and uses it to discriminate the salient parts of source documents for abstractive summarization.
Large pre-trained models for language generation or summarization may require less data for fine-tuning, but they are often trained on English corpora only and are therefore not suitable for low-resource languages.
The authors propose an unsupervised summarization method that does not rely on pre-trained models and can be applied to any language.
Creating high-quality training pairs for supervised summarization models can be costly.
The authors propose an unsupervised summarization method that does not require any human-written summaries during training or inference, making it suitable for real-world applications where human-written summaries are rarely accessible.
It is difficult to identify the most salient information in a source text for summarization.
The authors propose enumerating all text spans with a fixed-size sliding window to create "semantic units" (SUs), each carrying a brief semantic concept. They argue that a refined summary should at least contain the semantic units that occur frequently in the source article, since high-frequency semantic units tend to express the topic or carry key descriptions.
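To make the sliding-window construction concrete, here is a minimal sketch (not the authors' code): it enumerates fixed-size token spans and counts their raw frequency. The tokenization and the window size are illustrative assumptions.

```python
from collections import Counter

def extract_semantic_units(tokens, window_size=3):
    """Enumerate all fixed-size token spans ("semantic units", SUs)."""
    return [tuple(tokens[i:i + window_size])
            for i in range(len(tokens) - window_size + 1)]

# Raw SU frequency: how often each span occurs in the source document.
tokens = "the cat sat on the mat while the cat purred".split()
su_counts = Counter(extract_semantic_units(tokens, window_size=2))
print(su_counts.most_common(2))  # e.g. [(('the', 'cat'), 2), ...]
```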
It is difficult to retrieve frequency information from source documents only.
The authors propose a model that automatically learns semantic unit frequency and uses the learned frequency information to filter the sentences of the source text before generating a summary.
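One plausible reading of this filtering step is sketched below, reusing `extract_semantic_units` from the sketch above. Treating the learned frequencies as a given lookup table, the per-sentence averaging, and the `top_k` cutoff are all assumptions made for illustration; in the paper the frequencies come from the trained model.

```python
def sentence_score(sentence_tokens, su_frequency, window_size=2):
    """Average learned frequency of the semantic units a sentence contains."""
    units = extract_semantic_units(sentence_tokens, window_size)
    if not units:
        return 0.0
    return sum(su_frequency.get(u, 0.0) for u in units) / len(units)

def filter_sentences(sentences, su_frequency, top_k=3, window_size=2):
    """Keep the sentences richest in high-frequency SUs as generation input."""
    ranked = sorted(sentences,
                    key=lambda s: sentence_score(s.split(), su_frequency, window_size),
                    reverse=True)
    return ranked[:top_k]
```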
It is difficult to decide how much to focus on each semantic unit when generating text.
The authors propose using the attention mechanism to obtain semantic unit frequency in the inference stage: the attention weights recorded during generation are assigned to the semantic units, and these weights are treated as the semantic unit frequency, which tells the model how much to focus on each semantic unit when generating text.
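A rough sketch of how recorded attention weights could be aggregated into per-SU weights is given below. The tensor shape, the mean-over-steps aggregation, and the final normalization are assumptions, not the paper's exact procedure.

```python
import numpy as np

def su_frequency_from_attention(attn, su_token_spans):
    """Turn recorded attention into per-semantic-unit weights.

    attn: (num_decoding_steps, num_source_tokens) attention weights
          recorded while the model generates text.
    su_token_spans: list of (start, end) source-token index ranges, one per SU.
    """
    token_weight = attn.mean(axis=0)                  # average over decoding steps
    raw = np.array([token_weight[s:e].sum() for s, e in su_token_spans])
    total = raw.sum()
    return raw / total if total > 0 else raw          # normalized "SU frequency"
```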