3 Methods Using Semantics and Discourse
All methods overviewed in the previous section compute sentence
importance on the basis of repeated occurrence of the same word in
different places in the input. Even early researchers acknowledged that
better understanding of the input would be achieved via methods that
instead track references to the same entity, or the same topic in the
document. These methods either rely on existing manually constructed
semantic resources (lexical chains, concepts), on coreference tools, or on
knowledge about lexical items induced from large collections of unannotated text (Latent Semantic Analysis, verb specificity).
3.1 Lexical Chains and Related Approaches
Lexical chains [7, 64, 192] attempt to represent topics that are discussed
throughout a text or text segment. They capture semantic similarity
between noun phrases to determine the importance of sentences. The
lexical chains approach exploits the intuition that topics are expressed
using not a single word but instead different related words. For example, the occurrence of the words “car”, “wheel”, “seat”, “passenger”
indicates a clear topic, even if each of the words is not by itself very
frequent. The approach heavily relies on WordNet [137], a manually
compiled thesaurus which lists the different senses of each word, as well
as word relationships such as synonymy, antonymy, part-whole and
general-specific. In addition, the lexical chains approach requires some
degree of linguistic preprocessing, including part-of-speech tagging and division of the input into topically related segments.
Barzilay and Elhadad [7] present a summarizer that segments an input document, constructs lexical chains first within and then across segments, scores the chains, and finally selects one sentence for each of the most highly scored chains.
A large part of Barzilay and Elhadad’s work is on new methods
for constructing good lexical chains, with emphasis on word sense disambiguation of words with multiple meanings: for example, the word
“bank” can mean a financial institution or the land near a river or
lake. They develop an algorithm that improves on previous work by
waiting to disambiguate polysemous words until all possible chains for
a text have been constructed; word senses are disambiguated by selecting the interpretations (i.e., chains) with the most connections in the
text. Later research further improved both the run-time of the algorithms for building lexical chains and the accuracy of word sense
disambiguation [64, 192].
Barzilay and Elhadad claim that the most prevalent discourse topic
will play an important role in the summary and argue that lexical chains
provide a better indication of discourse topic than does word frequency
simply because different words may refer to the same topic. They define
the strength of a lexical chain by its length, the number of word occurrences that belong to the chain, and by its homogeneity, the number of distinct lexical items in the
chain divided by its length. They build the summary by extracting
a sentence for each strong chain, choosing the first sentence in the
document containing a representative word for the chain.
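To make the scoring concrete, the following Python sketch scores chains by length and homogeneity and picks, for each strong chain, the first sentence containing a representative word. The way the two factors are combined, the number of chains kept, and the data structures are illustrative assumptions rather than Barzilay and Elhadad's exact formulation.

def chain_score(chain):
    # chain: a list of word occurrences (strings) assigned to one lexical chain
    length = len(chain)                        # number of member occurrences
    if length == 0:
        return 0.0
    homogeneity = len(set(chain)) / length     # distinct lexical items divided by length
    return length * homogeneity                # assumed combination of the two factors

def chain_based_summary(chains, sentences, n_chains=3):
    # one sentence per strong chain: the first sentence with a representative word
    strong = sorted(chains, key=chain_score, reverse=True)[:n_chains]
    summary = []
    for chain in strong:
        representative = max(set(chain), key=chain.count)    # most frequent chain member
        for sentence in sentences:
            if representative in sentence.lower().split() and sentence not in summary:
                summary.append(sentence)
                break
    return summary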
In later work, researchers chose to avoid the problem of word
sense disambiguation altogether but still used WordNet to track the
frequency of all members of a concept set. In the robust multi-document summarization system dems [186], concepts were derived using WordNet synonym, hypernym and hyponym relations. Rather
than attempting to disambiguate polysemous words and only then find
semantically related words, as was done in the lexical chains approach,
in the dems system, words with more than five senses (“matter”,
“issue”, etc.) are excluded from consideration. Given that many common words are polysemous, this policy of exclusion can be viewed as too
restrictive. In order to compensate for the loss of information, highly
polysemous words were replaced by other nouns that were strongly
associated with the same verb. For example, if the word “officer” is
excluded from consideration because it has many senses, “policeman”
would be added in, because both nouns are strongly associated with
the verb “arrest”.
After concepts are formed, frequency information can be collected
much more accurately, counting the occurrence of a concept rather than
a specific word. Sample concepts for one article consisted of C1 = {war,
campaign, warfare, effort, cause, operation, conflict}, C2 = {concern,
carrier, worry, fear, scare}, C3 = {home, base, source, support, backing}. Each of the individual words in the concept could appear only
once or twice in the input, but the concept itself appeared in the document frequently.
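As a rough sketch of concept-level counting, the code below uses NLTK's WordNet interface to pool the synonyms, hypernyms and hyponyms of a seed noun and counts how often any member of that pool occurs in the input. The five-sense cutoff follows the exclusion policy described above; everything else (the function names, the flat pooling of relatives) is an illustrative assumption, not the dems implementation.

from nltk.corpus import wordnet as wn   # requires the NLTK WordNet data to be installed

def concept_members(seed):
    # pool synonyms, hypernyms and hyponyms of a noun over all of its senses
    members = set()
    for synset in wn.synsets(seed, pos=wn.NOUN):
        members.update(lemma.name() for lemma in synset.lemmas())
        for related in synset.hypernyms() + synset.hyponyms():
            members.update(lemma.name() for lemma in related.lemmas())
    return members

def concept_frequency(seed, tokens, max_senses=5):
    # exclude highly polysemous seed nouns, as in the policy described above
    if len(wn.synsets(seed, pos=wn.NOUN)) > max_senses:
        return 0
    members = concept_members(seed)
    return sum(1 for token in tokens if token in members)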
Shallow semantic interpretation on the level of concepts was also
employed by Ye et al. [224]. They also used WordNet to derive the
concepts, but to find semantically related words they employed a measure
of the content overlap of the WordNet definitions, called glosses, of two
words rather than the WordNet relations. The intuition is that the more
content is shared in the definitions, the more related two words are.
Example concepts derived using their approach are {British, Britain,
UK}, {war, fought, conflict, military}, {area, zone}.
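The sketch below illustrates gloss overlap as a relatedness signal, again with NLTK's WordNet interface; the particular normalization (shared gloss words over the smaller gloss) is an assumption and not necessarily the measure used by Ye et al.

from nltk.corpus import wordnet as wn

def gloss_words(word):
    # pool the words of the definitions (glosses) of all senses of a word
    words = set()
    for synset in wn.synsets(word):
        words.update(w.lower().strip('.,;()') for w in synset.definition().split())
    return words

def gloss_overlap(word_a, word_b):
    # relatedness as the proportion of shared gloss words
    a, b = gloss_words(word_a), gloss_words(word_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))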
The heavy reliance on WordNet is clearly a bottleneck for the
approaches above, because success is constrained by the coverage of
WordNet and the sense granularity annotated there. Because of this,
robust methods that do not use a specific static hand-crafted resource
have much appeal, explaining the adoption of Latent Semantic Analysis
as an approximation for semantic interpretation of the input.
3.2 Latent Semantic Analysis
Latent semantic analysis (LSA) [46] is a robust unsupervised technique
for deriving an implicit representation of text semantics based on
observed co-occurrence of words. Gong and Liu [69] proposed the use
of LSA for single and multi-document generic summarization of news,
as a way of identifying important topics in documents without the use
of lexical resources such as WordNet.
At the heart of the approach is the representation of the input documents as a word-by-sentence matrix A: each row corresponds to a word
that appears in the input and each column corresponds to a sentence
in the input. Each entry aij of the matrix corresponds to the weight
of word i in sentence j. If the sentence does not contain the word, the
weight is zero; otherwise the weight is equal to the TF*IDF weight of
the word. Standard techniques for singular value decomposition (SVD)
from linear algebra are applied to the matrix A, to represent it as
the product of three matrices: A = UΣV^T. Gong and Liu suggested that the rows of V^T can be regarded as mutually independent topics discussed in the input, while each column represents a sentence from the document. In order to produce an extractive summary, they consecutively consider each row of V^T, and select the sentence with the
highest value, until the desired summary length is reached. Steinberger
et al. [195] later provided an analysis of several variations of Gong and
Liu’s method, improving over the original method. Neither method has
been directly compared with any of the approaches that rely on WordNet for semantic analysis, or with TF*IDF or topic word summarizers.
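A minimal NumPy sketch of the Gong and Liu selection step is given below; it assumes the word-by-sentence matrix has already been TF*IDF weighted, and the handling of ties and of sentences selected for an earlier topic is an illustrative choice.

import numpy as np

def lsa_select(A, summary_length):
    # A: word-by-sentence matrix (rows = words, columns = sentences)
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    selected = []
    for topic in Vt:                      # rows of V^T, ordered by singular value
        for j in np.argsort(-topic):      # sentence with the highest value for this topic
            if j not in selected:
                selected.append(int(j))
                break
        if len(selected) == summary_length:
            break
    return sorted(selected)               # sentence indices, restored to document order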
An alternative way of using the singular value decomposition
approach was put forward by Hachey et al. [73]. They followed the original LSA approach more directly and built the initial matrix A from word occurrence information in a large collection of documents rather than from the input documents to be summarized. They compared the performance of their approach with and without SVD, and
with a TF*IDF summarizer. SVD helped improve sentence selection
results over a general co-occurrence method but did not significantly
outperform the TF*IDF summarizer.
3.3 Coreference Information
Yet another way of tracking lexically different references to the same
semantic entity is the use of coreference resolution. Coreference resolution is the process of finding all references to the same entity in a
document, regardless of the syntactic form of the reference: full noun
phrase or pronoun.
Initial use of coreference information exclusively to determine
sentence importance for summarization [4, 18] did not lead to substantial
improvements in content selection compared to shallower methods.
However, later work has demonstrated that coreference resolution can
be incorporated in and substantially improve summarization systems
that rely on word frequency features. A case in point is a study on generic
single document summarization of news carried out by Steinberger
et al. [195]. The output of an automatic system for anaphora resolution
was used to augment an LSA-driven summarizer [69]. In one experiment,
all references to the same entity, including those when pronouns were
used, were replaced by the first mention of that entity, and the resulting text was given as input to the traditional LSA summarizer. In
another experiment, the presence of an entity in a sentence was used as
an additional feature to consider when determining the importance of the
sentence, and the references themselves remained unchanged. The first
approach led to a decrease in performance compared to the traditional
LSA summarizer, while the second gave significant improvements. An
oracle study with gold-standard manual coreference resolution showed
that there is further potential for improvement as the performance of
coreference systems gets better.
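A small sketch of the substitution strategy is shown below; it assumes the coreference chains come from an external anaphora resolver as lists of (start, end) token spans with the first span being the first mention. The representation and function name are assumptions for illustration.

def substitute_first_mentions(tokens, coref_chains):
    # coref_chains: list of chains; each chain is a list of (start, end) token spans
    # that refer to the same entity, ordered by position in the text
    replacements = []
    for chain in coref_chains:
        first_start, first_end = chain[0]
        first_mention = tokens[first_start:first_end]
        for start, end in chain[1:]:
            replacements.append((start, end, first_mention))
    rewritten = list(tokens)
    # apply replacements from right to left so earlier offsets remain valid
    for start, end, mention in sorted(replacements, key=lambda r: r[0], reverse=True):
        rewritten[start:end] = mention
    return rewritten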
3.4 Rhetorical Structure Theory
Other research uses analysis of the discourse structure of the input
document to produce single document summaries. Rhetorical Structure
Theory (RST) [117], which requires the overall structure of a text to be
represented by a tree, a special type of graph (see Section 2.1.3), is one
such approach that has been applied to summarization. In RST, the
smallest units of text analysis are elementary discourse units (EDUs),
which are in most cases sub-sentential clauses. Adjacent EDUs are combined through rhetorical relations into larger spans. The larger units
recursively participate in relations, yielding a hierarchical tree structure covering the entire text. The discourse units participating in a
relation are assigned nucleus or satellite status; a nucleus is considered
to be more central in the text than a satellite. Relations characterized
by the presence of a nucleus and a satellite are called mononuclear
relations. Relations can also be multinuclear, when the information in
both participating EDUs is considered equally important. Properties
of the RST tree used in summarization include the nucleus–satellite
distinction, notions of salience and the level of an EDU in the tree.
In early work, Ono et al. [159] suggested a penalty score for every
EDU based on the nucleus–satellite structure of the RST tree. Satellite
spans are considered less essential than spans containing the nucleus
of a relation. With the Ono penalty, spans that appear with satellite
status are assigned a lower score than spans which mostly take nucleus
status. The penalty is defined as the number of satellite nodes found
on the path from the root of the discourse tree to the EDU.
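A minimal sketch of the penalty computation is given below, assuming the RST tree is represented as nested dictionaries with a 'status' field ('nucleus' or 'satellite'), a 'children' list, and an 'edu_id' at the leaves; that representation is an assumption for illustration.

def ono_penalties(node, satellites_so_far=0, scores=None):
    # penalty of an EDU = number of satellite nodes on its path from the root
    if scores is None:
        scores = {}
    count = satellites_so_far + (1 if node.get('status') == 'satellite' else 0)
    children = node.get('children', [])
    if not children:                      # leaf node: an elementary discourse unit
        scores[node['edu_id']] = count
    for child in children:
        ono_penalties(child, count, scores)
    return scores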
Marcu [120] proposes another method to utilize the nucleus–
satellite distinction, rewarding nucleus status instead of penalizing
satellite. He put forward the idea of a promotion set, consisting of
salient/important units of a text span. The nucleus is considered
the more salient unit in the full span of a mononuclear relation. In a
multinuclear relation, all the nuclei become salient units of the larger
span. At the leaves, salient units are the EDUs themselves. Under this
framework, a relation between two spans is defined to hold between the
salient units in their respective promotion sets. Units in the promotion
sets of nodes close to the root are hypothesized to be more important
than those appearing at lower levels. The highest promotion of an EDU
occurs at the node closest to the root which contains that EDU in its
promotion set. The depth of the tree from this node gives the importance of that EDU. The closer to the root an EDU is promoted, the
better its score.
A further modification of the idea of a promotion set [120] takes into
account the length of the path from an EDU to the highest promotion
set it appears in. An EDU promoted successively over multiple levels
should be more important than one which is promoted fewer times. The
depth score fails to make this distinction; all EDUs in the promotion
sets of nodes at the same level receive the same scores. In order to
overcome this, a promotion score was introduced which is a measure of
the number of levels over which an EDU is promoted.
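Using the same assumed tree representation as in the Ono sketch above, the code below computes promotion sets bottom-up and scores each EDU by how close to the root it is first promoted; the exact scoring details are illustrative rather than Marcu's implementation, and a promotion score would additionally count the number of levels over which an EDU is promoted.

def compute_promotion(node):
    # promotion set: the EDU itself at a leaf, the union of the nuclei's sets otherwise
    children = node.get('children', [])
    if not children:
        node['promotion'] = {node['edu_id']}
    else:
        for child in children:
            compute_promotion(child)
        nuclei = [c for c in children if c.get('status') == 'nucleus'] or children
        node['promotion'] = set().union(*(c['promotion'] for c in nuclei))
    return node['promotion']

def depth_scores(node, tree_height, depth=0, scores=None):
    # score each EDU by the depth of the highest node whose promotion set contains it
    if scores is None:
        scores = {}
    for edu in node.get('promotion', ()):
        scores.setdefault(edu, tree_height - depth)   # first (shallowest) promotion wins
    for child in node.get('children', []):
        depth_scores(child, tree_height, depth + 1, scores)
    return scores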
The RST approach for content selection has been shown to give
good results for single document summarization of news and Scientific
American articles [119, 120, 122].
3.5 Discourse-motivated Graph Representations of Text
In the RST-based approaches, the importance of a discourse segment is calculated on the basis of the depth of the discourse tree, the position of the segment in it, relation importance, and nucleus or satellite status. Marcu’s work on using RST for single document summarization has been the most comprehensive study of tree-based text
representations for summarization [119, 120, 122] but suggestions for
using RST for summarization were proposed even earlier [159].
Graph-based summarization methods are very flexible and allow for
the smooth incorporation of discourse and semantic information. For
example, graph representations of a text that are more linguistically informed than simply using sentence similarity can be created using
information about the discourse relations that hold between sentences.
Wolf and Gibson [219] have demonstrated that such discourse-driven
graph representations are more powerful than word- or sentence-level frequency for single document summarization. In their
work, sentences again are represented by vertices in a graph, but the
edges between vertices are defined by the presence of a discourse coherence relation between the sentences. For example, there is a cause–effect
relation between the sentences “There was bad weather at the airport.
So our flight got delayed.” Other discourse relations included violated
expectation, condition, similarity, elaboration, attribution and temporal
sequence. After the construction of the graph representation of text,
the importance of each sentence is computed from the stationary distribution of a Markov chain over the graph, as in the sentence-similarity graph methods.
Wolf and Gibson claim that their method outperformed summarization approaches using more restricted discourse representations such
as RST.
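A minimal NumPy sketch of this last step: given a sentence adjacency matrix whose entries mark the presence of a coherence relation, a damped power iteration (the damping factor and iteration count are assumptions) yields the stationary importance scores.

import numpy as np

def stationary_importance(adjacency, damping=0.85, iterations=100):
    # adjacency[i, j] = 1 if a coherence relation links sentence i to sentence j
    A = np.asarray(adjacency, dtype=float)
    n = A.shape[0]
    out_degree = A.sum(axis=1, keepdims=True)
    out_degree[out_degree == 0] = 1.0              # sentences with no outgoing relation
    P = A / out_degree                             # row-stochastic transition matrix
    scores = np.full(n, 1.0 / n)
    for _ in range(iterations):                    # power iteration with damping
        scores = (1.0 - damping) / n + damping * scores @ P
    return scores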
Both the RST approach and the GraphBank work rely on the
structure of the text, be it a tree or a general graph, to define the importance of sentences. In recent work [110], the RST and GraphBank
methods were again compared directly with each other, as well as
against discourse information that also included the semantic type of
the discourse relation and non-discourse features including topic words,
word probabilities and sentence position. The summarizers were tested
on single-document summarization of news. Of the three classes of
features — structural, semantic and non-discourse — the structural
features proposed by Marcu led to the best summaries in terms of content. The three classes of features are complementary to each other, and
their combination results in even better summaries. Such results indicate that the development of robust discourse parsers has the potential to contribute to more meaningful input interpretation and overall better summarization performance.
3.6 Discussion
The discourse-informed summarization approaches described in this
section are appealing because they offer perspectives for more semantically and linguistically rich treatment of text for summarization. At the
same time, these methods require additional processing time for coreference resolution, lexical chain disambiguation or discourse structure analysis.
Because of this, they would often not be used for applications in which
speed is of great importance. Furthermore, the above approaches have
been tested only on single document summarization and have not been
extended for multi-document summarization or for genre-specific summarization. Discourse-informed summarization approaches are likely to make a comeback, as recent DUC/TAC evaluations have shown that systems have become impressively good at selecting important content but fall short in linguistic quality and organization. Addressing these harder problems will require linguistic processing anyway, and some of the discourse approaches could serve as the basis for techniques that improve the linguistic quality of the summaries rather than the content
selection capabilities of summarizers (see Section 4).
In concluding this section, we would like to point out that the use of
semantic interpretation in summarization seems intuitively necessary:
the majority of summarization systems still process text at the word
level, with minor pre-processing such as tokenization and stemming.
Current systems are complex, relying on multiple representations of the input and on a variety of features and algorithms to compute importance. Virtually no recent work has attempted to analyze the direct benefits of
using semantic representations, either based on WordNet or derived
from large corpora. In this sense, the development and assessment of
semantic modules for summarization remains largely an open problem. No clear answers can be given to questions such as how much
run-time overhead is incurred by using such methods. This can be
considerable, for example, if word sense disambiguation for full lexical chains is performed, or a coreference resolution system is run as
a pre-processing step. Also unclear is how much content selection improves compared to simpler methods that do not use semantics at all. Future research is likely to address these questions, because the
need for semantic interpretation will grow as summarization approaches
move toward the goal of producing abstractive rather than extractive
summaries, which will most likely require semantics.
4 Generation for Summarization
Determining which sentences in the input documents are important
and summary-worthy can be done very successfully in the extractive
summarization framework. Sentence extraction, however, falls short of
producing optimal summaries both in terms of content and linguistic
quality. In contrast to most systems, people tend to produce abstractive
summaries, rewriting unclear phrases and paraphrasing to produce a
concise version of the content found in the input. They also re-use
portions of the input document, but they often cut and paste pieces
of input sentences instead of using the full sentence in the summary.
While extensive research has been done on extractive summarization,
very little work has been carried out in the abstractive framework. It is
clearly time to make more headway on this alternative approach.
The content of an extractive summary may inadvertently include
unnecessary detail along with salient information. Once an extractive
approach determines that a sentence in an input document contains
important information, all information in the sentence will b …