Graph-based event schema induction in open-domain corpus

PeerJ Computer Science

Introduction

  • We propose a graph construction method and extract event structure features from the graph to improve the model's clustering performance.

  • We do not restrict the event schema induction task to clustering: with the help of in-context learning, similar events are conceptualized and event schemas are generated directly (see the sketch after this list).

  • We experimentally validate the effectiveness of the method: the structural features enhance the clustering model, and the generated event schemas achieve a high Acceptable Ratio.
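
As a purely illustrative sketch of the in-context learning step, the following Python snippet assembles a few-shot prompt that asks a language model to abstract a cluster of event mentions into a single schema. The demonstration pairs and the Attack(...)-style slot notation are our assumptions, not the prompt actually used in the paper, and the LLM call itself is left to the reader.

def build_schema_prompt(cluster_mentions, demonstrations):
    """Assemble a few-shot prompt for direct event schema generation.

    demonstrations: list of (mentions, schema) pairs used as in-context
    examples; cluster_mentions: the event mentions of one cluster."""
    parts = []
    for mentions, schema in demonstrations:
        parts.append("Event mentions: " + "; ".join(mentions))
        parts.append("Event schema: " + schema)
    parts.append("Event mentions: " + "; ".join(cluster_mentions))
    parts.append("Event schema:")
    return "\n".join(parts)

# Hypothetical demonstration pair and query cluster.
demonstrations = [
    (["troops attacked the village", "rebels assaulted the town"],
     "Attack(Attacker, Target, Place)"),
]
prompt = build_schema_prompt(
    ["the company acquired the startup", "Google bought the small firm"],
    demonstrations)
print(prompt)  # feed the prompt to any instruction-tuned LLM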

Problem Definition

Methodology

Word extraction

Extract candidate word sets

Salience-based filter

Word sense disambiguation

We use the OntoNotes sense groupings as our sense dictionary: https://verbs.colorado.edu/html_groupings/

Graph builder

 
Algorithm 1: Graph Builder.
Input: Subgraph list [G1, G2, ..., Gn], where Gi = [nodes_i, links_i]; nodes_i is a list containing the predicate verb and entity nouns of sentence Si, and links_i is a list whose elements each consist of two node ids, indicating a correlation between the two nodes.
Output: Merged graph G.
 1:  G ← ∅;
 2:  nodes ← ∅;   // A hashmap: key is a node string, value is a list of sentence ids.
 3:  links ← ∅;   // A list of edges.
 4:  // Step 1. Concatenate all subgraphs.
 5:  for i ← 1 to n do
 6:     for j ← 1 to Len(Gi[0]) do
 7:        node ← Gi[0][j];
 8:        if node not in nodes then
 9:           nodes ← Put(nodes, node, [i]);
10:        else
11:           nodes[node] ← Append(nodes[node], i);
12:        end if
13:     end for
14:     links ← Extend(links, Gi[1]);
15:  end for
16:  // Step 2. Query and link concept nodes via the ConceptNet Web API.
17:  for i ← 1 to Len(nodes) do
18:     (node, sentIds) ← Entry(nodes, i);
19:     edges ← Query(node);   // Request the Web API.
20:     for j ← 1 to Len(edges) do
21:        conceptNode ← edges[j][end];
22:        if conceptNode not in nodes then
23:           nodes ← Put(nodes, conceptNode, ∅);
24:           links ← Append(links, [node, conceptNode]);
25:        end if
26:     end for
27:  end for
28:  G ← [nodes, links];
Return: G;
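
To make the pseudocode concrete, here is a minimal Python sketch of the graph builder, assuming each subgraph is a (nodes, links) pair as defined above, with links given as pairs of node strings. The ConceptNet query uses the public REST endpoint at api.conceptnet.io; the limit parameter and the choice to keep only each edge's end label are our simplifications, not necessarily the authors' implementation.

import requests

def build_graph(subgraphs):
    """Merge per-sentence subgraphs, then attach ConceptNet concept nodes."""
    nodes = {}   # node string -> list of sentence ids
    links = []   # list of [node, node] pairs
    # Step 1. Concatenate all subgraphs.
    for i, (sub_nodes, sub_links) in enumerate(subgraphs):
        for node in sub_nodes:
            nodes.setdefault(node, []).append(i)
        links.extend(sub_links)
    # Step 2. Query and link concept nodes via the ConceptNet Web API.
    # Only the original word nodes are queried, so we snapshot the keys.
    for node in list(nodes):
        url = "http://api.conceptnet.io/c/en/" + node.replace(" ", "_")
        edges = requests.get(url, params={"limit": 10}).json().get("edges", [])
        for edge in edges:
            concept = edge["end"]["label"]   # tail node of the ConceptNet edge
            if concept not in nodes:
                nodes[concept] = []          # concept nodes carry no sentence ids
                links.append([node, concept])
    return [nodes, links]

# Example: one subgraph for the sentence "Soldiers attacked the village."
# graph = build_graph([(["attack", "soldier", "village"],
#                       [["attack", "soldier"], ["attack", "village"]])])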

Text encoder

Graph autoencoder

Clustering

Conceptualization

Experiments

Datasets and evaluation metric

Datasets.

Cluster Metrics.

Induction metrics.

Experimental setting

Baselines

Results of event mention clustering

Ablation study of event mention clustering

  • w/o ht: Removing the graph autoencoder means that no graph structure embedding features are generated; only the text embedding features of predicates and entity nouns are used.

  • w/o Filter: Removing the salience-based filter means that the salience of the extracted words is not considered, so the quality of the extracted words may vary.

  • w/o WSD: Removing the word sense disambiguation model means that the polysemy of predicate verbs is not considered, and polysemous words are clustered as if they were a single word.

  • As the absolute scores of the evaluation metrics in Tables 4 and 5 show, the complete GESI model achieves the highest values. Removing any component degrades performance, which confirms that the graph autoencoder, the salience filter, and word sense disambiguation all improve the model's clustering ability.

  • Judging by the relative changes across the variants, the largest performance degradation occurs for w/o ht, indicating that the structural features generated by the graph autoencoder contribute the most to the model's performance.

  • Comparing each variant with the complete GESI model, w/o Filter performs better than w/o WSD on the ACE 2005 dataset, while the opposite holds on the MAVEN-ERE dataset, suggesting that filtering matters more than disambiguation when the data size is large.

  • In terms of performance across datasets, take the variant w/o ht as an example: on the ACE 2005 dataset, ARI, NMI, ACC, and BCubed-F1 decrease by about 4%, 5.03%, 3.31%, and 5.74%, respectively, after removing the graph autoencoder, while on the MAVEN-ERE dataset they decrease by about 10.88%, 6.63%, 7.52%, and 6.77%. Compared with ACE 2005, the drop in ARI on MAVEN-ERE is markedly larger than that of the other metrics, indicating that the graph autoencoder improves ARI significantly on larger datasets. (A sketch of how these clustering metrics can be computed follows this list.)
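
For reference, here is a minimal sketch of how the four clustering metrics above can be computed, assuming scikit-learn and SciPy are available. The ACC and BCubed-F1 helpers are our own implementations of the standard definitions (Hungarian matching and per-item precision/recall), not the authors' evaluation code.

import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

def clustering_accuracy(labels_true, labels_pred):
    """ACC: accuracy under the best one-to-one mapping between predicted
    clusters and gold classes, found with the Hungarian algorithm."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    d = max(labels_pred.max(), labels_true.max()) + 1
    w = np.zeros((d, d), dtype=np.int64)
    for t, p in zip(labels_true, labels_pred):
        w[p, t] += 1
    row, col = linear_sum_assignment(-w)   # maximize matched counts
    return w[row, col].sum() / labels_true.size

def bcubed_f1(labels_true, labels_pred):
    """BCubed-F1: per-item precision and recall, averaged over all items."""
    n = len(labels_true)
    precision = recall = 0.0
    for i in range(n):
        same_pred = {j for j in range(n) if labels_pred[j] == labels_pred[i]}
        same_true = {j for j in range(n) if labels_true[j] == labels_true[i]}
        correct = len(same_pred & same_true)
        precision += correct / len(same_pred)
        recall += correct / len(same_true)
    precision, recall = precision / n, recall / n
    return 2 * precision * recall / (precision + recall)

labels_true = [0, 0, 1, 1, 2]
labels_pred = [1, 1, 0, 0, 0]
print(adjusted_rand_score(labels_true, labels_pred))           # ARI
print(normalized_mutual_info_score(labels_true, labels_pred))  # NMI
print(clustering_accuracy(labels_true, labels_pred))           # ACC
print(bcubed_f1(labels_true, labels_pred))                     # BCubed-F1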

Results of event schema induction

Conclusions and Future Work

Supplemental Information

Additional Information and Declarations

Competing Interests

The authors declare there are no competing interests.

Author Contributions

Keyu Yan conceived and designed the experiments, performed the experiments, analyzed the data, performed the computation work, prepared figures and/or tables, and approved the final draft.

Wei Liu conceived and designed the experiments, analyzed the data, authored or reviewed drafts of the article, and approved the final draft.

Shaorong Xie analyzed the data, authored or reviewed drafts of the article, and approved the final draft.

Yan Peng conceived and designed the experiments, authored or reviewed drafts of the article, and approved the final draft.

Data Availability

The following information was supplied regarding data availability:

The code is available in the Supplemental File.

The MAVEN-ERE datasets are available at GitHub: https://github.com/THU-KEG/MAVEN-ERE.

The ACE 2005 datasets are available at: https://catalog.ldc.upenn.edu/LDC2006T06.

Funding

This work was supported by the Major Program of the National Natural Science Foundation of China (No. 61991410) and the Program of the Pujiang National Laboratory (No. P22KN00391). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
