Abstract:
Code clone detection remains one of the main challenges in maintaining software projects. Recent state-of-the-art research has shown that neural models based on abstract syntax trees (ASTs) can better represent code fragments. However, existing tree-based models are prone to vanishing gradients because ASTs tend to be large. In this paper, we represent a code fragment as the set of compositional paths in its AST and use this representation to train a classifier that detects clone pairs. Unlike Siamese-based models, which obtain the embeddings of two code fragments separately and then compute their similarity in vector space, our compare-aggregate-based network takes the two code fragments together as a whole to obtain the vectors used for classification. To validate our model's ability to detect code clones, we evaluated it on the publicly available BigCloneBench dataset; the experimental results show that our model outperforms the state-of-the-art model ASTNN.
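To make the idea of a path-based code representation concrete, here is a minimal sketch using Python's standard-library `ast` module. It assumes that a "path" is the sequence of node-type names from the AST root to a leaf; the paper extracts compositional paths from Java ASTs, and its exact path definition may differ. The helper names `ast_paths` and `path_jaccard` are hypothetical, and the Jaccard score is a simple non-learned stand-in for illustration only, not the compare-aggregate network described in this abstract.

```python
import ast


def ast_paths(source: str) -> list[tuple[str, ...]]:
    """Return every root-to-leaf path of AST node-type names.

    Hypothetical representation: identifiers and literals are dropped,
    so only the syntactic structure of the fragment is kept.
    """
    tree = ast.parse(source)
    paths: list[tuple[str, ...]] = []

    def walk(node: ast.AST, prefix: tuple[str, ...]) -> None:
        prefix = prefix + (type(node).__name__,)
        children = list(ast.iter_child_nodes(node))
        if not children:
            # A node with no AST children is a leaf: record the full path.
            paths.append(prefix)
        for child in children:
            walk(child, prefix)

    walk(tree, ())
    return paths


def path_jaccard(src_a: str, src_b: str) -> float:
    """Jaccard similarity of the two fragments' path sets (illustrative baseline)."""
    a, b = set(ast_paths(src_a)), set(ast_paths(src_b))
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    print(ast_paths("x = a + b"))
    print(path_jaccard("x = a + b", "y = c + d"))  # renamed copy
    print(path_jaccard("x = a + b", "y = c * d"))  # different operator
```

Because identifiers do not appear in the paths, the renamed copy `y = c + d` scores 1.0 against `x = a + b`, while changing the operator to `*` lowers the score to 0.5. A learned model replaces this set overlap with path embeddings and a trained classifier.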
Date of Conference: 18-22 July 2021
Date Added to IEEE Xplore: 20 September 2021