
Source code analysis dataset [PDF]

open access: yes · Data in Brief, 2019
The data in this article pair source code with three artifacts from 108,568 projects downloaded from GitHub that have a redistributable license and at least 10 stars.
Ben Gelman   +3 more
doaj   +2 more sources

Vulnerability Prediction From Source Code Using Machine Learning

open access: yes · IEEE Access, 2020
As information and communication technologies play an ever larger role in our lives, software security becomes a major issue: systems must be protected against malicious attempts, and the damage from a successful attack can be irreparable.
Zeki Bilgin   +5 more
doaj   +2 more sources
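The entry above describes predicting vulnerabilities from source code with machine learning. As a minimal, hypothetical baseline (the paper's actual features and model are not shown in this snippet), one common starting point is counting occurrences of risky C library calls as classifier features:

```python
import re

# Hypothetical feature extractor: count calls to C functions often
# associated with memory-safety bugs. These counts could feed any
# standard classifier; the paper's real feature set may differ.
RISKY_CALLS = ["strcpy", "strcat", "sprintf", "gets", "memcpy"]

def extract_features(source: str) -> dict:
    """Map each risky call name to its occurrence count in the code."""
    return {name: len(re.findall(rf"\b{name}\s*\(", source))
            for name in RISKY_CALLS}

code = 'void f(char *s) { char buf[8]; strcpy(buf, s); sprintf(buf, "%s", s); }'
features = extract_features(code)
```

Such hand-crafted counts are only a baseline; learned representations of the source itself typically outperform them.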

The Stack: 3 TB of permissively licensed source code [PDF]

open access: yes · Trans. Mach. Learn. Res., 2022
Large Language Models (LLMs) play an ever-increasing role in the field of Artificial Intelligence (AI), not only for natural language processing but also for code understanding and generation.
Denis Kocetkov   +12 more
semanticscholar   +1 more source

DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection [PDF]

open access: yes · International Symposium on Recent Advances in Intrusion Detection, 2023
We propose and release a new vulnerable source code dataset. We curate the dataset by crawling security issue websites, extracting vulnerability-fixing commits and source codes from the corresponding projects.
Yizheng Chen   +4 more
semanticscholar   +1 more source
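The curation step described above hinges on identifying vulnerability-fixing commits. A hedged sketch of that idea is a keyword filter over commit messages; the keyword list is an assumption, and DiverseVul's actual pipeline (crawling security issue websites) is more involved:

```python
# Hypothetical filter for vulnerability-fixing commits based on
# commit-message keywords. Real curation pipelines add CVE links,
# issue-tracker crawling, and manual rules on top of this.
SECURITY_KEYWORDS = ("cve-", "vulnerability", "overflow",
                     "use-after-free", "security fix")

def is_vuln_fixing(message: str) -> bool:
    """Return True if a commit message suggests a security fix."""
    text = message.lower()
    return any(kw in text for kw in SECURITY_KEYWORDS)

commits = [
    "Fix buffer overflow in parser (CVE-2021-1234)",
    "Refactor build scripts",
]
flagged = [m for m in commits if is_vuln_fixing(m)]
```

The flagged commits would then be mined for the pre- and post-fix versions of the affected source files.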

An Empirical Comparison of Pre-Trained Models of Source Code [PDF]

open access: yes · International Conference on Software Engineering, 2023
While a large number of pre-trained models of source code have been successfully developed and applied to a variety of software engineering (SE) tasks in recent years, our understanding of these pre-trained models is arguably fairly limited.
Changan Niu   +5 more
semanticscholar   +1 more source

NatGen: generative pre-training by “naturalizing” source code [PDF]

open access: yes · ESEC/SIGSOFT FSE, 2022
Pre-trained generative language models for source code (e.g., PLBART, CodeT5, SPT-Code) have yielded strong results on several tasks in the past few years, including code generation and translation. These models have adopted varying pre-training objectives to …
Saikat Chakraborty   +4 more
semanticscholar   +1 more source

VulBERTa: Simplified Source Code Pre-Training for Vulnerability Detection [PDF]

open access: yes · IEEE International Joint Conference on Neural Networks, 2022
This paper presents VulBERTa, a deep learning approach to detect security vulnerabilities in source code. Our approach pre-trains a RoBERTa model with a custom tokenisation pipeline on real-world code from open-source C/C++ projects.
Hazim Hanif, S. Maffeis
semanticscholar   +1 more source
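The custom tokenisation pipeline mentioned above can be illustrated with a minimal, hypothetical lexer for C-like code. This is only a sketch of the idea; VulBERTa's actual tokenizer (its handling of identifiers, literals, and vocabulary) is more elaborate:

```python
import re

# Hypothetical C-like lexer: splits source into identifiers, numbers,
# and operator/punctuation tokens. A real pre-training tokenizer would
# add subword splitting and a learned vocabulary on top of this.
TOKEN_RE = re.compile(
    r"[A-Za-z_]\w*|\d+|==|!=|<=|>=|->|[{}()\[\];,=<>+\-*/&|.]"
)

def tokenize(source: str) -> list:
    """Return the list of lexical tokens in a C-like code fragment."""
    return TOKEN_RE.findall(source)

tokens = tokenize("if (n <= 0) return -1;")
```

Tokenizing code with a code-aware lexer, rather than a natural-language tokenizer, keeps operators like `<=` intact as single tokens.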

CoditT5: Pretraining for Source Code and Natural Language Editing [PDF]

open access: yes · International Conference on Automated Software Engineering, 2022
Pretrained language models have been shown to be effective in many software-related generation tasks; however, they are not well-suited for editing tasks as they are not designed to reason about edits.
Jiyang Zhang   +4 more
semanticscholar   +1 more source

A Transformer-based Approach for Source Code Summarization [PDF]

open access: yes · Annual Meeting of the Association for Computational Linguistics, 2020
Generating a readable summary that describes the functionality of a program is known as source code summarization. In this task, learning code representation by modeling the pairwise relationship between code tokens to capture their long-range ...
Wasi Uddin Ahmad   +3 more
semanticscholar   +1 more source
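The pairwise relationships between code tokens mentioned in the snippet are exactly what self-attention computes. A minimal plain-Python sketch of scaled dot-product attention, with toy dimensions rather than the paper's actual architecture:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(Q, K, V):
    """Scaled dot-product attention over lists of vectors.

    Each query row attends to every key row, so every pair of
    positions (code tokens) receives a weight -- the pairwise
    relationship modeling the abstract refers to.
    """
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Toy example: two token embeddings of dimension 2.
Q = K = V = [[1.0, 0.0], [0.0, 1.0]]
result = attention(Q, K, V)
```

Because each output row mixes all value rows, attention captures long-range dependencies that recurrent encoders handle less directly.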

Semantic similarity metrics for evaluating source code summarization [PDF]

open access: yes · IEEE International Conference on Program Comprehension, 2022
Source code summarization involves creating brief descriptions of source code in natural language. These descriptions are a key component of software documentation such as JavaDocs.
S. Haque   +3 more
semanticscholar   +1 more source
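A semantic similarity metric like the one described above typically scores a generated summary against a reference via cosine similarity of embedding vectors. As a self-contained stand-in for learned sentence embeddings, this hedged sketch uses bag-of-words vectors (the paper's metrics use neural embeddings):

```python
import math
from collections import Counter

def cosine_bow(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words vectors -- a crude proxy
    for embedding-based semantic similarity between two summaries."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

ref = "returns the sum of two integers"
gen = "computes the sum of two numbers"
score = cosine_bow(ref, gen)
```

Unlike exact n-gram overlap (BLEU-style metrics), vector similarity can give partial credit to paraphrases, which is the motivation for semantic metrics in summary evaluation.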
