Distribution of trial registry numbers within full-text of PubMed Central articles: implications for linking trials to publications and indexing trial publication types

Holt, Arthur M.; Troy, Ang Michael; Smalheiser, Neil R.

doi:10.1186/s13063-025-08741-w

Research
Open access
Published: 31 January 2025

Trials volume 26, Article number: 34 (2025)

Abstract

Background

Linking registered clinical trials with their published results continues to be a challenge. A variety of natural language processing (NLP)-based and machine learning-based models have been developed to assist users in identifying these connections. To date, however, no system has attempted to detect mentions of registry numbers within the full-text of articles.

Methods

Articles from the PubMed Central full-text Open Access dataset were scanned for mentions of ClinicalTrials.gov and international clinical trial registry identifiers. We analyzed the distribution of trial registry numbers within sections of the articles and characterized their publication type indexing and other metrics.

Results

Registry numbers mentioned in article metadata (e.g., the abstract) or in the Methods section of full-text are highly predictive of clinical trial articles. When a clinical trial article mentioned ClinicalTrials.gov identifier numbers (NCT) only in the Methods section, in every case examined, it was reporting clinical outcomes from that registered trial, and thus can reliably be used to link that trial to that publication. Conversely, registry numbers mentioned in Tables arise almost entirely from reviews (including systematic reviews and meta-analyses). Registry numbers mentioned in other full-text sections have relatively little predictive value for linking trials to their publications. Clinical trial articles that mention CONSORT or SPIRIT guidelines have a higher rate of mentioning registry numbers in article metadata, and hence are more easily linked to their underlying trials, than articles overall.

Conclusions

The appearance and location of trial registry numbers within the full-text of biomedical articles provide valuable features for connecting clinical trials to their publications. They also potentially provide information to assist automated tools in assigning publication types to articles.

Background

Much effort has been devoted to creating infrastructure to track the progress and results of clinical trials. Trial registries such as ClinicalTrials.gov (and many international registries) assign a unique number to each registered trial, and the registries permit trial organizers to submit follow-up information including results, any observed adverse effects, and any publications that arise from the trial. Unfortunately, trial organizers often fail to update the registry with results or to link their publications to the registered trial, either directly, by submitting the publication information to the registry, or indirectly, by mentioning the registry number in the abstract or associated metadata of the publication. Only about half of registered trials result in any publications [1,2,3], and only about half of publications explicitly list the registry number [4, 5]. Thus, investigators interested in studying specific trials, or trials in general, must undertake onerous and uncertain manual literature searches to attempt to find the linked publications.

A variety of manual search strategies and automated tools have been developed to assist investigators in linking trials to their publications. Huser and Cimino examined a limited set of interventional trials in detail and found that the majority had no structured links to result publications [4]; they had previously estimated that the PMID-link precision from ClinicalTrials.gov to PubMed was between 63 and 96% [6]. Similarly, Bashir et al. [5] examined a set of studies in detail, found low levels of automatically linked trial result publications, and determined that the use of automatic links to registry entries has not increased over time. Goodwin et al. developed a deep learning model to connect trials with result publications with some features that include mention of trial registry numbers [7]. Dunn et al. compared multiple document similarity methodologies and found them to be effective at identifying previously unreported links [8]. Altman et al. launched the Linked Reports of Clinical Trials project to address the lack of reliable connections between protocol, analysis, primary trial results, and secondary analysis publications [9]. Pan and Roberts developed a transformer-based language model to link cancer trials to results publications [10].

Most of these methods consider evidence obtainable from PubMed, namely, the title, abstract, author, journal, and other metadata with which articles are indexed. However, some articles do not mention the trial registry number in metadata but only within the full-text. We hypothesized that NCT and other registry numbers mentioned in different article sections will be differently associated with different types of articles, such that knowing the specific location of a registry mention can inform the automated linking of trials to publications.

In this paper, we have examined the distribution of trial registry numbers mentioned within the full-text of open access articles and author manuscripts deposited in PubMed Central (PMC). This dataset contains the full-text content of more than 7 million journal-submitted and NIH-funded author-submitted articles across all subjects, almost a quarter of the entire set of PubMed-indexed literature, encompassing a wide range of articles published both in traditional journals (e.g., JAMA and Trials) as well as dedicated open access journals. There is no reason to suspect that open access articles differ systematically from non-open access articles in how their inherent sections (e.g., Introduction-Methods-Results-Discussion) are written or structured. Thus, these can be regarded as a representative sample of the biomedical literature for present purposes.

Methods

Full-text sources

PubMed Central® (PMC) is an archive of full-text content for biomedical and life sciences journal literature at the US National Institutes of Health’s National Library of Medicine [11]. First released in the year 2000, the PMC now contains the full-text content of more than 10 million journal-submitted and NIH-funded author-submitted articles. For this analysis, we make use of the Open Access subset and Author Manuscripts provided as bulk downloads for the purpose of reading or text mining [12]. Bulk collections were downloaded from https://ftp.ncbi.nlm.nih.gov/pub/pmc in XML format, comprising 7,138,912 documents, of which 6,901,686 were accompanied with PubMed identifiers as of July 7, 2024.

Open Access—Commercial License, 4,135,362 documents
Open Access—Non-Commercial License, 1,616,427 documents
Author Manuscripts, 1,387,123 documents

Registry identifiers

Seventeen search criteria were encoded to identify clinical trial registry numbers for ClinicalTrials.gov and 16 international trial registries. Since the NCT trial identifiers can be readily confirmed with the publicly accessible ClinicalTrials.gov database, more tolerance was given to misspellings, superfluous characters, and extra spaces. Additional File, Table 1A shows the full list of registries and character patterns searched within PMC documents.
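As an illustration of what a tolerant NCT search criterion might look like, the following is a minimal Python sketch. The pattern is hypothetical and is not the one listed in Additional File, Table 1A; it only assumes that an NCT identifier is the prefix "NCT" followed by eight digits, and it allows optional separators and spaces of the kind described above.

```python
import re
from typing import List

# Hypothetical tolerant pattern (not the authors' expression from Table 1A):
# "NCT", optional separator characters, then exactly 8 digits that may be
# broken up by single spaces or hyphens.
NCT_PATTERN = re.compile(
    r"\bNCT[\s:#\-/]*\d(?:[\s\-]?\d){7}(?!\d)",
    re.IGNORECASE,
)

def find_nct_candidates(text: str) -> List[str]:
    """Return raw NCT-like strings found in text, before any cleanup."""
    return [m.group(0) for m in NCT_PATTERN.finditer(text)]

if __name__ == "__main__":
    sample = "Registered as NCT01234567 and NCT: 0765 4321."
    print(find_nct_candidates(sample))
```

A looser pattern of this kind produces more false candidates, which is tolerable only because NCT candidates can be checked afterwards against the ClinicalTrials.gov database.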

XML scanning

The XML for each document was scanned recursively with the Python XML library [13], and the inner text of each element was checked for the presence of any one of the 17 trial registry regular expressions (Additional File, Table 1A). When found, a registry number was written to a database table along with the PMC and PubMed identifiers and a modified form of the entire accumulated XML path where it was found. Modification of the XML paths was necessary to develop detailed information about the registry number location within the full text, which varies widely in structure and section designations. Each time a <sec> tag was encountered, the following <title> tag, which generally provides the section title, was retrieved and added to the XML path to enable further analysis and filtering by detailed section.
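The scanning step can be sketched roughly as follows, using `xml.etree.ElementTree` from the Python standard library. This is an illustrative reconstruction, not the authors' code: the function name `scan`, the bracketed path notation, and the single simplified NCT pattern are assumptions. The sketch also checks only the text directly inside each element, so that a mention is reported once rather than at every ancestor level.

```python
import re
import xml.etree.ElementTree as ET

# Simplified stand-in for the 17 registry patterns (assumption).
NCT = re.compile(r"NCT\d{8}")

def scan(elem, path=""):
    """Recursively yield (registry_number, xml_path) for each mention."""
    step = elem.tag
    if elem.tag == "sec":
        # Append the section title to the path so that the location can
        # later be filtered by detailed section.
        title = elem.find("title")
        if title is not None:
            step += "[" + "".join(title.itertext()).strip() + "]"
    path = path + "/" + step
    # Text directly inside this element: its own text plus the tail text
    # that follows each child element.
    own_text = [elem.text or ""] + [child.tail or "" for child in elem]
    for chunk in own_text:
        for match in NCT.findall(chunk):
            yield match, path
    for child in elem:
        yield from scan(child, path)

if __name__ == "__main__":
    doc = ("<article><body><sec><title>Methods</title>"
           "<p>Registered as NCT01234567.</p></sec></body></article>")
    for number, where in scan(ET.fromstring(doc)):
        print(number, where)
```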

Section identification

For analysis purposes, seven generalized section types were designated such that they are mutually exclusive and applicable to most publication formats. Discovered registry numbers were tagged with either Metadata, Introduction, Methods, Results, Conclusions, Tables, or Other using case-insensitive XML path criteria, as shown in Additional File, Table 2A. For the most part, registry numbers tagged as “Metadata” were explicit mentions of NCT numbers in the article abstract, though occasionally they were found instead in other fields such as Secondary Source ID or Corporate Author. We capitalize the term Metadata when it refers to the extracted generalized section type.
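One way to obtain mutually exclusive labels from path criteria is an ordered rule list in which the first matching rule wins. The sketch below illustrates that idea only; the rules shown are hypothetical placeholders and are not the criteria of Additional File, Table 2A.

```python
import re

# Hypothetical ordered rules: the first matching rule wins, which keeps the
# generalized section types mutually exclusive. These are NOT the actual
# criteria from Additional File, Table 2A.
RULES = [
    ("Tables", r"table-wrap|/table"),
    ("Metadata", r"/front/|abstract"),
    ("Introduction", r"introduction|background"),
    ("Methods", r"method"),
    ("Results", r"result"),
    ("Conclusions", r"conclusion"),
]

def section_type(xml_path: str) -> str:
    """Map an accumulated XML path to one generalized section type."""
    for label, pattern in RULES:
        if re.search(pattern, xml_path, flags=re.IGNORECASE):
            return label
    return "Other"

if __name__ == "__main__":
    print(section_type("/article/body/sec[Methods]/p"))
    print(section_type("/article/body/sec[RESULTS]/table-wrap/table/tr/td"))
```

Placing the Tables rule first in this sketch means that a table nested inside a Results section is labeled Tables rather than Results; whether the original criteria resolve such overlaps the same way is not stated in the text.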

Analysis data preparation

For discovered links containing trials from ClinicalTrials.gov, the extracted NCT numbers were stripped of extraneous characters such as ";", ":", "/", and spaces. The resulting condensed registry numbers were compared to the current set of existing clinical trial registry numbers from ClinicalTrials.gov [14]. Mentions of NCT numbers not found to match an existing registered trial, likely due to typographic errors, were discarded. No attempt was made to validate non-US registry identifiers.
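The cleanup and validation described above can be sketched as follows. The function names and the toy set of registered identifiers are assumptions for illustration; in practice the reference set would be the list of registered trials obtained from ClinicalTrials.gov.

```python
import re

def normalize_nct(raw: str) -> str:
    """Condense a raw mention such as 'NCT: 0123 4567' to 'NCT01234567'."""
    digits = re.sub(r"\D", "", raw)  # drop everything except digits
    return "NCT" + digits

def keep_registered(raw_mentions, registered):
    """Normalize mentions and discard those that match no registered trial."""
    cleaned = (normalize_nct(raw) for raw in raw_mentions)
    return [nct for nct in cleaned if nct in registered]

if __name__ == "__main__":
    # Toy reference set (made up for the example).
    registered = {"NCT01234567"}
    print(keep_registered(["NCT 01234567", "NCT/99999999"], registered))
```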

We augmented the list of publication-registry links with publication types, as indexed by the NLM. Articles only indexed as “Journal Article” or “National Research Support,” or lacking any NLM indexing at all, were further processed using the Multi-Tagger 1.0 tool [15], and assigned to one or more of 50 specific publication types or study designs if predicted by that tool. Those lacking predicted Multi-Tagger publication types were marked as NONE for further analysis (many of these were nonclinical articles such as biological, preclinical, or biochemical studies). A total of 323,250 unique publications were found with a trial registry of any country mentioned, of which 176,983 had NLM indexed publication types and another 79,592 received inferred publication types from the Multi-Tagger 1.0 model. A total of 3766 documents lacking PubMed identifiers (PMID) were excluded from the analysis data set.
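The fallback logic for assigning publication types can be summarized in a short sketch. The `predict` argument is a hypothetical callable standing in for the Multi-Tagger 1.0 tool, whose real interface is not described here; the generic type names are those given in the text.

```python
# Generic index terms that carry no specific publication type (from the text).
GENERIC = {"Journal Article", "National Research Support"}

def resolve_pubtypes(nlm_types, predict):
    """Return (publication_types, source) for one article.

    nlm_types: publication types indexed by the NLM (possibly empty).
    predict:   hypothetical callable returning predicted types; it is only
               called when the NLM indexing is generic or missing.
    """
    specific = [t for t in nlm_types if t not in GENERIC]
    if specific:
        return specific, "NLM"
    predicted = list(predict())
    if predicted:
        return predicted, "Multi-Tagger"
    return ["NONE"], "NONE"

if __name__ == "__main__":
    print(resolve_pubtypes(["Journal Article"], lambda: ["Review"]))
```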

Our group previously created a machine learning-based model (Trials to Publications [16]) to predict, for a given registered trial in ClinicalTrials.gov, the set of clinical trial articles that arise from that trial, even when no NCT number is mentioned in the Abstract or other Metadata. Briefly, multiple textual features from specific sections of the ClinicalTrials.gov registered trial entry were compared against title, abstract, investigator names, and other metadata fields of candidate articles; as well, candidate articles were compared against other articles already known to be linked to that registered trial [16]. Each PubMed article is given a probabilistic estimate (0 < p < 1) that it presents a clinical trial outcome of that registered trial. Model scores for the Trials to Publications model were applied for the evaluations shown below in Fig. 2.

Results

As shown in Fig. 1, NCT registry numbers are mentioned quite often within the full-text of articles and can appear in any section. The Metadata and Methods sections tend to mention a single NCT number per article, describing one specific trial under discussion. In contrast, an article that mentions NCT numbers in Tables tends to list multiple NCT numbers, i.e., discussing an entire set of trials. Only about half of NCT numbers mentioned within full-text sections were mentioned again in the Metadata; the remainder are thus hidden from PubMed literature searches (Additional File, Table 10A). For example, 69% of the trials mentioned in the Methods section failed to appear in the Metadata. Conversely, when an NCT number is explicitly mentioned in the Metadata, it is mentioned in the Methods section in only about half of cases, and rarely mentioned elsewhere in full-text. Thus, registry numbers mentioned in full-text are non-redundant and complementary to those found in article Metadata.
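The overlap comparison above amounts to a set difference over (article, trial) pairs. The sketch below shows one plausible way to compute the share of mentions in a section that do not reappear in the Metadata of the same article; the record layout and the toy data are assumptions, and the authors' exact counting unit is not specified in the text.

```python
def hidden_share(mentions, section):
    """Fraction of (pmid, nct) pairs in `section` that are absent from Metadata.

    mentions: iterable of (pmid, nct, section_type) tuples.
    """
    in_section = {(p, n) for p, n, s in mentions if s == section}
    in_metadata = {(p, n) for p, n, s in mentions if s == "Metadata"}
    if not in_section:
        return 0.0
    return len(in_section - in_metadata) / len(in_section)

if __name__ == "__main__":
    # Toy records, made up for the example.
    toy = [
        (1, "NCT00000001", "Methods"),
        (1, "NCT00000001", "Metadata"),
        (2, "NCT00000002", "Methods"),
        (3, "NCT00000003", "Methods"),
        (4, "NCT00000004", "Methods"),
    ]
    print(hidden_share(toy, "Methods"))
```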

Publication types associated with registry mentions in different sections

We found that articles mentioning registry numbers in different full-text sections were associated with very different publication types. As shown in Table 1, articles with NCT numbers mentioned in Abstract or other Metadata fields are almost entirely clinical trials, with the most frequent being randomized controlled trials. Similarly, articles that mention NCT numbers in the Methods section are also predominantly clinical trial articles (Table 2). In contrast, NCT numbers located inside Tables are almost entirely from reviews (Table 3). When NCT numbers are mentioned in Introduction, Results, Conclusions, or Other sections of full-text, the results are not strongly predictive of any particular publication types (Additional File, Tables 3A–9A).

Table 1 Top publication types for NCT numbers in Metadata

Shown are the most frequent publication types, displayed in order of frequency, up to the point where they cumulatively comprise 80% of the total. Trial mentions in Metadata stem primarily from clinical trial articles. Articles only indexed as “Journal Article” or “National Research Support,” or lacking any NLM indexing at all, were designated as having publication type “NONE”.
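The 80% cutoff used for these tables can be expressed as a short routine: sort the publication types by frequency and keep adding them until the running total reaches the chosen share. The function name and the toy counts below are placeholders and do not reproduce the data in the tables.

```python
def top_types(counts, share=0.80):
    """Return the most frequent (type, count) pairs, in descending order,
    up to the point where they cumulatively reach `share` of the total."""
    total = sum(counts.values())
    selected, running = [], 0
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    for name, n in ranked:
        if running >= share * total:
            break
        selected.append((name, n))
        running += n
    return selected

if __name__ == "__main__":
    # Toy counts, made up for the example.
    toy = {"type_a": 50, "type_b": 35, "type_c": 10, "type_d": 5}
    print(top_types(toy))
```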

Table 2 Top publication types for NCT numbers in Methods section

Table 3 Top publication types for NCT numbers mentioned in Tables

The article section distribution and publication type associations observed for NCT number mentions (ClinicalTrials.gov) were also similar for articles that mention foreign and international registry numbers (Additional File, Fig. 4A and Tables 11A–15A).

Trials to publications model score evaluation

Articles mentioning NCT numbers only in the Methods section exhibited a single peak of Trials to Publications model [16] scores at the high end of the range (90–100%), which was similar to the peak of scores exhibited when applied to articles mentioning NCT numbers in Metadata (Fig. 2). Although 54.6% of these articles mentioning NCT only in Methods had model predictive scores of 90% or higher, only 29.8% had scores of 98% or higher. Conversely, of those mentioning NCT numbers in Tables, only 2.6% had model predictive scores above 90% (Fig. 2). Scores were very heterogeneous for other full-text sections (Additional File, Figs. 2A and 3A). These results agree with previous evaluations that indicate that the tool has high precision for identifying additional linked publications [16], but also indicate that when registry mentions are not explicitly listed in metadata, textual clues alone could identify only roughly 1/3–1/2 of the publications in this category with high confidence. It is important to note that among articles that are indexed by NLM as Clinical Trial [Publication Type] and that mentioned NCT numbers only in the Methods section, manual inspection of 50 randomly chosen articles verified that all 50 were indeed discussing the results of the registered trial associated with that NCT number, regardless of the predicted Trials to Publications score. Thus, clinical trial articles mentioning NCT numbers in Methods sections can be regarded as reliably linking trials to publications that describe clinical outcomes. This underscores the need to extract NCT mentions from full-text articles for better recall of linked publications.

Relation of registry identifier mentions to CONSORT and SPIRIT guidelines

Many journals now implement reporting standards such as CONSORT and SPIRIT, which require that registry entries be explicitly mentioned in specific locations within manuscripts. However, it is not clear whether adoption of these guidelines in articles actually increases the reporting of NCT and other registry numbers, and whether NCT mentions are more likely to be located in Metadata where they can be automatically linked to their underlying trials. To answer these questions, we compared articles within our dataset vs. those which mentioned the term “CONSORT” or “SPIRIT” (all-capitalized) anywhere in the metadata or full-text, with particular attention to articles indexed as Clinical Trial [Publication Type]. (Most of the CONSORT articles are expected to be reporting clinical outcomes of a particular trial, whereas SPIRIT articles are largely clinical trial protocols.) CONSORT guidelines were first published in 1996 [17], and SPIRIT in 2013 [18]. We analyzed articles from 2010 (for CONSORT) and 2015 (for SPIRIT) through 2023 to ensure that there was adequate time to adopt the guidelines and to focus on recent trends; the results were tabulated as to how often they mentioned NCT or other registry numbers at all, and if so, in which article sections. About 1/5 of clinical trial articles overall mentioned CONSORT and 1/5 mentioned SPIRIT.
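The comparison described above relies on a case-sensitive match of the all-capitalized guideline names, followed by a per-year tabulation. A minimal sketch of that idea is shown below; the record fields (`year`, `text`, `has_registry_number`) and the function name are hypothetical, and the example records are made up.

```python
import re
from collections import defaultdict

# Case-sensitive on purpose: only the all-capitalized terms should match,
# so that ordinary words such as "spirit" are not counted.
GUIDELINE = re.compile(r"\b(CONSORT|SPIRIT)\b")

def yearly_registry_rate(articles):
    """Return {(guideline, year): fraction of articles with a registry number}.

    articles: iterable of dicts with hypothetical keys 'year', 'text',
    and 'has_registry_number'.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for article in articles:
        for guideline in set(GUIDELINE.findall(article["text"])):
            key = (guideline, article["year"])
            totals[key] += 1
            hits[key] += 1 if article["has_registry_number"] else 0
    return {key: hits[key] / totals[key] for key in totals}

if __name__ == "__main__":
    toy = [
        {"year": 2023, "text": "Reported per CONSORT.", "has_registry_number": True},
        {"year": 2023, "text": "CONSORT checklist attached.", "has_registry_number": False},
        {"year": 2023, "text": "in the spirit of openness", "has_registry_number": True},
    ]
    print(yearly_registry_rate(toy))
```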

As shown in Fig. 3A, among all articles indexed as Clinical Trial [Publication Type], mentions of registry numbers (anywhere in the article) have increased steadily, from 28% in 2010 to 82% in 2023. Among Clinical Trial articles that mentioned CONSORT, 75% mentioned registry numbers in 2010, increasing to 89% in 2023. Among those that mentioned SPIRIT, the rate similarly converged to 91% by 2023. Thus, although overall reporting of registry numbers has greatly improved over time, articles that mention (and presumably incorporate) CONSORT or SPIRIT guidelines show even better compliance with reporting registry numbers.

Figure 3B shows, for those clinical trial articles that did mention one or more registry numbers (anywhere in the article), the distribution of mentions in specific article sections. Overall, there has been a dramatic increase in mentions in abstract/metadata, from 47% in 2010 to 78% in 2023, though this still means that nearly 1/4 of these articles overall are not automatically linked to their ClinicalTrials.gov trials at present, and are not directly findable through PubMed searches. Articles mentioning CONSORT listed registry numbers in abstract/metadata more often (from 75% in 2010 to 81% in 2023), though currently about 1/5 of these articles still fail to be automatically linked to their registered trials. In contrast, those mentioning SPIRIT currently mention registry numbers in abstract/metadata in almost every article (94% in 2023). Both overall and in articles that mentioned CONSORT, about half mentioned registry numbers in the Methods section (51% vs. 53% in 2023, respectively), whereas SPIRIT articles showed relatively fewer mentions in Methods (36% in 2023), which presumably reflects the fact that many of these are clinical trial protocol articles.

Discussion

The present study shows that there is definite value in extracting registry number mentions from the full-text of biomedical articles, and in knowing in which section(s) the mention(s) occur. In particular, articles that mention registry numbers within Abstract/Metadata or the Methods section largely comprise clinical trial articles, which are reporting clinical outcomes related to the mentioned registered trials. In contrast, publications mentioning registry numbers within Tables are nearly all reviews. Registry identifiers mentioned in other full-text sections such as Introduction, Results, or Conclusions comprise a heterogeneous mix of reviews, clinical trial related publications, and nonclinical article types, and thus, have limited value for linking trials to their publications. Articles which mentioned CONSORT or SPIRIT reporting guidelines mentioned registry numbers most of the time (and specifically in the Abstract or other Metadata), so that they are more likely to be automatically linked with their registered trials than articles overall.

Our findings indicate the need for users to carry out full-text search of bibliographic databases in order to link trials to publications with high recall. Our Trials to Publications tool [16], which employs textual features from PubMed article metadata, was able to infer links in only 1/3–1/2 of articles that mentioned NCT numbers selectively in the Methods section. Neither PubMed nor the Cochrane Central Register of Controlled Trials (CENTRAL) contains full-text articles or permits full-text search. However, users can be advised to utilize the public full-text search capabilities of the PubMed Central database (~10 million articles) and/or the OpenAlex database (~250 million articles) [19]. We also plan to improve recall of the Trials to Publications tool in the future by querying all NCT numbers in full-text of these databases and extracting registry identifier mentions within the articles retrieved. These data will be integrated with the other predictions of the tool and made publicly available.

A byproduct of this study is that simply knowing where a registry number is mentioned in full-text provides a strong clue as to an article’s publication types and study designs. This may be relevant to a separate project in our laboratory that seeks to automatically index biomedical articles according to their publication types and study designs [15, 20]. To date, we have only used features derived from title, abstract, and other metadata, initially using a support vector machine (SVM)-based platform [15] and more recently using a Transformer (PubMedBERT) model [20]. It will be interesting to learn if extending this model to consider full-text features will improve its ability to assign publication types and study designs.

A limitation of our study is that it is not currently feasible to obtain large, comprehensive collections of non-open access full-text articles, so we examined open access articles and author manuscripts deposited into PubMed Central. As mentioned in the Background, we feel that PMC is likely to be representative of the literature as a whole for this purpose, since it is a relatively large set (>7 million articles) which covers a wide variety of journals and article types. In particular, there is no reason to expect that non-open access articles will exhibit substantially different patterns of article section mentions than open access articles. Nevertheless, in the future, large-scale full-text querying of OpenAlex or other comprehensive databases should allow us to compare the quantitative results for non-open access vs. open access articles directly.

Conclusions

The appearance of ClinicalTrials.gov and international registry numbers within full-text of medical publications, and their location within different sections, are indicative of the nature of the publication and type of study being presented. Full-text registry mentions substantially augment the number of articles that can be found via metadata-based searches within tools such as PubMed.gov or Trials to Publications [16]. The present study indicates the potential value of extracting full-text registry number mentions for linking trials to publications and for indexing publication types.

Data availability

The Additional File generated and analyzed during the current study is available in the Dryad repository (https://datadryad.org/), https://doi.org/10.5061/dryad.dbrv15fb1.

Abbreviations

MeSH: Medical Subject Heading
NCT: National Clinical Trial number, ClinicalTrials.gov identifier for a registered trial
NLM: National Library of Medicine
NLP: Natural language processing
PMID: PubMed Reference Number
PMC: PubMed Central
RCT: Randomized controlled trial
SVM: Support vector machine

References

1. Manzoli L, Flacco ME, D’Addario M, Capasso L, De Vito C, Marzuillo C, Villari P, Ioannidis JP. Non-publication and delayed publication of randomized trials on vaccines: survey. BMJ. 2014 May 16;348:g3058. https://doi.org/10.1136/bmj.g3058.
2. Sreekrishnan A, Mampre D, Ormseth C, Miyares L, Leasure A, Ross JS, Sheth KN. Publication and dissemination of results in clinical trials of neurology. JAMA Neurol. 2018 Jul 1;75(7):890–1. https://doi.org/10.1001/jamaneurol.2018.0674.
3. Ross JS, Mulvey GK, Hines EM, Nissen SE, Krumholz HM. Trial publication after registration in ClinicalTrials.gov: a cross-sectional analysis. PLoS Med. 2009;6(9):e1000144. https://doi.org/10.1371/journal.pmed.1000144.
4. Huser V, Cimino JJ. Linking ClinicalTrials.gov and PubMed to track results of interventional human clinical trials. PLoS One. 2013;8(7):e68409. https://doi.org/10.1371/journal.pone.0068409.
5. Bashir R, Bourgeois FT, Dunn AG. A systematic review of the processes used to link clinical trial registrations to their published results. Syst Rev. 2017 Dec;6:1–7. https://doi.org/10.1186/s13643-017-0518-3.
6. Huser V, Cimino JJ. Precision and negative predictive value of links between ClinicalTrials.gov and PubMed. AMIA Annu Symp Proc. 2012;2012:400–8.
7. Goodwin TR, Skinner MA, Harabagiu SM. Automatically linking registered clinical trials to their published results with deep highway networks. AMIA Summits on Translational Science Proceedings. 2018;2018:54.
8. Dunn AG, Coiera E, Bourgeois FT. Unreported links between trial registrations and published articles were identified using document similarity measures in a cross-sectional analysis of ClinicalTrials.gov. J Clin Epidemiol. 2018;95:94–101. https://doi.org/10.1016/j.jclinepi.2017.12.007.
9. Altman DG, Furberg CD, Grimshaw JM, Shanahan DR. Linked publications from a single trial: a thread of evidence. Trials. 2014 Dec;15:1–3. https://doi.org/10.1186/1745-6215-15-369.
10. Pan E, Roberts K. Linking cancer clinical trials to their result publications. AMIA Summits on Translational Science Proceedings. 2024;2024:642.
11. National Library of Medicine: National Center for Biotechnology Information, About PMC - PMC. https://www.ncbi.nlm.nih.gov/pmc/about/intro/ (2024). Accessed 07 Jul 2024.
12. National Library of Medicine: National Center for Biotechnology Information, FTP service - bulk download. https://ftp.ncbi.nlm.nih.gov/pub/pmc/ (2024). Accessed 07 Jul 2024.
13. Python Software Foundation. Python language reference, version 3.7. http://www.python.org (2024).
14. National Library of Medicine: National Center for Biotechnology Information, ClinicalTrials.gov. https://clinicaltrials.gov/ (2024). Accessed 07 Jul 2024.
15. Cohen AM, Schneider J, Fu Y, McDonagh MS, Das P, Holt AW, Smalheiser NR. Fifty ways to tag your pubtypes: multi-tagger, a set of probabilistic publication type and study design taggers to support biomedical indexing and evidence-based medicine. medRxiv. 2021 Jul. https://doi.org/10.1101/2021.07.13.21260468.
16. Smalheiser NR, Holt AW. A web-based tool for automatically linking clinical trials to their publications. J Am Med Inform Assoc. 2022 May 1;29(5):822–30. https://doi.org/10.1093/jamia/ocab290.
17. Begg C, Cho M, Eastwood S, Horton R, Moher D, Olkin I, Pitkin R, Rennie D, Schulz KF, Simel D, Stroup DF. Improving the quality of reporting of randomized controlled trials. The CONSORT statement. JAMA. 1996 Aug 28;276(8):637–9. https://doi.org/10.1001/jama.276.8.637.
18. Chan A-W, Tetzlaff JM, Altman DG, Laupacis A, Gøtzsche PC, Krleža-Jerić K, Hróbjartsson A, Mann H, Dickersin K, Berlin J, Doré C, Parulekar W, Summerskill W, Groves T, Schulz K, Sox H, Rockhold FW, Rennie D, Moher D. SPIRIT 2013 statement: defining standard protocol items for clinical trials. Ann Intern Med. 2013;158:200–7.
19. OpenAlex. https://openalex.org/
20. Menke JD, Kilicoglu H, Smalheiser NR. Publication type tagging using transformer models and multi-label classification. AMIA Annu Symp Proc. 2024, in press.

Funding

Supported by NIH grant 1R01LM014292-01. Funder had no influence on the study, its design, or its publication.

Author information

Authors and Affiliations

Department of Psychiatry, University of Illinois College of Medicine, Chicago, IL, 60612, USA
Arthur M. Holt, Ang Michael Troy & Neil R. Smalheiser

Contributions

Neil Smalheiser: conceptualization, methodology, writing—original draft, writing—review and editing, supervision, funding acquisition. Ang Michael Troy: data analysis. Arthur Holt: formal analysis, investigation, coding, writing—original draft, visualization.

Corresponding author

Correspondence to Neil R. Smalheiser.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Holt, A.M., Troy, A.M. & Smalheiser, N.R. Distribution of trial registry numbers within full-text of PubMed Central articles: implications for linking trials to publications and indexing trial publication types. Trials 26, 34 (2025). https://doi.org/10.1186/s13063-025-08741-w

Received: 29 July 2024
Accepted: 26 January 2025
Published: 31 January 2025
DOI: https://doi.org/10.1186/s13063-025-08741-w
