Predicted Potential for Aquatic Exposure Effects of Per- and Polyfluorinated Alkyl Substances (PFAS) in Pennsylvania’s Statewide Network of Streams
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Preparation
2.1.1. Stream Surface Water PFAS Concentrations
2.1.2. Exposure Activity Ratios
2.1.3. Geospatial Predictors
2.2. Study Area
2.3. Machine Learning Models and Feature Importance Analysis
2.3.1. CNN Architecture
Feature Extraction via Convolutional Layers
Activation, Pooling Mechanisms, Dense Layers, and Regularization
Summary of CNN Architecture
2.3.2. Traditional Machine Learning Models
- Logistic Regression: A linear model (Cox [64]) valued for its simplicity and interpretability. It was implemented with the LogisticRegression class from Scikit-learn, and the solver was allowed up to 1000 iterations to reach convergence.
- Support Vector Machine (SVM): The SVM model (Cortes and Vapnik [65]) was configured with probability estimates enabled, so that it returns class probabilities from which AUC-ROC scores can be calculated; AUC-ROC measures a model's ability to distinguish between classes. The radial basis function (RBF) kernel was particularly important for capturing non-linear relationships within the data, which makes the SVM suitable for predictors whose classes are not linearly separable.
- Gradient Boosting: Implemented using the GradientBoostingClassifier (Friedman [61]), this ensemble method builds decision trees sequentially, fitting each new tree to the remaining errors (the negative gradient of the loss) of the trees built so far. Gradient Boosting can capture non-linear relationships and interactions among predictors.
- Random Forest: Another ensemble method, the Random Forest classifier (Ho [66]), constructs multiple decision trees during training and makes predictions based on the majority vote of these trees. Random Forest is relatively robust against overfitting because aggregating many trees averages out the variance of the individual trees.
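The "each subsequent tree corrects the previous errors" idea in the Gradient Boosting entry can be illustrated with a toy. The sketch below is not Scikit-learn's implementation: it assumes squared-error loss, a single numeric predictor, and depth-1 trees (stumps), while GradientBoostingClassifier uses log-loss and deeper trees. The settings n_estimators=200 and learning_rate=0.1 simply echo the values reported in Section 2.3.3.

```python
# Toy gradient boosting: squared-error loss, one predictor, depth-1 trees.
# Illustrative only; not the GradientBoostingClassifier algorithm verbatim.
from statistics import mean


def fit_stump(x, r):
    """Return the single-split tree on x that best fits residuals r (least squares)."""
    best = None
    for threshold in sorted(set(x))[:-1]:          # keep both sides non-empty
        left = [ri for xi, ri in zip(x, r) if xi <= threshold]
        right = [ri for xi, ri in zip(x, r) if xi > threshold]
        lv, rv = mean(left), mean(right)
        sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lv, rv)
    _, threshold, lv, rv = best
    return lambda xi: lv if xi <= threshold else rv


def gradient_boost(x, y, n_estimators=200, learning_rate=0.1):
    """Add stumps one at a time, each fit to the current residuals."""
    base = mean(y)                                 # initial constant prediction
    trees, history = [], []
    pred = [base] * len(y)
    for _ in range(n_estimators):
        # For squared error the negative gradient is simply the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        tree = fit_stump(x, residuals)
        trees.append(tree)
        pred = [pi + learning_rate * tree(xi) for xi, pi in zip(x, pred)]
        history.append(mean((yi - pi) ** 2 for yi, pi in zip(y, pred)))

    def predict(xi):
        return base + learning_rate * sum(t(xi) for t in trees)

    return predict, history
```

For example, `gradient_boost(list(range(20)), [0.0] * 10 + [1.0] * 10)` returns a predictor whose training error shrinks with every added tree; the small learning rate makes each tree correct only part of the remaining error.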
2.3.3. Model Training
- 3 Convolutional Stacks: with 512 filters in the first layer and 256 filters in subsequent layers.
- 3 Fully Connected Layers: with 512 and 256 neurons in the first and second layers, respectively, and a third (output) layer of 3 neurons with softmax activation.
- Other Hyperparameters: kernel sizes of 3 and 4, a pooling size of 2, a dropout rate of 0.5, and a batch size of 6.
- Logistic Regression: C = 1.0 (C is the inverse of the regularization strength), Penalty = L2
- SVM: C = 10 (C is the regularization parameter that controls the trade-off between maximizing the margin and minimizing classification error), Kernel = RBF
- Gradient Boosting: n_estimators = 200, Learning Rate = 0.1, max_depth = 3
- Random Forest: n_estimators = 300, max_depth = 20, min_samples_split = 5
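The traditional-model settings listed above map directly onto Scikit-learn constructor arguments. The sketch below is a hedged illustration rather than the authors' training script: synthetic three-class data from `make_classification` stand in for the geospatial predictors, and the train/test split, `random_state` values, and one-vs-rest macro AUC-ROC averaging are assumptions.

```python
# Sketch: the four traditional classifiers with the hyperparameters listed above.
# Synthetic data, split, random_state, and OvR macro AUC are assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

models = {
    # L2 is Scikit-learn's default penalty for LogisticRegression.
    "Logistic Regression": LogisticRegression(C=1.0, max_iter=1000),
    # probability=True enables predict_proba, used below for AUC-ROC.
    "SVM": SVC(C=10, kernel="rbf", probability=True, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(
        n_estimators=200, learning_rate=0.1, max_depth=3, random_state=0
    ),
    "Random Forest": RandomForestClassifier(
        n_estimators=300, max_depth=20, min_samples_split=5, random_state=0
    ),
}

# Three classes mirror the three bioeffect-potential labels.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=8, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)      # shape (n_samples, 3)
    # One-vs-rest, macro-averaged AUC-ROC for the multiclass problem.
    scores[name] = roc_auc_score(y_test, proba, multi_class="ovr")
    print(f"{name}: AUC-ROC = {scores[name]:.3f}")
```

Keeping the models in one dictionary makes it straightforward to fit and score every classifier on the same split.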
2.3.4. Feature Importance Using SHAP (SHapley Additive exPlanations)
- Global interpretability: assesses the overall importance of features across the dataset, identifying key drivers of PFAS bioeffect potential, such as land use or geologic patterns, that can inform broader land management strategies.
- Local interpretability: explains individual predictions, showing how specific factors, such as rainfall intensity or nearby industrial activity, contribute to PFAS bioeffect potential in specific streams, which can inform targeted sampling efforts.
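Both views rest on Shapley values. The brute-force sketch below computes them exactly for a tiny model, under loud assumptions: `toy_model` and its three feature names are hypothetical, and "absent" features are replaced by a single baseline point. The shap library approximates the same quantity efficiently for real models. The per-site values are the local explanation (they sum to the prediction minus the baseline prediction), and the mean absolute value per feature is a common global summary.

```python
# Exact Shapley values by enumerating coalitions (feasible only for few features).
# toy_model, its feature names, and the zero baseline are hypothetical.
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Shapley value of each feature for f(x) relative to f(baseline)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = [x[j] if j in subset or j == i else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def toy_model(z):
    """Hypothetical score: two main effects, one interaction, one negative effect."""
    manure, rain, sand = z
    return 0.5 * manure + 0.3 * rain + 0.2 * manure * rain - 0.1 * sand


baseline = [0.0, 0.0, 0.0]
sites = [[1.0, 2.0, 0.5], [0.2, 0.1, 3.0], [2.0, 0.0, 1.0]]

# Local interpretability: one attribution vector per site.
local = [shapley_values(toy_model, x, baseline) for x in sites]

# Global interpretability: mean absolute attribution per feature across sites.
global_importance = [sum(abs(row[k]) for row in local) / len(local) for k in range(3)]
```

For the first site the interaction term is split evenly between the two interacting features, giving attributions of 0.7, 0.8 and -0.05.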
3. Results
3.1. PFAS Concentrations
3.2. Site-Wise PFAS Bioeffects Potential
3.3. Machine Learning Model Performances
3.4. CNN Predictions of PFAS Bioeffect Potential and SHAP Feature Importance
4. Discussion
4.1. In-Stream PFAS Concentrations and Exposure Activity Ratios
4.2. Machine Learning
4.2.1. Comparative Model Metric Implications
4.2.2. CNN and SHAP Implications
4.3. Limitations and Future Direction
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- De Silva, A.O.; Armitage, J.M.; Bruton, T.A.; Dassuncao, C.; Heiger-Bernays, W.; Hu, X.C.; Karrman, A.; Kelly, B.; Ng, C.; Robuck, A.; et al. PFAS exposure pathways for humans and wildlife: A synthesis of current knowledge and key gaps in understanding. Environ. Toxicol. Chem. 2021, 40, 631–657. [Google Scholar] [CrossRef] [PubMed]
- Sunderland, E.M.; Hu, X.C.; Dassuncao, C.; Tokranov, A.K.; Wagner, C.C.; Allen, J.G. A review of the pathways of human exposure to poly- and perfluoroalkyl substances (PFASs) and present understanding of health effects. J. Expo. Sci. Environ. Epidemiol. 2019, 29, 131–147. [Google Scholar] [CrossRef]
- Pennsylvania Fish and Boat Commission. Commonwealth of Pennsylvania Public Health Advisory 2022 Fish Consumption. 2022. Available online: https://www.pa.gov/agencies/dep/programs-and-services/water/clean-water/water-quality/fishconsumption-advisories.html (accessed on 20 December 2024).
- U.S. Environmental Protection Agency. Per- and Polyfluoroalkyl Substances (PFAS)|US EPA. 2024. Available online: https://www.epa.gov/sdwa/and-polyfluoroalkyl-substances-pfas (accessed on 27 September 2024).
- Ruffle, B.; Archer, C.; Vosnakis, K.; Butler, J.D.; Davis, C.W.; Goldsworthy, B.; Parkman, R.; Key, T.A. US and international per- and polyfluoroalkyl substances surface water quality criteria: A review of the status, challenges, and implications for use in chemical management and risk assessment. Integr. Environ. Assess. Manag. 2024, 20, 36–58. [Google Scholar] [CrossRef] [PubMed]
- Banyoi, S.M.; Porseryd, T.; Larsson, J.; Grahn, M.; Dinnétz, P. The effects of exposure to environmentally relevant PFAS concentrations for aquatic organisms at different consumer trophic levels: Systematic review and meta-analyses. Environ. Pollut. 2022, 315, 120422. [Google Scholar] [CrossRef]
- Hamed, M.; Vats, A.; Lim, I.E.; Sapkota, B.; Abdelmoneim, A. Effects of developmental exposure to individual and combined PFAS on development and behavioral stress responses in larval zebrafish. Environ. Pollut. 2024, 349, 123912. [Google Scholar] [CrossRef] [PubMed]
- Olker, J.H.; Elonen, C.M.; Pilli, A.; Anderson, A.; Kinziger, B.; Erickson, S.; Skopinski, M.; Pomplun, A.; LaLone, C.A.; Russom, C.L.; et al. The ECOTOXicology Knowledgebase: A Curated Database of Ecologically Relevant Toxicity Tests to Support Environmental Research and Risk Assessment. Environ. Toxicol. Chem. 2022, 41, 1520–1539. [Google Scholar] [CrossRef] [PubMed]
- Stackpoole, S.M.; Shoda, M.E.; Medalie, L.; Stone, W.W. Pesticides in US Rivers: Regional differences in use, occurrence, and environmental toxicity, 2013 to 2017. Sci. Total Environ. 2021, 787, 147147. [Google Scholar] [CrossRef]
- Shoda, M.E.; Sprague, L.A.; Murphy, J.C.; Riskin, M.L. Water-quality trends in U.S. rivers, 2002 to 2012: Relations to levels of concern. Sci. Total Environ. 2019, 650, 2314–2324. [Google Scholar] [CrossRef] [PubMed]
- Dix, D.J.; Houck, K.A.; Martin, M.T.; Richard, A.M.; Setzer, R.W.; Kavlock, R.J. The ToxCast Program for Prioritizing Toxicity Testing of Environmental Chemicals. Toxicol. Sci. 2006, 95, 5–12. [Google Scholar] [CrossRef]
- Bradley, P.M.; Romanok, K.M.; Smalling, K.L.; Masoner, J.R.; Kolpin, D.W.; Gordon, S.E. Predicted aquatic exposure effects from a national urban stormwater study. Environ. Sci. Water Res. Technol. 2023, 9, 3191–3199. [Google Scholar] [CrossRef]
- Corsi, S.; Loken, L.; Ankley, G.; Alvarez, D.; Villeneuve, D. Potential for biological effects of PFAS in Great Lakes tributaries and associations with land cover and wastewater effluent. Environ. Toxicol. Chem. 2025, 809, 151003. [Google Scholar]
- DeCicco, L.; Corsi, S.; Villeneuve, D.; Blackwell, B.; Ankley, G. toxEval: Exploring Biological Relevance of Environmental Chemistry Observations. R Package Available at CRAN. R Package Version 1.3.2. 2024. Available online: https://CRAN.R-project.org/package=toxEval (accessed on 4 January 2024).
- U.S. Environmental Protection Agency. ToxCast & Tox21 Summary Files from Invitrodb v3.5. 2022. Available online: https://www.epa.gov/chemical-research/toxicity-forecaster-toxcasttm-data (accessed on 2 July 2024).
- McMahon, P.B.; Tokranov, A.K.; Bexfield, L.M.; Lindsey, B.D.; Johnson, T.D.; Lombard, M.A.; Watson, E. Perfluoroalkyl and Polyfluoroalkyl Substances in Groundwater Used as a Source of Drinking Water in the Eastern United States. Environ. Sci. Technol. 2022, 56, 2279–2288. [Google Scholar] [CrossRef]
- Dong, X.; Zhang, Y.; Wang, J.; Li, M.; Wang, X.; Wang, Y. Prediction of 35 Target Per- and Polyfluoroalkyl Substances (PFASs) in California Groundwater Using Multilabel Semisupervised Machine Learning. Environ. Sci. Technol. 2023, 57, 3651–3660. [Google Scholar] [CrossRef]
- DeLuca, N.M.; Mullikin, A.; Brumm, P.; Rappold, A.G.; Cohen Hubal, E. Using geospatial data and random forest to predict PFAS contamination in fish tissue in the Columbia river basin, United States. Environ. Sci. Technol. 2023, 57, 14024–14035. [Google Scholar] [CrossRef] [PubMed]
- Kowalska, D.; Sosnowska, A.; Zdybel, S.; Stepnik, M.; Puzyn, T. Predicting bioconcentration factors (BCFs) for per- and polyfluoroalkyl substances (PFAS). Chemosphere 2024, 364, 143146. [Google Scholar] [CrossRef]
- Khaki, S.; Wang, L. Crop Yield Prediction Using Deep Neural Networks. Front. Plant Sci. 2019, 10, 621. [Google Scholar] [CrossRef]
- Pyo, J.; Park, L.J.; Pachepsky, Y.; Baek, S.S.; Kim, K.; Cho, K.H. Using convolutional neural network for predicting cyanobacteria concentrations in river water. Water Res. 2020, 186, 116349. [Google Scholar] [CrossRef]
- Gandhimathi, G.; Chellaswamy, C.; Selvan, T. Comprehensive river water quality monitoring using convolutional neural networks and gated recurrent units: A case study along the Vaigai River. J. Environ. Manag. 2024, 365, 121567. [Google Scholar]
- Pu, F.; Ding, C.; Chao, Z.; Yu, Y.; Xu, X. Water-quality classification of inland lakes using landsat8 images by convolutional neural networks. Remote Sens. 2019, 11, 1674. [Google Scholar] [CrossRef]
- Limbu, S.; Glasgow, E.; Block, T.; Dakshanamurthy, S. A Machine-Learning-Driven Pathophysiology-Based New Approach Method for the Dose-Dependent Assessment of Hazardous Chemical Mixtures and Experimental Validations. Toxics 2024, 12, 481. [Google Scholar] [CrossRef] [PubMed]
- Feinstein, J.; Sivaraman, G.; Picel, K.; Peters, B.; Vázquez-Mayagoitia, Á.; Ramanathan, A.; MacDonell, M.; Foster, I.; Yan, E. Uncertainty-Informed Deep Transfer Learning of Perfluoroalkyl and Polyfluoroalkyl Substance Toxicity. J. Chem. Inf. Model. 2021, 61, 5996–6006. [Google Scholar] [CrossRef] [PubMed]
- U.S. Environmental Protection Agency. Enforcement and Compliance History Online (ECHO) PFAS Analytic Tools. 2024. Available online: https://echo.epa.gov/trends/pfas-tools (accessed on 2 April 2024).
- U.S. Geological Survey. USGS Water Data for the Nation: U.S. Geological Survey National Water Information System Database; U.S. Geological Survey: Reston, VA, USA, 2024. [CrossRef]
- Roberts, D. The Spring Creek Watershed Atlas, PFAS Survey Data. 2024. Available online: https://www.springcreekwatershedatlas.org/post/pfas-in-the-spring-creek-and-bald-eagle-creek-watersheds (accessed on 25 October 2024).
- Breitmeyer, S.E.; Williams, A.M.; Duris, J.W.; Eicholtz, L.W.; Shull, D.R.; Wertz, T.A.; Woodward, E.E. Per- and polyfluorinated alkyl substances (PFAS) in Pennsylvania surface waters: A statewide assessment, associated sources, and land-use relations. Sci. Total Environ. 2023, 888, 164161. [Google Scholar] [CrossRef]
- McKay, L.; Bondelid, T.; Dewald, T.; Johnston, J.; Moore, R.; Rea, A. U.S. Geological Survey NHDPlusV2 User Guide. 2012. Available online: https://www.epa.gov/waterdata/nhdplus-national-hydrography-dataset-plus (accessed on 10 June 2024).
- R Core Team. R: A Language and Environment for Statistical Computing. 2024. Available online: https://www.R-project.org/ (accessed on 4 January 2024).
- Blackwell, B.R.; Ankley, G.T.; Corsi, S.R.; DeCicco, L.A.; Houck, K.A.; Judson, R.S.; Li, S.; Martin, M.T.; Murphy, E.; Schroeder, A.L.; et al. An “EAR” on environmental surveillance and monitoring: A case study on the use of exposure–activity ratios (EARs) to prioritize sites, chemicals, and bioactivities of concern in Great Lakes waters. Environ. Sci. Technol. 2017, 51, 8713–8724. [Google Scholar] [CrossRef] [PubMed]
- Filer, D.L.; Kothiya, P.; Setzer, R.W.; Judson, R.S.; Martin, M.T. tcpl: The ToxCast pipeline for high-throughput screening data. Bioinformatics 2016, 33, 618–620. [Google Scholar] [CrossRef] [PubMed]
- Fay, K.A.; Villeneuve, D.L.; Swintek, J.; Edwards, S.W.; Nelms, M.D.; Blackwell, B.R.; Ankley, G.T. Differentiating pathway-specific from nonspecific effects in high-throughput toxicity data: A foundation for prioritizing adverse outcome pathway development. Toxicol. Sci. 2018, 163, 500–515. [Google Scholar] [CrossRef]
- Corsi, S.R.; De Cicco, L.A.; Villeneuve, D.L.; Blackwell, B.R.; Fay, K.A.; Ankley, G.T.; Baldwin, A.K. Prioritizing chemicals of ecological concern in Great Lakes tributaries using high-throughput screening data and adverse outcome pathways. Sci. Total Environ. 2019, 686, 995–1009. [Google Scholar] [CrossRef]
- Judson, R.; Richard, A.; Dix, D.J.; Houck, K.; Martin, M.; Kavlock, R.; Dellarco, V.; Henry, T.; Holderman, T.; Sayre, P.; et al. The Toxicity Data Landscape for Environmental Chemicals. Environ. Health Perspect. 2009, 117, 685–695. [Google Scholar] [CrossRef]
- U.S. Geological Survey. Watershed Boundary Dataset (WBD). 2021. Available online: https://prd-tnm.s3.amazonaws.com/index.html?prefix=StagedProducts/Hydrography/WBD/National/ (accessed on 5 July 2024).
- Jones, J.; Doctor, D.; Wood, N.; Falgout, J.; Rapstine, N. Closed Depression Density in Karst Regions of the Conterminous United States: Features and Grid Data; U.S. Geological Survey Data Release; U.S. Geological Survey: Reston, VA, USA, 2021. [CrossRef]
- Blodgett, D.; Johnson, M. nhdplusTools: Tools for Accessing and Working with the NHDPlus; U.S. Geological Survey Software Release; U.S. Geological Survey: Reston, VA, USA, 2023. [CrossRef]
- Wieferich, D.; Gressler, B.; Krause, K.; Wieczorek, M.; McDonald, S. xstrm local; U.S. Geological Survey Software Release; U.S. Geological Survey: Reston, VA, USA, 2022. [CrossRef]
- Wieczorek, M.; Jackson, S.; Schwarz, G. Select Attributes for NHDPlus Version 2.1 Reach Catchments and Modified Network Routed Upstream Watersheds for the Conterminous United States (Ver. 4.0, August 2023); U.S. Geological Survey Data Release; U.S. Geological Survey: Reston, VA, USA, 2018. [CrossRef]
- Peterson, R.A.; Cavanaugh, J.E. Ordered quantile normalization: A semiparametric transformation built for the cross-validation era. J. Appl. Stat. 2020, 47, 2312–2327. [Google Scholar] [CrossRef]
- U.S. Environmental Protection Agency. Level III and IV Ecoregions of the Continental United States. U.S. EPA Office of Research & Development (ORD)—National Health and Environmental Effects Research Laboratory (NHEERL). 2010. Available online: https://www.epa.gov/eco-research/level-iii-and-iv-ecoregions-continental-united-states (accessed on 2 April 2024).
- Alvarez, D.A.; Corsi, S.R.; De Cicco, L.A.; Villeneuve, D.L.; Baldwin, A.K. Identifying chemicals and mixtures of potential biological concern detected in passive samplers from Great Lakes tributaries using high-throughput data and biological pathways. Environ. Toxicol. Chem. 2021, 40, 2165–2182. [Google Scholar] [CrossRef]
- Oliver, S.K.; Corsi, S.R.; Baldwin, A.K.; Nott, M.A.; Ankley, G.T.; Blackwell, B.R.; Villeneuve, D.L.; Hladik, M.L.; Kolpin, D.W.; Loken, L.; et al. Pesticide prioritization by potential biological effects in tributaries of the Laurentian Great Lakes. Environ. Toxicol. Chem. 2023, 42, 367–384. [Google Scholar] [CrossRef] [PubMed]
- Han, J.; Kamber, M.; Pei, J. Data Mining: Concepts and Techniques, 3rd ed.; Morgan Kaufmann: Burlington, MA, USA, 2011. [Google Scholar]
- Alpaydin, E. Introduction to Machine Learning, 4th ed.; MIT Press: Cambridge, MA, USA, 2020. [Google Scholar]
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
- Gu, J.; Wang, Z.; Kuen, J.; Ma, B.; Shahroudy, A.; Shuai, B.; Liu, T.; Wang, X.; Wang, G.; Chen, J. Recent Advances in Convolutional Neural Networks. Pattern Recognit. 2018, 77, 354–377. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; Volume 25, pp. 1097–1105. [Google Scholar]
- Mayr, A.; Klambauer, G.; Unterthiner, T.; Hochreiter, S. DeepTox: Toxicity Prediction Using Deep Learning. Front. Environ. Sci. 2016, 3, 80. [Google Scholar] [CrossRef]
- O’Shea, K.; Nash, R. An Introduction to Convolutional Neural Networks. arXiv 2015. [Google Scholar] [CrossRef]
- Xu, K.; Ba, J.; Kiros, R.; Cho, K.; Courville, A.; Salakhutdinov, R.; Zemel, R.; Bengio, Y. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. In Proceedings of the 32nd International Conference on Machine Learning (ICML), Lille, France, 7–9 July 2015; pp. 2048–2057. [Google Scholar]
- Rashid, R.; Ahmed, K.; Anwar, W.; Ali, H. XTox: Toxicity Prediction Using Shallow Learning Models. Comput. Chem. Eng. 2019, 125, 191–199. [Google Scholar] [CrossRef]
- Li, J.; Monroe, W.; Jurafsky, D. Visualizing and Understanding Neural Models in NLP. In Proceedings of the 5th Workshop on Vision and Language, Berlin, Germany, 12 August 2016; pp. 57–65. [Google Scholar]
- Wu, Z.; Zhang, F.; Pang, X.; Wu, X.; Cao, W.; Liu, R. Convolutional Neural Networks for Toxicity Prediction. J. Chem. Inf. Model. 2018, 58, 1553–1560. [Google Scholar] [CrossRef] [PubMed]
- Han, H.; Li, Y.; Zhu, X. Convolutional neural network learning for generic data classification. Inf. Sci. 2019, 477, 448–465. [Google Scholar] [CrossRef]
- Vieira, V.M.; Hoffman, K.; Shin, H.M.; Weinberg, J.M.; Webster, T.F.; Fletcher, T. Perfluorooctanoic Acid Exposure and Cancer Outcomes in a Contaminated Community: A Geographic Analysis. Environ. Health Perspect. 2013, 121, 318–323. [Google Scholar] [CrossRef]
- Nguyen, T.V.; Reinhard, M.; Gin, K.Y.H. Sorption equilibria of perfluoroalkyl acids between sediment and water: Influence of sediment organic carbon and molecular structure. J. Hazard. Mater. 2016, 320, 540–549. [Google Scholar] [CrossRef]
- Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
- Liu, Y.; Chen, Z.; Wang, X. XGBoost model as an efficient machine learning approach for PFAS removal: Effects of material characteristics and operation conditions. Environ. Res. 2022, 204, 112314. [Google Scholar] [CrossRef]
- Shin, H.M.; Vieira, V.M.; Ryan, P.B.; Detwiler, R.; Sanders, B.; Steenland, K.; Bartell, S.M. Environmental fate and transport modeling for perfluorooctanoic acid emitted from the Washington Works Facility in West Virginia. Environ. Sci. Technol. 2011, 45, 1435–1442. [Google Scholar] [CrossRef] [PubMed]
- Cox, D.R. The regression analysis of binary sequences. J. R. Stat. Soc. Ser. B (Methodol.) 1958, 20, 215–232. [Google Scholar] [CrossRef]
- Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
- Ho, T.K. Random decision forests. In Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, Canada, 14–16 August 1995; IEEE: Piscataway, NJ, USA, 1995; Volume 1, pp. 278–282. [Google Scholar]
- Python Software Foundation. Python 3.11. 2023. Available online: https://www.python.org/downloads/release/python-3110/ (accessed on 10 May 2024).
- Sarkar, D.; Bali, R.; Ghosh, T. Hands-On Transfer Learning with Python: Implement Advanced Deep Learning and Neural Network Models Using TensorFlow and Keras; Packt Publishing Ltd.: Birmingham, UK, 2018. [Google Scholar]
- Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. 2015. Available online: http://tensorflow.org/ (accessed on 5 July 2024).
- Head, T.; Cherti, M.; Pedregosa, F.; Zhdanov, M.; Louppe, G.; Raffel, C.; Mueller, A.; Fauchere, N.; McInnes, L.; Grisel, O. Scikit-Optimize: Sequential Model-Based Optimization with Scikit-Learn. 2018. Available online: https://scikit-optimize.github.io/ (accessed on 22 November 2024).
- Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, Long Beach, CA, USA, 4–9 December 2017; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30. [Google Scholar]
- Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.I. From Local Explanations to Global Understanding with Explainable AI for Trees. Nat. Mach. Intell. 2020, 2, 56–67. [Google Scholar] [CrossRef]
- Molnar, C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable; Lulu.com: Morrisville, NC, USA, 2019. [Google Scholar]
- Sundararajan, M.; Taly, A.; Yan, Q. Axiomatic Attribution for Deep Networks. In Proceedings of the 34th International Conference on Machine Learning (ICML), PMLR, Sydney, Australia, 6–11 August 2017; pp. 3319–3328. [Google Scholar]
- Mihaljevic, I.; Vujica, L.; Dragojavic, J.; Loncar, J.; Smital, T. Differential Toxicity of Perfluorooctane Sulfonate (PFOS) in Wild-Type and Oatp1d1 Mutant Zebrafish Embryos. bioRxiv 2024. [Google Scholar] [CrossRef]
- Geslin, M.; Auperin, B. Relationship between changes in mRNAs of the genes encoding steroidogenic acute regulatory protein and P450 cholesterol side chain cleavage in head kidney and plasma levels of cortisol in response to different kinds of acute stress in the rainbow trout (Oncorhynchus mykiss). Gen. Comp. Endocrinol. 2004, 135, 70–80. [Google Scholar]
- Kavlock, R.; Chandler, K.; Houck, K.; Hunter, S.; Judson, R.; Kleinstreuer, N.; Knudsen, T.; Martin, M.; Padilla, S.; Reif, D.; et al. Update on EPA’s ToxCast program: Providing high throughput decision support tools for chemical risk management. Chem. Res. Toxicol. 2012, 25, 1287–1302. [Google Scholar] [CrossRef] [PubMed]
- Bylund, J.; Ericsson, J.; Oliw, E.H. Analysis of cytochrome P450 metabolites of arachidonic and linoleic acids by liquid chromatography–mass spectrometry with ion trap MS2. Anal. Biochem. 1998, 265, 55–68. [Google Scholar] [CrossRef]
- Probst, P.; Wright, M.N.; Boulesteix, A.L. Hyperparameters and tuning strategies for random forest. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2019, 9, e1301. [Google Scholar] [CrossRef]
- Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; ACM: New York, NY, USA, 2016; pp. 785–794. [Google Scholar]
- Tokranov, A.K.; Ransom, K.M.; Bexfield, L.M.; Lindsey, B.D.; Watson, E.; Dupuy, D.I.; Stackelberg, P.E.; Fram, M.S.; Voss, S.A.; Kingsbury, J.A.; et al. Predictions of groundwater PFAS occurrence at drinking water supply depths in the United States. Science 2024, 386, 748–755. [Google Scholar] [CrossRef]
- Barber, L.B.; Keefe, S.H.; Brown, G.K.; Furlong, E.T.; Gray, J.L.; Kolpin, D.W.; Meyer, M.T.; Sandstrom, M.W.; Zaugg, S.D. Persistence and potential effects of complex organic contaminant mixtures in wastewater-impacted streams. Environ. Sci. Technol. 2013, 47, 2177–2188. [Google Scholar] [CrossRef] [PubMed]
- Smalling, K.L.; Romanok, K.M.; Bradley, P.M.; Morriss, M.C.; Gray, J.L.; Kanagy, L.K.; Gordon, S.E.; Williams, B.M.; Breitmeyer, S.E.; Jones, D.K.; et al. Per- and polyfluoroalkyl substances (PFAS) in United States tapwater: Comparison of underserved private-well and public-supply exposures and associated health implications. Environ. Int. 2023, 178, 108033. [Google Scholar] [CrossRef] [PubMed]
- Woodward, E.E.; Senior, L.A.; Fleck, J.A.; Barber, L.B.; Hansen, A.M.; Duris, J.W. Using a Time-of-Travel Sampling Approach to Quantify Per- and Polyfluoroalkyl Substances (PFAS) Stream Loading and Source Inputs in a Mixed-Source, Urban Catchment. ACS ES&T Water 2024, 4, 4356–4367. [Google Scholar]
- Viticoski, R.L.; Wang, D.; Feltman, M.A.; Mulabagal, V.; Rogers, S.R.; Blersch, D.M.; Hayworth, J.S. Spatial distribution and mass transport of Perfluoroalkyl Substances (PFAS) in surface water: A statewide evaluation of PFAS occurrence and fate in Alabama. Sci. Total Environ. 2022, 836, 155524. [Google Scholar] [CrossRef] [PubMed]
- Imbrigiotta, T.E.; Fiore, A.R. Distribution of Chlorinated Volatile Organic Compounds and Per- and Polyfluoroalkyl Substances in Monitoring Wells at the Former Naval Air Warfare Center, West Trenton, New Jersey, 2014–17; Technical Report; U.S. Geological Survey: Reston, VA, USA, 2021.
- Kolpin, D.W.; Hubbard, L.E.; Cwiertny, D.M.; Meppelink, S.M.; Thompson, D.A.; Gray, J.L. A comprehensive statewide spatiotemporal stream assessment of per- and polyfluoroalkyl substances (PFAS) in an agricultural region of the United States. Environ. Sci. Technol. Lett. 2021, 8, 981–988. [Google Scholar] [CrossRef]
- Kurwadkar, S.; Dane, J.; Kanel, S.R.; Nadagouda, M.N.; Cawdrey, R.W.; Ambade, B.; Struckhoff, G.C.; Wilkin, R. Per- and polyfluoroalkyl substances in water and wastewater: A critical review of their global occurrence and distribution. Sci. Total Environ. 2022, 809, 151003. [Google Scholar] [CrossRef]
- Pfotenhauer, D.; Sellers, E.; Olson, M.; Praedel, K.; Shafer, M. PFAS concentrations and deposition in precipitation: An intensive 5-month study at National Atmospheric Deposition Program–National trends sites (NADP-NTN) across Wisconsin, USA. Atmos. Environ. 2022, 291, 119368. [Google Scholar] [CrossRef]
- Pike, K.A.; Edmiston, P.L.; Morrison, J.J.; Faust, J.A. Correlation analysis of perfluoroalkyl substances in regional US precipitation events. Water Res. 2021, 190, 116685. [Google Scholar] [CrossRef]
- Martinez, B.; Da Silva, B.F.; Aristizabal-Henao, J.J.; Denslow, N.D.; Osborne, T.Z.; Morrison, E.S.; Bianchi, T.S.; Bowden, J.A. Increased levels of perfluorooctanesulfonic acid (PFOS) during Hurricane Dorian on the east coast of Florida. Environ. Res. 2022, 208, 112635. [Google Scholar] [CrossRef]
- Pennsylvania Department of Environmental Protection. PA Water Use Annual Summary Report. Commonwealth of Pennsylvania. Available online: https://www.pa.gov/agencies/dep/data-and-tools/reports/water-reports.html (accessed on 27 September 2024).
- Adu, O.; Ma, X.; Sharma, V.K. Bioavailability, phytotoxicity and plant uptake of per- and polyfluoroalkyl substances (PFAS): A review. J. Hazard. Mater. 2023, 447, 130805. [Google Scholar] [CrossRef] [PubMed]
- Johnson, G.R. PFAS in soil and groundwater following historical land application of biosolids. Water Res. 2022, 211, 118035. [Google Scholar] [CrossRef] [PubMed]
- Pepper, I.L.; Brusseau, M.L.; Prevatt, F.J.; Escobar, B.A. Incidence of Pfas in soil following long-term application of class B biosolids. Sci. Total Environ. 2021, 793, 148449. [Google Scholar] [CrossRef]
- Caniglia, J.; Snow, D.D.; Messer, T.; Bartelt-Hunt, S. Extraction, analysis, and occurrence of per- and polyfluoroalkyl substances (PFAS) in wastewater and after municipal biosolids land application to determine agricultural loading. Front. Water 2022, 4, 892451. [Google Scholar] [CrossRef]
- University of Massachusetts Amherst. Manure Application on Hay Fields. Online Resource. 2024. Available online: https://ag.umass.edu/crops-dairy-livestock-equine/fact-sheets/manure-application-on-hay-fields (accessed on 25 November 2024).
- White, E.L.; Aron, G.; White, W.B. The influence of urbanization on sinkhole development in central Pennsylvania. Environ. Geol. Water Sci. 1986, 8, 91–97. [Google Scholar] [CrossRef]
- Alimi, O.S.; Farner Budarz, J.; Hernandez, L.M.; Tufenkji, N. Microplastics and nanoplastics in aquatic environments: Aggregation, deposition, and enhanced contaminant transport. Environ. Sci. Technol. 2018, 52, 1704–1724. [Google Scholar] [CrossRef] [PubMed]
- Santhanam, S.D.; Ramamurthy, K.; Priya, P.S.; Sudhakaran, G.; Guru, A.; Arockiaraj, J. A combinational threat of micro- and nano-plastics (MNPs) as potential emerging vectors for per- and polyfluoroalkyl substances (PFAS) to human health. Environ. Monit. Assess. 2024, 196, 1182. [Google Scholar] [CrossRef]
- Sarkar, S.; Diab, H.; Thompson, J. Microplastic pollution: Chemical characterization and impact on wildlife. Int. J. Environ. Res. Public Health 2023, 20, 1745. [Google Scholar] [CrossRef]
- Baldwin, A.K.; Corsi, S.R.; De Cicco, L.A.; Lenaker, P.L.; Lutz, M.A.; Sullivan, D.J.; Richards, K.D. Organic contaminants in Great Lakes tributaries: Prevalence and potential aquatic toxicity. Sci. Total Environ. 2016, 554, 42–52. [Google Scholar] [CrossRef] [PubMed]
- Meng, L.; Zhou, B.; Liu, H.; Chen, Y.; Yuan, R.; Chen, Z.; Luo, S.; Chen, H. Advancing toxicity studies of per- and poly-fluoroalkyl substances (PFASs) through machine learning: Models, mechanisms, and future directions. Sci. Total Environ. 2024, 174201. [Google Scholar] [CrossRef] [PubMed]
- Interstate Technology and Regulatory Council. PFAS Technical and Regulatory Guidance Document and Fact Sheets PFAS. 2020. Available online: https://pfas-1.itrcweb.org/ (accessed on 27 November 2024).
- Schwichtenberg, T.; Bogdan, D.; Carignan, C.C.; Reardon, P.; Rewerts, J.; Wanzek, T.; Field, J.A. PFAS and dissolved organic carbon enrichment in surface water foams on a northern US freshwater lake. Environ. Sci. Technol. 2020, 54, 14455–14464. [Google Scholar] [CrossRef] [PubMed]
PFAS Bioeffect Potential | EAR Range | Training Site Count | Validation Site Count |
---|---|---|---|
None to less | <0.00002 | 84 | 10 |
Existent | 0.00002–0.001 | 84 | 9 |
Greater | 0.001–0.04 | 84 | 9 |
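The EAR ranges in the table define a simple threshold classifier for the three bioeffect-potential labels. The sketch below assumes that each lower bound is inclusive (the table does not say) and that EAR values above 0.04, which fall outside the tabulated range, are still labeled "Greater".

```python
# Map an exposure-activity ratio (EAR) to the bioeffect-potential classes above.
# Assumptions: lower bounds are inclusive; EAR > 0.04 is still "Greater".
from bisect import bisect_right

THRESHOLDS = [0.00002, 0.001]                 # boundaries from the EAR Range column
LABELS = ["None to less", "Existent", "Greater"]


def bioeffect_class(ear: float) -> str:
    """Return the bioeffect-potential label for a site's EAR."""
    if ear < 0:
        raise ValueError("EAR must be non-negative")
    # bisect_right counts how many boundaries are <= ear, i.e., the class index.
    return LABELS[bisect_right(THRESHOLDS, ear)]
```

Using `bisect_right` places a value equal to a boundary in the higher class, which is how the inclusive-lower-bound assumption is encoded.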
Layer Aspect | Single Convolutional Layer | Dual Convolutional Layers |
---|---|---|
Feature depth | Captures basic patterns | Detects more complex, higher-level patterns |
Receptive field | Limited to small portions | Expands to cover larger portions of input |
Feature abstraction | Tied closely to raw input | Produces abstract, high-level representations |
Learning capacity | Limited due to fewer parameters | Increased, allowing modeling of complex relationships |
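The "Receptive field" row can be made concrete with standard receptive-field arithmetic: each layer widens the span of adjacent input predictors that influence one output unit. The sketch assumes stride-1 convolutions, pooling whose stride equals its size, and no dilation. The kernel sizes 3 and 4 and the pooling size 2 from Section 2.3.3 are used only as illustrative inputs, not as a statement of the exact layer order in the trained model.

```python
# Receptive field of one output unit after a stack of 1-D layers.
# Assumptions: stride-1 convolutions, pooling stride == pool size, no dilation.


def receptive_field(layers):
    """layers: sequence of ("conv", kernel_size) or ("pool", pool_size) tuples."""
    field, jump = 1, 1      # jump = input spacing between adjacent units
    for kind, size in layers:
        if kind not in ("conv", "pool"):
            raise ValueError(f"unknown layer type: {kind}")
        field += (size - 1) * jump
        if kind == "pool":
            jump *= size    # pooling spreads later units further apart
    return field
```

A single convolution with kernel 3 sees 3 adjacent predictors, two stacked kernel-3 convolutions see 5, and the illustrative sequence conv(3), pool(2), conv(4) sees 10, which is why the dual-layer column describes coverage of larger portions of the input.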
Component | Configuration | Role | Significance |
---|---|---|---|
Input Layer | 1D input vector | Prepares predictors for analysis | Maintains predictor data order, crucial for detecting patterns |
First Convolutional Layer | Filters, ReLU activation | Extracts basic patterns, such as geologic and climatic characteristics | Captures fundamental geospatial signatures |
Second Convolutional Layer | Filters, ReLU activation | Detects complex patterns | Identifies complex relationships between predictor variables |
MaxPooling Layers | Pooling size | Reduces data dimensionality, retaining significant features | Focuses on critical features, reduces noise |
First Dense Layer | Neurons, ReLU activation | Refines features into high-level representations | Synthesizes patterns into a cohesive understanding |
Second Dense Layer | Neurons, ReLU activation | Enhances classification accuracy | Ensures accurate and nuanced classifications |
Dropout Layers | Dropout rate | Prevents overfitting | Ensures generalization across diverse conditions |
Final Output Layer | 3 neurons, softmax activation | Outputs classification into impact labels | Provides clear, interpretable classifications |
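The component table can be traced end to end with a small NumPy forward pass. This is a sketch under stated assumptions, not the trained TensorFlow model: weights are random and untrained, only two convolutional stages are shown (mirroring the table rather than the three stacks reported in Section 2.3.3), filter and neuron counts are scaled down from 512/256, the 40-predictor input is hypothetical, and dropout is omitted because it is inactive at inference time.

```python
# Untrained NumPy forward pass: Conv1D+ReLU -> MaxPool (x2) -> Dense+ReLU (x2) -> softmax.
# Layer sizes, input length, and random weights are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)


def conv1d_relu(x, w, b):
    """x: (length, in_ch); w: (kernel, in_ch, out_ch); 'valid' cross-correlation + ReLU."""
    k = w.shape[0]
    out = np.stack([np.tensordot(x[i:i + k], w, axes=([0, 1], [0, 1]))
                    for i in range(x.shape[0] - k + 1)])
    return np.maximum(out + b, 0.0)


def max_pool1d(x, size=2):
    """Non-overlapping max pooling along the length axis (remainder dropped)."""
    n = x.shape[0] // size
    return x[: n * size].reshape(n, size, x.shape[1]).max(axis=1)


def dense(x, w, b, activation):
    """Fully connected layer with ReLU or softmax activation."""
    z = x @ w + b
    if activation == "relu":
        return np.maximum(z, 0.0)
    if activation == "softmax":
        e = np.exp(z - z.max())          # subtract max for numerical stability
        return e / e.sum()
    raise ValueError(f"unsupported activation: {activation}")


def forward(predictors, filters=(16, 8), kernels=(3, 4), units=(16, 8), n_classes=3):
    h = predictors.reshape(-1, 1)                    # 1-D predictor vector, one channel
    for f, k in zip(filters, kernels):
        w = rng.normal(size=(k, h.shape[1], f)) * 0.1
        h = max_pool1d(conv1d_relu(h, w, np.zeros(f)), size=2)
    h = h.reshape(-1)                                # flatten before dense layers
    for u in units:
        h = dense(h, rng.normal(size=(h.size, u)) * 0.1, np.zeros(u), "relu")
    return dense(h, rng.normal(size=(h.size, n_classes)) * 0.1,
                 np.zeros(n_classes), "softmax")


probs = forward(rng.normal(size=40))   # 40 hypothetical geospatial predictors
```

With 40 inputs the length shrinks to 38, 19, 16 and then 8 positions, and the three softmax outputs always form a probability distribution over the impact labels.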
Chemical | Chemical Group | Detection Frequency (DF) | Concentration (Range) Median, ng/L | Concentration Interquartile Range, ng/L | ToxCast Bioassay(s) Available |
---|---|---|---|---|---|
Total PFAS | na | 81% | (nd–268) 10.1 | 31.8 | No |
PFOA | PFCA | 75% | (nd–25.0) 1.7 | 4.0 | Yes |
PFHxA | PFCA | 67% | (nd–20.0) 1.5 | 3.9 | Yes |
PFOS | PFSA | 59% | (nd–84.0) 1.1 | 3.3 | Yes |
PFPeA | PFCA | 57% | (nd–29.0) 1.8 | 5.1 | Yes |
PFBS | PFSA | 55% | (nd–53.6) 0.9 | 3.1 | Yes |
PFHpA | PFCA | 49% | (nd–9.6) nd | 2.0 | Yes |
PFBA | PFCA | 45% | (nd–17.0) nd | 4.4 | Yes |
PFHxS | PFSA | 42% | (nd–61.0) nd | 1.6 | Yes |
PFNA | PFCA | 29% | (nd–12.0) nd | 0.8 | Yes |
6:2 FTS | Precursor/Other | 9% | (nd–120) nd | 0.0 | Yes |
PFPeS | PFSA | 6% | (nd–9.4) nd | 0.0 | No |
PFDA | PFCA | 5% | (nd–1.9) nd | 0.0 | Yes |
PFHpS | PFSA | 3% | (nd–1.9) nd | 0.0 | Yes |
PFOSA | Precursor/Other | 2% | (nd–1.2) nd | 0.0 | Yes |
PFUnDA | PFCA | 1% | (nd–1.5) nd | 0.0 | Yes |
HFPO-DA | Precursor/Other | 1% | (nd–5.9) nd | 0.0 | No |
FPePA | PFCA | <1% | (nd–34.7) nd | 0.0 | No |
N-EtFOSAA | Precursor/Other | <1% | (nd–0.6) nd | 0.0 | No |
N-MeFOSAA | Precursor/Other | <1% | (nd–1.6) nd | 0.0 | No |
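
The detection-frequency and median columns are linked. When fewer than half of the samples contain a detectable concentration, the median falls among the non-detects and is reported as "nd". The following sketch shows this relationship. It ranks non-detects below every detected value, which is a common convention but is assumed here, since the exact summary method is not stated. The function names are hypothetical.

```python
def detection_frequency(samples):
    """Percent of samples with a detected value (None marks a non-detect)."""
    detected = sum(1 for s in samples if s is not None)
    return 100.0 * detected / len(samples)


def median_with_nd(samples):
    """Median that reports 'nd' when it falls among the non-detects.

    Non-detects are ranked below every detected value (assumed convention).
    """
    ranked = sorted(samples, key=lambda s: (s is not None, s if s is not None else 0.0))
    n = len(ranked)
    middle = [ranked[(n - 1) // 2], ranked[n // 2]]
    if any(m is None for m in middle):
        return "nd"
    return (middle[0] + middle[1]) / 2
```

This explains why PFHpA, detected in 49% of samples, has a median of nd, while PFBS, detected in 55% of samples, has a detected median (0.9 ng/L).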
Predictor | Count | Importance Score (Mean ± Std. Dev) | Description |
---|---|---|---|
Wet Deposition Ammonia from Manure | 25 | 0.014 ± 0.007 | The fraction of the total ammonia wet deposition due to emissions from animal manure |
Rain Event Intensity | 25 | 0.013 ± 0.007 | Annual average (1981–2010) of daily intensity of precipitation for a rain event where there are consecutive days with precipitation ≥ 1 mm |
Freshwater Withdrawals | 24 | 0.013 ± 0.007 | County-level estimates of freshwater withdrawals from 1995–2000 |
Non-Alfalfa Hay | 22 | 0.013 ± 0.009 | Any type of hay crop that is not alfalfa. Can include grasses, legumes, and forbs |
Industrial/Military | 21 | 0.012 ± 0.008 | Includes heavy and light industry, seaports/harbors, manufacturing, mills/factories, utilities, waste/recycling/landfills, energy production, warehousing/distribution, water-management features, major communication facilities, and military bases |
Sand | 20 | 0.013 ± 0.010 | Average percent of sand in soil |
Commercial/Services | 19 | 0.012 ± 0.009 | Includes retail stores, shopping centers, office buildings, commercial zones, professional services and organizations, universities, schools, hospitals, churches, prisons, police and fire stations, and so on |
High Urban Interface | 19 | 0.012 ± 0.009 | Land in an urban area with a housing density > 500 or in or near an urban core area. Probable medium to high anthropogenic influence |
Residual Carbonates | 18 | 0.014 ± 0.009 | Residual surficial materials developed in carbonate rocks, discontinuous or patchy in distribution |
Sinkholes | 16 | 0.020 ± 0.014 | Mean sinkhole density. Sinkholes are common in karst landscapes underlain by limestone/dolomite bedrock that is susceptible to dissolution by water |
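
The table reports, for each predictor, a count together with a mean ± standard deviation importance. This format is consistent with summarizing importances over repeated model fits. The following sketch shows one way to produce such a summary. It rests on two assumptions: that "count" means the number of runs in which a predictor ranked in the top `top_n`, and that the mean and standard deviation are taken over those runs. Neither definition is confirmed by the table. The function name and the `top_n` parameter are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_importance(runs, top_n=10):
    """Summarize per-run feature importances across repeated model fits.

    runs: list of dicts {predictor: importance}, one dict per run.
    Returns {predictor: (count, mean, sd)}, sorted by count (descending),
    where count is the number of runs in which the predictor ranked in
    the top_n (assumed definition).
    """
    hits = defaultdict(list)
    for scores in runs:
        top = sorted(scores, key=scores.get, reverse=True)[:top_n]
        for name in top:
            hits[name].append(scores[name])
    summary = {
        name: (len(vals), mean(vals), stdev(vals) if len(vals) > 1 else 0.0)
        for name, vals in hits.items()
    }
    return dict(sorted(summary.items(), key=lambda kv: kv[1][0], reverse=True))
```

Sorting by count, as the table does, can place a predictor with a high mean but infrequent selection (such as Sinkholes) below predictors that are selected more often but score lower.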
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Breitmeyer, S.E.; Williams, A.M.; Conlon, M.D.; Wertz, T.A.; Heflin, B.C.; Shull, D.R.; Duris, J.W. Predicted Potential for Aquatic Exposure Effects of Per- and Polyfluorinated Alkyl Substances (PFAS) in Pennsylvania’s Statewide Network of Streams. Toxics 2024, 12, 921. https://doi.org/10.3390/toxics12120921