CVE-2024-5206: Sensitive Data Leakage in sklearn.feature_extraction.text.TfidfVectorizer in scikit-learn/scikit-learn

Published Jun 6, 2024

A sensitive data leakage vulnerability was identified in scikit-learn's TfidfVectorizer. It affects versions up to and including 1.4.1.post1 and was fixed in version 1.5.0. The vectorizer unexpectedly stored all tokens present in the training data in its stop_words_ attribute, rather than only the subset of tokens required for the TF-IDF technique to function. As a result, a fitted vectorizer could retain tokens that were meant to be discarded, such as passwords or keys, and leak them to anyone able to read that attribute. The impact depends on the nature of the data processed by the vectorizer.
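The behavior described above can be inspected, and worked around on an already fitted vectorizer, with a small helper. This is a minimal sketch, not an official remediation: the names clear_discarded_tokens and demo, and the sample documents, are hypothetical. It assumes the stored tokens are used for inspection only, so removing the attribute does not change what transform produces.

```python
def clear_discarded_tokens(vectorizer):
    """Delete the `stop_words_` attribute from a fitted vectorizer, if present.

    Returns the number of tokens that had been stored (0 if there were none).
    Hypothetical helper for illustration; it works on any object that may
    carry a `stop_words_` attribute.
    """
    stored = getattr(vectorizer, "stop_words_", None)
    if stored is None:
        return 0
    count = len(stored)
    del vectorizer.stop_words_
    return count


def demo():
    """Fit a vectorizer on made-up documents and scrub the stored tokens."""
    # Imported here so the helper above stays usable without scikit-learn.
    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = [
        "login ok for alice",
        "login ok for bob",
        "login failed secret-token-123",  # rare tokens fall below min_df
    ]
    vec = TfidfVectorizer(min_df=2).fit(docs)

    # On affected versions this prints the discarded tokens; on fixed
    # versions the attribute is absent and an empty list is printed.
    print(sorted(getattr(vec, "stop_words_", set())))
    print("tokens removed:", clear_discarded_tokens(vec))

    # The vectorizer remains usable after the attribute is removed.
    print(vec.transform(["login ok"]).shape)
```

Calling demo() requires scikit-learn to be installed. Vectorizers that were pickled under an affected version keep the stored tokens until they are refitted or scrubbed in this way, so upgrading alone does not clean existing artifacts.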

Other sources

scikit-learn could allow a remote authenticated attacker to obtain sensitive information, caused by the unexpected storage of all tokens present in the training data within the stop_words_ attribute. By sending a specially crafted request, an attacker could exploit this vulnerability to obtain passwords or keys, and use this information to launch further attacks against the affected system.

IBM

Affected Software

2 affected components · Fixes available
pip/scikit-learn <1.5.0 (fix: 1.5.0)
scikit-learn Scikit-learn Python <1.5.0
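The affected range listed above (anything below 1.5.0) can be checked against an installed package. The following is a rough sketch using only the standard library; the function names are hypothetical, and the parsing is deliberately naive: it compares only the leading numeric release segment, so a pre-release such as 1.5.0rc1 is treated the same as 1.5.0.

```python
import re
from importlib import metadata

# First fixed release according to this advisory.
FIXED_RELEASE = (1, 5, 0)


def release_tuple(version):
    """Return the leading numeric release of a version string as a 3-tuple.

    Suffixes such as ".post1" or "rc1" are ignored (a simplification).
    """
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def is_affected(version):
    """True if the version falls in the advisory's affected range (<1.5.0)."""
    return release_tuple(version) < FIXED_RELEASE


def installed_sklearn_is_affected():
    """Check the installed scikit-learn; returns None if it is not installed."""
    try:
        return is_affected(metadata.version("scikit-learn"))
    except metadata.PackageNotFoundError:
        return None
```

For anything beyond a quick check, a full version parser is a safer choice than this regular expression.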

Event History

Jun 6, 2024
CVE Published, via MITRE, 06:28 PM
Data Sourced, via MITRE, 06:28 PM (Description, Severity, Weakness)
Advisory Published, via GitHub, 09:30 PM

May 2, 2025
Data Sourced, via IBM, 12:00 AM (Description, Affected Software)

Frequently Asked Questions

1. What is the severity of CVE-2024-5206?

CVE-2024-5206 is classified as a sensitive data leakage vulnerability.

2. How do I fix CVE-2024-5206?

To fix CVE-2024-5206, upgrade scikit-learn to version 1.5.0 or later.

3. Which versions of scikit-learn are affected by CVE-2024-5206?

CVE-2024-5206 affects versions of scikit-learn up to and including 1.4.1.post1.

4. What products are impacted by CVE-2024-5206?

CVE-2024-5206 impacts IBM Cloud Pak for Security and IBM QRadar Suite Software in specific versions.

5. Is CVE-2024-5206 a known issue in libraries?

Yes, CVE-2024-5206 is a known issue in the scikit-learn library regarding token storage.
