CVE-2024-5206: Sensitive Data Leakage in sklearn.feature_extraction.text.TfidfVectorizer in scikit-learn/scikit-learn

Q: What is the severity of CVE-2024-5206?

CVE-2024-5206 is classified as a sensitive data leakage vulnerability.

Q: How do I fix CVE-2024-5206?

To fix CVE-2024-5206, upgrade scikit-learn to version 1.5.0 or later.

Q: Which versions of scikit-learn are affected by CVE-2024-5206?

CVE-2024-5206 affects versions of scikit-learn up to and including 1.4.1.post1.

Q: What products are impacted by CVE-2024-5206?

CVE-2024-5206 impacts IBM Cloud Pak for Security and IBM QRadar Suite Software in specific versions.

Q: Is CVE-2024-5206 a known issue in libraries?

Yes. CVE-2024-5206 is a known issue in the scikit-learn library: TfidfVectorizer stored tokens from the training data in its stop_words_ attribute that were meant to be discarded.

Published Jun 6, 2024

A sensitive data leakage vulnerability was identified in scikit-learn's TfidfVectorizer in versions up to and including 1.4.1.post1; it was fixed in version 1.5.0. The vulnerability arises from the unexpected storage of all tokens present in the training data within the stop_words_ attribute, rather than only the subset of tokens that the TF-IDF technique needs to function. As a result, stop_words_ could retain tokens that were meant to be discarded and never stored, such as passwords or keys, and could expose them to anyone able to read the fitted vectorizer. The impact of this vulnerability depends on the nature of the data processed by the vectorizer.
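For models that were already fitted with an affected release, the stored tokens can be inspected and cleared by hand. The following is a minimal sketch, not an official fix: the helper name pop_discarded_tokens is hypothetical, and the demonstration uses a plain stand-in object so that it runs without scikit-learn installed. It relies on two points: the attribute described above is exposed as stop_words_ on a fitted vectorizer, and scikit-learn documents it as an inspection-only attribute that can safely be set to None.

```python
from types import SimpleNamespace
from typing import Any, Optional, Set


def pop_discarded_tokens(vectorizer: Any) -> Optional[Set[str]]:
    """Clear ``stop_words_`` on a fitted vectorizer and return what it held.

    Returns None when the attribute is missing or already cleared, which is
    the expected outcome for a vectorizer fitted with a fixed release.
    (Hypothetical helper, not part of scikit-learn.)
    """
    leaked = getattr(vectorizer, "stop_words_", None)
    if leaked is None:
        return None
    # The attribute exists for model inspection only, so clearing it does
    # not change what transform() produces.
    vectorizer.stop_words_ = None
    return set(leaked)


# Stand-in for a fitted TfidfVectorizer (assumption: a real instance would
# be loaded from disk or taken from a pipeline step).
fitted = SimpleNamespace(stop_words_={"hunter2", "the"})

leaked = pop_discarded_tokens(fitted)
print(sorted(leaked or []))   # review these tokens for secrets
print(fitted.stop_words_)     # attribute is now cleared
```

After clearing the attribute, the vectorizer would need to be serialized again so that the stored copy no longer contains the tokens. Upgrading and retraining remains the more complete option.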

Other sources

scikit-learn could allow a remote authenticated attacker to obtain sensitive information, caused by an unexpected storage of all tokens present in the training data within the stopwords attribute. By sending a specially crafted request, an attacker could exploit this vulnerability to obtain passwords or keys information, and use this information to launch further attacks against the affected system.

— IBM

Affected Software

2 affected components · Fixes available

pip/scikit-learn <1.5.0 (fixed in 1.5.0)

scikit-learn Scikit-learn Python <1.5.0

Remediation

Recommended actions to resolve this vulnerability, in priority order.

Upgrade
Upgrade pip/scikit-learn to a version that resolves this vulnerability.
Fixed in 1.5.0
Upgrade
Upgrade scikit-learn/sklearn.feature_extraction.text.TfidfVectorizer to a version that resolves this vulnerability.
Fixed in 1.5.0
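To confirm whether an environment still needs the upgrade, the installed version can be compared against 1.5.0. This is a rough sketch using only the standard library. The function names are illustrative, and the version parser is a simplification that looks only at the numeric major.minor.patch prefix and ignores suffixes such as .post1 or rc1.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

FIXED_RELEASE = (1, 5, 0)  # first fixed version according to the advisory


def release_tuple(version: str) -> Tuple[int, int, int]:
    """Extract the numeric (major, minor, patch) prefix of a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component (for example "1.4") is treated as 0.
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def is_affected(version: str) -> bool:
    """Return True when the version predates the fixed release."""
    return release_tuple(version) < FIXED_RELEASE


def installed_scikit_learn_version() -> Optional[str]:
    """Return the installed scikit-learn version, or None if it is absent."""
    try:
        return metadata.version("scikit-learn")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_scikit_learn_version()
    if version is None:
        print("scikit-learn is not installed")
    elif is_affected(version):
        print(f"scikit-learn {version} is affected; upgrade to 1.5.0 or later")
    else:
        print(f"scikit-learn {version} is at or above the fixed release")
```

A check like this only covers the running environment. Vectorizers fitted and saved under an older release may still carry the stored tokens after the library itself has been upgraded.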

Event History

Jun 6, 2024

06:28 PM · CVE Published (via MITRE)
06:28 PM · Data Sourced (via MITRE): Description, Severity, Weakness
07:16 PM · Data Sourced (via NVD): Remedy, Description, Severity, Weakness, Affected Software
09:30 PM · Advisory Published (via GitHub)

May 2, 2025

12:00 AM · Data Sourced (via IBM): Description, Affected Software

Parent advisories

This vulnerability appears in the following advisories.

IBM-7232197

