nh3

Python bindings to the ammonia HTML sanitization library.

Installation

pip install nh3

Usage

Use clean() to sanitize HTML fragments:

>>> import nh3
>>> nh3.clean("<unknown>hi")
'hi'
>>> nh3.clean("<b><img src='' onerror='alert(\\'hax\\')'>XSS?</b>")
'<b><img src="">XSS?</b>'

clean() accepts many options to customize the sanitization, all documented below. For example, to allow only <b> tags:

>>> nh3.clean("<b><a href='https://example.com'>Hello</a></b>", tags={"b"})
'<b>Hello</b>'

API reference

Python bindings to the ammonia HTML sanitization library (https://github.com/rust-ammonia/ammonia).

nh3.clean(html, tags=None, clean_content_tags=None, attributes=None, attribute_filter=None, strip_comments=True, link_rel='noopener noreferrer', generic_attribute_prefixes=None, tag_attribute_values=None, set_tag_attribute_values=None, url_schemes=None)

Sanitize an HTML fragment according to the given options.

Parameters:
  • html (str) – Input HTML fragment

  • tags (set[str], optional) – Sets the tags that are allowed.

  • clean_content_tags (set[str], optional) – Sets the tags whose contents will be completely removed from the output.

  • attributes (dict[str, set[str]], optional) – Sets the HTML attributes that are allowed on specific tags. The * key means the attributes are allowed on any tag.

  • attribute_filter (Callable[[str, str, str], str | None], optional) – Allows rewriting of all attributes using a callback. The callback takes the element name, the attribute name, and the attribute value. It returns None to remove the attribute, or the value to use.

  • strip_comments (bool) – Whether HTML comments are stripped from the output; defaults to True.

  • link_rel (str) –

    Configures a rel attribute that will be added on links; defaults to noopener noreferrer. To turn on rel insertion, pass a space-separated list of rel values; to turn it off, pass None. If rel is allowed through the generic or tag attributes, this must be set to None. Common rel values to include:

    • noopener: This prevents a particular type of XSS attack, and should usually be turned on for untrusted HTML.

    • noreferrer: This prevents the browser from sending the source URL to the website that is linked to.

    • nofollow: This prevents search engines from using this link for ranking, which disincentivizes spammers.

  • generic_attribute_prefixes (set[str], optional) – Sets the attribute name prefixes that are allowed on any tag, for example data-.

  • tag_attribute_values (dict[str, dict[str, set[str]]], optional) – Sets the values of HTML attributes that are allowed on specific tags. The value is structured as a map from tag names to a map from attribute names to a set of attribute values. If a tag is not itself whitelisted, adding entries to this map will do nothing.

  • set_tag_attribute_values (dict[str, dict[str, str]], optional) – Sets the values of HTML attributes that are to be set on specific tags. The value is structured as a map from tag names to a map from attribute names to an attribute value. If a tag is not itself whitelisted, adding entries to this map will do nothing.

  • url_schemes (set[str], optional) – Sets the URL schemes permitted on href and src attributes.

Returns:

Sanitized HTML fragment

Return type:

str

For example:

>>> import nh3
>>> nh3.clean("<unknown>hi")
'hi'
>>> nh3.clean("<b><img src='' onerror='alert(\\'hax\\')'>XSS?</b>")
'<b><img src="">XSS?</b>'

nh3.clean_text(html)

Turn an arbitrary string into unformatted HTML.

Roughly equivalent to Python’s html.escape() or PHP’s htmlspecialchars and htmlentities. Escaping is as strict as possible, encoding every character that has special meaning to the HTML parser.

Parameters:

html (str) – Input string to escape

Returns:

Cleaned text

Return type:

str

For example:

>>> nh3.clean_text('Robert"); abuse();//')
'Robert&quot;);&#32;abuse();&#47;&#47;'

nh3.is_html(html)

Determine if a given string contains HTML.

This function parses the full string and checks for any HTML syntax.

Note: This function will return True for strings that contain invalid HTML syntax like <g> and even Vec::<u8>::new().

Parameters:

html (str) – Input string

Return type:

bool

For example:

>>> nh3.is_html("plain text")
False
>>> nh3.is_html("<p>html!</p>")
True

nh3.ALLOWED_TAGS

The default set of tags allowed by clean(). Useful for customizing the default to add or remove some tags:

>>> tags = nh3.ALLOWED_TAGS - {"b"}
>>> nh3.clean("<b><i>yeah</i></b>", tags=tags)
'<i>yeah</i>'

nh3.ALLOWED_ATTRIBUTES

The default mapping of tags to allowed attributes for clean(). Useful for customizing the default to add or remove some attributes:

>>> from copy import deepcopy
>>> attributes = deepcopy(nh3.ALLOWED_ATTRIBUTES)
>>> attributes["img"].add("data-invert")
>>> nh3.clean("<img src='example.jpeg' data-invert=true>", attributes=attributes)
'<img src="example.jpeg" data-invert="true">'