nh3¶
Python bindings to the ammonia HTML sanitization library.
Installation¶
pip install nh3
Usage¶
Use clean() to sanitize HTML fragments:
>>> import nh3
>>> nh3.clean("<unknown>hi")
'hi'
>>> nh3.clean("<b><img src='' onerror='alert(\\'hax\\')'>XSS?</b>")
'<b><img src="">XSS?</b>'
clean() accepts many options that customize the sanitization, as documented in the API reference.
For example, to only allow <b> tags:
>>> nh3.clean("<b><a href='https://example.com'>Hello</a></b>", tags={"b"})
'<b>Hello</b>'
API reference¶
Python bindings to the ammonia HTML sanitization library (https://github.com/rust-ammonia/ammonia).
- nh3.clean(html, tags=None, clean_content_tags=None, attributes=None, attribute_filter=None, strip_comments=True, link_rel='noopener noreferrer', generic_attribute_prefixes=None, tag_attribute_values=None, set_tag_attribute_values=None, url_schemes=None)¶
Sanitize an HTML fragment according to the given options.
- Parameters:
  - html (str) – Input HTML fragment
  - tags (set[str], optional) – Sets the tags that are allowed.
  - clean_content_tags (set[str], optional) – Sets the tags whose contents will be completely removed from the output.
  - attributes (dict[str, set[str]], optional) – Sets the HTML attributes that are allowed on specific tags; the * key means the attributes are allowed on any tag.
  - attribute_filter (Callable[[str, str, str], str | None], optional) – Allows rewriting of all attributes using a callback. The callback takes the name of the element, the name of the attribute, and the attribute's value. It returns None to remove the attribute, or a value to use.
  - strip_comments (bool) – Configures the handling of HTML comments, defaults to True.
  - link_rel (str) – Configures a rel attribute that will be added on links, defaults to noopener noreferrer. To turn on rel-insertion, pass a space-separated list. If rel is in the generic or tag attributes, this must be set to None. Common rel values to include:
    - noopener: This prevents a particular type of XSS attack, and should usually be turned on for untrusted HTML.
    - noreferrer: This prevents the browser from sending the source URL to the website that is linked to.
    - nofollow: This prevents search engines from using this link for ranking, which disincentivizes spammers.
  - generic_attribute_prefixes (set[str], optional) – Sets the prefix of attributes that are allowed on any tag.
  - tag_attribute_values (dict[str, dict[str, set[str]]], optional) – Sets the values of HTML attributes that are allowed on specific tags. The value is structured as a map from tag names to a map from attribute names to a set of attribute values. If a tag is not itself whitelisted, adding entries to this map will do nothing.
  - set_tag_attribute_values (dict[str, dict[str, str]], optional) – Sets the values of HTML attributes that are to be set on specific tags. The value is structured as a map from tag names to a map from attribute names to an attribute value. If a tag is not itself whitelisted, adding entries to this map will do nothing.
  - url_schemes (set[str], optional) – Sets the URL schemes permitted on href and src attributes.
- Returns:
  Sanitized HTML fragment
- Return type:
  str
For example:
>>> import nh3
>>> nh3.clean("<unknown>hi")
'hi'
>>> nh3.clean("<b><img src='' onerror='alert(\\'hax\\')'>XSS?</b>")
'<b><img src="">XSS?</b>'
- nh3.clean_text(html)¶
Turn an arbitrary string into unformatted HTML.
Roughly equivalent to Python’s html.escape() or PHP’s htmlspecialchars and htmlentities. Escaping is as strict as possible, encoding every character that has special meaning to the HTML parser.
- Parameters:
  - html (str) – Input HTML fragment
- Returns:
  Cleaned text
- Return type:
  str
For example:
>>> import nh3
>>> nh3.clean_text('Robert"); abuse();//')
'Robert&quot;);&#32;abuse();&#47;&#47;'
- nh3.is_html(html)¶
Determine if a given string contains HTML.
This function parses the full string and checks for any HTML syntax.
Note: This function will return True for strings that contain invalid HTML syntax like <g> and even Vec::<u8>::new().
- Parameters:
  - html (str) – Input string
- Return type:
  bool
For example:
>>> nh3.is_html("plain text")
False
>>> nh3.is_html("<p>html!</p>")
True
- nh3.ALLOWED_TAGS¶
The default set of tags allowed by clean(). Useful for customizing the default to add or remove some tags:
>>> tags = nh3.ALLOWED_TAGS - {"b"}
>>> nh3.clean("<b><i>yeah</i></b>", tags=tags)
'<i>yeah</i>'
- nh3.ALLOWED_ATTRIBUTES¶
The default mapping of tags to allowed attributes for clean(). Useful for customizing the default to add or remove some attributes:
>>> from copy import deepcopy
>>> attributes = deepcopy(nh3.ALLOWED_ATTRIBUTES)
>>> attributes["img"].add("data-invert")
>>> nh3.clean("<img src='example.jpeg' data-invert=true>", attributes=attributes)
'<img src="example.jpeg" data-invert="true">'
- nh3.ALLOWED_URL_SCHEMES¶
The default set of URL schemes permitted on href and src attributes. Useful for customizing the default to add or remove some URL schemes:
>>> url_schemes = nh3.ALLOWED_URL_SCHEMES - {'tel'}
>>> nh3.clean('<a href="tel:+1">Call</a> or <a href="mailto:contact@me">email</a> me.', url_schemes=url_schemes)
'<a rel="noopener noreferrer">Call</a> or <a href="mailto:contact@me" rel="noopener noreferrer">email</a> me.'