This specification is like no other — it has been processed with you, the humble web developer, in mind.
The focus of this specification is readability and ease of access. Unlike the full HTML Standard, this "developer's edition" removes information that only browser vendors need know. It is automatically produced from the full specification by our build tooling, and thus always in sync with the latest developments in HTML.
To read about its conception, construction, and future, read the original press release, and the blog post about its relaunch.
Finally, feel free to contribute on GitHub to make this edition better for everyone!
There are various places in HTML that accept particular data types, such as dates or numbers. This section describes what the conformance criteria for content in those formats is, and how to parse them.
The White_Space characters are those that have the Unicode
property "White_Space" in the Unicode
PropList.txt data file. [UNICODE]
This is not to be confused with the "White_Space" value (abbreviated "WS") of the
"Bidi_Class" property in the
Unicode.txt data file.
A number of attributes are boolean attributes. The presence of a boolean attribute on an element represents the true value, and the absence of the attribute represents the false value.
If the attribute is present, its value must either be the empty string or a value that is an ASCII case-insensitive match for the attribute's canonical name, with no leading or trailing whitespace.
The values "true" and "false" are not allowed on boolean attributes. To represent a false value, the attribute has to be omitted altogether.
<label><input type=checkbox checked name=cheese disabled> Cheese</label>
This could be equivalently written as this:
<label><input type=checkbox checked=checked name=cheese disabled=disabled> Cheese</label>
You can also mix styles; the following is still equivalent:
<label><input type='checkbox' checked name=cheese disabled=""> Cheese</label>
Some attributes are defined as taking one of a finite set of keywords. Such attributes are called enumerated attributes. The keywords are each defined to map to a particular state (several keywords might map to the same state, in which case some of the keywords are synonyms of each other; additionally, some of the keywords can be said to be non-conforming, and are only in the specification for historical reasons). In addition, two default states can be given. The first is the invalid value default, the second is the missing value default.
If an enumerated attribute is specified, the attribute's value must be an ASCII case-insensitive match for one of the given keywords that are not said to be non-conforming, with no leading or trailing whitespace.
When the attribute is specified, if its value is an ASCII case-insensitive match for one of the given keywords then that keyword's state is the state that the attribute represents. If the attribute value matches none of the given keywords, but the attribute has an invalid value default, then the attribute represents that state. Otherwise, if the attribute value matches none of the keywords but there is a missing value default state defined, then that is the state represented by the attribute. Otherwise, there is no default, and invalid values mean that there is no state represented.
When the attribute is not specified, if there is a missing value default state defined, then that is the state represented by the (missing) attribute. Otherwise, the absence of the attribute means that there is no state represented.
The empty string can be a valid keyword.
A string is a valid integer if it consists of one or more ASCII digits, optionally prefixed with a U+002D HYPHEN-MINUS character (-).
A valid integer without a U+002D HYPHEN-MINUS (-) prefix represents the number that is represented in base ten by that string of digits. A valid integer with a U+002D HYPHEN-MINUS (-) prefix represents the number represented in base ten by the string of digits that follows the U+002D HYPHEN-MINUS, subtracted from zero.
A string is a valid non-negative integer if it consists of one or more ASCII digits.
A valid non-negative integer represents the number that is represented in base ten by that string of digits.
A string is a valid floating-point number if it consists of:
A valid floating-point number represents the number obtained by multiplying the significand by ten raised to the power of the exponent, where the significand is the first number, interpreted as base ten (including the decimal point and the number after the decimal point, if any, and interpreting the significand as a negative number if the whole string starts with a U+002D HYPHEN-MINUS character (-) and the number is not zero), and where the exponent is the number after the E, if any (interpreted as a negative number if there is a U+002D HYPHEN-MINUS character (-) between the E and the number and the number is not zero, or else ignoring a U+002B PLUS SIGN character (+) between the E and the number if there is one). If there is no E, then the exponent is treated as zero.
The Infinity and Not-a-Number (NaN) values are not valid floating-point numbers.
A valid list of floating-point numbers is a number of valid floating-point numbers separated by U+002C COMMA characters, with no other characters (e.g. no ASCII whitespace). In addition, there might be restrictions on the number of floating-point numbers that can be given, or on the range of values allowed.
In the algorithms below, the number of days in month month of year year is: 31 if month is 1, 3, 5, 7, 8, 10, or 12; 30 if month is 4, 6, 9, or 11; 29 if month is 2 and year is a number divisible by 400, or if year is a number divisible by 4 but not by 100; and 28 otherwise. This takes into account leap years in the Gregorian calendar. [GREGORIAN]
When ASCII digits are used in the date and time syntaxes defined in this section, they express numbers in base ten.
Where this specification refers to the proleptic Gregorian calendar, it means the modern Gregorian calendar, extrapolated backwards to year 1. A date in the proleptic Gregorian calendar, sometimes explicitly referred to as a proleptic-Gregorian date, is one that is described using that calendar even if that calendar was not in use at the time (or place) in question. [GREGORIAN]
The use of the Gregorian calendar as the wire format in this specification is an
arbitrary choice resulting from the cultural biases of those involved in the decision. See also
the section discussing date, time, and number formats in forms
A month consists of a specific proleptic-Gregorian date with no time-zone information and no date information beyond a year and a month. [GREGORIAN]
A string is a valid month string representing a year year and month month if it consists of the following components in the given order:
A date consists of a specific proleptic-Gregorian date with no time-zone information, consisting of a year, a month, and a day. [GREGORIAN]
A string is a valid date string representing a year year, month month, and day day if it consists of the following components in the given order:
A yearless date consists of a Gregorian month and a day within that month, but with no associated year. [GREGORIAN]
A string is a valid yearless date string representing a month month and a day day if it consists of the following components in the given order:
In other words, if the month is "
meaning February, then the day can be 29, as if the year was a leap year.
A time consists of a specific time with no time-zone information, consisting of an hour, a minute, a second, and a fraction of a second.
A string is a valid time string representing an hour hour, a minute minute, and a second second if it consists of the following components in the given order:
The second component cannot be 60 or 61; leap seconds cannot be represented.
A local date and time consists of a specific proleptic-Gregorian date, consisting of a year, a month, and a day, and a time, consisting of an hour, a minute, a second, and a fraction of a second, but expressed without a time zone. [GREGORIAN]
A string is a valid local date and time string representing a date and time if it consists of the following components in the given order:
A string is a valid normalized local date and time string representing a date and time if it consists of the following components in the given order:
A time-zone offset consists of a signed number of hours and minutes.
A string is a valid time-zone offset string representing a time-zone offset if it consists of either:
A U+005A LATIN CAPITAL LETTER Z character (Z), allowed only if the time zone is UTC
Or, the following components, in the given order:
This format allows for time-zone offsets from -23:59 to +23:59. Right now, in practice, the range of offsets of actual time zones is -12:00 to +14:00, and the minutes component of offsets of actual time zones is always either 00, 30, or 45. There is no guarantee that this will remain so forever, however, since time zones are used as political footballs and are thus subject to very whimsical policy decisions.
See also the usage notes and examples in the global date and time section below for details on using time-zone offsets with historical times that predate the formation of formal time zones.
A global date and time consists of a specific proleptic-Gregorian date, consisting of a year, a month, and a day, and a time, consisting of an hour, a minute, a second, and a fraction of a second, expressed with a time-zone offset, consisting of a signed number of hours and minutes. [GREGORIAN]
A string is a valid global date and time string representing a date, time, and a time-zone offset if it consists of the following components in the given order:
Times in dates before the formation of UTC in the mid twentieth century must be expressed and interpreted in terms of UT1 (contemporary Earth solar time at the 0° longitude), not UTC (the approximation of UT1 that ticks in SI seconds). Time before the formation of time zones must be expressed and interpreted as UT1 times with explicit time zones that approximate the contemporary difference between the appropriate local time and the time observed at the location of Greenwich, London.
The following are some examples of dates written as valid global date and time strings.
Several things are notable about these dates:
T" is replaced by a space, it must be a single space character. The string "
2001-12-21 12:00Z" (with two spaces between the components) would not be parsed successfully.
A week consists of a week-year number and a week number representing a seven-day period starting on a Monday. Each week-year in this calendaring system has either 52 or 53 such seven-day periods, as defined below. The seven-day period starting on the Gregorian date Monday December 29th 1969 (1969-12-29) is defined as week number 1 in week-year 1970. Consecutive weeks are numbered sequentially. The week before the number 1 week in a week-year is the last week in the previous week-year, and vice versa. [GREGORIAN]
A week-year with a number year has 53 weeks if it corresponds to either a year year in the proleptic Gregorian calendar that has a Thursday as its first day (January 1st), or a year year in the proleptic Gregorian calendar that has a Wednesday as its first day (January 1st) and where year is a number divisible by 400, or a number divisible by 4 but not by 100. All other week-years have 52 weeks.
The week number of the last day of a week-year with 53 weeks is 53; the week number of the last day of a week-year with 52 weeks is 52.
The week-year number of a particular day can be different than the number of the year that contains that day in the proleptic Gregorian calendar. The first week in a week-year y is the week that contains the first Thursday of the Gregorian year y.
For modern purposes, a week as defined here is equivalent to ISO weeks as defined in ISO 8601. [ISO8601]
A string is a valid week string representing a week-year year and week week if it consists of the following components in the given order:
A duration consists of a number of seconds.
Since months and seconds are not comparable (a month is not a precise number of seconds, but is instead a period whose exact length depends on the precise day from which it is measured) a duration as defined in this specification cannot include months (or years, which are equivalent to twelve months). Only durations that describe a specific number of seconds can be described.
A string is a valid duration string representing a duration t if it consists of either of the following:
A literal U+0050 LATIN CAPITAL LETTER P character followed by one or more of the following subcomponents, in the order given, where the number of days, hours, minutes, and seconds corresponds to the same number of seconds as in t:
One or more ASCII digits followed by a U+0044 LATIN CAPITAL LETTER D character, representing a number of days.
A U+0054 LATIN CAPITAL LETTER T character followed by one or more of the following subcomponents, in the order given:
One or more ASCII digits followed by a U+0048 LATIN CAPITAL LETTER H character, representing a number of hours.
One or more ASCII digits followed by a U+004D LATIN CAPITAL LETTER M character, representing a number of minutes.
The following components:
This, as with a number of other date- and time-related microsyntaxes defined in this specification, is based on one of the formats defined in ISO 8601. [ISO8601]
A duration time component is a string consisting of the following components:
Zero or more ASCII whitespace.
If the duration time component scale specified is 1 (i.e. the units are seconds), then, optionally, a U+002E FULL STOP character (.) followed by one, two, or three ASCII digits, representing a fraction of a second.
Zero or more ASCII whitespace.
One of the following characters, representing the duration time component scale of the time unit used in the numeric part of the duration time component:
Zero or more ASCII whitespace.
This is not based on any of the formats in ISO 8601. It is intended to be a more human-readable alternative to the ISO 8601 duration format.
A string is a valid date string with optional time if it is also one of the following:
A simple color consists of three 8-bit numbers in the range 0..255, representing the red, green, and blue components of the color respectively, in the sRGB color space. [SRGB]
A string is a valid simple color if it is exactly seven characters long, and the first character is a U+0023 NUMBER SIGN character (#), and the remaining six characters are all ASCII hex digits, with the first two digits representing the red component, the middle two digits representing the green component, and the last two digits representing the blue component, in hexadecimal.
A string is a valid lowercase simple color if it is a valid simple color and doesn't use any characters in the range U+0041 LATIN CAPITAL LETTER A to U+0046 LATIN CAPITAL LETTER F.
The 2D graphics context has a separate color syntax that also handles opacity.
A set of space-separated tokens is a string containing zero or more words (known as tokens) separated by one or more ASCII whitespace, where words consist of any string of one or more characters, none of which are ASCII whitespace.
A string containing a set of space-separated tokens may have leading or trailing ASCII whitespace.
An unordered set of unique space-separated tokens is a set of space-separated tokens where none of the tokens are duplicated.
An ordered set of unique space-separated tokens is a set of space-separated tokens where none of the tokens are duplicated but where the order of the tokens is meaningful.
Sets of space-separated tokens sometimes have a defined set of allowed values. When a set of allowed values is defined, the tokens must all be from that list of allowed values; other values are non-conforming. If no such set of allowed values is provided, then all values are conforming.
How tokens in a set of space-separated tokens are to be compared (e.g. case-sensitively or not) is defined on a per-set basis.
A set of comma-separated tokens is a string containing zero or more tokens each separated from the next by a single U+002C COMMA character (,), where tokens consist of any string of zero or more characters, neither beginning nor ending with ASCII whitespace, nor containing any U+002C COMMA characters (,), and optionally surrounded by ASCII whitespace.
For instance, the string "
a ,b,,d d " consists of four tokens: "a", "b", the empty
string, and "d d". Leading and trailing whitespace around each token doesn't count as part of
the token, and the empty string can be a token.
Sets of comma-separated tokens sometimes have further restrictions on what consists a valid token. When such restrictions are defined, the tokens must all fit within those restrictions; other values are non-conforming. If no such restrictions are specified, then all values are conforming.
A valid hash-name reference to an element of type type is a
string consisting of a U+0023 NUMBER SIGN character (#) followed by a string which exactly matches
the value of the
name attribute of an element with type type in
the same tree.
A string is a valid media query list if it matches the
<media-query-list> production of the Media Queries specification. [MQ]
A string matches the environment of the user if it is the empty string, a string consisting of only ASCII whitespace, or is a media query list that matches the user's environment according to the definitions given in the Media Queries specification. [MQ]