1. 4.8.9 The video element
      2. 4.8.10 The audio element
      3. 4.8.11 The track element
      4. 4.8.12 Media elements
        1. 4.8.12.1 Error codes
        2. 4.8.12.2 Location of the media resource
        3. 4.8.12.3 MIME types
        4. 4.8.12.4 Network states
        5. 4.8.12.5 Loading the media resource
        6. 4.8.12.6 Offsets into the media resource
        7. 4.8.12.7 Ready states
        8. 4.8.12.8 Playing the media resource
        9. 4.8.12.9 Seeking
        10. 4.8.12.10 Media resources with multiple media tracks
          1. 4.8.12.10.1 AudioTrackList and VideoTrackList objects
          2. 4.8.12.10.2 Selecting specific audio and video tracks declaratively
        11. 4.8.12.11 Timed text tracks
          1. 4.8.12.11.1 Text track model
          2. 4.8.12.11.2 Sourcing in-band text tracks
          3. 4.8.12.11.3 Text track API
          4. 4.8.12.11.4 Best practices for metadata text tracks
        12. 4.8.12.12 Identifying a track kind through a URL
        13. 4.8.12.13 User interface
        14. 4.8.12.14 Time ranges
        15. 4.8.12.15 The TrackEvent interface
        16. 4.8.12.16 Event summary
        17. 4.8.12.17 Best practices for authors using media elements

4.8.9 The video element

Categories:
Flow content.
Phrasing content.
Embedded content.
If the element has a controls attribute: Interactive content.
Palpable content.
Contexts in which this element can be used:
Where embedded content is expected.
Content model:
If the element has a src attribute: zero or more track elements, then transparent, but with no media element descendants.
If the element does not have a src attribute: zero or more source elements, then zero or more track elements, then transparent, but with no media element descendants.
Tag omission in text/html:
Neither tag is omissible.
Content attributes:
Global attributes
src — Address of the resource
crossorigin — How the element handles crossorigin requests
poster — Poster frame to show prior to video playback
preload — Hints how much buffering the media resource will likely need
autoplay — Hint that the media resource can be started automatically when the page is loaded
playsinline — Encourage the user agent to display video content within the element's playback area
loop — Whether to loop the media resource
muted — Whether to mute the media resource by default
controls — Show user agent controls
width — Horizontal dimension
height — Vertical dimension
DOM interface:
Uses HTMLVideoElement.

A video element is used for playing videos or movies, and audio files with captions.

Content may be provided inside the video element; it is intended for older Web browsers which do not support video, so that legacy video plugins can be tried, or to show text to the users of these older browsers informing them of how to access the video contents.

In particular, this content is not intended to address accessibility concerns. To make video content accessible to the partially sighted, the blind, the hard-of-hearing, the deaf, and those with other physical or cognitive disabilities, a variety of features are available. Captions can be provided, either embedded in the video stream or as external files using the track element. Sign-language tracks can be embedded in the video stream. Audio descriptions can be embedded in the video stream or in text form using a WebVTT file referenced using the track element and synthesized into speech by the user agent. WebVTT can also be used to provide chapter titles. For users who would rather not use a media element at all, transcripts or other textual alternatives can be provided by simply linking to them in the prose near the video element. [WEBVTT]

The video element is a media element whose media data is ostensibly video data, possibly with associated audio data.

The src, preload, autoplay, loop, muted, and controls attributes are the attributes common to all media elements.

The poster attribute gives the URL of an image file that the user agent can show while no video data is available. The attribute, if present, must contain a valid non-empty URL potentially surrounded by spaces.

The image given by the poster attribute, the poster frame, is intended to be a representative frame of the video (typically one of the first non-blank frames) that gives the user an idea of what the video is like.

The playsinline attribute is a boolean attribute. If present, it serves as a hint to the user agent that the video ought to be displayed "inline" in the document by default, constrained to the element's playback area, instead of being displayed fullscreen or in an independent resizable window.

The absence of the playsinline attribute does not imply that the video will display fullscreen by default. Indeed, most user agents have chosen to play all videos inline by default, and in such user agents the playsinline attribute has no effect.

video . videoWidth
video . videoHeight

These attributes return the intrinsic dimensions of the video, or zero if the dimensions are not known.
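
As a minimal sketch of how the zero value might be handled, the following helper computes an aspect ratio only once the intrinsic dimensions are known. The function name intrinsicAspectRatio is hypothetical; it accepts any object exposing videoWidth and videoHeight, such as a video element.

```javascript
// Hypothetical helper: returns width / height for a video-like object,
// or null while the intrinsic dimensions are still unknown (reported as zero).
function intrinsicAspectRatio(video) {
  const { videoWidth, videoHeight } = video;
  if (!videoWidth || !videoHeight) {
    return null;
  }
  return videoWidth / videoHeight;
}
```

A caller would typically wait for a non-null result before sizing a container around the video.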

The video element supports dimension attributes.

This example shows how to detect when a video has failed to play correctly:

<script>
 function failed(e) {
   // video playback failed - show a message saying why
   switch (e.target.error.code) {
     case e.target.error.MEDIA_ERR_ABORTED:
       alert('You aborted the video playback.');
       break;
     case e.target.error.MEDIA_ERR_NETWORK:
       alert('A network error caused the video download to fail part-way.');
       break;
     case e.target.error.MEDIA_ERR_DECODE:
       alert('The video playback was aborted due to a corruption problem or because the video used features your browser did not support.');
       break;
     case e.target.error.MEDIA_ERR_SRC_NOT_SUPPORTED:
       alert('The video could not be loaded, either because the server or network failed or because the format is not supported.');
       break;
     default:
       alert('An unknown error occurred.');
       break;
   }
 }
</script>
<p><video src="tgif.vid" autoplay controls onerror="failed(event)"></video></p>
<p><a href="tgif.vid">Download the video file</a>.</p>

4.8.10 The audio element

Categories:
Flow content.
Phrasing content.
Embedded content.
If the element has a controls attribute: Interactive content.
If the element has a controls attribute: Palpable content.
Contexts in which this element can be used:
Where embedded content is expected.
Content model:
If the element has a src attribute: zero or more track elements, then transparent, but with no media element descendants.
If the element does not have a src attribute: zero or more source elements, then zero or more track elements, then transparent, but with no media element descendants.
Tag omission in text/html:
Neither tag is omissible.
Content attributes:
Global attributes
src — Address of the resource
crossorigin — How the element handles crossorigin requests
preload — Hints how much buffering the media resource will likely need
autoplay — Hint that the media resource can be started automatically when the page is loaded
loop — Whether to loop the media resource
muted — Whether to mute the media resource by default
controls — Show user agent controls
DOM interface:
Uses HTMLAudioElement.

An audio element represents a sound or audio stream.

Content may be provided inside the audio element; it is intended for older Web browsers which do not support audio, so that legacy audio plugins can be tried, or to show text to the users of these older browsers informing them of how to access the audio contents.

In particular, this content is not intended to address accessibility concerns. To make audio content accessible to the deaf or to those with other physical or cognitive disabilities, a variety of features are available. If captions or a sign language video are available, the video element can be used instead of the audio element to play the audio, allowing users to enable the visual alternatives. Chapter titles can be provided to aid navigation, using the track element and a WebVTT file. And, naturally, transcripts or other textual alternatives can be provided by simply linking to them in the prose near the audio element. [WEBVTT]

The audio element is a media element whose media data is ostensibly audio data.

The src, preload, autoplay, loop, muted, and controls attributes are the attributes common to all media elements.

audio = new Audio( [ url ] )

Returns a new audio element, with the src attribute set to the value passed in the argument, if applicable.

4.8.11 The track element

Categories:
None.
Contexts in which this element can be used:
As a child of a media element, before any flow content.
Content model:
Nothing.
Tag omission in text/html:
No end tag.
Content attributes:
Global attributes
kind — The type of text track
src — Address of the resource
srclang — Language of the text track
label — User-visible label
default — Enable the track if no other text track is more suitable
DOM interface:
Uses HTMLTrackElement.

The track element allows authors to specify explicit external timed text tracks for media elements. It does not represent anything on its own.

The kind attribute is an enumerated attribute. The following table lists the keywords defined for this attribute. The keyword given in the first cell of each row maps to the state given in the second cell.

Keyword | State | Brief description
subtitles | Subtitles | Transcription or translation of the dialogue, suitable for when the sound is available but not understood (e.g. because the user does not understand the language of the media resource's audio track). Overlaid on the video.
captions | Captions | Transcription or translation of the dialogue, sound effects, relevant musical cues, and other relevant audio information, suitable for when sound is unavailable or not clearly audible (e.g. because it is muted, drowned-out by ambient noise, or because the user is deaf). Overlaid on the video; labeled as appropriate for the hard-of-hearing.
descriptions | Descriptions | Textual descriptions of the video component of the media resource, intended for audio synthesis when the visual component is obscured, unavailable, or not usable (e.g. because the user is interacting with the application without a screen while driving, or because the user is blind). Synthesized as audio.
chapters | Chapters metadata | Tracks intended for use from script. Not displayed by the user agent.
metadata | Metadata | Tracks intended for use from script. Not displayed by the user agent.

The attribute may be omitted. The missing value default is the subtitles state. The invalid value default is the metadata state.

The src attribute gives the URL of the text track data. The value must be a valid non-empty URL potentially surrounded by spaces. This attribute must be present.

If the element's track URL identifies a WebVTT resource, and the element's kind attribute is not in the chapters metadata or metadata state, then the WebVTT file must be a WebVTT file using cue text. [WEBVTT]

The srclang attribute gives the language of the text track data. The value must be a valid BCP 47 language tag. This attribute must be present if the element's kind attribute is in the subtitles state. [BCP47]

The label attribute gives a user-readable title for the track. This title is used by user agents when listing subtitle, caption, and audio description tracks in their user interface.

The value of the label attribute, if the attribute is present, must not be the empty string. Furthermore, there must not be two track element children of the same media element whose kind attributes are in the same state, whose srclang attributes are both missing or have values that represent the same language, and whose label attributes are again both missing or both have the same value.

The default attribute is a boolean attribute, which, if specified, indicates that the track is to be enabled if the user's preferences do not indicate that another track would be more appropriate.

Each media element must have no more than one track element child whose kind attribute is in the subtitles or captions state and whose default attribute is specified.

Each media element must have no more than one track element child whose kind attribute is in the descriptions state and whose default attribute is specified.

Each media element must have no more than one track element child whose kind attribute is in the chapters metadata state and whose default attribute is specified.

There is no limit on the number of track elements whose kind attribute is in the metadata state and whose default attribute is specified.

track . readyState

Returns the text track readiness state, represented by a number from the following list:

track . NONE (0)

The text track not loaded state.

track . LOADING (1)

The text track loading state.

track . LOADED (2)

The text track loaded state.

track . ERROR (3)

The text track failed to load state.

track . track

Returns the TextTrack object corresponding to the text track of the track element.
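
The numeric readiness states listed above can be turned into readable names for logging. This is only an illustrative sketch; trackReadyStateName and the lookup array are hypothetical names, and the function accepts any object with a readyState property, such as a track element.

```javascript
// Names indexed by the numeric values NONE (0), LOADING (1), LOADED (2), ERROR (3).
const TRACK_READY_STATE_NAMES = ["NONE", "LOADING", "LOADED", "ERROR"];

// Hypothetical helper: maps a track element's readyState to its constant name.
function trackReadyStateName(trackElement) {
  return TRACK_READY_STATE_NAMES[trackElement.readyState] ?? "UNKNOWN";
}
```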

This video has subtitles in several languages:

<video src="brave.webm">
 <track kind=subtitles src=brave.en.vtt srclang=en label="English">
 <track kind=captions src=brave.en.hoh.vtt srclang=en label="English for the Hard of Hearing">
 <track kind=subtitles src=brave.fr.vtt srclang=fr lang=fr label="Français">
 <track kind=subtitles src=brave.de.vtt srclang=de lang=de label="Deutsch">
</video>

(The lang attributes on the last two describe the language of the label attribute, not the language of the subtitles themselves. The language of the subtitles is given by the srclang attribute.)

4.8.12 Media elements

HTMLMediaElement objects (audio and video, in this specification) are simply known as media elements.

The media element attributes, src, crossorigin, preload, autoplay, loop, muted, and controls, apply to all media elements. They are defined in this section.

Media elements are used to present audio data, or video and audio data, to the user. This is referred to as media data in this section, since this section applies equally to media elements for audio or for video. The term media resource is used to refer to the complete set of media data, e.g. the complete video file, or complete audio file.

A media resource can have multiple audio and video tracks. For the purposes of a media element, the video data of the media resource is only that of the currently selected track (if any) as given by the element's videoTracks attribute when the event loop last reached step 1, and the audio data of the media resource is the result of mixing all the currently enabled tracks (if any) given by the element's audioTracks attribute when the event loop last reached step 1.

Both audio and video elements can be used for both audio and video. The main difference between the two is simply that the audio element has no playback area for visual content (such as video or captions), whereas the video element does.

4.8.12.1 Error codes
media . error

Returns a MediaError object representing the current error state of the element.

Returns null if there is no error.

media . error . code

Returns the current error's error code, from the list below.

media . error . message

Returns a specific informative diagnostic message about the error condition encountered. The message and message format are not generally uniform across different user agents. If no such message is available, then the empty string is returned.

MEDIA_ERR_ABORTED (numeric value 1)
The fetching process for the media resource was aborted by the user agent at the user's request.
MEDIA_ERR_NETWORK (numeric value 2)
A network error of some description caused the user agent to stop fetching the media resource, after the resource was established to be usable.
MEDIA_ERR_DECODE (numeric value 3)
An error of some description occurred while decoding the media resource, after the resource was established to be usable.
MEDIA_ERR_SRC_NOT_SUPPORTED (numeric value 4)
The media resource indicated by the src attribute or assigned media provider object was not suitable.
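
One way the error attribute, its code, and its message might be combined for diagnostics is sketched below. The names MEDIA_ERROR_NAMES and describeMediaError are hypothetical; the function accepts any object with an error property shaped like a MediaError (or null), and makes no assumption about the format of message.

```javascript
// Constant names keyed by the numeric error codes listed above.
const MEDIA_ERROR_NAMES = {
  1: "MEDIA_ERR_ABORTED",
  2: "MEDIA_ERR_NETWORK",
  3: "MEDIA_ERR_DECODE",
  4: "MEDIA_ERR_SRC_NOT_SUPPORTED",
};

// Hypothetical helper: summarizes the current error state of a media-like object.
function describeMediaError(media) {
  const error = media.error;
  if (error === null || error === undefined) {
    return "no error";
  }
  const name = MEDIA_ERROR_NAMES[error.code] || "unknown error code " + error.code;
  // message may be the empty string when no diagnostic text is available.
  return error.message ? name + ": " + error.message : name;
}
```
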
4.8.12.2 Location of the media resource

The src content attribute on media elements gives the URL of the media resource (video, audio) to show. The attribute, if present, must contain a valid non-empty URL potentially surrounded by spaces.

If the itemprop attribute is specified on the media element, then the src attribute must also be specified.

The crossorigin content attribute on media elements is a CORS settings attribute.

A media provider object is an object that can represent a media resource, separate from a URL. MediaStream objects, MediaSource objects, and Blob objects are all media provider objects.

media . srcObject [ = source ]

Allows the media element to be assigned a media provider object.

media . currentSrc

Returns the URL of the current media resource, if any.

Returns the empty string when there is no media resource, or it doesn't have a URL.

There are three ways to specify a media resource: the srcObject IDL attribute, the src content attribute, and source elements. The IDL attribute takes priority, followed by the content attribute, followed by the elements.
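
The priority order can be pictured with the following simplified model. It is a sketch of the ordering described above, not the user agent's actual resource selection algorithm; the function name and the parameters srcObject, src, and sourceCount are hypothetical stand-ins for the IDL attribute value, the content attribute value (null when absent), and the number of source element children.

```javascript
// Hypothetical model: reports which mechanism would take priority.
function activeSourceMechanism({ srcObject = null, src = null, sourceCount = 0 }) {
  if (srcObject !== null) {
    return "srcObject";        // the IDL attribute wins
  }
  if (src !== null) {
    return "src attribute";    // then the content attribute
  }
  if (sourceCount > 0) {
    return "source elements";  // then any source element children
  }
  return "none";
}
```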

4.8.12.3 MIME types

A media resource can be described in terms of its type, specifically a MIME type, in some cases with a codecs parameter. (Whether the codecs parameter is allowed or not depends on the MIME type.) [RFC6381]

Types are usually somewhat incomplete descriptions; for example "video/mpeg" doesn't say anything except what the container type is, and even a type like "video/mp4; codecs="avc1.42E01E, mp4a.40.2"" doesn't include information like the actual bitrate (only the maximum bitrate). Thus, given a type, a user agent can often only know whether it might be able to play media of that type (with varying levels of confidence), or whether it definitely cannot play media of that type.

A type that the user agent knows it cannot render is one that describes a resource that the user agent definitely does not support, for example because it doesn't recognize the container type, or it doesn't support the listed codecs.

The MIME type "application/octet-stream" with no parameters is never a type that the user agent knows it cannot render. User agents must treat that type as equivalent to the lack of any explicit Content-Type metadata when it is used to label a potential media resource.

Only the MIME type "application/octet-stream" with no parameters is special-cased here; if any parameter appears with it, it will be treated just like any other MIME type. This is a deviation from the rule that unknown MIME type parameters should be ignored.

media . canPlayType(type)

Returns the empty string (a negative response), "maybe", or "probably" based on how confident the user agent is that it can play media resources of the given type.

This script tests to see if the user agent supports a (fictional) new format to dynamically decide whether to use a video element or a plugin:

<section id="video">
 <p><a href="playing-cats.nfv">Download video</a></p>
</section>
<script>
 var videoSection = document.getElementById('video');
 var videoElement = document.createElement('video');
 var support = videoElement.canPlayType('video/x-new-fictional-format;codecs="kittens,bunnies"');
 if (support != "probably" && "New Fictional Video Plugin" in navigator.plugins) {
   // not confident of browser support
   // but we have a plugin
   // so use plugin instead
   videoElement = document.createElement("embed");
 } else if (support == "") {
   // no support from browser and no plugin
   // do nothing
   videoElement = null;
 }
 if (videoElement) {
   while (videoSection.hasChildNodes())
     videoSection.removeChild(videoSection.firstChild);
   videoElement.setAttribute("src", "playing-cats.nfv");
   videoSection.appendChild(videoElement);
 }
</script>

The type attribute of the source element allows the user agent to avoid downloading resources that use formats it cannot render.
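
When several encodings of the same resource exist, the three possible canPlayType() answers can be used to rank them. The sketch below assumes a hypothetical helper named pickBestType; it accepts any object with a canPlayType() method and returns the first candidate with the most confident answer, or null if every answer is the empty string.

```javascript
// Confidence ranking for the non-empty canPlayType() answers.
const CONFIDENCE = new Map([["probably", 2], ["maybe", 1]]);

// Hypothetical helper: picks the candidate type the user agent is most confident about.
function pickBestType(media, candidateTypes) {
  let best = null;
  let bestRank = 0;
  for (const type of candidateTypes) {
    const rank = CONFIDENCE.get(media.canPlayType(type)) ?? 0;
    if (rank > bestRank) {
      best = type;
      bestRank = rank;
    }
  }
  return best;
}
```

Because a "maybe" answer is not a guarantee, a caller would still need to handle an error event after choosing a type this way.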

4.8.12.4 Network states
media . networkState

Returns the current state of network activity for the element, from the codes in the list below.

NETWORK_EMPTY (numeric value 0)
The element has not yet been initialized. All attributes are in their initial states.
NETWORK_IDLE (numeric value 1)
The element has selected a resource, but it is not actually using the network at this time.
NETWORK_LOADING (numeric value 2)
The user agent is actively trying to download data.
NETWORK_NO_SOURCE (numeric value 3)
The element has not yet found a resource to use.
4.8.12.5 Loading the media resource
media . load()

Causes the element to reset and start selecting and loading a new media resource from scratch.


The preload attribute is an enumerated attribute. The following table lists the keywords and states for the attribute — the keywords in the left column map to the states in the cell in the second column on the same row as the keyword. The attribute can be changed even once the media resource is being buffered or played; the descriptions in the table below are to be interpreted with that in mind.

Keyword | State | Brief description
none | None | Hints to the user agent that either the author does not expect the user to need the media resource, or that the server wants to minimize unnecessary traffic. This state does not provide a hint regarding how aggressively to actually download the media resource if buffering starts anyway (e.g. once the user hits "play").
metadata | Metadata | Hints to the user agent that the author does not expect the user to need the media resource, but that fetching the resource metadata (dimensions, track list, duration, etc), and maybe even the first few frames, is reasonable. If the user agent precisely fetches no more than the metadata, then the media element will end up with its readyState attribute set to HAVE_METADATA; typically though, some frames will be obtained as well and it will probably be HAVE_CURRENT_DATA or HAVE_FUTURE_DATA. When the media resource is playing, hints to the user agent that bandwidth is to be considered scarce, e.g. suggesting throttling the download so that the media data is obtained at the slowest possible rate that still maintains consistent playback.
auto | Automatic | Hints to the user agent that the user agent can put the user's needs first without risk to the server, up to and including optimistically downloading the entire resource.

The empty string is also a valid keyword, and maps to the Automatic state. The attribute's missing value default is user-agent defined, though the Metadata state is suggested as a compromise between reducing server load and providing an optimal user experience.

Authors might switch the attribute from "none" or "metadata" to "auto" dynamically once the user begins playback. For example, on a page with many videos this might be used to indicate that the many videos are not to be downloaded unless requested, but that once one is requested it is to be downloaded aggressively.
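
A minimal sketch of that dynamic switch follows. The helper name promotePreloadOnPlay is hypothetical; it accepts any object with a preload property, such as a media element, and reports whether it changed anything.

```javascript
// Hypothetical helper: once playback is requested, raise a conservative
// preload hint to "auto" so the resource can be downloaded aggressively.
function promotePreloadOnPlay(media) {
  if (media.preload === "none" || media.preload === "metadata") {
    media.preload = "auto";
    return true;
  }
  return false;
}
```

On a page with many videos this could be called from each element's play event handler, for example video.addEventListener("play", () => promotePreloadOnPlay(video)).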

The autoplay attribute can override the preload attribute (since if the media plays, it naturally has to buffer first, regardless of the hint given by the preload attribute). Including both is not an error, however.


media . buffered

Returns a TimeRanges object that represents the ranges of the media resource that the user agent has buffered.
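
As an illustration, the buffered ranges can be summed to estimate how much media is available locally. The helper name totalBufferedSeconds is hypothetical; it relies only on the length, start(), and end() members of a TimeRanges-like object.

```javascript
// Hypothetical helper: adds up the length of every range, in seconds.
function totalBufferedSeconds(ranges) {
  let total = 0;
  for (let i = 0; i < ranges.length; i++) {
    total += ranges.end(i) - ranges.start(i);
  }
  return total;
}
```

It would be called as totalBufferedSeconds(media.buffered); the same function works for the played and seekable attributes, which return the same kind of object.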

4.8.12.6 Offsets into the media resource
media . duration

Returns the length of the media resource, in seconds, assuming that the start of the media resource is at time zero.

Returns NaN if the duration isn't available.

Returns Infinity for unbounded streams.

media . currentTime [ = value ]

Returns the official playback position, in seconds.

Can be set, to seek to the given time.
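
Since duration can be NaN or Infinity as well as a finite number, display code needs to handle all three cases. The following sketch uses a hypothetical helper, describeDuration, and arbitrary placeholder strings for the two special cases.

```javascript
// Hypothetical helper: formats a duration value for display as m:ss.
function describeDuration(duration) {
  if (Number.isNaN(duration)) {
    return "unknown";            // duration not available yet
  }
  if (duration === Infinity) {
    return "unbounded stream";   // e.g. a live stream
  }
  const totalSeconds = Math.floor(duration);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return minutes + ":" + String(seconds).padStart(2, "0");
}
```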


The loop attribute is a boolean attribute that, if specified, indicates that the media element is to seek back to the start of the media resource upon reaching the end.

4.8.12.7 Ready states
media . readyState

Returns a value that expresses the current state of the element with respect to rendering the current playback position, from the codes in the list below.

HAVE_NOTHING (numeric value 0)

No information regarding the media resource is available. No data for the current playback position is available. Media elements whose networkState attribute is set to NETWORK_EMPTY are always in the HAVE_NOTHING state.

HAVE_METADATA (numeric value 1)

Enough of the resource has been obtained that the duration of the resource is available. In the case of a video element, the dimensions of the video are also available. No media data is available for the immediate current playback position.

HAVE_CURRENT_DATA (numeric value 2)

Data for the immediate current playback position is available, but either not enough data is available that the user agent could successfully advance the current playback position in the direction of playback at all without immediately reverting to the HAVE_METADATA state, or there is no more data to obtain in the direction of playback. For example, in video this corresponds to the user agent having data from the current frame, but not the next frame, when the current playback position is at the end of the current frame; and to when playback has ended.

HAVE_FUTURE_DATA (numeric value 3)

Data for the immediate current playback position is available, as well as enough data for the user agent to advance the current playback position in the direction of playback at least a little without immediately reverting to the HAVE_METADATA state, and the text tracks are ready. For example, in video this corresponds to the user agent having data for at least the current frame and the next frame when the current playback position is at the instant in time between the two frames, or to the user agent having the video data for the current frame and audio data to keep playing at least a little when the current playback position is in the middle of a frame. The user agent cannot be in this state if playback has ended, as the current playback position can never advance in this case.

HAVE_ENOUGH_DATA (numeric value 4)

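All the conditions described for the HAVE_FUTURE_DATA state are met, and, in addition, either of the following conditions is also true:

The user agent estimates that data is being fetched at a rate where the current playback position, if it were to advance at the element's playbackRate, would not overtake the available data before playback reaches the end of the media resource.

The user agent has entered a state where waiting longer will not result in further data being obtained, and therefore nothing would be gained by delaying playback any further. (For example, the buffer might be full.)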

In practice, the difference between HAVE_METADATA and HAVE_CURRENT_DATA is negligible. Really the only time the difference is relevant is when painting a video element onto a canvas, where it distinguishes the case where something will be drawn (HAVE_CURRENT_DATA or greater) from the case where nothing is drawn (HAVE_METADATA or less). Similarly, the difference between HAVE_CURRENT_DATA (only the current frame) and HAVE_FUTURE_DATA (at least this frame and the next) can be negligible (in the extreme, only one frame). The only time that distinction really matters is when a page provides an interface for "frame-by-frame" navigation.

It is possible for the ready state of a media element to jump between these states discontinuously. For example, the state of a media element can jump straight from HAVE_METADATA to HAVE_ENOUGH_DATA without passing through the HAVE_CURRENT_DATA and HAVE_FUTURE_DATA states.
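
The two distinctions described above, painting onto a canvas and frame-by-frame navigation, can be expressed as simple threshold checks. This is a sketch only; the helper names are hypothetical, and the numeric constants are copied from the list above so that the code does not depend on a real media element.

```javascript
// Numeric values from the ready state list above.
const HAVE_CURRENT_DATA = 2;
const HAVE_FUTURE_DATA = 3;

// Hypothetical helper: true when painting the element onto a canvas would draw something.
function hasFrameToDraw(media) {
  return media.readyState >= HAVE_CURRENT_DATA;
}

// Hypothetical helper: true when at least the current frame and the next are available.
function canStepToNextFrame(media) {
  return media.readyState >= HAVE_FUTURE_DATA;
}
```

Because the state can jump discontinuously, comparisons with >= are safer here than tests for one exact value.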

The autoplay attribute is a boolean attribute. When present, the user agent will automatically begin playback of the media resource as soon as it can do so without stopping.

Authors are urged to use the autoplay attribute rather than using script to trigger automatic playback, as this allows the user to override the automatic playback when it is not desired, e.g. when using a screen reader. Authors are also encouraged to consider not using the automatic playback behavior at all, and instead to let the user agent wait for the user to start playback explicitly.

4.8.12.8 Playing the media resource
media . paused

Returns true if playback is paused; false otherwise.

media . ended

Returns true if playback has reached the end of the media resource.

media . defaultPlaybackRate [ = value ]

Returns the default rate of playback, for when the user is not fast-forwarding or reversing through the media resource.

Can be set, to change the default rate of playback.

The default rate has no direct effect on playback, but if the user switches to a fast-forward mode, when they return to the normal playback mode, it is expected that the rate of playback will be returned to the default rate of playback.

media . playbackRate [ = value ]

Returns the current rate of playback, where 1.0 is normal speed.

Can be set, to change the rate of playback.

media . played

Returns a TimeRanges object that represents the ranges of the media resource that the user agent has played.

media . play()

Sets the paused attribute to false, loading the media resource and beginning playback if necessary. If playback had ended, restarts it from the start.

media . pause()

Sets the paused attribute to true, loading the media resource if necessary.
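
A custom play/pause button might combine paused, ended, play(), and pause() roughly as follows. The helper name togglePlayback and its onBlocked callback are hypothetical. The sketch also assumes, beyond what the summary above states, that play() may return a promise that is rejected when playback cannot start; the code tolerates both a promise and no return value.

```javascript
// Hypothetical helper: toggles playback and reports which action was requested.
function togglePlayback(media, onBlocked = () => {}) {
  if (media.paused || media.ended) {
    const result = media.play();
    // Assumption: play() may return a promise; attach a handler only if it does.
    if (result && typeof result.catch === "function") {
      result.catch(onBlocked);
    }
    return "play";
  }
  media.pause();
  return "pause";
}
```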

4.8.12.9 Seeking
media . seeking

Returns true if the user agent is currently seeking.

media . seekable

Returns a TimeRanges object that represents the ranges of the media resource to which it is possible for the user agent to seek.

media . fastSeek( time )

Seeks to near the given time as fast as possible, trading precision for speed. (To seek to a precise time, use the currentTime attribute.)

This does nothing if the media resource has not been loaded.
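
The trade-off between fastSeek() and currentTime can be wrapped in one helper, sketched below under stated assumptions. The names isWithinRanges and seekTo and the precise option are hypothetical; the helper checks the seekable ranges first and falls back to currentTime when fastSeek() is not implemented.

```javascript
// Hypothetical helper: true if time falls inside any range of a TimeRanges-like object.
function isWithinRanges(ranges, time) {
  for (let i = 0; i < ranges.length; i++) {
    if (time >= ranges.start(i) && time <= ranges.end(i)) {
      return true;
    }
  }
  return false;
}

// Hypothetical helper: seeks if possible and reports whether a seek was requested.
function seekTo(media, time, { precise = true } = {}) {
  if (!isWithinRanges(media.seekable, time)) {
    return false;
  }
  if (!precise && typeof media.fastSeek === "function") {
    media.fastSeek(time);      // near the time, as fast as possible
  } else {
    media.currentTime = time;  // precise seek
  }
  return true;
}
```

A scrubber could pass precise: false while the user is dragging and request a precise seek on release.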

4.8.12.10 Media resources with multiple media tracks

A media resource can have multiple embedded audio and video tracks. For example, in addition to the primary video and audio tracks, a media resource could have foreign-language dubbed dialogues, director's commentaries, audio descriptions, alternative angles, or sign-language overlays.

media . audioTracks

Returns an AudioTrackList object representing the audio tracks available in the media resource.

media . videoTracks

Returns a VideoTrackList object representing the video tracks available in the media resource.

4.8.12.10.1 AudioTrackList and VideoTrackList objects

The AudioTrackList and VideoTrackList interfaces are used by attributes defined in the previous section.

media . audioTracks . length
media . videoTracks . length

Returns the number of tracks in the list.

audioTrack = media . audioTracks[index]
videoTrack = media . videoTracks[index]

Returns the specified AudioTrack or VideoTrack object.

audioTrack = media . audioTracks . getTrackById( id )
videoTrack = media . videoTracks . getTrackById( id )

Returns the AudioTrack or VideoTrack object with the given identifier, or null if no track has that identifier.

audioTrack . id
videoTrack . id

Returns the ID of the given track. This is the ID that can be used with a fragment if the format supports media fragment syntax, and that can be used with the getTrackById() method.

audioTrack . kind
videoTrack . kind

Returns the category the given track falls into. The possible track categories are given below.

audioTrack . label
videoTrack . label

Returns the label of the given track, if known, or the empty string otherwise.

audioTrack . language
videoTrack . language

Returns the language of the given track, if known, or the empty string otherwise.

audioTrack . enabled [ = value ]

Returns true if the given track is active, and false otherwise.

Can be set, to change whether the track is enabled or not. If multiple audio tracks are enabled simultaneously, they are mixed.

media . videoTracks . selectedIndex

Returns the index of the currently selected track, if any, or −1 otherwise.

videoTrack . selected [ = value ]

Returns true if the given track is active, and false otherwise.

Can be set, to change whether the track is selected or not. Either zero or one video track is selected; selecting a new track while a previous one is selected will unselect the previous one.
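A minimal sketch of how a script might use these lists, assuming the user agent exposes audioTracks and videoTracks at all (support varies, so a page would check that they are defined first). The helper names enableOnlyAudioTrack and selectVideoTrackByKind are hypothetical and not part of the API; they only read and write the kind, language, enabled, and selected members described in this section.

```javascript
// Hypothetical helper: enable the first audio track matching kind and
// language, and disable all the others so that tracks are not mixed.
// Returns the chosen track, or null (changing nothing) if none matches.
function enableOnlyAudioTrack(audioTracks, kind, language) {
  let chosen = null;
  for (let i = 0; i < audioTracks.length; i++) {
    const track = audioTracks[i];
    if (track.kind === kind && track.language === language) {
      chosen = track;
      break;
    }
  }
  if (chosen === null) return null;
  for (let i = 0; i < audioTracks.length; i++) {
    audioTracks[i].enabled = audioTracks[i] === chosen;
  }
  return chosen;
}

// Hypothetical helper: select the first video track of the given kind.
// In a user agent, selecting a track unselects the previous one.
function selectVideoTrackByKind(videoTracks, kind) {
  for (let i = 0; i < videoTracks.length; i++) {
    if (videoTracks[i].kind === kind) {
      videoTracks[i].selected = true;
      return videoTracks[i];
    }
  }
  return null;
}
```

In a page this might be called as enableOnlyAudioTrack(video.audioTracks, 'commentary', 'en'), where video is a media element.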

Return values for AudioTrack.kind and VideoTrack.kind

| Category | Definition | Applies to... | Examples |
| --- | --- | --- | --- |
| "alternative" | A possible alternative to the main track, e.g. a different take of a song (audio), or a different angle (video). | Audio and video. | Ogg: "audio/alternate" or "video/alternate"; DASH: "alternate" without "main" and "commentary" roles, and, for audio, without the "dub" role (other roles ignored). |
| "captions" | A version of the main video track with captions burnt in. (For legacy content; new content would use text tracks.) | Video only. | DASH: "caption" and "main" roles together (other roles ignored). |
| "descriptions" | An audio description of a video track. | Audio only. | Ogg: "audio/audiodesc". |
| "main" | The primary audio or video track. | Audio and video. | Ogg: "audio/main" or "video/main"; WebM: the "FlagDefault" element is set; DASH: "main" role without "caption", "subtitle", and "dub" roles (other roles ignored). |
| "main-desc" | The primary audio track, mixed with audio descriptions. | Audio only. | AC3 audio in MPEG-2 TS: bsmod=2 and full_svc=1. |
| "sign" | A sign-language interpretation of an audio track. | Video only. | Ogg: "video/sign". |
| "subtitles" | A version of the main video track with subtitles burnt in. (For legacy content; new content would use text tracks.) | Video only. | DASH: "subtitle" and "main" roles together (other roles ignored). |
| "translation" | A translated version of the main audio track. | Audio only. | Ogg: "audio/dub"; DASH: "dub" and "main" roles together (other roles ignored). |
| "commentary" | Commentary on the primary audio or video track, e.g. a director's commentary. | Audio and video. | DASH: "commentary" role without "main" role (other roles ignored). |
| "" (empty string) | No explicit kind, or the kind given by the track's metadata is not recognized by the user agent. | Audio and video. | |

4.8.12.10.2 Selecting specific audio and video tracks declaratively

The audioTracks and videoTracks attributes allow scripts to select which track should play, but it is also possible to select specific tracks declaratively, by specifying particular tracks in the fragment of the URL of the media resource. The format of the fragment depends on the MIME type of the media resource. [RFC2046] [URL]

In this example, a video that uses a format that supports media fragment syntax is embedded in such a way that the alternative angles labeled "Alternative" are enabled instead of the default video track.

<video src="myvideo#track=Alternative"></video>
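
When the track name is only known at run time, the same kind of URL could be built in script. The following is a small sketch, assuming the media format supports media fragment syntax with a track dimension as in the example above; withTrackFragment is a hypothetical helper name.

```javascript
// Hypothetical helper: point a media URL at a named track using the
// "track" dimension of media fragment syntax. Any existing fragment is
// replaced. Only meaningful for formats that support this syntax.
function withTrackFragment(resourceUrl, trackName) {
  const hashAt = resourceUrl.indexOf('#');
  const base = hashAt === -1 ? resourceUrl : resourceUrl.slice(0, hashAt);
  return base + '#track=' + encodeURIComponent(trackName);
}
```

The result would then be assigned to the src attribute of the media element.
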
4.8.12.11 Timed text tracks
4.8.12.11.1 Text track model

A media element can have a group of associated text tracks, known as the media element's list of text tracks. The text tracks are sorted as follows:

  1. The text tracks corresponding to track element children of the media element, in tree order.
  2. Any text tracks added using the addTextTrack() method, in the order they were added, oldest first.
  3. Any media-resource-specific text tracks (text tracks corresponding to data in the media resource), in the order defined by the media resource's format specification.

A text track consists of:

The kind of text track

This decides how the track is handled by the user agent. The kind is represented by a string. The possible strings are: subtitles, captions, descriptions, chapters, and metadata.

The kind of track can change dynamically, in the case of a text track corresponding to a track element.

A label

This is a human-readable string intended to identify the track for the user.

The label of a track can change dynamically, in the case of a text track corresponding to a track element.

When a text track label is the empty string, the user agent should automatically generate an appropriate label from the text track's other properties (e.g. the kind of text track and the text track's language) for use in its user interface. This automatically-generated label is not exposed in the API.

An in-band metadata track dispatch type

This is a string extracted from the media resource specifically for in-band metadata tracks to enable such tracks to be dispatched to different scripts in the document.

For example, a traditional TV station broadcast streamed on the Web and augmented with Web-specific interactive features could include text tracks with metadata for ad targeting, trivia game data during game shows, player states during sports games, recipe information during food programs, and so forth. As each program starts and ends, new tracks might be added or removed from the stream, and as each one is added, the user agent could bind them to dedicated script modules using the value of this attribute.

Other than for in-band metadata text tracks, the in-band metadata track dispatch type is the empty string. How this value is populated for different media formats is described in steps to expose a media-resource-specific text track.

A language

This is a string (a BCP 47 language tag) representing the language of the text track's cues. [BCP47]

The language of a text track can change dynamically, in the case of a text track corresponding to a track element.

A readiness state

One of the following:

Not loaded

Indicates that the text track's cues have not been obtained.

Loading

Indicates that the text track is loading and there have been no fatal errors encountered so far. Further cues might still be added to the track by the parser.

Loaded

Indicates that the text track has been loaded with no fatal errors.

Failed to load

Indicates that the text track was enabled, but when the user agent attempted to obtain it, this failed in some way (e.g. URL could not be parsed, network error, unknown text track format). Some or all of the cues are likely missing and will not be obtained.

The readiness state of a text track changes dynamically as the track is obtained.

A mode

One of the following:

Disabled

Indicates that the text track is not active. Other than for the purposes of exposing the track in the DOM, the user agent is ignoring the text track. No cues are active, no events are fired, and the user agent will not attempt to obtain the track's cues.

Hidden

Indicates that the text track is active, but that the user agent is not actively displaying the cues. If no attempt has yet been made to obtain the track's cues, the user agent will perform such an attempt momentarily. The user agent is maintaining a list of which cues are active, and events are being fired accordingly.

Showing

Indicates that the text track is active. If no attempt has yet been made to obtain the track's cues, the user agent will perform such an attempt momentarily. The user agent is maintaining a list of which cues are active, and events are being fired accordingly. In addition, for text tracks whose kind is subtitles or captions, the cues are being overlaid on the video as appropriate; for text tracks whose kind is descriptions, the user agent is making the cues available to the user in a non-visual fashion; and for text tracks whose kind is chapters, the user agent is making available to the user a mechanism by which the user can navigate to any point in the media resource by selecting a cue.

A list of zero or more cues

A list of text track cues, along with rules for updating the text track rendering. For example, for WebVTT, the rules for updating the display of WebVTT text tracks. [WEBVTT]

The list of cues of a text track can change dynamically, either because the text track has not yet been loaded or is still loading, or due to DOM manipulation.

Each text track has a corresponding TextTrack object.
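
The binding of in-band metadata tracks to script modules, described under the in-band metadata track dispatch type above, could be sketched as follows. The dispatcher and the handler map are hypothetical, and the dispatch type string used in any real page depends on the media format; only the kind and inBandMetadataTrackDispatchType members come from the API.

```javascript
// Hypothetical registry mapping an in-band metadata track dispatch type
// to the script module (here, a plain function) that handles such tracks.
function createTrackDispatcher(handlersByType) {
  return function dispatch(track) {
    // Only in-band metadata tracks carry a non-empty dispatch type.
    if (track.kind !== 'metadata') return false;
    const type = track.inBandMetadataTrackDispatchType;
    if (type === '' || !handlersByType.has(type)) return false;
    handlersByType.get(type)(track);
    return true;
  };
}
```

A page might call the returned function with event.track from an addtrack listener on the media element's textTracks list, so that each track is routed as it appears in the stream.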


Each media element has a list of pending text tracks, which must initially be empty, a blocked-on-parser flag, which must initially be false, and a did-perform-automatic-track-selection flag, which must also initially be false.

When the user agent is required to populate the list of pending text tracks of a media element, the user agent must add to the element's list of pending text tracks each text track in the element's list of text tracks whose text track mode is not disabled and whose text track readiness state is loading.

Whenever a track element's parent node changes, the user agent must remove the corresponding text track from any list of pending text tracks that it is in.

Whenever a text track's text track readiness state changes to either loaded or failed to load, the user agent must remove it from any list of pending text tracks that it is in.

When a media element is created by an HTML parser or XML parser, the user agent must set the element's blocked-on-parser flag to true. When a media element is popped off the stack of open elements of an HTML parser or XML parser, the user agent must honor user preferences for automatic text track selection, populate the list of pending text tracks, and set the element's blocked-on-parser flag to false.

The text tracks of a media element are ready when both the element's list of pending text tracks is empty and the element's blocked-on-parser flag is false.
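
The bookkeeping above can be modeled with plain objects. This is only an illustrative sketch: the readiness state is not exposed on TextTrack objects, so the readinessState field and the function names below are assumptions made for the model.

```javascript
// Model only: these plain objects stand in for text tracks. The
// readinessState field is a hypothetical name for the "text track
// readiness state" concept, chosen for this sketch.
function populatePendingTextTracks(textTracks) {
  return textTracks.filter(
    (track) => track.mode !== 'disabled' && track.readinessState === 'loading'
  );
}

// A track leaves the pending list once it is loaded or has failed to load.
function settleTrack(pending, track, newState) {
  track.readinessState = newState;
  if (newState === 'loaded' || newState === 'failed to load') {
    const i = pending.indexOf(track);
    if (i !== -1) pending.splice(i, 1);
  }
}

// Ready means: nothing pending, and the parser is done with the element.
function textTracksAreReady(pending, blockedOnParser) {
  return pending.length === 0 && !blockedOnParser;
}
```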

Each media element has a pending text track change notification flag, which must initially be unset.

Whenever the text track mode of a text track that is in a media element's list of text tracks changes value, the user agent must run the following steps for the media element:

  1. If the media element's pending text track change notification flag is set, return.

  2. Set the media element's pending text track change notification flag.

  3. Queue a task to run these steps:

    1. Unset the media element's pending text track change notification flag.

    2. Fire an event named change at the media element's textTracks attribute's TextTrackList object.

  4. If the media element's show poster flag is not set, run the time marches on steps.

The task source for the tasks listed in this section is the DOM manipulation task source.
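
The effect of the pending text track change notification flag is that several mode changes made in one go produce a single change event. The following sketch models steps 1 to 3 only (the time marches on steps are not modeled); the class, its explicit task queue, and the callback are all hypothetical stand-ins for user agent internals.

```javascript
// Model of the coalescing behavior: at most one "change" notification is
// queued at a time, however many mode changes happen before it runs.
class TextTrackChangeNotifier {
  constructor(fireChange) {
    this.pendingFlag = false; // pending text track change notification flag
    this.taskQueue = [];      // stand-in for the event loop's task queue
    this.fireChange = fireChange;
  }

  // Called whenever some text track's mode changes value.
  modeChanged() {
    if (this.pendingFlag) return;   // step 1
    this.pendingFlag = true;        // step 2
    this.taskQueue.push(() => {     // step 3
      this.pendingFlag = false;
      this.fireChange();
    });
  }

  // Run every queued task, as the event loop eventually would.
  runQueuedTasks() {
    while (this.taskQueue.length > 0) this.taskQueue.shift()();
  }
}
```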


A text track cue is the unit of time-sensitive data in a text track. For subtitles and captions, for instance, a cue corresponds to the text that appears at a particular time and disappears at another time.

Each text track cue consists of:

An identifier

An arbitrary string.

A start time

The time, in seconds and fractions of a second, that describes the beginning of the range of the media data to which the cue applies.

An end time

The time, in seconds and fractions of a second, that describes the end of the range of the media data to which the cue applies.

A pause-on-exit flag

A boolean indicating whether playback of the media resource is to pause when the end of the range to which the cue applies is reached.

Some additional format-specific data

Additional fields, as needed for the format, including the actual data of the cue. For example, WebVTT has a text track cue writing direction and so forth. [WEBVTT]

The text track cue start time and text track cue end time can be negative. (The current playback position can never be negative, though, so cues entirely before time zero cannot be active.)

Each text track cue has a corresponding TextTrackCue object (or more specifically, an object that inherits from TextTrackCue — for example, WebVTT cues use the VTTCue interface). A text track cue's in-memory representation can be dynamically changed through this TextTrackCue API. [WEBVTT]

A text track cue is associated with rules for updating the text track rendering, as defined by the specification for the specific kind of text track cue. These rules are used specifically when the object representing the cue is added to a TextTrack object using the addCue() method.

In addition, each text track cue has two pieces of dynamic information:

The active flag

This flag must be initially unset. The flag is used to ensure events are fired appropriately when the cue becomes active or inactive, and to make sure the right cues are rendered.

The user agent must synchronously unset this flag whenever the text track cue is removed from its text track's text track list of cues; whenever the text track itself is removed from its media element's list of text tracks or has its text track mode changed to disabled; and whenever the media element's readyState is changed back to HAVE_NOTHING. When the flag is unset in this way for one or more cues in text tracks that were showing prior to the relevant incident, the user agent must, after having unset the flag for all the affected cues, apply the rules for updating the text track rendering of those text tracks. For example, for text tracks based on WebVTT, the rules for updating the display of WebVTT text tracks. [WEBVTT]

The display state

This is used as part of the rendering model, to keep cues in a consistent position. It must initially be empty. Whenever the text track cue active flag is unset, the user agent must empty the text track cue display state.

The text track cues of a media element's text tracks are ordered relative to each other in the text track cue order, which is determined as follows:

  1. First, group the cues by their text track, with the groups being sorted in the same order as their text tracks appear in the media element's list of text tracks.
  2. Then, within each group, cues must be sorted by their start time, earliest first.
  3. Then, any cues with the same start time must be sorted by their end time, latest first.
  4. Finally, any cues with identical end times must be sorted in the order they were last added to their respective text track list of cues, oldest first (so e.g. for cues from a WebVTT file, that would initially be the order in which the cues were listed in the file). [WEBVTT]
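
The text track cue order can be expressed as a comparison function. This is a sketch over plain objects: trackIndex (the position of the cue's track in the list of text tracks) and addedSeq (a counter recording when the cue was last added) are hypothetical fields introduced for illustration, since neither is exposed on TextTrackCue objects.

```javascript
// Comparator for the text track cue order. Each || falls through to the
// next criterion only when the previous difference is zero.
function compareCues(a, b) {
  return (a.trackIndex - b.trackIndex)  // group by track, in track order
    || (a.startTime - b.startTime)      // earliest start first
    || (b.endTime - a.endTime)          // latest end first
    || (a.addedSeq - b.addedSeq);       // oldest addition first
}
```

Passing this function to Array.prototype.sort() on an array of such objects yields them in text track cue order.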

4.8.12.11.2 Sourcing in-band text tracks

A media-resource-specific text track is a text track that corresponds to data found in the media resource.

4.8.12.11.3 Text track API
media . textTracks . length

Returns the number of text tracks associated with the media element (e.g. from track elements). This is the number of text tracks in the media element's list of text tracks.

media . textTracks[ n ]

Returns the TextTrack object representing the nth text track in the media element's list of text tracks.

textTrack = media . textTracks . getTrackById( id )

Returns the TextTrack object with the given identifier, or null if no track has that identifier.


textTrack = media . addTextTrack( kind [, label [, language ] ] )

Creates and returns a new TextTrack object, which is also added to the media element's list of text tracks.

textTrack . kind

Returns the text track kind string.

textTrack . label

Returns the text track label, if there is one, or the empty string otherwise (indicating that a custom label probably needs to be generated from the other attributes of the object if the object is exposed to the user).

textTrack . language

Returns the text track language string.

textTrack . id

Returns the ID of the given track.

For in-band tracks, this is the ID that can be used with a fragment if the format supports media fragment syntax, and that can be used with the getTrackById() method.

For TextTrack objects corresponding to track elements, this is the ID of the track element.

textTrack . inBandMetadataTrackDispatchType

Returns the text track in-band metadata track dispatch type string.

textTrack . mode [ = value ]

Returns the text track mode, represented by a string from the following list:

"disabled"

The text track disabled mode.

"hidden"

The text track hidden mode.

"showing"

The text track showing mode.

Can be set, to change the mode.

textTrack . cues

Returns the text track list of cues, as a TextTrackCueList object.

textTrack . activeCues

Returns the text track cues from the text track list of cues that are currently active (i.e. that start before the current playback position and end after it), as a TextTrackCueList object.

textTrack . addCue( cue )

Adds the given cue to textTrack's text track list of cues.

textTrack . removeCue( cue )

Removes the given cue from textTrack's text track list of cues.
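
As an illustration of the mode attribute, the following sketch shows one way a page-provided language menu might switch subtitles. The helper name showSubtitlesFor is hypothetical; it relies only on the kind, language, and mode members listed above, and it assumes that hiding all subtitles is the desired result when no track matches.

```javascript
// Hypothetical helper: show the first subtitles or captions track in the
// given language and disable the other subtitles and captions tracks.
// Tracks of other kinds (e.g. metadata) are left untouched.
// Returns the track now showing, or null if no track matched.
function showSubtitlesFor(textTracks, language) {
  let shown = null;
  for (let i = 0; i < textTracks.length; i++) {
    const track = textTracks[i];
    if (track.kind !== 'subtitles' && track.kind !== 'captions') continue;
    if (shown === null && track.language === language) {
      track.mode = 'showing';
      shown = track;
    } else {
      track.mode = 'disabled';
    }
  }
  return shown;
}
```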

In this example, an audio element is used to play a specific sound-effect from a sound file containing many sound effects. A cue is used to pause the audio, so that it ends exactly at the end of the clip, even if the browser is busy running some script. If the page had relied on script to pause the audio, then the start of the next clip might be heard if the browser was not able to run the script at the exact time specified.

var sfx = new Audio('sfx.wav');
var sounds = sfx.addTextTrack('metadata');

// add sounds we care about
function addFX(start, end, name) {
  var cue = new VTTCue(start, end, '');
  cue.id = name;
  cue.pauseOnExit = true;
  sounds.addCue(cue);
}
addFX(12.783, 13.612, 'dog bark');
addFX(13.612, 15.091, 'kitten mew');

function playSound(id) {
  // getCueById() is a method of the TextTrackCueList in sounds.cues
  sfx.currentTime = sounds.cues.getCueById(id).startTime;
  sfx.play();
}

// play a bark as soon as we can
sfx.oncanplaythrough = function () {
  playSound('dog bark');
}
// meow when the user tries to leave,
// and have the browser ask them to stay
window.onbeforeunload = function (e) {
  playSound('kitten mew');
  e.preventDefault();
}

cuelist . length

Returns the number of cues in the list.

cuelist[index]

Returns the text track cue with index index in the list. The cues are sorted in text track cue order.

cuelist . getCueById( id )

Returns the first text track cue (in text track cue order) with text track cue identifier id.

Returns null if none of the cues have the given identifier or if the argument is the empty string.


cue . track

Returns the TextTrack object to which this text track cue belongs, if any, or null otherwise.

cue . id [ = value ]

Returns the text track cue identifier.

Can be set.

cue . startTime [ = value ]

Returns the text track cue start time, in seconds.

Can be set.

cue . endTime [ = value ]

Returns the text track cue end time, in seconds.

Can be set.

cue . pauseOnExit [ = value ]

Returns true if the text track cue pause-on-exit flag is set, false otherwise.

Can be set.
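
The relationship between a cue's times and the list returned by activeCues can be modeled as a simple filter. This is a sketch over cue-like objects, not how a user agent computes the list; in particular, treating a cue as active from its start time inclusive to its end time exclusive is an assumption of this sketch.

```javascript
// Model of the "currently active" test: a cue is treated as active when
// its start time is at or before the position and its end time is after it.
function activeCuesAt(cues, position) {
  const active = [];
  for (let i = 0; i < cues.length; i++) {
    const cue = cues[i];
    if (cue.startTime <= position && cue.endTime > position) {
      active.push(cue);
    }
  }
  return active;
}
```

Because the current playback position is never negative, a cue that lies entirely before time zero never passes this test for any reachable position.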

4.8.12.11.4 Best practices for metadata text tracks

Text tracks can be used for storing data relating to the media data, for interactive or augmented views.

For example, a page showing a sports broadcast could include information about the current score. Suppose a robotics competition was being streamed live. The image could be overlaid with the scores, as follows:

In order to make the score display render correctly whenever the user seeks to an arbitrary point in the video, the metadata text track cues need to be as long as is appropriate for the score. For example, in the frame above, there would be maybe one cue that lasts the length of the match that gives the match number, one cue that lasts until the blue alliance's score changes, and one cue that lasts until the red alliance's score changes. If the video is just a stream of the live event, the time in the bottom right would presumably be automatically derived from the current video time, rather than based on a cue. However, if the video was just the highlights, then that might be given in cues also.

The following shows what fragments of this could look like in a WebVTT file:

WEBVTT

...

05:10:00.000 --> 05:12:15.000
matchtype:qual
matchnumber:37

...

05:11:02.251 --> 05:11:17.198
red:78

05:11:03.672 --> 05:11:54.198
blue:66

05:11:17.198 --> 05:11:25.912
red:80

05:11:25.912 --> 05:11:26.522
red:83

05:11:26.522 --> 05:11:26.982
red:86

05:11:26.982 --> 05:11:27.499
red:89

...

The key here is to notice that the information is given in cues that span the length of time to which the relevant event applies. If, instead, the scores were given as zero-length (or very brief, nearly zero-length) cues when the score changes, for example saying "red+2" at 05:11:17.198, "red+3" at 05:11:25.912, etc., problems arise. Primarily, seeking is much harder to implement, as the script has to walk the entire list of cues to make sure that no notifications have been missed. In addition, if the cues are short, it's possible the script will never see that they are active unless it listens to them specifically.

When using cues in this manner, authors are encouraged to use the cuechange event to update the current annotations. (In particular, using the timeupdate event would be less appropriate as it would require doing work even when the cues haven't changed, and, more importantly, would introduce a higher latency between when the metadata cues become active and when the display is updated, since timeupdate events are rate-limited.)
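
A cuechange handler for the score example might be built on helpers like the following. This is a sketch under assumptions: the "key:value" line format is the convention of the WebVTT fragment above rather than anything defined by WebVTT, the function names are hypothetical, and the cues are assumed to expose their payload through a text member, as VTTCue objects do.

```javascript
// Parse the text of one metadata cue in the "key:value" line format used
// in the WebVTT fragment above.
function parseMetadataCue(text) {
  const fields = {};
  for (const line of text.split('\n')) {
    const colon = line.indexOf(':');
    if (colon === -1) continue; // skip lines without a key
    fields[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }
  return fields;
}

// Merge the fields of every active cue into one annotation object.
// Because each cue spans the whole period its value applies to, the
// active cues alone describe the full state, even right after a seek.
function currentAnnotations(activeCues) {
  const merged = {};
  for (let i = 0; i < activeCues.length; i++) {
    Object.assign(merged, parseMetadataCue(activeCues[i].text));
  }
  return merged;
}
```

A page could then set the track's oncuechange handler to a function that calls currentAnnotations(track.activeCues) and redraws the overlay from the result.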

4.8.12.12 Identifying a track kind through a URL

Other specifications or formats that need a URL to identify the return values of the AudioTrack.kind or VideoTrack.kind IDL attributes, or identify the kind of text track, must use the about:html-kind URL.

4.8.12.13 User interface

The controls attribute is a boolean attribute. If present, it indicates that the author has not provided a scripted controller and would like the user agent to provide its own set of controls.

media . volume [ = value ]

Returns the current playback volume, as a number in the range 0.0 to 1.0, where 0.0 is the quietest and 1.0 the loudest.

Can be set, to change the volume.

Throws an "IndexSizeError" DOMException if the new value is not in the range 0.0 .. 1.0.

media . muted [ = value ]

Returns true if audio is muted, overriding the volume attribute, and false if the volume attribute is being honored.

Can be set, to change whether the audio is muted or not.
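
Since assigning an out-of-range value to volume throws, a script that computes volumes (for example from a slider or a fade animation) might clamp the value first. A minimal sketch, with clampVolume as a hypothetical helper name and the fallback of 1.0 for non-finite input as an arbitrary choice:

```javascript
// Hypothetical helper: coerce an arbitrary number into the 0.0 to 1.0
// range accepted by the volume attribute, so that assigning the result
// cannot trigger an "IndexSizeError". Non-finite input yields the fallback.
function clampVolume(value, fallback = 1.0) {
  if (!Number.isFinite(value)) return fallback;
  return Math.min(1.0, Math.max(0.0, value));
}
```

It would be used as media.volume = clampVolume(requested), where requested is whatever value the page computed.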

The muted content attribute on media elements is a boolean attribute that controls the default state of the audio output of the media resource, potentially overriding user preferences.

This attribute has no dynamic effect (it only controls the default state of the element).

This video (an advertisement) autoplays, but to avoid annoying users, it does so without sound, and allows the user to turn the sound on. The user agent can pause the video if it's unmuted without a user interaction.

<video src="adverts.cgi?kind=video" controls autoplay loop muted></video>
4.8.12.14 Time ranges

Objects implementing the TimeRanges interface represent a list of ranges (periods) of time.

ranges . length

Returns the number of ranges in the object. (Here, ranges is an object implementing TimeRanges, such as the value of a media element's buffered attribute, not the media element itself.)

time = ranges . start(index)

Returns the time for the start of the range with the given index.

Throws an "IndexSizeError" DOMException if the index is out of range.

time = ranges . end(index)

Returns the time for the end of the range with the given index.

Throws an "IndexSizeError" DOMException if the index is out of range.
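
A short sketch of how a script might walk a TimeRanges object using only length, start(), and end(). The two function names are hypothetical. Indices are always kept below length, so the "IndexSizeError" case does not arise.

```javascript
// Return the index of the range that contains the given time, or -1.
function rangeContaining(ranges, time) {
  for (let i = 0; i < ranges.length; i++) {
    if (ranges.start(i) <= time && time <= ranges.end(i)) return i;
  }
  return -1;
}

// Sum the lengths of all ranges, in seconds.
function totalDuration(ranges) {
  let total = 0;
  for (let i = 0; i < ranges.length; i++) {
    total += ranges.end(i) - ranges.start(i);
  }
  return total;
}
```

For instance, rangeContaining(media.buffered, media.currentTime) would tell a script whether the current position of a media element is buffered.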

4.8.12.15 The TrackEvent interface
event . track

Returns the track object (TextTrack, AudioTrack, or VideoTrack) to which the event relates.

4.8.12.16 Event summary

The following events fire on media elements as part of the processing model described above:

| Event name | Interface | Fired when... | Preconditions |
| --- | --- | --- | --- |
| loadstart | Event | The user agent begins looking for media data, as part of the resource selection algorithm. | networkState equals NETWORK_LOADING |
| progress | Event | The user agent is fetching media data. | networkState equals NETWORK_LOADING |
| suspend | Event | The user agent is intentionally not currently fetching media data. | networkState equals NETWORK_IDLE |
| abort | Event | The user agent stops fetching the media data before it is completely downloaded, but not due to an error. | error is an object with the code MEDIA_ERR_ABORTED. networkState equals either NETWORK_EMPTY or NETWORK_IDLE, depending on when the download was aborted. |
| error | Event | An error occurs while fetching the media data or the type of the resource is not a supported media format. | error is an object with the code MEDIA_ERR_NETWORK or higher. networkState equals either NETWORK_EMPTY or NETWORK_IDLE, depending on when the download was aborted. |
| emptied | Event | A media element whose networkState was previously not in the NETWORK_EMPTY state has just switched to that state (either because of a fatal error during load that's about to be reported, or because the load() method was invoked while the resource selection algorithm was already running). | networkState is NETWORK_EMPTY; all the IDL attributes are in their initial states. |
| stalled | Event | The user agent is trying to fetch media data, but data is unexpectedly not forthcoming. | networkState is NETWORK_LOADING. |
| loadedmetadata | Event | The user agent has just determined the duration and dimensions of the media resource and the text tracks are ready. | readyState is newly equal to HAVE_METADATA or greater for the first time. |
| loadeddata | Event | The user agent can render the media data at the current playback position for the first time. | readyState newly increased to HAVE_CURRENT_DATA or greater for the first time. |
| canplay | Event | The user agent can resume playback of the media data, but estimates that if playback were to be started now, the media resource could not be rendered at the current playback rate up to its end without having to stop for further buffering of content. | readyState newly increased to HAVE_FUTURE_DATA or greater. |
| canplaythrough | Event | The user agent estimates that if playback were to be started now, the media resource could be rendered at the current playback rate all the way to its end without having to stop for further buffering. | readyState is newly equal to HAVE_ENOUGH_DATA. |
| playing | Event | Playback is ready to start after having been paused or delayed due to lack of media data. | readyState is newly equal to or greater than HAVE_FUTURE_DATA and paused is false, or paused is newly false and readyState is equal to or greater than HAVE_FUTURE_DATA. Even if this event fires, the element might still not be potentially playing, e.g. if the element is paused for user interaction or paused for in-band content. |
| waiting | Event | Playback has stopped because the next frame is not available, but the user agent expects that frame to become available in due course. | readyState is equal to or less than HAVE_CURRENT_DATA, and paused is false. Either seeking is true, or the current playback position is not contained in any of the ranges in buffered. It is possible for playback to stop for other reasons without paused being false, but those reasons do not fire this event (and when those situations resolve, a separate playing event is not fired either): e.g., playback has ended, or playback stopped due to errors, or the element has paused for user interaction or paused for in-band content. |
| seeking | Event | The seeking IDL attribute changed to true, and the user agent has started seeking to a new position. | |
| seeked | Event | The seeking IDL attribute changed to false after the current playback position was changed. | |
| ended | Event | Playback has stopped because the end of the media resource was reached. | currentTime equals the end of the media resource; ended is true. |
| durationchange | Event | The duration attribute has just been updated. | |
| timeupdate | Event | The current playback position changed as part of normal playback or in an especially interesting way, for example discontinuously. | |
| play | Event | The element is no longer paused. Fired after the play() method has returned, or when the autoplay attribute has caused playback to begin. | paused is newly false. |
| pause | Event | The element has been paused. Fired after the pause() method has returned. | paused is newly true. |
| ratechange | Event | Either the defaultPlaybackRate or the playbackRate attribute has just been updated. | |
| resize | Event | One or both of the videoWidth and videoHeight attributes have just been updated. | Media element is a video element; readyState is not HAVE_NOTHING |
| volumechange | Event | Either the volume attribute or the muted attribute has changed. Fired after the relevant attribute's setter has returned. | |
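
When debugging playback, it can help to record the order in which these events arrive. The following sketch uses only addEventListener() and removeEventListener(), so it works with any event target, including a media element; the function name recordMediaEvents and the shape of the log are hypothetical.

```javascript
// Hypothetical helper: append the type of each listed event to log as it
// fires on target. Returns a function that removes the listeners again.
function recordMediaEvents(target, eventNames, log) {
  const listeners = eventNames.map((name) => {
    const listener = (event) => log.push(event.type);
    target.addEventListener(name, listener);
    return [name, listener];
  });
  return function stop() {
    for (const [name, listener] of listeners) {
      target.removeEventListener(name, listener);
    }
  };
}
```

For a media element, this might be called with names such as 'loadstart', 'loadedmetadata', 'canplay', 'playing', 'waiting', and 'ended'.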

The following event fires on source elements:

| Event name | Interface | Fired when... |
| --- | --- | --- |
| error | Event | An error occurs while fetching the media data or the type of the resource is not a supported media format. |

The following events fire on AudioTrackList, VideoTrackList, and TextTrackList objects:

| Event name | Interface | Fired when... |
| --- | --- | --- |
| change | Event | One or more tracks in the track list have been enabled or disabled. |
| addtrack | TrackEvent | A track has been added to the track list. |
| removetrack | TrackEvent | A track has been removed from the track list. |

The following event fires on TextTrack objects and track elements:

| Event name | Interface | Fired when... |
| --- | --- | --- |
| cuechange | Event | One or more cues in the track have become active or stopped being active. |

The following events fire on track elements:

| Event name | Interface | Fired when... |
| --- | --- | --- |
| error | Event | An error occurs while fetching the track data or the type of the resource is not a supported text track format. |
| load | Event | The track data has been fetched and successfully processed. |

The following events fire on TextTrackCue objects:

| Event name | Interface | Fired when... |
| --- | --- | --- |
| enter | Event | The cue has become active. |
| exit | Event | The cue has stopped being active. |

4.8.12.17 Best practices for authors using media elements

Playing audio and video resources on small devices such as set-top boxes or mobile phones is often constrained by limited hardware resources in the device. For example, a device might only support three simultaneous videos. For this reason, it is a good practice to release resources held by media elements when they are done playing, either by being very careful about removing all references to the element and allowing it to be garbage collected, or, even better, by removing the element's src attribute and any source element descendants, and invoking the element's load() method.

Similarly, when the playback rate is not exactly 1.0, hardware, software, or format limitations can cause video frames to be dropped and audio to be choppy or muted.
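
The resource-release advice above (remove the src attribute and any source element descendants, then invoke load()) could be wrapped in a helper along these lines. This is a sketch: releaseMediaElement is a hypothetical name, and the initial pause() call is an extra precaution not required by the text above.

```javascript
// Hypothetical helper: release the resources held by a media element
// that is done playing, following the steps recommended above.
function releaseMediaElement(media) {
  media.pause();
  media.removeAttribute('src');
  // Copy the list first so that removing nodes cannot disturb iteration.
  for (const source of Array.from(media.querySelectorAll('source'))) {
    source.remove();
  }
  // With no src and no source children, load() resets the element.
  media.load();
}
```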