Media Types [rescheduled lecture]

Web Architecture [./]
Fall 2011 — INFO 253 (CCN 42598)

Dilan Mahendran, UC Berkeley School of Information
2011-11-21

This work is licensed under a Creative Commons Attribution 3.0 Unported License [http://creativecommons.org/licenses/by/3.0/]

Contents D. Mahendran: Media Types [rescheduled lecture]

(2) Abstract

One of the most important aspects of computer-based communications is the concept of media types: the question of what type of information some digital artifact represents and how it is encoded. The most common standard for this information is the scheme introduced by Multipurpose Internet Mail Extensions (MIME). Media types can be negotiated by peers communicating through HTTP. Some media types define fragment identifiers, which allow a reference to a resource to identify a fragment of the complete resource.



(3) Multipurpose Internet Mail Extensions (MIME)



(4) Unix File Type Handling

image/bmp   bmp
image/cgm
image/g3fax
image/gif   gif
image/ief   ief
image/jpeg   jpeg jpg jpe
image/naplps
image/png   png
image/prs.btif
image/prs.pti
image/tiff   tiff tif

0 string  P1  image/x-portable-bitmap
0 string  P2  image/x-portable-graymap
0 string  P3  image/x-portable-pixmap
0 string  P4  image/x-portable-bitmap
0 string  P5  image/x-portable-graymap
0 string  P6  image/x-portable-pixmap
0 string  IIN1  image/tiff
0 string  MM\x00\x2a image/tiff
0 string  II\x2a\x00 image/tiff
0 string  \x89PNG  image/x-png
1 string  PNG  image/x-png
0 string  GIF8  image/gif
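
The two listings above appear to show the two approaches Unix tools use: a mime.types-style table that maps media types to file name extensions, and a magic-style table that matches byte strings at fixed offsets in the file content. The following Python sketch illustrates both approaches. The MAGIC table and the function names sniff and by_extension are assumptions made for this example; the table covers only a few of the entries listed above, and PNG is reported as image/png rather than the older image/x-png.

```python
import mimetypes

# (offset, magic bytes, media type), modeled on a few of the magic entries above.
# Hypothetical table for illustration only; real magic files are far larger.
MAGIC = [
    (0, b"\x89PNG", "image/png"),
    (0, b"GIF8", "image/gif"),
    (0, b"MM\x00\x2a", "image/tiff"),
    (0, b"II\x2a\x00", "image/tiff"),
]


def sniff(data):
    """Guess a media type from the leading bytes of the content."""
    for offset, magic, media_type in MAGIC:
        if data[offset:offset + len(magic)] == magic:
            return media_type
    return None


def by_extension(filename):
    """Guess a media type from the file name extension only."""
    media_type, _encoding = mimetypes.guess_type(filename)
    return media_type


if __name__ == "__main__":
    print(by_extension("photo.jpeg"))
    print(sniff(b"GIF89a" + b"\x00" * 10))
```

Extension-based lookup is cheap but trusts the file name; content sniffing inspects the data but only recognizes formats with a known signature.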


(5) Windows File Type Handling

Media Types and the Web

Outline (Media Types and the Web)

  1. Media Types and the Web [3]
  2. Media Types [12]
    1. Text Content Types [3]
    2. Application Content Types [1]
    3. Image Content Types [3]
  3. Fragment Identifiers [3]
  4. Conclusions [1]

(7) Browsers and Resources



(8) Firefox Media Type Handling

Controlling Media Type Handling in Firefox

(9) Media Type Control in Browsers



Media Types

(11) Content Types



(12) Subtypes



(13) What is XML?



(14) Media Type Registration



(15) application/msword Media Type


SECURITY CONSIDERATIONS:
None known.


PUBLISHED SPECIFICATION:

Specification by example:

   From any microsoft word application select "Save As..." from the
   "File" menu.  Enter a filename, make sure that "Normal" is specified
   for the file type, and click "Save".

Company Contact:

   Microsoft Inc.

   16011 NE 36th Way
   Box 97017
   Redmond WA, 98073-9717


Text Content Types

(17) Plain Text

  • RFC 2046 [http://tools.ietf.org/html/rfc2046] defines plain text (text/plain) as a basic media type
    • any text file that does not contain structures which are intended for machine-based processing
    • even Comma-Separated Values (CSV) files do not count as plain text
  • Guessing the character encoding is hard and unreliable and should be avoided
    • the character encoding can be specified with an additional parameter: text/plain; charset=iso-8859-1
    • if no such parameter is present, US-ASCII should be assumed as the character encoding
  • For more specific kinds of text, various other subtypes exist [http://www.iana.org/assignments/media-types/text/]
    • calendar for information about calendar entries
    • javascript for JavaScript code (RFC 4329 recommended application/javascript instead; RFC 9239 has since made text/javascript the preferred type again)
    • sgml and xml for text with additional markup
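
The charset parameter described above can be read without guessing. The following Python sketch uses the standard library's email.message.Message class to split a Content-Type value into media type and charset; the function name parse_content_type and the fallback argument default_charset are assumptions made for this example.

```python
from email.message import Message


def parse_content_type(header_value, default_charset="us-ascii"):
    """Return (media type, charset) for a Content-Type header value.

    Falls back to default_charset when no charset parameter is present,
    following the rule on this slide (US-ASCII for text/plain).
    """
    msg = Message()
    msg["Content-Type"] = header_value
    charset = msg.get_content_charset() or default_charset
    return msg.get_content_type(), charset


if __name__ == "__main__":
    media_type, charset = parse_content_type("text/plain; charset=iso-8859-1")
    # Decode the payload with the declared charset instead of guessing.
    print(media_type, charset, b"caf\xe9".decode(charset))
```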


(18) HTML

  • RFC 2854 [http://tools.ietf.org/html/rfc2854] registers text/html for HTML documents
    • as with plain text, the character encoding can be specified as a parameter
    • it is not specific to any version of HTML (version information can be found in the HTML document)
  • HTML fragment identifiers are also defined by the media type registration
  • HTML in many cases needs additional resources to be self-contained
    • images which are referenced by img elements (maybe external image maps)
    • other media referenced by object or applet (or the deprecated embed)
    • stylesheets or scripts which are referenced in the document head (they may reference other files …)
    • generating a truly self-contained HTML document is a rather hard task
  • MIME can be used to package an HTML document and its resources as a self-contained MHTML [http://dret.net/glossary/mhtml] message (RFC 2557 [http://tools.ietf.org/html/rfc2557])
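
The MHTML idea can be sketched with Python's standard email package: the HTML document and each referenced resource become parts of one multipart/related message, and a Content-Location header ties a part to the URI used in the document. The function name build_mhtml and its parameters are assumptions made for this example, and the sketch handles only a single PNG image.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mhtml(html, image_bytes, image_location):
    """Bundle an HTML page and one PNG image into a multipart/related message."""
    # The "type" parameter names the media type of the root part.
    root = MIMEMultipart("related", type="text/html")
    root.attach(MIMEText(html, "html", "utf-8"))

    # The image part carries the location the HTML uses to reference it.
    image = MIMEImage(image_bytes, _subtype="png")
    image["Content-Location"] = image_location
    root.attach(image)
    return root


if __name__ == "__main__":
    page = "<html><body><img src='logo.png'></body></html>"
    message = build_mhtml(page, b"\x89PNG\r\n\x1a\n", "logo.png")
    print(message.get_content_type())
```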


(19) Comma-Separated Values (CSV)

  • RFC 4180 [http://tools.ietf.org/html/rfc4180] defines a textual format for spreadsheet data and registers it as text/csv
  • CSV has been used for a long time, but implementations handled some of the details differently
  • Defining a media type makes it easier for implementations to know what to expect
    • the registration not only registers the type, but also defines it
  • CSV is not overly complex, but some issues have to be settled
    • how to separate lines (CRLF)
    • how to end the file (a final CRLF is allowed but optional)
    • are headers allowed (yes, but they are not marked as such)
    • may different lines use different numbers of fields (no)
    • are spaces significant (yes)
    • are quotes significant (no, they are delimiters, so quotes as values must be escaped by doubling them)
    • how to treat fields with CRLF, commas, or quotes (enclose the value in quotes)
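
The rules above can be tried out with Python's standard csv module, configured here to use CRLF line endings and to quote only fields that need it. This is a minimal sketch rather than a complete RFC 4180 implementation; the function names to_csv and from_csv are assumptions made for this example.

```python
import csv
import io


def to_csv(rows):
    """Serialize rows with CRLF line endings, quoting only where required."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\r\n", quoting=csv.QUOTE_MINIMAL)
    writer.writerows(rows)
    return buffer.getvalue()


def from_csv(text):
    """Parse CSV text back into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    rows = [
        ["name", "note"],               # header line, not marked as such
        ["Ann", 'said "hi", twice'],    # comma and quotes force quoting
        ["Bob", " padded "],            # spaces are significant and kept
    ]
    text = to_csv(rows)
    print(repr(text))
    print(from_csv(text) == rows)
```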


Application Content Types

(21) JSON

  • RFC 4627 [http://tools.ietf.org/html/rfc4627] registers JSON as the media type application/json
  • The definition of JSON is derived from ECMAScript's object literals
  • JSON is a very limited notation intended for simple structures
    • it allows the four primitive types: strings, numbers, booleans, and null
    • it allows the two structured types: objects (unordered) and arrays (ordered)
  • The value of the media type in this case is the clean integration into the Web
    • information providers may choose to expose their data in both JSON and XML
    • HTTP content negotiation can be used to specify which representation is preferred
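
How a server might choose between a JSON and an XML representation can be sketched as follows. This is a simplified reading of the HTTP Accept header: it honors q-values and wildcards but ignores the specificity rules and extension parameters of the full specification. The function names parse_accept, matches, and negotiate are assumptions made for this example.

```python
def parse_accept(header):
    """Return (media range, q) pairs sorted by descending q-value."""
    preferences = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        preferences.append((media_range, q))
    # sorted() is stable, so equal q-values keep their header order.
    return sorted(preferences, key=lambda pref: pref[1], reverse=True)


def matches(media_range, media_type):
    """Check a media type against a range such as */* or application/*."""
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.startswith(media_range[:-1])
    return media_range == media_type


def negotiate(accept_header, available):
    """Pick the first available media type the client prefers, or None."""
    for media_range, q in parse_accept(accept_header):
        if q <= 0:
            continue
        for media_type in available:
            if matches(media_range, media_type):
                return media_type
    return None


if __name__ == "__main__":
    offered = ["application/xml", "application/json"]
    print(negotiate("application/xml;q=0.5, application/json", offered))
```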


Image Content Types

(23) Graphic Interchange Format (GIF)

  • RFC 2046 [http://tools.ietf.org/html/rfc2046] registers the oldest graphics format on the Web
  • GIF was the subject of a long patent dispute
    • the compression technique of GIF (LZW [http://en.wikipedia.org/wiki/Lzw]) had been patented by Unisys (1983)
    • Unisys wanted to get licensing fees from all commercial online uses of GIF
    • Portable Network Graphics (PNG) was developed as a patent-free alternative
    • in 1999, Unisys changed its tactics and wanted to collect one-time fees ($5000-$7500) from all users
    • all GIF-related LZW patents expired in 2003/2004, so GIF is freely usable now
  • GIF's limited features make PNG the better choice anyway
    • 8 bit color (requires dithering for photographs), binary transparency
    • GIF's animation feature is the only thing that is not available in PNG … [image: running-wolf.gif]


(24) Joint Photographic Experts Group (JPEG)

  • RFC 2046 [http://tools.ietf.org/html/rfc2046] registers image/jpeg for the second major image format on the Web
  • JPEG has been specifically designed for photographs
    • it is always lossy (it cannot preserve the complete information from an arbitrary bitmap)
    • it uses perception-based compression (for example, color precision is sacrificed in favor of brightness precision)
  • Example images at decreasing quality settings
    • Average Quality JPEG: Q = 50, file size 15,138 bytes
    • Low Quality JPEG: Q = 10, file size 4,787 bytes
    • Lowest Quality JPEG: Q = 1, file size 1,523 bytes


(25) Portable Network Graphics (PNG)

[image: png-transparency.png]
  • PNG is registered as image/png and is the third major image format
    • PNG was intended to be a royalty- and patent-free replacement for GIF
    • image formats need to be supported by browsers and thus take a long time until they are established
    • IE6 implements PNG in a very rudimentary form, IE7 handles PNG correctly
  • PNG has some advantages over GIF and JPEG
    • lossless compression of palette, grayscale, or true color images
    • 8 bit alpha channel for gradual opacity (blending into the background)
  • JPEG still is the preferred format for photographic pictures
  • GIF still is the preferred format for animated images
    • MNG [http://en.wikipedia.org/wiki/Mng] and APNG [http://en.wikipedia.org/wiki/Apng] are two available but not widely supported PNG animation formats
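
The difference between PNG's 8 bit alpha channel and GIF's binary transparency comes down to how a pixel is combined with the background. The following Python sketch shows standard "over" alpha blending for one pixel with integer arithmetic; the function names blend_channel and blend_pixel are assumptions made for this example, and gamma correction is ignored.

```python
def blend_channel(foreground, background, alpha):
    """Blend one 8 bit channel; alpha 0 is transparent, 255 is opaque."""
    # Weighted average of foreground and background, rounded to nearest.
    return (foreground * alpha + background * (255 - alpha) + 127) // 255


def blend_pixel(foreground_rgba, background_rgb):
    """Blend an (R, G, B, A) pixel over an opaque (R, G, B) background."""
    red, green, blue, alpha = foreground_rgba
    return tuple(
        blend_channel(fg, bg, alpha)
        for fg, bg in zip((red, green, blue), background_rgb)
    )


if __name__ == "__main__":
    blue_background = (0, 0, 255)
    # Half-transparent red over blue gives a mix of both colors.
    print(blend_pixel((255, 0, 0, 128), blue_background))
```

With binary transparency only the alpha values 0 and 255 are available, so each pixel shows either the image or the background and edges cannot fade smoothly.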


Fragment Identifiers

(27) Identification of Resource Fragments



(28) HTML Fragment Identifiers



(29) XML Fragment Identifiers



Conclusions

(31) Know Your Resources


