User agent string

From Wiki @ Karl Jones dot com

A user agent string is a string of text that identifies a user agent, the software (such as a web browser) acting on behalf of a user.

Description

When a software agent operates in a network protocol, it often identifies itself, its application type, operating system, software vendor, or software revision, by submitting a characteristic identification string to its operating peer.

In the HTTP, SIP, and SMTP/NNTP protocols, this identification is transmitted in a User-Agent header field. Bots, such as Web crawlers, often also include a URL and/or e-mail address so that the Webmaster can contact the operator of the bot.
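As a minimal sketch of how a client supplies this header, Python's standard `urllib.request` lets the caller set `User-Agent` on a request (without it, urllib sends its own default of the form `Python-urllib/x.y`). The product name `ExampleFetcher/0.1` and the contact URL are hypothetical placeholders that follow the bot convention of embedding a URL the Webmaster can visit.

```python
import urllib.request

# Hypothetical product token and contact URL, for illustration only.
USER_AGENT = "ExampleFetcher/0.1 (+https://example.com/bot-info)"

def build_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that carries our User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})

req = build_request("https://example.com/")
# urllib normalises header names with str.capitalize(), hence "User-agent" here.
print(req.get_header("User-agent"))
```

The request is only constructed, not sent, so the sketch runs offline; passing it to `urllib.request.urlopen` would transmit the header to the server.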

In HTTP, the User-Agent string is often used for content negotiation, where the origin server selects suitable content or operating parameters for the response.

For example, the User-Agent string might be used by a web server to choose variants based on the known capabilities of a particular version of client software.
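A server-side sketch of this kind of variant selection, assuming a deliberately simple and hypothetical rule: any User-Agent carrying a `Mobile` product token gets a mobile page, and everything else gets the default page. Real user-agent sniffing needs far more care, because browsers routinely copy each other's tokens.

```python
from typing import Optional

def choose_variant(user_agent: Optional[str]) -> str:
    """Pick a response variant from the User-Agent header (hypothetical rule)."""
    if not user_agent:
        # No header at all: serve the default variant.
        return "desktop"
    # Assumed rule: a "Mobile" or "Mobile/<version>" product token means a handheld browser.
    for token in user_agent.split():
        if token == "Mobile" or token.startswith("Mobile/"):
            return "mobile"
    return "desktop"

IPAD_UA = ("Mozilla/5.0 (iPad; U; CPU OS 3_2_1 like Mac OS X; en-us) "
           "AppleWebKit/531.21.10 (KHTML, like Gecko) Mobile/7B405")
print(choose_variant(IPAD_UA))                      # mobile
print(choose_variant("WikiBrowser/1.0 Gecko/1.0"))  # desktop
```

A server that varies its response on User-Agent in this way should also send a `Vary: User-Agent` response header, so that shared caches do not hand one variant to the wrong client.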

The User-Agent string is one of the criteria by which Web crawlers may be excluded from accessing certain parts of a Web site using the Robots Exclusion Standard (robots.txt file).
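The sketch below shows that matching with Python's standard `urllib.robotparser`. The robots.txt content and the agent names `ExampleBot` and `OtherBot` are hypothetical. Python's parser compares each `User-agent` line against the product name (the part of the string before `/`), case-insensitively.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: ExampleBot is kept out of /private/, everyone else is allowed.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse local text instead of fetching a URL

url = "https://example.com/private/page.html"
for agent in ("ExampleBot/1.0", "OtherBot/2.0"):
    # can_fetch() picks the rule group whose User-agent matches this agent.
    print(agent, parser.can_fetch(agent, url))
# ExampleBot/1.0 False
# OtherBot/2.0 True
```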

As with many other HTTP request headers, the User-Agent string adds to the identifying information that the client sends to the server, since the string can vary considerably from user to user.

Format for human-operated web browsers

The User-Agent string format is specified by Section 5.5.3 of RFC 7231, HTTP/1.1 Semantics and Content (carried forward as Section 10.1.5 of RFC 9110, HTTP Semantics). The format of the User-Agent string in HTTP is a list of product tokens (keywords) with optional comments. For example, if a user's product were called WikiBrowser, their user agent string might be WikiBrowser/1.0 Gecko/1.0. The "most important" product component is listed first. The parts of this string are as follows:

Product name and version (WikiBrowser/1.0)

Layout engine and version (Gecko/1.0)
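A minimal sketch of assembling such a string from (name, version) product tokens, most important first. Placing the optional parenthesised comment after the first product is an assumption that mirrors common browser strings. The name check is a simplified subset of the HTTP token rules.

```python
from typing import Optional, Sequence, Tuple

def format_user_agent(products: Sequence[Tuple[str, str]],
                      comment: Optional[str] = None) -> str:
    """Join (name, version) product tokens, most important first, into a User-Agent value.

    If a comment is given, it is wrapped in parentheses and placed after the
    first product (an assumed placement, mirroring common browser strings).
    """
    parts = []
    for index, (name, version) in enumerate(products):
        # Simplified token check: no whitespace, "/" or parentheses in product names.
        if not name or any(ch.isspace() or ch in "/()" for ch in name):
            raise ValueError(f"invalid product name: {name!r}")
        parts.append(f"{name}/{version}" if version else name)
        if index == 0 and comment:
            parts.append(f"({comment})")
    return " ".join(parts)

print(format_user_agent([("WikiBrowser", "1.0"), ("Gecko", "1.0")]))
# WikiBrowser/1.0 Gecko/1.0
```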

During the first browser war, many web servers were configured to send web pages that required advanced features, including frames, only to clients that identified themselves as some version of Mozilla.[5] Other browsers were assumed to be older products such as Mosaic, Cello or Samba and were sent a bare-bones HTML document.

For this reason, most Web browsers use a User-Agent value as follows: Mozilla/[version] ([system and browser information]) [platform] ([platform details]) [extensions]. For example, Safari on the iPad has used the following:

Mozilla/5.0 (iPad; U; CPU OS 3_2_1 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Mobile/7B405

The components of this string are as follows:

Mozilla/5.0: Previously used to indicate compatibility with the Mozilla rendering engine
(iPad; U; CPU OS 3_2_1 like Mac OS X; en-us): Details of the system in which the browser is running
AppleWebKit/531.21.10: The platform the browser uses
(KHTML, like Gecko): Browser platform details
Mobile/7B405: This is used by the browser to indicate specific enhancements that are available directly in the browser or through third parties. An example of this is Microsoft Live Meeting, which registers an extension so that the Live Meeting service knows whether the software is already installed and can therefore provide a streamlined experience when joining meetings.
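The component breakdown above can be reproduced mechanically. The following minimal sketch uses a regular expression to split a User-Agent value into product tokens and parenthesised comments. It is a simplification: it ignores nested comments and backslash-escaped characters, both of which the HTTP grammar permits inside comments.

```python
import re
from typing import List, Tuple

# Either a parenthesised comment, or a product name with an optional "/version".
_PART = re.compile(r"\(([^()]*)\)|([^\s()/]+)(?:/([^\s()]+))?")

def parse_user_agent(ua: str) -> List[Tuple[str, str]]:
    """Return ("product", "name/version") and ("comment", text) parts in order."""
    parts = []
    for match in _PART.finditer(ua):
        comment, name, version = match.groups()
        if comment is not None:
            parts.append(("comment", comment))
        else:
            parts.append(("product", f"{name}/{version}" if version else name))
    return parts

IPAD_UA = ("Mozilla/5.0 (iPad; U; CPU OS 3_2_1 like Mac OS X; en-us) "
           "AppleWebKit/531.21.10 (KHTML, like Gecko) Mobile/7B405")
for kind, value in parse_user_agent(IPAD_UA):
    print(kind, value)
```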

Before migrating to the Chromium code base, Opera was the most widely used web browser whose User-Agent string did not begin with "Mozilla" (it began with "Opera" instead). As of July 15, 2013, Opera's User-Agent string begins with "Mozilla/5.0" and, to avoid triggering legacy server rules, no longer includes the word "Opera" (instead using the string "OPR" to denote the Opera version).
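A sketch of why the change mattered. `legacy_server_accepts` models a hypothetical first-browser-war rule that only checks for a leading "Mozilla/". `is_opera` recognises Opera in either era. The two sample strings are illustrative strings shaped like each era's format, not strings captured from a real browser.

```python
def legacy_server_accepts(ua: str) -> bool:
    """Hypothetical old server rule: only clients claiming to be Mozilla get the full page."""
    return ua.startswith("Mozilla/")

def is_opera(ua: str) -> bool:
    """Recognise Opera by its old leading "Opera/" product or its newer "OPR/" token."""
    if ua.startswith("Opera/"):
        return True
    return any(token.startswith("OPR/") for token in ua.split())

# Illustrative strings shaped like each era's format (not captured from a real browser).
OLD_OPERA = "Opera/9.80 (Windows NT 6.1) Presto/2.12 Version/12.16"
NEW_OPERA = ("Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/28.0 Safari/537.36 OPR/15.0")

for ua in (OLD_OPERA, NEW_OPERA):
    print(is_opera(ua), legacy_server_accepts(ua))
# True False
# True True
```

The old-style string is still recognisable as Opera but fails the legacy rule. The newer one passes the rule while remaining identifiable through its `OPR/` token.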

Format for automated agents (bots)

Automated web crawling tools can use a simplified form, in which an important field is contact information in case of problems. By convention, the word "bot" is included in the name of the agent. For example:

Googlebot/2.1 (+http://www.google.com/bot.html)
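A minimal sketch of how a site operator might exploit this convention. `looks_like_bot` is a crude heuristic (some product or comment word contains "bot") that will have both false positives and false negatives. `contact_url` assumes the contact information is an http(s) URL, often written with a leading "+". Both function names are hypothetical.

```python
import re
from typing import List, Optional

def agent_names(ua: str) -> List[str]:
    """Product names and comment words, e.g. 'Googlebot' from 'Googlebot/2.1'."""
    words = [w for w in re.split(r"[\s;()]+", ua) if w]
    return [w.split("/", 1)[0] for w in words]

def looks_like_bot(ua: str) -> bool:
    """Heuristic based on the naming convention: some name contains 'bot'."""
    return any("bot" in name.lower() for name in agent_names(ua))

def contact_url(ua: str) -> Optional[str]:
    """Return the first http(s) URL in the string, or None if there is none."""
    match = re.search(r"https?://[^\s;()]+", ua)
    return match.group(0) if match else None

GOOGLEBOT = "Googlebot/2.1 (+http://www.google.com/bot.html)"
print(looks_like_bot(GOOGLEBOT), contact_url(GOOGLEBOT))
# True http://www.google.com/bot.html
```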

robots.txt

Automated agents are expected to follow rules in a special file called "robots.txt".
