The URI module contains functions and modules to specify and convert URIs. (URLs are a type of URI.) In addition to the URI module itself, there are also URI::URL, URI::Escape, and URI::Heuristic. Of primary importance to many LWP applications is the URI::URL class, which creates the objects used by LWP::UserAgent to determine protocols, server locations, and resource names.
The URI::Escape module replaces unsafe characters in URL strings with their appropriate escape sequences. URI::Heuristic provides convenience methods for creating proper URLs out of short strings and incomplete addresses.
The URI module is a successor to URI::URL and was written by Gisle Aas. While not clearly stated in the LWP documentation, you should use the URI module whenever possible, since URI.pm has essentially deprecated URI::URL.
The URI module implements the URI class. Objects created from the URI class represent Uniform Resource Identifiers (URIs). With the URI module, you can identify the key parts of a URI: the scheme, the scheme-specific part, and the fragment identifier. For many schemes, the scheme-specific part divides further into authority, path, and query components. For example, as shown in the URI module documentation:
<scheme>:<scheme-specific-part>#<fragment>
<scheme>://<authority><path>?<query>#<fragment>
<path>?<query>#<fragment>
You can break down http://www.oreilly.com/somefile.html as:
scheme: http
authority: www.oreilly.com
path: /somefile.html
In the case of relative URIs, you can use the URI module to deal with only the path, query, and fragment components of a URI. With the URI module, you can parse the above URI as follows:
#!/usr/local/bin/perl -w
use URI;

my $url = 'http://www.oreilly.com/somefile.html';
my $u1 = URI->new($url);

print "scheme: ", $u1->scheme, "\n";
print "authority: ", $u1->authority, "\n";
print "path: ", $u1->path, "\n";
The following methods give you access to components of a URI. These methods will return a string, unless the URI component is invalid, in which case undef is returned. Bear in mind that an empty string ("") is not equivalent to an undefined value.
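The difference between undef and an empty string can be seen in a short sketch. The two URLs are illustrative only; the point is that a URI with no ? at all is expected to report an undefined query, while a URI with a bare ? reports a defined but empty one.

```perl
use strict;
use warnings;
use URI;

# No "?" at all: the query component is absent, so query returns undef.
my $no_query = URI->new('http://www.oreilly.com/somefile.html');

# A bare "?" with nothing after it: the query is present but empty.
my $empty_query = URI->new('http://www.oreilly.com/somefile.html?');

for my $u ($no_query, $empty_query) {
    my $q = $u->query;
    print "$u: query is ", (defined $q ? "the string '$q'" : "undef"), "\n";
}
```

Testing with defined rather than for truth avoids treating the two cases as the same.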
new |
new($uri, [$scheme])
Constructor. $uri is given as an argument with the optional $scheme. new removes leading and trailing whitespace, as well as enclosing double quotes and angle brackets, from the URI. $scheme is used only when $uri is a relative URI; it can be a simple string that denotes the scheme, or an absolute URI object. $uri will be treated like a generic URI if $scheme isn't defined.
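As a rough illustration of the constructor's clean-up and of the $scheme hint, consider the sketch below. The input strings are made up for the example, and the variable names are arbitrary.

```perl
use strict;
use warnings;
use URI;

# Wrappers and surrounding whitespace are stripped by the constructor.
my @inputs = (
    '<URL:http://www.oreilly.com/>',
    '"http://www.oreilly.com/"',
    '   http://www.oreilly.com/   ',
);
my @cleaned = map { URI->new($_)->as_string } @inputs;
print "$_\n" for @cleaned;    # each line: http://www.oreilly.com/

# A relative URI plus a scheme hint gains scheme-specific methods,
# although the URI itself still carries no scheme.
my $hinted = URI->new('somefile.html', 'http');
print "default port: ", $hinted->default_port, "\n";    # 80
print "scheme: ", (defined $hinted->scheme ? $hinted->scheme : 'undef'), "\n";
```

The hint only selects which class handles the relative URI; it does not rewrite the string.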
new |
URI::file->new($file, [$os])
Constructs a new file URI from a filename.
new_abs |
URI::file->new_abs($file, [$os])
Constructs a new absolute file URI from a filename.
abs |
abs($base_uri)
Returns an absolute URI reference. If $uri is already absolute, then a reference to $uri is returned. If $uri is relative, abs returns a new absolute URI formed by resolving $uri against $base_uri.
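A minimal sketch of abs, assuming a hypothetical base document under www.oreilly.com:

```perl
use strict;
use warnings;
use URI;

# Hypothetical base document; any absolute URI would do.
my $base = 'http://www.oreilly.com/catalog/index.html';

# A relative reference is resolved against the base.
my $relative = URI->new('../images/logo.gif');
my $resolved = $relative->abs($base);
print "$resolved\n";    # http://www.oreilly.com/images/logo.gif

# An already-absolute URI is unaffected by the base.
my $absolute = URI->new('http://www.perl.com/index.html');
print $absolute->abs($base), "\n";    # http://www.perl.com/index.html
```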
as_string |
as_string
Returns a URI object as a plain string.
authority |
authority([$auth])
Sets and gets the escaped authority component of $uri.
canonical |
canonical
Returns a normalized version of the URI. This includes lowercasing the scheme and hostname components, as well as removing an explicit port specification (if it matches the default port). canonical will return the original $uri if $uri was already in the correct form.
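The normalization can be sketched with a deliberately untidy URL (the mixed case and the explicit :80 are contrived for the example). The eq method, described later in this list, is used here only to show that the untidy and tidy forms count as the same URI.

```perl
use strict;
use warnings;
use URI;

# Uppercase scheme and host, plus a redundant default port.
my $messy = URI->new('HTTP://WWW.OReilly.COM:80/somefile.html');

my $clean = $messy->canonical;
print "$clean\n";    # http://www.oreilly.com/somefile.html

# Both forms are considered equal.
print "equal\n" if $messy->eq('http://www.oreilly.com/somefile.html');
```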
clone |
clone
Returns a copy of the URI.
cwd |
URI::file->cwd
Returns the current working directory as a file URI.
default_port |
default_port()
Returns the default port of the URI scheme that $uri belongs to. You cannot change the default port for a scheme.
eq |
eq($other_uri)
Compares two URIs for equality. URIs that normalize to the same string are considered equal.
fragment |
fragment([$new_frag])
Sets and gets the fragment identifier of a URI reference as an escaped string.
host |
host([$new_host])
Sets and gets the unescaped hostname. To set the port at the same time, end $new_host with a colon and the port number:
$new_host = "hostname:port_number"
host_port |
host_port([$new_host_port])
Sets and gets the host and port as a single unit. Hostname and port are colon-separated.
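The relationship between host, port, and host_port can be sketched as follows. The replacement host www.perl.com and port 8000 are arbitrary values chosen for the example.

```perl
use strict;
use warnings;
use URI;

my $u = URI->new('http://www.oreilly.com/somefile.html');

print $u->host, "\n";         # www.oreilly.com
print $u->port, "\n";         # 80 (no explicit port, so the scheme default)
print $u->host_port, "\n";    # www.oreilly.com:80

# Hypothetical replacement host; the trailing ":8000" also sets the port.
$u->host('www.perl.com:8000');
print "$u\n";                 # http://www.perl.com:8000/somefile.html

# The scheme's default port is unchanged by the explicit port.
print $u->default_port, "\n"; # 80
```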
new_abs |
new_abs($str, $base_uri)
Creates a new absolute URI object. $str can be a relative or an absolute URI; if it is relative, it is made absolute using $base_uri as the base. $base_uri must be an absolute URI.
opaque |
opaque([$new_opaque_value])
Sets and gets the scheme-specific part of $uri.
path |
path([$path])
Sets and gets the escaped path component of $uri. Returns empty string ("") if there is no path.
path |
path([$new_path])
Gets and sets the same value as opaque, unless the URI supports the generic syntax for hierarchical namespaces. In that case, path gets and sets only the part of the URI between the hostname and the fragment.
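To make the contrast between opaque and path concrete, here is a small sketch that compares a hierarchical http URI with a non-hierarchical mailto URI. The mailto address is a placeholder, not a real mailbox.

```perl
use strict;
use warnings;
use URI;

# Hierarchical URI: path is only a slice of the scheme-specific part.
my $http = URI->new('http://www.oreilly.com/somefile.html#top');
print "opaque: ", $http->opaque, "\n";    # //www.oreilly.com/somefile.html
print "path:   ", $http->path,   "\n";    # /somefile.html

# Non-hierarchical URI (placeholder address): path and opaque coincide.
my $mail = URI->new('mailto:webmaster@example.com');
print "opaque: ", $mail->opaque, "\n";    # webmaster@example.com
print "path:   ", $mail->path,   "\n";
```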
path_query |
path_query([$path_here])
Sets and gets the escaped path and query components.
path_segments |
path_segments([$seg])
Sets and gets the path. In a scalar context, path_segments is equivalent to path. In a list context, path_segments returns the unescaped path segments that make up the path.
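The two contexts can be compared in a short sketch. The path below, with its escaped space, is invented for the example.

```perl
use strict;
use warnings;
use URI;

# Hypothetical path containing an escaped space (%20).
my $u = URI->new('http://www.oreilly.com/catalog/perl%20books/index.html');

# Scalar context: the escaped path, exactly as path would return it.
my $escaped_path = $u->path_segments;
print "$escaped_path\n";    # /catalog/perl%20books/index.html

# List context: unescaped segments. The leading "/" produces an
# empty first segment.
my @segments = $u->path_segments;
print join(' | ', map { "'$_'" } @segments), "\n";
# '' | 'catalog' | 'perl books' | 'index.html'
```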
port |
port([$new_port])
Sets and gets the port, which is an integer. If the URI does not specify a port, then the default port of the URI scheme is returned.
query |
query([$q])
Sets and gets the escaped query component of $uri.
query_form |
query_form([$key => $val])
Sets and gets query components that use the urlencoded format.
query_keywords |
query_keywords([$keywords])
Sets and gets query components that use keywords separated by a +.
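The two query styles can be sketched side by side. The /search path and the parameter names (search, page) are hypothetical; only the method calls come from the URI module.

```perl
use strict;
use warnings;
use URI;

# Hypothetical search URL and parameter names, for illustration only.
my $form = URI->new('http://www.oreilly.com/search');
$form->query_form(search => 'perl books', page => 2);
print "$form\n";    # http://www.oreilly.com/search?search=perl+books&page=2

# Reading the form back yields the unescaped key/value pairs.
my %params = $form->query_form;
print "search = $params{search}\n";    # search = perl books

# Keyword-style query: plain words joined with "+".
my $kw = URI->new('http://www.oreilly.com/search');
$kw->query_keywords('perl', 'lwp');
print $kw->query, "\n";    # perl+lwp
my @words = $kw->query_keywords;
```

query_form takes care of the escaping, so the values can be passed as ordinary strings.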
rel |
rel($base_uri)
Returns a relative URI reference if one can be made that denotes the same resource relative to $base_uri. Otherwise, $uri is returned.
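A brief sketch of both outcomes, assuming a hypothetical base document and two made-up targets:

```perl
use strict;
use warnings;
use URI;

# Hypothetical base document.
my $base = 'http://www.oreilly.com/catalog/index.html';

# Same scheme and host as the base: a relative reference can be made.
my $image = URI->new('http://www.oreilly.com/images/logo.gif');
my $relative = $image->rel($base);
print "$relative\n";    # ../images/logo.gif

# Different host: no relative form exists, so the URI comes back as is.
my $elsewhere = URI->new('http://www.perl.com/index.html');
my $unchanged = $elsewhere->rel($base);
print "$unchanged\n";   # http://www.perl.com/index.html
```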
scheme |
scheme([$some_scheme])
Sets and gets the scheme part of the URI. Such values include: data, file, ftp, gopher, http, https, ldap, mailto, news, nntp, pop, rlogin, rsync, snews, telnet, and ssh. In the case of relative URIs, scheme will return undef; otherwise, scheme will return the scheme in lowercase. With $some_scheme, scheme will set the scheme of the current URI. scheme will die if the scheme isn't supported, or if it contains non-US-ASCII characters.
userinfo |
userinfo([$new_userinfo])
Sets and gets the escaped "userinfo" part of the authority component (of the URI). Often, the userinfo will appear as a username and password separated by a colon. Bear in mind that sending a password in the clear is a bad idea.
The URI::Escape module escapes or unescapes "unsafe" characters within a URL string. Unsafe characters in URLs are described by RFC 1738. Before you form URI::URL objects and use that class's methods, you should make sure your strings are properly escaped. This module does not create its own objects; it exports the following functions:
uri_escape |
uri_escape uri, [regexp]
Given a URI as the first parameter, returns the equivalent URI with certain characters replaced with % followed by two hexadecimal digits. The first parameter can be a text string, such as "http://www.oreilly.com", or an object of type URI::URL. When invoked without a second parameter, uri_escape escapes the characters that RFC 1738 designates as unsafe. Otherwise, you can pass a set of characters to escape as the second parameter; it is interpreted as the contents of a regular-expression character class (the part between [ and ]). For example:
$escaped_uri = uri_escape($uri, 'aeiou');
This code escapes all lowercase vowels in $uri and returns the escaped version.
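A short round-trip sketch may help. The input strings are arbitrary, and uri_unescape, the companion function that URI::Escape exports for reversing the encoding, is used to restore the original text.

```perl
use strict;
use warnings;
use URI::Escape;

# Default behavior: the space and the ampersand are escaped.
my $text    = 'perl & lwp';
my $escaped = uri_escape($text);
print "$escaped\n";    # perl%20%26%20lwp

# uri_unescape reverses the encoding.
my $restored = uri_unescape($escaped);
print "$restored\n";   # perl & lwp

# Custom character set: escape lowercase vowels only.
my $vowels = uri_escape('oreilly', 'aeiou');
print "$vowels\n";     # %6Fr%65%69lly
```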
The URI::URL module creates URL objects that store all the elements of a URL. These objects are used by the request method of LWP::UserAgent for server addresses, port numbers, filenames, protocol, and many other elements that can be loaded into a URL.
The new constructor is used to make a URI::URL object:
$url = URI::URL->new($url_string [, $base_url])
This method creates a new URI::URL object with the URL given as the first parameter. An optional base URL can be specified as the second parameter and is useful for generating an absolute URL from a relative URL.
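As a minimal sketch of the base-URL parameter, the example below builds a URI::URL object from a relative URL and a hypothetical base, then asks for the absolute form. Since URI::URL is the legacy interface, this is meant for reading older code rather than as a recommendation for new code.

```perl
use strict;
use warnings;
use URI::URL;

# Relative URL plus a hypothetical base document.
my $url = URI::URL->new('../images/logo.gif',
                        'http://www.oreilly.com/catalog/index.html');

# With no arguments, abs uses the base supplied to the constructor.
my $abs = $url->abs;
print $abs->as_string, "\n";    # http://www.oreilly.com/images/logo.gif
print $abs->host, "\n";         # www.oreilly.com
```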
The following methods are for the URI::URL class.
abs |
$url->abs([base, [scheme]])
Returns the absolute URL, given a base. If invoked with no parameters, any previous definition of the base is used. The second parameter is a Boolean that modifies abs's behavior. When the second parameter is nonzero, abs will accept a relative URL with a scheme but no host, such as http:index.html.
as_string |
$url->as_string( )
Returns the URL as a scalar string. All defined components of the URL are included in the string.
base |
$url->base([base])
Gets or sets the base URL associated with the URL in this URI::URL object. The base URL is useful for converting a relative URL into an absolute URL.
crack |
$url->crack( )
Returns an array with the following data: (scheme, user, password, host, port, epath, eparams, equery, frag).
default_port |
$url->default_port([port])
When invoked with no parameters, this method returns the default port for the URL defined in the object. The default port is based on the scheme used. Even if the port for the URL is explicitly changed by the user with the port method, the default port is always the same.
eparams |
$url->eparams([param])
When invoked with no arguments, this method returns the escaped parameter of the URL defined in the object. When invoked with an argument, the object's escaped parameter is assigned to that value.
epath |
$url->epath([path])
When invoked with no parameters, this method returns the escaped path of the URL defined in the object. When invoked with a parameter, the object's escaped path is assigned to that value.
equery |
$url->equery([string])
When invoked with no arguments, this method returns the escaped query string of the URL defined in the object. When invoked with an argument, the object's escaped query string is assigned to that value.
frag |
$url->frag([frag])
When invoked with no arguments, this method returns the fragment of the URL defined in the object. When invoked with an argument, the object's fragment is assigned to that value.
full_path |
$url->full_path( )
Returns a string consisting of the escaped path, escaped parameters, and escaped query string.
host |
$url->host([hostname])
When invoked with no parameters, this method returns the hostname in the URL defined in the object. When invoked with a parameter, the object's hostname is assigned to that value.
netloc |
$url->netloc([netloc])
When invoked with no parameters, this method returns the network location for the URL defined in the object. The network location is a string composed of "user:password@host:port", in which the user, password, and port may be omitted when not defined. When netloc is invoked with a parameter, the object's network location is defined to that value. Changes to the network location are reflected in the user, password, host, and port methods.
params |
$url->params([param])
Same as eparams, except that the parameter that is set/returned is not escaped.
password |
$url->password([passwd])
When invoked with no parameters, this method returns the password in the URL defined in the object. When invoked with a parameter, the object's password is assigned to that value.
path |
$url->path([pathname])
Same as epath, except that the path that is set/returned is not escaped.
port |
$url->port([port])
When invoked with no parameters, this method returns the port for the URL defined in the object. If a port wasn't explicitly defined in the URL, a default port is assumed. When invoked with a parameter, the object's port is assigned to that value.
query |
$url->query([param])
Same as equery, except that the parameter that is set/returned is not escaped.
rel |
$url->rel(base)
Given a base as a first parameter or a previous definition of the base, returns the current object's URL relative to the base URL.
scheme |
$url->scheme([scheme])
When invoked with no parameters, this method returns the scheme in the URL defined in the object. When invoked with a parameter, the object's scheme is assigned to that value.
strict |
URI::URL::strict(bool)
When set, the URI::URL module calls croak upon encountering an error. When disabled, the URI::URL module may behave more gracefully. The function returns the previous value of strict. This function is not exported explicitly by the module.
Copyright © 2002 O'Reilly & Associates. All rights reserved.