MetaParser Class

com.bea.p13n.util

public class MetaParser
    extends Object

A utility which can pull META tags from an HTML file.

This will also pull the title of an HTML document from the <title></title> element, wherever it appears in the document. Whatever the casing of the title tag, the value is stored in the metadata properties under the key "title".
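
The title handling described above can be illustrated with a minimal sketch. TitleSketch and putTitle are hypothetical names used only for illustration; they are not part of com.bea.p13n.util, and the regular expression is an assumption about one reasonable way to match the tag, not the parser MetaParser actually uses.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of com.bea.p13n.util.
final class TitleSketch {
    // Matches <title>...</title> in any casing, including across line breaks.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Stores the first title found under the lower-case key "title".
    // A null Properties argument creates a new object, mirroring load().
    static Properties putTitle(String html, Properties p) {
        Properties out = (p != null) ? p : new Properties();
        Matcher m = TITLE.matcher(html);
        if (m.find()) {
            out.setProperty("title", m.group(1).trim());
        }
        return out;
    }
}
```

With this sketch, a document containing <TiTlE> Home Page </TITLE> yields the property title=Home Page, and a document without a title element leaves the key unset.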


Hierarchy
Object
  MetaParser

Method Summary

public static final String
determineEncoding(File f, String encoding)
Try to determine the encoding from a <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=" > instruction.
public static final String
determineEncoding(BufferedReader reader)
Try to determine the encoding from a <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=" > instruction.
public static Properties
load(File f, Properties p)
Load the META tag name/value pairs from f into p.
public static Properties
load(File f, Properties p, String enc)
Load the META tag name/value pairs from f into p.
public static Properties
load(BufferedReader reader, Properties p)
Load the META tag name/value pairs from the input stream into p.
public static final BufferedReader
open(File f, String encoding)
Open a file with the given encoding.
 
Methods from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
   

Method Detail

determineEncoding(File, String) Method

public static final String determineEncoding(File f, 
                                             String encoding)
throws IOException
Try to determine the encoding from a <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=" > instruction.

If an encoding is passed in and the file doesn't contain an appropriate META tag, that encoding will be returned.

Parameters

f
the input file
encoding
the encoding to open the file with (null for default).

Returns

the encoding to use, null for unknown.

Exceptions

IOException
thrown on an error reading the file.
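
The detect-or-fall-back behavior described for this method can be sketched as follows. EncodingSketch and detect are hypothetical names, and the line-by-line regular-expression scan is an assumption made for brevity; the real method's parsing rules are not documented here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of com.bea.p13n.util.
final class EncodingSketch {
    // Captures the charset value inside a META tag, in any casing.
    // Simplification: the tag must sit on a single line.
    private static final Pattern CHARSET = Pattern.compile(
            "<meta[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9][\\w.:+-]*)",
            Pattern.CASE_INSENSITIVE);

    // Returns the first supported charset name declared in the input,
    // or the fallback (which may be null, meaning "unknown").
    static String detect(BufferedReader reader, String fallback) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            Matcher m = CHARSET.matcher(line);
            if (m.find() && isSupported(m.group(1))) {
                return m.group(1);
            }
        }
        return fallback;
    }

    private static boolean isSupported(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // Thrown for syntactically illegal charset names.
            return false;
        }
    }
}
```

In this sketch a tag with an empty charset value is treated the same as a missing tag, so the fallback is returned.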

determineEncoding(BufferedReader) Method

public static final String determineEncoding(BufferedReader reader)
throws IOException
Try to determine the encoding from a <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=" > instruction.

Parameters

reader
the input stream

Returns

the encoding to use, null for unknown.

Exceptions

IOException
thrown on an error reading from the reader.

load(File, Properties) Method

public static Properties load(File f, 
                              Properties p)
throws IOException
Load the META tag name/value pairs from f into p.

Parameters

f
the file.
p
the properties object (null to create new).

Returns

the META tag name/values (p if p was not null).

Exceptions

IOException
thrown on an error reading the file.

load(File, Properties, String) Method

public static Properties load(File f, 
                              Properties p, 
                              String enc)
throws IOException
Load the META tag name/value pairs from f into p.

This method looks for the encoding name by searching the HTML for a <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=" > tag, reading the file with the passed-in encoding during that search. If a valid encoding name is found, the file is opened with that encoding and the "encoding" property in the Properties is set. If no valid encoding is found and an encoding was passed in, the passed-in encoding is used. If no encoding was passed in, the system default is used.

Parameters

f
the file.
p
the properties object (null to create new).
enc
the file encoding name to try (null for system default).

Returns

the META tag name/values (p if p was not null).

Exceptions

IOException
thrown on an error reading the file.
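
The order of precedence described for this method (declared encoding, then passed-in encoding, then system default) can be sketched as a small decision function. EncodingChoiceSketch and choose are hypothetical names, and returning a java.nio.charset.Charset is a choice made for this sketch; MetaParser itself works with encoding names as strings.

```java
import java.nio.charset.Charset;

// Hypothetical helper for illustration only; not part of com.bea.p13n.util.
final class EncodingChoiceSketch {
    // declared: charset name found in the META tag, or null if none was found.
    // passedIn: encoding supplied by the caller, or null for system default.
    static Charset choose(String declared, String passedIn) {
        if (isUsable(declared)) {
            return Charset.forName(declared);   // 1. valid declared encoding wins
        }
        if (passedIn != null) {
            // 2. fall back to the caller's encoding; in this sketch an
            // unsupported name fails with an unchecked exception.
            return Charset.forName(passedIn);
        }
        return Charset.defaultCharset();        // 3. system default
    }

    private static boolean isUsable(String name) {
        if (name == null) {
            return false;
        }
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // Thrown for syntactically illegal charset names.
            return false;
        }
    }
}
```

For example, choose("UTF-8", "ISO-8859-1") selects UTF-8, while an unsupported declared name combined with "ISO-8859-1" selects ISO-8859-1.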

load(BufferedReader, Properties) Method

public static Properties load(BufferedReader reader, 
                              Properties p)
throws IOException
Load the META tag name/value pairs from the input stream into p.

This uses a last-seen-is-returned algorithm for META tags with the same name: the value of the last such tag in the input wins. It also finds all META tags in the input, not just those in the head.

Parameters

reader
the input reader.
p
the properties object (null to create new).

Returns

the META tag name/values (p if p was not null).

Exceptions

IOException
thrown on an error reading from the reader.
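
The last-seen-is-returned behavior can be illustrated with a minimal sketch. MetaLoadSketch is a hypothetical name, and the regular expression is a deliberate simplification (name before content, both double-quoted, tag on one line); it is not the parsing logic MetaParser actually uses.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of com.bea.p13n.util.
final class MetaLoadSketch {
    // Simplified META matcher: name attribute first, then content,
    // both double-quoted, whole tag on one line, any casing.
    private static final Pattern META = Pattern.compile(
            "<meta\\s+name\\s*=\\s*\"([^\"]*)\"\\s+content\\s*=\\s*\"([^\"]*)\"[^>]*>",
            Pattern.CASE_INSENSITIVE);

    // Scans the whole input, not just the head. A later tag with the same
    // name overwrites the earlier value, so the last one seen is returned.
    static Properties load(BufferedReader reader, Properties p) throws IOException {
        Properties out = (p != null) ? p : new Properties();
        String line;
        while ((line = reader.readLine()) != null) {
            Matcher m = META.matcher(line);
            while (m.find()) {
                out.setProperty(m.group(1), m.group(2));
            }
        }
        return out;
    }
}
```

Given one author tag in the head and a second author tag in the body, this sketch leaves the body's value in the returned Properties. Names are stored exactly as written; whether MetaParser normalizes their casing is not stated here.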

open(File, String) Method

public static final BufferedReader open(File f, 
                                        String encoding)
throws IOException
Open a file with the given encoding.

Parameters

f
a file object.
encoding
the encoding to use, null for default.

Exceptions

IOException
thrown if the encoding is invalid or the file cannot be opened.
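
Opening a file with a named encoding, or with the platform default when the name is null, can be done with standard java.io classes. The sketch below shows one way to do it; OpenSketch is a hypothetical name, and this is not necessarily how MetaParser.open is implemented.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

// Hypothetical helper for illustration only; not part of com.bea.p13n.util.
final class OpenSketch {
    // Returns a reader over f that decodes with the given encoding,
    // or with the platform default when encoding is null.
    static BufferedReader open(File f, String encoding) throws IOException {
        FileInputStream in = new FileInputStream(f); // FileNotFoundException if missing
        try {
            InputStreamReader isr = (encoding == null)
                    ? new InputStreamReader(in)
                    : new InputStreamReader(in, encoding); // UnsupportedEncodingException if unknown
            return new BufferedReader(isr);
        } catch (IOException e) {
            in.close(); // do not leak the stream when the encoding is rejected
            throw e;
        }
    }
}
```

Both failure cases surface as IOException subclasses, which matches the single exception type declared by the method signature above. The caller is responsible for closing the returned reader.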