org.w3c.tidy
Class EncodingUtils

java.lang.Object
  extended by org.w3c.tidy.EncodingUtils

public final class EncodingUtils
extends java.lang.Object

Version:
$Revision: 779 $ ($Author: fgiust $)
Author:
Fabrizio Giustina

Field Summary
static int FSM_ASCII
          State ASCII, the first of the ISO 2022 state machine states. A document in an ISO-2022-based encoding uses ESC sequences called "designators" to switch character sets.
static int FSM_ESC
          state ESC.
static int FSM_ESCD
          state ESCD.
static int FSM_ESCDP
          state ESCDP.
static int FSM_ESCP
          state ESCP.
static int FSM_NONASCII
          state NONASCII.
static int HIGH_UTF16_SURROGATE
          UTF-16 high surrogate.
static int LOW_UTF16_SURROGATE
          UTF-16 low surrogate.
static int MAX_UTF16_FROM_UCS4
          Max UTF-16 value.
static int MAX_UTF8_FROM_UCS4
          Max UTF-8 valid char value.
static int UNICODE_BOM
          the default (big-endian) UNICODE BOM.
static int UNICODE_BOM_BE
          the big-endian (default) UNICODE BOM.
static int UNICODE_BOM_LE
          the little-endian UNICODE BOM.
static int UNICODE_BOM_UTF8
          the UTF-8 UNICODE BOM.
static int UTF16_HIGH_SURROGATE_BEGIN
          UTF-16 surrogate pair areas: high surrogates begin.
static int UTF16_HIGH_SURROGATE_END
          UTF-16 surrogate pair areas: high surrogates end.
static int UTF16_LOW_SURROGATE_BEGIN
          UTF-16 surrogate pair areas: low surrogates begin.
static int UTF16_LOW_SURROGATE_END
          UTF-16 surrogate pair areas: low surrogates end.
static int UTF16_SURROGATES_BEGIN
          UTF-16 surrogates begin.
 
Method Summary
protected static int decodeMacRoman(int c)
          Function to convert from MacRoman to Unicode.
protected static int decodeWin1252(int c)
          Function for conversion from Windows-1252 to Unicode.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

UNICODE_BOM_BE

public static final int UNICODE_BOM_BE
the big-endian (default) UNICODE BOM.

See Also:
Constant Field Values

UNICODE_BOM

public static final int UNICODE_BOM
the default (big-endian) UNICODE BOM.

See Also:
Constant Field Values

UNICODE_BOM_LE

public static final int UNICODE_BOM_LE
the little-endian UNICODE BOM.

See Also:
Constant Field Values

UNICODE_BOM_UTF8

public static final int UNICODE_BOM_UTF8
the UTF-8 UNICODE BOM.

See Also:
Constant Field Values
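
The following is a minimal sketch of how byte order marks like the ones named above are typically recognized at the start of a byte stream. It does not use EncodingUtils itself: the class name BomSketch and the method detectBom are hypothetical, and the byte values (FE FF, FF FE, EF BB BF) come from the Unicode Standard rather than from this page, which does not list the constant values.

```java
// Hypothetical helper, not part of org.w3c.tidy.
final class BomSketch {

    /**
     * Returns the encoding implied by a leading byte order mark,
     * or "none" if the input does not start with a known BOM.
     */
    static String detectBom(byte[] b) {
        // UTF-8 BOM: the three bytes EF BB BF.
        if (b.length >= 3
                && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB
                && (b[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (b.length >= 2) {
            // Read the first two bytes as one big-endian 16-bit value.
            int first2 = ((b[0] & 0xFF) << 8) | (b[1] & 0xFF);
            if (first2 == 0xFEFF) {
                return "UTF-16BE"; // bytes FE FF
            }
            if (first2 == 0xFFFE) {
                return "UTF-16LE"; // bytes FF FE
            }
        }
        return "none";
    }
}
```

Masking each byte with 0xFF is needed because Java bytes are signed; without it, 0xEF would compare as a negative number.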

FSM_ASCII

public static final int FSM_ASCII
State ASCII, the first of the ISO 2022 state machine states. A document in an ISO-2022-based encoding uses ESC sequences called "designators" to switch character sets. The designators defined and used in ISO-2022-JP are: "ESC" + "(" + ? for ISO646 variants, and "ESC" + "$" + ? and "ESC" + "$" + "(" + ? for multibyte character sets.

See Also:
Constant Field Values

FSM_ESC

public static final int FSM_ESC
state ESC.

See Also:
Constant Field Values

FSM_ESCD

public static final int FSM_ESCD
state ESCD.

See Also:
Constant Field Values

FSM_ESCDP

public static final int FSM_ESCDP
state ESCDP.

See Also:
Constant Field Values

FSM_ESCP

public static final int FSM_ESCP
state ESCP.

See Also:
Constant Field Values

FSM_NONASCII

public static final int FSM_NONASCII
state NONASCII.

See Also:
Constant Field Values
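
The six ISO 2022 states above suggest a small state machine that tracks designator sequences. The sketch below is one plausible reading of it, not the JTidy implementation: the class Iso2022Sketch, its State enum, and the exact transitions are assumptions. It treats "ESC ( B" and "ESC ( J" as switches back to an ISO646 variant and every other completed designator as a switch to a non-ASCII set.

```java
// Hypothetical sketch, not part of org.w3c.tidy.
final class Iso2022Sketch {

    // Local stand-ins for the FSM_* constants; the real values are not shown here.
    enum State { ASCII, ESC, ESCD, ESCDP, ESCP, NONASCII }

    /** Advances the state machine by one input byte. */
    static State next(State s, int c) {
        switch (s) {
            case ESC:
                if (c == '$') return State.ESCD;   // ESC $ ...
                if (c == '(') return State.ESCP;   // ESC ( ...
                return State.ASCII;                // not a designator we track
            case ESCD:
                // ESC $ ( needs one more byte; ESC $ ? is already complete.
                return c == '(' ? State.ESCDP : State.NONASCII;
            case ESCDP:
                return State.NONASCII;             // ESC $ ( ? complete
            case ESCP:
                // Assumption: B (ASCII) and J (JIS X 0201 Roman) are ISO646 variants.
                return (c == 'B' || c == 'J') ? State.ASCII : State.NONASCII;
            default:
                // ASCII or NONASCII: only ESC (0x1B) changes the state.
                return c == 0x1B ? State.ESC : s;
        }
    }

    /** Runs the machine over a sequence of bytes, starting in ASCII. */
    static State run(int... bytes) {
        State s = State.ASCII;
        for (int b : bytes) {
            s = next(s, b);
        }
        return s;
    }
}
```

For example, feeding the bytes ESC, '$', 'B' leaves the machine in NONASCII, and a later ESC, '(', 'B' returns it to ASCII.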

MAX_UTF8_FROM_UCS4

public static final int MAX_UTF8_FROM_UCS4
Max UTF-8 valid char value.

See Also:
Constant Field Values

MAX_UTF16_FROM_UCS4

public static final int MAX_UTF16_FROM_UCS4
Max UTF-16 value.

See Also:
Constant Field Values

LOW_UTF16_SURROGATE

public static final int LOW_UTF16_SURROGATE
UTF-16 low surrogate.

See Also:
Constant Field Values

UTF16_SURROGATES_BEGIN

public static final int UTF16_SURROGATES_BEGIN
UTF-16 surrogates begin.

See Also:
Constant Field Values

UTF16_LOW_SURROGATE_BEGIN

public static final int UTF16_LOW_SURROGATE_BEGIN
UTF-16 surrogate pair areas: low surrogates begin.

See Also:
Constant Field Values

UTF16_LOW_SURROGATE_END

public static final int UTF16_LOW_SURROGATE_END
UTF-16 surrogate pair areas: low surrogates end.

See Also:
Constant Field Values

UTF16_HIGH_SURROGATE_BEGIN

public static final int UTF16_HIGH_SURROGATE_BEGIN
UTF-16 surrogate pair areas: high surrogates begin.

See Also:
Constant Field Values

UTF16_HIGH_SURROGATE_END

public static final int UTF16_HIGH_SURROGATE_END
UTF-16 surrogate pair areas: high surrogates end.

See Also:
Constant Field Values

HIGH_UTF16_SURROGATE

public static final int HIGH_UTF16_SURROGATE
UTF-16 high surrogate.

See Also:
Constant Field Values
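
As an illustration of what the surrogate constants above are for, here is a minimal sketch of UTF-16 surrogate pair arithmetic as defined by the Unicode Standard. The class SurrogateSketch and its field names are hypothetical. They deliberately say "lead" and "trail" rather than "high" and "low", because this page does not show which numeric range each EncodingUtils constant refers to.

```java
// Hypothetical sketch, not part of org.w3c.tidy.
final class SurrogateSketch {

    static final int SUPPLEMENTARY_BEGIN = 0x10000; // first code point needing a pair
    static final int MAX_CODE_POINT = 0x10FFFF;     // last valid Unicode code point
    static final int LEAD_BEGIN = 0xD800;           // first unit of a pair: D800..DBFF
    static final int LEAD_END = 0xDBFF;
    static final int TRAIL_BEGIN = 0xDC00;          // second unit of a pair: DC00..DFFF
    static final int TRAIL_END = 0xDFFF;

    /** Splits a supplementary code point into {lead, trail} code units. */
    static int[] split(int codePoint) {
        if (codePoint < SUPPLEMENTARY_BEGIN || codePoint > MAX_CODE_POINT) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int v = codePoint - SUPPLEMENTARY_BEGIN;   // 20-bit value
        return new int[] {
            LEAD_BEGIN + (v >> 10),                // top 10 bits
            TRAIL_BEGIN + (v & 0x3FF)              // bottom 10 bits
        };
    }

    /** Combines a lead and a trail code unit back into one code point. */
    static int combine(int lead, int trail) {
        if (lead < LEAD_BEGIN || lead > LEAD_END
                || trail < TRAIL_BEGIN || trail > TRAIL_END) {
            throw new IllegalArgumentException("not a valid surrogate pair");
        }
        return SUPPLEMENTARY_BEGIN
                + ((lead - LEAD_BEGIN) << 10)
                + (trail - TRAIL_BEGIN);
    }
}
```

The JDK offers the same arithmetic through Character.toChars and Character.toCodePoint, so the sketch is only meant to make the ranges and bit shifts visible.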
Method Detail

decodeWin1252

protected static int decodeWin1252(int c)
Function for conversion from Windows-1252 to Unicode.

Parameters:
c - char to decode
Returns:
decoded char
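
Because decodeWin1252 is protected, it cannot be called from outside the org.w3c.tidy package. The sketch below shows the same kind of conversion using the JDK's own "windows-1252" charset instead. The class Win1252Sketch is hypothetical, and its results for individual byte values may differ from what decodeWin1252 returns, particularly for the bytes that Windows-1252 leaves undefined.

```java
import java.nio.charset.Charset;

// Hypothetical sketch, not part of org.w3c.tidy.
final class Win1252Sketch {

    private static final Charset CP1252 = Charset.forName("windows-1252");

    /** Maps one Windows-1252 byte value (0..255) to a Unicode code point. */
    static int decode(int c) {
        if (c < 0 || c > 0xFF) {
            throw new IllegalArgumentException("not a single byte: " + c);
        }
        // Decode a one-byte array and return the resulting character.
        return new String(new byte[] {(byte) c}, CP1252).charAt(0);
    }
}
```

Only the range 0x80 to 0x9F differs from ISO-8859-1; for instance, byte 0x80 maps to U+20AC (the euro sign), while 0x41 stays U+0041.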

decodeMacRoman

protected static int decodeMacRoman(int c)
Function to convert from MacRoman to Unicode.

Parameters:
c - char to decode
Returns:
decoded char


Copyright © 2000-2006 sourceforge. All Rights Reserved.