java.lang.Object
com.lizardtech.djvu.DjVuObject
com.lizardtech.djvu.text.DjVuText
public class DjVuText
This class implements annotations understood by the DjVu plugins and encoders, using the contents of TXT chunks. The contents of the FORM:TEXT should be passed to decode for parsing, which initializes this class and fills in the decoded data.
Description of the text contained in a DjVu page. This class contains the textual data for the page. It describes the text as a hierarchy of zones corresponding to page, column, region, paragraph, line, word, and so on. The piece of text associated with each zone is represented by an offset and a length describing a segment of a global UTF-8 encoded byte array.
Constants are used to tell what a zone describes. This can be useful for a copy/paste application. The deeper we go into the hierarchy, the higher the constant.
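The offset-and-length addressing described above can be illustrated without the DjVu library. The following is a minimal sketch in plain Java; the class name ZoneSegmentSketch, the segment helper, and the sample text are hypothetical and are not part of com.lizardtech.djvu.text.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: not part of the DjVu API.
final class ZoneSegmentSketch {
    // Decode the piece of text a zone covers, given its byte offset and byte length
    // within a UTF-8 encoded byte array.
    static String segment(byte[] utf8Text, int offset, int length) {
        return new String(utf8Text, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] text = "caf\u00e9 au lait".getBytes(StandardCharsets.UTF_8);
        // The first word is 4 characters but 5 bytes, because U+00E9 takes two bytes in UTF-8.
        System.out.println(segment(text, 0, 5)); // first word
        System.out.println(segment(text, 6, 2)); // second word
    }
}
```

Offsets and lengths here are byte counts, not character counts, so a zone covering non-ASCII text spans more bytes than it has characters.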
Nested Class Summary | | |
---|---|---|
static class | DjVuText.Zone | Data structure representing document textual components. |
Field Summary | | |
---|---|---|
static int | CHARACTER | Indicates a character zone. |
static int | COLUMN | Indicates a column zone. |
static int | end_of_column | VT: Vertical Tab |
static int | end_of_line | LF: Line Feed |
static int | end_of_paragraph | US: Unit Separator |
static int | end_of_region | GS: Group Separator |
boolean | isUTF8 | True if UTF8 encoded. |
static int | LINE | Indicates a line zone. |
static int | PAGE | Indicates a page zone. |
DjVuText.Zone | page_zone | Main zone in the document. |
static int | PARAGRAPH | Indicates a paragraph zone. |
static int | REGION | Indicates a region zone. |
protected byte[] | textByteArray | Textual data for this page. |
static int | WORD | Indicates a word zone. |
Fields inherited from class com.lizardtech.djvu.DjVuObject |
---|
hasReferences |
Constructor Summary | |
---|---|
DjVuText() | Creates a new DjVuText object. |
Method Summary | | |
---|---|---|
static DjVuText | createDjVuText(DjVuInterface ref) | Creates an instance of DjVuText with the options inherited from the specified reference. |
void | decode(DataPool pool) | Decodes the hidden text layer TXT into internal representation. |
java.util.Vector | find_text_with_rect(GRect box, java.lang.StringBuffer text) | Find the text specified by the rectangles. |
java.util.Vector | find_text_with_rect(GRect box, java.lang.StringBuffer text, int padding) | Find the text specified by the rectangles. |
void | get_zones(int zone_type, DjVuText.Zone parent, java.util.Vector zone_list) | Get all zones of zone type zone_type under node parent. |
int | getLength(int from, int end) | Count the number of characters. |
java.lang.String | getString(int start, int end) | Query the string from the specified range of bytes. |
boolean | has_valid_zones() | Tests whether there is a meaningful zone hierarchy. |
DjVuText | init(DataPool pool) | Searches a file for TXTz and TXTa chunks and decodes each of them. |
DjVuText | init(IFFInputStream iff) | Searches a file for TXTz and TXTa chunks and decodes each of them. |
boolean | isImageData() | Query if this is image data. |
int | length() | Get the number of bytes of hidden text. |
void | normalize_text() | Normalize textual data. |
int | search_string(java.util.Vector zone_list, java.lang.String string, int from, boolean search_fwd, boolean match_case) | Searches the TXT chunk for the given string. |
int | search_string(java.util.Vector zone_list, java.lang.String string, int from, boolean search_fwd, boolean match_case, boolean whole_word) | Searches the TXT chunk for the given string. |
void | setTextByteArray(byte[] textByteArray) | Set the text data from an array of bytes. |
int | startsWith(java.lang.String substring, int from, boolean match_case) | Returns the position of the first character beyond the found string, if the text contains the same words as the substring in the same order (but possibly with a different number of separators between words). |
java.lang.String | toString() | Query the entire text layer as a string. |
Methods inherited from class com.lizardtech.djvu.DjVuObject |
---|
checkLockTime, create, create, createSoftReference, createWeakReference, getDjVuOptions, getFromReference, invoke, setDjVuOptions |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Methods inherited from interface com.lizardtech.djvu.DjVuInterface |
---|
getDjVuOptions, setDjVuOptions |
Field Detail |
---|
public static final int PAGE
public static final int COLUMN
public static final int REGION
public static final int PARAGRAPH
public static final int LINE
public static final int WORD
public static final int CHARACTER
public static final int end_of_column
public static final int end_of_region
public static final int end_of_paragraph
public static final int end_of_line
public DjVuText.Zone page_zone
public boolean isUTF8
protected byte[] textByteArray
Name | Octal | Ascii name |
---|---|---|
DjVuText.end_of_column | 013 | VT, Vertical Tab |
DjVuText.end_of_region | 035 | GS, Group Separator |
DjVuText.end_of_paragraph | 037 | US, Unit Separator |
DjVuText.end_of_line | 012 | LF, Line Feed |
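As a rough illustration of how the separator values in the table can be used, the sketch below splits a string at one of these control characters. It is plain Java and does not use the DjVu classes; the class SeparatorSketch, its constants, and the splitAt helper are hypothetical names, and the assumption that separators appear inline in the decoded text is mine, not something this page states.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: the values mirror the table above,
// but this class is not part of the DjVu API.
final class SeparatorSketch {
    static final int END_OF_COLUMN = 013;    // VT, octal 013 = decimal 11
    static final int END_OF_REGION = 035;    // GS, octal 035 = decimal 29
    static final int END_OF_PARAGRAPH = 037; // US, octal 037 = decimal 31
    static final int END_OF_LINE = 012;      // LF, octal 012 = decimal 10

    // Split text into pieces at every occurrence of the given separator character.
    // A trailing empty piece (text ending with the separator) is dropped.
    static List<String> splitAt(String text, int separator) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == separator) {
                parts.add(text.substring(start, i));
                start = i + 1;
            }
        }
        if (start < text.length()) {
            parts.add(text.substring(start));
        }
        return parts;
    }

    public static void main(String[] args) {
        String text = "first line\nsecond line\n";
        System.out.println(splitAt(text, END_OF_LINE));
    }
}
```

Note that a leading zero makes a Java integer literal octal, which is why the constants can be written exactly as they appear in the table.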
Constructor Detail |
---|
public DjVuText()
Method Detail |
---|
public static DjVuText createDjVuText(DjVuInterface ref)
ref
- Object to inherit DjVuOptions from.
public boolean isImageData()
isImageData
in interface Codec
public int getLength(int from, int end)
from
- byte position to start counting from
end
- byte position to stop counting
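Because positions are byte offsets while the result is a character count, the two differ for non-ASCII text. The sketch below shows one way such a count could be computed over UTF-8 data. It is not the library's implementation: the class CharCountSketch and the countCharacters helper are hypothetical, and treating end as exclusive and counting code points are assumptions.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: not the DjVuText.getLength implementation.
final class CharCountSketch {
    // Count the UTF-8 encoded characters (code points) whose first byte
    // lies in the byte range [from, end). The end position is assumed exclusive.
    static int countCharacters(byte[] utf8, int from, int end) {
        int count = 0;
        for (int i = from; i < end; i++) {
            // Continuation bytes look like 10xxxxxx; any other byte starts a character.
            if ((utf8[i] & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] text = "caf\u00e9 au lait".getBytes(StandardCharsets.UTF_8);
        System.out.println(countCharacters(text, 0, text.length));
    }
}
```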
public java.lang.String getString(int start, int end)
start
- byte position of the first character.
end
- byte position to end the string
public void setTextByteArray(byte[] textByteArray)
textByteArray
- array of bytes to interpret
public void decode(DataPool pool) throws java.io.IOException
decode
in interface Codec
pool
- The chunk to decode.
java.io.IOException
- if an error occurs.
public java.util.Vector find_text_with_rect(GRect box, java.lang.StringBuffer text, int padding)
box
- bounding box to search
text
- buffer to fill with the text found
padding
- number of pixels to add to each rectangle
public java.util.Vector find_text_with_rect(GRect box, java.lang.StringBuffer text)
box
- bounding box to search
text
- buffer to fill with the text found
public void get_zones(int zone_type, DjVuText.Zone parent, java.util.Vector zone_list)
zone_type
- the zone type to list.
parent
- parent zone to start from
zone_list
- vector to add the zones to
public boolean has_valid_zones()
public DjVuText init(IFFInputStream iff) throws java.io.IOException
iff
- input stream to read.
java.io.IOException
- if an IO error occurs.
public DjVuText init(DataPool pool) throws java.io.IOException
pool
- input stream to read.
java.io.IOException
- if an IO error occurs.
public int length()
public void normalize_text()
public int search_string(java.util.Vector zone_list, java.lang.String string, int from, boolean search_fwd, boolean match_case, boolean whole_word)
zone_list
- A list of smallest zones covering the text.
string
- String to be found. May contain spaces as word separators.
from
- Position returned by last search. If from is out of bounds of textByteArray it will be set to -1 for searching forward and textByteArray.length for searching backwards.
search_fwd
- TRUE means to search forward; FALSE means to search backward.
match_case
- If set to FALSE the search will be case-insensitive.
whole_word
- If set to TRUE the function will try to find a whole word matching the passed string. The word separators are all blank and punctuation characters. The passed string may not contain word separators; that is, it must be a whole word.
java.lang.IllegalArgumentException
- if no non-whitespace characters are specified in the search string
public int search_string(java.util.Vector zone_list, java.lang.String string, int from, boolean search_fwd, boolean match_case)
zone_list
- A list of smallest zones covering the text.
string
- String to be found. May contain spaces as word separators.
from
- Position returned by last search. If from is out of bounds of textByteArray it will be set to -1 for searching forward and textByteArray.length for searching backwards.
search_fwd
- TRUE means to search forward; FALSE means to search backward.
match_case
- If set to FALSE the search will be case-insensitive.
java.lang.IllegalArgumentException
- if no non-whitespace characters are specified in the search string
public int startsWith(java.lang.String substring, int from, boolean match_case)
substring
- string to search for
from
- start position
match_case
- true if case sensitive
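The separator-tolerant matching that startsWith describes can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the library's code: the class StartsWithSketch is hypothetical, it works on a String rather than on textByteArray, it treats only whitespace as a separator, and returning -1 when there is no match is an assumption, since this page does not say what is returned on failure.

```java
// Hypothetical illustration only: not the DjVuText.startsWith implementation.
final class StartsWithSketch {
    // Returns the position just past the matched words, or -1 (assumed) if the
    // text at 'from' does not contain the same words in the same order.
    static int startsWith(String text, String substring, int from, boolean matchCase) {
        int pos = from;
        for (String word : substring.trim().split("\\s+")) {
            // Skip any run of whitespace separators before the next word.
            while (pos >= 0 && pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
                pos++;
            }
            // regionMatches takes an ignoreCase flag, hence the negation.
            if (!text.regionMatches(!matchCase, pos, word, 0, word.length())) {
                return -1;
            }
            pos += word.length();
        }
        return pos;
    }

    public static void main(String[] args) {
        // Three spaces in the text still match the single space in the query.
        System.out.println(startsWith("hello   world again", "hello world", 0, true));
    }
}
```

Matching word by word, rather than comparing the raw strings, is what lets a query with one space match text whose words are separated by several spaces or by a line feed.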
public java.lang.String toString()
toString
in class java.lang.Object