Class Acme.HtmlScanner
java.lang.Object
   |
   +----java.io.InputStream
           |
           +----java.io.FilterInputStream
                   |
                   +----Acme.HtmlScanner
public class HtmlScanner
extends FilterInputStream
A fast HTML scanning class.
This is a FilterInputStream that lets you read an HTML file, and at
the same time scans it for URLs. You get the full text of the file
through the normal read() calls, and you also get special callbacks
with the URL strings.
The scanning is done by a hand-built finite-state machine.
Fetch the software.
Fetch the entire Acme package.
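The pass-through-plus-callback idea described above can be illustrated with a small self-contained sketch. This is not the Acme implementation: the names UrlCallback, HrefSketchStream and scan are hypothetical, and the state machine here only recognizes double-quoted href attributes, which is an assumption made to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback type; the real HtmlObserver interface is not shown on this page.
interface UrlCallback {
    void gotUrl(String url);
}

// Illustrative filter: bytes reach the caller unchanged while a small
// state machine watches for href="..." attribute values.
class HrefSketchStream extends FilterInputStream {
    private static final String KEY = "href=\"";
    private final UrlCallback callback;
    private int matched = 0;             // how many chars of KEY have matched so far
    private boolean gettingUrl = false;  // true while accumulating a URL
    private final StringBuilder url = new StringBuilder();

    HrefSketchStream(InputStream in, UrlCallback callback) {
        super(in);
        this.callback = callback;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int n = in.read(b, off, len);
        for (int i = 0; i < n; i++) {
            step((char) (b[off + i] & 0xff));  // scan every byte handed to the caller
        }
        return n;
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        int n;
        do { n = read(one, 0, 1); } while (n == 0);
        return n == -1 ? -1 : (one[0] & 0xff);
    }

    // One transition of the state machine.
    private void step(char c) {
        if (gettingUrl) {
            if (c == '"') {                    // closing quote ends the URL
                callback.gotUrl(url.toString());
                url.setLength(0);
                gettingUrl = false;
            } else {
                url.append(c);
            }
        } else if (Character.toLowerCase(c) == KEY.charAt(matched)) {
            if (++matched == KEY.length()) {   // saw the whole key: start a URL
                gettingUrl = true;
                matched = 0;
            }
        } else {
            // Mismatch: restart, allowing this char to begin a new match.
            matched = (Character.toLowerCase(c) == KEY.charAt(0)) ? 1 : 0;
        }
    }

    // Convenience helper for the sketch: read everything, return the URLs seen.
    static List<String> scan(String html) {
        List<String> found = new ArrayList<>();
        byte[] data = html.getBytes(StandardCharsets.ISO_8859_1);
        try (InputStream s = new HrefSketchStream(new ByteArrayInputStream(data), found::add)) {
            byte[] buf = new byte[16];
            while (s.read(buf, 0, buf.length) != -1) {
                // a real caller would consume the page text in buf here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }
}
```

The scanner state lives in fields rather than locals, so a URL that straddles two read() calls is still reported once, in full.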
-
gettingUrl
- Whether the interpreter is currently accumulating a URL.
-
HtmlScanner(InputStream, URL, HtmlObserver)
- Constructor.
-
HtmlScanner(InputStream, URL, HtmlObserver, Object)
- Constructor with clientData.
-
addObserver(HtmlObserver)
- Add an extra observer to this scanner.
-
addObserver(HtmlObserver, Object)
- Add an extra observer to this scanner, with clientData.
-
close()
- Override close() with one that makes sure the entire file gets
read, so that all its URLs get extracted, even if the caller isn't
interested in the data.
-
finalize()
- Finalizer that tries to make sure our modified close() gets
called.
-
markSupported()
- Disallow mark()/reset().
-
read()
- Overridden so that single-byte reads go through the
read(byte[], int, int) method.
-
read(byte[])
- Overridden so that whole-array reads go through the
read(byte[], int, int) method.
-
read(byte[], int, int)
- Special version of read() that runs all data through the HTML scanner.
-
skip(long)
- Overridden so that skipped data still goes through the
read(byte[], int, int) method.
-
substitute(int, String)
- Can be used to change the scan buffer in the middle of a scan.
gettingUrl
protected boolean gettingUrl
- Whether the interpreter is currently accumulating a URL.
HtmlScanner
public HtmlScanner(InputStream s,
URL thisUrl,
HtmlObserver observer)
- Constructor. If the client is not interested in getting called back
with URLs, observer can be null (but then there's not much point in
using this class).
HtmlScanner
public HtmlScanner(InputStream s,
URL thisUrl,
HtmlObserver observer,
Object clientData)
- Constructor with clientData. If the client is not interested in
getting called back with URLs, observer can be null (but then there's
not much point in using this class).
addObserver
public void addObserver(HtmlObserver observer)
- Add an extra observer to this scanner. Multiple observers get called
in the order they were added.
addObserver
public void addObserver(HtmlObserver observer,
Object clientData)
- Add an extra observer to this scanner, with clientData. Multiple
observers get called in the order they were added.
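The ordering guarantee for multiple observers can be sketched as an ordered list of (observer, clientData) pairs. The names SketchObserver, ObserverListSketch, fireUrl and demo are hypothetical, and passing null clientData for the one-argument addObserver is an assumption, not something this page states.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical observer shape, for illustration only.
interface SketchObserver {
    void gotUrl(String url, Object clientData);
}

class ObserverListSketch {
    // One registration: an observer plus the opaque clientData handed back to it.
    private static final class Entry {
        final SketchObserver observer;
        final Object clientData;

        Entry(SketchObserver observer, Object clientData) {
            this.observer = observer;
            this.clientData = clientData;
        }
    }

    // An ArrayList keeps insertion order, which gives the call order.
    private final List<Entry> entries = new ArrayList<>();

    void addObserver(SketchObserver observer) {
        addObserver(observer, null);  // assumption: no clientData means null
    }

    void addObserver(SketchObserver observer, Object clientData) {
        entries.add(new Entry(observer, clientData));
    }

    // Notify every observer, in the order they were added.
    void fireUrl(String url) {
        for (Entry e : entries) {
            e.observer.gotUrl(url, e.clientData);
        }
    }

    // Small demonstration: two observers, one registered with clientData.
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        ObserverListSketch scanner = new ObserverListSketch();
        scanner.addObserver((url, data) -> log.add("first:" + url + ":" + data));
        scanner.addObserver((url, data) -> log.add("second:" + url + ":" + data), "tag");
        scanner.fireUrl("a.html");
        return log;
    }
}
```

Storing clientData next to each observer lets one observer object be registered with several scanners and still tell them apart when called back.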
read
public int read(byte b[],
int off,
int len) throws IOException
- Special version of read() that runs all data through the HTML scanner.
- Overrides:
- read in class FilterInputStream
close
public void close() throws IOException
- Override close() with one that makes sure the entire file gets
read, so that all its URLs get extracted, even if the caller isn't
interested in the data.
- Overrides:
- close in class FilterInputStream
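A close() that drains the stream before closing it might look like the following sketch. DrainOnCloseSketch, its scanned counter and demo are hypothetical names; the counter stands in for the real URL scanning, and the 4096-byte scratch buffer is an arbitrary choice.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class DrainOnCloseSketch extends FilterInputStream {
    long scanned = 0;                // bytes that have passed through read(byte[], int, int)
    private boolean closed = false;

    DrainOnCloseSketch(InputStream in) {
        super(in);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int n = in.read(b, off, len);
        if (n > 0) {
            scanned += n;            // a real scanner would examine the bytes here
        }
        return n;
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return;                  // make repeated close() calls harmless
        }
        closed = true;
        try {
            // Read and discard whatever the caller left unread,
            // so the scanner still sees the whole file.
            byte[] scratch = new byte[4096];
            while (read(scratch, 0, scratch.length) != -1) {
                // nothing to do: the side effect of read() is the point
            }
        } finally {
            in.close();
        }
    }

    // Read only the first readFirst bytes of a total-byte stream, then close.
    static long demo(int total, int readFirst) {
        try {
            DrainOnCloseSketch s =
                new DrainOnCloseSketch(new ByteArrayInputStream(new byte[total]));
            s.read(new byte[readFirst], 0, readFirst);
            s.close();
            return s.scanned;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

One consequence of this design is that close() can block for as long as it takes to read the rest of the input, which matters for slow network streams.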
finalize
protected void finalize() throws Throwable
- Finalizer that tries to make sure our modified close() gets
called.
- Throws: Throwable
- if there's a problem
- Overrides:
- finalize in class Object
read
public int read() throws IOException
- Overridden so that single-byte reads go through the
read(byte[], int, int) method above.
- Overrides:
- read in class FilterInputStream
read
public int read(byte b[]) throws IOException
- Overridden so that whole-array reads go through the
read(byte[], int, int) method above.
- Overrides:
- read in class FilterInputStream
skip
public long skip(long n) throws IOException
- Overridden so that skipped data still goes through the
read(byte[], int, int) method above.
- Overrides:
- skip in class FilterInputStream
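Routing read(), read(byte[]) and skip(long) through a single read(byte[], int, int) can be sketched as below. FunnelReadSketch, scannedBytes and demo are hypothetical names, and the counter stands in for the real scanning work. The overrides are needed because FilterInputStream's own read() and skip() delegate straight to the wrapped stream and would bypass the scanner.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class FunnelReadSketch extends FilterInputStream {
    int scannedBytes = 0;   // stands in for the real scanning work

    FunnelReadSketch(InputStream in) {
        super(in);
    }

    // The single choke point: every byte delivered or skipped passes through here.
    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int n = in.read(b, off, len);
        if (n > 0) {
            scannedBytes += n;
        }
        return n;
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        int n;
        do { n = read(one, 0, 1); } while (n == 0);
        return n == -1 ? -1 : (one[0] & 0xff);
    }

    @Override
    public int read(byte[] b) throws IOException {
        return read(b, 0, b.length);
    }

    @Override
    public long skip(long n) throws IOException {
        // Read and discard instead of seeking, so skipped bytes are still scanned.
        byte[] scratch = new byte[(int) Math.min(Math.max(n, 0), 4096)];
        long done = 0;
        while (done < n) {
            int got = read(scratch, 0, (int) Math.min(scratch.length, n - done));
            if (got == -1) {
                break;       // end of stream before n bytes were skipped
            }
            done += got;
        }
        return done;
    }

    @Override
    public boolean markSupported() {
        return false;        // rewinding would feed the same bytes to the scanner twice
    }

    // Exercise all three entry points and report how many bytes were scanned.
    static int demo() {
        try {
            FunnelReadSketch s = new FunnelReadSketch(new ByteArrayInputStream(new byte[20]));
            s.read();              // 1 byte
            s.read(new byte[4]);   // 4 bytes
            s.skip(5);             // 5 bytes
            return s.scannedBytes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```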
markSupported
public boolean markSupported()
- Disallow mark()/reset().
- Overrides:
- markSupported in class FilterInputStream
substitute
protected void substitute(int oldLen,
String newStr)
- Can be used to change the scan buffer in the middle of a scan.
Black Magic! Dangerous! Be careful! For use only by
HtmlEditScanner - any other use voids warranty.
ACME Java ACME Labs