java.lang.Object
  |
  +--org.ilrt.inkling.app.Scutter
A Scutter RDF harvester (see http://rdfweb.org/topic/ScutterSpec).
todo:
- fix robots, forcelocal, etc.
- sometimes gets stuck
- timeout?
- delete a thread when too old (tmp fix: random waiting)
- randomise urls list and remove duplicates (ok)
- also robots.txt
- check local copy first
- add some more options
- force check local copy
- set db and driver
- check scutterplan name out and in
- make start and end file choosable
- exclude some urls
- store url info in SQL database
- check if visited before
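The todo item about randomising the url list and removing duplicates can be illustrated with a minimal sketch. This is not the Scutter implementation: the class name PlanShuffleSketch and the method dedupeAndShuffle are hypothetical, and a plain List of Strings stands in for the Vector of ScutterURLData that the class works with.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;

/** Hypothetical helper, not part of org.ilrt.inkling.app.Scutter. */
final class PlanShuffleSketch {

    /**
     * Returns a new list holding each url once, in random order.
     * The input list is left untouched.
     */
    static List<String> dedupeAndShuffle(List<String> urls, Random rng) {
        // A LinkedHashSet drops repeated urls while keeping first-seen order.
        List<String> unique = new ArrayList<>(new LinkedHashSet<>(urls));
        // Shuffling spreads requests so one host is less likely to be hit in a burst.
        Collections.shuffle(unique, rng);
        return unique;
    }
}
```

Passing the Random in explicitly is a design choice for this sketch: a fixed seed makes a run reproducible when debugging a plan.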
| Field Summary | |
static java.lang.String |
DISALLOW
|
boolean |
forcelocal
|
| Constructor Summary | |
Scutter()
|
|
Scutter(java.lang.String db)
db is a JDBC database url, e.g. |
|
| Method Summary | |
void |
addSingleUrl(java.lang.String url)
adds a single url to the store and a scutterplan called 'scutter.new2.rdf' |
boolean |
checkAvoid(java.lang.String url)
Checks whether the given url is one that the scutter should avoid. |
void |
checkDone(int i)
Checks whether all the threads have finished so that the scutterplan can be saved |
boolean |
checkModified(ScutterURLData data)
etag methods - from http://www.hackdiary.com/archives/000028.html by Matt Biddulph |
java.lang.String |
getdb()
|
java.lang.String |
getDriver()
|
java.util.Vector |
getPlan()
|
java.util.Vector |
getThreads()
|
static void |
main(java.lang.String[] args)
|
java.util.Vector |
readScutter(java.lang.String uri)
Reads in a scutterplan, creating a ScutterURLData for each url, and returns a Vector of these. |
boolean |
robotSafe(java.lang.String u)
robots.txt handling |
void |
runScutter()
reads a Scutterplan (e.g. |
void |
saveScutter(java.util.Vector newplan,
java.lang.String filen)
Saves Scutterplan with the filename specified |
void |
setdb(java.lang.String db)
|
void |
setDriver(java.lang.String driver)
|
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
public boolean forcelocal
public static final java.lang.String DISALLOW
| Constructor Detail |
public Scutter()
public Scutter(java.lang.String db)
| Method Detail |
public java.util.Vector getPlan()
public static void main(java.lang.String[] args)
public void runScutter()
public void checkDone(int i)
public boolean checkAvoid(java.lang.String url)
public java.util.Vector readScutter(java.lang.String uri)
public void saveScutter(java.util.Vector newplan,
java.lang.String filen)
public void addSingleUrl(java.lang.String url)
public boolean checkModified(ScutterURLData data)
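The summary for checkModified mentions "etag methods". A conditional request based on ETags could look roughly like the sketch below. It is an illustration only: the fields of ScutterURLData are not shown on this page, so the sketch takes a url and a previously stored ETag as plain Strings, and the names EtagSketch and isModified are hypothetical.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical helper, not part of org.ilrt.inkling.app.Scutter. */
final class EtagSketch {

    /**
     * Decides whether a resource changed, given the ETag stored from the
     * last fetch, the HTTP status of the new request and the ETag the
     * server returned this time (may be null).
     */
    static boolean isModified(String storedEtag, int status, String currentEtag) {
        // 304 Not Modified: the server confirmed our copy is still current.
        if (status == HttpURLConnection.HTTP_NOT_MODIFIED) {
            return false;
        }
        // No stored ETag, or a different one: treat the resource as changed.
        return storedEtag == null || !storedEtag.equals(currentEtag);
    }

    /** Sends a HEAD request with If-None-Match and applies isModified. */
    static boolean checkModified(String url, String storedEtag) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("HEAD");
        if (storedEtag != null) {
            // Ask the server to answer 304 if its current ETag matches ours.
            conn.setRequestProperty("If-None-Match", storedEtag);
        }
        try {
            return isModified(storedEtag, conn.getResponseCode(),
                    conn.getHeaderField("ETag"));
        } finally {
            conn.disconnect();
        }
    }
}
```

Keeping the decision in isModified, separate from the network call, lets the comparison be checked without a live server.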
public void setdb(java.lang.String db)
public void setDriver(java.lang.String driver)
public java.lang.String getdb()
public java.lang.String getDriver()
public java.util.Vector getThreads()
public boolean robotSafe(java.lang.String u)
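The robotSafe method is documented only as "robots.txt handling", and the value of the DISALLOW field is not shown. As a rough sketch of what such a check can involve, the code below collects the Disallow prefixes that apply to all agents and tests a path against them. The class RobotsSketch, its method signatures and the "Disallow:" value are assumptions for illustration; the sketch ignores Allow rules, wildcards and agent-specific groups.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper, not part of org.ilrt.inkling.app.Scutter. */
final class RobotsSketch {

    // Assumed value; the real Scutter.DISALLOW constant is not shown on this page.
    static final String DISALLOW = "Disallow:";
    static final String USER_AGENT = "User-agent:";

    /** Collects Disallow path prefixes from groups that include "User-agent: *". */
    static List<String> disallowedPrefixes(String robotsTxt) {
        List<String> prefixes = new ArrayList<>();
        boolean applies = false;        // does the current group cover all agents?
        boolean inGroupHeader = false;  // are we inside a run of User-agent lines?
        for (String raw : robotsTxt.split("\\r?\\n")) {
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            if (line.isEmpty()) {
                continue;
            }
            String lower = line.toLowerCase(Locale.ROOT);
            if (lower.startsWith(USER_AGENT.toLowerCase(Locale.ROOT))) {
                if (!inGroupHeader) {
                    applies = false;    // a new group starts here
                }
                applies |= line.substring(USER_AGENT.length()).trim().equals("*");
                inGroupHeader = true;
            } else {
                inGroupHeader = false;
                if (applies && lower.startsWith(DISALLOW.toLowerCase(Locale.ROOT))) {
                    String path = line.substring(DISALLOW.length()).trim();
                    if (!path.isEmpty()) {   // an empty Disallow allows everything
                        prefixes.add(path);
                    }
                }
            }
        }
        return prefixes;
    }

    /** Returns true when the path matches none of the disallowed prefixes. */
    static boolean robotSafe(String path, List<String> prefixes) {
        for (String prefix : prefixes) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }
}
```

In use, a harvester would fetch /robots.txt once per host, pass its text to disallowedPrefixes, and call robotSafe for each path on that host before requesting it.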