org.apache.uima.examples.cpe
Class PersonTitleDBWriterCasConsumer

java.lang.Object
  extended by org.apache.uima.resource.Resource_ImplBase
      extended by org.apache.uima.resource.ConfigurableResource_ImplBase
          extended by org.apache.uima.collection.CasConsumer_ImplBase
              extended by org.apache.uima.examples.cpe.PersonTitleDBWriterCasConsumer
All Implemented Interfaces:
CasObjectProcessor, CasProcessor, CasConsumer, ConfigurableResource, Resource

public class PersonTitleDBWriterCasConsumer
extends CasConsumer_ImplBase

A simple CAS consumer that creates a Derby (Cloudscape) database in the file system. Derby itself can be obtained from http://incubator.apache.org/derby/

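This CAS Consumer takes one parameter, whose name is given by PARAM_OUTPUTDIR: the path of a directory into which the Derby database will be written.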

When run, this CAS Consumer:

- deletes all the databases at the system location (!!!),
- creates a new database (this takes the most time, on the order of 10+ seconds),
- creates a table in the database to hold instances of the PersonTitle annotation, and
- adds an entry to the database for each PersonTitle annotation in each CAS.

To use:

- add derby.jar to the classpath when you start the CPE GUI;
- run the CPE GUI and select the Name Recognizer and Person Title Annotator aggregate;
- a good sample collection reader is the FileSystemCollectionReader; and
- good sample data is in /examples/data.

The processing is set up to handle multiple CASes. The end of the collection is signalled by the collectionProcessComplete call.

Updates to the database are batched, with a batch size of 50. A larger batch size takes more Java heap space but may run more efficiently.

The table is populated with a slightly denormalized form of the data: the URI of the document is included with every record.
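The batching behaviour described above can be illustrated with a small, self-contained sketch. This is not the code of PersonTitleDBWriterCasConsumer; the BatchBuffer class, its method names, and the sink callback are hypothetical and exist only to show the pattern: rows accumulate until the batch size is reached, and a final flush at the end of the collection writes whatever remains.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper (not part of UIMA): buffers rows and emits them in fixed-size batches. */
class BatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;          // receives each completed batch
    private final List<T> pending = new ArrayList<>();

    BatchBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Would be called once per row, for example for each annotation seen in processCas. */
    void add(T row) {
        pending.add(row);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Would be called at the end of the collection, for example from collectionProcessComplete. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending));     // hand over a copy, then reset the buffer
        pending.clear();
    }
}
```

In a database-backed consumer the sink would typically wrap a JDBC batch, for instance by calling PreparedStatement.addBatch() for each row and executeBatch() once per batch. A larger batch size holds more rows in memory between writes, which is the heap-space trade-off noted above.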


Field Summary
static int DB_LOAD_BATCH_SIZE
           
static int MAX_TITLE_LENGTH
           
static int MAX_URI_LENGTH
           
static java.lang.String PARAM_OUTPUTDIR
          Name of configuration parameter that must be set to the path of a directory into which the Derby Database will be written.
 
Fields inherited from interface org.apache.uima.resource.Resource
PARAM_AGGREGATE_SOFA_MAPPINGS, PARAM_CONFIG_MANAGER, PARAM_CONFIG_PARAM_SETTINGS, PARAM_PERFORMANCE_TUNING_SETTINGS, PARAM_RESOURCE_MANAGER, PARAM_UIMA_CONTEXT
 
Constructor Summary
PersonTitleDBWriterCasConsumer()
           
 
Method Summary
 void collectionProcessComplete(ProcessTrace arg0)
          Completes the processing of an entire collection.
 void initialize()
          This method is called during initialization, and does nothing by default.
 void processCas(CAS aCAS)
          Processes the CasContainer which was populated by the TextAnalysisEngines.
 
Methods inherited from class org.apache.uima.collection.CasConsumer_ImplBase
batchProcessComplete, destroy, getProcessingResourceMetaData, initialize, isReadOnly, isStateless, processCas, reconfigure, typeSystemInit
 
Methods inherited from class org.apache.uima.resource.ConfigurableResource_ImplBase
getConfigParameterValue, getConfigParameterValue, setConfigParameterValue, setConfigParameterValue
 
Methods inherited from class org.apache.uima.resource.Resource_ImplBase
getCasManager, getLogger, getMetaData, getResourceManager, getUimaContext, getUimaContextAdmin, setLogger, setMetaData
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.uima.resource.ConfigurableResource
getConfigParameterValue, getConfigParameterValue, setConfigParameterValue, setConfigParameterValue
 
Methods inherited from interface org.apache.uima.resource.Resource
getLogger, getMetaData, getResourceManager, getUimaContext, getUimaContextAdmin, setLogger
 

Field Detail

PARAM_OUTPUTDIR

public static final java.lang.String PARAM_OUTPUTDIR
Name of configuration parameter that must be set to the path of a directory into which the Derby Database will be written.

See Also:
Constant Field Values
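
As an illustration of what preparing such an output directory might involve, the following sketch clears a directory and builds an embedded Derby connection URL. It is written under assumptions and is not taken from this class: OutputDirPreparer, clearDirectory, and derbyUrl are hypothetical names, and the database name passed to derbyUrl is a placeholder. The URL form jdbc:derby:<name>;create=true is the standard Derby syntax for creating a database on first connection.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper (not part of UIMA or Derby). */
class OutputDirPreparer {

    /** Creates dir if it is missing, then deletes everything beneath it (dir itself is kept). */
    static void clearDirectory(Path dir) throws IOException {
        Files.createDirectories(dir);
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(dir)) {
            entries = walk.filter(p -> !p.equals(dir))
                          .sorted(Comparator.reverseOrder())   // children before their parents
                          .collect(Collectors.toList());
        }
        for (Path p : entries) {
            Files.delete(p);
        }
    }

    /** Builds an embedded-Derby JDBC URL that creates the named database if it does not exist. */
    static String derbyUrl(String dbName) {
        return "jdbc:derby:" + dbName + ";create=true";
    }
}
```

Because clearing the directory is destructive, the configured path should point at a location reserved for this example and not at a directory holding other Derby databases.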

MAX_URI_LENGTH

public static final int MAX_URI_LENGTH
See Also:
Constant Field Values

MAX_TITLE_LENGTH

public static final int MAX_TITLE_LENGTH
See Also:
Constant Field Values

DB_LOAD_BATCH_SIZE

public static final int DB_LOAD_BATCH_SIZE
See Also:
Constant Field Values

Constructor Detail

PersonTitleDBWriterCasConsumer

public PersonTitleDBWriterCasConsumer()

Method Detail

initialize

public void initialize()
                throws ResourceInitializationException
Description copied from class: CasConsumer_ImplBase
This method is called during initialization, and does nothing by default. Subclasses should override it to perform one-time startup logic.

Overrides:
initialize in class CasConsumer_ImplBase
Throws:
ResourceInitializationException - if a failure occurs during initialization.

processCas

public void processCas(CAS aCAS)
                throws ResourceProcessException
Processes the CasContainer which was populated by the TextAnalysisEngines.
In this case, the CAS is assumed to contain annotations of type PersonTitle, created with the PersonTitleAnnotator. These Annotations are stored in a database table called PersonTitle.

Parameters:
aCAS - CasContainer which has been populated by the TAEs
Throws:
ResourceProcessException - if there is an error in processing the Resource
See Also:
CasObjectProcessor.processCas(org.apache.uima.cas.CAS)

collectionProcessComplete

public void collectionProcessComplete(ProcessTrace arg0)
                               throws ResourceProcessException,
                                      java.io.IOException
Description copied from interface: CasProcessor
Completes the processing of an entire collection.

Specified by:
collectionProcessComplete in interface CasProcessor
Overrides:
collectionProcessComplete in class CasConsumer_ImplBase
Parameters:
arg0 - an object that records information, such as timing, about this method's execution.
Throws:
ResourceProcessException - if an exception occurs during processing
java.io.IOException - if an I/O failure occurs
See Also:
CasProcessor.collectionProcessComplete(org.apache.uima.util.ProcessTrace)


Copyright © 2012. All Rights Reserved.