Below is the API documentation for the Stetl Python code.
Main Entry Points
There are several entry points through which Stetl can be called.
The most common is the command-line script bin/stetl. This command should
be available after installation.
In some contexts, such as integrations,
you may want to call Stetl directly from Python. The entry points are then:
-
stetl.main.main()[source]
The main function, to be called from the command line, e.g. python src/main.py -c etl.cfg.
- Args:
-c, --config <config_file>: the Stetl config file.
-s, --section <section_name>: the section in the Stetl config (ini) file to execute (default is [etl]).
-a, --args <arglist>: substitutable args for symbolic, {arg}, values in the Stetl config file, in the format "arg1=foo arg2=bar" etc.
-d, --doc <class>: get component documentation, such as its configuration parameters, e.g. stetl --doc stetl.inputs.fileinput.FileInput
-h, --help: get help info.
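To illustrate the -a/--args format, here is a minimal sketch of how a string such as "arg1=foo arg2=bar" could be split into a dict and substituted for {arg} placeholders in config text. It is not Stetl's actual implementation: the helper names parse_args_string and substitute_args, and the config keys in the example, are hypothetical.

```python
def parse_args_string(arglist):
    """Split 'arg1=foo arg2=bar' into a dict (hypothetical helper, not Stetl API)."""
    args = {}
    for pair in arglist.split():
        name, sep, value = pair.partition('=')
        if not sep or not name:
            raise ValueError('expected name=value, got %r' % pair)
        args[name] = value
    return args


def substitute_args(config_text, args):
    """Replace each symbolic {name} in config_text by its value (hypothetical helper)."""
    for name, value in args.items():
        config_text = config_text.replace('{' + name + '}', value)
    return config_text


# Illustrative config keys; real Stetl components define their own.
args = parse_args_string('in_file=cities.xml out_file=cities.gml')
config = 'file_path = {in_file}\nfile_path_out = {out_file}'
result = substitute_args(config, args)
```

Placeholders for which no argument is given are left untouched in this sketch; a real tool might prefer to report them as errors.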
-
stetl.main.print_doc(class_name)[source]
Print documentation for a class, in particular its config options.
-
class stetl.etl.ETL(options_dict, args_dict=None)[source]
The main class: builds ETL Chains with connected Components from a config and lets them run.
Usually this class is called via main, but it may also be instantiated directly when integrating Stetl into other Python code.
Core Framework
The core framework is directly under the directory src/stetl.
Below are the seven main classes. Their interrelation is as follows:
One or more stetl.chain.Chain objects are built from
a Stetl ETL configuration via the stetl.factory.Factory class.
A stetl.chain.Chain consists of a set of connected stetl.component.Component objects.
A stetl.component.Component is either a stetl.input.Input, a stetl.output.Output
or a stetl.filter.Filter. Data and status flow as stetl.packet.Packet objects
from a stetl.input.Input via zero or more stetl.filter.Filter objects to a final stetl.output.Output.
As a trivial example: a stetl.input.Input could be an XML file, a stetl.filter.Filter could represent
an XSLT file and a stetl.output.Output a PostGIS database. This is effected by specialized classes in
the subpackages inputs, filters, and outputs.
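The flow from an Input via Filters to an Output can be pictured with a small stand-alone sketch. It does not use Stetl itself: SketchPacket, ListInput, UpperCaseFilter, ListOutput and run_pipeline are hypothetical stand-ins, and the end_of_stream flag is an assumed way of carrying status in a packet.

```python
class SketchPacket(object):
    """Toy packet: carries data plus an end-of-stream status flag (assumed field)."""

    def __init__(self, data=None):
        self.data = data
        self.end_of_stream = False


class ListInput(object):
    """Toy input: emits one item of a non-empty list per invoke."""

    def __init__(self, items):
        self.items = list(items)

    def invoke(self, packet):
        packet.data = self.items.pop(0)
        # Signal the end of the stream once the last item has been emitted.
        packet.end_of_stream = not self.items
        return packet


class UpperCaseFilter(object):
    """Toy filter: maps input data to upper-cased output data."""

    def invoke(self, packet):
        packet.data = packet.data.upper()
        return packet


class ListOutput(object):
    """Toy output: collects everything it receives in a list."""

    def __init__(self):
        self.written = []

    def invoke(self, packet):
        self.written.append(packet.data)
        return packet


def run_pipeline(components):
    """Push fresh packets through all components until the input is exhausted."""
    while True:
        packet = SketchPacket()
        for component in components:
            packet = component.invoke(packet)
        if packet.end_of_stream:
            break


output = ListOutput()
run_pipeline([ListInput(['amsterdam', 'utrecht']), UpperCaseFilter(), output])
```

After the run, output.written holds the upper-cased items in their original order.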
-
class stetl.factory.Factory[source]
Object and class Factory (Pattern).
Based on: http://stackoverflow.com/questions/2226330/instantiate-a-python-class-from-a-name
-
class_forname(class_string)[source]
Returns the class object specified by a string.
- Args:
- class_string: The string representing a class.
- Raises:
- ValueError if module part of the class is not specified.
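Looking up a class by its dotted name can be done with the standard importlib module. The function below is a minimal sketch under that assumption, not a copy of Stetl's code; the error message text is invented for illustration.

```python
import importlib


def class_forname(class_string):
    """Return the class object named by a dotted string, e.g. 'collections.OrderedDict'."""
    module_name, _, class_name = class_string.rpartition('.')
    if not module_name:
        # No dot in the string, so there is no module to import from.
        raise ValueError('class name must include a module part: %r' % class_string)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


ordered_dict_class = class_forname('collections.OrderedDict')
instance = ordered_dict_class()

try:
    class_forname('OrderedDict')
    missing_module_error = None
except ValueError as err:
    missing_module_error = str(err)
```

The returned value is the class itself, so it can be called afterwards to create instances.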
-
new_instance(class_obj, configdict, section)[source]
Returns an object instance created from a class object.
- Args:
- class_obj: object representing a class.
args: standard args.
kwargs: standard args.
-
class stetl.component.Component(configdict, section, consumes='none', produces='none')[source]
Abstract Base class for all Input, Filter and Output Components.
-
after_chain_invoke(packet)[source]
Called right after the entire Component Chain has been invoked.
-
after_invoke(packet)[source]
Called right after Component invoke.
-
before_invoke(packet)[source]
Called just before Component invoke.
-
exit()[source]
Allows derived Components to perform a one-time exit/cleanup.
-
init()[source]
Allows derived Components to perform a one-time init.
-
input_format()[source]
CONFIG -
The specific input format if the consumes parameter is a list, or the format to be converted to the output_format.
Required: False
Default: None
-
invoke(packet)[source]
Components override for Component-specific behaviour, typically read, filter or write actions.
-
output_format()[source]
CONFIG -
The specific output format if the produces parameter is a list, or the format to which the input format is converted.
Required: False
Default: None
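The hook methods of a Component suggest a fixed calling order around invoke. The sketch below shows one plausible ordering for a two-component chain; it is an assumption drawn from the method descriptions, not taken from Stetl's source. LoggingComponent and its process driver method are hypothetical.

```python
class LoggingComponent(object):
    """Toy component that records every hook call in a shared list."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.next = None

    def _record(self, hook):
        self.log.append('%s.%s' % (self.name, hook))

    def init(self):
        self._record('init')

    def exit(self):
        self._record('exit')

    def before_invoke(self, packet):
        self._record('before_invoke')

    def invoke(self, packet):
        self._record('invoke')
        return packet

    def after_invoke(self, packet):
        self._record('after_invoke')

    def after_chain_invoke(self, packet):
        self._record('after_chain_invoke')

    def process(self, packet):
        # Hypothetical driver: run own hooks, then the rest of the chain,
        # and only then signal that the whole chain has been invoked.
        self.before_invoke(packet)
        packet = self.invoke(packet)
        self.after_invoke(packet)
        if self.next is not None:
            packet = self.next.process(packet)
        self.after_chain_invoke(packet)
        return packet


log = []
first = LoggingComponent('a', log)
second = LoggingComponent('b', log)
first.next = second

for component in (first, second):
    component.init()
first.process({'data': 'x'})
for component in (first, second):
    component.exit()
```

In this sketch init and exit each run once per component, while the invoke hooks run once per packet.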
-
class stetl.component.Config(ptype=<type 'str'>, default=None, required=False)[source]
Decorator class to tie config values from the .ini file to object instance
property values. Somewhat like the Python standard @property but with
the possibility to define default values, typing and making properties required.
Each property is defined by @Config(type, default, required).
Basic idea comes from: https://wiki.python.org/moin/PythonDecoratorLibrary#Cached_Properties
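A decorator of this kind can be built as a descriptor class. The following is a minimal sketch of the idea, not Stetl's implementation: SketchConfig and FileReader are hypothetical, as is the assumption that each instance keeps its raw config section in a dict attribute named cfg.

```python
class SketchConfig(object):
    """Sketch of a config-property decorator: @SketchConfig(ptype, default, required)."""

    def __init__(self, ptype=str, default=None, required=False):
        self.ptype = ptype
        self.default = default
        self.required = required
        self.name = None

    def __call__(self, fget):
        # Applied as a decorator: the method name becomes the config key.
        self.name = fget.__name__
        self.__doc__ = fget.__doc__
        return self

    def __get__(self, instance, owner):
        if instance is None:
            return self
        # Assumption: the raw config values live in instance.cfg (a dict).
        if self.name in instance.cfg:
            return self.ptype(instance.cfg[self.name])
        if self.required:
            raise ValueError('missing required config value: %s' % self.name)
        return self.default


class FileReader(object):
    """Hypothetical component with two config properties."""

    def __init__(self, cfg):
        self.cfg = cfg

    @SketchConfig(ptype=str, required=True)
    def file_path(self):
        """Path of the file to read."""

    @SketchConfig(ptype=int, default=10)
    def max_lines(self):
        """Maximum number of lines to read."""


reader = FileReader({'file_path': 'in.txt', 'max_lines': '25'})
defaults = FileReader({'file_path': 'in.txt'})

try:
    FileReader({}).file_path
    required_error = None
except ValueError as err:
    required_error = str(err)
```

The decorated method bodies stay empty: only their names and docstrings are used, which keeps the config documentation next to the property definition.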
-
class stetl.chain.Chain(chain_str, config_dict)[source]
Holder for a single invokable pipeline of Components.
A Chain is basically a singly linked list of Components.
Each Component executes a part of the total ETL.
Data along the Chain is passed within a Packet object.
The compatibility of input and output for linked
Components is checked when adding a Component to the Chain.
-
add(etl_comp)[source]
Add a Component to the end of the Chain.
- Args:
- etl_comp: the Component to add.
-
assemble()[source]
Builder method: builds a Chain of linked Components.
-
run()[source]
Run the ETL Chain.
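The linked-list structure and the compatibility check on add can be sketched as follows. This is an illustration only: FormatComponent and LinkedChain are hypothetical classes, the format names 'xml' and 'csv' are made up, and the real check in Stetl may compare formats differently.

```python
class FormatComponent(object):
    """Toy component with declared formats and a link to its successor."""

    def __init__(self, name, consumes='none', produces='none'):
        self.name = name
        self.consumes = consumes
        self.produces = produces
        self.next = None


class LinkedChain(object):
    """Toy chain: a singly linked list that checks formats when adding."""

    def __init__(self):
        self.first = None
        self.last = None

    def add(self, component):
        if self.first is None:
            self.first = component
        else:
            # The output of the current tail must match the input of the newcomer.
            if self.last.produces != component.consumes:
                raise ValueError('%s produces %s but %s consumes %s' % (
                    self.last.name, self.last.produces,
                    component.name, component.consumes))
            self.last.next = component
        self.last = component

    def names(self):
        """Walk the links from the first component and collect the names."""
        result = []
        current = self.first
        while current is not None:
            result.append(current.name)
            current = current.next
        return result


chain = LinkedChain()
chain.add(FormatComponent('xml_in', produces='xml'))
chain.add(FormatComponent('xslt', consumes='xml', produces='xml'))
chain.add(FormatComponent('db_out', consumes='xml'))
linked = chain.names()

bad_chain = LinkedChain()
bad_chain.add(FormatComponent('xml_in', produces='xml'))
try:
    bad_chain.add(FormatComponent('csv_out', consumes='csv'))
    mismatch_error = None
except ValueError as err:
    mismatch_error = str(err)
```

Checking at add time means a misconfigured pipeline fails while it is being assembled, before any data is read.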
-
class stetl.packet.Packet(data=None)[source]
Represents units of (any) data and status passed along a Chain of Components.
-
class stetl.input.Input(configdict, section, produces)[source]
Bases: stetl.component.Component
Abstract Base class for all Input Components.
-
class stetl.output.Output(configdict, section, consumes)[source]
Bases: stetl.component.Component
Abstract Base class for all Output Components.
-
class stetl.filter.Filter(configdict, section, consumes, produces)[source]
Bases: stetl.component.Component
Maps input to output. Abstract base class for specific Filters.