CSV Codec Examples¶
The CSV codec is different to other codecs in that it produces an iterable of
odin.Resource
objects rather than a single structure.
Typical Usage¶
When using the odin.codecs.csv_codec
my recommendation is to define
your CSV file format using a CSV csv.Dialect
and a sub-class of the
odin.codecs.csv_codec.Reader
to configure how files are handled.
A simple example of a customised csv.Dialect
and
odin.codecs.csv_codec.Reader
:
import csv
from odin.codecs import csv_codec
class MyFileDialect(csv.Dialect):
"""
Custom CSV dialect
"""
delimiter = '\t' # Tab delimited
quotechar = '"' # Double quotes for quoting
lineterminator = '\n' # UNIX style line termination
class MyFileReader(csv_codec.Reader):
"""
Custom file reader
"""
# Specify our custom dialect
csv_dialect = MyFileDialect
# Header case is not important
ignore_header_case = True
# Treat empty values as `None`
default_empty_value = None
This can then be used with:
# To read a file
with open("my_file.csv") as f:
for r in MyFileReader(f, MyResource):
pass
# To write a file
with open("my_file.csv", "w") as f:
csv_codec.dump(f, my_resources, dialect=MyFileDialect)
Handling errors¶
As resources are generated as each CSV row is read any validation errors raised also need to be handled. The default behavior is for a validation error to be raised when a invalid row is encountered effectively stopping processing. However, it may be valid to skip bad rows or, a report of bad rows can be generated and processing continued.
There are two approaches that can be used:
Provide an
error_callback
value to theodin.codecs.csv_codec.Reader
- Sub-class the
odin.codecs.csv_codec.Reader
class and implement a custom
handle_validation_error
method to process errors.
- Sub-class the
Note
Providing an error_callback
will overwrite any custom
handle_validation_error
method.
The error_callback
option is the simplest:
def error_callback(validation_error, idx):
"""
Handle errors reading rows from CSV file.
:param validation_error: The validation error exception
:param idx: The row index the error occurred on.
:returns: ``None`` or ``False`` to explicitly case the exception to
be raised.
"""
print("Error in row {}: {}".format(idx, validation_error), file=sys.stderr)
with open("my_file.csv") as f:
for r in MyFileReader(f, MyResource, error_callback=error_callback):
...
The sub-class method is more involved upfront but does allow for more customisation:
class MyReader(csv_codec.Reader):
"""
Custom file reader that reports errors to a file.
"""
def __init__(self, f, error_file, *args, **kwargs):
super().__init__(self, *args, **kwargs)
self.error_file = error_file
def handle_validation_error(self, validation_error, idx):
self.error_file.write("{}\t{}\n".format(idx, validation_error))
with open("my_file.csv") as f_in, open("my_file.error.csv", "w") as f_err:
for r in MyReader(f_in, f_err, MyResource):
...
The second option allows for a lot of customisation and reuses. For example the error report could itself output a CSV file.