Unit Testing Hive UDFs
As discussed in previous posts, a User Defined Table-Generating Function (UDTF) is a variant of the normal Hive UDF. Where a UDF reads one or more columns from a row and returns a single value, a UDTF reads one or more columns and can write multiple rows for each input row.
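For UDFs, Hive passes each input record to the class's evaluate method, and the method's return value becomes the output. evaluate is an instance method whose argument and return types vary by UDF; for a string-valued UDF it looks like this:
public Text evaluate(Text input);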
Because evaluate returns its result, this fits the normal JUnit testing framework: a test can call the method directly and assert on the return value.
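As a minimal sketch of that pattern, the class below mirrors the shape of a simple UDF without depending on Hive or JUnit. The names UpperCaseUdf and UpperCaseUdfCheck are hypothetical, and String is used in place of Hadoop's Text so the example stands alone; a real UDF would extend Hive's UDF class and the checks would live in a JUnit test.

```java
import java.util.Locale;

// Hypothetical stand-in for a simple Hive UDF. A real one would extend
// org.apache.hadoop.hive.ql.exec.UDF and usually work with Hadoop Text values.
class UpperCaseUdf {
    // Called once per input row; the return value becomes the output column.
    public String evaluate(String input) {
        return input == null ? null : input.toUpperCase(Locale.ROOT);
    }
}

class UpperCaseUdfCheck {
    public static void main(String[] args) {
        UpperCaseUdf udf = new UpperCaseUdf();
        // In a JUnit test these checks would be assertEquals / assertNull calls.
        if (!"HIVE".equals(udf.evaluate("hive"))) {
            throw new AssertionError("expected HIVE");
        }
        if (udf.evaluate(null) != null) {
            throw new AssertionError("expected null for null input");
        }
        System.out.println("UDF checks passed");
    }
}
```

The test needs nothing beyond an instance of the class, because everything it has to verify comes back from the call.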
However, for UDTFs, the input records are passed to the following function:
public abstract void process(Object[] args) throws HiveException;
Notice that the return value is "void". In the case of UDTFs, output values are written through calls to the "forward" method:
protected final void forward(Object o) throws HiveException;
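Since both the process and forward methods return void, a test has no return value to assert against, and because forward is declared final it cannot simply be overridden in a test subclass to capture the output. The usual JUnit pattern of calling a method and checking its result therefore does not apply directly, and an alternative approach is required.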
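To make the problem concrete, here is a small sketch of the same shape in plain Java. StandInUdtf, SplitWordsUdtf and SplitWordsDemo are hypothetical names, and StandInUdtf is only a stand-in for Hive's GenericUDTF (its forward just prints, where Hive would hand the row to the downstream collector).

```java
// Hypothetical stand-in for the shape of Hive's GenericUDTF; it is not the Hive class.
abstract class StandInUdtf {
    public abstract void process(Object[] args);

    // In Hive, forward hands the row to the downstream collector. Here it only
    // prints, to show that the row leaves the function as a side effect.
    protected final void forward(Object row) {
        System.out.println(row);
    }
}

// Emits one output row per whitespace-separated word of the first argument.
class SplitWordsUdtf extends StandInUdtf {
    @Override
    public void process(Object[] args) {
        for (String word : String.valueOf(args[0]).trim().split("\\s+")) {
            forward(word);
        }
    }
}

class SplitWordsDemo {
    public static void main(String[] args) {
        // One call, several rows forwarded, and nothing returned to assert on.
        new SplitWordsUdtf().process(new Object[] {"unit testing hive"});
    }
}
```

A single call to process produces several forward calls, yet the caller gets nothing back, so a test written in the UDF style has no value to compare against.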
AspectJ is an extension to the Java language that allows programmers to define "Aspects": structures that provide mechanisms for overriding the behavior of particular methods, or for adding functionality before or after a particular event. Events (which AspectJ calls join points) can be method calls, modifications of variables, initialization of classes, or thrown exceptions.
This technology is applicable to the UDTF case because it allows us to apply AspectJ "around" advice to the forward method: during normal execution the advice calls the normal Hive method, and during the testing phase it calls a custom method that fits into the JUnit framework.
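One way such advice might look is sketched below in AspectJ's annotation style. This is an illustration under stated assumptions, not the aspect used in this series: the class name ForwardCaptureAspect, the CAPTURED list and the testMode flag are all hypothetical, and the code needs the AspectJ runtime on the classpath and the AspectJ weaver applied to the UDTF classes in order to take effect.

```java
import java.util.ArrayList;
import java.util.List;

import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;

// Hypothetical test-support aspect; all names here are illustrative.
@Aspect
public class ForwardCaptureAspect {

    // Rows passed to forward(...) while test mode is on.
    public static final List<Object> CAPTURED = new ArrayList<>();

    // Assumed switch: false means "behave exactly like normal Hive".
    public static volatile boolean testMode = false;

    // Matches calls to forward made from GenericUDTF subclasses. A call
    // pointcut is woven into our own UDTF classes, not into the Hive jar.
    @Around("call(* org.apache.hadoop.hive.ql.udf.generic.GenericUDTF+.forward(..))")
    public Object aroundForward(ProceedingJoinPoint joinPoint) throws Throwable {
        if (testMode) {
            // Testing: record the row instead of handing it to Hive.
            CAPTURED.add(joinPoint.getArgs()[0]);
            return null; // forward is void, so the return value is ignored
        }
        // Normal execution: run the real forward method unchanged.
        return joinPoint.proceed();
    }
}
```

With something like this in place, a JUnit test could set testMode to true, clear CAPTURED, call process on the UDTF, and then assert on the contents of CAPTURED. An alternative to the flag would be to weave the aspect only into the test build.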