LET’S PLAY WITH
TRANSFORMATION
As
mentioned earlier, Transformations are
business logics which are used to make inconsistent data to consistent format.
1) Aggregator
Transformation
It
is used to do some aggregation concepts like sum, average, count, count
distinct (count distinct and Non null column).
This transformation does not
generate an error output.
Options
1)
Advanced / Basic
Display or hide options to
configure multiple aggregations for multiple outputs. By default, the advanced
options are hidden.
2)
Aggregation Name
In the Advanced display,
type a friendly name for the aggregation.
3)
Group By Columns
In the Advanced display,
select columns for grouping by using the Available Input Columns list as
described below.
4)
Key Scale
In the Advanced display,
optionally specify the approximate number of keys that the aggregation can
write. By default, the value of this option is Unspecified. If both the Key
Scale and Keys properties are set, the value of Keys takes
precedence.
Value
|
Description
|
Unspecified
|
The Key Scale property is not used.
|
Low
|
Aggregation can write approximately 500,000 keys.
|
Medium
|
Aggregation can write approximately 5,000,000
keys.
|
High
|
Aggregation can write more than 25,000,000 keys.
|
5)
Keys
In the Advanced display,
optionally specify the exact number of keys that the aggregation can write. If
both Key Scale and Keys are specified, Keys takes
precedence.
6)
Available Input Columns
Select from the list of
available input columns by using the check boxes in this table.
7)
Input Column
Select from the list of
available input columns.
8)
Output Alias
Type an alias for each column.
The default is the name of the input column; however, you can choose any
unique, descriptive name.
9)
Operation
Choose from the list of
available operations, using the following table as a guide.
Operation
|
Description
|
GroupBy
|
Divides datasets into groups. Columns with any
data type can be used for grouping.
|
Sum
|
Sums the values in a column. Only columns with numeric data
types can be summed.
|
Average
|
Returns the average of the column values in a
column. Only columns with numeric data types can be averaged.
|
Count
|
Returns the number of items in a group.
|
Count Distinct
|
Returns the number of unique non null values
in a group.
|
Minimum
|
Returns the minimum value in a group. Restricted to numeric data
types.
|
Maximum
|
Returns the maximum value in a group.
Restricted to numeric data types.
|
10)
Comparison Flags
If you choose Group
By, use the check boxes to control how the transformation performs the
comparison.
11)
Count Distinct Scale
Optionally specify the
approximate number of distinct values that the aggregation can write. By
default, the value of this option is Unspecified. If both CountDistinctScale
and CountDistinctKeys are specified, CountDistinctKeys takes
precedence.
Value
|
Description
|
Unspecified
|
The CountDistinctScale property is not
used.
|
Low
|
Aggregation can write approximately 500,000 distinct values.
|
Medium
|
Aggregation can write approximately 5,000,000
distinct values.
|
High
|
Aggregation can write more than 25,000,000 distinct values.
|
12)
Count Distinct Keys
Optionally specify the
exact number of distinct values that the aggregation can write. If both CountDistinctScale
and CountDistinctKeys are specified, CountDistinctKeys takes
precedence.
The Aggregate
transformation applies aggregate functions, such as Average, to column
values and copies the results to the transformation output. Besides aggregate
functions, the transformation provides the GROUP BY clause, which you can use
to specify groups to aggregate across.
You configure the
Aggregate transformation at the transformation, output, and column levels.
At the transformation
level, you configure the Aggregate transformation for performance by specifying
the following values:
·
The number of groups
that are expected to result from a Group by operation.
·
The number of distinct
values that are expected to result from a Count distinct operation.
·
The percentage by which
memory can be extended during the aggregation.
·
The Aggregate
transformation can also be configured to generate a warning instead of failing
when the value of a divisor is zero.
At the output level, you configure the
Aggregate transformation for performance by specifying the number of groups
that are expected to result from a Group by operation. The Aggregate
transformation supports multiple outputs, and each can be configured
differently.
At the column level, you specify the following
values:
·
The aggregation that the
column performs.
·
The comparison options
of the aggregation.
You can also configure
the Aggregate transformation for performance by specifying these values:
·
The number of groups
that are expected to result from a Group by operation on the column.
·
The number of distinct
values that are expected to result from a Count distinct operation on
the column.
Performance Considerations
The Aggregate transformation includes a set of properties that you
can set to enhance the performance of the transformation.
·
When performing a Group
by operation, set the Keys or KeysScale properties of the
component and the component outputs. Using Keys, you can specify the
exact number of keys the transformation is expected to handle. (In this
context, Keys refers to the number of groups that are expected to result
from a Group by operation.) Using KeysScale, you can specify an
approximate number of keys. When you specify an appropriate value for Keys
or KeyScale, you improve performance because the tranformation is able
to allocate adequate memory for the data that the transformation caches.
·
When performing a distinct
count operation, set the CountDistinctKeys or CountDistinctScale
properties of the component. Using CountDistinctKeys, you can specify
the exact number of keys the transformation is expected to handle for a count
distinct operation. (In this context, CountDistinctKeys refers to the
number of distinct values that are expected to result from a Distinct count
operation.) Using CountDistinctScale, you can specify an approximate
number of keys for a count distinct operation. When you specify an appropriate
value for CountDistinctKeys or CountDistinctScale, you improve
performance because the transformation is able to allocate adequate memory for
the data that the transformation caches.
Comments
Post a Comment