If there is a need to SELECT unique rows from a table present in Teradata, a better performance oriented query option would be to use the GROUP BY clause instead of the DISTINCT.  You might wonder why?

Here are the reasons:

n        When you are using DISTINCT Clause, data would be spooled first and then duplicate rows would be eliminated. This is not a very efficient way unless you have less number of rows present in individual AMPs.

n        In case of GROUP BY, the elimination happens as a first step which makes a lot of difference in case of a large dataset.

n        As you know, in a sub query, rows returned are automatically DISTICT so to make this sub query run faster make it GROUP BY.

n        Using DISTINCT rows would be sorted automatically but not with the GROUP BY. I guess this might be another reason DISTINCT is less efficient.

To test the performance, create a temp table mentioned in the previous articles and insert some 50-100 rows and run the following queries and check the elapsed time.

SELECT emp_id, COUNT(*)
FROM emp
GROUP BY 1 ORDER BY 1;

SELECT COUNT(distinct emp_id)
FROM  emp

Our Random Articles

More Links