Fast Delete of Duplicate Rows

Description: With reference to your tip for the week 08/04/2002 submitted by Madan Patil, this is another way of deleting duplicate rows from a table. The difference being the time it takes to delete the duplicate rows with this method is many times faster than the earlier method.

Let us take a table containing 3 columns, then we can use the following command to delete the duplicate rows from the table.


delete from where rowid in (
SELECT rowid FROM
group by rowid,col1,col2,col3
minus
SELECT min(rowid) FROM

group by col1,col2,col3);

To show the difference let's consider a table:


Table: EMP
EMPNO NUMBER
ENAME VARCHAR2(20)
JOB VARCHAR2(20)


CREATE TABLE EMP (
EMPNO NUMBER,
ENAME VARCHAR2(20),
JOB VARCHAR2(20)
);
/
begin
for i in 1..20 loop
insert into emp values (1,'xx','clerk');
end loop;

commit;
end;
/
begin
for i in 1..20 loop
insert into emp values (2,'yy','accountant');
end loop;
commit;
end;
/
begin

for i in 1..20000 loop
insert into emp values (3,'zz','manager');
end loop;
commit;
end;
/
begin
for i in 1..10000 loop
insert into emp values (4,'ab','accountant');
end loop;
commit;
end;

/

Using the previous method as in your TIP for the Week 08/04/2002


------------------------------------------------------------------------
SQL> select count(*) from emp;

COUNT(*)
----------
30040


SQL> set timing on;
SQL> DELETE FROM EMP E
2 WHERE E.ROWID > ANY (SELECT ROWID
3 FROM EMP M
4 WHERE M.EMPNO = E.EMPNO
5 AND M.ENAME = E.ENAME
6 AND M.JOB = E.JOB );

30036 rows deleted.

Elapsed: 00:03:207.48

SQL> select count(*) from emp;


COUNT(*)
----------
4

Elapsed: 00:00:00.10

Using the NEW suggested method:


-------------------------------------------
SQL> select count(*) from emp;


COUNT(*)
----------
30040
SQL> delete from emp where rowid in (
2 SELECT rowid FROM emp
3 group by rowid,empno,ename,job
4 minus
5 SELECT min(rowid) FROM emp
6 group by empno,ename,job);

30036 rows deleted.

Elapsed: 00:00:02.94

SQL> select count(*) from emp;


COUNT(*)
----------
4

Elapsed: 00:00:00.10
--------------------------------------------------------------------

As we can see the difference is multifold to achieve the same result. This is because the new method uses the set operator to compute the list of duplicate rows. The bigger the table the better you can appreciate the difference.

Source:http://www.oracle.com/technology/oramag/code/tips2002/101302.html

Comments

Popular Posts