Fast Delete of Duplicate Rows

April 12, 2006

Fast Delete of Duplicate Rows

Description: With reference to your tip for the week 08/04/2002 submitted by Madan Patil, this is another way of deleting duplicate rows from a table. The difference being the time it takes to delete the duplicate rows with this method is many times faster than the earlier method.

Let us take a table containing 3 columns, then we can use the following command to delete the duplicate rows from the table.


   delete from  where rowid in (
   SELECT  rowid FROM 
   group by rowid,col1,col2,col3
   minus
   SELECT  min(rowid) FROM 

   group by col1,col2,col3);

To show the difference let's consider a table:


Table: EMP
EMPNO     NUMBER
ENAME     VARCHAR2(20)
JOB       VARCHAR2(20)


CREATE TABLE EMP (
EMPNO     NUMBER,
ENAME     VARCHAR2(20),
JOB       VARCHAR2(20)
);
/
begin
for i in 1..20 loop
insert into emp values (1,'xx','clerk');
end loop;

commit;
end;
/
begin
for i in 1..20 loop
insert into emp values (2,'yy','accountant');
end loop;
commit;
end;
/
begin

for i in 1..20000 loop
insert into emp values (3,'zz','manager');
end loop;
commit;
end;
/
begin
for i in 1..10000 loop
insert into emp values (4,'ab','accountant');
end loop;
commit;
end;

/

Using the previous method as in your TIP for the Week 08/04/2002


------------------------------------------------------------------------
SQL> select count(*) from emp;

 COUNT(*)
----------
    30040


SQL> set timing on;
SQL> DELETE FROM EMP E
 2  WHERE E.ROWID > ANY (SELECT ROWID
 3  FROM EMP M
 4  WHERE M.EMPNO = E.EMPNO
 5  AND M.ENAME = E.ENAME
 6  AND M.JOB = E.JOB );

30036 rows deleted.

Elapsed: 00:03:207.48

SQL> select count(*) from emp;


 COUNT(*)
----------
        4

Elapsed: 00:00:00.10

Using the NEW suggested method:


-------------------------------------------
SQL> select count(*) from emp;


 COUNT(*)
----------
    30040
SQL> delete from emp where rowid in (
 2  SELECT  rowid FROM emp
 3  group by rowid,empno,ename,job
 4  minus
 5  SELECT  min(rowid) FROM emp
 6  group by empno,ename,job);

30036 rows deleted.

Elapsed: 00:00:02.94

SQL> select count(*) from emp;


 COUNT(*)
----------
        4

Elapsed: 00:00:00.10
--------------------------------------------------------------------

As we can see the difference is multifold to achieve the same result. This is because the new method uses the set operator to compute the list of duplicate rows. The bigger the table the better you can appreciate the difference.

Source:http://www.oracle.com/technology/oramag/code/tips2002/101302.html

Search This Blog

Engine of Creation

Fast Delete of Duplicate Rows

Comments

Popular Posts

Some Linux Commands