Deleting duplicate records

This is one of the common stuffs that every developer has to face / faced. In this post, I will explain three different methods of idenifying and deleting the duplicate records; make sure that you backup the database before running these scripts (or else you can put these scripts inside a BEGIN TRAN...ROLLBACK TRAN for testing purpose ) .

Method 1 :

If you have a Primary key column, or you can do the DML operations in case there is no primary key defined on that table, then you can use this method

In case there is no primary key defined on that table, you can alter the table to add an IDENTITY column;

ALTER urOriginalTable 
ADD i int IDENTITY

Suppose the other two columns are Col1 and Col2; and your delete query will look like

DELETE
FROM urOriginalTable 
WHERE I NOT IN (SELECT MIN(i) from urOriginalTable GROUP BY Col1, col2 )

and as the final step you need to drop the identity column created earlier.

ALTER TABLE urOriginalTable
DROP COLUMN I

Now, if you really dont want to alter your table or you dont have the permission to do so, you can follow any of the following methods.

Method 2 : Using a temp Table

This is a three step process, the first step is to copy the Distinct records to a temp table; now as you have the distinct records in the new table, you can either drop / rename the original table; and as the final step, rename the new table to the original.

--Step 1
SELECT DISTINCT Col1, Col2 
INTO LookupTable 
FROM OriginalTable ; 

--Step2 
Exec sp_Rename 'originalTable' , 'OriginalTable_backup' 

--Step3 
Exec sp_rename 'LookupTable' , 'OriginalTable'

Method 3: Using a cursor

This is a straight forward method and I think you can easily understood this from the followong example

DECLARE @tab TABLE (
         NAME VARCHAR(100)
        ,Sal INT
        )
DECLARE @Name VARCHAR(100)
       ,@Sal INT
       ,@Rows int
SET NOCOUNT ON 
INSERT  INTO @Tab
SELECT  'a',200
UNION ALL
SELECT  'b',300
UNION ALL
SELECT  'a',200
UNION ALL
SELECT  'c',400
UNION ALL
SELECT  'd',700
UNION ALL
SELECT  'c',400
UNION ALL
SELECT  'c',400
UNION ALL
SELECT  'e',600

SELECT  *
FROM    @Tab
DECLARE CurDelDupes CURSOR
FOR SELECT  NAME  ,Sal
FROM    @Tab
GROUP BY NAME
       ,[Sal]
OPEN CurDelDupes
FETCH NEXT FROM CurDelDupes INTO @Name, @Sal
WHILE @@FETCH_STATUS = 0 
      BEGIN
            SELECT  @Rows = COUNT(*) - 1 -- Since we need to keep one copy here, reduce one from Count
            FROM    @Tab
            WHERE   NAME = @Name
                    AND Sal = @Sal 
            WHILE @Rows > 0 
                  BEGIN
                        SET ROWCOUNT 1
                        DELETE  FROM @Tab
                        WHERE   NAME = @Name
                                AND Sal = @Sal 
                        SET ROWCOUNT 0
                        SET @Rows = @Rows - 1 
                  END
            FETCH NEXT FROM CurDelDupes INTO @Name, @Sal
      END 
CLOSE CurDelDupes
DEALLOCATE curDelDupes
SELECT  *
FROM    @Tab

Method 4: Using CTE (Only works with SQL Server 2005 or above )

In case your database is SQL server 2005, then you can make use of the CTE for this purpose; Check this example

DECLARE @tab TABLE ( NAME VARCHAR(10), age INT ) 
SET NOCOUNT ON

INSERT INTO @tab SELECT 'a',10
INSERT INTO @tab SELECT 'a',10
INSERT INTO @tab SELECT 'a',10

INSERT INTO @tab SELECT 'b',10
INSERT INTO @tab SELECT 'b',10

INSERT INTO @tab SELECT 'c',10

;WITH CTE AS 
(
SELECT ROW_NUMBER() OVER ( PARTITION BY NAME ORDER BY NAME DESC ) AS RowNum, 
NAME, age  
FROM @Tab )

SELECT * FROM Cte

I hope this article is useful for you, in case you need any help please let me know.

SQL Server Tips & Tricks

Ads 720 x 90

Deleting duplicate records

Related Posts

Total Pageviews

Label List

Popular

Followers