Search This Blog

Deleting duplicate records

This is one of the common stuffs that every developer has to face / faced. In this post, I will explain three different methods of idenifying and deleting the duplicate records; make sure that you backup the database before running these scripts (or else you can put these scripts inside a BEGIN TRAN...ROLLBACK TRAN  for testing purpose ) .

 

Method 1 :

If you have a Primary key column, or you can do the DML operations in case there is no primary key defined on that table, then you can use this method

In case there is no primary key defined on that table, you can alter the table to add an IDENTITY column;

ALTER urOriginalTable 
ADD i int IDENTITY

Suppose the other two columns are Col1 and Col2; and your delete query will look like


DELETE
FROM urOriginalTable
WHERE I NOT IN (SELECT MIN(i) from urOriginalTable GROUP BY Col1, col2 )

and as the final step you need to drop the identity column created earlier.

ALTER TABLE urOriginalTable
DROP COLUMN I

 

Now, if you really dont want to alter your table or you dont have the permission to do so, you can follow any of the following methods.

 

Method 2 :  Using a temp Table


This is a three step process, the first step is to copy the Distinct records to a temp table;  now as you have the distinct records in the new table, you can either drop / rename the original table; and as the final step, rename the new table to the original.

--Step 1
SELECT DISTINCT Col1, Col2
INTO LookupTable
FROM OriginalTable ;

--Step2
Exec sp_Rename 'originalTable' , 'OriginalTable_backup'

--Step3
Exec sp_rename 'LookupTable' , 'OriginalTable'




Method 3: Using a cursor


This is a straight forward method and I  think you can easily understood this from the followong example


DECLARE @tab TABLE (
NAME VARCHAR(100)
,Sal INT
)
DECLARE @Name VARCHAR(100)
,@Sal INT
,@Rows int
SET NOCOUNT ON
INSERT INTO @Tab
SELECT 'a',200
UNION ALL
SELECT 'b',300
UNION ALL
SELECT 'a',200
UNION ALL
SELECT 'c',400
UNION ALL
SELECT 'd',700
UNION ALL
SELECT 'c',400
UNION ALL
SELECT 'c',400
UNION ALL
SELECT 'e',600

SELECT *
FROM @Tab
DECLARE CurDelDupes CURSOR
FOR SELECT NAME ,Sal
FROM @Tab
GROUP BY NAME
,[Sal]
OPEN CurDelDupes
FETCH NEXT FROM CurDelDupes INTO @Name, @Sal
WHILE @@FETCH_STATUS = 0
BEGIN
SELECT @Rows = COUNT(*) - 1 -- Since we need to keep one copy here, reduce one from Count
FROM @Tab
WHERE NAME = @Name
AND Sal = @Sal
WHILE @Rows > 0
BEGIN
SET ROWCOUNT 1
DELETE FROM @Tab
WHERE NAME = @Name
AND Sal = @Sal
SET ROWCOUNT 0
SET @Rows = @Rows - 1
END
FETCH NEXT FROM CurDelDupes INTO @Name, @Sal
END
CLOSE CurDelDupes
DEALLOCATE curDelDupes
SELECT *
FROM @Tab
Method 4: Using CTE (Only works with SQL Server 2005 or above )
In case your database is SQL server 2005, then you can make use of the CTE for this purpose; Check this example
 

DECLARE @tab TABLE ( NAME VARCHAR(10), age INT ) 
SET NOCOUNT ON

INSERT INTO @tab SELECT 'a',10
INSERT INTO @tab SELECT 'a',10
INSERT INTO @tab SELECT 'a',10

INSERT INTO @tab SELECT 'b',10
INSERT INTO @tab SELECT 'b',10

INSERT INTO @tab SELECT 'c',10

;WITH CTE AS
(
SELECT ROW_NUMBER() OVER ( PARTITION BY NAME ORDER BY NAME DESC ) AS RowNum,
NAME, age
FROM @Tab )

SELECT * FROM Cte
 
 
I hope this article is useful for you, in case you need any help please let me know. 

1 comment:

webtips said...

The below article showed me a simple way
Delete Duplicate Rows in Sql Server

-Webtips