ROW_NUMBER() Window Function – find duplicate values.

Many times, we do not want duplicate rows or values in our SQL tables. On the other hand, in some situations, it does not matter if there are duplicates present. For whatever reason, suppose duplicates have found their way into one of your tables. How can you find them quickly and easily? The ROW_NUMBER() Window function is a fantastic tool to use. Continue reading and see example queries you can apply to your own tables and find those duplicates…

pile of identical plug like circuit chips blue in color — Photo by Joe Green on Unsplash

OS and DB used:

Linux Mint 20 “Ulyana”
MySQL 8.0.22

Self-Promotion:

If you enjoy the content written here, by all means, share this blog and your favorite post(s) with others who may benefit from or like it as well. Since coffee is my favorite drink, you can even buy me one if you would like!

Visit this fantastic YouTube video where I learned a great deal for this blog post.

For the examples in this post, we have this arbitrary table with some test data:

mysql-sql [learning]> SELECT * FROM some_names;
+----+------------+
| id | first_name |
+----+------------+
| 1 | Max |
| 2 | Mary |
| 3 | Charlie |
| 4 | Humpty |
| 5 | Roger |
| 6 | Jim |
| 7 | Jupyter |
| 8 | Max |
| 9 | Mary |
| 10 | Charlie |
| 11 | Humpty |
| 12 | Roger |
| 13 | Jim |
| 14 | Jupyter |
| 15 | Jim |
| 16 | Jim |
| 17 | Max |
| 18 | Max |
| 19 | Mary |
+----+------------+
19 rows in set (0.0011 sec)

You can likely pick out the few duplicates that are present since this is a rather small result set. However, with a larger result set, not so easy. Let’s put ROW_NUMBER() to work in finding the duplicates.

But first, let’s visit the online window functions documentation on ROW_NUMBER() and see the syntax and description:

ROW_NUMBER() OVER()

“Returns the number of the current row within its partition. Rows numbers range from 1 to the number of partition rows. ORDER BY affects the order in which rows are numbered. Without ORDER BY, row numbering is nondeterministic.”

ROW_NUMBER() Window Function with empty OVER() clause in MySQL

To start, let’s query the data with ROW_NUMBER() and no arguments in the OVER() clause:

mysql-sql [learning]> SELECT id, first_name, ROW_NUMBER() OVER()
-> FROM some_names;
+----+------------+---------------------+
| id | first_name | ROW_NUMBER() OVER() |
+----+------------+---------------------+
| 1 | Max | 1 |
| 2 | Mary | 2 |
| 3 | Charlie | 3 |
| 4 | Humpty | 4 |
| 5 | Roger | 5 |
| 6 | Jim | 6 |
| 7 | Jupyter | 7 |
| 8 | Max | 8 |
| 9 | Mary | 9 |
| 10 | Charlie | 10 |
| 11 | Humpty | 11 |
| 12 | Roger | 12 |
| 13 | Jim | 13 |
| 14 | Jupyter | 14 |
| 15 | Jim | 15 |
| 16 | Jim | 16 |
| 17 | Max | 17 |
| 18 | Max | 18 |
| 19 | Mary | 19 |
+----+------------+---------------------+
19 rows in set (0.0013 sec)

Based on the query results above, we can see that ROW_NUMBER() provides an incrementing set of numbers in order, starting at 1 and ending with a number equal to that of the total number of rows.

ROW_NUMBER() Window Function with PARTITION BY clause in MySQL

How can you isolate duplicate rows using ROW_NUMBER()?

The optional PARTITION BY clause of the OVER clause will create sub-groups of rows for its specified value. For the purposes of this blog post, we will assume that we want to target the ‘first_name’ column and find any duplicates for that column.

In a nutshell, PARTITION BY first_name will subdivide the ‘first_name’ column into separate groups of identical values. An example query provides a better, visual understanding:

mysql-sql [learning]> SELECT id, first_name,
-> ROW_NUMBER() OVER(PARTITION BY first_name) AS r_num
-> FROM some_names;
+----+------------+-------+
| id | first_name | r_num |
+----+------------+-------+
| 3 | Charlie | 1 |
| 10 | Charlie | 2 |
| 4 | Humpty | 1 |
| 11 | Humpty | 2 |
| 6 | Jim | 1 |
| 13 | Jim | 2 |
| 15 | Jim | 3 |
| 16 | Jim | 4 |
| 7 | Jupyter | 1 |
| 14 | Jupyter | 2 |
| 2 | Mary | 1 |
| 9 | Mary | 2 |
| 19 | Mary | 3 |
| 1 | Max | 1 |
| 8 | Max | 2 |
| 17 | Max | 3 |
| 18 | Max | 4 |
| 5 | Roger | 1 |
| 12 | Roger | 2 |
+----+------------+-------+
19 rows in set (0.0012 sec)

Note that the ‘first_name’ column is now grouped by identical values. Also, each time the ‘first_name’ sub-group changes to a different value, ROW_NUMBER() restarts counting at 1 again and continues in consecutive order until another change occurs within the ‘first_name’ sub-groups.

The OVER() clause also accepts an optional ORDER BY clause that will sort the rows according to the specified value. Here I use the ‘id’ column in ASC order. The ‘id’ column is the table’s PRIMARY KEY so this value is UNIQUE and a solid candidate column to sort by:

mysql-sql [learning]> SELECT id, first_name,
-> ROW_NUMBER() OVER(PARTITION BY first_name ORDER BY id ASC) AS r_num
-> FROM some_names;
+----+------------+-------+
| id | first_name | r_num |
+----+------------+-------+
| 3 | Charlie | 1 |
| 10 | Charlie | 2 |
| 4 | Humpty | 1 |
| 11 | Humpty | 2 |
| 6 | Jim | 1 |
| 13 | Jim | 2 |
| 15 | Jim | 3 |
| 16 | Jim | 4 |
| 7 | Jupyter | 1 |
| 14 | Jupyter | 2 |
| 2 | Mary | 1 |
| 9 | Mary | 2 |
| 19 | Mary | 3 |
| 1 | Max | 1 |
| 8 | Max | 2 |
| 17 | Max | 3 |
| 18 | Max | 4 |
| 5 | Roger | 1 |
| 12 | Roger | 2 |
+----+------------+-------+
19 rows in set (0.0177 sec)

ROW_NUMBER() Window function with Derived Table in MySQL

All that remains to do is filter the query results based on the ROW_NUMBER() value, targeting any value greater than 1 for each sub-group. Any records that match that criteria are duplicates.

Simple enough, right? Just use the WHERE clause:

mysql-sql [learning]> SELECT id, first_name
-> FROM some_names
-> WHERE ROW_NUMBER() OVER(PARTITION BY first_name ORDER BY id ASC) > 1;
ERROR: 3593: You cannot use the window function 'row_number' in this context.

Uh-oh. Turns out, you cannot use a window function like this in the WHERE clause predicate.

Now what?

If you wrap the ROW_NUMBER() query in a derived table, then you can perform the filtering in the WHERE clause:

mysql-sql [learning]> SELECT t.id, t.first_name, t.r_num
-> FROM
-> (
-> SELECT id, first_name, ROW_NUMBER() OVER(PARTITION BY first_name ORDER BY id ASC) AS r_num
-> FROM some_names
-> ) AS t
-> WHERE t.r_num > 1
-> ORDER BY t.first_name ASC;
+----+------------+-------+
| id | first_name | r_num |
+----+------------+-------+
| 10 | Charlie | 2 |
| 11 | Humpty | 2 |
| 13 | Jim | 2 |
| 15 | Jim | 3 |
| 16 | Jim | 4 |
| 14 | Jupyter | 2 |
| 9 | Mary | 2 |
| 19 | Mary | 3 |
| 8 | Max | 2 |
| 17 | Max | 3 |
| 18 | Max | 4 |
| 12 | Roger | 2 |
+----+------------+-------+
12 rows in set (0.0011 sec)

And there are the duplicates, easily found using the ROW_NUMBER() Window Function. Try it yourself next time you need to locate any duplicates in your tables.

Like what you have read? See anything incorrect? Please comment below and thank you for reading!!!

A Call To Action!

Thank you for taking the time to read this post. I truly hope you discovered something interesting and enlightening. Please share your findings here, with someone else you know who would get the same value out of it as well.

Visit the Portfolio-Projects page to see blog post/technical writing I have completed for clients.

To receive email notifications (Never Spam) from this blog (“Digital Owl’s Prose”) for the latest blog posts as they are published, please subscribe (of your own volition) by clicking the ‘Click To Subscribe!’ button in the sidebar on the homepage! (Feel free at any time to review the Digital Owl’s Prose Privacy Policy Page for any questions you may have about: email updates, opt-in, opt-out, contact forms, etc…)

Be sure and visit the “Best Of” page for a collection of my best blog posts.

Josh Otwell has a passion to study and grow as a SQL Developer and blogger. Other favorite activities find him with his nose buried in a good book, article, or the Linux command line. Among those, he shares a love of tabletop RPG games, reading fantasy novels, and spending time with his wife and two daughters.

Disclaimer: The examples presented in this post are hypothetical ideas of how to achieve similar types of results. They are not the utmost best solution(s). The majority, if not all, of the examples provided, are performed on a personal development/learning workstation-environment and should not be considered production quality or ready. Your particular goals and needs may vary. Use those practices that best benefit your needs and goals. Opinions are my own.

The post ROW_NUMBER() Window Function – find duplicate values. appeared first on Digital Owl's Prose.

ROW_NUMBER() Window Function – find duplicate values.

Self-Promotion:

ROW_NUMBER() Window Function with empty OVER() clause in MySQL

ROW_NUMBER() Window Function with PARTITION BY clause in MySQL

ROW_NUMBER() Window function with Derived Table in MySQL

A Call To Action!

Trending Articles

Practice Sheet of Right form of verbs for HSC Students

Download: FK ft Shenky – Nakuyewa ”Prod by: Shenky”

How to win at Markstrat (Markstrat Tips and Tricks) – Vodites

Ominde Commission Report and Recommendations – Ominde Report of 1964

Bureau of Internal Revenue: Regional Offices (Directory)

GO 53 on Enhancement of Ex-gratia upto 5 Lakhs Toddy Tappers in Telangana

Cakewalk CA-2A Leveling Amplifier v2.0.1.97 WiN, v2.0.1.96 OSX Incl Keygen

Mp3 Download: Mdu - Kunjenjenjena

How the kill the job , when DTP request running for long hours.

Microsoft Intune から展開しているアプリのアップデートについて

18-year-old girl was beaten for half an hour by two Northampton men in 'an...

Car crash in Dunton Bassett leaves driver in critical condition

Macky 2, Two Others In Road Accident

Application log 00000000000000089514: Could not convert queue DLVST90CLNT

Detroit mafia: D’Anna Brothers agree to plea deal

Delivery block field greyed out using VA02

Muloraki Au

【個人撮影】スマホのプライベート映像♪「中に出さないで///」カラオケ屋での生ハメ撮りが流出ｗ【リベンジポルノ】＠PornHub

BREAKING NEWS: Diamond Platnumz Is Reported Dead After Ghastly Car Accident

FIAT 500 B0111 B0112