{
    "componentChunkName": "component---src-templates-blog-post-js",
    "path": "/find-duplicate-rows-in-a-pandas-dataframe/",
    "result": {"data":{"markdownRemark":{"html":"<p>Let’s read the <del>budget.xlsx</del> file into a DataFrame.</p>\n<pre class=\"grvsc-container synthwave-84\" data-language=\"py\" data-index=\"0\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"grvsc-gutter-pad\"></span><span class=\"grvsc-gutter grvsc-line-number\" aria-hidden=\"true\" data-content=\"1\"></span><span class=\"grvsc-source\"><span class=\"mtk10\">import</span><span class=\"mtk15\"> pandas </span><span class=\"mtk10\">as</span><span class=\"mtk15\"> pd</span></span></span>\n<span class=\"grvsc-line\"><span class=\"grvsc-gutter-pad\"></span><span class=\"grvsc-gutter grvsc-line-number\" aria-hidden=\"true\" data-content=\"2\"></span><span class=\"grvsc-source\"></span></span>\n<span class=\"grvsc-line\"><span class=\"grvsc-gutter-pad\"></span><span class=\"grvsc-gutter grvsc-line-number\" aria-hidden=\"true\" data-content=\"3\"></span><span class=\"grvsc-source\"><span class=\"mtk15\">budget </span><span class=\"mtk12\">=</span><span class=\"mtk15\"> pd.</span><span class=\"mtk6\">read_excel</span><span class=\"mtk15\">(</span><span class=\"mtk16\">&quot;budget.xlsx&quot;</span><span class=\"mtk15\">)</span></span></span>\n<span class=\"grvsc-line\"><span class=\"grvsc-gutter-pad\"></span><span class=\"grvsc-gutter grvsc-line-number\" aria-hidden=\"true\" data-content=\"4\"></span><span class=\"grvsc-source\"></span></span>\n<span class=\"grvsc-line\"><span class=\"grvsc-gutter-pad\"></span><span class=\"grvsc-gutter grvsc-line-number\" aria-hidden=\"true\" data-content=\"5\"></span><span class=\"grvsc-source\"><span class=\"mtk15\">budget</span></span></span></code></pre>\n<p><strong>Output:</strong></p>\n<p><span\n      class=\"gatsby-resp-image-wrapper\"\n      style=\"position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 559px; \"\n    >\n      <a\n    class=\"gatsby-resp-image-link\"\n    
href=\"/static/c4c249b6e7fa82809e79da293a2cb161/11d18/budget.png\"\n    style=\"display: block\"\n    target=\"_blank\"\n    rel=\"noopener\"\n  >\n    <span\n    class=\"gatsby-resp-image-background-image\"\n    style=\"padding-bottom: 69.5%; position: relative; bottom: 0; left: 0; background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAOCAIAAACgpqunAAAACXBIWXMAAA7DAAAOwwHHb6hkAAACH0lEQVQoz1XS6XKlIBAF4Lz/y01qkkkqN4moiLIKAiLN5tS92Wb6N6e+PlTfwbZGwWpKpRTnnPMu59JaCyHEGK1zx3EYbYwxIQTvnTHG2utDALg7b1NqZYx1CD09/SGELAu9XC7de/f4+DiMI0Lo98PD8/PzOPQdQi8vl1/391Ktn+GawUpaIcxDZxUnQ4f7NyM5J+Ph9JkPyQidhgahwh79NvfvXsu782znebYUrVaQ0jiO+x6UlO8IlVK0kv5aJEspGGPpOuCdHxByq/yUWwa/rblkMuHjOJSUXd/XWo1eQ/C1FqWk4LyUXEoO+46HwWn1tXbJm6TRb2P3qsUyD93wfmEEk/7NG5UPL5ZpGRHsNvotWD11r/Y73Frzfq+1CSGN2VDfL5SGGJWSx3Gc56mNoYzX2kqpKWVKmfP+S65VrzrGiPFozDZNuOtRSnlVwjmbc1ZSEkIAUoxx33eMx22zP3IIoZQyz/O+BwCY57m2thl9HKGdp1KSc3GTC0TAGP8v66s8jqPWhnOGMY4RpODOuZSSMYbSBQA+5H7onXM/YWttKYULrrXu+57QhQu5LHOM8dpZa3btXG9ynPD0EwYAKSUAEEKsc0ZrhPpbZ3m7xCSE4Fx8ywihzX51hpTMttVSFkqds2Qi80xijErKcITzPNd1nRda260zwDT9I6eUtNYAwBiz1t3uiTLGKV1CuP6fVOpDBojeeTJNft//AimXFybRmeEMAAAAAElFTkSuQmCC'); background-size: cover; display: block;\"\n  ></span>\n  <img\n        class=\"gatsby-resp-image-image\"\n        alt=\"Budget\"\n        title=\"Budget\"\n        src=\"/static/c4c249b6e7fa82809e79da293a2cb161/11d18/budget.png\"\n        srcset=\"/static/c4c249b6e7fa82809e79da293a2cb161/56d15/budget.png 200w,\n/static/c4c249b6e7fa82809e79da293a2cb161/d9f49/budget.png 400w,\n/static/c4c249b6e7fa82809e79da293a2cb161/11d18/budget.png 559w\"\n        sizes=\"(max-width: 559px) 100vw, 559px\"\n        style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\"\n        loading=\"lazy\"\n        decoding=\"async\"\n      />\n  </a>\n    </span></p>\n<p>We can see that we have duplicate rows in our DataFrame.</p>\n<p>We can extract these duplicate rows using the <del>duplicated()</del> 
method.</p>\n<pre class=\"grvsc-container synthwave-84\" data-language=\"py\" data-index=\"1\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"grvsc-gutter-pad\"></span><span class=\"grvsc-gutter grvsc-line-number\" aria-hidden=\"true\" data-content=\"1\"></span><span class=\"grvsc-source\"><span class=\"mtk15\">duplicates </span><span class=\"mtk12\">=</span><span class=\"mtk15\"> budget.</span><span class=\"mtk6\">duplicated</span><span class=\"mtk15\">()</span></span></span>\n<span class=\"grvsc-line\"><span class=\"grvsc-gutter-pad\"></span><span class=\"grvsc-gutter grvsc-line-number\" aria-hidden=\"true\" data-content=\"2\"></span><span class=\"grvsc-source\"></span></span>\n<span class=\"grvsc-line\"><span class=\"grvsc-gutter-pad\"></span><span class=\"grvsc-gutter grvsc-line-number\" aria-hidden=\"true\" data-content=\"3\"></span><span class=\"grvsc-source\"><span class=\"mtk15\">duplicates</span></span></span></code></pre>\n<p>The <del>duplicated()</del> method returns a boolean Series.</p>\n<p><strong>Output:</strong></p>\n<p><span\n      class=\"gatsby-resp-image-wrapper\"\n      style=\"position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 211px; \"\n    >\n      <a\n    class=\"gatsby-resp-image-link\"\n    href=\"/static/74f3d306ca6b8ab1515621f203b2503f/9fccc/booleanSeries.png\"\n    style=\"display: block\"\n    target=\"_blank\"\n    rel=\"noopener\"\n  >\n    <span\n    class=\"gatsby-resp-image-background-image\"\n    style=\"padding-bottom: 120%; position: relative; bottom: 0; left: 0; background-image: 
url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAYCAIAAAB1KUohAAAACXBIWXMAAA7DAAAOwwHHb6hkAAACSUlEQVQ4y42SaXOjMAyG+///1U4/7LQ7bZMGEogB29wQcjSATyzv5GLT2TaphgEk9EjWKx4sGMM5SHa8CyOFHQaldC/EYKw2oA10jDMhOZdCD/pwne0BtGxnrzyc8TBggccJAsabKkfOzIIyvLO6J4v5+5+nyJn47gQA7MUeLi9glDo8tLIAahVvitjKFtjOyo888ENnShZu7D6D6E/5FxjAnuqdq4Jo0nK9PQYOkbps3PmSJhn2HSvZZ9iO7tlYgbOsOMLGWmsUpzhCYRK8v4D6Er4qI1ZJs9mNnbUSVV0127bCAcj+NmzVijZ1aWEALe2greZ5HHkeWr78vg0fomxT5uHC8h20jWXrZD4lzlviTlJ/AoP+X+1P1vVdXq/+CVY13jIkJI5ofKXrNzBjXd2sLoJBQhNv7scRxgRfyfoN3HdtVdVj5yLLEQppRJI0NXADPmbvd7uiKK/2XPseohGOML7e6tedpeD1qhnhlKbubB746KfwdvcxwmVeUJqUeVFV9a2ZT9kf2w2Nk8O/dZo5LVxnES6RjxDc7Hz40O73RVmNtfI0D46CZXl2X+2ubYsrtcus9DxEMQmjH8xsrX2bTLKj4NZCSlPfQzGmmNL7nQetl2EgpDy5RVYgFJIIx2lyH9ZKO/6CC3Fy66L2FksSRkEU2ruwUspxnO12Px7bnS0CHxFK7s9sjCEUC3E+9qZZxzRe1TWJY3t3z13bTifPc3f2+PgLoUhr+f769DadRUk85tzqLKVkrGd9p5UGMEIIzvl4lhP8FwmUaJLDFma0AAAAAElFTkSuQmCC'); background-size: cover; display: block;\"\n  ></span>\n  <img\n        class=\"gatsby-resp-image-image\"\n        alt=\"Boolean Series\"\n        title=\"Boolean Series\"\n        src=\"/static/74f3d306ca6b8ab1515621f203b2503f/9fccc/booleanSeries.png\"\n        srcset=\"/static/74f3d306ca6b8ab1515621f203b2503f/56d15/booleanSeries.png 200w,\n/static/74f3d306ca6b8ab1515621f203b2503f/9fccc/booleanSeries.png 211w\"\n        sizes=\"(max-width: 211px) 100vw, 211px\"\n        style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\"\n        loading=\"lazy\"\n        decoding=\"async\"\n      />\n  </a>\n    </span></p>\n<p>Note that the first occurrence of the row is marked as <del>False</del> (i.e. 
non-duplicate).</p>\n<p>Next, we use this boolean Series as a mask to extract the duplicate rows:</p>\n<pre class=\"grvsc-container synthwave-84\" data-language=\"py\" data-index=\"2\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"grvsc-gutter-pad\"></span><span class=\"grvsc-gutter grvsc-line-number\" aria-hidden=\"true\" data-content=\"1\"></span><span class=\"grvsc-source\"><span class=\"mtk15\">budget[duplicates]</span></span></span></code></pre>\n<p><strong>Output:</strong></p>\n<p><span\n      class=\"gatsby-resp-image-wrapper\"\n      style=\"position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 522px; \"\n    >\n      <a\n    class=\"gatsby-resp-image-link\"\n    href=\"/static/2fed6ae788e9003301c50fa6bae242d5/d15a3/duplicateRows.png\"\n    style=\"display: block\"\n    target=\"_blank\"\n    rel=\"noopener\"\n  >\n    <span\n    class=\"gatsby-resp-image-background-image\"\n    style=\"padding-bottom: 20%; position: relative; bottom: 0; left: 0; background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAECAIAAAABPYjBAAAACXBIWXMAAA7DAAAOwwHHb6hkAAAAt0lEQVQI1zXL226DMAwAUP7/4+AJrQgWbgWalnZoIbbjOPG0ajvvpxB38PORiLIqIYYQRCRJYmYEDByYGQC89zFGIvLeE1GgIBILzVnzr+M4PruurutxnNZlMaafp6m5XIZhML2pqqptu7437VtZluM4FfqWkrxeX8y8zPO6rs1Hs203IrLbDRFjjPv9Ya3lGEMg5rBcZ2vvf1lzFhFVPZ3zAPv+PE+vquBBNasqAiKh/nPuGwB/AH664jUEKMN5AAAAAElFTkSuQmCC'); background-size: cover; display: block;\"\n  ></span>\n  <img\n        class=\"gatsby-resp-image-image\"\n        alt=\"Duplicate Rows\"\n        title=\"Duplicate Rows\"\n        src=\"/static/2fed6ae788e9003301c50fa6bae242d5/d15a3/duplicateRows.png\"\n        srcset=\"/static/2fed6ae788e9003301c50fa6bae242d5/56d15/duplicateRows.png 200w,\n/static/2fed6ae788e9003301c50fa6bae242d5/d9f49/duplicateRows.png 400w,\n/static/2fed6ae788e9003301c50fa6bae242d5/d15a3/duplicateRows.png 522w\"\n        sizes=\"(max-width: 522px) 100vw, 522px\"\n        
style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\"\n        loading=\"lazy\"\n        decoding=\"async\"\n      />\n  </a>\n    </span></p>\n<h6 id=\"learn-how-to-remove-duplicate-rows-from-a-pandas-dataframe-in-my-blog-post-here\" style=\"position:relative;\"><a href=\"#learn-how-to-remove-duplicate-rows-from-a-pandas-dataframe-in-my-blog-post-here\" aria-label=\"learn how to remove duplicate rows from a pandas dataframe in my blog post here permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Learn how to remove duplicate rows from a pandas DataFrame in my blog post <a href=\"https://hemanta.io/remove-duplicate-rows-from-a-pandas-dataframe/\">here</a>.</h6>\n<style class=\"grvsc-styles\">\n  .grvsc-container {\n    overflow: auto;\n    position: relative;\n    -webkit-overflow-scrolling: touch;\n    padding-top: 1rem;\n    padding-top: var(--grvsc-padding-top, var(--grvsc-padding-v, 1rem));\n    padding-bottom: 1rem;\n    padding-bottom: var(--grvsc-padding-bottom, var(--grvsc-padding-v, 1rem));\n    border-radius: 8px;\n    border-radius: var(--grvsc-border-radius, 8px);\n    font-feature-settings: normal;\n    line-height: 1.4;\n  }\n  \n  .grvsc-code {\n    display: table;\n  }\n  \n  .grvsc-line {\n    display: table-row;\n    box-sizing: border-box;\n    width: 100%;\n    position: relative;\n  }\n  \n  .grvsc-line > * {\n    position: relative;\n  }\n  \n  .grvsc-gutter-pad {\n    display: table-cell;\n    padding-left: 0.75rem;\n    
padding-left: calc(var(--grvsc-padding-left, var(--grvsc-padding-h, 1.5rem)) / 2);\n  }\n  \n  .grvsc-gutter {\n    display: table-cell;\n    -webkit-user-select: none;\n    -moz-user-select: none;\n    user-select: none;\n  }\n  \n  .grvsc-gutter::before {\n    content: attr(data-content);\n  }\n  \n  .grvsc-source {\n    display: table-cell;\n    padding-left: 1.5rem;\n    padding-left: var(--grvsc-padding-left, var(--grvsc-padding-h, 1.5rem));\n    padding-right: 1.5rem;\n    padding-right: var(--grvsc-padding-right, var(--grvsc-padding-h, 1.5rem));\n  }\n  \n  .grvsc-source:empty::after {\n    content: ' ';\n    -webkit-user-select: none;\n    -moz-user-select: none;\n    user-select: none;\n  }\n  \n  .grvsc-gutter + .grvsc-source {\n    padding-left: 0.75rem;\n    padding-left: calc(var(--grvsc-padding-left, var(--grvsc-padding-h, 1.5rem)) / 2);\n  }\n  \n  /* Line transformer styles */\n  \n  .grvsc-has-line-highlighting > .grvsc-code > .grvsc-line::before {\n    content: ' ';\n    position: absolute;\n    width: 100%;\n  }\n  \n  .grvsc-line-diff-add::before {\n    background-color: var(--grvsc-line-diff-add-background-color, rgba(0, 255, 60, 0.2));\n  }\n  \n  .grvsc-line-diff-del::before {\n    background-color: var(--grvsc-line-diff-del-background-color, rgba(255, 0, 20, 0.2));\n  }\n  \n  .grvsc-line-number {\n    padding: 0 2px;\n    text-align: right;\n    opacity: 0.7;\n  }\n  \n  .synthwave-84 { background-color: #262335; }\n  .synthwave-84 .mtk10 { color: #FEDE5D; }\n  .synthwave-84 .mtk15 { color: #FF7EDBFF; }\n  .synthwave-84 .mtk12 { color: #FFFFFFEE; }\n  .synthwave-84 .mtk6 { color: #36F9F6; }\n  .synthwave-84 .mtk16 { color: #FF8B39; }\n  .synthwave-84 .grvsc-line-highlighted::before {\n    background-color: var(--grvsc-line-highlighted-background-color, rgba(255, 255, 255, 0.1));\n    box-shadow: inset var(--grvsc-line-highlighted-border-width, 4px) 0 0 0 var(--grvsc-line-highlighted-border-color, rgba(255, 255, 255, 0.5));\n  
}\n</style>","frontmatter":{"title":"Find Duplicate Rows in a Pandas DataFrame","date":"2021-08-08"}}},"pageContext":{"slug":"/find-duplicate-rows-in-a-pandas-dataframe/","prev":{"fields":{"slug":"/filter-a-pandas-dataframe-based-on-a-condition/"},"frontmatter":{"modules":null}},"next":{"fields":{"slug":"/pandas-isnull-and-notnull/"},"frontmatter":{"modules":null}}}},
    "staticQueryHashes": ["3159585216"]}