Java program to delete duplicate lines in text file

Given a file input.txt, our task is to remove duplicate lines from it and save the result in another file, say output.txt.

Naive Algorithm :

1. Create PrintWriter object for output.txt
2. Open BufferedReader for input.txt
3. Run a loop for each line of input.txt
3.1 flag = false
3.2 Open BufferedReader for output.txt
3.3 Run a loop for each line of output.txt
-> If line of output.txt is equal to current line of input.txt
-> flag = true
-> break loop
3.4 Close BufferedReader for output.txt
3.5 Check flag, if false
-> write current line of input.txt to output.txt
-> Flush PrintWriter stream, so the line is on disk before output.txt is read again

4. Close resources.
To run the program below successfully, input.txt must exist in the same folder; otherwise, provide its full path.

// Java program to remove
// duplicates from input.txt and
// save output to output.txt

import java.io.*;

public class FileOperation
{
    public static void main(String[] args) throws IOException
    {
        // PrintWriter object for output.txt
        PrintWriter pw = new PrintWriter("output.txt");

        // BufferedReader object for input.txt
        BufferedReader br1 = new BufferedReader(new FileReader("input.txt"));

        String line1 = br1.readLine();

        // loop for each line of input.txt
        while(line1 != null)
        {
            boolean flag = false;

            // BufferedReader object for output.txt
            BufferedReader br2 = new BufferedReader(new FileReader("output.txt"));

            String line2 = br2.readLine();

            // loop for each line of output.txt
            while(line2 != null)
            {
                if(line1.equals(line2))
                {
                    flag = true;
                    break;
                }

                line2 = br2.readLine();
            }

            // close this reader before output.txt is opened again
            br2.close();

            // if flag = false
            // write line of input.txt to output.txt
            if(!flag)
            {
                pw.println(line1);

                // flushing is important here: output.txt is
                // re-read on the next iteration
                pw.flush();
            }

            line1 = br1.readLine();
        }

        // closing resources
        br1.close();
        pw.close();

        System.out.println("File operation performed successfully");
    }
}
Output:

File operation performed successfully
Note : If output.txt already exists in the current working directory (cwd), the above program overwrites it; otherwise a new file is created.

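A better solution is to use a HashSet to store each line of input.txt. A set never holds duplicate values, and its add method returns false when the value is already present. So, while storing a line, check whether it was already in the HashSet, and write it to output.txt only if it was not. This needs a single pass over input.txt instead of re-reading output.txt for every line.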
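The check described above relies on the return value of add. The following is a minimal sketch of that behaviour on an in-memory array rather than a file; the class name SetAddDemo, the method countUnique and the sample values are made up for illustration.

```java
import java.util.HashSet;
import java.util.Set;

public class SetAddDemo
{
    // Counts how many entries are first occurrences
    // (hypothetical helper, used only for this illustration)
    static int countUnique(String[] lines)
    {
        Set<String> seen = new HashSet<>();
        int unique = 0;

        for (String line : lines)
        {
            // add() returns true only if the line
            // was not already in the set
            if (seen.add(line))
                unique++;
        }

        return unique;
    }

    public static void main(String[] args)
    {
        String[] lines = { "apple", "banana", "apple", "cherry", "banana" };

        // "apple" and "banana" each repeat once, so 3 lines are unique
        System.out.println(countUnique(lines));
    }
}
```

Because the membership test and the insertion happen in one call, no separate contains check is needed.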

To run the program below successfully, input.txt must exist in the same folder; otherwise, provide its full path.

// Efficient Java program to remove
// duplicates from input.txt and
// save output to output.txt

import java.io.*;
import java.util.HashSet;

public class FileOperation
{
    public static void main(String[] args) throws IOException
    {
        // PrintWriter object for output.txt
        PrintWriter pw = new PrintWriter("output.txt");

        // BufferedReader object for input.txt
        BufferedReader br = new BufferedReader(new FileReader("input.txt"));

        String line = br.readLine();

        // set stores unique values
        HashSet<String> hs = new HashSet<String>();

        // loop for each line of input.txt
        while(line != null)
        {
            // write only if not
            // present in hashset
            if(hs.add(line))
                pw.println(line);

            line = br.readLine();
        }

        pw.flush();

        // closing resources
        br.close();
        pw.close();

        System.out.println("File operation performed successfully");
    }
}
Output:

File operation performed successfully
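
As a possible variation, not part of the programs above, the same idea can be sketched with java.nio.file.Files and a LinkedHashSet, which drops duplicates while keeping the first occurrence of each line in its original order. The class name DedupWithNio and the helper uniqueLines are made up for illustration, and the sketch assumes input.txt is UTF-8 text small enough to fit in memory, since readAllLines loads the whole file at once.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DedupWithNio
{
    // Keeps the first occurrence of each line, in original order
    // (hypothetical helper, used only for this illustration)
    static Set<String> uniqueLines(List<String> lines)
    {
        return new LinkedHashSet<>(lines);
    }

    public static void main(String[] args) throws IOException
    {
        Path in = Paths.get("input.txt");
        Path out = Paths.get("output.txt");

        // reads the whole file into memory (UTF-8 by default)
        List<String> lines = Files.readAllLines(in);

        // creates output.txt, or overwrites it if it already exists
        Files.write(out, uniqueLines(lines));
    }
}
```

For very large files, the line-by-line HashSet program above is the safer choice, because it only keeps the unique lines in memory rather than the entire input.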
