Text deduplication algorithm problem ~~ Please give pointers

Post time: 2020-1-31 19:00:01
The text file contains the following lines:

// s1, s2, s3, s4, s5
20161217, D, B, DKDDA332021, ESA3332SS1
20161217, D, B, DKDDA332022, ESA3332SS2
20161217, D, B, DKDDA332023, ESA3332SS3
20161217, D, B, DKDDA332021, ESA3332SS2
20161217, D, B, DKDDA332025, ESA3332SS2
20161217, D, B, DKDDA332021, ESA3332SS3
20161217, D, B, DKDDA332022, ESA3332SS7

In this file, columns s4 and s5 contain duplicate values, and I want to delete all the duplicates.
Sort only orders by the entire row. Is there a way to sort by s4 and s5 and delete the duplicated entries (the entire row)? Thank you, everyone.
Where rows share the same S4, only one row should be left and the remaining rows deleted; likewise, where rows share the same S5, only one row should be left.
 Author| Post time: 2020-3-10 20:15:01
unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, StdCtrls, StrUtils;

type
  TForm1 = class(TForm)
    Button1: TButton;
    OpenDialog1: TOpenDialog;
    procedure Button1Click(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

procedure TForm1.Button1Click(Sender: TObject);
var
  newlist, s4list, filelist: TStringList;
  liushui, dates, barcode, company, inout: string;
  i: Integer;
  s4: string;
  off, d1, d2, d3, d4: Integer;
begin
  OpenDialog1.Execute;
  filelist := TStringList.Create;
  filelist.LoadFromFile(OpenDialog1.FileName);
  newlist := TStringList.Create;
  s4list := TStringList.Create;
  s4list.Sorted := True; // sorting will speed up the search

  for i := 0 to filelist.Count - 1 do
  begin
    off := PosEx(',', filelist[i], 0);
    d1 := off;
    dates := Copy(filelist[i], 0, d1 - 1);
    off := PosEx(',', filelist[i], off + 1);
    d2 := off;
    company := Copy(filelist[i], d1 + 1, d2 - d1 - 1);
    off := PosEx(',', filelist[i], off + 1);
    d3 := off;
    inout := Copy(filelist[i], d2 + 1, d3 - d2 - 1);
    off := PosEx(',', filelist[i], off + 1);
    d4 := off;
    barcode := Copy(filelist[i], d3 + 1, d4 - d3 - 1);
    off := PosEx(',', filelist[i], off + 1);
    liushui := Copy(filelist[i], d4 + 1, 100);
    s4 := barcode; // s4
    if newlist.IndexOf(s4) > 0 then
    begin
      newlist.Add(filelist[i]);
      s4list.Add(s4);
    end;
  end;

  newlist.SaveToFile(OpenDialog1.FileName)
end;

end.

The test just failed and the S4 duplicate still exists.
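
A likely cause, reading the code as posted: the membership test calls `IndexOf` on `newlist`, which holds whole rows and starts out empty, and compares the result with `> 0`. `IndexOf` returns -1 when a string is absent and 0 is a valid index, so that test can never succeed. `PosEx` also expects a 1-based offset rather than 0. Below is a minimal console sketch of the S4-only pass with those points changed. `ExtractField` and `DedupByField` are hypothetical helper names introduced for illustration, and the sketch assumes that fields are separated by commas and that spaces around a field should be ignored.

```pascal
program DedupS4;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: returns the N-th (1-based) comma-separated field,
// trimmed; returns '' if the line has fewer than N fields.
function ExtractField(const Line: string; N: Integer): string;
var
  Rest: string;
  K, P: Integer;
begin
  Rest := Line;
  for K := 1 to N - 1 do
  begin
    P := Pos(',', Rest);
    if P = 0 then
    begin
      Result := '';
      Exit;
    end;
    Rest := Copy(Rest, P + 1, MaxInt);
  end;
  P := Pos(',', Rest);
  if P > 0 then
    Rest := Copy(Rest, 1, P - 1);
  Result := Trim(Rest);
end;

// Hypothetical helper: copies to Dest the first row seen for each
// distinct value of field N, preserving the original row order.
procedure DedupByField(Source, Dest: TStrings; N: Integer);
var
  Seen: TStringList;
  I: Integer;
  Key: string;
begin
  Seen := TStringList.Create;
  try
    Seen.CaseSensitive := True;
    Seen.Sorted := True;             // IndexOf uses a binary search when sorted
    for I := 0 to Source.Count - 1 do
    begin
      Key := ExtractField(Source[I], N);
      if Seen.IndexOf(Key) < 0 then  // -1 means the key has not been seen yet
      begin
        Seen.Add(Key);
        Dest.Add(Source[I]);
      end;
    end;
  finally
    Seen.Free;
  end;
end;

var
  Rows, Kept: TStringList;
begin
  Rows := TStringList.Create;
  Kept := TStringList.Create;
  try
    // Sample rows from the first post
    Rows.Add('20161217, D, B, DKDDA332021, ESA3332SS1');
    Rows.Add('20161217, D, B, DKDDA332022, ESA3332SS2');
    Rows.Add('20161217, D, B, DKDDA332023, ESA3332SS3');
    Rows.Add('20161217, D, B, DKDDA332021, ESA3332SS2');
    Rows.Add('20161217, D, B, DKDDA332025, ESA3332SS2');
    Rows.Add('20161217, D, B, DKDDA332021, ESA3332SS3');
    Rows.Add('20161217, D, B, DKDDA332022, ESA3332SS7');
    DedupByField(Rows, Kept, 4);     // field 4 is S4
    Write(Kept.Text);
  finally
    Kept.Free;
    Rows.Free;
  end;
end.
```

On the seven sample rows this keeps rows 1, 2, 3 and 5 (DKDDA332021, DKDDA332022, DKDDA332023, DKDDA332025), that is, the first row seen for each S4 value.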
Post time: 2020-3-23 21:00:02
procedure TForm1.Button1Click(Sender: TObject);
var
  SL: TStringList;
  I, K: Integer;
  Str: string;
begin
  SL := TStringList.Create;
  try
    { The original text is in b.txt in the program directory }
    SL.LoadFromFile(SysUtils.ExtractFilePath(Application.ExeName) + 'b.txt');
    if SL.Count = 0 then
      Exit; { nothing to do; the code below indexes SL[SL.Count - 1] }
    { Where S4 is the same, keep one line and delete the rest }
    { Prefix every line with its S4 field so that sorting groups equal S4 values }
    for I := SL.Count - 1 downto 0 do
    begin
      Str := SL[I];
      for K := 1 to 3 do
        Str := Copy(Str, Pos(',', Str) + 1, MaxInt);
      Str := Copy(Str, 1, Pos(',', Str) - 1);
      SL[I] := Str + ',' + SL[I];
    end;
    SL.Sort;
    { Walk backwards: keep the last line of each group, strip the prefix again }
    Str := Copy(SL[SL.Count - 1], 1, Pos(',', SL[SL.Count - 1]) - 1);
    SL[SL.Count - 1] := Copy(SL[SL.Count - 1], Pos(',', SL[SL.Count - 1]) + 1, MaxInt);
    for I := SL.Count - 2 downto 0 do
    begin
      if Copy(SL[I], 1, Pos(',', SL[I]) - 1) = Str then
        SL.Delete(I)
      else
      begin
        Str := Copy(SL[I], 1, Pos(',', SL[I]) - 1);
        SL[I] := Copy(SL[I], Pos(',', SL[I]) + 1, MaxInt);
      end;
    end;
    { Where S5 is the same, keep one line and delete the rest }
    for I := SL.Count - 1 downto 0 do
    begin
      Str := SL[I];
      for K := 1 to 4 do
        Str := Copy(Str, Pos(',', Str) + 1, MaxInt);
      SL[I] := Str + ',' + SL[I];
    end;
    SL.Sort;
    Str := Copy(SL[SL.Count - 1], 1, Pos(',', SL[SL.Count - 1]) - 1);
    SL[SL.Count - 1] := Copy(SL[SL.Count - 1], Pos(',', SL[SL.Count - 1]) + 1, MaxInt);
    for I := SL.Count - 2 downto 0 do
    begin
      if Copy(SL[I], 1, Pos(',', SL[I]) - 1) = Str then
        SL.Delete(I)
      else
      begin
        Str := Copy(SL[I], 1, Pos(',', SL[I]) - 1);
        SL[I] := Copy(SL[I], Pos(',', SL[I]) + 1, MaxInt);
      end;
    end;
    { All done; the result is saved to bb.txt in the program directory }
    SL.SaveToFile(SysUtils.ExtractFilePath(Application.ExeName) + 'bb.txt');
  finally
    SL.Free;
  end;
end;
Post time: 2020-3-23 21:30:01
Ha ha, give it a test; if it doesn't work, post again.
Post time: 2020-3-25 17:15:01
Give it a try; my code works fine.
 Author| Post time: 2020-3-26 13:15:02
20161217, D, B, DKDDA332021, ESA3332SS1
20161217, D, B, DKDDA332022, ESA3332SS2
20161217, D, B, DKDDA332023, ESA3332SS3
20161217, D, B, DKDDA332021, ESA3332SS2
20161217, D, B, DKDDA332025, ESA3332SS2
20161217, D, B, DKDDA332021, ESA3332SS3
20161217, D, B, DKDDA332022, ESA3332SS7


DKDDA332021
How did this one get deleted ~~ That's strange, please help me take a look.
Post time: 2020-3-26 19:00:02
Because DKDDA332021 appears in the S4 column of the first, fourth, and sixth rows, OP.
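
To illustrate why DKDDA332021 can disappear entirely: tracing the two passes of the code above on the sample data, the S4 pass appears to keep the DKDDA332021 row whose S5 is ESA3332SS3, and the S5 pass then drops that row in favour of the DKDDA332023 row with the same S5. Running the two passes one after another can therefore remove every row for a value. One alternative, sketched below, is a single pass with two lookup lists. It assumes that "only one line is left" means "keep the first row whose S4 and S5 have both not been seen yet", which is only one reading of the request. `ExtractField` and `DedupS4AndS5` are hypothetical names, not from the thread.

```pascal
program DedupS4S5;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: returns the N-th (1-based) comma-separated field,
// trimmed; returns '' if the line has fewer than N fields.
function ExtractField(const Line: string; N: Integer): string;
var
  Rest: string;
  K, P: Integer;
begin
  Rest := Line;
  for K := 1 to N - 1 do
  begin
    P := Pos(',', Rest);
    if P = 0 then
    begin
      Result := '';
      Exit;
    end;
    Rest := Copy(Rest, P + 1, MaxInt);
  end;
  P := Pos(',', Rest);
  if P > 0 then
    Rest := Copy(Rest, 1, P - 1);
  Result := Trim(Rest);
end;

// Hypothetical helper: keeps a row only if neither its S4 (field 4)
// nor its S5 (field 5) has appeared in an earlier kept row.
procedure DedupS4AndS5(Source, Dest: TStrings);
var
  SeenS4, SeenS5: TStringList;
  I: Integer;
  K4, K5: string;
begin
  SeenS4 := TStringList.Create;
  SeenS5 := TStringList.Create;
  try
    SeenS4.CaseSensitive := True;
    SeenS4.Sorted := True;           // sorted lists give a binary-search IndexOf
    SeenS5.CaseSensitive := True;
    SeenS5.Sorted := True;
    for I := 0 to Source.Count - 1 do
    begin
      K4 := ExtractField(Source[I], 4);
      K5 := ExtractField(Source[I], 5);
      if (SeenS4.IndexOf(K4) < 0) and (SeenS5.IndexOf(K5) < 0) then
      begin
        SeenS4.Add(K4);
        SeenS5.Add(K5);
        Dest.Add(Source[I]);
      end;
    end;
  finally
    SeenS5.Free;
    SeenS4.Free;
  end;
end;

var
  Rows, Kept: TStringList;
begin
  Rows := TStringList.Create;
  Kept := TStringList.Create;
  try
    // Sample rows from the first post
    Rows.Add('20161217, D, B, DKDDA332021, ESA3332SS1');
    Rows.Add('20161217, D, B, DKDDA332022, ESA3332SS2');
    Rows.Add('20161217, D, B, DKDDA332023, ESA3332SS3');
    Rows.Add('20161217, D, B, DKDDA332021, ESA3332SS2');
    Rows.Add('20161217, D, B, DKDDA332025, ESA3332SS2');
    Rows.Add('20161217, D, B, DKDDA332021, ESA3332SS3');
    Rows.Add('20161217, D, B, DKDDA332022, ESA3332SS7');
    DedupS4AndS5(Rows, Kept);
    Write(Kept.Text);
  finally
    Kept.Free;
    Rows.Free;
  end;
end.
```

On the sample data this keeps rows 1, 2 and 3, so DKDDA332021 survives. The DKDDA332025 row is dropped because its S5 (ESA3332SS2) was already taken by row 2, so the result depends on row order; whether that matches the intent is for the original poster to decide.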
 Author| Post time: 2020-3-27 11:15:01
Oh, but one line should still be left for each repeated value ~~
Post time: 2020-3-27 13:45:01
Oh, that really is a problem, haha. Let me take a look.
 Author| Post time: 2020-3-27 15:00:01
Haha, yeah, I didn't explain it clearly enough, sorry ~