Thursday, July 30, 2009

Text compare

This is not flex related, I know but since I don't have another blog to post this in, I will put it in this one:) so bare with me. Following my last post which was about implementing track changes in flex side, I decided to get a text comparison algorithm on the server java side to compare two texts and return a merged version with all additions and deletions.

I found a number of character/line comparison algos but the output wasn't that nice and i decided to look for a word by word comparison for change tracking. What I found was a "longest common subsequences algorithm" for Java but it was based on two linked lists or arrays. So decided to complement this with a textutil compare method that receives as a parameters two version of the text and produces a merged copy with deletions wrapped in "[-" & "-]" and additions wrapped in "[+" & "+]".


/**
* Breaks ups a string of text of a series of words
* @param text
* @return list of words
*/
public static
List extractWords (String text)
{

BreakIterator bi = BreakIterator.getWordInstance();
List words = new ArrayList();
bi.setText(text);
int
index =0;
while
(bi.next() != BreakIterator.DONE) {
words.add(text.substring(index,bi.current()));
System.out.print(/*(words.size()-1) +*/ text.substring(index,bi.current()));
index = bi.current();
}
System.out.println(); System.out.println();
return
words;
}

/**

* Compares and finds changes require to be done on text 1 to change it to text 2
* @param text1
* @param text2
* @return a copy of text1 with all merged changes of text2
* deletions are wrapped in "[-" "-]" additions are wrapped in "[+" and "+]"
*
*/
public static
String compareText( String text1, String text2)

{
List list= extractWords(text1);
List list2= extractWords(text2);
Diff diff = new Diff (list,list2);
List diffList = diff.diff();
Iterator itr = diffList.iterator();
Difference difference;
String mergedText= "";
int
i=0;
int
delStart=-1;
int
addStart=-1;
int
delEnd=-1;
int
addEnd=-1;

while
(itr.hasNext()) //iterate through the difference list.
{
difference = itr.next();
delStart = difference.getDeletedStart();
delEnd = difference.getDeletedEnd();
addStart = difference.getAddedStart();
addEnd = difference.getAddedEnd();
while
( i < delStart )
{ //copy all text before the current delete from the old text
mergedText += list.get(i);
i++;
}
if
(delEnd != Difference.NONE)
{ //elseif DelEnd == None => nothing was deleted, most probably only addition
mergedText += "[-";
while
(i<= delEnd)//wrap the delete
{
mergedText += list.get(i);
i++;
}
mergedText += "-]" ;
}
if
(addEnd != Difference.NONE)
{ //elseif Addend == None => nothing was added, most probably only delete
mergedText += "[+";
for
(int j=addStart; j <= addEnd; j++)
{
mergedText += list2.get(j);
}
mergedText += "+]" ;
}
}
while
(i <>//copy the reset of the unchanged text.
{
mergedText += list.get(i);
i++;
}
return mergedText;
}

No comments:

Post a Comment