INTERNET-DRAFT EXPIRATION 6 April 2017 Independent Submission A. Cheney Category: Informational prettydiff.com October 2017 Simple Diff Algorithm draft-cheney-simplediff-00.txt Abstract This document describes a simplified algorithm for performing computational comparisons. The design goal is to achieve an approach that uses the smallest amount of logic. Necessary secondary goals include an approach that is language agnostic, fast, extensible, and more predictable than other similar algorithms. Copyright Notice Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." Comments Comments are solicited and should be addressed to info@prettydiff.com. Cheney Informational [Page 1] RFC XXXX Simple Diff Algorithm October 2017 Table of Contents 1. Introduction ....................................................3 2. Understanding how to compare ....................................3 2.1. It is more about the equality than the differences .........3 2.2. Algorithm quality ..........................................4 2.3. Speed ......................................................4 2.4. Output format ..............................................6 3. How the algorithm works at a high level .........................7 3.1. Getting by with a hash map and some counts .................7 3.2. The third and final pass through the data ..................8 3.3. The "fix" function ........................................11 3.4. Done ......................................................16 4. Extensibility ..................................................23 5. Security considerations ........................................23 6. IANA considerations ............................................23 7. References .....................................................23 8. Author's address ...............................................23 Cheney Informational [Page 2] RFC XXXX Simple Diff Algorithm October 2017 1. Introduction There are many comparison algorithms available, which are commonly referred to as "diff" algorithms after the similarly named Unix application. While there are many algorithms already available none achieve speed, precision, and simplicity in an easily predictable manner. The most commonly fault of many diff algorithms is that they solve for the incorrect problem. Instead of accomplishing the goal, to find how two samples differ, they instead will try to solve for a common computer science problem called "Longest Common Subsequence (LCS)". The LCS problem seeks to find the longest blocks of equality in samples of comparison, which sounds ideal. Unfortunately, the execution of this approach is not predictable as it is bound to the commonality of the compared samples. Furthermore, it isn't simple or fast. Worse still is that this approach is a precision failure in situations where differences are interspersed across frequent repetition. 2. Understanding how to compare 2.1. It is more about the equality than the differences A good diff algorithm will attempt to identify as much equality as possible. Everything else qualifies as differences. The metric for quality and precision is a smaller count of captured differences. The smaller this number the better, presuming differences aren't escaping undetected. False negatives, which is allowing differences through without detection, is really bad. This is absolute failure in a diff algorithm. False positives, which is identifying more differences than there actually are is also bad, but a false positive is much better than a false negative. This means it is safer to report more differences than are actually present, which is why a higher number of reported differences is less precise. Maximum precision is reporting differences without any false negatives or false positives. One way to achieve higher precision is to first capture as much equality as possible. For everything else that is left over prioritize unique items first and report these items as differences. Finally determine the next bit of equality or uniqueness and everything in between is either a change, an insertion, or a deletion. Cheney Informational [Page 3] RFC XXXX Simple Diff Algorithm October 2017 2.2. Algorithm quality The primary priorities when writing this kind of code are execution speed, precision (as previous described), and code simplicity. In most cases precision is the most important factor in code design, but in some cases speed is more important when processing large input or a batch of thousands of files. Simplicity is necessary so that other people can understand the code and modify it as necessary to improve upon the design and additional features. No algorithm is ever capable of meeting all needs for all use cases, so it is important that other people can understand the code with minimal effort. 2.3. Speed Faster execution is the result of a couple of things. The most important consideration for making an algorithm faster is to reduce the number of passes through data where possible. After that eliminate all costly operations. In most programming languages simple arithmetic and static string comparisons are cheap to execute particularly if the examined strings aren't changing. The theoretical minimum number of data passes is two as you have to read the contents of each submitted sample. This proposed algorithm achieves speed by only taking three complete passes through data and making all possible effort to never repeat a step or loop iteration. This approach is linear and predictable where the number of interations passing through data is computed from the sum of: * number of iterations from the first sample * number of iterations from the second sample * the number of iterations from the smallest of those samples. Performance of the mentioned approach is heavily based upon key access in a hash map, which will likely vary by programming language. Until a way to perform a comparison without reading from the samples is discovered 2 data passes can be safely assumed as the fastest possible approach. Between that and the Simple Diff approach there are two possibilities for increased performance. The first possibility is to make decisions and write output immediately upon reading from the second sample so as to have only two data passes. The challenge with this approach is that analysis occurs in the moment of the current loop interation without knowledge of what comes later in the second sample. A more practical second performance possibility is to write a smaller hash map. Writing a smaller hash map means moving some of the decision tree up front before a separate formal analysis phase. In order for this approach to be viable this early step in logic must be tiny, non-replicating, and a different means of iteration must be devised to account for a data store of unpredictable size. Cheney Informational [Page 4] RFC XXXX Simple Diff Algorithm October 2017 The proposed algorithm could be faster than the popular Myers' O(ND) approach. That may or may not be true and no evidence was provided either way (conjecture is not evidence). In terms of experimental criteria algorithms are themselves largely irrelevant. More important is the implementation of those algorithms. If exercising the Myers' approach makes fewer decisions and has fewer total data passes, as described in the previous paragraph, then it likely is faster. Pronounced emphasis is applied to the word "total" because this makes all the difference. Many prior diff applications don't provide a complete third data pass. Instead they provide the minimum two complete data passes over the samples and various smaller passes as calculated by block moves and edit distances. If these various fractional passes are non- linear, which means starting and stopping in different places respectively, their performance is less predictable. If these fractional passes are non-linear and touch any given data index more than once they have repetitive logic, and likely are not as fast. To affirmatively guarantee superior performance over this Simple Diff approach there needs to be fewer passes over data, which means no repetition and a smaller number of iterations. Predictability ensures the performance of an application scales roughly proportionately to the size of the provided samples, where an unpredictable approach would scale disproportionately (slower over time). The approach taken here is fast. It cannot be argued, scientifically, it is the fastest ever (or slowest) approach for its level of accuracy without also writing alternate algorithms into applications with identical application constraints. One could argue this approach is the fastest ever comparative algorithm for its level of predictability and precision. Cheney Informational [Page 5] RFC XXXX Simple Diff Algorithm October 2017 2.4. Output format This Simple Diff algorithm inherits its output format from the application jsdifflib, https://github.com/cemerick/jsdifflib. The output is a big array containing child arrays at each index. The child arrays each contain 5 indexes in the format: * type * start in first sample * end in first sample * start in second sample * end in second sample There are 4 types defined in the output: * "equal" - both array index items are identical at the indexes compared * "replace" - changes are present at indexes compared * "insert" - the index of the second sample is unique to the second sample * "delete" - the index of the first sample is unique to the first sample Cheney Informational [Page 6] RFC XXXX Simple Diff Algorithm October 2017 3. How the algorithm works at a high level 3.1. Getting by with a hash map and some counts This algorithm takes after the Paul Heckel algorithm. First thing is to transform the string input into arrays. The most primative way to do this is to split the input by lines where each line of input is an index in an array. Once the two input samples are converted to arrays they will each have to be examined. The first loop will walk through an array representing one of the code samples and make a decision for a hash map which is named "table" in the following code sample. The two arrays are simply named "one" and "two". Each index of the array, by default is a line of code, which will serve as a key in the hash map named "table". If this key does not exist then it will be added. The value for each key is an object with two properites named "one" and "two". These properties are simply a count indicating the number of times the key is encountered in the two arrays. If the key is already present then we will simply increase the value of properties "one" or "two" respective of the current array. Here is an example of this step in code: do { if (table[two[b]] === undefined) { table[two[b]] = { one: 0, two: 1 }; } else { table[two[b]].two += 1; } b += 1; } while (b < lenb); Cheney Informational [Page 7] RFC XXXX Simple Diff Algorithm October 2017 3.2. The third and final pass through the data Once the table is fully populated with all indexes from both arrays, "one" and "two", we need a third and final loop to walk across the data and make some simple decisions. Here is this complete step in code. do { c = a; d = b; if (one[a] === two[b]) { equality(); } else if (table[one[a]][1] < 1 && table[two[b]][0] < 1) { replaceUniques(); } else if (table[one[a]][1] < 1 && one[a + 1] !== two[b + 2]) { deletion(); } else if (table[two[b]][0] < 1 && one[a + 2] !== two[b + 1]) { insertion(); } else if ( table[one[a]][0] - table[one[a]][1] === 1 && one[a + 1] !== two[b + 2] ) { deletionStatic(); } else if ( table[two[b]][1] - table[two[b]][0] === 1 && one[a + 2] !== two[b + 1] ) { insertionStatic(); } else if (one[a + 1] === two[b]) { deletion(); } else if (one[a] === two[b + 1]) { insertion(); } else { replacement(); } a = a + 1; b = b + 1; } while (a < lena && b < lenb); Before moving any further please understand this logic is fast and simple, but it isn't precise at all. We will fix this later with a child function named "fix". Corrections for precision are a secondary step so as to not disrupt performance and also isolate complexity into a separate single location. In the above code references "a" and "b" are simply positive integer incrementors. The "a" reference is always associated with the "one" array while the "b" reference is always associated with the "two" array. This is necessary so that each array may increment independently without collision. Cheney Informational [Page 8] RFC XXXX Simple Diff Algorithm October 2017 The "c" and "d" references are secondary incrementors used as closures in the child functions. These secondary incrementors allow for child loops to occur without mutation to the primary incrementors. The first thing that happens in this loop is an attempt to discover equality. Everything else is less important and so happens later in the decision tree. The second rule is to identify items that occur only one more time in each array. If the next item to occur in the arrays is not equal but occurs just once more then we know the item is a unique change. The third rule is to determine if the current item is a deletion. If the current item, and possibly additional following items, is present in the first sample but no longer exists in the second sample then it is a dynamic deletion. The fourth rule is the same as the third rule but in reverse to determine dynamic insertions. The fifth rule is to determine if the current index occurs exactly one more time in the first sample than in the second sample and yet is not a match for the next item in the first sample with two items up in the second sample. This is a static deletion, or rather a deletion of a fixed number of items. The sixth rule is the same as the fifth rule but in reverse to determine static insertions. The seventh and final rule determines that the current items in the samples are non-unique changes. The child functions are as follows: equality = function simpleDiff_equality() { do { table[one[c]].one -= 1; table[one[c]].two -= 1; c += 1; d += 1; } while ( c < lena && d < lenb && one[c] === two[d] ); fix(["equal", a, c, b, d]); b = d - 1; a = c - 1; }, Cheney Informational [Page 9] RFC XXXX Simple Diff Algorithm October 2017 deletion = function simpleDiff_deletion() { do { table[one[c]].one -= 1; c += 1; } while ( c < lena && table[one[c]].two < 1 ); fix(["delete", a, c, -1, -1]); a = c - 1; b = d - 1; }, deletionStatic = function simpleDiff_deletionStatic() { table[one[a]].one -= 1; fix([ "delete", a, a + 1, -1, -1 ]); a = c; b = d - 1; }, insertion = function simpleDiff_insertion() { do { table[two[d]].two -= 1; d += 1; } while ( d < lenb && table[two[d]].one < 1 ); fix(["insert", -1, -1, b, d]); a = c - 1; b = d - 1; }, insertionStatic = function simpleDiff_insertionStatic() { table[two[b]].two -= 1; fix([ "insert", -1, -1, b, b + 1 ]); a = c - 1; b = d; }, Cheney Informational [Page 10] RFC XXXX Simple Diff Algorithm October 2017 replacement = function simpleDiff_replacement() { do { table[one[c]].one -= 1; table[two[d]].two -= 1; c += 1; d += 1; } while ( c < lena && d < lenb && table[one[c]].two > 0 && table[two[d]].one > 0 ); fix(["replace", a, c, b, d]); a = c - 1; b = d - 1; }, replaceUniques = function simpleDiff_replaceUniques() { do { table[one[c]].one -= 1; c += 1; d += 1; } while ( c < lena && d < lenb && table[one[c]].two < 1 && table[two[d]].one < 1 ); fix(["replace", a, c, b, d]); a = c - 1; b = d - 1; } It must be noted that most of the these child functions do contain their own loops, but these loops are not duplicates. The primary parent loop, the so called third and final loop, is adjusted forward by the distance these smaller loops increment so that there is no repetition or lose of execution speed. 3.3. The "fix" function We can see from the child function code that arrays are generated which appear to be the format described as the child arrays of the output. Instead of pushing this data to the output array directly it will be passed through a function named "fix" which serves as a sort of quality control filter. This function checks for things like: Cheney Informational [Page 11] RFC XXXX Simple Diff Algorithm October 2017 * A diff type immediately following the same diff type, which can be combined into a single child array for the output * An insertion immediately following a deletion, or vise versa, which should likely be a replacement * Whether false positives generated from replacements immediately following an insertion or deletion can be eliminated The general idea is to only accept a child array of the output as an argument and make a decision after comparing it against the previous child array of the output. Aside from two narrow edge cases there is no analysis of the original data. Think of it like an appeals court in that no new evidence is allowed, because only the laws are under scrutiny. The code for the fix function is as follows: fix = function simpleDiff_fix(code) { var prior = codes[codes.length - 1]; if (prior !== undefined) { if (prior[0] === code[0]) { if (code[0] === "replace" || code[0] === "equal") { prior[2] = code[2]; prior[4] = code[4]; } else if (code[0] === "delete") { prior[2] = code[2]; } else if (code[0] === "insert") { prior[4] = code[4]; } return; } if (prior[0] === "insert" && prior[4] - prior[3] === 1) { if (code[2] - code[1] === 1) { if (code[0] === "replace") { prior[0] = "replace"; prior[1] = code[1]; prior[2] = code[2]; code[0] = "insert"; code[1] = -1; code[2] = -1; } else if (code[0] === "delete") { code[0] = "replace"; code[3] = prior[3]; code[4] = prior[4]; codes.pop(); prior = codes[codes.length - 1]; if (prior[0] === "replace") { prior[2] = code[2]; prior[4] = code[4]; return; } } Cheney Informational [Page 12] RFC XXXX Simple Diff Algorithm October 2017 } else if (code[0] === "delete") { prior[0] = "replace"; prior[1] = code[1]; prior[2] = code[1] + 1; code[1] = code[1] + 1; } else if (code[0] === "replace") { prior[0] = "replace"; prior[1] = code[1]; prior[2] = code[1] + 1; c = prior[2]; d = prior[4]; return; } } else if ( prior[0] === "insert" && code[0] === "delete" && code[2] - code[1] === 1 ) { prior[4] = prior[4] - 1; code[0] = "replace"; code[3] = prior[4]; code[4] = prior[4] + 1; } else if ( prior[0] === "delete" && prior[2] - prior[1] === 1 ) { if (code[4] - code[3] === 1) { if (code[0] === "replace") { prior[0] = "replace"; prior[3] = code[3]; prior[4] = code[4]; code[0] = "delete"; code[3] = -1; code[4] = -1; } else if (code[0] === "insert") { code[0] = "replace"; code[1] = prior[1]; code[2] = prior[2]; codes.pop(); prior = codes[codes.length - 1]; if (prior[0] === "replace") { prior[2] = code[2]; prior[4] = code[4]; return; } } } else if (code[0] === "insert") { prior[0] = "replace"; prior[3] = code[3]; prior[4] = code[3] + 1; code[3] = code[3] + 1; Cheney Informational [Page 13] RFC XXXX Simple Diff Algorithm October 2017 } else if (code[0] === "replace") { prior[0] = "replace"; prior[3] = code[3]; prior[4] = code[4] + 1; c = prior[2]; d = prior[4]; return; } } else if ( prior[0] === "delete" && code[0] === "insert" && code[4] - code[3] === 1 ) { prior[2] = prior[2] - 1; code[0] = "replace"; code[1] = prior[2]; code[2] = prior[2] + 1; } else if (prior[0] === "replace") { if (code[0] === "delete") { if (one[code[2] - 1] === two[prior[4] - 1]) { if (prior[2] - prior[1] > 1) { prior[4] = prior[4] - 1; } c = c - 1; d = d - 1; return; } if (one[code[2]] === two[prior[4] - 1]) { if (prior[2] - prior[1] > 1) { prior[2] = prior[2] - 1; prior[4] = prior[4] - 11; table[one[c - 1]][0] = table[one[c - 1]][0] - 1; } } } else if (code[0] === "insert") { if (one[prior[2] - 1] === two[code[4] - 1]) { if (prior[2] - prior[1] > 1) { prior[2] = prior[2] - 1; } c = c - 1; d = d - 1; return; } if (one[code[2] - 1] === two[prior[4]]) { if (prior[4] - prior[3] > 1) { prior[2] = prior[2] - 1; prior[4] = prior[4] - 1; table[two[d - 1]][1] = table[two[d - 1]][1] - 1; } Cheney Informational [Page 14] RFC XXXX Simple Diff Algorithm October 2017 } } } } codes.push(code); }; Cheney Informational [Page 15] RFC XXXX Simple Diff Algorithm October 2017 3.4. Done! The following is the complete algorithm with all parts assembled: function simpleDiff () { var table = {}, lines = function simpleDiff_lines(str) { str = str .replace(/\r\n/g, "\n") .replace(/\r/g, "\n"); return str.split("\n"); }, one = (typeof options.source === "string") ? lines(options.source) : options.source, two = (typeof options.diff === "string") ? lines(options.diff) : options.diff, lena = one.length, lenb = two.length, a = 0, b = 0, c = 0, d = 0, codes = [], fix = function simpleDiff_fix(code) { var prior = codes[codes.length - 1]; if (prior !== undefined) { if (prior[0] === code[0]) { if ( code[0] === "replace" || code[0] === "equal" ) { prior[2] = code[2]; prior[4] = code[4]; } else if (code[0] === "delete") { prior[2] = code[2]; } else if (code[0] === "insert") { prior[4] = code[4]; } return; } if ( prior[0] === "insert" && prior[4] - prior[3] === 1 ) { if (code[2] - code[1] === 1) { if (code[0] === "replace") { prior[0] = "replace"; prior[1] = code[1]; prior[2] = code[2]; Cheney Informational [Page 16] RFC XXXX Simple Diff Algorithm October 2017 code[0] = "insert"; code[1] = -1; code[2] = -1; } else if (code[0] === "delete") { code[0] = "replace"; code[3] = prior[3]; code[4] = prior[4]; codes.pop(); prior = codes[codes.length - 1]; if (prior[0] === "replace") { prior[2] = code[2]; prior[4] = code[4]; return; } } } else if (code[0] === "delete") { prior[0] = "replace"; prior[1] = code[1]; prior[2] = code[1] + 1; code[1] = code[1] + 1; } else if (code[0] === "replace") { prior[0] = "replace"; prior[1] = code[1]; prior[2] = code[1] + 1; c = prior[2]; d = prior[4]; return; } } else if ( prior[0] === "insert" && code[0] === "delete" && code[2] - code[1] === 1 ) { prior[4] = prior[4] - 1; code[0] = "replace"; code[3] = prior[4]; code[4] = prior[4] + 1; } else if ( prior[0] === "delete" && prior[2] - prior[1] === 1 ) { if (code[4] - code[3] === 1) { if (code[0] === "replace") { prior[0] = "replace"; prior[3] = code[3]; prior[4] = code[4]; code[0] = "delete"; code[3] = -1; code[4] = -1; } else if (code[0] === "insert") { code[0] = "replace"; Cheney Informational [Page 17] RFC XXXX Simple Diff Algorithm October 2017 code[1] = prior[1]; code[2] = prior[2]; codes.pop(); prior = codes[codes.length - 1]; if (prior[0] === "replace") { prior[2] = code[2]; prior[4] = code[4]; return; } } } else if (code[0] === "insert") { prior[0] = "replace"; prior[3] = code[3]; prior[4] = code[3] + 1; code[3] = code[3] + 1; } else if (code[0] === "replace") { prior[0] = "replace"; prior[3] = code[3]; prior[4] = code[4] + 1; c = prior[2]; d = prior[4]; return; } } else if ( prior[0] === "delete" && code[0] === "insert" && code[4] - code[3] === 1 ) { prior[2] = prior[2] - 1; code[0] = "replace"; code[1] = prior[2]; code[2] = prior[2] + 1; } else if (prior[0] === "replace") { if (code[0] === "delete") { if (one[code[2] - 1] === two[prior[4] - 1]) { if (prior[2] - prior[1] > 1) { prior[4] = prior[4] - 1; } c = c - 1; d = d - 1; return; } if (one[code[2]] === two[prior[4] - 1]) { if (prior[2] - prior[1] > 1) { prior[2] = prior[2] - 1; prior[4] = prior[4] - 11; table[one[c - 1]][0] = table[one[c - 1]][0] - 1; } } } else if (code[0] === "insert") { Cheney Informational [Page 18] RFC XXXX Simple Diff Algorithm October 2017 if (one[prior[2] - 1] === two[code[4] - 1]) { if (prior[2] - prior[1] > 1) { prior[2] = prior[2] - 1; } c = c - 1; d = d - 1; return; } if (one[code[2] - 1] === two[prior[4]]) { if (prior[4] - prior[3] > 1) { prior[2] = prior[2] - 1; prior[4] = prior[4] - 1; table[two[d - 1]][1] = table[two[d - 1]][1] - 1; } } } } } codes.push(code); }, equality = function simpleDiff_equality() { do { table[one[c]][0] = table[one[c]][0] - 1; table[one[c]][1] = table[one[c]][1] - 1; c = c + 1; d = d + 1; } while (c < lena && d < lenb && one[c] === two[d]); fix(["equal", a, c, b, d]); b = d - 1; a = c - 1; }, deletion = function simpleDiff_deletion() { do { table[one[c]][0] = table[one[c]][0] - 1; c = c + 1; } while (c < lena && table[one[c]][1] < 1); fix(["delete", a, c, -1, -1]); a = c - 1; b = d - 1; }, deletionStatic = function simpleDiff_deletionStatic() { table[one[a]][0] = table[one[a]][0] - 1; fix([ "delete", a, a + 1, -1, -1 ]); a = c; b = d - 1; }, Cheney Informational [Page 19] RFC XXXX Simple Diff Algorithm October 2017 insertion = function simpleDiff_insertion() { do { table[two[d]][1] = table[two[d]][1] - 1; d = d + 1; } while (d < lenb && table[two[d]][0] < 1); fix(["insert", -1, -1, b, d]); a = c - 1; b = d - 1; }, insertionStatic = function simpleDiff_insertionStatic() { table[two[b]][1] = table[two[b]][1] - 1; fix([ "insert", -1, -1, b, b + 1 ]); a = c - 1; b = d; }, replacement = function simpleDiff_replacement() { do { table[one[c]][0] = table[one[c]][0] - 1; table[two[d]][1] = table[two[d]][1] - 1; c = c + 1; d = d + 1; } while ( c < lena && d < lenb && table[one[c]][1] > 0 && table[two[d]][0] > 0 ); fix(["replace", a, c, b, d]); a = c - 1; b = d - 1; }, replaceUniques = function simpleDiff_replaceUniques() { do { table[one[c]][0] = table[one[c]][0] - 1; c = c + 1; d = d + 1; } while ( c < lena && d < lenb && table[one[c]][1] < 1 && table[two[d]][0] < 1 ); fix(["replace", a, c, b, d]); a = c - 1; b = d - 1; }; // * First Pass, account for lines from first file // * build the table from the second file Cheney Informational [Page 20] RFC XXXX Simple Diff Algorithm October 2017 do { if (options.diffspaceignore === true) { two[b] = two[b].replace(/\s+/g, ""); } if (table[two[b]] === undefined) { table[two[b]] = [0, 1]; } else { table[two[b]][1] = table[two[b]][1] + 1; } b = b + 1; } while (b < lenb); // * Second Pass, account for lines from second file // * build the table from the first file lena = one.length; a = 0; do { if (options.diffspaceignore === true) { one[a] = one[a].replace(/\s+/g, ""); } if (table[one[a]] === undefined) { table[one[a]] = [1, 0]; } else { table[one[a]][0] = table[one[a]][0] + 1; } a = a + 1; } while (a < lena); a = 0; b = 0; // find all equality... differences are what's left over solve // only for the second set test removing reverse test removing // undefined checks for table refs do { c = a; d = b; if (one[a] === two[b]) { equality(); } else if (table[one[a]][1] < 1 && table[two[b]][0] < 1) { replaceUniques(); } else if ( table[one[a]][1] < 1 && one[a + 1] !== two[b + 2] ) { deletion(); } else if ( table[two[b]][0] < 1 && one[a + 2] !== two[b + 1] ) { insertion(); Cheney Informational [Page 21] RFC XXXX Simple Diff Algorithm October 2017 } else if ( table[one[a]][0] - table[one[a]][1] === 1 && one[a + 1] !== two[b + 2] ) { deletionStatic(); } else if ( table[two[b]][1] - table[two[b]][0] === 1 && one[a + 2] !== two[b + 1] ) { insertionStatic(); } else if (one[a + 1] === two[b]) { deletion(); } else if (one[a] === two[b + 1]) { insertion(); } else { replacement(); } a = a + 1; b = b + 1; } while (a < lena && b < lenb); if (lena - a === lenb - b) { if (one[a] === two[b]) { fix(["equal", a, lena, b, lenb]); } else { fix(["replace", a, lena, b, lenb]); } } else if (a < lena) { fix(["delete", a, lena, -1, -1]); } else if (b < lenb) { fix(["insert", -1, -1, b, lenb]); } return codes; } Cheney Informational [Page 22] RFC XXXX Simple Diff Algorithm October 2017 4. Extensibility Since the code is simple and its execution is predictable the approach is open to modification. In different contexts a most precise comparison of text is not the most desirable objective. One example is to ignore certain particular differences that might be considered as unimportant. This degree of flexibility and extensibility allows for custom precision. Custom flexibility allows for a variety of advanced concerns. Such concerns may include nuances in accounting for language specific criteria whether programming, spoken, or written. Due to flexible rules, whether colloquialisms or syntax variations, differences of literal text may commonly reside that are not represented as differences in the parsed, or interpreted, result. In artificially intelligent systems it is necessary to account for these variations as comparisons occur. 5. Security considerations None. 6. IANA considerations None. 7. References Heckel, P. (April 1978) "A technique for isolating differences between files". Communications of the ACM. 21 (4): 264-268. doi:10.1145/359460.359467 Myers, E. (1986) "An O(ND) Difference Algorithm and Its Variations". Retrieved from http://xmailserver.org/diff2.pdf on 3 October 2017. 8. Author's address Austin Cheney http://prettydiff.com info@prettydiff.com EXPIRATION 6 April 2017 Cheney Informational [Page 23]