implementation.org 12 KB

#+TITLE: Org Merge Driver Implementation Notes #+STARTUP: #+FILETAGS :note:doc: * Project Details ** Contact Information Andrew Young, youngar17 at gmail.com ** Milestones [42%] :milestones: *** DONE Start Project *** DONE prototype parser (limited functionality) *** DONE prototype tree construction *** DONE prototype tree change detection *** DONE prototype tree merging *** DONE finish prototype *** TODO tree data interface *** TODO org-mode parser *** TODO general merging rule interface *** TODO org mode change detection *** TODO org mode merging rules *** TODO additional features *** TODO finishing touches *** TODO documentation and distribution ** Coding Conventions - [[http://www.gnu.org/software/emacs/elisp/html_node/Coding-Conventions.html][elisp conventions]] - must sign a [[http://orgmode.org/worg/org-contribute.html][copyright grant]] for the FSF! - [[http://www.gnu.org/prep/standards/standards.html][GNU Coding Standards]] - use [[http://www.doxygen.org/][Doxygen]] to document the code ** Collaboration * Notes * General Merging Implementation ** Elements - an element is any individual element in a file - elements are stored as a hierarchy - this is to identify not just changes of line-by-line content, but also of relationships (e.g. data is moved) - elements are defined by their mapping entity - elements may not always have a perfectly identifying trait - i.e. a unique object identifier - defined by: - how to read elements - how to write the elements - what changes are available - how to map them - how to merge changes - every element has a parent, and a list of sub-elements - the sub elements list can be empty - an ordered set of elements are represented by: - mapping which needs identical parents - each element is the next one's parent ** Node Identification - Node identification is mapping elements from one file to the same node in a different file - whether or not two elements are mapped depends on the element type, and it's mapping rules - a mapping which is missing another element should represent an element being added or removed - this is really up to the element's rules, it can do whatever it wants with the information - two main types of mappings: - one which needs to have the same mapped parent - one which does not need to have the same parent - a single mapping is from 1 file to another, but can have more then one elements from each - this can be the case for indistinguishable elements - mapping function an element type should have access to: - know if the element's parents are mapped - the element data for each - if the compared element is already mapped, it should have access to the mapping - mapping function will not have access to: - the coordinate of the element in the document - mapping may be affected by whether the elements are considered 'ordered' or 'unordered' - ordered mapping can use an edit script and detect differences in a sequence with exact matching rules - unordered mapping will not take into account the order in which items appear in - mapping function can consider more elements of a - elements which are indistinguishable ** Actions or Changes - Different changes to the same node can be grouped into 3 categories: - identical changes - non-conflicting changes - conflicting changes - changes will only be classified after all changes have been detected - a change will be applied to a mapping - These changes apply to all elements: - adding element - removing element - reordering children - Every other element will have their own set of changes - ie: property changing, moving - these will be detected by the program upon successful mapping between files - at the same time as all other changes - if text has not changed in a branch of elements, then it will not be mapped? - this will not work, if mapping can happen across different levels ** rules - should merging rules should consider all changes across all elements - or only changes to a specific element ** Results - results will be printed in standard conflicting file style: #begin_quote >>> there file === your file <<< #end_quote - the contents of a conlflicting change will have to be printed by each change pair representing a conflict ** Implementation Notes - Very large file support -> stream reading files - this could cause many slow-downs - #begin_ blocks could cause a problem - Tables - Agendas? Calendar? - meta data - stored with the element? - could be used to assist mapping (i.e. UIDs) - need to support writing certain text in different encoding - need to be able to add and remove features / rule conservatism - mapping conservatism - this will automatically affect change detection - will rules be needed for this? - change merging rules - there will be some user defined values that are not defined in this file - are there any other org-mode rules that wont be in the file? ** What I don't have to do - detection of file name changes ** Potential Problems *** Commit and Merge timing problems A merge will always be the same for two files. However, the order of mergin and branching can sometimes produce different files. This problem stems from the fact that 'diff' is the difference between two repository states, and merges aren't transitive across a branches history. http://r6.ca/blog/20110416T204742Z.html example master and branch edit branch edit master merge master -> branch edit master merge master -> branch can produce different results from a merge than edit branch edit master edit master merge master -> branch how to combat? I think that this can be combated by keeping unique IDs for headings, like in mobile org. caveats? what if someone is relying on the non-transitive behavior? (i don't know how) * Prototype ** Parsing - elements: the only element in the prototype is headlines - headlines - identified by a newline staring with stars and ending with a space - headline level is the number of stars - following text is the title - everything not in a title is in the text - text in the prototype is not its own element, just a property of the heading - the file is parsed into a tree - with all headings considered a subheading, top level headings are a subheading of the document - if a heading level is skipped (i.e. going from * to ***) it is still a child of the other - headings are all assumed to be unordered - the entire file is loaded into memory and must remain until an org_document no longer needs to exist ** Mapping - mapping is the process of matching two headings in a file - a mapping can represent indistinguishable elements - it is expected that as the files are processed, they will change into - mappings are stored in a hiearchy, with the same as the elements - mapping happens with a locality heurisitic ** Changes - detected by matching a heading title exactly - measuring changes from the ancestor to each of the new files - each heading will have 2 pointers to changes that affect them - if there are 2 changes to a heading, then there is a conflict - if there are two headings of the same name, it will always produce a conflict (should this be implemented?) when removing, or when 2 change text - every heading which is indistinguishable must have headings - changes are 'add heading', 'remove heading', 'change text' - a heading must be in the same place - changes are actually applied when the document is printed to a file - conflict messages are automatically inserted *** Changes - There are only basic changes for unordered lists implemented: - Add heading - when a new heading has been added to the current spot - Remove heading - Modify text - Move a heading ** Difference Detection - use the same difference detection algorithm used in the UNIX diff program[fn:4] for ordered lists *** Output - * Org-merge-tool ** Org-Mode Data Representation - Elements below a heading will be considered unordered - Numbered list - Identified as a whole if it matches exactly *** Org-mode Document Elements **** Headlines - Headings will be considered unordered trees - A heading will be matched by its UID if it has one - headings will be matched by their title otherwise Never ordered - Keywords not apart of the name **** Keywords - TODO, user defined, properties - perfile keywords #+TODO: ***** TODO What is a perfile :doc:note:question: **** Cookies: [#A] - always in square brackets? - Always in headings? - [#A], [0%], [0/0] **** Tags :tag: - only for headlines - #+filetags for tags for everytihng in the file **** Plain lists - within an outline tree entry - unordered (start with '-', '+', '*' as bullets) - ordered (numeral followed by - description (i.e. ::) (considered unordered items) - ends at two blank lines or less indented - must have the same indentation level - can have checkboxes **** Blocks #BEGIN_.. blocks - #+STARTUP **** Drawers - :DRAWERNAME: this dra :END: - time stamp note in a drawer (C-c C-z) - drawers need to be defined in variable org-drawers - or by file basis: #+DRAWERS: HIDDEN PROPERTIES STATE - PROPERTIES for properties **** Properties - always in PROPERTIES drawer ***** TODO Can Properties be consolidated with drawers? :question: **** Timestamps - in < > or [ ] brackets at the end of a headline - Can actually be placed almost anywhere - Can be inside **** Footnotes - [[fn:2] - [fn:1] footnotes! - footnote inline definitions[fn:3] - [fn:: this is inline] - there is a function to automatically - footnote defintions do not need to be ordered - see org-footnote-auto-label **** Tables (needs more work) - table row order might matter - '/' must be first field, - only 1? - Table references will be updated to match ***** TODO Define Table Elt - Calculations? Exist? **** Hooks? - org-footnote-auto-adjust fixes footnote defintion order ** Command Line Interface The cli will be written using argp and will conform to the gnu standards. Existing mergetool interfaces will be used a basis. *** Options **** Command Line Output **** File Output **** Merge Strategies *** TODO-List ** Lexer *** Responsibilities The Lexer is responsible for: - Complete Analysis of the input files - There is no longer any parser preforming intermediate analysis. - The Lexer is NOT responsible for: - Establishing the file stream. - There is no longer **** TODO Make explicit the Lexer's responsibilities - Especially relating to the parser. *** Implementation Overview - The Lexer is implemented with [[http://flex.sourceforge.net/][flex]]. - Documentation for flex can be found [[http://flex.sourceforge.net/manual/][here]]. *** Tokens **** ** Org Elements - The document elements are placed into a tree for processing *** aeuaoeu ** List of Org Elements *** Headings Headings level is **** Matching Heading Elements - Heading elements may be matched by text if their parents are matched - this allows parents to be matched by UID. - Heading elements may be matched to any element with the same UID - Heading elements may only match elements of the same type. ** Modification Guesser? 1. look for differences in the files 1. this seems like it might have to be O(n^2) 2. create a list of changes (ordered?) 1. tree of changes (first element is nothing) 2. find conflicting changes 3. process the conflicting changes, applying generic rules ** Modification Merger - if A and B both add a heading with the same name in the same place, should it conflict or should both be added? - if a heading is moved in A, and B adds a subheading, should this conflict as B is editing a removed heading in A, or should it merge with the new subheading under the moved heading in A * Footnotes and Bibiliography - [[http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/git-merge-changelog.c][git-merge-changelog]]provides a good example on how to write a merge driver - [[http://www.hiit.fi/files/fi/fc/papers/doceng04-pc.pdf][Three-way XML Merge]] [fn:4] The basic algorithm is described in "An O(ND) Difference Algorithm and its Variations", Eugene W. Myers, 'Algorithmica' Vol. 1 No. 2, 1986, pp. 251-266; and in "A File Comparison Program", Webb Miller and Eugene W. Myers, 'Software--Practice and Experience' Vol. 15 No. 11, 1985, pp. 1025-1040. The algorithm was independently discovered as described in "Algorithms for Approximate String * General Algorithm