Recovered from a byte-compiled copy of lxml.html.diff
(/usr/lib64/python2.6/site-packages/lxml/html/diff.py, a Python 2.6 .pyc).
The marshalled bytecode itself is unreadable and is not reproduced; what
follows keeps the readable module structure and docstrings.  The module's
public names are html_annotate and htmldiff; everything else is an internal
helper.

default_markup(text, version)
    The default markup function: wraps the text in a <span title="...">
    element carrying the (HTML-escaped) version string.

html_annotate(doclist, markup=default_markup)
    doclist should be ordered from oldest to newest, like::

        >>> version1 = 'Hello World'
        >>> version2 = 'Goodbye World'
        >>> print(html_annotate([(version1, 'version 1'),
        ...                      (version2, 'version 2')]))
        <span title="version 2">Goodbye</span> <span title="version 1">World</span>

    The documents must be *fragments* (str/UTF8 or unicode), not
    complete documents.

    The markup argument is a function to markup the spans of words.
    This function is called like markup('Hello', 'version 2'), and
    returns HTML.  The first argument is text and never includes any
    markup.  The default uses a span with a title:

        >>> print(default_markup('Some Text', 'by Joe'))
        <span title="by Joe">Some Text</span>

tokenize_annotated(doc, annotation)
    Tokenize a document and add an annotation attribute to each token.

html_annotate_merge_annotations(tokens_old, tokens_new)
    Merge the annotations from tokens_old into tokens_new, when the
    tokens in the new document already existed in the old document.

copy_annotations(src, dest)
    Copy annotations from the tokens listed in src to the tokens in dest.

compress_tokens(tokens)
    Combine adjacent tokens when there is no HTML between the tokens,
    and they share an annotation.

htmldiff(old_html, new_html)
    Do a diff of the old and new document.  The documents are HTML
    *fragments* (str/UTF8 or unicode), they are not complete documents
    (i.e., no <html> tag).

    Returns HTML with <ins> and <del> tags added around the
    appropriate text.

    Markup is generally ignored, with the markup from new_html
    preserved, and possibly some markup from old_html (though it is
    considered acceptable to lose some of the old markup).  Only the
    words in the HTML are diffed.  The exception is <img> tags, which
    are treated like words, and the href attribute of <a> tags, which
    are noted inside the tag itself when there are changes.

htmldiff_tokens(html1_tokens, html2_tokens)
    Does a diff on the tokens themselves, returning a list of text
    chunks (not tokens).
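A minimal usage sketch of the two public entry points (assuming lxml is
installed; the input strings are invented for illustration, and the output
markup shown in the comments follows the doctests above rather than being
verbatim output):

    # Word-level diff and per-word version annotation with lxml.html.diff.
    from lxml.html.diff import htmldiff, html_annotate

    old = '<p>Hello World</p>'
    new = '<p>Goodbye World</p>'

    # htmldiff keeps the markup of `new` and wraps changed words in
    # <ins>/<del>, e.g. '<p><ins>Goodbye</ins> <del>Hello</del> World</p>'.
    print(htmldiff(old, new))

    # html_annotate labels every word with the version that introduced it,
    # oldest version first, using <span title="..."> by default.
    print(html_annotate([(old, 'version 1'), (new, 'version 2')]))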
expand_tokens(tokens, equal=False)
    Given a list of tokens, return a generator of the chunks of text
    for the data in the tokens.

merge_insert(ins_chunks, doc)
    doc is the already-handled document (as a list of text chunks);
    here we add <ins>ins_chunks</ins> to the end of that.

DEL_START, DEL_END
    Sentinel classes inserted into the chunk stream to bracket a
    pending delete.

NoDeletes (Exception)
    Raised when the document no longer contains any pending deletes
    (DEL_START/DEL_END).

merge_delete(del_chunks, doc)
    Adds the text chunks in del_chunks to the document doc (another
    list of text chunks) with marker to show it is a delete.
    cleanup_delete later resolves these markers into <del></del> tags.

cleanup_delete(chunks)
    Cleans up any DEL_START/DEL_END markers in the document, replacing
    them with <del></del>.  To do this while keeping the document
    valid, it may need to drop some tags (either start or end tags).
    It may also move the del into adjacent tags to try to move it to a
    similar location where it was originally located (e.g., moving a
    delete into a preceding <div> tag, if the del looks like
    (DEL_START, 'Text</div>', DEL_END)).

split_unbalanced(chunks)
    Splits a list of chunks into (unbalanced_start, balanced,
    unbalanced_end): tags opened but never closed in the span, the
    balanced middle, and tags closed but never opened.

split_delete(chunks)
    Returns (stuff_before_DEL_START, stuff_inside_DEL_START_END,
    stuff_after_DEL_END).  Returns the first case found (there may be
    more DEL_STARTs in stuff_after_DEL_END).  Raises NoDeletes if
    there's no DEL_START found.

locate_unbalanced_start(unbalanced_start, pre_delete, post_delete)
    pre_delete and post_delete implicitly point to a place in the
    document (where the two were split).  This moves that point (by
    popping items from one and pushing them onto the other).  It moves
    the point to try to find a place where unbalanced_start applies.

    As an example::

        >>> unbalanced_start = ['<div>']
        >>> doc = ['<p>', 'Text', '</p>', '<div>', 'More Text', '</div>']
        >>> pre, post = doc[:3], doc[3:]
        >>> pre, post
        (['<p>', 'Text', '</p>'], ['<div>', 'More Text', '</div>'])
        >>> locate_unbalanced_start(unbalanced_start, pre, post)
        >>> pre, post
        (['<p>', 'Text', '</p>', '<div>'], ['More Text', '</div>'])

    As you can see, we moved the point so that the dangling <div> that
    we found will be effectively replaced by the div in the original
    document.  If this doesn't work out, we just throw away
    unbalanced_start without doing anything.

locate_unbalanced_end(unbalanced_end, pre_delete, post_delete)
    Like locate_unbalanced_start, except handling end tags and
    possibly moving the point earlier in the document.
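The marker-based delete handling described above can be illustrated with a
small sketch.  It only shows the DEL_START/DEL_END bookkeeping and the final
substitution into <del> tags; the tag rebalancing that cleanup_delete,
split_unbalanced and locate_unbalanced_* perform around each delete is
deliberately left out:

    # Simplified sketch of the DEL_START/DEL_END marker technique.  The real
    # cleanup_delete also rebalances tags around each delete; omitted here.
    class DEL_START: pass
    class DEL_END: pass

    def merge_delete(del_chunks, doc):
        # Append the deleted chunks, bracketed by sentinels, to the output.
        doc.append(DEL_START)
        doc.extend(del_chunks)
        doc.append(DEL_END)

    def cleanup_delete(chunks):
        # Resolve every DEL_START ... DEL_END span into <del> ... </del>.
        out = []
        for chunk in chunks:
            if chunk is DEL_START:
                out.append('<del>')
            elif chunk is DEL_END:
                out.append('</del>')
            else:
                out.append(chunk)
        return out

    doc = ['<p>', 'Hello ']
    merge_delete(['cruel '], doc)      # 'cruel ' was removed in the new text
    doc.extend(['World', '</p>'])
    print(''.join(cleanup_delete(doc)))
    # -> <p>Hello <del>cruel </del>World</p>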
class token (a subclass of unicode/str)
    Represents a diffable token, generally a word that is displayed to
    the user.  Opening tags are attached to this token when they are
    adjacent (pre_tags) and closing tags that follow the word
    (post_tags).  Some exceptions occur when there are empty tags
    adjacent to a word, so there may be close tags in pre_tags, or
    open tags in post_tags.

    We also keep track of whether the word was originally followed by
    whitespace, even though we do not want to treat the word as
    equivalent to a similar word that does not have a trailing space.

class tag_token(token)
    Represents a token that is actually a tag.  Currently this is just
    the <img> tag, which takes up visible space just like a word but
    is only represented in a document by a tag.

class href_token(token)
    Represents the href in an anchor tag.  Unlike other words, we only
    show the href when it changes.

tokenize(html, include_hrefs=True)
    Parse the given HTML and returns token objects (words with
    attached tags).

    This parses only the content of a page; anything in the head is
    ignored, and the <head> and <body> elements are themselves
    optional.  The content is then parsed by lxml, which ensures the
    validity of the resulting parsed document (though lxml may make
    incorrect guesses when the markup is particular bad).

    <ins> and <del> tags are also eliminated from the document, as
    that gets confusing.

    If include_hrefs is true, then the href attribute of <a> tags is
    included as a special kind of diffable token.
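A quick way to see these token objects is to call tokenize directly.  Note
that tokenize is an internal helper rather than part of the public API, so
the exact attributes and output may vary between lxml versions; this is only
an illustrative sketch:

    # Inspect the word tokens and their attached tags (illustrative only).
    from lxml.html.diff import tokenize

    for tok in tokenize('<p>Hello, <b>World</b>!</p> <img src="x.png">'):
        print(repr(tok), tok.pre_tags, tok.post_tags, tok.trailing_whitespace)
    # Each token is a visible word; surrounding markup ends up in
    # pre_tags/post_tags, and the <img> becomes a tag_token of its own.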
parse_html(html, cleanup=True)
    Parses an HTML fragment, returning an lxml element.  Note that the
    HTML will be wrapped in a <div> tag that was not in the original
    document.

    If cleanup is true, make sure there's no <head> or <body>, and get
    rid of any <ins> and <del> tags.

cleanup_html(html)
    This 'cleans' the HTML, meaning that any page structure is removed
    (only the contents of <body> are used, if there is any <body>),
    and <ins> and <del> tags are removed.

fixup_chunks(chunks)
    This function takes a list of chunks and produces a list of tokens.

empty_tags, block_level_tags, block_level_container_tags
    Tuples listing the void elements (param, img, area, br, basefont,
    input, base, meta, link, col), the block-level elements (address,
    blockquote, center, dir, div, dl, fieldset, form, h1-h6, hr,
    isindex, menu, noframes, noscript, ol, p, pre, table, ul), and the
    block-level container elements (dd, dt, frameset, li, tbody, td,
    tfoot, th, thead, tr).

flatten_el(el, include_hrefs, skip_tag=False)
    Takes an lxml element el, and generates all the text chunks for
    that tag.  Each start tag is a chunk, each word is a chunk, and
    each end tag is a chunk.

    If skip_tag is true, then the outermost container tag is not
    returned (just its contents).

split_words, start_tag, end_tag, is_word, is_start_tag, is_end_tag
    Small helpers: split text into word chunks, render the text
    representation of a start or end tag for an element (end_tag
    includes trailing whitespace when appropriate), and classify a
    chunk as a word, start tag, or end tag.

fixup_ins_del_tags(html)
    Given an html string, move any <ins> or <del> tags inside of any
    block-level elements, e.g. transform <ins><p>word</p></ins> to
    <p><ins>word</ins></p>.

serialize_html_fragment(el, skip_outer=False)
    Serialize a single lxml element as HTML.  The serialized form
    includes the elements tail.

    If skip_outer is true, then don't serialize the outermost tag.

_fixup_ins_del_tags(doc)
    fixup_ins_del_tags that works on an lxml document in-place.

_contains_block_level_tag(el)
    True if the element contains any block-level elements, like <p>,
    <td>, etc.
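The transformation that fixup_ins_del_tags performs can be sketched directly
against lxml.  This is a simplified illustration of the idea (one level of
block elements, no handling of text or tails around the blocks), not the
module's own implementation:

    # Push an <ins>/<del> wrapper inside the block elements it contains, so
    # <ins><p>word</p></ins> becomes <p><ins>word</ins></p>.  Simplified:
    # only the plain case is handled here.
    from lxml.html import fragment_fromstring, tostring

    BLOCK_TAGS = {'p', 'div', 'blockquote', 'li', 'td'}   # subset, for illustration

    def push_ins_del_inside(doc):
        for el in doc.xpath('descendant-or-self::ins | descendant-or-self::del'):
            blocks = [child for child in el if child.tag in BLOCK_TAGS]
            if not blocks:
                continue
            for block in blocks:
                wrapper = block.makeelement(el.tag, {})   # new inner <ins>/<del>
                wrapper.text, block.text = block.text, None
                for child in list(block):
                    wrapper.append(child)
                block.append(wrapper)
            el.drop_tag()                                 # drop the outer wrapper

    # create_parent=True wraps the fragment in a <div>, as parse_html does.
    doc = fragment_fromstring('<ins><p>word</p></ins>', create_parent=True)
    push_ins_del_inside(doc)
    print(tostring(doc))   # roughly: <div><p><ins>word</ins></p></div>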

_move_el_inside_block(el, tag)
    Helper for _fixup_ins_del_tags; actually takes the <ins> etc tags
    and moves them inside any block-level tags.

_merge_element_contents(el)
    Removes an element, but merges its contents into its place, e.g.,
    given <p>Hi <i>there!</i></p>, if you remove the <i> element you
    get <p>Hi there!</p>.

class InsensitiveSequenceMatcher(difflib.SequenceMatcher)
    Acts like SequenceMatcher, but tries not to find very small equal
    blocks amidst large spans of changes.  Its get_matching_blocks()
    drops matching blocks that fall below a small size threshold.

When run as a script (__main__), the module calls
lxml.html._diffcommand.main().
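The idea behind InsensitiveSequenceMatcher can be sketched as a small
difflib subclass.  The filtering rule below (keep a matching block only if
it is longer than a small threshold, always keeping the terminating
zero-length block) reproduces the behaviour described above; the exact
threshold value and scaling used in the compiled module may differ:

    # Sketch: a SequenceMatcher that ignores very small equal blocks, so a
    # stray common word inside a large changed region is treated as changed.
    import difflib

    class SmallBlockIgnoringMatcher(difflib.SequenceMatcher):
        threshold = 2   # illustrative value

        def get_matching_blocks(self):
            size = min(len(self.a), len(self.b))
            threshold = min(self.threshold, size / 4)
            blocks = difflib.SequenceMatcher.get_matching_blocks(self)
            # Keep real blocks longer than the threshold, plus the final
            # zero-length sentinel block that difflib always appends.
            return [b for b in blocks if b[2] > threshold or not b[2]]

    old = 'the quick brown fox jumps over the lazy dog'.split()
    new = 'a slow brown bear strolls past the lazy dog'.split()
    m = SmallBlockIgnoringMatcher(None, old, new)
    for op, i1, i2, j1, j2 in m.get_opcodes():
        print(op, old[i1:i2], '->', new[j1:j2])
    # The single common word 'brown' falls below the threshold, so the whole
    # first part is reported as one 'replace' instead of being fragmented.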