U i�f9E�@s�dZddlZddlZddlZddlmZdgZe�d�Ze�d�Z e�d�Z e�d�Z e�d �Z e�d �Z e�d �Ze�d �Ze�d �Ze�dej�Ze�d �Ze�d�ZGdd�dej�ZdS)zA parser for HTML and XHTML.�N)�unescape� HTMLParserz[&<]z &[a-zA-Z#]z%&([a-zA-Z][-.a-zA-Z0-9]*)[^a-zA-Z0-9]z)&#(?:[0-9]+|[xX][0-9a-fA-F]+)[^0-9a-fA-F]z <[a-zA-Z]�>z--\s*>z+([a-zA-Z][^\t\n\r\f />\x00]*)(?:\s|/(?!>))*z]((?<=[\'"\s/])[^\s/>][^\s/=>]*)(\s*=+\s*(\'[^\']*\'|"[^"]*"|(?![\'"])[^>\s]*))?(?:\s|/(?!>))*aF <[a-zA-Z][^\t\n\r\f />\x00]* # tag name (?:[\s/]* # optional whitespace before attribute name (?:(?<=['"\s/])[^\s/>][^\s/=>]* # attribute name (?:\s*=+\s* # value indicator (?:'[^']*' # LITA-enclosed value |"[^"]*" # LIT-enclosed value |(?!['"])[^>\s]* # bare value ) \s* # possibly followed by a space )?(?:\s|/(?!>))* )* )? \s* # trailing whitespace z#</\s*([a-zA-Z][-.a-zA-Z0-9:_]*)\s*>c@s�eZdZdZdZdd�dd�Zdd�Zd d �Zd d �Zd Z dd�Z dd�Z dd�Z dd�Z dd�Zd9dd�Zdd�Zdd�Zdd �Zd!d"�Zd#d$�Zd%d&�Zd'd(�Zd)d*�Zd+d,�Zd-d.�Zd/d0�Zd1d2�Zd3d4�Zd5d6�Zd7d8�Zd S):raEFind tags and other markup and call handler functions. Usage: p = HTMLParser() p.feed(data) ... p.close() Start tags are handled by calling self.handle_starttag() or self.handle_startendtag(); end tags by self.handle_endtag(). The data between tags is passed from the parser to the derived class by calling self.handle_data() with the data as argument (the data may be split up in arbitrary chunks). If convert_charrefs is True the character references are converted automatically to the corresponding Unicode character (and self.handle_data() is no longer split in chunks), otherwise they are passed by calling self.handle_entityref() or self.handle_charref() with the string containing respectively the named or numeric reference as the argument. )ZscriptZstyleT)�convert_charrefscCs||_|��dS)z�Initialize and reset this instance. If convert_charrefs is True (the default), all character references are automatically converted to the corresponding Unicode characters. N)r�reset)�selfr�r�0/opt/alt/python38/lib64/python3.8/html/parser.py�__init__WszHTMLParser.__init__cCs(d|_d|_t|_d|_tj�|�dS)z1Reset this instance. Loses all unprocessed data.�z???N)�rawdata�lasttag�interesting_normal� interesting� cdata_elem� _markupbase� ParserBaser�rrrr r`s zHTMLParser.resetcCs|j||_|�d�dS)z�Feed data to the parser. Call this as often as you want, with as little or as much text as you want (may include '\n'). rN)r �goahead�r�datarrr �feedhs zHTMLParser.feedcCs|�d�dS)zHandle any buffered data.�N)rrrrr �closeqszHTMLParser.closeNcCs|jS)z)Return full source of start tag: '<...>'.)�_HTMLParser__starttag_textrrrr �get_starttag_textwszHTMLParser.get_starttag_textcCs$|��|_t�d|jtj�|_dS)Nz </\s*%s\s*>)�lowerr�re�compile�Ir)r�elemrrr �set_cdata_mode{s zHTMLParser.set_cdata_modecCst|_d|_dS�N)rrrrrrr �clear_cdata_modeszHTMLParser.clear_cdata_modec CsX|j}d}t|�}||k�r�|jrv|jsv|�d|�}|dkr�|�dt||d��}|dkrpt�d�� ||�sp�q�|}n*|j � ||�}|r�|� �}n|jr��q�|}||kr�|jr�|js�|� t |||���n|� |||��|�||�}||kr��q�|j}|d|��rJt�||��r"|�|�} n�|d|��r:|�|�} nn|d|��rR|�|�} nV|d|��rj|�|�} n>|d |��r�|�|�} n&|d |k�r�|� d�|d } n�q�| dk�r<|�s��q�|�d |d �} | dk�r�|�d|d �} | dk�r|d } n| d 7} |j�r*|j�s*|� t ||| ���n|� ||| ��|�|| �}q|d |��r�t�||�}|�r�|��d d�} |�| �|��} |d| d ��s�| d } |�|| �}qn<d||d�k�r�|� |||d ��|�||d �}�q�q|d|��r�t�||�}|�rP|�d �} |�| �|��} |d| d ��sB| d } |�|| �}qt�||�}|�r�|�r�|��||d�k�r�|��} | |k�r�|} |�||d �}�q�n.|d |k�r�|� d�|�||d �}n�q�qdstd��q|�rF||k�rF|j�sF|j�r(|j�s(|� t |||���n|� |||��|�||�}||d�|_dS)Nr�<�&�"z[\s;]�</�<!--�<?�<!rrz&#�����;zinteresting.search() lied)r �lenrr�find�rfind�maxrr�searchr�start� handle_datarZ updatepos� startswith� starttagopen�match�parse_starttag� parse_endtag� parse_comment�parse_pi�parse_html_declaration�charref�group�handle_charref�end� entityref�handle_entityref� incomplete�AssertionError) rr@r �i�n�jZampposr7r5�k�namerrr r�s�   �                                zHTMLParser.goaheadcCs�|j}|||d�dks"td��|||d�dkr@|�|�S|||d�dkr^|�|�S|||d���d kr�|�d |d�}|d kr�d S|�||d|��|d S|�|�SdS) Nr+r*z+unexpected call to parse_html_declaration()�r(�z<![� z <!doctyperr,r)r rDr:Zparse_marked_sectionrr/� handle_decl�parse_bogus_comment)rrEr �gtposrrr r<s  z!HTMLParser.parse_html_declarationrcCs`|j}|||d�dks"td��|�d|d�}|dkr>dS|rX|�||d|��|dS)Nr+)r*r'z"unexpected call to parse_comment()rr,r)r rDr/�handle_comment)rrEZreportr �posrrr rNszHTMLParser.parse_bogus_commentcCsd|j}|||d�dks"td��t�||d�}|s:dS|��}|�||d|��|��}|S)Nr+r)zunexpected call to parse_pi()r,)r rD�picloser2r3� handle_pir@)rrEr r7rGrrr r;!szHTMLParser.parse_picCs�d|_|�|�}|dkr|S|j}|||�|_g}t�||d�}|sPtd��|��}|�d���|_ }||k�r.t �||�}|s��q.|�ddd�\} } } | s�d} n\| dd�dkr�| dd�ks�n| dd�dkr�| dd�k�rnn | dd�} | �rt | �} |� | ��| f�|��}ql|||�� �} | d k�r�|��\} }d |jk�r�| |j�d �} t|j�|j�d �}n|t|j�}|�|||��|S| �d ��r�|�||�n"|�||�||jk�r�|�|�|S) Nrrz#unexpected call to parse_starttag()r+rK�'r,�")r�/>� rV)r�check_for_whole_start_tagr �tagfind_tolerantr7rDr@r>rr �attrfind_tolerantr�append�stripZgetpos�countr.r0r4�endswith�handle_startendtag�handle_starttag�CDATA_CONTENT_ELEMENTSr!)rrE�endposr �attrsr7rH�tag�m�attrname�restZ attrvaluer@�lineno�offsetrrr r8-s\    & � �       �    zHTMLParser.parse_starttagcCs�|j}t�||�}|r�|��}|||d�}|dkr>|dS|dkr~|�d|�rZ|dS|�d|�rjdS||krv|S|dS|dkr�dS|dkr�dS||kr�|S|dStd ��dS) Nrr�/rVr+r,r z6abcdefghijklmnopqrstuvwxyz=/ABCDEFGHIJKLMNOPQRSTUVWXYZzwe should not get here!)r �locatestarttagend_tolerantr7r@r5rD)rrEr rerG�nextrrr rX`s.   z$HTMLParser.check_for_whole_start_tagcCs.|j}|||d�dks"td��t�||d�}|s:dS|��}t�||�}|s�|jdk rr|�|||��|St �||d�}|s�|||d�dkr�|dS|� |�S|� d�� �}|� d|���}|�|�|dS|� d�� �}|jdk �r||jk�r|�|||��|S|�|�|��|S) Nr+r'zunexpected call to parse_endtagrr,rKz</>r)r rD� endendtagr2r@� endtagfindr7rr4rYrNr>rr/� handle_endtagr#)rrEr r7rOZ namematchZtagnamer rrr r9�s8       zHTMLParser.parse_endtagcCs|�||�|�|�dSr")r`ro�rrdrcrrr r_�s zHTMLParser.handle_startendtagcCsdSr"rrprrr r`�szHTMLParser.handle_starttagcCsdSr"r)rrdrrr ro�szHTMLParser.handle_endtagcCsdSr"r�rrIrrr r?�szHTMLParser.handle_charrefcCsdSr"rrqrrr rB�szHTMLParser.handle_entityrefcCsdSr"rrrrr r4�szHTMLParser.handle_datacCsdSr"rrrrr rP�szHTMLParser.handle_commentcCsdSr"r)rZdeclrrr rM�szHTMLParser.handle_declcCsdSr"rrrrr rS�szHTMLParser.handle_picCsdSr"rrrrr � unknown_decl�szHTMLParser.unknown_declcCstjdtdd�t|�S)NzZThe unescape method is deprecated and will be removed in 3.5, use html.unescape() instead.r+)� stacklevel)�warnings�warn�DeprecationWarningr)r�srrr r�s �zHTMLParser.unescape)r)�__name__� __module__� __qualname__�__doc__rar rrrrrr!r#rr<rNr;r8rXr9r_r`ror?rBr4rPrMrSrrrrrrr r?s8  z  3"()r{rrtrZhtmlr�__all__rrrCrAr=r6rRZ commentcloserYrZ�VERBOSErkrmrnrrrrrr �<module>s,          ��