tika.parser module
- tika.parser.from_buffer(string, serverEndpoint='http://localhost:9998', xmlContent=False, headers=None, config_path=None, requestOptions={}, raw_response=False)
Parses content from an in-memory buffer.
- Parameters:
string – Buffer value to be parsed.
serverEndpoint – Tika server endpoint URL. This is optional.
xmlContent – Whether XML content should be requested. Default is False, which results in text content.
headers – Request headers to send to the Tika REST server, as a dictionary. This is optional.
- Returns:
dictionary having 'metadata' and 'content' keys, in the same form as from_file.
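A minimal offline sketch of calling from_buffer, assuming it returns the same 'metadata'/'content' dictionary that from_file documents. The helpers buffer_to_text and fake_from_buffer are hypothetical; the fake stands in for a running Tika server so the example runs without one. Leaving parse=None would use the real tika.parser.from_buffer, which needs the tika package and a server at serverEndpoint.

```python
from typing import Any, Callable, Dict, Optional

# Any callable with the same call shape as tika.parser.from_buffer.
ParseFn = Callable[..., Dict[str, Any]]


def buffer_to_text(
    data: str,
    parse: Optional[ParseFn] = None,
    server_endpoint: str = "http://localhost:9998",
    headers: Optional[Dict[str, str]] = None,
) -> str:
    """Parse an in-memory buffer and return its stripped text content."""
    if parse is None:
        # Imported lazily: needs the tika package and a reachable Tika server.
        from tika import parser
        parse = parser.from_buffer
    parsed = parse(
        data,
        serverEndpoint=server_endpoint,
        xmlContent=False,  # plain text rather than XML
        headers=headers,
    )
    # Assumption: treat a missing or None 'content' value as empty text.
    content = parsed.get("content")
    return content.strip() if content else ""


def fake_from_buffer(string, serverEndpoint="http://localhost:9998",
                     xmlContent=False, headers=None, **kwargs):
    """Hypothetical offline stand-in returning the documented dict shape."""
    return {"metadata": {"Content-Type": "text/plain"},
            "content": "\n" + string + "\n"}


print(buffer_to_text("Hello, Tika!", parse=fake_from_buffer))
```

Injecting the parse callable keeps the helper testable offline; in normal use the default branch imports tika only when it is actually needed.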
- tika.parser.from_file(filename, serverEndpoint='http://localhost:9998', service='all', xmlContent=False, headers=None, config_path=None, requestOptions={}, raw_response=False)
Parses a file for metadata and content.
- Parameters:
filename – Path to the file to be parsed, or a binary file object opened with open(path, 'rb').
serverEndpoint – Tika server endpoint URL.
service – Service requested from the Tika server. Default is 'all', which returns recursive text content plus metadata; 'meta' returns only metadata; 'text' returns only content.
xmlContent – Whether XML content should be requested. Default is False, which results in text content.
headers – Request headers to send to the Tika REST server, as a dictionary. This is optional.
- Returns:
dictionary having 'metadata' and 'content' keys. 'content' has a str value and 'metadata' has a dict value.
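A minimal sketch of calling from_file with an explicit service value and unpacking the returned dictionary. The check against 'all', 'meta' and 'text' is a client-side assumption based on the values described above, not something tika-python is known to do; parse_file and fake_from_file are hypothetical helpers, and the fake mimics the documented return shape so the example runs without a Tika server. The usage passes a binary file object rather than a path, which the filename parameter allows.

```python
import os
import tempfile
from typing import Any, Callable, Dict, Optional, Tuple

# Service values described in the from_file docstring.
VALID_SERVICES = ("all", "meta", "text")


def parse_file(
    source: Any,
    service: str = "all",
    parse: Optional[Callable[..., Dict[str, Any]]] = None,
    server_endpoint: str = "http://localhost:9998",
) -> Tuple[Dict[str, Any], str]:
    """Return (metadata, content) for a path or a binary file object."""
    # Client-side check (an assumption, not part of tika-python).
    if service not in VALID_SERVICES:
        raise ValueError(f"service must be one of {VALID_SERVICES}, got {service!r}")
    if parse is None:
        # Imported lazily: needs the tika package and a reachable Tika server.
        from tika import parser
        parse = parser.from_file
    parsed = parse(source, serverEndpoint=server_endpoint, service=service)
    metadata = parsed.get("metadata") or {}
    content = (parsed.get("content") or "").strip()
    return metadata, content


def fake_from_file(filename, serverEndpoint="http://localhost:9998",
                   service="all", **kwargs):
    """Hypothetical offline stand-in that mimics the documented dict shape."""
    if hasattr(filename, "read"):  # binary file object
        raw = filename.read()
    else:  # filesystem path
        with open(filename, "rb") as fh:
            raw = fh.read()
    return {"metadata": {"Content-Length": str(len(raw))},
            "content": raw.decode("utf-8")}


# Usage: parse a binary file object, as the filename parameter allows.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"quarterly report")
    tmp_path = tmp.name
try:
    with open(tmp_path, "rb") as fh:
        meta, text = parse_file(fh, parse=fake_from_file)
finally:
    os.remove(tmp_path)

print(meta, repr(text))
```

Validating service before any request fails fast on typos such as 'xml', instead of sending an unexpected value to the server.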