tika.parser module

tika.parser.from_buffer(string, serverEndpoint='http://localhost:9998', xmlContent=False, headers=None, config_path=None, requestOptions={}, raw_response=False)

Parses the content from a buffer.

Parameters:
  • string – Buffer value.

  • serverEndpoint – Server endpoint. This is optional.

  • xmlContent – Whether XML content should be requested. Default is ‘False’, which results in text content.

  • headers – Request headers to be sent to the Tika REST server, as a dictionary. This is optional.

Returns:

tika.parser.from_file(filename, serverEndpoint='http://localhost:9998', service='all', xmlContent=False, headers=None, config_path=None, requestOptions={}, raw_response=False)

Parses a file for metadata and content.

Parameters:
  • filename – Path to the file to be parsed, or a binary file object opened with open(path, 'rb').

  • serverEndpoint – Server endpoint URL.

  • service – Service requested from the Tika server. Default is ‘all’, which results in recursive text content plus metadata; ‘meta’ returns only metadata, and ‘text’ returns only content.

  • xmlContent – Whether XML content should be requested. Default is ‘False’, which results in text content.

  • headers – Request headers to be sent to the Tika REST server, as a dictionary. This is optional.

Returns:

A dictionary with ‘metadata’ and ‘content’ keys: ‘content’ holds a str value and ‘metadata’ holds a dict.
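A minimal sketch of calling from_file with a binary file object and an explicit service, assuming a Tika server at the default endpoint. The helper parse_file and the VALID_SERVICES tuple are hypothetical; the three service names are taken from the parameter description above, and the argument check runs before tika is imported so a bad value fails fast.

```python
from typing import Any, Dict

# Service names as described for the `service` parameter of from_file.
VALID_SERVICES = ("all", "meta", "text")


def parse_file(
    path: str,
    service: str = "all",
    xml: bool = False,
    endpoint: str = "http://localhost:9998",
) -> Dict[str, Any]:
    """Parse a local file through a running Tika server (hypothetical helper).

    The file is opened in binary mode and the file object is passed to
    from_file, which the documentation lists as an accepted input.
    """
    if service not in VALID_SERVICES:
        raise ValueError(f"service must be one of {VALID_SERVICES}, got {service!r}")
    # Imported lazily so argument validation works without tika installed.
    from tika import parser

    with open(path, "rb") as fh:
        return parser.from_file(
            fh, serverEndpoint=endpoint, service=service, xmlContent=xml
        )
```

For example, parse_file('report.pdf', service='meta')['metadata'] (with a hypothetical report.pdf) would fetch only the metadata dictionary, skipping content extraction.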