Apache Tika

A toolkit that detects and extracts metadata and text from over a thousand different file types.

Description

The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). Because every file type is parsed through a single interface, Tika is well suited to search engine indexing, content analysis, translation, and similar tasks.

Readme

Getting started

# Launch the GUI application

turbo run apache/tika,eclipse/temurinjre-lts --startup-file=cmd -- /C "java -jar @SYSDRIVE@\tika\tika-app.jar"


# Execute a command-line function

turbo run apache/tika,eclipse/temurinjre-lts --startup-file=cmd -- /C "java -jar @SYSDRIVE@\tika\tika-app.jar <args>"


# Execute a command-line function to output text from a file

turbo run apache/tika,eclipse/temurinjre-lts --startup-file=cmd -- /C "java -jar @SYSDRIVE@\tika\tika-app.jar -t "<input file>" > "<output file>""
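
When the text-extraction command has to be generated for many files, it can help to assemble the command string in code rather than by hand. The following is a minimal sketch, not part of Tika or Turbo: the class name TikaTurboCommand and its methods are hypothetical, and the image names, jar path, and -t flag are copied from the commands above. It only builds the string and does not run it.

```java
public class TikaTurboCommand {
    // Image list and jar path copied from the commands in this readme.
    static final String IMAGES = "apache/tika,eclipse/temurinjre-lts";
    static final String JAR = "@SYSDRIVE@\\tika\\tika-app.jar";

    /**
     * Builds the full "turbo run" command line (hypothetical helper).
     * An empty tikaArgs yields the command that launches the GUI.
     */
    public static String commandLine(String tikaArgs) {
        String inner = "java -jar " + JAR;
        if (!tikaArgs.isEmpty()) {
            inner += " " + tikaArgs;
        }
        // The whole java invocation is passed to cmd /C as one quoted string.
        return "turbo run " + IMAGES + " --startup-file=cmd -- /C \"" + inner + "\"";
    }

    /** Builds the command that writes a file's plain text to an output file. */
    public static String extractText(String inputFile, String outputFile) {
        return commandLine("-t \"" + inputFile + "\" > \"" + outputFile + "\"");
    }

    public static void main(String[] args) {
        // Print the command for one example file; the file names are placeholders.
        System.out.println(extractText("report.pdf", "report.txt"));
    }
}
```

The printed string follows the same quoting pattern as the command above, so it can be pasted into a Windows command prompt or written into a batch file, one line per input file.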


Release Notes

No release notes


  • Dependencies
    No dependencies
  • Used By
    No repositories
  • Website
    Developer: tika.apache.org
    Support: tika.apache.org
  • Current
    3.1.0 updated 2 months ago
  • Details
    Updated: February 3, 2025
    Created: September 23, 2024