In a simple search on the term felony, every reference to felony in an unstructured document registers as a hit.
Current technologies to extract concepts from unstructured documents are extremely computationally intensive.
FrameMaker has two ways of approaching documents: structured and unstructured.
It can perhaps help perform trend analysis across documents, determine the theme and gist of documents, and allow fuzzy searches on unstructured documents.
Until this transpires, the web largely consists of unstructured documents lacking semantic metadata.
AnyDoc develops technologies to process structured, semi-structured, and unstructured (free-form) documents, as well as technologies for classification and workflow.
For fully unstructured documents (e.g. legal contracts, customer correspondence, and white mail), it is not yet possible to locate and extract all information.
This image-based classification approach, combined with a full-text analysis of certain documents (based on a keyword search), is the main technology used today to process semi-structured or unstructured documents.
This is not surprising: text in "unstructured" documents is hard to process.
STAIRS was used in-house by organizations such as large corporations and government agencies with large collections of unstructured documents.