Semantic Search (SQL Server)

Statistical Semantic Search provides deep insight into unstructured documents stored in SQL Server databases by extracting and indexing statistically relevant key phrases. Then it also uses these key phrases to identify and index documents that are similar or related.

You query these semantic indexes by using three Transact-SQL rowset functions to retrieve the results as structured data.

In This Topic

  • What Can I Do with Semantic Search?

    • Find Key Phrases in a Document

    • Find Similar or Related Documents

    • Find the Key Phrases That Make Documents Similar

  • Storing Documents in SQL Server

  • Related Tasks

  • Related Content

Semantic search builds upon the existing full-text search feature in SQL Server, but enables new scenarios that extend beyond keyword searches. While full-text search lets you query the words in a document, semantic search lets you query the meaning of the document. Solutions that are now possible include automatic tag extraction, related content discovery, and hierarchical navigation across similar content. For example, you can query the index of key phrases to build the taxonomy for an organization, or for a corpus of documents. Or, you can query the document similarity index to identify resumes that match a job description.

The following examples demonstrate the capabilities of Semantic Search.

Find the Key Phrases in a Document

The following query gets the key phrases that were identified in the sample document. It presents the results in descending order by the score that ranks the statistical significance of each key phrase. This query calls the semantickeyphrasetable (Transact-SQL) function.

SET @Title = 'Sample Document.docx'

SELECT @DocID = DocumentID
    FROM Documents
    WHERE DocumentTitle = @Title

SELECT @Title AS Title, keyphrase, score
    FROM SEMANTICKEYPHRASETABLE(Documents, *, @DocID)
    ORDER BY score DESC

TOP

The following query gets the documents that were identified as similar or related to the sample document. It presents the results in descending order by the score that ranks the similarity of the 2 documents. This query calls the semanticsimilaritytable (Transact-SQL) function.

SET @Title = 'Sample Document.docx'

SELECT @DocID = DocumentID
    FROM Documents
    WHERE DocumentTitle = @Title

SELECT @Title AS SourceTitle, DocumentTitle AS MatchedTitle,
        DocumentID, score
    FROM SEMANTICSIMILARITYTABLE(Documents, *, @DocID)
    INNER JOIN Documents ON DocumentID = matched_document_key
    ORDER BY score DESC

TOP

The following query gets the key phrases that make the 2 sample documents similar or related to one another. It presents the results in descending order by the score that ranks the weight of each key phrase. This query calls the semanticsimilaritydetailstable (Transact-SQL) function.

SET @SourceTitle = 'first.docx'
SET @MatchedTitle = 'second.docx'

SELECT @SourceDocID = DocumentID FROM Documents WHERE DocumentTitle = @SourceTitle
SELECT @MatchedDocID = DocumentID FROM Documents WHERE DocumentTitle = @MatchedTitle

SELECT @SourceTitle AS SourceTitle, @MatchedTitle AS MatchedTitle, keyphrase, score
    FROM semanticsimilaritydetailstable(Documents, DocumentContent,
        @SourceDocID, DocumentContent, @MatchedDocID)
    ORDER BY score DESC

TOP

Storing Documents in SQL Server

Before you can index documents with Semantic Search, you have to store the documents in a SQL Server database.

The FileTable feature in SQL Server 2012 makes unstructured files and documents first-class citizens of the relational database. As a result, database developers can manipulate documents together with structured data in Transact-SQL set-based operations.

For more information about the FileTable feature, see FileTables (SQL Server). For information about the FILESTREAM feature, which is another option for storing documents in the database, see FILESTREAM (SQL Server).

TOP