Semantic Search (SQL Server)
Statistical Semantic Search provides deep insight into unstructured documents stored in SQL Server databases by extracting and indexing statistically relevant key phrases. Then it also uses these key phrases to identify and index documents that are similar or related.
You query these semantic indexes by using three Transact-SQL rowset functions to retrieve the results as structured data.
Semantic search builds upon the existing full-text search feature in SQL Server, but enables new scenarios that extend beyond keyword searches. While full-text search lets you query the words in a document, semantic search lets you query the meaning of the document. Solutions that are now possible include automatic tag extraction, related content discovery, and hierarchical navigation across similar content. For example, you can query the index of key phrases to build the taxonomy for an organization, or for a corpus of documents. Or, you can query the document similarity index to identify resumes that match a job description.
The following examples demonstrate the capabilities of Semantic Search.
Find the Key Phrases in a Document
The following query gets the key phrases that were identified in the sample document. It presents the results in descending order by the score that ranks the statistical significance of each key phrase. This query calls the semantickeyphrasetable (Transact-SQL) function.
SET @Title = 'Sample Document.docx' SELECT @DocID = DocumentID FROM Documents WHERE DocumentTitle = @Title SELECT @Title AS Title, keyphrase, score FROM SEMANTICKEYPHRASETABLE(Documents, *, @DocID) ORDER BY score DESC
Find Similar or Related Documents
The following query gets the documents that were identified as similar or related to the sample document. It presents the results in descending order by the score that ranks the similarity of the 2 documents. This query calls the semanticsimilaritytable (Transact-SQL) function.
SET @Title = 'Sample Document.docx' SELECT @DocID = DocumentID FROM Documents WHERE DocumentTitle = @Title SELECT @Title AS SourceTitle, DocumentTitle AS MatchedTitle, DocumentID, score FROM SEMANTICSIMILARITYTABLE(Documents, *, @DocID) INNER JOIN Documents ON DocumentID = matched_document_key ORDER BY score DESC
Find the Key Phrases That Make Documents Similar or Related
The following query gets the key phrases that make the 2 sample documents similar or related to one another. It presents the results in descending order by the score that ranks the weight of each key phrase. This query calls the semanticsimilaritydetailstable (Transact-SQL) function.
SET @SourceTitle = 'first.docx' SET @MatchedTitle = 'second.docx' SELECT @SourceDocID = DocumentID FROM Documents WHERE DocumentTitle = @SourceTitle SELECT @MatchedDocID = DocumentID FROM Documents WHERE DocumentTitle = @MatchedTitle SELECT @SourceTitle AS SourceTitle, @MatchedTitle AS MatchedTitle, keyphrase, score FROM semanticsimilaritydetailstable(Documents, DocumentContent, @SourceDocID, DocumentContent, @MatchedDocID) ORDER BY score DESC
Before you can index documents with Semantic Search, you have to store the documents in a SQL Server database.
The FileTable feature in SQL Server 2012 makes unstructured files and documents first-class citizens of the relational database. As a result, database developers can manipulate documents together with structured data in Transact-SQL set-based operations.
For more information about the FileTable feature, see FileTables (SQL Server). For information about the FILESTREAM feature, which is another option for storing documents in the database, see FILESTREAM (SQL Server).
