Friday, November 11, 2011

How to Use SharePoint Metadata to Improve Search and Control Content

Document find-ability in an enterprise environment involves navigating lists of folders and documents or filtered document sets, faceted navigation structures, or parametric and full-text search results. Two mechanisms underpin these techniques: extracting and indexing the usable text of a document (which may not even be possible in some cases, e.g. pictures or videos), and using human-manageable metadata to describe documents (which could be auto-populated from the text extracted from the document).

Content control functions include document workflows, security, printing, archival or retention policies, publishing and other mechanisms that enforce proper document capture, authoring, storage, distribution and disposal. To maintain content states and control information, control activities rely on information about the document, which is stored as a part of the metadata. Since control activities tend to vary from organization to organization, so does the metadata associated with them.

To improve search, the following approaches help.

Step I: Optimizing SharePoint Architecture to Improve Search and Control Content
Each SharePoint site includes an out-of-the-box content type hierarchy and predefined columns. They define basic columns necessary to describe content. Examples include name and title columns for a document or date and description for calendar events.

The following figure shows the columns provided with an out-of-the-box SharePoint document library. It includes generic columns such as Title and Created By.




The next figure shows the SharePoint Content Types Gallery with the out-of-the-box content types.



A common approach to enhance or customize the available set of fields would be to insert additional columns directly into lists and thus collect more information about the content stored there. Furthermore, adding these columns to list views will facilitate navigation through the content and enable the use of parametric search and/or faceted navigation in the list.
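As a rough illustration of why extra columns help, the sketch below models list items as plain Python dictionaries and shows how a custom column can drive both faceted navigation (counts per value) and parametric filtering. The column names "Department" and "DocType", the sample items, and both helper functions are assumptions made up for this example; none of this is a SharePoint API.

```python
from collections import Counter

# Hypothetical list items; "Department" and "DocType" stand in for custom columns.
items = [
    {"Title": "Q3 budget", "Department": "Finance", "DocType": "Report"},
    {"Title": "Hiring plan", "Department": "HR", "DocType": "Plan"},
    {"Title": "Q4 forecast", "Department": "Finance", "DocType": "Report"},
]

def facet_counts(items, column):
    """Count how many items carry each value of the given column."""
    return Counter(item[column] for item in items if item.get(column))

def filter_items(items, **criteria):
    """Parametric filter: keep items whose columns match every criterion."""
    return [i for i in items if all(i.get(k) == v for k, v in criteria.items())]
```

Calling facet_counts(items, "Department") gives the numbers a facet panel would display, and filter_items(items, Department="Finance") narrows the list the way a filtered view does. Items that lack the column simply drop out of the facet, which is why consistently populated metadata matters.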



An ideal tactic is to identify typical metadata sets, which can be implemented as content types on the site or site collection level, and reused in lists throughout the site hierarchy.

Applying a similar methodology to list design results in a standard approach to content accumulation and tagging. Lists with similar columns should be redesigned so that these columns are aggregated into content types at the site level and then pushed back down to the lists. These changes will greatly improve faceted navigation, parametric search and basic search for all items inherited from the base content type. Inherited content types also allow the use of generic workflows and policies applicable to all such items.

Content types defined on a site level provide an additional opportunity to structure content in lists in a way that will facilitate common search and content control practices. For example, site content types can be reorganized to reflect the structure of a company and its products and services.

In order to do this, similar columns applicable to many content types should be first extracted and moved up to parent content types. These can then be re-organized to reduce duplication, thus forming a corporate hierarchy of the content types. Next, the SharePoint site collection should be redesigned in a way that reflects the hierarchy of the company workgroups with clearly defined policies of how sites on each site hierarchy level can be created and managed.

Finally, content types should be positioned in the appropriate place in a site hierarchy for their intended use. This will promote creation of new sites that will inherit standard corporate metadata models, search and navigation techniques and general workflows.
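The column extraction described above can be sketched in a few lines. This is only a planning aid under stated assumptions: content types are modeled as sets of column names, and the function name hoist_shared_columns, the parent name "Corporate Document", and the sample types are hypothetical. It does not call any SharePoint API.

```python
def hoist_shared_columns(content_types, parent_name):
    """Move columns shared by every content type into a new parent type.

    content_types maps a content type name to its set of column names.
    Returns (parent_columns, children), where each child keeps only the
    columns that are specific to it and records its parent.
    """
    shared = set.intersection(*content_types.values()) if content_types else set()
    children = {
        name: {"parent": parent_name, "columns": sorted(cols - shared)}
        for name, cols in content_types.items()
    }
    return sorted(shared), children

# Hypothetical content types with overlapping columns.
types = {
    "Contract": {"Title", "Owner", "Department", "Counterparty"},
    "Invoice": {"Title", "Owner", "Department", "Amount"},
}
parent_columns, children = hoist_shared_columns(types, "Corporate Document")
```

Here the parent type ends up with Department, Owner and Title, while Contract and Invoice keep only Counterparty and Amount respectively. Taking the intersection across all types is the simplest rule; a real redesign would probably also group types that share columns with only some of their siblings.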

Step II: Classifying SharePoint Content to Improve Search and Control Content
Document and item find-ability (and most workflows) in SharePoint rely on the actual metadata values associated with the content. Efficient metadata models improve search and navigation and enable generic workflows to be applied to content. These benefits, however, will not be realized unless the metadata is reliably populated for all content. A metadata model needs to strike a fine balance between too little metadata to make the content "find-able" and so much metadata that it burdens the users working with the content.

To encourage accurate and complete entry of metadata, the model should simplify data capture for the user:
• Field values could be populated from a pre-defined vocabulary.
• A hierarchical tree of terms could be displayed for the user to select from.
• Field value selections could be grouped in cascading relationships.
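A minimal sketch of a pre-defined vocabulary with a cascading relationship is shown here. The two-level Region/Country vocabulary and the functions allowed_values and validate are assumptions invented for illustration; in SharePoint the vocabulary would live in a term store or lookup list rather than in a dictionary.

```python
# Hypothetical two-level vocabulary: Region -> Country.
VOCABULARY = {
    "Europe": ["France", "Germany"],
    "Asia": ["India", "Japan"],
}

def allowed_values(parent_value=None):
    """Return the choices for a field, given the parent field's value.

    With no parent value, the top-level terms are offered; otherwise only
    the child terms of that parent (the cascading relationship).
    """
    if parent_value is None:
        return sorted(VOCABULARY)
    return list(VOCABULARY.get(parent_value, []))

def validate(region, country):
    """Accept a pair only if the country belongs to the chosen region."""
    return region in VOCABULARY and country in VOCABULARY[region]
```

Because the user only ever picks from allowed_values, free-text typos cannot enter the metadata, and validate rejects inconsistent combinations such as a country filed under the wrong region.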

Several approaches are available for assigning metadata to a large number of items that have little or no existing metadata.

• Text analytics: analyze the text of the document itself and then use an algorithm to extract and assign usable metadata values. Text analytics strategies can be very complex, and they often depend on dictionaries that require separate maintenance and on time to learn data patterns and relationships.
• Mass tagging: filter a specific set of documents based on criteria that include existing metadata, and then update the metadata values for all of these items en masse.
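Mass tagging can be pictured as a filter followed by a bulk update. The sketch below works on in-memory dictionaries; the function name mass_tag, the column names, and the sample documents are hypothetical, and a real implementation would read and write list items through whatever SharePoint interface is available.

```python
def mass_tag(items, criteria, updates):
    """Apply `updates` to every item matching all `criteria`.

    Items are modified in place; the number of changed items is returned.
    """
    changed = 0
    for item in items:
        if all(item.get(k) == v for k, v in criteria.items()):
            item.update(updates)
            changed += 1
    return changed

# Hypothetical documents with sparse metadata.
docs = [
    {"Name": "a.docx", "Folder": "Legal"},
    {"Name": "b.docx", "Folder": "Legal"},
    {"Name": "c.xlsx", "Folder": "Finance"},
]
tagged = mass_tag(docs, {"Folder": "Legal"}, {"Department": "Legal", "Retention": "7 years"})
```

After this call the two documents in the Legal folder carry the new Department and Retention values, and the Finance document is untouched.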

Step III: Enabling Content Type Refinement in the Search Results Page
The work done in the previous two steps needs to be reflected in the search results page, so that users benefit from the content types and the metadata capture process.

Please see the blog for a sample process of enrichment.

Friday, October 7, 2011

Office 365 SharePoint Online: Managed Path

A Web Application manages a list of managed paths. Site collections can be created only where a managed path is defined. A site collection keeps all of the sites under it in the same content database. This is key to how we figure out which database user data is stored in.

Whenever SharePoint receives a URL, the site collection is determined by looking at the list of managed paths for the given Web Application. This means SharePoint has to look at every managed path, so try to limit the number of managed paths (fewer than 20 is highly recommended).

Office 365 S (Shared) and D (Dedicated) support only the out-of-the-box managed paths, i.e. /sites/ and /personal/.

This limitation has been imposed to keep performance high when sites are queried through the browser and the search service.

Note: Key Points for Managed Paths
• Managed paths allow SharePoint to determine what portion of a given URL corresponds to the "site collection URL".
• Managed paths can be defined per web application (and cannot be defined for host header site collections).
• Managed paths can be "Explicit" or "Wildcard".
• Explicit managed paths allow a single site collection (SPSite) to be created at exactly the given URL.
• Wildcard managed paths allow unlimited site collections to be created under the given URL; no site collection can be created at exactly that URL.
• Limit your managed paths to fewer than 20 per web application.
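The explicit and wildcard rules can be made concrete with a small sketch. This is not how SharePoint implements the lookup; it is an illustrative model in which resolve_site_collection, the "explicit"/"wildcard" labels, and the sample paths (including the explicit "/hr" path) are all assumptions. It also ignores details such as case-insensitive URL matching.

```python
def resolve_site_collection(url_path, managed_paths):
    """Return the site collection portion of a server-relative URL.

    managed_paths maps a path such as "/sites" to "explicit" or "wildcard".
    Falls back to the root site collection "/" when nothing matches.
    """
    segments = [s for s in url_path.split("/") if s]
    # Check longer managed paths first so a more specific path wins.
    for path in sorted(managed_paths, key=len, reverse=True):
        prefix = [s for s in path.split("/") if s]
        if segments[:len(prefix)] != prefix:
            continue
        if managed_paths[path] == "explicit":
            # Explicit: the site collection sits exactly at the managed path.
            return "/" + "/".join(prefix)
        # Wildcard: the site collection is the segment right after the path.
        if len(segments) > len(prefix):
            return "/" + "/".join(segments[:len(prefix) + 1])
    return "/"

# Hypothetical configuration: two wildcard paths and one explicit path.
paths = {"/sites": "wildcard", "/personal": "wildcard", "/hr": "explicit"}
```

With this configuration, "/sites/teamA/Shared Documents/a.docx" resolves to "/sites/teamA" and "/hr/docs/x" resolves to "/hr". Note that the loop may visit every managed path before it finds a match, which gives an intuition for why a short list of managed paths is recommended.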