0

SharePoint Fast Search Concepts and Terminology – Part 4/4

This blog post is Part 4/4 of blog post series that will help you to get familiar with few concepts and terminologies referred in any search technology. As this blog series is more focused towards ‘Fast Search for SharePoint’ you may see jargon relevant to this.

Please check other related posts  Part 1| Part 2 | Part 3 | Part 4

The following graphic gives ten thousand foot view of what I am trying to capture and explain in this blog post series.

 

Fast Search Terminology and Concepts

8. Linguistics

Search engines perform a lot of language specific processing. This includes applying rules specific to that language and offensive language filtering, synonyms etc. Explaining in detail about how FS4SP behaves for languages is out of scope of this blog post.

9. Tokenization

When content sources are defined, Search engines crawl and will index those and will generate indexes. Some search engines refers to them as documents. Tokenization is a process of analyzing those documents and breaking the text into terms or tokens recognizable by the search engine.

10. Keyword Rank

This is the technique to improve ranking on documents that contains predefined keywords. This is a useful technique to forcefully enhance ranking among documents for better enhanced search experience.

11. Rank Profiles

Rank profiles are core component with FS4SP that defines how to rank items with in result set. There is a provision to create multiple Rank Profiles via Powershell, that defines how weights are applied to items or any components.

 

 

 

 

9. Keyword Rank

10. Rank Profiles

0

SharePoint Fast Search Concepts and Terminology – Part 3/4

This blog post is Part 3/3 of blog post series that will help you to get familiar with few concepts and terminologies referred in any search technology. As this blog series is more focused towards ‘Fast Search for SharePoint’ you may see jargon relevant to this.

Please check other related posts  Part 1| Part 2 | Part 3 | Part 4

The following graphic gives ten thousand foot view of what I am trying to capture and explain in this blog post series.

Fast Search Terminology and Concepts

7.Rank Tuning

By definition ‘Rank’ is the position in the hierarchy, in the search world ‘Rank’ is the position of the search result item in the result set. For instance, lets consider Bing or Google Search, when you search for ‘shoes’ or ‘vacation’ or in fact any search term you will find lot of search results. The first few results are the one user would click and finally end up being the revenue generators. So it is important to have your website show up in the first few results and is influenced by Rank. Seriously, how many times did you pass the first page of Google search results? So there are various technique that can be used to tune your Ranking. In the Fast Search world we have the following techniques.

7.1 Static/Quality Rank Tuning: URL Depth, Doc Rank, Site Rank, Hard Wired Boost

As the name indicates this the technique works independent of the search term. Which means that you would like to rank items independent of what user searches. Confused? OK, let me go through few examples and you would realize that you can tune Ranking independent of search terms.

URL Depth Ranking:
Observe  the below URLs and think about which of the two link, you would prefer to click?

1. http://www.sdakoju.wordpress.com/post/2015/mostpopular/fastsearchconfiguration

                                       OR

2. http://www.sdakoju.wordpress.com/FS4SPConfiguration

I would choose the second one, because it is more readable, clean and a more friendly URL. The depth of the URL is very less compared to the first one. ‘Depth’ indicates the importance or popularity of the page. The more deep the the page lies with in the site, the less popular the page is.  Obviously, you would always highlight your popular pages as your landing or one level below items. Search engines would like to show popular pages and not disliked or unknown pages lying some where deep down in the site.

Doc Rank:
If you have a page that is referenced across multiple pages, it influences the rank. Pages like these are ranked higher.

Site Rank:
This is similar to Doc Rank, the more links pointing the site or items with in the site the site has a higher rank

Hard Wired Boost:
Items can be give static ranking via Powershell and forcefully shown are high rank items

7.2 Dynamic Ranking:

This ranking value is based on query and its relation to the result set. This ranking is based on an multiple techniques/algorithms that influence ranking. Covering those is out of scope of this blog post. Please refer to Tune Dynamic Rank

 

0

SharePoint Fast Search Concepts and Terminology – Part 2/4

This blog post is Part 2/4 of blog post series that will help you to get familiar with few concepts and terminologies referred in any search technology. As this blog series is more focused towards ‘Fast Search for SharePoint’ you may see jargon relevant to this.

Please check other related posts  Part 1| Part 2 | Part 3 | Part 4

The following graphic gives ten thousand foot view of what I am trying to capture and explain in this blog post series.

Fast Search Terminology and Concepts

5. Recall and Precision

The total number of results in the result set for a query. You have to find a fair balance between Recall and Precision. The results set should not be too large and should avoid noise as much as possible. If the Recall is too large or too small it will hamper Precision.

There are various techniques to improve Recall such as Synonyms, Stemming etc.

Synonyms:

This is pretty common technique and a very obvious one. For instance, if you search for “happy” the search would also query for “joy”, “elated”, “merry” etc.

Stemming:

The use of Stemming is to get to the root form of a word. Stemming compares the root forms of the search terms to the documents in its content sources. For example, if the user enters “viewer” as the query, the search engine searches for “view” and returns all documents with view, viewer, viewing, preview, review etc

Recall and Precision Balance

Recall and Precision Balance

6.  Corpus

Corpus is Latin term for body. In the Search world, it refers to the scope of all the content sources the crawler would crawl and indexes. Following gives an example of what Corpus can include.

Corpus in Search

Corpus in Search

 

0

SharePoint Fast Search Concepts and Terminology – Part 1/4

This blog post is Part 1/4 of blog post series that will help you to get familiar with few concepts and terminologies referred in any search technology. As this blog series is more focused towards ‘Fast Search for SharePoint’ you may see jargon relevant to this.

Please check other related posts  Part 1| Part 2 | Part 3 | Part 4

Before we deep dive into these concepts, lets try to capture the overall process flow for any search engine. Typically content for any organization is stored in databases or documents on a file system. These documents can be of any type word, excel, images, videos, pdfs power-points etc. The primary job of any search technology is to crawl, index and surface results.

  1. Crawl:

    1. Once you have identified what content you want to crawl you will make search engine aware of where these are located and will grant appropriate permissions to crawl.  A crawl is basically collecting data and primarily metadata about the content.
  2. Index:

    1. Indexing is similar to how indexes work with books, they are just pointers to the actual location of the content. Typically indexes are physical files that are added to the file system and are output of a crawl.
  3. Search:

    1. Once the crawling and Indexing are complete its time to search these indexes, since these indexes are present on the file system querying these is lot faster.

The following graphic gives ten thousand foot view of what I am trying to capture and explain in this blog post series.

Fast Search Terminology and Concepts

Fast Search Terminology and Concepts

Lets get started:

1. Content Processing:

This is the nexus of any search engine, this defines what data-sources the search engine should crawl and the quality of the content itself for crawling. This is all about enriching the content even before it is being crawled.  Some of the tasks include

  1. Making sure that the search engine doesn’t crawl lot of noise consequently impacting the recall
  2. Detecting the language and applying rules
  3. Extracting meta data etc. and the list goes on.

2. Query Processing:

This kicks in after user performs the search, the search engine analyzes what the user is actually requesting and will accept additional query parameters if needed. This also matches result items in the search index  and returning search results to the user.

3. Relevancy:

This is the measure of how accurate/precise the search results are. There are various factors that determine how good the relevancy is, for instance the more the user find the intended results in the top search the better the relevancy is.  There are various techniques that can improve relevancy which will be discussed more in other part of this blog post.

4. Query Expansion: Best Bets, Synonyms, Lemmatization

Query expansion is a technique to improve Recall. The user may search for a term, search engines would not only search for specific term but also other relevant terms. This section explains on various techniques on how Query expansion can be accomplished.

4.1 Lemmatization

Lemma is a greek word which mean assumption or the canonical form of the word. For instance if the user searches for ship it would search shipped, ships etc. Not to be confused with Stemming where only the end of the word changes where it substitutes only the ending. For instance, Stemming would search for See, Seen, Seeing but no Saw. Where as Lemmatization would search for ‘Saw’ as well.

4.2 Synonyms

This is one of the most popular Recall technique, where Search engine would return results not only to the search terms but also for its synonyms. For instance if the user searches for ‘Joy’, it may include results for ‘Merry’, ‘Happy’, ‘Elated’, ‘Celebration’ etc.

4.3 Best Bets/Visual Best Bets

Best Bets are usually links displayed on the top of the search results pointing to different pages or content. These links are manually curated by administrator to display for a particular search term. Visual Best Bet similar to Best Bet except that an additional image is provided along with link and description.

 

11

Errors were encountered during the configuration of the Search Service Application.

I have encountered the following exception after configuring Domain Controller on my stand alone SharePoint 2013 Azure VM.

The actual exceptions is “Windows NT user or group ‘SDsdakoju’ not found. Check the name again. at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction) at System.Data.SqlClient.TdsParser.”

Root Cause:
Since I did not start off with a Domain Controller, all my SQL accounts were in “machinenameusername” format, so the SQL server Logins did not get the updated username format “domainnameusername” 

Following is the screen capture of the exception

Windows NT user or group not found

Windows NT user or group not found

Resolution:

I have modified the user accounts to “domainname/username” . I set my domain as ‘SD’. Please see below Before-After screen captures. Everything else is self-explanatory.

Errors were encountered during the configuration of the Search Service Application

Screen capture showing before and after changes to the SQL Logins

Hope this helps to resolve your issue.

0

What debuted in SharePoint 2010 and what happened in 2013?

This post will help you understand new features added to SharePoint 2010 and also its new first-class tools, that help developers speed up their development, debugging and deployment process. Since 2013 is already out and with companies rapidly adopting it, it makes sense at this point to mention about the enhancements to each of them in 2013.

1. Sandbox Solutions and Resource Governors:

Quick Overview:
Sandbox solutions is a new concept for 2010. A sandbox solutions run in a restricted execution environment that allows programs to access only certain resources which would consequently contains/restricts bugs to that Sandboxed environment only with out affecting the rest of the SharePoint farm.

One good examples would be Sandbox solutions cannot utilize certain local or network resources, and  may not have access content outside of the site collection they are located in [1]. To get an overview of when to use Sandbox solutions with 2010 you can read my post- Sandbox Solutions – SharePoint 2010

What changed in 2013?
Things changed in SharePoint 2013, Microsoft do not actively  encourage adopting Sandbox solutions as first design choice. These will be still available for backward compatibility only. Microsoft encourages to use ‘Apps‘ instead and leverage their new ‘Apps Model’.

2.Client Object Model

Quick Overview:
With SharePoint 2007, there were limited options to interact with SharePoint. The SharePoint object model is available only for server side applications only. Which means that your code has to be hosted and running on one of the SharePoint servers. There are very limited options for Client applications to interact with SharePoint data.

The only option available in SP 2007  for client applications is to use the web services API. This always worked great on server side code where the service metadata is downloaded and developers could use them seamlessly. But client-side technologies like JavaScript did not fit well with this set up.

So ‘Client Object Model’ was introduced to help develop client side applications that can leverage REST. Also WPF(Windows Presentation Foundation) or Silverlight  also needed something like this to build faster and bandwidth friendly apps.

Client Object Model comes in two flavors

a. Support for .net based clients like WPF or Silverlight apps.
These applications would typically add reference to the following dlls and  use appropriate classes/ methods/ objects to develop applications.
Microsoft.SharePoint.Client.dll
Microsoft.SharePoint.Client.Runtime.dll
Microsoft.SharPoint.Client.Silverlight.dll
b. Support for Javascript based clients
These application would typically reference
SP.js
SP.Core.js
SP.Ribbon.js
SP.Runtime.js.

All of these files are also have debug versions (SP.Debug.js) available for debugging purposes, but these files are larger in size and should not be used on production.

So effectively use them in development environment, by setting SharePoint deployment to use debug-versions  ‘<deployment retail=”false”>‘ under ‘System.web’ element.

What changed in 2013?
All the Client object api calls in 2010 are made via a WCF entry point which is not directly accessible. Proxies have to be used build via .NET code or JavaScript libraries. There were harder to write, there was no compile time checking and less intellisense support.

These are now lot improved in 2013

– Fully leverage REST based API calls that use basic HTTP for CRUD operations

– Client Object model now supports oData protocol. OData is a mainstream data access api for HTTP based clients  for creating and consuming data APIs. OData builds on core relies on protocols like HTTP and commonly accepted methodologies like REST.

– Extended API to support more server-side functionalities like User Profiles, Search, Taxonomy, Search, Feeds, Publishing, Sharing, Workflow, E-Discovery, IRM, Analytics, Business Data etc.

– CSOM also supports Windows Phone Applications

– **Deprecated ‘ListData.svc’, but still available for backward compatibility for older applications. ‘Client.svc’ is introduced with more endpoints catering for more functionality.

More on the following. Keep Reading 🙂

3. Business Connectivity Services

4. List Enhancements

5. Enhancements to Visual Studio

6. Web Solution Packages

7.  Developer Dashboard

8. SilverLight integration

9. Web 2.0 Protocols and New Standards

10. LINQ Enhancements

11. SharePoint Designer Enhancements

12. Visio, Access and InfoPath Enhancements

References:

1. MSDN: Sandbox Solutions Overview (SharePoint Server 2010)

0

Shared Services in SharePoint 2007 vs Service Applications in SharePoint 2010

Why not Shared Services?

As the name indicates these Services are created to be shared among multiple web applications. Primarily these services would provide Search, Excel Services, Profile Synchronization, Audiences and My Sites etc.  For instance if you want to have ‘Search’ enabled on you web application you have to subscribe to all the other services too, it is in fact all or nothing scenario. A perfect analogy would be a combo in fast food restaurant you have to have your fries and drink along with your burger. As explained in the below diagrams ‘Share Service 1’ is share among multiple web applications. ‘Web Application 1’ would need only ‘Search’ but it is served with the rest of the redundant services too.  On the other hand for example due to the sensitive nature of the data if a web application is required to have its own ‘Search’ you still have to create new ‘Shared Service’ which will  feed the Web application with all the other redundant services. The downside to this approach is some of these services would require a stand alone database which has to be created and populate for no reason.

SharedServices1

Why Service Applications?

Unlike Shared Services, Service Applications can subscribe only to the required service as show below. That is one reason why this is referred as Service A la carte and a good marketing strategy by Microsoft.

Service Applications

Key Advantages of Service Applications over Shared Services:

  1. As each service can be subscribed individually redundant databases need not be created to support unwanted services
  2. The architecture is loosely coupled any third party service applications can be easily integrated.
  3. Cross farm communication is possible via Service Application proxy and location easily using Service URI
1

Deploy InfoPath 2010 with manage code as ‘Content Type’ via a ‘Feature’ in SharePoint 2010

1. Creating InfoPath:
I have created the following InfoPath for demonstration purpose, please ignore the content. Before we go further  I believe you have your InfoPath form ready for deployment.

InfoPath form

InfoPath form

2. Publish InfoPath: Please follow the below steps to publish the InfoPath form

Step 1: Change the Security Level to “Full Trust”.

Note: This is required because the private assembly for a managed code form template is running under a hosted CLR application domain, the security settings for forms require full trust.

Step 2: Click on the ‘Info’ and you should find ‘Publish your form’

 

Step 3: Locate the path where you want to publish and name the template appropriately

Step 4: If you are not sure you may leave this field empty

Step 5:  Publish the form.

3. Create Visual  Studio Project:

Step 1 : Create an empty SharePoint project

Step 2:  Add a feature and name it appropriately. Please make sure you have ‘Site’ as the scope of the feature as this InfoPath is deployed as content type

Step 3:  Add a new Element File
Note: Empty elements are most often used to define SharePoint project items that lack a project or project item template in Visual Studio, such as fields. When you add an empty element to your project, it contains a single file that is named Elements.xml. Use XML statements to define the desired elements in Elements.xml

Creating Empty Element
Creating Empty Element

Step 4: Add the dll and the InfoPath to the element folder as shown below

Step 5: Configure element properties as shown below.
Note: Use the XsnFeatureReceiver class to trap events that are raised after an InfoPath form template installation, uninstallation, activation, or deactivation on the server. The assembly and class name used to trap these events must be referenced in the feature.xml file used to deploy the SharePoint feature containing one or more InfoPath form templates.

Feature.xml file would look as below
<?xml version="1.0" encoding="us-ascii" standalone="yes"?>
<Feature Id="39EBA28E-2238-4C82-A37E-81BAEBFB7A61" 
    Title="Simple Form Template" 
    Description="This feature deploys infopath as content type" 
    Version="1.0.0.0" 
    Scope="Site" 
    ReceiverClass="Microsoft.Office.InfoPath.Server.Administration.XsnFeatureReceiver" 
    ReceiverAssembly="Microsoft.Office.InfoPath.Server, Version=14.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c" 
    xmlns="http://schemas.microsoft.com/sharepoint/">
    <ElementManifests>
        <ElementManifest Location="Elements.xml" />
        <ElementFile Location="InfoPathasFeature.xsn" />
    </ElementManifests>
    <ActivationDependencies>
        <ActivationDependency FeatureId="C88C4FF1-DBF5-4649-AD9F-C6C426EBCBF5" />
    </ActivationDependencies>
</Feature>

Step 6: Include the .xsn, dll to be a part of deployment files.
If you preview your feature you would see the following the are no items in the feature except for the ‘Element.xml’. See how it changes after completing this step

Modify the xsn properties, change the ‘Deployment Type’ to ‘Element File’

Modify the dll properties, change the ‘Deployment  Type’ to ‘ElementFile’

Note: Setting the ‘DeploymentType’ to ‘ElementFile’ specifies the deployment of element files to SharePoint. Element files are referenced by the feature manifest (feature.xml). The default path for ElementFile is {SharePointRoot}TemplateFeatures.

After completing the above modifications, you will find the items in the feature as show below.

Step 7: Package and Deploy the project

Step 8: After you are done with deployment, verify if the folder structure is shown as below

Modify List Settings:

Follow the below steps to set the default content types to the form library.

Navigate to the advanced setting of the ‘Form Templates’ library and select ‘Yes’ as shown below.

Navigate to the ‘Add  Content Types’ Screen & if you select “All Groups” or preferable “Microsoft InfoPath” you should find the “InfoPathasFeature” Content Type in the list.

Check if the content type is added:

Your are done! Check if the InfoPath is showing up if you select ‘New Document’ as below

References:

1. How to: Deploy InfoPath Form Templates with Code
2. Security levels of InfoPath forms
3. XSNFeatureReciever Class