File Content and Directory Search using Directory.GetFiles and PLINQ

Array of File Names

Starting with .NET 4, you can use PLINQ queries to parallelize operations on file directories. The following code snippet shows how to write a query that uses the GetFiles method to populate an array with the names of the files in a directory and all of its subdirectories. GetFiles does not return until the entire array is populated, so it can introduce latency at the beginning of the operation. Once the array is populated, however, PLINQ can search all the files with a given extension in that directory tree for a specific word in parallel. To measure performance, you can create a folder called CLOBS containing 8 large text files (1 GB each).
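
A minimal sketch of such a query, written for this walkthrough rather than taken from the original listing. The class name GetFilesSearch, the method FindMatches, the C:\CLOBS path, the *.txt pattern, and the search word are all assumptions chosen for illustration.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;

static class GetFilesSearch
{
    // Returns every line that contains searchTerm, across all files
    // matching pattern under root (subdirectories included).
    public static string[] FindMatches(string root, string pattern, string searchTerm)
    {
        // GetFiles blocks here until the complete array of names is built.
        string[] files = Directory.GetFiles(root, pattern, SearchOption.AllDirectories);

        return files
            .AsParallel()                        // hand the file names to PLINQ
            .SelectMany(file => File.ReadLines(file)
                .Where(line => line.Contains(searchTerm)))
            .ToArray();
    }

    static void Main()
    {
        // Hypothetical folder and search word; adjust to your own test data.
        Stopwatch timer = Stopwatch.StartNew();
        string[] matches = FindMatches(@"C:\CLOBS", "*.txt", "the");
        Console.WriteLine("{0} matching lines in {1:F2} seconds",
            matches.Length, timer.Elapsed.TotalSeconds);
    }
}
```

File.ReadLines streams each file lazily, which matters for 1 GB inputs; File.ReadAllLines would load a whole file into memory first.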


When the project runs, CPU usage rises, as shown in the following figure:

Finding all matches in 8 large text files (1 GB each) takes 407.03 seconds, as shown in the output window:

File Content and Directory Search using Directory.EnumerateFiles and PLINQ

Enumerable Collection of File Names

Starting with .NET 4, you can enumerate directories and files by using methods that return an enumerable collection of their names as strings. Earlier versions of the .NET Framework could return these names only as arrays. Enumerable collections can perform better than arrays on large directories, because you can start processing names before the whole collection has been returned.

Parallel LINQ (PLINQ)

In .NET 4, you can use Parallel LINQ (PLINQ) for queries that perform a computationally expensive operation on every file in a specified directory tree.
The following code snippet shows how to parallelize operations on file directories. The PLINQ query uses the Directory.EnumerateFiles method to search all the files with a given extension in a particular directory for a specific word. To measure performance, you can create a folder called CLOBS containing 8 large text files (1 GB each).
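
A minimal sketch of this kind of query, not the original listing. The names EnumerateFilesParallelSearch and CountMatches, the C:\CLOBS path, the *.txt pattern, and the search word are assumptions made for illustration.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;

static class EnumerateFilesParallelSearch
{
    // Counts the lines containing searchTerm in all files matching
    // pattern under root (subdirectories included).
    public static int CountMatches(string root, string pattern, string searchTerm)
    {
        return Directory
            // EnumerateFiles yields names lazily, so PLINQ can start on
            // the first files while the directory walk is still running.
            .EnumerateFiles(root, pattern, SearchOption.AllDirectories)
            .AsParallel()
            .SelectMany(file => File.ReadLines(file))
            .Count(line => line.Contains(searchTerm));
    }

    static void Main()
    {
        // Hypothetical folder and search word; adjust to your own test data.
        Stopwatch timer = Stopwatch.StartNew();
        int count = CountMatches(@"C:\CLOBS", "*.txt", "the");
        Console.WriteLine("{0} matching lines in {1:F2} seconds",
            count, timer.Elapsed.TotalSeconds);
    }
}
```

The only structural difference from the GetFiles version is the source of the file names; the query operators after AsParallel are the same.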


When the project runs, CPU usage rises, as shown in the following figure:

Finding all matches in 8 large text files (1 GB each) takes 402.596 seconds, as shown in the output window:

File Content and Directory Search using Directory.EnumerateFiles and LINQ

Enumerable Collection of File Names

Starting with .NET 4, you can enumerate directories and files by using methods that return an enumerable collection of their names as strings. Earlier versions of the .NET Framework could return these names only as arrays. Enumerable collections can perform better than arrays on large directories, because you can start processing names before the whole collection has been returned.

LINQ Query

Language-Integrated Query (LINQ) is the name for a set of technologies based on the integration of query capabilities directly into the C# language. With LINQ, a query is a first-class language construct, just like classes, methods, events, and so on. The following example shows how to use the Directory.EnumerateFiles method and a LINQ query to search all the files with a given extension in a particular directory for a specific word. To measure performance, you can create a folder called CLOBS containing 8 large text files (1 GB each).
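
A minimal sketch of a sequential version in query syntax, not the original listing. The names EnumerateFilesLinqSearch and FindMatches, the C:\CLOBS path, the *.txt pattern, and the search word are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;

static class EnumerateFilesLinqSearch
{
    // Returns every line containing searchTerm in all files matching
    // pattern under root, processed sequentially on a single thread.
    public static List<string> FindMatches(string root, string pattern, string searchTerm)
    {
        IEnumerable<string> query =
            from file in Directory.EnumerateFiles(root, pattern, SearchOption.AllDirectories)
            from line in File.ReadLines(file)   // streamed, one line at a time
            where line.Contains(searchTerm)
            select line;

        // The query is deferred; ToList is what actually runs it.
        return query.ToList();
    }

    static void Main()
    {
        // Hypothetical folder and search word; adjust to your own test data.
        Stopwatch timer = Stopwatch.StartNew();
        List<string> matches = FindMatches(@"C:\CLOBS", "*.txt", "the");
        Console.WriteLine("{0} matching lines in {1:F2} seconds",
            matches.Count, timer.Elapsed.TotalSeconds);
    }
}
```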


When the project runs, CPU usage rises, as shown in the following figure:

Finding all matches in 8 large text files (1 GB each) takes 144.06 seconds, as shown in the output window:

LINQ to Craigslist in C#

Searching Craigslist using LINQ

The RSS feed from the results page of Craigslist is an XML file. LINQ to XML is an up-to-date, redesigned approach to programming with XML. It provides the in-memory document modification capabilities of the Document Object Model (DOM), and supports LINQ query expressions. Although these query expressions are syntactically different from XPath, they provide similar functionality.
The following example shows how to search Craigslist categories by providing the site name and the category.
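
A minimal sketch of the LINQ to XML part only, not the original listing. It assumes a feed in which each result is an item element with a title child; the sample XML, the class name RssSearch, and the method FindTitles are made up for illustration. The Craigslist feed URL format is not given here, so the sketch parses an in-memory string instead of downloading a feed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class RssSearch
{
    // Returns the titles of all <item> elements whose title contains keyword.
    // Elements are matched by local name so that feeds declaring an XML
    // namespace on their items are handled as well.
    public static List<string> FindTitles(string rssXml, string keyword)
    {
        XDocument feed = XDocument.Parse(rssXml);

        IEnumerable<string> query =
            from item in feed.Descendants()
            where item.Name.LocalName == "item"
            let title = item.Elements().FirstOrDefault(e => e.Name.LocalName == "title")
            where title != null
               && title.Value.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0
            select title.Value;

        return query.ToList();
    }

    static void Main()
    {
        // Hypothetical, heavily simplified feed content.
        string sample =
            "<rss><channel>" +
            "<item><title>Red bicycle</title></item>" +
            "<item><title>Oak desk</title></item>" +
            "</channel></rss>";

        foreach (string title in FindTitles(sample, "bicycle"))
            Console.WriteLine(title);
    }
}
```

For a live feed, XDocument.Load accepts a URI string in place of XDocument.Parse; the query itself would stay the same.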

LINQ Useful Links

The Language-Integrated Query (LINQ) covers a set of features that lets you retrieve information from a data source. In many cases, data is stored in a database that is separate from the application. Traditionally, interacting with a relational database would involve generating queries using SQL. Other sources of data, such as XML, would require their own approaches that were completely different. However, LINQ gives C# the ability to generate queries for any LINQ-compatible data source. Furthermore, the syntax used for the query is the same, no matter what data source is used.

The following links are useful for learning the LINQ programming model:

1- LINQ Official Website

http://msdn.microsoft.com/en-us/netframework/aa904594

2- LINQ to Everything by Charlie Calvert

http://blogs.msdn.com/b/charlie/archive/2008/02/28/link-to-everything-a-list-of-linq-providers.aspx

3- Joining LINQ to SQL and LINQ to Excel by Eric White

http://blogs.msdn.com/b/ericwhite/archive/2008/12/04/joining-linq-to-sql-and-linq-to-excel.aspx

4- Building a LINQ Provider by Pedram Rezaei

http://msdn.microsoft.com/en-us/vcsharp/ee672195.aspx

5- Dynamic LINQ by Scott Guthrie

http://weblogs.asp.net/scottgu/archive/2008/01/07/dynamic-linq-part-1-using-the-linq-dynamic-query-library.aspx

6- 101 LINQ Samples

http://msdn.microsoft.com/en-us/vcsharp/aa336746.aspx

7- LINQ to Facebook

http://www.codeproject.com/KB/aspnet/LinqToFqlAddon.aspx

From C# Learners

8- LINQ

http://www.csharplearners.com/category/linq/

LINQ and C#

Challenge 
Almost every application uses data in some form, whether it comes from memory, databases, XML files, or text files. Many developers find it difficult to switch from strongly typed object-oriented programming to the data access tier of an application. In C#, developers can navigate easily through namespaces, work with a debugger in the Visual Studio IDE, and more. When accessing data, however, things are quite different and more tedious. Developers end up in a world that is not strongly typed, where debugging is painful or even nonexistent, and a lot of time is spent sending strings to the database as commands. The goal of LINQ is to provide a methodology that simplifies and unifies access to any kind of data. It makes it easier to work with SQL, relational databases, and XML from the programming languages that communicate with them. LINQ does not force you to use a specific architecture, but it facilitates the implementation of several existing data access architectures. Examples include RAD/prototype, client/server, N-tier, and smart client.

Solution 
LINQ stands for Language-Integrated Query and covers a set of features that lets you retrieve information from a data source. In many cases, data is stored in a database that is separate from the application. Traditionally, interacting with a relational database involved generating queries in SQL (Structured Query Language). Other sources of data, such as XML, required their own, completely different approaches. LINQ, however, gives C# the ability to generate queries for any LINQ-compatible data source, and the query syntax is the same no matter which data source is used. Querying a relational database looks the same as querying data stored in an array, and the query capability is fully integrated into the C# language. LINQ in C# is essentially a language within a language, so the subject is quite large, involving many features, options, and alternatives. It contains a set of standard query operators that provide the underlying query architecture for navigating, filtering, and executing operations over nearly every kind of data source. LINQ also lets developers stay within the coding environment that they are comfortable with and access the underlying data as objects that work with the IDE, IntelliSense, and even debugging. These aspects, combined with a shorter, more meaningful, and more expressive syntax, boost developer productivity.

Benefit
LINQ is a lightweight layer over programmatic data integration. It hardly matters what you are querying, because queries look much the same across data sources and use one set of operators and keywords. It gives developers a simplified way to write queries through a unified query syntax, regardless of the source of the data. It shortens development time by catching many errors at compile time instead of at run time. LINQ is fully integrated with IntelliSense and supports debugging directly in the development language. Using LINQ, you can query directly against your database and even against the stored procedures that your database exposes. Making set operations, transforms, and constructs first-class operations results in a set of methods called the standard query operators. These operators provide query capabilities that include sorting, filtering, aggregation, and projection over a large number of different data sources. LINQ providers include LINQ to Objects, LINQ to ADO.NET, LINQ to SQL, LINQ to XML, LINQ to DataSet, and LINQ to Entities. With LINQ to Objects, C# arrays and collections can be queried much like databases, which gives developers extensive flexibility and an easy way to query data held in collections. LINQ can be extended to support other data sources through providers such as LINQ to SharePoint, LINQ to Exchange, and LINQ to LDAP. By using LINQ, developers get a single declarative pattern that can be expressed in any .NET-based programming language and that closes the gap between relational data and object-oriented development.
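
As a small illustration of the standard query operators over an in-memory collection (LINQ to Objects), here is a minimal sketch. The class name ScoreQueries, the method HighScores, and the sample numbers are invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ScoreQueries
{
    // Filtering and sorting: scores above threshold, in ascending order.
    public static List<int> HighScores(int[] scores, int threshold)
    {
        IEnumerable<int> query =
            from score in scores
            where score > threshold     // filtering
            orderby score               // sorting
            select score;               // projection (identity here)

        return query.ToList();
    }

    static void Main()
    {
        int[] scores = { 97, 92, 81, 60 };   // made-up sample data

        List<int> high = HighScores(scores, 80);
        Console.WriteLine(string.Join(", ", high));   // 81, 92, 97

        // Aggregation uses the same operator family, in method syntax.
        Console.WriteLine(high.Average());            // 90
    }
}
```

The same where, orderby, and select keywords would apply unchanged if scores were replaced by a LINQ to XML or LINQ to Entities source.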

C# and Auto-Compiled LINQ Queries

The recently released Microsoft Entity Framework (EF) June 2011 CTP includes support for Auto-Compiled LINQ Queries. With this feature, every LINQ to Entities query is automatically compiled when it is first executed and placed in the EF query cache. Each time you run the query after that, EF finds it in the query cache and does not have to go through the whole compilation process again. This feature also speeds up queries issued through WCF Data Services, which uses LINQ in the background.

How does it Work?

EF walks the nodes of the expression tree and creates a hash, which becomes the key used in the query cache. If it does not find the query in the cache, it compiles the query and stores the compiled version in the cache for subsequent use. On each later execution, the hash is calculated again and the compiled query is found in the cache, which saves the compilation overhead.
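
The caching idea can be sketched in a few lines. This is a toy illustration, not EF's implementation: it keys the cache on the expression's text plus its types instead of a hash computed over the tree nodes, and it is only safe for expressions that capture no outside variables. The class CompiledQueryCache and its members are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

sealed class CompiledQueryCache
{
    // The dictionary hashes the string key; this stands in for the hash
    // that EF is described as computing from the expression tree.
    private readonly Dictionary<string, Delegate> cache =
        new Dictionary<string, Delegate>();

    // Number of times an expression actually had to be compiled.
    public int Compilations { get; private set; }

    public Func<T, TResult> GetOrCompile<T, TResult>(Expression<Func<T, TResult>> query)
    {
        // Include the types so identical text over different types does not collide.
        string key = typeof(T).FullName + "->" + typeof(TResult).FullName + ":" + query;

        Delegate compiled;
        if (!cache.TryGetValue(key, out compiled))
        {
            compiled = query.Compile();   // the expensive step, done once per key
            cache[key] = compiled;
            Compilations++;
        }
        return (Func<T, TResult>)compiled;
    }
}

static class CacheDemo
{
    static void Main()
    {
        var cache = new CompiledQueryCache();

        // Two separately written but textually identical expressions.
        Func<int, int> first = cache.GetOrCompile<int, int>(x => x * 2);
        Func<int, int> second = cache.GetOrCompile<int, int>(x => x * 2);

        Console.WriteLine(first(21));                            // 42
        Console.WriteLine(cache.Compilations);                   // 1
        Console.WriteLine(ReferenceEquals(first, second));       // True
    }
}
```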

Which is faster: Auto-Compiled LINQ Queries or an explicitly invoked CompiledQuery?

The size and complexity of the application and its queries greatly influence the performance gain. When a regular query was run 10 times in auto-compiled mode and 10 times with auto-compilation turned off, the total time measured by Visual Studio’s profiling tools was about 3 times shorter for the auto-compiled runs.

In general, auto-compiled queries are not as fast as invoking a CompiledQuery. The point here is that the newly released CTP provides its performance savings for free.

Copyright © All Rights Reserved - C# Learners