LINQ to XML

Introduction

Working with XML using Microsoft's .NET Framework version 2.0 and below is a cumbersome task. The API available follows the W3C DOM model, and is document-centric. Everything begins with the document; you can't create elements without having a document; even a fragment of XML is a document.

In the latest release of the .NET Framework, however, this has changed. XML is now element-centric. With features like object initialization and anonymous types, it's very easy to create XML. Add to this the features of LINQ, and we now have a very easy to use and powerful tool for XML.

In this article, I will explore some of the features available in .NET Framework release 3.5 related to XML and LINQ. This is, of course, not an extensive discussion of either subject, merely a familiarization and stepping stone for more learning and exploration.

The LINQ part

When discussing LINQ to XML, or LINQ to whatever, the first thing that needs to be discussed is, of course, LINQ.

LINQ

Language-Integrated Query, or LINQ, is an extension to the .NET Framework in version 3.5 that makes queries and set operations first class citizens of .NET languages such as C#. It has been further defined as, "A set of general purpose standard query operators that allow traversal, filter, and projection operations to be expressed in a direct yet declarative way in any .NET language."

Getting started

This is an example of a very basic LINQ query:

    string[] names = new string[] { "John", "Paul", "George", "Ringo" };
    var name = names.Select(s => s);

Two things noticeable here are the var keyword and the strange-looking operator =>.

The var keyword is new in C# 3.0, the version of the language that ships with .NET 3.5. It is not a data type in itself. Although it looks similar to var in JavaScript or the Variant type in classic VB, it isn't quite the same. In those languages, the variable is loosely typed and can be used to represent just about anything.
In C#, however, var is more of a placeholder; the actual data type is set at compile time, and is inferred from the context it is used in. In the above example, name is resolved to IEnumerable<string> (at run time, the object behind it is an internal iterator class in System.Linq.Enumerable).

    var name = "Hello, World";

In this example though, name is resolved to a string. This inference is useful when the exact type returned from a query is awkward to spell out, and because the variable is still strongly typed, it is not necessary to cast it to another type before using it.

Lambda expressions

Lambda expressions trace back to the lambda calculus, introduced in 1936 by mathematician Alonzo Church as a shorthand for expressing algorithms. In C# 3.0 (.NET 3.5), they are a convenient way for developers to define functions that can be passed as arguments, and are an evolution of the anonymous methods introduced in .NET 2.0. The => operator is used to separate input variables on the left and the body of the expression on the right.

    string[] names = new string[] { "John", "Paul", "George", "Ringo" };
    var name = names.Select(s => s.StartsWith("P"));

In this example, each string in the names array is represented by the variable s. It's not necessary to declare a data type because it is inferred from the type of the collection, names in this case. These two statements would be somewhat analogous:

    var name = names.Select(s => s);

    foreach (string s in names) { }

The body of the expression, s.StartsWith("P"), just uses the string method to return a boolean value. Select is an extension method (more on that shortly) for IEnumerable<T> that takes as its parameter a Func delegate.

Func and Action

Func<T, TResult> and Action<T> are generic delegate types, not methods. The Func family is new in .NET 3.5; Action<T> has existed since .NET 2.0, and 3.5 adds further overloads of it.

    Func<T, TResult>

This is used to represent a delegate that returns a value, TResult.

    Action<T>

On the other hand, this is used to represent a delegate that does not return a value. The example we have been using can be rewritten as below:

    Func<string, bool> pOnly = delegate(string s) { return s.StartsWith("P"); };
    string[] names = new string[] { "John", "Paul", "George", "Ringo" };
    var name = names.Select(pOnly);

Sequences

Running the demo code from this article, you will notice that none of the examples above returns a single value. The StartsWith examples, for instance, return a collection of boolean values indicating whether each element in the input collection matched the specified expression. This collection is referred to as a sequence in LINQ.
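To make the difference between projecting and filtering concrete, here is a minimal sketch. The class and method names (DelegateSketch, StartsWithP, OnlyP) are hypothetical and not part of the article's demo, and the Where operator is brought in here only for contrast with Select.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DelegateSketch
{
    // Select projects every element, so the result has one bool per name.
    public static List<bool> StartsWithP(string[] names)
    {
        Func<string, bool> pOnly = s => s.StartsWith("P");
        return names.Select(pOnly).ToList();
    }

    // Where filters instead: only the names matching the predicate survive.
    public static List<string> OnlyP(string[] names)
    {
        return names.Where(s => s.StartsWith("P")).ToList();
    }

    public static void Main()
    {
        string[] names = { "John", "Paul", "George", "Ringo" };

        // Action<T> wraps a method that returns nothing.
        Action<string> print = s => Console.WriteLine(s);

        foreach (bool matched in StartsWithP(names))
            print(matched.ToString());   // False, True, False, False (one per line)

        foreach (string n in OnlyP(names))
            print(n);                    // Paul
    }
}
```

Select always yields as many items as the input sequence, which is why the article's examples produce a sequence of booleans and not the matching name.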
If we wanted the single value that matched the expression, we would use the Single extension method.

    string name = names.Single(pOnly);

Notice here that the name variable is typed as a string. Although we could still use var, we know that the return value is, or should be, a string. Single throws an InvalidOperationException if no element, or more than one element, matches.

Extension Methods

Extension Methods are a feature of C# 3.0 (.NET 3.5) that allows developers to add functionality to existing classes without modifying the code for the original class. This is useful when you want to provide additional functionality and don't have access to the code base, such as when using third-party libraries.

Extension Methods are static methods on static classes. The first parameter of each method is typed as the data type the method extends, and uses the this modifier. Notice that this is being used as a modifier, not as a reference to the current object.

    public static class StringExtensions
    {
        public static int ToInt(this string number)
        {
            return Int32.Parse(number);
        }

        public static string DoubleToDollars(this double number)
        {
            return string.Format("{0:c}", number);
        }

        public static string IntToDollars(this int number)
        {
            return string.Format("{0:c}", number);
        }
    }

When this class is compiled, the compiler applies System.Runtime.CompilerServices.ExtensionAttribute to it, and when it is in scope, Intellisense can read this information and determine which methods apply based on the data type.
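As a small illustration of how such a method is called, the sketch below repeats only the ToInt method from the class above and invokes it both ways. ExtensionDemo is a hypothetical name used for this example.

```csharp
using System;

public static class StringExtensions
{
    // 'this string' lets ToInt be called as if it were an instance method of string.
    public static int ToInt(this string number)
    {
        return Int32.Parse(number);
    }
}

public static class ExtensionDemo
{
    public static void Main()
    {
        int viaExtension = "42".ToInt();                // extension-method syntax
        int viaStatic = StringExtensions.ToInt("42");   // the equivalent plain static call

        Console.WriteLine(viaExtension == viaStatic);   // True
    }
}
```

Both calls compile to the same static method invocation; the extension syntax is only a convenience offered by the compiler.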
Intellisense knows that the ToInt method applies only to strings, and that DoubleToDollars applies only to doubles.

Query expression and methods

There are two ways to write LINQ queries: query expression and dot-notation. The former resembles a SQL query, except that the select clause is last.

    string[] camps = new string[] { "CodeCamp2007", "CodeCamp2008", "CodeCamp2009" };

    // Query expression: yields a sequence of the matching elements.
    var currentCamps = from camp in camps
                       where camp.EndsWith(DateTime.Now.Year.ToString())
                       select camp;

    // Dot-notation: Single returns the one matching element itself.
    string currentCamp = camps.Single(c => c.EndsWith(DateTime.Now.Year.ToString()));

These two statements select the same element because the query expression format is converted to method calls at compile time; the only difference is that the query yields a sequence, while Single unwraps it. There are several ways to produce results with methods. Each of the below will produce the same result.

    string currentCamp2 = camps.Where(c => c.EndsWith(DateTime.Now.Year.ToString())).Single();
    string currentCamp3 = camps.Single(c => c.EndsWith(DateTime.Now.Year.ToString()));
    string currentCamp4 = camps.Select(c => c).Where(
        c => c.EndsWith(DateTime.Now.Year.ToString())).Single();

The XML part

Now that we have an understanding of LINQ, it's time to move on to the XML part. For this article, we will be using this XML file:

    <employees>
      <employee id="1" salaried="false">
        <name>Gustavo Achong</name>
        <hire_date>7/31/1996</hire_date>
      </employee>
      <employee id="3" salaried="true">
        <name>Kim Abercrombie</name>
        <hire_date>12/12/1997</hire_date>
      </employee>
      <employee id="8" salaried="false">
        <name>Carla Adams</name>
        <hire_date>2/6/1998</hire_date>
      </employee>
      <employee id="9" salaried="false">
        <name>Jay Adams</name>
        <hire_date>2/6/1998</hire_date>
      </employee>
    </employees>

The old way

In the previous versions of the .NET Framework, XML was document-centric; in other words, to create any structure, you first had to start with an XmlDocument.

    using System;
    using System.Xml;

    public class OldWay
    {
        private static XmlDocument m_doc = new XmlDocument();

        public static void CreateEmployees()
        {
            XmlElement root = m_doc.CreateElement("employees");
            root.AppendChild(AddEmployee(1, "Gustavo Achong", DateTime.Parse("7/31/1996"), false));
            root.AppendChild(AddEmployee(3, "Kim Abercrombie", DateTime.Parse("12/12/1997"), true));
            root.AppendChild(AddEmployee(8, "Carla Adams", DateTime.Parse("2/6/1998"), false));
            root.AppendChild(AddEmployee(9, "Jay Adams", DateTime.Parse("2/6/1998"), false));
            m_doc.AppendChild(root);
            Console.WriteLine(m_doc.OuterXml);
        }

        // Every element has to be created through the owning document.
        private static XmlElement AddEmployee(int ID, string name, DateTime hireDate, bool isSalaried)
        {
            XmlElement employee = m_doc.CreateElement("employee");
            XmlElement nameElement = m_doc.CreateElement("name");
            nameElement.InnerText = name;
            XmlElement hireDateElement = m_doc.CreateElement("hire_date");
            hireDateElement.InnerText = hireDate.ToShortDateString();
            employee.SetAttribute("id", ID.ToString());
            employee.SetAttribute("salaried", isSalaried.ToString());
            employee.AppendChild(nameElement);
            employee.AppendChild(hireDateElement);
            return employee;
        }
    }

Smart developers would create helper methods to ease the pain, but it was still a verbose, cumbersome process. An XmlElement can't be created on its own; it must be created from an XmlDocument.

    XmlElement employee = m_doc.CreateElement("employee");

Trying to do this generates a compiler error, because the XmlElement constructor is not publicly accessible:

    XmlElement employee = new XmlElement();
Looking at the above example, it is also difficult to get an idea about the schema for this document.

The new way

Using the classes from the System.Xml.Linq namespace and the features available in .NET 3.5, constructing an XML document is very easy and very readable.

    public static void CreateEmployees()
    {
        XDocument doc = new XDocument(
            new XDeclaration("1.0", "utf-8", "yes"),
            new XComment("A sample xml file"),
            new XElement("employees",
                new XElement("employee",
                    new XAttribute("id", 1),
                    new XAttribute("salaried", "false"),
                    new XElement("name", "Gustavo Achong"),
                    new XElement("hire_date", "7/31/1996")),
                new XElement("employee",
                    new XAttribute("id", 3),
                    new XAttribute("salaried", "true"),
                    new XElement("name", "Kim Abercrombie"),
                    new XElement("hire_date", "12/12/1997")),
                new XElement("employee",
                    new XAttribute("id", 8),
                    new XAttribute("salaried", "false"),
                    new XElement("name", "Carla Adams"),
                    new XElement("hire_date", "2/6/1998")),
                new XElement("employee",
                    new XAttribute("id", 9),
                    new XAttribute("salaried", "false"),
                    new XElement("name", "Jay Adams"),
                    new XElement("hire_date", "2/6/1998"))
            )
        );
    }

Constructing a document in this way is possible because of the functional construction feature in LINQ to XML. Functional construction is simply a means of creating an entire document tree in a single statement.

    public XElement(XName name, params object[] content)

As we can see from one of the constructors for XElement, it takes an array of objects. In the example above, the employees element is constructed from four XElements, one for each employee, each of which in turn is constructed from XAttributes and other XElements.

In the above example, we could have replaced XDocument with XElement if we removed the XDeclaration and XComment objects. This is because the XDocument constructor used takes an XDeclaration instance, rather than the XName that the XElement constructor takes.
    public XDocument(XDeclaration declaration, params object[] content)

Another thing to note when running the demo is how both documents are printed to the console window.
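The difference in console output can be reproduced with a cut-down sketch like the one below. It builds a single employee element with each API; OutputSketch, OldWay, and NewWay are hypothetical names chosen for this illustration and are not the article's demo code.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class OutputSketch
{
    // DOM API: OuterXml returns the markup on a single line, without indentation.
    public static string OldWay()
    {
        XmlDocument doc = new XmlDocument();
        XmlElement root = doc.CreateElement("employees");
        XmlElement employee = doc.CreateElement("employee");
        employee.SetAttribute("id", "1");
        root.AppendChild(employee);
        doc.AppendChild(root);
        return doc.OuterXml;
    }

    // LINQ to XML: ToString() indents nested elements by default.
    public static string NewWay()
    {
        XElement root = new XElement("employees",
            new XElement("employee", new XAttribute("id", 1)));
        return root.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(OldWay());
        Console.WriteLine(NewWay());
    }
}
```

If the unformatted form is wanted from the new API as well, ToString accepts SaveOptions.DisableFormatting.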
The old method just streams the contents of the document to the console. The new method does that also; however, its output is nicely formatted with no extra effort.

Namespace support

Namespaces are, of course, supported through the XNamespace class.

    XNamespace ns = ...;
    XElement doc = new XElement(
        new XElement(ns + "employees",
            new XElement("employee", ...

One thing to note is that if one element uses a namespace, they all must use one: child elements do not inherit the parent's namespace, so each name has to be qualified with ns. In the case above, the employee element is created without it, and we can see that an empty xmlns attribute will be added to the employee element:

    <employee xmlns="" ...>

Explicit conversion

One of the many nice things with the new XML support is support for explicit conversion of values. Previously, all XML values were treated as strings and had to be converted as necessary.

    // Must be string, or converted to string
    //idElement.InnerText = 42;
    idElement.InnerText = "42";

    // Read the text back through InnerText; XmlElement.Value is always null.
    int id = Convert.ToInt32(idElement.InnerText);

With the new API, this is much more intuitive:

    XElement element1 = new XElement("number", 42);

    // It doesn't matter if the value is a string or int
    XElement element2 = new XElement("number", "42");

    int num1 = (int)element1;
    int num2 = (int)element2;
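The same explicit conversions are defined for attributes and for nullable types, which is handy when a node may be absent. The sketch below assumes an employee element shaped like the ones used in this article; ConversionSketch and its methods are hypothetical names, and the "age" element is an invented example of a missing child.

```csharp
using System;
using System.Xml.Linq;

public static class ConversionSketch
{
    // Explicit conversion from an XAttribute to int.
    public static int ReadId(XElement employee)
    {
        return (int)employee.Attribute("id");
    }

    // Explicit conversion to bool parses the XML literals "true" and "false".
    public static bool ReadSalaried(XElement employee)
    {
        return (bool)employee.Attribute("salaried");
    }

    // Casting to a nullable type returns null when the element does not exist,
    // instead of throwing.
    public static int? ReadOptional(XElement employee, string childName)
    {
        return (int?)employee.Element(childName);
    }

    public static void Main()
    {
        XElement employee = new XElement("employee",
            new XAttribute("id", 3),
            new XAttribute("salaried", "true"),
            new XElement("name", "Kim Abercrombie"));

        Console.WriteLine(ReadId(employee));                         // 3
        Console.WriteLine(ReadSalaried(employee));                   // True
        Console.WriteLine(ReadOptional(employee, "age").HasValue);   // False
    }
}
```

Casting a missing node to a non-nullable type such as int throws, so the nullable form is the safer choice for optional content.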
Traversing an XML tree

Traversing an XML tree is still very easy.

    foreach (var node in doc.Nodes())

We can use the nodes in the collections of the document, or root element. Note here, however, that Nodes() returns only the direct child nodes; to traverse the entire tree, including all descendants, use DescendantNodes().

    foreach (var node in doc.Nodes().OfType<XComment>())

This method can be used to traverse specific node types, comments in this case. Or we can get to specific child nodes this way.

    foreach (var node in doc.Elements("employees").Elements("employee").Elements("name"))

This is an improvement over nested iterations or obtaining an XmlNodeList with an XPath query.

XPath

XPath-style axes have been built into the API as methods, with Extension Method counterparts that operate on sequences, such as:

• Descendants
• Ancestors
• DescendantsAndSelf
• AncestorsAndSelf
• ElementsBeforeSelf
• ElementsAfterSelf

This is not an extensive list, so check the documentation for all the others available. XPath expressions themselves can be evaluated through the Extension Methods in the System.Xml.XPath namespace, such as XPathSelectElements.

Transforming XML

Transforming an XML document or element is still possible using the methods I'm sure we are all familiar with.

    // Load the stylesheet.
    // (XslTransform is obsolete since .NET 2.0; XslCompiledTransform is its replacement.)
    XslTransform xslt = new XslTransform();
    xslt.Load(stylesheet);

    // Load the file to transform.
    XPathDocument doc = new XPathDocument(filename);

    // Create an XmlTextWriter which outputs to the console.
    XmlTextWriter writer = new XmlTextWriter(Console.Out);

    // Transform the file and send the output to the console.
    xslt.Transform(doc, null, writer, null);
    writer.Close();

However, with the new API, we can make use of functional construction and LINQ queries to transform a document:

    XElement element = new XElement("salaried_employees",
        from e in doc.Descendants("employee")
        where e.Attribute("salaried").Value == "true"
        select new XElement("employee",
            new XElement(e.Element("name"))
        )
    );

Conclusion

XML is a fantastic construct that has been deeply ingrained into just about everything. Having the ability to easily construct, query, transform, and manipulate XML documents is an invaluable service that will improve the speed at which applications can be built and the quality of those applications.

This article is not an exhaustive investigation of LINQ to XML; there have been many other articles, snippets, and blogs written on the subject. It is mainly just a taste of, and familiarization with, what is possible using .NET 3.5.



