Posts Tagged ‘XML’

Generate XML sitemap for Google

There are many sites that explain how to return a Google sitemap from .NET pages, but most of them require you to adjust the IIS settings (yes, this is about Windows hosting).

Others show how to create a sitemap on the fly from the web.sitemap file in your project. Here, instead, I've included code that returns an XML sitemap conforming to the Sitemap protocol, which you can submit to Google without modifying IIS. That should interest those of you who are on shared hosting.

The ZIP download is available at the bottom of this article.

Basically, create a blank ASPX page and clear out all of its HTML elements; you will be left with just the <%@ Page %> directive. Below is an example of the only line that needs to be in the front file (.aspx).

For your purposes, just add the ContentType="text/xml" attribute. It may not be necessary once you've read through the page-behind code, but I've left it in as it doesn't hurt.

Example:

<%@ Page Language="C#" AutoEventWireup="true" CodeFile="XMLSiteMap.aspx.cs" Inherits="XMLSiteMap" ContentType="text/xml" %>

Next, put the GSiteMap.cs file in your App_Code folder.

In the page-behind code, you then simply create an instance of the class and all the work is done for you. The code uses the file system (whether it is running locally or on a remote server) to generate the sitemap. It also returns the correct protocol (http or https) and includes the port number if it is not port 80.
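A minimal sketch of how a physical file path might be turned into an absolute sitemap URL that keeps the request's scheme and port. The names SitemapUrls and ToAbsoluteUrl are hypothetical, not taken from GSitemap. It relies on Uri.GetLeftPart(UriPartial.Authority), which returns the scheme and host and includes the port only when it isn't the scheme's default (80 for http, 443 for https).

```csharp
using System;
using System.IO;
using System.Linq;

public static class SitemapUrls
{
    // Hypothetical helper: maps a physical file under siteRoot to an absolute URL
    // on the same scheme, host and port as the current request.
    public static string ToAbsoluteUrl(Uri requestUrl, string siteRoot, string physicalPath)
    {
        // e.g. "https://example.com" or "http://localhost:8080"; default ports are omitted.
        string authority = requestUrl.GetLeftPart(UriPartial.Authority);

        string root = Path.GetFullPath(siteRoot).TrimEnd(Path.DirectorySeparatorChar);
        string file = Path.GetFullPath(physicalPath);
        if (!file.StartsWith(root + Path.DirectorySeparatorChar, StringComparison.OrdinalIgnoreCase))
            throw new ArgumentException("The file is not under the site root.", nameof(physicalPath));

        // Turn "docs\my page.aspx" into "docs/my%20page.aspx", escaping each path segment.
        string[] segments = file.Substring(root.Length + 1).Split(Path.DirectorySeparatorChar);
        return authority + "/" + string.Join("/", segments.Select(seg => Uri.EscapeDataString(seg)));
    }
}
```

In a page this might be called as SitemapUrls.ToAbsoluteUrl(Request.Url, Server.MapPath("~/"), filePath); the trailing separator that MapPath returns is trimmed before the comparison.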

I have used this method before to generate an XML file in the file system, but since my hosting provider doesn't allow ASP.NET to write to the root directory of the site, returning the sitemap on the fly is the only truly automated method for this.

In the page-behind's Page_Load event handler:

protected void Page_Load(object sender, EventArgs e)
{
    GSitemap _siteMap = new GSitemap();
    _siteMap.ProcessRequestFS(Context);
}

This simply passes the current HttpContext to the sitemap class, allowing it to replace the Response with your pure XML sitemap.

I won't go into the full code here, because you can read through it yourself in the download. However, it's worth pointing out the following:
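GSitemap's internals are only in the download, so this is an independent sketch of the final step, not the author's code: writing the collected entries as a urlset document in the Sitemap protocol namespace. SitemapEntry and SitemapWriter are hypothetical names. Writing to a TextWriter lets the same code target Response.Output in a page or a StringWriter in a test.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Xml;

// Hypothetical entry type: one <url> element in the sitemap.
public sealed class SitemapEntry
{
    public string Loc { get; set; }
    public DateTime LastModified { get; set; }
    public double Priority { get; set; }
}

public static class SitemapWriter
{
    // Namespace defined by the Sitemap protocol.
    const string Ns = "http://www.sitemaps.org/schemas/sitemap/0.9";

    public static void Write(TextWriter output, IEnumerable<SitemapEntry> entries)
    {
        var settings = new XmlWriterSettings { Indent = true };
        // CloseOutput defaults to false, so disposing the XmlWriter leaves 'output' open.
        using (XmlWriter xml = XmlWriter.Create(output, settings))
        {
            xml.WriteStartDocument();
            xml.WriteStartElement("urlset", Ns);
            foreach (SitemapEntry e in entries)
            {
                xml.WriteStartElement("url", Ns);
                // XmlWriter escapes characters such as & in the URL, as the protocol requires.
                xml.WriteElementString("loc", Ns, e.Loc);
                xml.WriteElementString("lastmod", Ns, e.LastModified.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
                xml.WriteElementString("priority", Ns, e.Priority.ToString("0.0", CultureInfo.InvariantCulture));
                xml.WriteEndElement();
            }
            xml.WriteEndElement();
            xml.WriteEndDocument();
        }
    }
}
```

Inside Page_Load this might be used as Response.Clear(); Response.ContentType = "text/xml"; SitemapWriter.Write(Response.Output, entries);.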

private string[] _Allowed_Extensions = { ".aspx", ".php", ".asp", ".htm", ".html", ".txt", ".doc", ".pdf", ".jpg", ".gif", ".xml" };
private string[] _Restricted_Directories = { "App_Data", "App_Code", "admin" };

1. Put any extensions you want to be indexed in the “Allowed Extensions” array.

2. Put any directories you don’t want indexed in the “Restricted Directories” array.

Where the code pulls a list of files from each directory, I initially used a file pattern, i.e.:

"*." + Extention

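but found that some files were being indexed twice. This is down to a documented quirk of the framework's GetFiles search patterns: when the extension in the pattern is exactly three characters long, it also matches files whose extension merely begins with it, so "*.asp" returns .aspx files as well (and "*.htm" returns .html files). For this reason I reworked the code. It's less efficient this way, but it's reliable.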
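A sketch of an exact-match alternative, assuming the files have already been listed without a pattern (for example with Directory.GetFiles(dir)). Path.GetExtension returns the extension including the dot, which matches the format of the _Allowed_Extensions array above, and the comparison is exact and case-insensitive, so .asp no longer picks up .aspx. ExtensionFilter and FilterByExtension are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ExtensionFilter
{
    // Keeps only files whose extension exactly matches one in allowedExtensions
    // (entries include the leading dot, e.g. ".aspx"), ignoring case.
    public static List<string> FilterByExtension(IEnumerable<string> files, string[] allowedExtensions)
    {
        var allowed = new HashSet<string>(allowedExtensions, StringComparer.OrdinalIgnoreCase);
        var result = new List<string>();
        foreach (string file in files)
        {
            // Each file is checked once, so it can never be added twice.
            if (allowed.Contains(Path.GetExtension(file)))
                result.Add(file);
        }
        return result;
    }
}
```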

The call to "ProcessRequestFS" walks through each directory in turn, adding files to the sitemap. If a directory is blocked by the "Restricted Directories" array, then all of its subdirectories are also blocked.
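One way to get that blocking behaviour is a walk that never enters a restricted directory, so its subdirectories are simply never visited. This sketch is not the downloadable code: it uses an explicit stack rather than recursion, and matching on the directory's own name (case-insensitively, at any depth) is an assumption about how the array is interpreted. SiteWalker and CollectFiles are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SiteWalker
{
    // Returns every file under root, skipping restricted directories and everything below them.
    public static List<string> CollectFiles(string root, string[] restrictedDirectories)
    {
        var restricted = new HashSet<string>(restrictedDirectories, StringComparer.OrdinalIgnoreCase);
        var files = new List<string>();
        var pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            string dir = pending.Pop();
            files.AddRange(Directory.GetFiles(dir));
            foreach (string sub in Directory.GetDirectories(dir))
            {
                // A restricted directory is never pushed, so none of its subdirectories are visited.
                if (!restricted.Contains(Path.GetFileName(sub)))
                    pending.Push(sub);
            }
        }
        return files;
    }
}
```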

An example of the output of this code is not currently available online.

On my site you may notice that I have temporarily removed the optional tags from the sitemap. They are, however, created in the version available for download.

In particular, the priority tag is automatically downgraded for each level deeper into the directory tree that the script has to look.
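The exact formula is in the download; as an illustration only, this sketch assumes the site root gets 1.0 and each level deeper loses 0.1, never dropping below 0.1 (the Sitemap protocol allows values from 0.0 to 1.0). Working in whole tenths and dividing once avoids accumulating floating-point error. SitemapPriority and ForDepth are hypothetical names.

```csharp
using System;

public static class SitemapPriority
{
    // Assumed scheme: depth 0 (site root) => 1.0, each level deeper => 0.1 less, floor of 0.1.
    public static double ForDepth(int depth)
    {
        if (depth < 0) throw new ArgumentOutOfRangeException(nameof(depth));
        int tenths = Math.Max(1, 10 - depth);
        return tenths / 10.0;
    }
}
```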

There is no real error handling in this version, but you can add it as necessary.

I checked with Google and Yahoo!, and as far as I can see, neither has a problem with you adding a sitemap with the .ASPX extension.

The full code can be downloaded here: http://www.aaronreynolds.co.uk/page/Code.aspx

The full code is unavailable at the moment and will be online again soon.

If you have any problems using the code, please let me know.

AR

There is an error in XML document | C#

Whilst using XML serialisation to keep a set of Tasks in my Customer/Project Management App, I came across a series of strange occurrences, which I tracked down to the error: "There is an error in XML document."

Basically, the XML parser has come across a character that isn't valid in the specified encoding. In my case, I was using UTF-8 encoding.

I was puzzled because, although I was using UTF-8, the GBP sign (£) was causing the error. It turns out that I was encoding with UTF-8 but deserialising with the "Default" encoding (on Windows, the system's ANSI code page). Oops.
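A small sketch of why £ in particular trips this up. UTF-8 stores £ as two bytes, while a single-byte code page stores it as one (0xA3). Windows-1252 isn't built into .NET Core by default, so the sketch uses ISO-8859-1, which also maps £ to 0xA3; EncodingDemo and its methods are hypothetical names. The last method shows the direction that produces this error: a document declaring UTF-8 whose £ was written as the single byte 0xA3.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

public static class EncodingDemo
{
    // UTF-8 stores £ (U+00A3) as two bytes: 0xC2 0xA3.
    public static byte[] PoundAsUtf8() => Encoding.UTF8.GetBytes("\u00A3");

    // Reading those UTF-8 bytes with a single-byte code page yields two characters: "Â£".
    public static string MisreadUtf8AsLatin1() =>
        Encoding.GetEncoding("iso-8859-1").GetString(PoundAsUtf8());

    // The document says UTF-8, but £ was written as the lone byte 0xA3, which is not
    // valid UTF-8, so XmlSerializer throws InvalidOperationException
    // ("There is an error in XML document ...").
    public static bool Utf8ParserRejectsLatin1Bytes()
    {
        string xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?><string>\u00A3</string>";
        byte[] bytes = Encoding.GetEncoding("iso-8859-1").GetBytes(xml);
        var ser = new XmlSerializer(typeof(string));
        try
        {
            using (var s = new MemoryStream(bytes))
                ser.Deserialize(s);
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```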

Anyway, once I'd discovered that and changed the code so that UTF-8 was the encoding for serialisation AND deserialisation, we were good to go.

So if you get this error, check your encoding.
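Since the XML ends up in the database as a string anyway, an alternative to the byte-based code below is to skip bytes altogether: serialise to a StringWriter and deserialise from a StringReader, so no encoding is chosen at either end. When the reader is given a TextReader, the encoding named in the XML declaration is ignored. WorkItem and XmlRoundTrip are hypothetical names (WorkItem stands in for TWork), and the database column still needs to hold Unicode text (for example nvarchar) for characters such as £ to survive.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for the post's TWork type.
public class WorkItem
{
    public string Description { get; set; }
    public decimal Cost { get; set; }
}

public static class XmlRoundTrip
{
    // Serialises straight to a .NET string; no byte encoding is involved.
    public static string ToXml<T>(T value)
    {
        var ser = new XmlSerializer(typeof(T));
        using (var writer = new StringWriter())
        {
            ser.Serialize(writer, value);
            return writer.ToString();
        }
    }

    // Reads the string back through a TextReader, so the declared encoding is not used.
    public static T FromXml<T>(string xml)
    {
        var ser = new XmlSerializer(typeof(T));
        using (var reader = new StringReader(xml))
            return (T)ser.Deserialize(reader);
    }
}
```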

My serialisation code is:

using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

public static bool SaveWorkToXML(TWork[] workArray, TTask thisTask)
{
    try
    {
        XmlSerializer ser = new XmlSerializer(typeof(TWork[]));
        using (MemoryStream s = new MemoryStream())
        using (XmlTextWriter xmlWriter = new XmlTextWriter(s, Encoding.UTF8))
        {
            // Serialise as UTF-8; LoadWork must decode with the same encoding.
            ser.Serialize(xmlWriter, workArray);
            xmlWriter.Flush();

            // Rewind and read the XML back as a string (StreamReader detects the UTF-8 BOM).
            s.Position = 0;
            string output = new StreamReader(s).ReadToEnd();

            // SaveWork supplies SqlParameters to a stored procedure and returns true/false if inserted/updated OK
            return SaveWork(thisTask, output);
        }
    }
    catch
    {
        // No error handling: any failure is reported as false.
    }
    return false;
}

and the deserialisation code:

public static TWork[] LoadWork(TTask thisTask)
{
    TWork[] workArray = null;
    try
    {
        // Task_GUID is concatenated into the SQL; a parameterised query would avoid injection.
        string sql = "SELECT * FROM tblWork WHERE [work_guid] = '" + thisTask.Task_GUID + "'";
        string xmlText = "";
        using (DataTable dt = GetDataTable(sql))
        {
            // Only the first matching row is used.
            if (dt != null && dt.Rows.Count > 0)
                xmlText = dt.Rows[0]["work_details"].ToString();
        }

        XmlSerializer ser = new XmlSerializer(typeof(TWork[]));
        // Decode with the same encoding (UTF-8) that SaveWorkToXML used.
        using (Stream s = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(xmlText)))
        {
            workArray = (TWork[])ser.Deserialize(s);
        }
    }
    catch
    {
        // No error handling: failures return null.
    }
    return workArray;
}
