Getting the value of JavaScript/HTML variables in C#

5

I am trying to extract data from a webpage. Looking at the HTML in the page source, I can find the data I am interested in inside script tags. It looks like the following:

<html>
<script type="text/javascript">

window.gon = {};
gon.default_profile_mode = false; 
gon.user = null;  
gon.product = "shoes";
gon.books_jsonarray = [
{
    "title": "Little Sun",
    "authors": [
        "John Smith"
    ],
    edition: 2,
    year: 2009
},
{
    "title": "Little Prairie",
    "authors": [
        "John Smith"
    ],
    edition: 3,
    year: 2009
},
{
    "title": "Little World",
    "authors": [
        "John Smith",
        "Mary Neil",
        "Carla Brummer"
    ],
    edition: 3,
    year: 2014
}
];

</script>
</html>

What I would like to do is load the webpage by its URL, retrieve the 'gon' variable from JavaScript, and store it in a C# variable. In other words, I would like a C# data structure (a dictionary, for instance) that holds the value of 'gon'.

I have researched how to get a variable defined in JavaScript via the C# WebBrowser control, and this is what I found:

using System;
using System.Collections.Generic;
using System.Windows.Forms;
using System.Net;
using System.Reflection; // needed for BindingFlags
using System.Runtime.InteropServices;
using System.Text.RegularExpressions;
using mshtml;

namespace Mynamespace
{

  public partial class Form1 : Form
  {
    public WebBrowser WebBrowser1 = new WebBrowser();

    private void Form1_Load(object sender, EventArgs e)
    {
        string myurl = "http://somewebsite.com"; //Using WebBrowser control to load web page   
        this.WebBrowser1.Navigate(myurl);
    }    


    private void btnGetValueFromJs_Click(object sender, EventArgs e)
    {
        var mydoc = this.WebBrowser1.Document;
        IHTMLDocument2 vDocument = mydoc.DomDocument as IHTMLDocument2;
        IHTMLWindow2 vWindow = (IHTMLWindow2)vDocument.parentWindow;
        Type vWindowType = vWindow.GetType();
        // Here, I am able to see the string "Hello Sir".
        object strfromJS = vWindowType.InvokeMember("mystr",
                            BindingFlags.GetProperty, null, vWindow, new object[] { });

        // Here, I am able to see the object gonfromJS as a '{System.__ComObject}'.
        object gonfromJS = vWindowType.InvokeMember("gon",
                            BindingFlags.GetProperty, null, vWindow, new object[] { });

        // This call throws: 'An unhandled exception of type
        // 'System.Runtime.InteropServices.COMException' occurred in mscorlib.dll;
        // (Exception from HRESULT: 0x80020006 (DISP_E_UNKNOWNNAME))'
        object gonbooksfromJS = vWindowType.InvokeMember("gon.books_jsonarray",
                            BindingFlags.GetProperty, null, vWindow, new object[] { });

    }

  }
}

I am able to retrieve values of string or number variables such as:

var mystr = "Hello Sir";
var mynbr = 8;

However, although I can see that the 'gon' variable comes back as a '{System.__ComObject}', I don't know how to parse it to see the values of its members. It would be nice to parse it directly; failing that, I would like a C# data structure of keys and values that contains everything in the gon variable and, in particular, lets me view 'gon.books_jsonarray'.

Any help on how to achieve this would be very much appreciated. Note that I cannot change the source HTML/JavaScript in any way, so what I need is C# code that reaches this goal.

javascript
c#
html
web
asked on Stack Overflow Jan 24, 2018 by Nodame • edited Jan 24, 2018 by Nodame

2 Answers

1

You can cast the result of InvokeMember() to dynamic and use the property names directly in your C# code. Array indexing is tricky, but it can be done by calling InvokeScript() with eval; see my example:

private void btnGetValueFromJs_Click(object sender, EventArgs e)
{
    var mydoc = this.WebBrowser1.Document;
    IHTMLDocument2 vDocument = mydoc.DomDocument as IHTMLDocument2;
    IHTMLWindow2 vWindow = (IHTMLWindow2)vDocument.parentWindow;
    Type vWindowType = vWindow.GetType();

    var gonfromJS = (dynamic)vWindowType.InvokeMember("gon",
                        BindingFlags.GetProperty, null, vWindow, new object[] { });

    var length = gonfromJS.books_jsonarray.length;

    for (var i = 0; i < length; ++i)
    {
        var book = (dynamic) mydoc.InvokeScript("eval", new object[] { "gon.books_jsonarray[" + i + "]" });
        Console.WriteLine(book.title);
        /* prints:
            * Little Sun
            * Little Prairie
            * Little World
            */
    }
}
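
For context on why the question's third call fails with DISP_E_UNKNOWNNAME: InvokeMember with BindingFlags.GetProperty looks up a single member name on the window object, so "gon.books_jsonarray" is treated as one name and not as a path. A minimal JavaScript sketch of the same lookup, using a plain object as a stand-in for the browser's window (the names fakeWindow and resolvePath are illustrative assumptions, not part of any API):

```javascript
// Stand-in for the browser's window object; on the real page this is `window`.
const fakeWindow = {
  gon: {
    product: "shoes",
    books_jsonarray: [
      { title: "Little Sun", authors: ["John Smith"], edition: 2, year: 2009 },
      { title: "Little Prairie", authors: ["John Smith"], edition: 3, year: 2009 },
    ],
  },
};

// A dotted string is one property name, not a path, so nothing is found.
const byDottedName = fakeWindow["gon.books_jsonarray"];

// Hypothetical helper: resolve the path one segment at a time.
function resolvePath(root, path) {
  return path
    .split(".")
    .reduce((current, key) => (current == null ? undefined : current[key]), root);
}

const books = resolvePath(fakeWindow, "gon.books_jsonarray");

console.log(byDottedName === undefined);
console.log(books.length);
console.log(books[1].title);
```

This is the reason the answer fetches "gon" first and then walks its members, instead of asking the window for the dotted name in one call.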
answered on Stack Overflow Jan 24, 2018 by ryan

0

  1. Use JSON.stringify to convert the gon.books_jsonarray variable to a JSON string.

  2. Retrieve that JSON string with the following C# code:

    var gonFromJS = mydoc.InvokeScript("eval", new object[] { "JSON.stringify(gon.books_jsonarray)" }).ToString();

  3. Deserialize the JSON into an object using Newtonsoft.Json.

My full code is here:

using Newtonsoft.Json;
using System;
using System.Collections.Generic;
using System.Windows.Forms;

namespace WindowsFormsApp1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            var webBrowser = new WebBrowser();

            webBrowser.DocumentCompleted += (s, ea) =>
            {
                var mydoc = webBrowser.Document;
                var gonFromJS = mydoc.InvokeScript("eval", new object[] { "JSON.stringify(gon.books_jsonarray)" }).ToString();
                var gonObject = JsonConvert.DeserializeObject<List<Books>>(gonFromJS);
            };

            var myurl = "http://localhost/test.html";
            webBrowser.Navigate(myurl);
        }

        private class Books
        {
            public string Title { get; set; }
            public List<string> Authors { get; set; }
            public int Edition { get; set; }
            public int Year { get; set; }
        }
    }
}
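
To make the data flow concrete, here is a rough JavaScript sketch of what the evaluated expression produces on the page side. The sample data is copied from the question (two of the three books); the round trip through JSON.parse is only there to inspect the shape and is not something the C# code does:

```javascript
// Sample data copied from the question's page (shortened to two books).
const gon = {
  books_jsonarray: [
    { title: "Little Sun", authors: ["John Smith"], edition: 2, year: 2009 },
    {
      title: "Little World",
      authors: ["John Smith", "Mary Neil", "Carla Brummer"],
      edition: 3,
      year: 2014,
    },
  ],
};

// This is the expression the C# code passes to eval via InvokeScript.
const json = JSON.stringify(gon.books_jsonarray);
console.log(json);

// Parse it back to inspect the shape the deserializer will receive.
const parsed = JSON.parse(json);
const keys = Object.keys(parsed[0]).join(",");
console.log(parsed.length, keys);
```

Note that the JSON keys are lowercase ("title", "authors", and so on) while the C# properties are PascalCase. As far as I know, Json.NET falls back to case-insensitive property matching by default, which is why the Books class maps without extra attributes.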

The output is shown in a screenshot in the original answer (image not included here).

EDIT:

You may also run into trouble with the JSON.stringify method: the call can return null.

In that case, review these SO topics: here and here.

If JSON.stringify returns null, try adding the following to your HTML page:

<head>
<meta http-equiv='X-UA-Compatible' content='IE=edge' >
</head>
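
A likely cause of the null result is that the page is rendered in an old document mode where the native JSON object does not exist. The following JavaScript sketch shows one way to probe for that; hasNativeJson and the two scope objects are hypothetical names used only to simulate both cases outside a browser:

```javascript
// Hypothetical probe: reports whether a scope exposes a native JSON.stringify.
function hasNativeJson(scope) {
  return (
    typeof scope.JSON === "object" &&
    scope.JSON !== null &&
    typeof scope.JSON.stringify === "function"
  );
}

// Simulate an old document mode (no JSON object) and a modern one.
const legacyScope = {};
const modernScope = { JSON: JSON };

console.log(hasNativeJson(legacyScope));
console.log(hasNativeJson(modernScope));
```

From C#, the same InvokeScript("eval", ...) call used above could evaluate the expression "typeof JSON" to see which case applies before relying on JSON.stringify.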
answered on Stack Overflow Jan 24, 2018 by Alexander I. • edited Jan 24, 2018 by Alexander I.

User contributions licensed under CC BY-SA 3.0