How to retrieve and update a file larger than 1 MB on the master branch of a GitHub repository using the Octokit.Net Git Data API in C#

0

I am trying to read and update a single file in my repository using Octokit.Net.

The file I am trying to read and update is about 2.1 MB, so when I attempt to read it with the following code...

var currentFileText = "";

var contents = await client.Repository.Content.GetAllContentsByRef("jkears", "NextWare.ProductPortal", "domainModel.ddd", "master");
var targetFile = contents[0];
if (targetFile.EncodedContent != null)
{
    currentFileText = Encoding.UTF8.GetString(Convert.FromBase64String(targetFile.EncodedContent));
}
else
{
    currentFileText = targetFile.Content;
}

I get this exception:

Octokit.ForbiddenException
  HResult=0x80131500
  Message=This API returns blobs up to 1 MB in size. The requested blob is too large to fetch via the API, but you can use the Git Data API to request blobs up to 100 MB in size.

My question is: how do I use the Git Data API from C# to read the contents of this large file, and how would I then push changes to this file back into the same repository?

c#
.net-core
octokit
github-api-v3
asked on Stack Overflow Nov 8, 2020 by John Kears

1 Answer

0

It is not hard, but it is not obvious either.

The file I was trying to read and update was 2.4 MB. Compressing it down to 512 KB (using SevenZip) let me read and update it in the repo, but I still wanted to be able to read and update files over 1 MB.

To accomplish this I had to use GitHub's GraphQL API. I needed it in order to retrieve the SHA-1 of the particular file that I was interested in reading and updating.

Having never worked with the Git API or, for that matter, GraphQL, I chose to use a GraphQL client (the GraphQL.Client and GraphQL.Client.Serializer.Newtonsoft packages).

With GraphQL I was able to retrieve the SHA-1 of the existing file (blob) in my GitHub repo. Once I had the SHA-1 for the blob, I could easily pull down the file in question via the Git Data API.
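The blob endpoint of the Git Data API returns the file as base64 text in a `content` field, typically wrapped across several lines. A minimal sketch of the decoding step, assuming the file is UTF-8 text; the `BlobContent` class and `DecodeUtf8` method are hypothetical names, not part of Octokit.Net or the original code:

```csharp
using System;
using System.Text;

public static class BlobContent
{
    // Decodes the base64 "content" field of a blob response into UTF-8 text.
    // Convert.FromBase64String ignores white space, so embedded line breaks
    // in the payload do not need to be stripped first.
    public static string DecodeUtf8(string base64Content)
    {
        if (base64Content == null)
        {
            throw new ArgumentNullException(nameof(base64Content));
        }

        byte[] bytes = Convert.FromBase64String(base64Content);
        return Encoding.UTF8.GetString(bytes);
    }
}
```

For example, `BlobContent.DecodeUtf8("aGVs\nbG8=")` yields `hello`. Octokit.Net also appears to expose the blob endpoint directly (as `client.Git.Blob.Get`), which might remove the need for a hand-written HTTP call; treat that as an assumption to check against your Octokit version, since it is not the route this answer took.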

I was then able to alter the content and push the changes back to GitHub via Octokit.Net.

While this is not polished by any means, I wanted to close this with something for anyone else who is attempting to do this.

Credit to the following Stack Overflow thread.

// Requires: using System.Net.Http.Headers; using System.Text; using System.Threading;
//           using GraphQL; using GraphQL.Client.Http; using GraphQL.Client.Serializer.Newtonsoft;
public async Task<string> GetSha1(string owner, string personalToken, string repositoryName, string pathName, string branch = "master")
{
    // Basic authentication built from the owner name and the personal access token
    string basicValue = Convert.ToBase64String(Encoding.UTF8.GetBytes($"{owner}:{personalToken}"));

    var graphQLClient = new GraphQLHttpClient("https://api.github.com/graphql", new NewtonsoftJsonSerializer());
    graphQLClient.HttpClient.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Basic", basicValue);

    // Ask for the object id (SHA-1) of the blob addressed by "<branch>:<path>"
    var getShaRequest = new GraphQLRequest
    {
        Query = @"
            query {
              repository(owner: """ + owner + @""", name: """ + repositoryName + @""") {
                object(expression: """ + branch + @":" + pathName + @""") {
                  ... on Blob {
                    oid
                  }
                }
              }
            }",
        Variables = new
        {
        }
    };

    var graphQLResponse = await graphQLClient.SendQueryAsync<ResponseType>(getShaRequest, cancellationToken: CancellationToken.None);
    return graphQLResponse.Data.Repository.Object.Oid;
}
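The query above splices `owner`, `repositoryName`, `branch` and `pathName` straight into the query text, so a value containing a double quote would break it. One way around that is to pass the values as GraphQL variables. The following is a minimal sketch that builds the request body with System.Text.Json instead of the GraphQL.Client package used in this answer; `BlobOidQuery` and `BuildPayload` are hypothetical names introduced only for illustration:

```csharp
using System;
using System.Text.Json;

public static class BlobOidQuery
{
    // The query text is constant; caller-supplied values travel as variables.
    public const string Query = @"
query($owner: String!, $name: String!, $expression: String!) {
  repository(owner: $owner, name: $name) {
    object(expression: $expression) {
      ... on Blob { oid }
    }
  }
}";

    // Builds the JSON body for a POST to the GraphQL endpoint.
    public static string BuildPayload(string owner, string repositoryName, string pathName, string branch = "master")
    {
        var body = new
        {
            query = Query,
            variables = new
            {
                owner,
                name = repositoryName,
                // "<branch>:<path>" addresses the file on that branch
                expression = $"{branch}:{pathName}"
            }
        };

        return JsonSerializer.Serialize(body);
    }
}
```

The resulting string can be POSTed to https://api.github.com/graphql with the same authorization header. With GraphQL.Client, assigning the same query text to `Query` and an object with these three properties to `Variables` should be equivalent, though that variant is untested here.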

Here are my helper classes

public class ContentResponseType
{
    public string content { get; set; }
    public string encoding { get; set; }
    public string url { get; set; }
    public string sha { get; set; }
    public long size { get; set; }
}

public class DataObject
{
    public string Oid;
}

public class Repository
{
    public DataObject Object;
}

public class ResponseType
{
    public Repository Repository { get; set; }
}

Here is the method that retrieves the content, using the SHA-1 provided by the method above.

public async Task<ContentResponseType> RetrieveFileAsync(string owner, string personalToken, string repositoryName, string pathName, string branch = "master")
{
    // Look up the blob's SHA-1 via GraphQL, then fetch the blob through the Git Data API.
    // GetBlobUrl and BuildRequestMessage are helper methods that are not shown in this answer.
    var sha1 = await this.GetSha1(owner: owner, personalToken: personalToken, repositoryName: repositoryName, pathName: pathName, branch: branch);
    var url = this.GetBlobUrl(owner, repositoryName, sha1);
    var req = this.BuildRequestMessage(url, personalToken);
    using (var httpClient = new HttpClient())
    {
        var resp = await httpClient.SendAsync(req);
        if (resp.StatusCode != System.Net.HttpStatusCode.OK)
        {
            throw new Exception($"Error downloading {req.RequestUri}, statusCode={resp.StatusCode}");
        }

        // The response body is JSON; its "content" field holds the base64-encoded file
        var jsonString = await resp.Content.ReadAsStringAsync();
        return System.Text.Json.JsonSerializer.Deserialize<ContentResponseType>(jsonString);
    }
}
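`RetrieveFileAsync` calls two helpers, `GetBlobUrl` and `BuildRequestMessage`, whose bodies are not included in this answer. A minimal sketch of what they might look like, assuming the standard blob endpoint of the Git Data API and token authentication; the class name, the User-Agent value and the Accept header are assumptions, not the original implementation:

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

public static class GitHubBlobRequests
{
    // Blob endpoint of the Git Data API: /repos/{owner}/{repo}/git/blobs/{sha}
    public static string GetBlobUrl(string owner, string repositoryName, string sha)
    {
        return "https://api.github.com/repos/"
            + Uri.EscapeDataString(owner) + "/"
            + Uri.EscapeDataString(repositoryName)
            + "/git/blobs/" + sha;
    }

    // Builds a GET request carrying the personal access token.
    public static HttpRequestMessage BuildRequestMessage(string url, string personalToken)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.Authorization = new AuthenticationHeaderValue("token", personalToken);

        // GitHub rejects API requests without a User-Agent; this value is a placeholder.
        request.Headers.UserAgent.Add(new ProductInfoHeaderValue("large-blob-sample", "1.0"));
        request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue("application/vnd.github.v3+json"));
        return request;
    }
}
```

In the answer's code these would be instance methods on the same class as `RetrieveFileAsync`; they are written as static methods here only so the sketch compiles on its own.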

Here is my console test app...

static async Task Main(string[] args)
{
    // GitHub variables
    var owner = "{Put Owner Name here}";
    var personalGitHubToken = "{Put your Token here}";
    var repo = "{Put Repo Name Here}";
    var branch = "master";
    var referencePath = "{Put path and filename here}";

    // Get the existing Domain Model file
    var api = new GitHubRepoApi();
    var response = await api.RetrieveFileAsync(owner: owner, personalToken: personalGitHubToken, repositoryName: repo, pathName: referencePath, branch: branch);
    var currentFileText = Encoding.UTF8.GetString(Convert.FromBase64String(response.content));

    // Change the description of the JSON Domain Model
    currentFileText = currentFileText.Replace(@"""description"":""SubDomain", @"""description"":""Domain");

    // Update the changes back to GitHub repo using Octokit
    var client = new GitHubClient(new Octokit.ProductHeaderValue(repo));
    var tokenAuth = new Credentials(personalGitHubToken);
    client.Credentials = tokenAuth;
    var updateChangeSet = await client.Repository.Content.UpdateFile(owner, repo, referencePath,
        new UpdateFileRequest("Domain Model was updated via automation", currentFileText, response.sha, branch));

    // Read back the changes to confirm all works
    response = await api.RetrieveFileAsync(owner: owner, personalToken: personalGitHubToken, repositoryName: repo, pathName: referencePath, branch: branch);
    currentFileText = Encoding.UTF8.GetString(Convert.FromBase64String(response.content));
}

I am certain there are many other ways to accomplish this, but this is what worked for me, and I hope it makes someone else's life a bit easier.

Cheers John

answered on Stack Overflow Nov 9, 2020 by John Kears • edited Nov 10, 2020 by John Kears

User contributions licensed under CC BY-SA 3.0