Improving performance when processing flat CSV data


I need some ideas on how to improve performance when reading from a CSV file.

I have a version that works, but performance degrades once a CSV file has more than 200,000 lines: processing takes a long time and the import becomes very slow.

Here is a small sample of the CSV file:

"Oliver Russi,oliverrussi31@somecompany1.com,John Brown,VIVW8562,BankName8,72368997,581349,AUD"
"Oliver Russi,oliverrussi31@somecompany1.com,John Brown,VIVW8562,BankName3,77960361,402376,EUR"
"Oliver Russi,oliverrussi31@somecompany1.com,John Brown,VIVW8562,BankName5,73992264,302171,AUD"
"Oliver Russi,oliverrussi31@somecompany1.com,John Brown,VIVW8562,BankName1,00748228,313303,GBP"
"Oliver Russi,oliverrussi31@somecompany1.com,Alberto John,XIWVW8623,BankName8,32368997,381349,AUD"
"Oliver Russi,oliverrussi31@somecompany1.com,Alberto John,XIWVW8623,BankName3,56960361,602376,EUR"

                                                
"Sale agent 1,sale_agent1@somecompany3831.com,Client Name 1 of Sale Agent 1,VIVW8562, BankName8,72368997,581349,AUD"
"Sale agent 1,sale_agent1@somecompany3831.com, Client Name 1 of Sale Agent 1, VIVW8562,BankName3,77960361,402376,EUR"
"Sale agent 1,sale_agent1@somecompany3831.com, Client Name 1 of Sale Agent 1, VIVW8562,BankName5,73992264,302171,AUD"
"Sale agent 1,sale_agent1@somecompany3831.com, Client Name 2 of Sale Agent 1, XIWVW8623, BankName8,32368997,381349,AUD"
"Sale agent 1,sale_agent1@somecompany3831.com, Client Name 2 of Sale Agent 1, XIWVW8623, BankName3,56960361,602376,EUR"
"Sale agent 1,sale_agent1@somecompany3831.com, Client Name 2 of Sale Agent 1, XIWVW8623, BankName5,88992264,702171,AUD"
"Sale agent 1,sale_agent1@somecompany3831.com, Client Name 2 of Sale Agent 1, XIWVW8623, BankName1,97748228,913303,GBP"                             
"Sale agent 1,sale_agent1@somecompany3831.com, Client Name 2 of Sale Agent 1, XIWVW8623, BankName2,44648228,223334,EUR"                             
"Sale agent 2,sale_agent2@somecompany382.com, Client Name 1 of Sale agent 2, VIVW8562, BankName8,72368997,581349,AUD"
"Sale agent 2,sale_agent2@somecompany382.com, Client Name 1 of Sale agent 2, VIVW8562, BankName3,77960361,402376,EUR"
"Sale agent 2,sale_agent2@somecompany382.com, Client Name 1 of Sale agent 2, VIVW8562, BankName5,73992264,302171,AUD"
"Sale agent 2,sale_agent2@somecompany382.com,Client Name 2 of Sale agent 2, XIWVW8623, BankName8,32368997,381349,AUD"

The desired output, based on the CSV sample above, is a list of 3 sales agents, where each sales agent has a list of 2 clients and each client has a list of bank accounts.

Something like the following:

  "Oliver Russi,oliverrussi31@somecompany1.com,
                John Brown, VIVW8562, 
                        BankName8,72368997,581349,AUD"
                        BankName3,77960361,402376,EUR"
                        BankName5,73992264,302171,AUD"
                        BankName1,00748228,313303,GBP"
  
  "Oliver Russi,oliverrussi31@somecompany1.com,
                Alberto John, XIWVW8623, 
                        BankName8,32368997,381349,AUD"
                        BankName3,56960361,602376,EUR"
                                                        
  "Sale agent 1,sale_agent1@somecompany3831.com,
                Client Name 1 of Sale Agent 1, VIVW8562, 
                        BankName8,72368997,581349,AUD"
                        BankName3,77960361,402376,EUR"
                        BankName5,73992264,302171,AUD"
  
  "Sale agent 1,sale_agent1@somecompany3831.com,
                Client Name 2 of Sale Agent 1, XIWVW8623, 
                        BankName8,32368997,381349,AUD"
                        BankName3,56960361,602376,EUR"
                        BankName5,88992264,702171,AUD"
                        BankName1,97748228,913303,GBP"                              
                        BankName2,44648228,223334,EUR"                              

  "Sale agent 2,sale_agent2@somecompany382.com,
                Client Name 1 of Sale agent 2, VIVW8562, 
                        BankName8,72368997,581349,AUD"
                        BankName3,77960361,402376,EUR"
                        BankName5,73992264,302171,AUD"
  
  "Sale agent 2,sale_agent2@somecompany382.com,
                Client Name 2 of Sale agent 2, XIWVW8623, 
                        BankName8,32368997,381349,AUD"

My new, improved code using dictionaries is as follows:

public SalesAgentList ToSalesAgentList(string fileName)
{
    List<SalesAgent> lst_salesAgent = new List<SalesAgent>();
    SalesAgentList salesAgentList = new SalesAgentList(lst_salesAgent);

    try
    {
        lst_salesAgents.Clear();

        // Header line to skip
        string skipStr = "SalesAgentName,SalesAgentEmailAddress,ClientName,ClientIdentifier,BankName,AccountNumber,SortCode,Currency";

        string[] fileContents = File.ReadAllLines(fileName, Encoding.UTF8);

        // Strip the surrounding double quotes from every line
        List<string> lst = fileContents.Select(s => s.Replace("\"", string.Empty)).ToList();

        lst.Remove(skipStr);

        ICollection<List<string>> t = (from i in lst.Select(item => item.Split(','))
                                       select new List<string> { i[0], i[1], i[2], i[3], i[4], i[5], i[6], i[7] }).Distinct().ToList();

        // Account number -> bank account (all accounts seen so far)
        Dictionary<string, BankAccount> dct_clBancAccount = new Dictionary<string, BankAccount>();

        // Client identifier -> (account number -> bank account)
        Dictionary<string, Dictionary<string, BankAccount>> clDictionary = new Dictionary<string, Dictionary<string, BankAccount>>();

        // Sales agent name -> client identifier -> account number -> bank account
        Dictionary<string, Dictionary<string, Dictionary<string, BankAccount>>> tmp =
            new Dictionary<string, Dictionary<string, Dictionary<string, BankAccount>>>();

        foreach (List<string> itm in t)
        {
            Dictionary<string, Dictionary<string, BankAccount>> tmp_clDictionary = new Dictionary<string, Dictionary<string, BankAccount>>();
            Dictionary<string, BankAccount> tmp_dct_clBancAccount = new Dictionary<string, BankAccount>();

            BankAccount clBankAccount = new BankAccount(itm[3], itm[4], itm[5], itm[6], itm[7]);

            SalesAgentBuilder tmpSaleAgent = new SalesAgentBuilder();
            tmpSaleAgent.SalesAgentName = itm[0];
            tmpSaleAgent.SalesAgentEmailAddress = itm[1];

            if (!tmp.ContainsKey(itm[0]))
            {
                dct_clBancAccount.Add(itm[5], clBankAccount);
                tmp_dct_clBancAccount.Add(itm[5], clBankAccount);
                tmp_clDictionary.Add(itm[3], tmp_dct_clBancAccount);
                clDictionary.Add(itm[3], tmp_dct_clBancAccount);
                tmp.Add(itm[0], tmp_clDictionary);
            }
            else
            {
                if (!dct_clBancAccount.ContainsKey(itm[5]))
                {
                    Dictionary<string, BankAccount> tmpclBnkAccDictionary;
                    if (clDictionary.TryGetValue(itm[3], out tmpclBnkAccDictionary))
                    {
                        tmpclBnkAccDictionary.Add(itm[5], clBankAccount);
                        tmp_dct_clBancAccount.Add(itm[5], clBankAccount);
                    }
                }
            }
        }

        // TODO: still need to implement salesAgentList = tmp.ToList();
    }
    catch (Exception ex) { Console.WriteLine(ex.Message); }

    return salesAgentList;
}
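
For comparison, the agent / client / bank account grouping described above can be sketched as a single pass over the lines, with one nested dictionary and no intermediate lists. This is only a minimal sketch under stated assumptions, not the code from my project: `CsvGrouping` and `Account` are hypothetical names (a stand-in for the `BankAccount` class above), agents are keyed by email address, every row is assumed to have exactly 8 comma-separated fields with no embedded commas, and fields are trimmed because the sample contains spaces after some commas.

```csharp
using System;
using System.Collections.Generic;

public static class CsvGrouping
{
    // Hypothetical stand-in for the question's BankAccount class.
    public sealed class Account
    {
        public string BankName { get; set; }
        public string AccountNumber { get; set; }
        public string SortCode { get; set; }
        public string Currency { get; set; }
    }

    // Result shape: agent email -> client identifier -> account number -> account.
    public static Dictionary<string, Dictionary<string, Dictionary<string, Account>>> Group(IEnumerable<string> lines)
    {
        var agents = new Dictionary<string, Dictionary<string, Dictionary<string, Account>>>();

        foreach (string raw in lines)
        {
            // Remove the double quotes, then split into fields.
            string[] f = raw.Replace("\"", string.Empty).Split(',');
            if (f.Length != 8) continue;                 // assumption: skip malformed rows
            for (int i = 0; i < f.Length; i++) f[i] = f[i].Trim();
            if (f[0] == "SalesAgentName") continue;      // skip the header row

            // Look up (or create) the client dictionary for this agent.
            if (!agents.TryGetValue(f[1], out var clients))
                agents[f[1]] = clients = new Dictionary<string, Dictionary<string, Account>>();

            // Look up (or create) the account dictionary for this client.
            if (!clients.TryGetValue(f[3], out var accounts))
                clients[f[3]] = accounts = new Dictionary<string, Account>();

            // A repeated account number for the same client overwrites the earlier row.
            accounts[f[5]] = new Account
            {
                BankName = f[4], AccountNumber = f[5], SortCode = f[6], Currency = f[7]
            };
        }

        return agents;
    }
}
```

It could be called as `CsvGrouping.Group(File.ReadLines(fileName, Encoding.UTF8))`, so the file is streamed line by line instead of being loaded fully with `File.ReadAllLines`. Whether this is fast enough for 200,000+ lines would have to be measured.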

The issue: I tried to change the foreach loop to Parallel.ForEach(), but I get the following error. With the normal foreach, no error occurs.

System.NullReferenceException
  HResult=0x80004003
  Message=Object reference not set to an instance of an object.
  Source=mscorlib
  StackTrace:
   at System.Collections.Generic.Dictionary`2.Insert(TKey key, TValue value, Boolean add)
   at System.Collections.Generic.Dictionary`2.Add(TKey key, TValue value)
   at SalesAgentFileRecordList.<>c__DisplayClass23_0.<ToSalesAgentList>b__3(List`1 itm)
   in D:\doc\GH\DevTest-Q3_v\ProcessCSV.cs:line 1139
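
For context on the trace above: `Dictionary<TKey, TValue>` is documented as not safe for concurrent writes, and an exception thrown from inside `Dictionary.Insert` is consistent with several `Parallel.ForEach` iterations calling `Add` on the same dictionary at once. One possible way around it is sketched below with `ConcurrentDictionary` and `GetOrAdd`. This is a sketch under assumptions, not a verified fix for my code: `ParallelCsvGrouping` is a hypothetical name, agents are keyed by email address, and each account simply stores the trimmed fields of its row instead of a `BankAccount` object.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelCsvGrouping
{
    // Result shape: agent email -> client identifier -> account number -> row fields.
    public static ConcurrentDictionary<string, ConcurrentDictionary<string, ConcurrentDictionary<string, string[]>>>
        Group(IEnumerable<string> lines)
    {
        var agents = new ConcurrentDictionary<string, ConcurrentDictionary<string, ConcurrentDictionary<string, string[]>>>();

        Parallel.ForEach(lines, raw =>
        {
            string[] f = raw.Replace("\"", string.Empty).Split(',');
            if (f.Length != 8) return;                   // assumption: skip malformed rows
            for (int i = 0; i < f.Length; i++) f[i] = f[i].Trim();
            if (f[0] == "SalesAgentName") return;        // skip the header row

            // GetOrAdd is safe to call from many threads; only one value is stored per key.
            var clients = agents.GetOrAdd(f[1],
                _ => new ConcurrentDictionary<string, ConcurrentDictionary<string, string[]>>());
            var accounts = clients.GetOrAdd(f[3],
                _ => new ConcurrentDictionary<string, string[]>());

            // TryAdd keeps whichever row for this account number arrives first.
            accounts.TryAdd(f[5], f);
        });

        return agents;
    }
}
```

Since splitting one line is cheap, the parallel version is not guaranteed to beat a plain single-threaded loop, so it would need to be measured on the real file. Note also that when the same account number appears twice for one client, which row is kept depends on thread timing.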

Any help is appreciated. Thanks.

c#
csv
asked on Stack Overflow Sep 8, 2020 by Gheorghe Graur • edited Sep 23, 2020 by marc_s

0 Answers

Nobody has answered this question yet.


User contributions licensed under CC BY-SA 3.0