c# - Reading text files with 64-bit process very slow
I'm merging text files (.itf) located in a folder. When I compile it to 32-bit (console application, .NET 4.6) everything works fine, except that I get OutOfMemory exceptions if there is a lot of data in the folders. Compiling it to 64-bit solves that problem, but the process runs super slow compared to the 32-bit process (more than 15 times slower).
I tried BufferedStream and ReadAllLines, but both perform poorly. The profiler tells me that these methods use 99% of the time. I don't know where the problem is...
Here's the code:
private static void ReadData(Dictionary<string, Topic> topics)
{
    foreach (string file in Directory.EnumerateFiles(path, "*.itf"))
    {
        Topic currentTopic = null;
        Table currentTable = null;
        Object currentObject = null;

        using (var fs = File.Open(file, FileMode.Open))
        using (var bs = new BufferedStream(fs))
        using (var sr = new StreamReader(bs, Encoding.Default))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
            {
                // ETOP/ETAB/ELIN tags close the current topic/table/object.
                if (line.IndexOf("ETOP") > -1)
                {
                    currentTopic = null;
                }
                else if (line.IndexOf("ETAB") > -1)
                {
                    currentTable = null;
                }
                else if (line.IndexOf("ELIN") > -1)
                {
                    currentObject = null;
                }
                // MTID and MODL are fields defined elsewhere in the class.
                else if (line.IndexOf("MTID") > -1)
                {
                    MTID = line.Replace("MTID ", "");
                }
                else if (line.IndexOf("MODL") > -1)
                {
                    MODL = line.Replace("MODL ", "");
                }
                else if (line.IndexOf("TOPI") > -1)
                {
                    var name = line.Replace("TOPI ", "");
                    if (topics.ContainsKey(name))
                    {
                        currentTopic = topics[name];
                    }
                    else
                    {
                        var topic = new Topic(name);
                        currentTopic = topic;
                        topics.Add(name, topic);
                    }
                }
                else if (line.IndexOf("TABL") > -1)
                {
                    var name = line.Replace("TABL ", "");
                    if (currentTopic.Tables.ContainsKey(name))
                    {
                        currentTable = currentTopic.Tables[name];
                    }
                    else
                    {
                        var table = new Table(name);
                        currentTable = table;
                        currentTopic.Tables.Add(name, table);
                    }
                }
                else if (line.IndexOf("OBJE") > -1)
                {
                    if (currentTable.Name != "Metadata" || currentTable.Objects.Count == 0)
                    {
                        var shortLine = line.Replace("OBJE ", "");
                        var obje = new Object(shortLine.Substring(shortLine.IndexOf(" ")));
                        currentObject = obje;
                        currentTable.Objects.Add(obje);
                    }
                }
                // Anything else is data belonging to the current object.
                else if (currentTopic != null && currentTable != null && currentObject != null)
                {
                    currentObject.Data.Add(line);
                }
            }
        }
    }
}
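(Side note, independent of the memory issue discussed in the answer below: String.IndexOf(string) performs a culture-sensitive search and scans the entire line, so the per-line dispatch itself can be made cheaper. A minimal sketch, assuming the four-character tags always appear at the start of a line; LeanReader and TagValue are hypothetical names, not part of the original code:)

using System;
using System.Collections.Generic;
using System.IO;

static class LeanReader
{
    // Hypothetical helper: returns the value after a known tag prefix,
    // or null when the line does not start with that tag. An ordinal
    // StartsWith only compares the prefix, no culture rules, no full scan.
    private static string TagValue(string line, string tag) =>
        line.StartsWith(tag, StringComparison.Ordinal)
            ? line.Substring(tag.Length)
            : null;

    // Streams every topic name out of the .itf files in a folder.
    public static IEnumerable<string> TopicNames(string folder)
    {
        foreach (string file in Directory.EnumerateFiles(folder, "*.itf"))
        {
            // File.ReadLines streams lazily; no whole-file buffer stays alive.
            foreach (string line in File.ReadLines(file))
            {
                string name = TagValue(line, "TOPI ");
                if (name != null)
                    yield return name;
            }
        }
    }
}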
The biggest problem your program has is that, when you let it run in 64-bit mode, it can read a lot more files. Which is nice; a 64-bit process has a thousand times more address space than a 32-bit process, so running out of it is excessively unlikely. But you do not get a thousand times more RAM.
The universal principle of "there is no free lunch" is at work here. Having enough RAM matters a great deal in a program like this. First and foremost, it is used by the file system cache, the magical operating system feature that makes it look like reading files from a disk is cheap. It is not cheap at all, it is one of the slowest things you can do in a program, but the cache is very good at hiding it. You'll invoke it when you run your program more than once: the second, and subsequent, times you won't read from the disk at all. That's a pretty dangerous feature and very hard to avoid when you test your program, because it gives you very unrealistic assumptions about how efficient the program is.
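The first-run/second-run difference is easy to reproduce. A minimal sketch, assuming "big.itf" is a hypothetical large file that has not been touched since the last reboot:

using System;
using System.Diagnostics;
using System.IO;

class CacheDemo
{
    static void Main()
    {
        const string path = "big.itf";    // hypothetical large input file

        var sw = Stopwatch.StartNew();
        File.ReadAllLines(path);          // cold read: data comes from the disk
        Console.WriteLine($"First read:  {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        File.ReadAllLines(path);          // warm read: data comes from the cache
        Console.WriteLine($"Second read: {sw.ElapsedMilliseconds} ms");
    }
}

On a machine with enough free RAM the second read is typically orders of magnitude faster, which is exactly the unrealistic number a test run will show you.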
The problem with a 64-bit process is that it easily makes the file system cache ineffective. Since it can read a lot more files, it overwhelms the cache and old file data gets evicted. The second time you run your program it will not be fast anymore; the files you read are no longer in the cache and must be read from the disk. You'll now see the real perf of your program, the way it will behave in production. That's a good thing, even though you don't like it much :)
The secondary problem with RAM is the lesser one: if you allocate a lot of memory to store the file data, you'll force the operating system to find the RAM to store it. That can cause a lot of hard page faults, incurred when it must unmap memory used by another process, or by yours, to free up the RAM you need. A generic problem called "thrashing". Page faults are something you can see in Task Manager; use View > Select Columns to add the counter.
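Page fault counts are easiest to read in Task Manager as described, but from inside the process you can at least watch the working set, which is what grows when you force the OS to find RAM. A rough sketch, where MemoryWatch is a hypothetical helper, not an established API:

using System;
using System.Diagnostics;

static class MemoryWatch
{
    // Prints the current and peak working set of this process, the same
    // numbers Task Manager shows per process.
    public static void Report(string label)
    {
        var p = Process.GetCurrentProcess();
        p.Refresh();    // re-read the OS counters for this process
        Console.WriteLine($"{label}: working set " +
            $"{p.WorkingSet64 / (1024 * 1024)} MB, " +
            $"peak {p.PeakWorkingSet64 / (1024 * 1024)} MB");
    }
}

Calling MemoryWatch.Report("after folder") after each folder would show whether the 64-bit build is ballooning toward physical RAM limits.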
Given that the file system cache is the most likely source of the slow-down, a simple test you can do is rebooting your machine, which ensures the cache cannot contain any of the file data, then running the 32-bit version. The prediction is that it will also be slow, with BufferedStream and ReadAllLines as the bottlenecks. Like they should be.
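A small self-contained harness for that reboot test: it prints which build is running and how long one cold pass over the files takes ("data" is a hypothetical folder name; point it at the real input folder):

using System;
using System.Diagnostics;
using System.IO;
using System.Linq;

class RebootTest
{
    static void Main()
    {
        // Confirms whether this run is the 32-bit or the 64-bit build.
        Console.WriteLine($"64-bit process: {Environment.Is64BitProcess}");

        var sw = Stopwatch.StartNew();
        long lines = Directory.EnumerateFiles("data", "*.itf")
                              .Sum(f => File.ReadLines(f).LongCount());
        Console.WriteLine($"Read {lines} lines in {sw.Elapsed}");
    }
}

Run it once right after the reboot (cold cache) and once more immediately after (warm cache); comparing the two numbers for each build separates disk cost from everything else.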
One final note: even though your program doesn't match the pattern, you cannot make strong assumptions about .NET 4.6 perf problems yet. Not until this nasty bug gets fixed.