{"id":5020,"date":"2021-04-28T14:13:26","date_gmt":"2021-04-28T13:13:26","guid":{"rendered":"https:\/\/solidt.eu\/site\/?p=5020"},"modified":"2023-12-04T20:51:28","modified_gmt":"2023-12-04T19:51:28","slug":"c-read-groups-of-string-separated-streaming","status":"publish","type":"post","link":"https:\/\/solidt.eu\/site\/c-read-groups-of-string-separated-streaming\/","title":{"rendered":"C# Read Groups of string separated (streaming) from STDIN"},"content":{"rendered":"\n<p>Update 2023-12-04: See fastest ReadOnlySpan&lt;T> version at the bottom of the post<\/p>\n\n\n\n<p>Console Input<\/p>\n\n\n\n<div style=\"height: 250px; position:relative; margin-bottom: 50px;\" class=\"wp-block-simple-code-block-ace\"><pre class=\"wp-block-simple-code-block-ace\" style=\"position:absolute;top:0;right:0;bottom:0;left:0\" data-mode=\"csharp\" data-theme=\"monokai\" data-fontsize=\"14\" data-lines=\"Infinity\" data-showlines=\"true\" data-copy=\"false\">using System;\nusing System.Collections.Generic;\nusing System.Threading.Tasks;\nusing System.Text;\nusing System.IO;\nusing System.Diagnostics;\n\nnamespace Domainizr.ConsoleApp\n{\n    public class Program\n    {\n        public static async Task Main(string[] args)\n        {\n            Console.WriteLine($\"Starting\");\n            var sw = new Stopwatch();\n            sw.Start();\n            var bufferSize = 1024 * 1024;\n            var buffer = new char[bufferSize];\n            var groupCount = 0;\n            var stream = Console.OpenStandardInput(bufferSize);\n            \/\/using (var stream = File.OpenRead(\"MOCK_DATA.json\"))\n            \/\/{\n                var reader = new StreamReader(stream, Encoding.UTF8);\n                Console.SetIn(reader); \/\/ This will allow input >256 chars\n                Console.WriteLine($\"Reading groups\");\n                \n                await foreach(var group in ReadGroups(reader, buffer))\n                {\n                    if (groupCount % 100 == 0) 
Console.Write($\".\");\n                    groupCount++;\n                }\n            \/\/}\n            Console.WriteLine($\"\\r\\nCount: {groupCount}\");\n            Console.WriteLine($\"Elapsed: {sw.ElapsedMilliseconds}ms\");\n        }\n\n        public static string HexString(string plainText)\n        {\n            var plainTextBytes = Encoding.UTF8.GetBytes(plainText);\n            return BitConverter.ToString(plainTextBytes);\n        }\n\n        public static async IAsyncEnumerable&lt;string> ReadGroups(StreamReader stream, char[] buffer)\n        {\n            var separator = \"\\n\";\n            Console.WriteLine($\"Separator: {HexString(separator)}\");\n            var lastPart = \"\";\n            while (true)\n            {\n                var size = await stream.ReadAsync(buffer, 0, buffer.Length);\n                if (size &lt;= 0) break;\n                var s = lastPart + new string(buffer[0..size]);\n                var parts = SplitString(s, separator);\n                foreach(var part in parts)\n                {\n                    if (part.complete)\n                    {\n                        lastPart = \"\";\n                        yield return part.value;\n                    }\n                    else\n                    {\n                        \/\/ s already starts with the previous remainder, so assign instead of appending\n                        lastPart = part.value;\n                    }\n                }\n            }\n            if (!string.IsNullOrEmpty(lastPart))\n                yield return lastPart;\n        }\n\n        public static IEnumerable&lt;(bool complete, string value)> SplitString(string source, string separator)\n        {\n            if (source.Length &lt;= 0) yield break;\n            var start = 0;\n            var sepLen = separator.Length;\n            while(true)\n            {\n                var end = source.IndexOf(separator, start);\n                if (end &lt; 0) break; \n                yield return (true, new string(source.AsSpan()[start..end]));\n                start = end + 
sepLen;\n            }\n            var last = new string(source.AsSpan()[start..source.Length]);\n            if (!string.IsNullOrEmpty(last)) {\n                yield return (false, last);\n            }\n        }\n    }\n}\n<\/pre><\/div>\n\n\n\n<p>Usage: Run application and type: 123#45#67&lt;ENTER&gt;89#&lt;ENTER&gt;<\/p>\n\n\n\n<p>Update 2023-11-28: Refactored<\/p>\n\n\n\n<div style=\"height: 250px; position:relative; margin-bottom: 50px;\" class=\"wp-block-simple-code-block-ace\"><pre class=\"wp-block-simple-code-block-ace\" style=\"position:absolute;top:0;right:0;bottom:0;left:0\" data-mode=\"csharp\" data-theme=\"monokai\" data-fontsize=\"14\" data-lines=\"Infinity\" data-showlines=\"true\" data-copy=\"false\">using System;\nusing System.Collections.Generic;\nusing System.Diagnostics;\nusing System.IO;\nusing System.Text;\nusing System.Threading.Tasks;\n\nnamespace ConsoleApp\n{\n    class Program\n    {\n        public static async Task Main(string[] args)\n        {\n            if (args.Length == 0)\n            {\n                Console.WriteLine(\"Please provide an input file path as a command-line argument.\");\n                return;\n            }\n\n            string filePath = args[0];\n\n            Console.WriteLine($\"Starting\");\n            var sw = new Stopwatch();\n            sw.Start();\n            \n            var bufferSize = 1024 * 32;\n            var buffer = new char[bufferSize];\n            var groupCount = 0;\n            \n            try\n            {\n                using (var stream = File.OpenRead(filePath))\n                {\n                    var reader = new StreamReader(stream, Encoding.UTF8);\n                    Console.SetIn(reader); \/\/ This will allow input >256 chars\n                    Console.WriteLine($\"Reading groups\");\n                    double groupSizes = 0;\n                    await foreach(var group in ReadGroups(reader, buffer, \"\\n\"))\n                    {\n                        
if (groupCount &lt; 10) {\n                            Console.WriteLine($\"group: {group}\");\n                        }\n                        if (groupCount % 1000 == 0) Console.Write($\".\");\n                        groupCount++;\n                        groupSizes += group.Length;\n                    }\n\n                    Console.WriteLine($\"\\nAverage size: {groupSizes \/ groupCount}\");\n                }\n            }\n            catch (Exception ex)\n            {\n                Console.WriteLine($\"\\r\\nError: {ex}\");\n            }\n\n            Console.WriteLine($\"\\r\\nCount: {groupCount}\");\n            Console.WriteLine($\"Elapsed: {FormatElapsedTime(sw.Elapsed)}\");\n        }\n\n        \/\/ A helper method to format elapsed time in a human-readable form\n        public static string FormatElapsedTime(TimeSpan elapsed)\n        {\n            return $\"{(int)elapsed.TotalHours}h {elapsed.Minutes}m {elapsed.Seconds}s {elapsed.Milliseconds}ms\";\n        }\n\n        public static async IAsyncEnumerable&lt;string> ReadGroups(StreamReader stream, char[] buffer, string separator)\n        {\n            var openParts = new List&lt;string>();\n            while (true)\n            {\n                var size = await stream.ReadAsync(buffer, 0, buffer.Length);\n                if (size &lt;= 0) break;\n                var s = new string(buffer, 0, size);\n                var position = 0;\n                while (true)\n                {\n                    var index = s.IndexOf(separator, position);\n                    if (index &lt; 0) break;\n                    openParts.Add(s.Substring(position, index - position));\n                    yield return string.Concat(openParts);\n                    openParts.Clear();\n                    position = index + separator.Length;\n                }\n                openParts.Add(s.Substring(position));\n            }\n            yield return string.Concat(openParts);\n        }\n        \n  
      public static async IAsyncEnumerable&lt;string> ReadGroups2(StreamReader stream, char[] buffer, string separator)\n        {\n            var stringBuilder = new StringBuilder();\n            \n            while (true)\n            {\n                var size = await stream.ReadAsync(buffer, 0, buffer.Length);\n                if (size &lt;= 0) break;\n                \n                var s = new string(buffer, 0, size);\n                var position = 0;\n\n                while (true)\n                {\n                    var index = s.IndexOf(separator, position);\n                    if (index &lt; 0) break;\n\n                    stringBuilder.Append(s, position, index - position);\n                    yield return stringBuilder.ToString();\n\n                    stringBuilder.Clear();\n                    position = index + separator.Length;\n                }\n\n                stringBuilder.Append(s, position, s.Length - position);\n            }\n\n            if (stringBuilder.Length > 0)\n                yield return stringBuilder.ToString();\n        }\n    }\n}\n<\/pre><\/div>\n\n\n\n<p>Update 2023-12-04: Refactored with ReadOnlySpan&lt;T><\/p>\n\n\n\n<div style=\"height: 250px; position:relative; margin-bottom: 50px;\" class=\"wp-block-simple-code-block-ace\"><pre class=\"wp-block-simple-code-block-ace\" style=\"position:absolute;top:0;right:0;bottom:0;left:0\" data-mode=\"csharp\" data-theme=\"monokai\" data-fontsize=\"14\" data-lines=\"Infinity\" data-showlines=\"true\" data-copy=\"false\">using System;\nusing System.Collections.Generic;\nusing System.Diagnostics;\nusing System.IO;\nusing System.Text;\nusing System.Threading.Tasks;\n\nnamespace ConsoleApp\n{\n    class Program\n    {\n        public static async Task Main(string[] args)\n        {\n            if (args.Length == 0)\n            {\n                Console.WriteLine(\"Please provide an input file path as a command-line argument.\");\n                return;\n            }\n\n 
           string filePath = args[0];\n\n            Console.WriteLine($\"Starting\");\n            var sw = new Stopwatch();\n            sw.Start();\n\n            var bufferSize = 1024;\n            var buffer = new char[bufferSize];\n            var groupCount = 0;\n\n            try\n            {\n                using (var stream = File.OpenRead(filePath))\n                {\n                    var reader = new StreamReader(stream, Encoding.UTF8);\n                    Console.SetIn(reader); \/\/ This will allow input >256 chars\n                    Console.WriteLine($\"Reading groups\");\n                    double groupSizes = 0;\n                    await foreach (var group in ReadGroups(reader, buffer, \"\\n\"))\n                    {\n                        if (groupCount &lt; 10)\n                        {\n                            Console.WriteLine($\"group: {group}\");\n                        }\n                        if (groupCount % 1000 == 0) Console.Write($\".\");\n                        groupCount++;\n                        groupSizes += group.Length;\n                    }\n\n                    Console.WriteLine($\"\\nAverage size: {groupSizes \/ groupCount}\");\n                }\n            }\n            catch (Exception ex)\n            {\n                Console.WriteLine($\"\\r\\nError: {ex}\");\n            }\n\n            Console.WriteLine($\"\\r\\nCount: {groupCount}\");\n            Console.WriteLine($\"Elapsed: {FormatElapsedTime(sw.Elapsed)}\");\n        }\n\n        \/\/ A helper method to format elapsed time in a human-readable form\n        public static string FormatElapsedTime(TimeSpan elapsed)\n        {\n            return $\"{(int)elapsed.TotalHours}h {elapsed.Minutes}m {elapsed.Seconds}s {elapsed.Milliseconds}ms\";\n        }\n\n        public static async IAsyncEnumerable&lt;string> ReadGroups(StreamReader stream, char[] buffer, string separator)\n        {\n            var stringBuilder = new StringBuilder();\n 
           while (true)\n            {\n                var size = await stream.ReadAsync(buffer, 0, buffer.Length);\n                if (size &lt;= 0) break;\n\n                foreach (var item in ReadGroupsSync(stringBuilder, buffer, separator, size))\n                    yield return item;\n            }\n\n            if (stringBuilder.Length > 0)\n                yield return stringBuilder.ToString();\n        }\n\n        private static List&lt;string> ReadGroupsSync(StringBuilder stringBuilder, char[] buffer, string separator, int size)\n        {\n            var list = new List&lt;string>();\n            var s = buffer.AsSpan(0, size);\n            var position = 0;\n            while (true)\n            {\n                var index = s.IndexOf(separator.AsSpan());\n                if (index &lt; 0) break;\n\n                stringBuilder.Append(s.Slice(0, index));\n                list.Add(stringBuilder.ToString());\n\n                stringBuilder.Clear();\n                position = position + index + separator.Length;\n                s = buffer.AsSpan(position, size - position);\n            }\n            stringBuilder.Append(s.Slice(0, s.Length));\n            return list;\n        }\n    }\n}\n<\/pre><\/div>\n","protected":false},"excerpt":{"rendered":"<p>Update 2023-12-04: See fastest ReadOnlySpan&lt;T> version at the bottom of the post Console Input Usage: Run application and type: 123#45#67&lt;ENTER&gt;89#&lt;ENTER&gt; Update: 2023-28-11: Refactored Update 2023-12-04: Refactored with 
ReadOnlySpan&lt;T><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[6,4,1],"tags":[],"class_list":["post-5020","post","type-post","status-publish","format-standard","hentry","category-dotnet","category-programming","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/solidt.eu\/site\/wp-json\/wp\/v2\/posts\/5020","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/solidt.eu\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/solidt.eu\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/solidt.eu\/site\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/solidt.eu\/site\/wp-json\/wp\/v2\/comments?post=5020"}],"version-history":[{"count":9,"href":"https:\/\/solidt.eu\/site\/wp-json\/wp\/v2\/posts\/5020\/revisions"}],"predecessor-version":[{"id":8174,"href":"https:\/\/solidt.eu\/site\/wp-json\/wp\/v2\/posts\/5020\/revisions\/8174"}],"wp:attachment":[{"href":"https:\/\/solidt.eu\/site\/wp-json\/wp\/v2\/media?parent=5020"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/solidt.eu\/site\/wp-json\/wp\/v2\/categories?post=5020"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/solidt.eu\/site\/wp-json\/wp\/v2\/tags?post=5020"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}