{"id":521,"date":"2026-03-07T07:20:23","date_gmt":"2026-03-06T23:20:23","guid":{"rendered":"https:\/\/connectword.dpdns.org\/?p=521"},"modified":"2026-03-07T07:20:23","modified_gmt":"2026-03-06T23:20:23","slug":"a-production-style-networkit-11-2-1-coding-tutorial-for-large-scale-graph-analytics-communities-cores-and-sparsification","status":"publish","type":"post","link":"https:\/\/connectword.dpdns.org\/?p=521","title":{"rendered":"A Production-Style NetworKit 11.2.1 Coding Tutorial for Large-Scale Graph Analytics, Communities, Cores, and Sparsification"},"content":{"rendered":"<p>In this tutorial, we implement a production-grade, large-scale graph analytics pipeline in <a href=\"https:\/\/github.com\/networkit\/networkit\"><strong>NetworKit<\/strong><\/a>, focusing on speed, memory efficiency, and version-safe APIs in NetworKit 11.2.1. We generate a large-scale free network, extract the largest connected component, and then compute structural backbone signals via k-core decomposition and centrality ranking. We also detect communities with PLM and quantify quality using modularity; estimate distance structure using effective and estimated diameters; and, finally, sparsify the graph to reduce cost while preserving key properties. 
We export the sparsified graph as an edgelist so we can reuse it in downstream workflows, benchmarking, or graph ML preprocessing.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">!pip -q install networkit pandas numpy psutil\n\n\nimport gc, time, os\nimport numpy as np\nimport pandas as pd\nimport psutil\nimport networkit as nk\n\n\nprint(\"NetworKit:\", nk.__version__)\nnk.setNumberOfThreads(min(2, nk.getMaxNumberOfThreads()))\nnk.setSeed(7, False)\n\n\ndef ram_gb():\n   p = psutil.Process(os.getpid())\n   return p.memory_info().rss \/ (1024**3)\n\n\ndef tic():\n   return time.perf_counter()\n\n\ndef toc(t0, msg):\n   print(f\"{msg}: {time.perf_counter()-t0:.3f}s | RAM~{ram_gb():.2f} GB\")\n\n\ndef report(G, name):\n   print(f\"\\n[{name}] nodes={G.numberOfNodes():,} edges={G.numberOfEdges():,} directed={G.isDirected()} weighted={G.isWeighted()}\")\n\n\ndef force_cleanup():\n   gc.collect()\n\n\nPRESET = \"LARGE\"\n\n\nif PRESET == \"LARGE\":\n   N = 120_000\n   M_ATTACH = 6\n   AB_EPS = 0.12\n   ED_RATIO = 0.9\nelif PRESET == \"XL\":\n   N = 250_000\n   M_ATTACH = 6\n   AB_EPS = 0.15\n   ED_RATIO = 0.9\nelse:\n   N = 80_000\n   M_ATTACH = 6\n   AB_EPS = 0.10\n   ED_RATIO = 0.9\n\n\nprint(f\"\\nPreset={PRESET} | N={N:,} | m={M_ATTACH} | approx-betweenness epsilon={AB_EPS}\")<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We set up the Colab environment with NetworKit and monitoring utilities, 
and we lock in a stable random seed. We configure thread usage to match the runtime and define timing and RAM-tracking helpers for each major stage. We choose a scale preset that controls graph size and approximation knobs so the pipeline stays large but manageable.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">t0 = tic()\nG = nk.generators.BarabasiAlbertGenerator(M_ATTACH, N).generate()\ntoc(t0, \"Generated BA graph\")\nreport(G, \"G\")\n\n\nt0 = tic()\ncc = nk.components.ConnectedComponents(G)\ncc.run()\ntoc(t0, \"ConnectedComponents\")\nprint(\"components:\", cc.numberOfComponents())\n\n\nif cc.numberOfComponents() &gt; 1:\n   t0 = tic()\n   G = nk.graphtools.extractLargestConnectedComponent(G, compactGraph=True)\n   toc(t0, \"Extracted LCC (compactGraph=True)\")\n   report(G, \"LCC\")\n\n\nforce_cleanup()<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We generate a large Barab\u00e1si\u2013Albert graph and immediately log its size and runtime footprint. We compute connected components to understand fragmentation and quickly diagnose topology. 
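<\/p>\n<p>The underlying idea is a simple flood fill over the adjacency structure; the sketch below shows it in plain Python on a toy graph, whereas NetworKit's <code>ConnectedComponents<\/code> and LCC extraction run the same logic in optimized C++:<\/p>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">def largest_component(adj):\n   # iterative DFS flood fill; returns the node set of the largest component\n   seen = set()\n   best = set()\n   for s in adj:\n      if s in seen:\n         continue\n      comp = {s}\n      seen.add(s)\n      stack = [s]\n      while stack:\n         u = stack.pop()\n         for v in adj[u]:\n            if v not in seen:\n               seen.add(v)\n               comp.add(v)\n               stack.append(v)\n      if len(comp) &gt; len(best):\n         best = comp\n   return best\n\n\n# two components: {0, 1, 2} and {3, 4}\ntoy = {0: {1}, 1: {0, 2}, 2: {1}, 3: {4}, 4: {3}}\n# largest_component(toy) -&gt; {0, 1, 2}<\/code><\/pre>\n<p>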
We extract the largest connected component and compact it to improve the rest of the pipeline\u2019s performance and reliability.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">t0 = tic()\ncore = nk.centrality.CoreDecomposition(G)\ncore.run()\ntoc(t0, \"CoreDecomposition\")\ncore_vals = np.array(core.scores(), dtype=np.int32)\nprint(\"degeneracy (max core):\", int(core_vals.max()))\nprint(\"core stats:\", pd.Series(core_vals).describe(percentiles=[0.5, 0.9, 0.99]).to_dict())\n\n\nk_thr = int(np.percentile(core_vals, 97))\n\n\nt0 = tic()\nnodes_backbone = [u for u in range(G.numberOfNodes()) if core_vals[u] &gt;= k_thr]\nG_backbone = nk.graphtools.subgraphFromNodes(G, nodes_backbone)\ntoc(t0, f\"Backbone subgraph (k&gt;={k_thr})\")\nreport(G_backbone, \"Backbone\")\n\n\nforce_cleanup()\n\n\nt0 = tic()\npr = nk.centrality.PageRank(G, damp=0.85, tol=1e-8)\npr.run()\ntoc(t0, \"PageRank\")\n\n\npr_scores = np.array(pr.scores(), dtype=np.float64)\ntop_pr = np.argsort(-pr_scores)[:15]\nprint(\"Top PageRank nodes:\", top_pr.tolist())\nprint(\"Top PageRank scores:\", pr_scores[top_pr].tolist())\n\n\nt0 = tic()\nabw = nk.centrality.ApproxBetweenness(G, epsilon=AB_EPS)\nabw.run()\ntoc(t0, \"ApproxBetweenness\")\n\n\nabw_scores = np.array(abw.scores(), dtype=np.float64)\ntop_abw = np.argsort(-abw_scores)[:15]\nprint(\"Top ApproxBetweenness nodes:\", top_abw.tolist())\nprint(\"Top 
ApproxBetweenness scores:\", abw_scores[top_abw].tolist())\n\n\nforce_cleanup()<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We compute the core decomposition to measure degeneracy and identify the network\u2019s high-density backbone. We extract a backbone subgraph using a high core-percentile threshold to focus on structurally important nodes. We run PageRank and approximate betweenness to rank nodes by influence and bridge-like behavior at scale.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">t0 = tic()\nplm = nk.community.PLM(G, refine=True, gamma=1.0, par=\"balanced\")\nplm.run()\ntoc(t0, \"PLM community detection\")\n\n\npart = plm.getPartition()\nnum_comms = part.numberOfSubsets()\nprint(\"communities:\", num_comms)\n\n\nt0 = tic()\nQ = nk.community.Modularity().getQuality(part, G)\ntoc(t0, \"Modularity\")\nprint(\"modularity Q:\", Q)\n\n\nsizes = np.array(list(part.subsetSizeMap().values()), dtype=np.int64)\nprint(\"community size stats:\", pd.Series(sizes).describe(percentiles=[0.5, 0.9, 0.99]).to_dict())\n\n\nt0 = tic()\neff = nk.distance.EffectiveDiameter(G, ED_RATIO)\neff.run()\ntoc(t0, f\"EffectiveDiameter (ratio={ED_RATIO})\")\nprint(\"effective diameter:\", eff.getEffectiveDiameter())\n\n\nt0 = tic()\ndiam = nk.distance.EstimatedDiameter(G)\ndiam.run()\ntoc(t0, \"EstimatedDiameter\")\nprint(\"estimated diameter:\", 
diam.getDiameter().distance)\n\n\nforce_cleanup()<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We detect communities using PLM and record the number of communities found on the large graph. We compute modularity and summarize community-size statistics to validate the structure rather than simply trusting the partition. We estimate global distance behavior using effective diameter and estimated diameter in an API-safe way for NetworKit 11.2.1.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">t0 = tic()\nsp = nk.sparsification.LocalSimilaritySparsifier(G, 0.7)\nG_sparse = sp.getSparsifiedGraph()\ntoc(t0, \"LocalSimilarity sparsification (alpha=0.7)\")\nreport(G_sparse, \"Sparse\")\n\n\nt0 = tic()\npr2 = nk.centrality.PageRank(G_sparse, damp=0.85, tol=1e-8)\npr2.run()\ntoc(t0, \"PageRank on sparse\")\npr2_scores = np.array(pr2.scores(), dtype=np.float64)\nprint(\"Top PR nodes (sparse):\", np.argsort(-pr2_scores)[:15].tolist())\n\n\nt0 = tic()\nplm2 = nk.community.PLM(G_sparse, refine=True, gamma=1.0, par=\"balanced\")\nplm2.run()\ntoc(t0, \"PLM on sparse\")\npart2 = plm2.getPartition()\nQ2 = nk.community.Modularity().getQuality(part2, G_sparse)\nprint(\"communities (sparse):\", part2.numberOfSubsets(), \"| modularity (sparse):\", Q2)\n\n\nt0 = tic()\neff2 = nk.distance.EffectiveDiameter(G_sparse, ED_RATIO)\neff2.run()\ntoc(t0, \"EffectiveDiameter on sparse\")\nprint(\"effective diameter 
(orig):\", eff.getEffectiveDiameter(), \"| (sparse):\", eff2.getEffectiveDiameter())\n\n\nforce_cleanup()\n\n\nout_path = \"\/content\/networkit_large_sparse.edgelist\"\nt0 = tic()\nnk.graphio.EdgeListWriter(\"t\", 0).write(G_sparse, out_path)\ntoc(t0, \"Wrote edge list\")\nprint(\"Saved:\", out_path)\n\n\nprint(\"nAdvanced large-graph pipeline complete.\")<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We sparsify the graph using local similarity to reduce the number of edges while retaining useful structure for downstream analytics. We rerun PageRank, PLM, and effective diameter on the sparsified graph to check whether key signals remain consistent. We export the sparsified graph as an edgelist so we can reuse it across sessions, tools, or additional experiments.<\/p>\n<p>In conclusion, we developed an end-to-end, scalable NetworKit workflow that mirrors real large-network analysis: we started from generation, stabilized the topology with LCC extraction, characterized the structure through cores and centralities, discovered communities and validated them with modularity, and captured global distance behavior through diameter estimates. We then applied sparsification to shrink the graph while keeping it analytically meaningful and saving it for repeatable pipelines. 
The tutorial provides a practical template we can reuse for real datasets by replacing the generator with an edgelist reader, while keeping the same analysis stages, performance tracking, and export steps.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out the\u00a0<strong><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\/blob\/main\/Data%20Science\/NetworKit_LargeGraph_Analytics_Pipeline_Marktechpost.ipynb\" target=\"_blank\" rel=\"noreferrer noopener\">Full Codes here<\/a>.\u00a0<\/strong>Also,\u00a0feel free to follow us on\u00a0<strong><a href=\"https:\/\/x.com\/intent\/follow?screen_name=marktechpost\" target=\"_blank\" rel=\"noreferrer noopener\"><mark>Twitter<\/mark><\/a><\/strong>\u00a0and don\u2019t forget to join our\u00a0<strong><a href=\"https:\/\/www.reddit.com\/r\/machinelearningnews\/\" target=\"_blank\" rel=\"noreferrer noopener\">120k+ ML SubReddit<\/a><\/strong>\u00a0and Subscribe to\u00a0<strong><a href=\"https:\/\/www.aidevsignals.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">our Newsletter<\/a><\/strong>. Wait! 
are you on telegram?\u00a0<strong><a href=\"https:\/\/t.me\/machinelearningresearchnews\" target=\"_blank\" rel=\"noreferrer noopener\">now you can join us on telegram as well.<\/a><\/strong><\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/03\/06\/a-production-style-networkit-11-2-1-coding-tutorial-for-large-scale-graph-analytics-communities-cores-and-sparsification\/\">A Production-Style NetworKit 11.2.1 Coding Tutorial for Large-Scale Graph Analytics, Communities, Cores, and Sparsification<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>In this tutorial, we implement&hellip;<\/p>\n","protected":false},"author":1,"featured_media":29,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-521","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/521","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=521"}],"version-history":[{"count":0,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/521\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/media\/29"}],"wp:attachment":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=521"}],"wp:term":[{"taxonomy":"category","em
beddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=521"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=521"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}