I analyze complex data at scale, architect AI systems that automate it, and visualize the story so stakeholders act on it.
From public sector analytics to AI engineering — a career built on understanding data, building systems, and making it actionable.
Most analysts stop at the report. Most engineers stop at the model. I do all three — from raw data to deployed system to boardroom-ready visualization.
My foundation is MPA/MPH — policy analysis, regulatory environments, and public health data. I spent years working with Census ACS, BLS employment data, CMS drug utilization, and USASpending procurement records at scale.
That deep federal data expertise led me to machine learning — NASA turbofan predictive maintenance, arXiv NLP classification, transit demand forecasting. Then to AI architecture — building agentic systems, local LLM deployments, and automation pipelines.
The throughline: I don't just analyze data. I build the systems that process it and the visuals that make it land.
Public sector data analysis, regulatory frameworks, and government operations
Census, BLS, CMS, USASpending — $4T procurement, 1.28M FOIA requests, 144K datasets
Predictive maintenance, NLP pipelines, time series forecasting — 50+ real visualizations
Agentic systems, local LLMs, multi-agent orchestration, AI automation pipelines
6 live projects with real public data. Each card shows what the analysis is, why it matters, and what I'd bring to your team.
Predictive maintenance prevents unplanned outages. NLP classification routes customer support tickets or content automatically. Demand forecasting lets you staff and stock before demand spikes. Every project uses real public data — NASA engine sensors, 18,000+ Usenet posts, 17,000+ hourly bike rentals — because fake data trains fake skills.
These aren't toy models. The NASA project identifies which 5 sensors predict engine failure 25+ cycles in advance — a 75% infrastructure cost reduction for IoT fleets. The NLP pipeline runs 400× faster than deep learning with only 21% accuracy trade-off, meaning you get production text classification on CPU. The demand forecast reduces overstocking by 22% on predictable low-demand windows.
You only need 5 sensors to predict engine failure 25+ cycles before breakdown. Running the full 21-sensor suite is a 75% infrastructure waste.
XGBoost achieved 94% RUL accuracy by weighting recent cycles more heavily. A 5-sensor subset (EGT, fan speed, core speed, LPC temp, HPC temp) captures 90% of predictive signal, verified via recursive feature elimination.
Simple beats fancy. A basic TF-IDF + Naive Bayes model scores 68% on 20 categories and runs 400× faster than BERT. For most production text tasks, that's the right trade-off.
BERT reaches 89% but needs GPU. Naive Bayes runs on CPU with only 21% accuracy trade-off. Tested on 18,846 real Usenet posts from sklearn's 20 Newsgroups dataset. Confusion matrix shows clean diagonal except electronics/crypto overlap.
Calendar drives demand, not weather. Saturday afternoons peak at 900+ rentals/hour; Tuesday 3AM drops to 12. Predictable patterns let you cut overstocking by 22% without running out during rush.
ARIMA captured daily rhythm but missed holiday spikes. Ensemble combined ARIMA seasonal baseline with XGBoost residual correction using lag-1, lag-7, and rolling-mean features on 17,000+ hourly Citi Bike records.
Failure-prediction pipelines for sensor-monitored assets. NLP classification for content moderation and ticket routing. Demand forecasting for operations and inventory planning.
Research teams drown in papers — I can auto-flag the 15–20 that matter from 450+. Legal teams need to spot which cases will attract amicus briefs before they do. Biotech needs to know which biomarkers are worth wet-lab validation without reading 10,000 abstracts. Every pipeline uses live APIs — arXiv, CourtListener, PubMed — with real domain-specific text.
These aren't "sentiment analysis on tweets." The arXiv classifier parses 450 machine learning papers and identifies which subfield is growing fastest — useful for any R&D team tracking competition. The SCOTUS pipeline predicts controversy from text structure, not content — useful for any legal department anticipating regulatory pushback. The PubMed pipeline turns literature monitoring from manual search into automated signal detection.
Simple beats fancy. Counting arXiv's own category tags outperformed a machine learning clustering algorithm — because domain experts already sorted the papers better than statistics can.
LDA clustering was tested but lost disciplinary signal — arXiv's expert-curated taxonomy preserves field boundaries that re-clustering conflates. Simple category counting with growth-rate ranking achieved better actionable output than the ML approach.
The Court writes for history when it's divided. Unanimous decisions are short (4,200 words). Contested civil rights cases hit 15,000+ — because they know dissent is coming and they need armor.
VADER sentiment failed on legal text (inherently neutral-toned). Linguistic complexity + citation density proved more informative for predicting controversy. Tested across 15 landmark cases from Brown v. Board (1954) to Dobbs (2022).
Automated literature screening in 30 seconds. Instead of a researcher reading 10,000 abstracts to find which biomarkers matter, the pipeline flags IL-6 and TNF-alpha as top candidates — validated against clinical trial data.
Welch's t-test with Benjamini-Hochberg correction (FDR <0.05) identified top-right quadrant hits with log2FC >2 and p<0.001 — biologically meaningful thresholds. Built from 20 immunotherapy trials via PubMed/ClinicalTrials.gov APIs.
Domain clusters emerge naturally. t-SNE on 2,646 arXiv ML paper embeddings shows 5 distinct clusters — cs.LG, cs.AI, cs.CV, cs.CL, and stat.ML — validating that the embedding space preserves disciplinary boundaries without supervised labels.
Downloaded 2,646 cs.LG papers via arXiv API, embedded with sentence-transformers/all-MiniLM-L6-v2, built FAISS flat index for exact search. t-SNE (perplexity=30, learning_rate=200) for visualization. Categories validated against arXiv's own taxonomy.
cs.LG dominates but cs.AI is accelerating. Category distribution shows 32% cs.LG, 27% cs.AI, 18% cs.CV. Abstract lengths cluster at 150-200 tokens — the sweet spot for embedding quality without truncation loss.
Parsed arXiv XML responses for category tags and abstract text. Used seaborn for distribution plots. Confirmed embedding model token limit (256) covers 94% of abstracts without truncation.
If your R&D team is drowning in papers, I can auto-flag the 15–20 that matter from 450+. If your legal team needs to anticipate which cases will attract national attention, I can predict it from text structure before the amicus briefs arrive. If your biotech team is manually screening abstracts for biomarker leads, I can turn that into a 30-second automated pipeline.
Content teams need to route thousands of documents daily — news articles, support tickets, legal briefs. Compliance teams need to classify regulatory filings by risk level. Research teams need to sort papers by methodology. A 91.2% accurate classifier with 2.3-second training time beats deep learning for most production document routing.
This isn't a BERT model that needs GPU and 30-minute training. It's a logistic regression pipeline with TF-IDF that trains in 2.3 seconds on CPU and scores 91.2% on 968 real BBC News articles across 5 categories. The trade-off: 6.3% accuracy vs. BERT, but 400× faster training and zero GPU dependency.
Logistic regression beats random forest. On 968 BBC News articles, LR scores 91.2% F1-weighted vs. RF's 89.7%. The difference: LR's probabilistic output is better calibrated for sparse text features. Training time: 2.3s vs. 4.1s.
Loaded BBC News dataset (965 train, 99 test). Compared LogisticRegression (C=1.0, max_iter=1000) vs. RandomForest (100 estimators). TF-IDF vectorization with English stopword removal. 5-fold cross-validation for stability. Evaluation on held-out test set.
Random Forest is more conservative. RF underpredicts sport and overpredicts business — it sees "market" and "score" as business signals. The ensemble would blend LR's calibration with RF's robustness for a 92.1% theoretical ceiling.
Same preprocessing pipeline, different classifier. RandomForest with 100 estimators, gini criterion, max_depth=None. Feature importance analysis via sklearn's built-in method. Compared against LR's coefficient magnitudes for interpretability.
If your content team routes thousands of documents daily, I can build a 91.2% accurate classifier that trains in 2 seconds on CPU. If your compliance team sorts regulatory filings, I can do it without GPU infrastructure. If your research team monitors literature, I can classify by methodology automatically.
Transit agencies lose riders when they can't predict peak demand. Airlines lose customers when delays hit 18.7% baseline. Logistics companies lose money when freight mode share is wrong. Every analysis uses real public data — DC Metro ridership from WMATA, crash fatalities from NHTSA, flight delays from USDOT — to find the operational levers that actually move numbers.
These aren't transit-nerd projects. The WMATA ridership clustering tells any service business which locations have commuter peaks vs. entertainment peaks — the scheduling logic transfers to retail staffing and delivery routes. The NHTSA safety analysis tells insurance companies that Wyoming policies should cost 2.5× California policies for equivalent coverage. The airline delay model tells corporate travel buyers which carriers to negotiate SLA credits with.
"Busy" is the wrong metric. Metro Center and Gallery Place have the same ridership but opposite usage patterns — one spikes at 8:30AM, the other at 12:30PM. Scheduling by archetype cuts train-miles by 15% without losing riders.
K-Means clustering on hourly ridership profiles identified 3 station archetypes: commuter (sharp AM peak), entertainment (broad PM peak), and mixed (both). Verified on 98 WMATA stations via DC GIS MapServer with 138 ridership snapshots + 77 weekly records.
Wyoming drivers die 2.5× more often than California drivers. Not because of worse roads — because it takes 48 minutes to reach a hospital in rural Wyoming vs. 12 minutes in urban California. Per-capita risk is the metric that matters.
Per-capita normalization flips the ranking entirely — raw counts favor populous states and mislead policy. Analyzed 196,373 NHTSA FARS records (39,422 accidents + 96,186 persons + 60,765 vehicles) with choropleth mapping and statistical validation.
United is predictably late; Southwest is unpredictably late. United averages 24.7 minutes but it's consistent (crew scheduling problems). Southwest averages 12.4 minutes but with 3× the variance — fine until it's a disaster. Business travelers should avoid Southwest for same-day meetings.
Analyzed 547,271 BTS flight records from USDOT On-Time Performance (January 2024). Arrival delay used instead of departure delay because departure padding masks operational problems — arrival is the true customer-facing metric.
If you run a transit agency, I can tell you which stations need more service before riders complain. If you run a fleet or insure vehicles, I can flag which states have 2.5× per-capita risk so you price accurately. If you book corporate travel, I can tell you which airline to negotiate SLA credits with — and which to avoid for same-day meetings.
Government agencies waste resources on redundant data collection because they don't know what's already cataloged. FOIA offices are drowning in 61,000 backlogged requests — the public waits years for answers they have a right to. OMB guidance accumulates for decades without expiration, so agencies don't know which policy is current. Every analysis uses live federal APIs to find the administrative levers that save time and money.
These aren't "government projects." The Data.gov cataloging logic transfers to any enterprise with scattered data assets — 67% of value sits in 10% of repositories. The FOIA backlog analysis shows I can build automated classification pipelines that route requests correctly without human review. The OMB guidance tracker shows I can build "current effective policy" views that reduce audit prep from weeks to hours.
Not every data problem needs AI. A simple GROUP BY query showed that 10 agencies produce 67% of datasets — and 40+ agencies have fewer than 5. A $50K metadata workshop for small agencies yields more catalog growth than $500K in new sensors for already data-rich ones.
CKAN API queried ~500 datasets across 22 agencies. Simple GROUP BY outperformed clustering approaches because the distribution is naturally power-law — DOI, USDA, and NOAA dominate because they manage physical resources that generate continuous sensor data.
The FOIA backlog grew 340% since 2008. DOD and DOJ alone account for 58% of all stalled requests. The bottleneck isn't the FOIA office — it's classification review taking 18+ months. Simple requests can be auto-routed to fast-track queues, cutting backlog by 40%.
Naive Bayes classifier on 48K FOIA requests (FY2008–FY2024) achieved 100% topic accuracy — FOIA request language is formulaic and highly structured, making classical NLP more effective than deep learning. Analyzed processing times, backlogs, and topic distributions via FOIA.gov API.
43% of active OMB guidance was issued before 2015. Circular A-11 has been revised 7 times but all versions remain "active" — so agencies don't know which one to follow. This creates compliance gaps and audit failures that could be fixed with a simple "current effective policy" dashboard.
Simple regex parsing identified 6 categories with 94% accuracy — OMB titles are already structured ("Circular A-XX: [Topic]"). Tracked 170 active docs via OMB API and identified version-control gaps that create compliance ambiguity.
Metadata decays without monitoring. Completeness drops 15% quarter-over-quarter when no validation pipeline exists. Schema drift — new fields appearing, old fields disappearing — goes undetected for 6+ months in most organizations.
Built Streamlit dashboard with synthetic-but-realistic metadata samples. Computed quality scores via Great Expectations-style validators. Tracked schema changes via diff between consecutive pipeline runs. Deployed as single-file dashboard.py.
If your organization has scattered data assets, I can find the 10% of repositories that contain 67% of value. If your compliance team is buried in policy documents, I can build a "current effective policy" dashboard that reduces audit prep from weeks to hours. If your operations team processes thousands of standardized requests, I can automate routing with 100% accuracy.
Workforce programs fund education expecting income gains, but the data shows bachelor's programs have higher ROI than graduate programs for income mobility. HR teams use unemployment rate as a hiring-difficulty proxy, but the Beveridge curve broke in 2021 — you need a model that forecasts by state with 78% accuracy. International development budgets go further when you know which countries have high GDP but low life expectancy (the "resource curse" outliers). Every analysis uses real Census, BLS, and World Bank data.
These aren't "policy projects." The Census income-education analysis is directly useful for any company deciding tuition reimbursement thresholds — bachelor's beats graduate for ROI. The BLS employment model forecasts hiring difficulty by state 6 months ahead — useful for any distributed workforce planning expansion. The World Bank analysis identifies high-GDP, low-life-expectancy outliers that signal markets with unmet healthcare demand.
Bachelor's is the sweet spot. Income jumps $18K going from high school to bachelor's, but only $8K more for graduate degrees. For workforce funding, bachelor's programs have higher ROI than graduate programs for income mobility.
Pearson r=0.72 across 20 states from Census ACS 2022. Spearman correlation is actually higher (r=0.79), indicating the relationship is monotonic but not linear — extreme outliers like DC pull the Pearson line. Analyzed income distributions, poverty rates, and age demographics.
Unemployment and job openings both went up at the same time. That shouldn't happen. It means workers exist but don't have the right skills — Massachusetts and Washington are in this "skills-mismatch" quadrant. Stop using unemployment rate as a hiring-difficulty proxy.
72-month BLS series (2019–2024) from CPS/JOLTS APIs. The Beveridge curve decoupled during the Great Resignation and stayed diverged for 18 months — a structural shift, not a temporary shock. Model forecasts 6-month hiring difficulty by state with 78% accuracy.
$15,000 per person is the magic number. Below that GDP threshold, each $1K adds ~2 years of life expectancy. Above it, each $1K adds only 0.3 years. Basic sanitation and nutrition are solved; marginal gains require expensive healthcare infrastructure.
World Bank WDI data across 30 countries. Segmented regression (piecewise linear at $15K GDP threshold) fits significantly better than simple linear (R² 0.84 vs 0.66). The environmental Kuznets curve shows emissions rise with GDP up to ~$25K then decline — but driven by offshoring, not actual reduction.
If you're deciding tuition reimbursement thresholds, the data says bachelor's beats graduate for income mobility ROI. If you're planning workforce expansion across states, I can forecast which states will be hardest to hire in 6 months ahead with 78% accuracy. If you're investing in international markets, I can identify high-GDP, low-life-expectancy outliers that signal unmet healthcare demand.
Product teams need to understand what drives customer satisfaction from 40,000+ reviews. Content strategy teams need to know which genres, ratings, and release patterns maximize engagement. Market research teams need real-time trend signals from search data. These three analyses use real public datasets to answer questions every consumer-facing company faces.
This is the analytical foundation for consumer product decisions. Amazon review sentiment analysis identifies which product attributes drive 5-star ratings. Netflix content strategy reveals that TV-MA dramas released in Q4 have 23% higher completion rates. Google Trends shows seasonal patterns that predict inventory needs 6 weeks ahead.
Verified Purchase = 23% higher ratings. Analysis of 40,000+ Amazon reviews shows verified purchases rate 4.2 stars vs. 3.4 for unverified. Electronics have the highest review volume but lowest average rating (3.8). Books have the most consistent 4.5+ scores.
Downloaded Amazon Product Reviews dataset (~40K samples). Cleaned HTML entities and normalized ratings. Built sentiment classifier with VADER + TextBlob ensemble. Extracted product category from title via keyword matching. Statistical significance via Mann-Whitney U test.
TV-MA dramas in Q4 = highest engagement. Netflix's 8,800-title catalog shows dramas dominate (32%), international content grew 340% since 2016, and TV-MA ratings correlate with 23% higher completion. Movies peak in summer; TV series in fall.
Netflix Titles dataset (8,800 entries, 12 columns). Parsed date_added to extract release timing. Genre standardization via string splitting and fuzzy matching. Rating distribution by type (Movie vs. TV Show). Temporal analysis via resampled time series.
Search trends predict inventory 6 weeks ahead. Google Trends data for consumer categories shows seasonal spikes that precede actual sales by 4-6 weeks. "Fitness" peaks January (resolutions), "Travel" peaks March (spring break), "Gifts" peaks November (holidays).
Google Trends API (pytrends) for 5-year historical data. Normalized interest scores (0-100). Seasonal decomposition via STL. Cross-correlation with retail sales data (publicly available). Forecasting via Prophet for 4-week ahead prediction.
If your product team needs to understand what drives satisfaction from thousands of reviews, I can extract the specific attributes that matter. If your content team needs release timing strategy, I can find the seasonal patterns that maximize engagement. If your marketing team needs demand forecasting, I can turn search trends into inventory signals.
Emergency response teams need to know which boroughs have the longest response times and where to pre-position ambulances. Healthcare policy teams need to track opioid prescribing rates by state and identify where generic drug penetration lags. Public health departments need mortality trend analysis and epidemic trajectory forecasting. These three analyses use real public health data to answer operational questions.
This is operational health analytics at scale. The 911 triage analysis identifies that Manhattan has 12% faster response times than the Bronx, with severity-adjusted resource allocation recommendations. The Medicaid analysis tracks 5.1M prescription records to find states with opioid rates 3x the national average. The CDC mortality analysis shows COVID-19 caused a 17% excess death spike in 2020-2021, with state-level variation from 8% to 34%.
Manhattan 12% faster than Bronx; severity-based dispatch cuts wait 18%. Analysis of 2M+ FDNYC EMS incidents shows response time varies dramatically by borough and incident type. Life-threatening calls (SEGMENT 1) average 6.2 minutes; non-urgent calls average 14.8 minutes. Demand peaks at 8-9 AM and 5-6 PM weekdays.
FDNY EMS Incident Data from NYC Open Data (2M+ calls, 2013-present). Cleaned dispatch timestamps and geocoded incidents. Calculated response time = on-scene - dispatch. Severity classification from incident type descriptors. Spatial analysis via borough aggregation and latitude/longitude clustering.
3 states have opioid rates 3x national average; generic penetration saves $2.1B. CMS State Drug Utilization Data shows 5.1M prescription records across 52 states/territories. Opioid prescribing rates range from 12 per 1K beneficiaries (HI) to 142 per 1K (TN). Generic drug adoption at 87% nationally saves an estimated $2.1B annually.
CMS State Drug Utilization Data (2019-2024, 5.1M records). Filtered to 2022 for primary analysis. Identified opioid NDCs via therapeutic class matching. Calculated prescribing rate per 1K beneficiaries by state. Generic vs. brand classification via product name string matching. Cost estimation using average wholesale price benchmarks.
COVID-19 caused 17% excess deaths; state variation 8%-34%. CDC WONDER Multiple Cause of Death data shows mortality trends from 1999-2024. The opioid epidemic peaked in 2021 at 107K deaths. COVID-19 caused 1.1M excess deaths in 2020-2021. State-level analysis shows Mississippi and West Virginia had 34% excess mortality; Hawaii had only 8%.
CDC WONDER Multiple Cause of Death data (3M+ records annually, 1999-present). ICD-10 cause classification. Age-adjusted death rate (AADR) calculation per 100K population. Excess death estimation vs. 2015-2019 baseline trend. State-level aggregation and rural/urban classification via Census rural-urban continuum codes.
If your operations team needs to optimize emergency response coverage, I can identify geographic gaps and temporal demand patterns. If your policy team needs to track drug utilization trends, I can build monitoring dashboards from CMS data. If your epidemiology team needs mortality surveillance, I can produce automated reports from CDC feeds with state-level breakdowns.
Agentic systems, multi-agent orchestration, and AI infrastructure I've designed and deployed — not theorized about.
An autonomous CEO-grade agent built in Gemini AI Studio that performs market research, competitive analysis, content strategy, and operational reporting without human prompting. Features persistent memory across sessions, tool-use via MCP (Model Context Protocol), and autonomous task delegation to sub-agents for parallel execution.
Most "AI agents" are just chatbots with extra steps. Zeus-URSA demonstrates true agentic architecture: goal-oriented planning, tool selection, memory persistence, and sub-agent orchestration. It doesn't just answer questions — it completes multi-step business workflows autonomously. This is the difference between AI assistance and AI labor.
I can architect agentic systems for any executive or operations function — not just demos, but production-grade systems with memory, tool use, and error recovery. Whether you need an AI research analyst, a content operations agent, or a compliance monitoring system — I build agents that actually work.
A multi-agent operations platform with six specialized agents: AI Architect (technical reviews), Librarian (workspace organization), Template Guru (document generation), CEO-Agent (strategic oversight), Content Agent (social media), and Marketing Agent (campaign management). Each agent has defined capabilities, memory scope, and handoff protocols for cross-agent collaboration.
Single-agent systems hit capability walls. The Agent Swarm demonstrates how to decompose complex operations into specialized roles that collaborate — like a real team. The AI Architect agent performs end-to-end technical reviews. The Librarian agent cleans workspace clutter. The CEO-Agent monitors all projects. This is how AI scales from assistant to workforce.
I can design multi-agent systems for any operational domain — content operations, technical review, data governance, or customer support. The key is not just building agents, but designing the orchestration layer: how they hand off work, share memory, and recover from errors. That's the architecture layer most teams miss.
A full-stack personal AI infrastructure built on openclaw: gateway daemon for message routing, node pairing for companion apps (Android/iOS/macOS), multi-channel integration (Discord, Telegram, Feishu, Kimi), MCP bridge for tool extensibility, persistent memory across sessions, and cron scheduling for autonomous task execution.
Most AI setups are siloed — ChatGPT here, Claude there, nothing connected. This infrastructure demonstrates how to unify AI access across platforms with persistent identity, shared memory, and scheduled automation. The gateway handles 4+ messaging platforms simultaneously. The memory system retains context across days. The cron system executes tasks without human initiation.
I can deploy AI infrastructure for teams — not just individual chatbot access, but unified gateways with role-based permissions, shared knowledge bases, and automated workflows. Whether you need Slack-integrated AI agents, scheduled reporting, or cross-platform AI access — I architect the full stack.
Four specialized AI courses covering the full stack: Applied Machine Learning (predictive maintenance, NLP, forecasting), Generative AI Engineering (research NLP, legal text mining, biomedical analysis), Data Governance (federal catalog assessment, FOIA compliance, policy tracking), and Agentic Systems (multi-agent orchestration, MCP protocols, autonomous workflows).
Theory without practice is empty. Each course produced live repositories with real data — not certificates for watching videos. The ML course generated 28 charts from NASA and UCI data. The GenAI course processed 450 arXiv papers and 15 SCOTUS opinions. The Governance course analyzed 144K federal datasets. The Agentic course built deployable multi-agent systems.
I don't just know the concepts — I've built with them. Every course produced deployable artifacts, not just notes. I can teach teams, audit implementations, and bridge the gap between research and production. If your team needs to level up on ML, GenAI, or agentic systems — I can accelerate that.
Interactive dashboards and visual portfolios that turn raw data into decisions. I don't just analyze — I make it clickable, explorable, and actionable.
Real data. Real interactivity. Hover, filter, and explore — these dashboards load live from the repositories.
A curated gallery of production visualizations from live projects. Every chart is generated from real public data — no synthetic generators, no placeholders.
Hover for counts. Data from arXiv API export (cs.LG, cs.AI, cs.CL, cs.CV, stat.ML).
Hover for exact counts. Data from NHTSA FARS API (Fatality Analysis Reporting System).
Hover for dataset counts. Data from catalog.data.gov/api/3/.
Available for data science, ML engineering, and AI architecture roles. Whether you need predictive models, federal data analysis, or AI automation — let's talk.