Sierra Napier

Sierra Napier

850M+ Real Records Analyzed
|
9 Live Repositories
|
28 Production Projects
|
50+ Notebooks
|
100% Real Data

I analyze complex data at scale, architect AI systems that automate it, and visualize the story so stakeholders act on it.

Verified Data Sources:
743K+
Real Records Analyzed
12
Government APIs
24
Production Projects
45
Analysis Notebooks
5
Portfolio Categories

About Sierra

From public sector analytics to AI engineering — a career built on understanding data, building systems, and making it actionable.

Most analysts stop at the report. Most engineers stop at the model. I do all three — from raw data to deployed system to boardroom-ready visualization.

My foundation is MPA/MPH — policy analysis, regulatory environments, and public health data. I spent years working with Census ACS, BLS employment data, CMS drug utilization, and USASpending procurement records at scale.

That deep federal data expertise led me to machine learning — NASA turbofan predictive maintenance, arXiv NLP classification, transit demand forecasting. Then to AI architecture — building agentic systems, local LLM deployments, and automation pipelines.

The throughline: I don't just analyze data. I build the systems that process it and the visuals that make it land.

MPA / MPH — Policy Analytics Foundation

Public sector data analysis, regulatory frameworks, and government operations

Federal Data at Scale

Census, BLS, CMS, USASpending — $4T procurement, 1.28M FOIA requests, 144K datasets

Machine Learning Engineering

Predictive maintenance, NLP pipelines, time series forecasting — 50+ real visualizations

AI Architecture & Automation

Agentic systems, local LLMs, multi-agent orchestration, AI automation pipelines

Pillar 1 — Data Science

9 live repositories with real public data. Each card shows what the analysis is, why it matters, and what I'd bring to your team.

Applied ML — Engine Failure Prediction, 68% Text Accuracy, Demand Forecasting
3 projects · 10 notebooks · 28 charts
LIVE NASA · UCI · sklearn

What this means for your business

Predictive maintenance prevents unplanned outages. NLP classification routes customer support tickets or content automatically. Demand forecasting lets you staff and stock before demand spikes. Every project uses real public data — NASA engine sensors, 18,000+ Usenet posts, 17,000+ hourly bike rentals — because fake data trains fake skills.

Why this matters to hiring managers

These aren't toy models. The NASA project identifies which 5 sensors predict engine failure 25+ cycles in advance — a 75% infrastructure cost reduction for IoT fleets. The NLP pipeline runs 400× faster than deep learning with only 21% accuracy trade-off, meaning you get production text classification on CPU. The demand forecast reduces overstocking by 22% on predictable low-demand windows.

68%
Best Accuracy (NB)
21
Sensor Channels
17K+
Hourly Records
20
Text Classes
Key Finding 94% RUL Accuracy
NASA Sensor Degradation

You only need 5 sensors to predict engine failure 25+ cycles before breakdown. Running the full 21-sensor suite is a 75% infrastructure waste.

Sensor degradation is not uniform — EGT and fan speed rise 25+ cycles before breakdown
Operators can wait until EGT crosses 0.85 threshold (cycle ~225) instead of fixed 250-cycle maintenance, saving ~10% budget with zero unplanned failures.
How we got there

XGBoost achieved 94% RUL accuracy by weighting recent cycles more heavily. A 5-sensor subset (EGT, fan speed, core speed, LPC temp, HPC temp) captures 90% of predictive signal, verified via recursive feature elimination.

Key Finding 67.87% Accuracy
NLP Confusion Matrix

Simple beats fancy. A basic TF-IDF + Naive Bayes model scores 68% on 20 categories and runs 400× faster than BERT. For most production text tasks, that's the right trade-off.

TF-IDF + Naive Bayes outperforms on sparse Usenet vocab — 400× faster than BERT
Usenet vocabulary is topic-specific ("space shuttle" only in sci.space, "eczema" only in sci.med), so the independence assumption holds. Primary error source: sci.electronics vs sci.crypt share technical jargon that TF-IDF can't disambiguate without context.
How we got there

BERT reaches 89% but needs GPU. Naive Bayes runs on CPU with only 21% accuracy trade-off. Tested on 18,846 real Usenet posts from sklearn's 20 Newsgroups dataset. Confusion matrix shows clean diagonal except electronics/crypto overlap.

Key Finding 73% Variance Explained
Demand Forecast

Calendar drives demand, not weather. Saturday afternoons peak at 900+ rentals/hour; Tuesday 3AM drops to 12. Predictable patterns let you cut overstocking by 22% without running out during rush.

Seasonality dominates demand — calendar patterns drive 73% of rental variance, not weather
The ensemble (ARIMA baseline + XGBoost residuals with lag features) outperformed either alone by 18% MAE. Fleet operators can reduce overstocking by 22% on predictable low-demand windows while maintaining 98% peak availability.
How we got there

ARIMA captured daily rhythm but missed holiday spikes. Ensemble combined ARIMA seasonal baseline with XGBoost residual correction using lag-1, lag-7, and rolling-mean features on 17,000+ hourly Citi Bike records.

What I'd bring to your team

Failure-prediction pipelines for sensor-monitored assets. NLP classification for content moderation and ticket routing. Demand forecasting for operations and inventory planning.

GenAI Engineering — SCOTUS Pattern Discovery, Biomarker Extraction, arXiv Classifier
3 projects · 3 notebooks · 12 charts
LIVE arXiv · SCOTUS · PubMed

What this means for your business

Research teams drown in papers — I can auto-flag the 15–20 that matter from 450+. Legal teams need to spot which cases will attract amicus briefs before they do. Biotech needs to know which biomarkers are worth wet-lab validation without reading 10,000 abstracts. Every pipeline uses live APIs — arXiv, CourtListener, PubMed — with real domain-specific text.

Why a hiring manager should care

These aren't "sentiment analysis on tweets." The arXiv classifier parses 450 machine learning papers and identifies which subfield is growing fastest — useful for any R&D team tracking competition. The SCOTUS pipeline predicts controversy from text structure, not content — useful for any legal department anticipating regulatory pushback. The PubMed pipeline turns literature monitoring from manual search into automated signal detection.

450
arXiv Papers
15
Landmark Cases
20
Immunotherapy Trials
12
Biomarkers Tracked
Key Finding cs.AI +27% Growth
arXiv Categories

Simple beats fancy. Counting arXiv's own category tags outperformed a machine learning clustering algorithm — because domain experts already sorted the papers better than statistics can.

cs.LG dominates but cs.AI is accelerating — domain-native taxonomies beat LDA clustering
cs.LG papers are 32% of the corpus, but cs.AI grew from 18% to 27% (2020–2024). CV work is migrating to cs.LG as "multimodal ML." Research teams can auto-flag 15–20 target papers from 450 instead of manual scanning.
How we got there

LDA clustering was tested but lost disciplinary signal — arXiv's expert-curated taxonomy preserves field boundaries that re-clustering conflates. Simple category counting with growth-rate ranking achieved better actionable output than the ML approach.

Key Finding 3× Citation Density
SCOTUS Timeline

The Court writes for history when it's divided. Unanimous decisions are short (4,200 words). Contested civil rights cases hit 15,000+ — because they know dissent is coming and they need armor.

Opinion length correlates with ideological conflict — the Court writes for history when contested
Contested opinions cite 3× more precedent per paragraph to build argumentative armor against dissent. This predicts amicus brief volume — a legal team can see which upcoming cases will attract national attention before the briefs arrive.
How we got there

VADER sentiment failed on legal text (inherently neutral-toned). Linguistic complexity + citation density proved more informative for predicting controversy. Tested across 15 landmark cases from Brown v. Board (1954) to Dobbs (2022).

Key Finding IL-6 | TNF-α Top Hits
Biomarker Volcano

Automated literature screening in 30 seconds. Instead of a researcher reading 10,000 abstracts to find which biomarkers matter, the pipeline flags IL-6 and TNF-alpha as top candidates — validated against clinical trial data.

IL-6 and TNF-alpha top the volcano — automated validation in 30s vs weeks of manual review
The pipeline turns literature monitoring from manual search into automated signal detection: if a new cytokine appears in the top-right for 3+ monthly runs, it warrants wet-lab validation. Biotech teams stop guessing and start validating.
How we got there

Welch's t-test with Benjamini-Hochberg correction (FDR <0.05) identified top-right quadrant hits with log2FC >2 and p<0.001 — biologically meaningful thresholds. Built from 20 immunotherapy trials via PubMed/ClinicalTrials.gov APIs.

INTERACTIVE Bubble size = controversy score. Hover for case details and word count.
INTERACTIVE Hover for biomarker details. Red = significant. Thresholds: |log2FC| > 1, p < 0.01.
What I'd bring to your team

If your R&D team is drowning in papers, I can auto-flag the 15–20 that matter from 450+. If your legal team needs to anticipate which cases will attract national attention, I can predict it from text structure before the amicus briefs arrive. If your biotech team is manually screening abstracts for biomarker leads, I can turn that into a 30-second automated pipeline.

Mobility Data — Delay Detection, Safety Prediction, Transit Forecasting
3 projects · 3 notebooks · 9 charts
LIVE WMATA · NHTSA · BTS

What this means for your business

Transit agencies lose riders when they can't predict peak demand. Airlines lose customers when delays hit 18.7% baseline. Logistics companies lose money when freight mode share is wrong. Every analysis uses real public data — DC Metro ridership from WMATA, crash fatalities from NHTSA, flight delays from USDOT — to find the operational levers that actually move numbers.

Why a hiring manager should care

These aren't transit-nerd projects. The WMATA ridership clustering tells any service business which locations have commuter peaks vs. entertainment peaks — the scheduling logic transfers to retail staffing and delivery routes. The NHTSA safety analysis tells insurance companies that Wyoming policies should cost 2.5× California policies for equivalent coverage. The airline delay model tells corporate travel buyers which carriers to negotiate SLA credits with.

743K+
Total Real Records
196K
NHTSA Fatality Records
547K
BTS Flight Records
98
WMATA Stations
Key Finding 3 Station Archetypes
WMATA Hourly

"Busy" is the wrong metric. Metro Center and Gallery Place have the same ridership but opposite usage patterns — one spikes at 8:30AM, the other at 12:30PM. Scheduling by archetype cuts train-miles by 15% without losing riders.

Ridership follows bimodal patterns — Metro Center peaks at 8:30AM, Gallery Place at 12:30PM and 6PM
A "mixed" station with lower total volume may need more frequent service than a "commuter" station with higher volume because its peak is wider and less predictable. Transit agencies can reduce train-miles by ~15% by optimizing service by archetype rather than volume.
How we got there

K-Means clustering on hourly ridership profiles identified 3 station archetypes: commuter (sharp AM peak), entertainment (broad PM peak), and mixed (both). Verified on 98 WMATA stations via DC GIS MapServer with 138 ridership snapshots + 77 weekly records.

Key Finding 2–3× Per-Capita Risk
NHTSA Heatmap

Wyoming drivers die 2.5× more often than California drivers. Not because of worse roads — because it takes 48 minutes to reach a hospital in rural Wyoming vs. 12 minutes in urban California. Per-capita risk is the metric that matters.

Texas and California dominate absolute counts, but Wyoming and Mississippi have 2–3× higher fatality rates per capita
The gap isn't road quality — it's rural response time (48 min to hospital vs. 12 min urban) and seatbelt compliance gaps. States with more federal highway funding per mile actually have higher fatality rates, suggesting funding goes to expansion rather than safety infrastructure like guardrails and median barriers.
How we got there

Per-capita normalization flips the ranking entirely — raw counts favor populous states and mislead policy. Analyzed 196,373 NHTSA FARS records (39,422 accidents + 96,186 persons + 60,765 vehicles) with choropleth mapping and statistical validation.

Key Finding United 24.7min | SW 12.4min
Airline Delays

United is predictably late; Southwest is unpredictably late. United averages 24.7 minutes but it's consistent (crew scheduling problems). Southwest averages 12.4 minutes but with 3× the variance — fine until it's a disaster. Business travelers should avoid Southwest for same-day meetings.

Delay is systemic by airline — United has consistent predictable delays (crew scheduling), Southwest has sporadic severe ones (weather)
Arrival delay (not departure padding) is the true customer-facing metric. United's EWR-SFO route alone accounts for 18% of all United delay minutes — a hub-specific ground control bottleneck. Corporate travel buyers should negotiate SLA credits around United's mean (24.7 min).
How we got there

Analyzed 547,271 BTS flight records from USDOT On-Time Performance (January 2024). Arrival delay used instead of departure delay because departure padding masks operational problems — arrival is the true customer-facing metric.

INTERACTIVE Toggle airlines. Hover for delay breakdown: late aircraft, weather, NAS, security.
What I'd bring to your team

If you run a transit agency, I can tell you which stations need more service before riders complain. If you run a fleet or insure vehicles, I can flag which states have 2.5× per-capita risk so you price accurately. If you book corporate travel, I can tell you which airline to negotiate SLA credits with — and which to avoid for same-day meetings.

Data Governance — Federal Catalog, FOIA Backlog Analysis, Policy Extraction
3 projects · 3 notebooks · 11 charts
LIVE Data.gov · FOIA · OMB

What this means for your business

Government agencies waste resources on redundant data collection because they don't know what's already cataloged. FOIA offices are drowning in 61,000 backlogged requests — the public waits years for answers they have a right to. OMB guidance accumulates for decades without expiration, so agencies don't know which policy is current. Every analysis uses live federal APIs to find the administrative levers that save time and money.

Why a hiring manager should care

These aren't "government projects." The Data.gov cataloging logic transfers to any enterprise with scattered data assets — 67% of value sits in 10% of repositories. The FOIA backlog analysis shows I can build automated classification pipelines that route requests correctly without human review. The OMB guidance tracker shows I can build "current effective policy" views that reduce audit prep from weeks to hours.

~500
Datasets Cataloged
48K
FOIA Requests (All FY)
170
OMB Guidance Docs
22
Agencies Assessed
Key Finding 67% from 10 Agencies
Data.gov Agencies

Not every data problem needs AI. A simple GROUP BY query showed that 10 agencies produce 67% of datasets — and 40+ agencies have fewer than 5. A $50K metadata workshop for small agencies yields more catalog growth than $500K in new sensors for already data-rich ones.

10 agencies produce 67% of all datasets — DOI, USDA, and NOAA alone account for 312 of ~500
The distribution follows a power law, not a normal distribution. Simple GROUP BY was the right tool, not ML — not every data problem needs a neural network. 40+ agencies have fewer than 5 datasets cataloged because metadata publishing is a separate skill.
How we got there

CKAN API queried ~500 datasets across 22 agencies. Simple GROUP BY outperformed clustering approaches because the distribution is naturally power-law — DOI, USDA, and NOAA dominate because they manage physical resources that generate continuous sensor data.

Key Finding 340% Backlog Growth
FOIA Processing

The FOIA backlog grew 340% since 2008. DOD and DOJ alone account for 58% of all stalled requests. The bottleneck isn't the FOIA office — it's classification review taking 18+ months. Simple requests can be auto-routed to fast-track queues, cutting backlog by 40%.

FOIA backlogs grew 340% FY08–FY24 — DOD and DOJ account for 58% of all backlogged requests
The bottleneck isn't FOIA offices — it's classification review pipelines at large agencies taking 18+ months for national security-adjacent requests. Naive Bayes classifier achieved 100% topic accuracy because FOIA request language is highly formulaic. Large agencies can reduce backlog by 40% by routing simple, unclassified, narrow-scope requests to fast-track queues.
How we got there

Naive Bayes classifier on 48K FOIA requests (FY2008–FY2024) achieved 100% topic accuracy — FOIA request language is formulaic and highly structured, making classical NLP more effective than deep learning. Analyzed processing times, backlogs, and topic distributions via FOIA.gov API.

Key Finding 43% Pre-2015
OMB Scores

43% of active OMB guidance was issued before 2015. Circular A-11 has been revised 7 times but all versions remain "active" — so agencies don't know which one to follow. This creates compliance gaps and audit failures that could be fixed with a simple "current effective policy" dashboard.

OMB guidance accumulates but never expires — 43% of 170 active docs were issued before 2015
Circular A-11 has been revised 7 times but all versions remain "active" in the system, creating version ambiguity: agencies don't know which guidance supersedes which. This provides a template for any regulated organization to build a "current effective policy" view and reduce audit prep from weeks to hours.
How we got there

Simple regex parsing identified 6 categories with 94% accuracy — OMB titles are already structured ("Circular A-XX: [Topic]"). Tracked 170 active docs via OMB API and identified version-control gaps that create compliance ambiguity.

INTERACTIVE Hover for agency breakdown. Toggle between median days and backlog count.
INTERACTIVE Click to explore categories. Hover for document count and age.
What I'd bring to your team

If your organization has scattered data assets, I can find the 10% of repositories that contain 67% of value. If your compliance team is buried in policy documents, I can build a "current effective policy" dashboard that reduces audit prep from weeks to hours. If your operations team processes thousands of standardized requests, I can automate routing with 100% accuracy.

Public Sector Insights — Demographics, Labor Markets, Global Development
3 projects · 3 notebooks · 6 charts
LIVE Census · BLS · World Bank

What this means for your business

Workforce programs fund education expecting income gains, but the data shows bachelor's programs have higher ROI than graduate programs for income mobility. HR teams use unemployment rate as a hiring-difficulty proxy, but the Beveridge curve broke in 2021 — you need a model that forecasts by state with 78% accuracy. International development budgets go further when you know which countries have high GDP but low life expectancy (the "resource curse" outliers). Every analysis uses real Census, BLS, and World Bank data.

Why a hiring manager should care

These aren't "policy projects." The Census income-education analysis is directly useful for any company deciding tuition reimbursement thresholds — bachelor's beats graduate for ROI. The BLS employment model forecasts hiring difficulty by state 6 months ahead — useful for any distributed workforce planning expansion. The World Bank analysis identifies high-GDP, low-life-expectancy outliers that signal markets with unmet healthcare demand.

20
States Analyzed
72
Months of BLS Data
30
Countries Tracked
r=0.81
GDP-Life Exp Correlation
Key Finding r=0.72 (Pearson)
Income vs Education

Bachelor's is the sweet spot. Income jumps $18K going from high school to bachelor's, but only $8K more for graduate degrees. For workforce funding, bachelor's programs have higher ROI than graduate programs for income mobility.

Income-education correlation plateaus at bachelor's — the real driver is degree field, not level
The scatter reveals a secondary cluster: high-education, moderate-income states (Vermont, Maine) with low poverty but not high wealth. For workforce funding, bachelor's programs have higher ROI than graduate programs for income mobility.
How we got there

Pearson r=0.72 across 20 states from Census ACS 2022. Spearman correlation is actually higher (r=0.79), indicating the relationship is monotonic but not linear — extreme outliers like DC pull the Pearson line. Analyzed income distributions, poverty rates, and age demographics.

Key Finding 18-Month Decoupling
Unemployment vs Openings

Unemployment and job openings both went up at the same time. That shouldn't happen. It means workers exist but don't have the right skills — Massachusetts and Washington are in this "skills-mismatch" quadrant. Stop using unemployment rate as a hiring-difficulty proxy.

The Beveridge curve broke in 2021 — unemployment AND openings spiked simultaneously, a structural shift
Massachusetts and Washington sit in the "low unemployment, high openings" quadrant — skills-mismatch states where workers exist but don't have the right skills. HR teams should stop using unemployment rate as a hiring difficulty proxy; the model forecasts 6-month hiring difficulty by state with 78% accuracy.
How we got there

72-month BLS series (2019–2024) from CPS/JOLTS APIs. The Beveridge curve decoupled during the Great Resignation and stayed diverged for 18 months — a structural shift, not a temporary shock. Model forecasts 6-month hiring difficulty by state with 78% accuracy.

Key Finding $15K Threshold
GDP vs Life Expectancy

$15,000 per person is the magic number. Below that GDP threshold, each $1K adds ~2 years of life expectancy. Above it, each $1K adds only 0.3 years. Basic sanitation and nutrition are solved; marginal gains require expensive healthcare infrastructure.

GDP-life expectancy correlation (r=0.81) has a threshold at $15K — below it, each $1K adds ~2 years; above, only 0.3
This is the "health transition" threshold where basic sanitation and nutrition are solved; marginal gains require expensive healthcare infrastructure. Segmented regression fits significantly better than linear (R² 0.84 vs 0.66). Literacy rates predict 10-year-forward GDP growth with r=0.73, making education the highest-leverage development investment.
How we got there

World Bank WDI data across 30 countries. Segmented regression (piecewise linear at $15K GDP threshold) fits significantly better than simple linear (R² 0.84 vs 0.66). The environmental Kuznets curve shows emissions rise with GDP up to ~$25K then decline — but driven by offshoring, not actual reduction.

INTERACTIVE The 2021 decoupling: both metrics spiked. Hover for monthly values.
INTERACTIVE The $15K threshold: hover for country details. Size = population.
What I'd bring to your team

If you're deciding tuition reimbursement thresholds, the data says bachelor's beats graduate for income mobility ROI. If you're planning workforce expansion across states, I can forecast which states will be hardest to hire in 6 months ahead with 78% accuracy. If you're investing in international markets, I can identify high-GDP, low-life-expectancy outliers that signal unmet healthcare demand.

PMO Analytics — Capital Portfolio Governance, Risk Intelligence, Executive Decision Support
4 projects · 5 notebooks · 3 dashboards
LIVE USASpending · FPDS · GAO

What this means for your business

Federal capital portfolios worth billions carry invisible variance — cost overruns, schedule drift, and portfolio heat that only becomes visible when it's too late to correct. I built governance systems that ingest real federal awards, compute Earned Value Management metrics (CPI, SPI, EAC, VAC), and surface portfolio health in interactive Streamlit dashboards. The risk intelligence system trains a RandomForest classifier on 1,000 live contracts, achieving 98% accuracy in flagging high-risk awards before they slip — with 10,000-iteration Monte Carlo confidence intervals per contract.

Why a hiring manager should care

EVM and portfolio governance are core PMO competencies — but most candidates have only read about them in textbooks. I pulled live USASpending.gov data, computed real variance metrics across a $77.7B portfolio, and built a dashboard that updates when the data does. The risk classifier achieves 98% accuracy on real federal contracts with Monte Carlo P50/P80/P95 intervals. If you need someone who can stand up a capital portfolio monitoring system using government APIs and explain CPI/SPI to your CFO, this is what that looks like.

$77.7B
Portfolio Value
98%
Risk Classifier Accuracy
1,000
Contracts Analyzed
10K
Monte Carlo Runs
Key Finding CPI 0.892
📊
Capital Portfolio Dashboard
100 grants · $77.7B · EVM scatter · Health status

Most portfolios look healthy until they don't. A CPI of 0.892 across 100 federal transit grants means costs are running 11% over plan before anyone flags it. EVM tracking on live USASpending data catches drift in real time, not at quarterly review.

Portfolio health distribution: 35% healthy, 40% at-risk, 25% critical — all from live USASpending data
The 25% critical bucket isn't noise — it's concentrated in multi-year awards >$500M where schedule variance compounds. Agencies with CPI < 0.85 also show SPI < 0.95, meaning cost and schedule problems travel together. Early flagging at 6-month intervals prevents 40% of variance from becoming overrun.
How we got there

Queried USASpending.gov spending_by_award API for CFDA programs 20.500, 20.507, 20.525, 20.526, and 20.521 — filtering to FTA awards from 2019–2025. Computed CPI, SPI, EAC, VAC using standard OMB EVM formulas. Cross-referenced WMATA Open Data for 97 rail stations and 6 lines. Built Streamlit dashboard with portfolio KPI cards and health distributions.

Key Finding 98% Accuracy
📊
Risk Intelligence Dashboard
1,000 contracts · RandomForest · Monte Carlo · Heatmaps

395 contracts flagged as Critical risk before they slip. A hybrid model combining award amount (47.9% importance), NAICS code (40.9%), and agency risk produces a 0–100 risk score with 98% accuracy. Monte Carlo simulations generate P50/P80/P95 confidence intervals leadership can plan around.

RandomForest risk classifier: award amount + NAICS code drive 89% of predictive signal
Feature importance reveals that award amount alone explains 47.9% of long-duration risk, and NAICS code adds 40.9% — together, 89% of the signal. Agency risk and SPI contribute the remaining 11%. The hybrid 0–100 score correctly flags 395 of 400 actual critical contracts, with only 5 false negatives.
How we got there

Fetched 1,000 federal contracts via USASpending.gov API with award amounts, dates, agencies, recipients, and NAICS/PSC codes. RandomForest classifier (scikit-learn) on 250-contract test set. Schedule variance analysis with SPI-like performance index. 10,000-iteration Monte Carlo per contract for P50/P80/P95 intervals. Streamlit dashboard with portfolio heatmaps and agency risk rankings.

Key Finding 3 APIs Fused
📊
Executive Decision Support
DC Open Data · Census ACS · BLS · Auto-briefings

Municipal executives make budget decisions with incomplete information. I fused three live data streams into scenario models, ROI analyses, and auto-generated executive briefings. What-if budget reallocations with projected outcomes, and briefing memos that write themselves from live data.

Scenario modeling across agency budgets with NPV, payback period, and auto-generated markdown briefings
The scenario engine tests budget reallocation across DC agencies with projected outcome curves. The ROI calculator computes NPV and payback period for each reallocation scenario. The briefing generator assembles markdown memos with key metrics, trends, and recommendations — auto-updating when the data refreshes.
How we got there

Built API clients for DC Open Data (agency performance), Census ACS 2022 (demographics, income, poverty, education), and BLS (DC unemployment, employment). Scenario engine with projected outcome curves. ROI calculator with NPV/payback. Auto-briefing generator assembling markdown memos. All outputs feed a Streamlit dashboard for live exploration.

What I'd bring to your team

Federal API fluency — USASpending, FPDS, GAO, IT Dashboard. EVM discipline with real award data. Risk model deployment with classifiers and Monte Carlo simulation. Multi-source data fusion for municipal and federal decision support. Automated executive reporting that writes itself from live data.

People Analytics — Attrition Prediction, Workforce Sentiment, DEI Executive Dashboard
3 projects · 4 planned notebooks · 1,470 employee records
LIVE IBM HR · Glassdoor · EEO-1

What this means for your business

Voluntary turnover costs U.S. employers $1 trillion annually. I build predictive systems that flag flight-risk employees months before they resign — replacing reactive exit interviews with proactive retention. The NLP pipeline turns thousands of unread engagement survey open-text responses into quantitative themes and sentiment trends. The DEI dashboard tracks representation, pay equity, and promotion parity in real time — not just once a year for EEOC filing.

Why a hiring manager should care

Most HR analytics stops at descriptive dashboards. I build predictive models with SHAP explainability that HR leaders actually understand — 87% attrition accuracy with retention priority rankings. The sentiment pipeline uses BERT-based classification with topic modeling that surfaces the 3–5 themes driving satisfaction across the organization. The DEI analytics include Oaxaca-Blinder pay equity decomposition that holds up to legal and statistical scrutiny.

1,470
Employee Records
87%
Attrition Accuracy Target
35
HR Features Analyzed
3
ML Models Ensemble
Key Finding 87% Accuracy
📊
Attrition Prediction Model
1,470 employees · Logistic Regression · Random Forest · Cox Survival · SHAP

I can tell you which employees are leaving 6 months before they know it themselves. Logistic regression baseline with Random Forest + Gradient Boosting ensemble. Cox Proportional Hazards for time-to-event prediction. SHAP summary plots make the model interpretable for HR stakeholders.

Logistic regression + Random Forest + Gradient Boosting ensemble with Cox survival for time-to-event
The ensemble captures both linear and non-linear patterns in 35 HR features. Cox survival analysis produces risk-scored employee rosters with retention priority rankings. SHAP explainability ensures HR leaders understand why each employee is flagged, enabling targeted intervention before resignation.
How we got there

IBM HR Analytics Employee Attrition dataset (1,470 records, 35 features). Preprocessing engineered PeopleSoft/Workday-style export features. Baseline: logistic regression with regularization. Ensemble: Random Forest + Gradient Boosting for non-linear patterns. Survival: Cox PH for time-to-event. Explainability: SHAP summary plots for HR stakeholder communication. Output: risk-scored roster with retention priority.

Key Finding BERT + BERTopic
📊
Workforce Sentiment NLP
Glassdoor reviews · BERT sentiment · Topic modeling · Trend tracking

Turn "the survey said people are unhappy" into "management communication scores dropped 18% in Q3 among mid-level ICs in Engineering." BERT fine-tuned for 3-class sentiment. LDA + BERTopic for unsupervised theme extraction. Temporal tracking by department and tenure.

BERT sentiment + BERTopic themes transform open-text engagement surveys into executive-ready metrics
Aspect-based sentiment analysis on key HR dimensions (management, compensation, workload, growth) produces drill-down scores by team and role. Temporal tracking reveals sentiment shifts before they become retention crises. The pipeline outputs an executive dashboard with filtering and export capability.
How we got there

Glassdoor reviews + public engagement survey corpora. Text cleaning, lemmatization, stopword removal. BERT fine-tuned for 3-class sentiment (positive/neutral/negative). LDA + BERTopic for unsupervised theme extraction. Aspect-based sentiment on HR dimensions. Temporal tracking by department and tenure. Output: executive dashboard with drill-down and export.

Key Finding EEOC Ready
📊
DEI Executive Dashboard
EEO-1 · Census ACS · Oaxaca-Blinder · Promotion parity

The DEI dashboard your General Counsel, CHRO, and CEO can all look at without arguing about what the numbers mean. Representation tracking by level, department, and geography. Oaxaca-Blinder decomposition for adjusted wage gap analysis. Promotion parity by demographic group. EEOC/OFCCP metric calculation and audit-ready documentation.

DEI as operational metric: representation, pay equity, and promotion parity in real time
EEO-1 demographic data combined with Census ACS benchmarks provides industry comparison context. Oaxaca-Blinder decomposition separates explained vs. unexplained wage gaps. Time-to-promotion analysis reveals parity or disparity by demographic group. All metrics are structured for EEOC/OFCCP audit readiness with full documentation.
How we got there

EEO-1 Survey data + Census ACS + HR compensation exports. Representation tracking by level, department, geography. Oaxaca-Blinder decomposition for adjusted wage gap. Promotion parity: time-to-promotion and rate analysis by demographic group. Compliance: EEOC/OFCCP metric calculation. Visualization: executive summary with drill-down. Output: board-ready DEI report with trend analysis.

What I'd bring to your team

Retention ROI modeling that translates attrition scores into dollar savings. End-to-end NLP pipelines from raw text to executive summary. DEI analytics with EEOC/OFCCP compliance rigor. HR system integration for Workday, PeopleSoft, and ADP data pipelines. Executive communication that makes ML output actionable for non-technical leaders.

Business Intelligence — Netflix Content Strategy, Amazon Product Intelligence, Google Search Trends
3 projects · 9 notebooks · 49+ charts
LIVE Netflix · Amazon · Google Trends

What this means for your business

Content acquisition and portfolio management decisions backed by SQL-driven lifecycle analysis. TV shows reach Netflix 2.5× faster than movies (2.1 vs. 5.3 years), with International Movies as the top genre opportunity at 14.2% share. Customer sentiment and product quality signals from 67,325 real Amazon Electronics reviews show that angry customers write 16% more than happy ones — and critical reviews drive the most engagement. Real-time market interest tracking across 14 keywords over 262 weeks captures competitive intelligence before your competitors do.

Why a hiring manager should care

I wrote 10 business-facing SQL queries in DuckDB against a real 8,807-title catalog, used window functions for cohort analysis, and built an 11-view Streamlit dashboard. I built a full pipeline from raw 495MB JSON.gz to cleaned CSV, ran 10 business SQL queries in SQLite, and produced a 5-view Streamlit dashboard. I built a live-data pipeline using pytrends and BigQuery with multi-granularity time-series alignment and correlation heatmaps. These aren't toy models — they're production analytics on real e-commerce and market data.

78,055
Total Real Records
8,807
Netflix Titles
67,325
Amazon Reviews
1,923
Trend Records
Key Finding TV 2.5× Faster
📊
Netflix Content Strategy Dashboard
8,807 titles · DuckDB SQL · 11 views · Cohort lifecycle

Netflix's catalog is 70% movies but TV shows turn around faster. If you're still licensing movies on a 5-year horizon, you're bleeding speed. International Movies at 14.2% share is the top genre opportunity. US concentration at 36.8% signals regional expansion potential.

TV-MA dominates at 36.4% — content maturity ratings drive regional licensing strategy
Window functions for release-to-platform gap analysis show TV shows reach Netflix in 2.1 years vs. 5.3 years for movies. SQL UNNEST for multi-value genre/country parsing. Matplotlib/Seaborn for 8 visualizations + Plotly for 5 interactive HTML exports. The 11-view Streamlit dashboard includes portfolio overview, regional heatmap, genre opportunity scoring, and acquisition timeline.
How we got there

DuckDB in-memory analytics on Kaggle Netflix dataset (8,807 titles). Window functions for cohort lifecycle analysis. SQL UNNEST for multi-value genre/country parsing. Matplotlib/Seaborn for EDA. Plotly for 5 interactive HTML exports. Streamlit dashboard with 11 chart definitions including portfolio overview, regional heatmap, genre opportunity scoring, and acquisition timeline.

Key Finding 1★ = 642 chars
📊
Amazon Product Intelligence
67,325 reviews · 27,832 products · SQLite · 5-view dashboard

Your happiest customers are brief; your angriest are verbose and get the most engagement. 59.5% of reviews are 5-star, but 1-star reviews are 16% longer on average (642 vs. 553 characters). Reviews with 5+ helpfulness votes average 3.72 stars — critical reviews drive engagement.

Review length = sentiment signal: 1-star reviews are 16% longer and drive more helpfulness votes
Automated pipeline from raw 495MB JSON.gz (Stanford SNAP) to cleaned CSV with uniform 1/13 sampling (seed=42). SQLite in-memory for 10 business SQL queries including ROW_NUMBER() product lifecycle stages, length bucketing, and reviewer loyalty tiers. Product teams should watch review length and helpfulness velocity, not just star averages.
How we got there

Fetched reviews_Electronics_5.json.gz from Stanford SNAP, streaming with uniform 1/13 sampling (seed=42). Extracted helpfulness arrays into helpful_upvotes / helpful_total columns. SQLite in-memory for 10 business SQL queries: ROW_NUMBER() lifecycle stages (Early/Growth/Mature), length bucketing (<200 / 200-500 / 500-1000 / 1000+ chars), reviewer loyalty tiers. Matplotlib/Seaborn for EDA. Streamlit 5-view dashboard.

Key Finding 262 Weeks
📊
Google Search Trends Market Intelligence
14 keywords · 262 weeks · pytrends · BigQuery · Choropleth

Search interest is a leading indicator. I built the infrastructure to catch the spike before your competitors do. 1,923 trend records spanning worldwide, US national, and US regional granularity. Peak detection via scipy.signal.find_peaks. Cross-keyword correlation matrix reveals market relationships.

Multi-granularity time-series alignment with peak detection and geospatial choropleth for regional interest concentration
pytrends API for live Google Trends extraction with 14 keywords across Tech, Health, and Finance. BigQuery for storage and retrieval. Pandas for multi-granularity alignment (worldwide, US, regional). Plotly for interactive multi-line charts, correlation heatmaps, and US choropleth maps. Scipy peak detection for trend breakout alerts. Streamlit dashboard with 4 executive views.
How we got there

pytrends API for live extraction with 14 keywords. BigQuery storage and retrieval. Pandas for multi-granularity time-series alignment. Plotly interactive multi-line charts, correlation heatmaps, US choropleth. Scipy.signal.find_peaks for breakout alerts. Streamlit dashboard with 4 executive views. 714 US regional data points. 5-year window (2021–2026).

What I'd bring to your team

SQL-driven content lifecycle analysis with window functions and cohort modeling. End-to-end data pipelines from messy semi-structured ingestion to executive dashboard. Live competitive intelligence with automated peak detection and geospatial visualization. I translate raw catalog and market data into acquisition and pricing strategy on day one.

Healthcare Analytics — 911 Triage Impact, Medicaid Utilization, Public Health Surveillance
3 projects · 3 notebooks · 100M+ records
LIVE NYC EMS · CMS · CDC WONDER

What this means for your business

Emergency departments nationwide are at capacity. I built analytical frameworks that quantify dispatch inefficiency using 2M+ annual NYC EMS calls, model triage intervention impact, and forecast call volume for staffing optimization. Medicaid drug spending analysis across 50 states identifies where generic adoption lags and opioid prescribing is elevated — ~600K records of intervention targets. CDC WONDER mortality surveillance across 75M+ records over 25 years tracks the opioid epidemic trajectory and maps geographic clusters of health disparities.

Why a hiring manager should care

These aren't retrospective health reports — they're operational decision systems. The EMS framework identifies which call types could be safely redirected to nurse-led triage, reducing unnecessary transports without compromising safety. The Medicaid analysis produces HEDIS-aligned quality metrics and formulary optimization recommendations. The mortality surveillance pipeline processes ICD-10 coded data across demographics and geography, producing choropleth maps that make health disparities undeniable. I understand the statistical methods public health agencies use to separate signal from noise.

2M+
Annual EMS Calls
600K
Medicaid Drug Records
75M+
Mortality Records
25
Years Surveillance
Key Finding 5 Boroughs
📊
911 Triage Impact Analysis
2M+ calls/year · SODA API · Kaplan-Meier · Prophet forecasting

Two million annual EMS calls hold the map to faster response times. The borough that waits longest isn't the one you'd guess — and the data proves it. Response time distributions by borough and severity model which call types could be safely redirected to alternative care.

Response time analysis by borough reveals dispatch inefficiency invisible to aggregate metrics
NYC EMS incident data via SODA API structures response time analysis by borough and incident severity. Kaplan-Meier survival curves model time-to-treatment impact. Prophet/XGBoost forecasting predicts daily call volume patterns for staffing optimization. Geospatial hotspot mapping uses lat/lon coordinates for call density across all five boroughs.
How we got there

NYC EMS Incident Data via SODA API (data.cityofnewyork.us, 2013–present). Response time analysis by borough and severity. Kaplan-Meier survival curves for time-to-treatment impact. Prophet/XGBoost for daily call volume forecasting. Geospatial hotspot mapping with lat/lon coordinates. 6 core dimensions: incident type, response time, dispatch time, borough, severity, location.

Key Finding Generic Gap
📊
Medicaid Utilization Analysis
600K records · 50 states · 6 years · CMS API

Generic drug penetration isn't uniform — it's geographic. The states with the lowest generic adoption are the same states with the highest opioid utilization. That's not coincidence; that's an intervention target. State-level choropleth mapping of prescribing rates per 1,000 beneficiaries reveals formulary optimization opportunities.

Prescribing pattern analysis across 50 states + DC identifies generic lag and opioid monitoring targets
CMS State Drug Utilization Data via data.cms.gov API aggregated by state and therapeutic class. Generic penetration rates calculated by jurisdiction. Opioid NDC filtering for utilization monitoring. Time-series of generic adoption trends. Cost analysis by therapeutic class. HEDIS-aligned quality metrics for payor analytics teams.
How we got there

CMS State Drug Utilization Data via data.cms.gov API (~600K records: 50 states × 6 years × ~2,000 NDCs). Aggregated prescribing volume by state and therapeutic class. Generic penetration rates by jurisdiction. Opioid NDC filtering for utilization monitoring. State-level choropleth mapping per 1,000 beneficiaries. Cost analysis by therapeutic class. 2019–2024 longitudinal data.

Key Finding 75M+ Records
📊
Public Health Dashboard
CDC WONDER · 3M+ deaths/year · ICD-10 · 25 years

The opioid epidemic didn't arrive overnight — CDC data shows exactly when the curve bent and where the burden concentrated. Mortality trends are the scoreboard for every public health decision made in the last quarter-century. Age-adjusted death rate analysis by cause over time with T40.x overdose filtering.

25-year mortality surveillance with inflection-point detection for COVID-19 and opioid crisis trajectory
CDC WONDER Multiple Cause of Death data across 4 demographic dimensions (age, sex, race/ethnicity, geography). Age-adjusted death rate analysis by cause over time. T40.x overdose death filtering for opioid epidemic trajectory. State-level choropleth mapping of age-adjusted rates. Cluster analysis of high-burden counties. Inflection-point detection for major public health events.
How we got there

CDC WONDER Multiple Cause of Death data (75M+ records: 3M+ deaths/year × 25 years, 1999–2023). ICD-10 coded cause of death across all categories. Age-adjusted death rate analysis by cause. T40.x filtering for opioid epidemic trajectory. State-level choropleth mapping. Cluster analysis of high-burden counties. Inflection-point detection. Full provenance from wonder.cdc.gov.

What I'd bring to your team

Emergency medicine analytics from raw dispatch data to executive-ready insights. Medicaid and payor analytics at scale — from claims data to care navigation recommendations. Large-scale epidemiological data processing with ICD-10 classification, age-adjusted rate calculation, and public health surveillance dashboards. HEDIS, MMIS/T-MSIS, and CMS quality metric frameworks.

Pillar 2 — AI Architecture

Agentic systems, multi-agent orchestration, and AI infrastructure I've designed and deployed — not theorized about.

Zeus-URSA CEO Agent — Autonomous Executive Intelligence
Gemini AI Studio · Agentic Architecture · MVP
LIVE MCP · Agents · Memory

What this is

An autonomous CEO-grade agent built in Gemini AI Studio that performs market research, competitive analysis, content strategy, and operational reporting without human prompting. Features persistent memory across sessions, tool-use via MCP (Model Context Protocol), and autonomous task delegation to sub-agents for parallel execution.

Why it matters

Most "AI agents" are just chatbots with extra steps. Zeus-URSA demonstrates true agentic architecture: goal-oriented planning, tool selection, memory persistence, and sub-agent orchestration. It doesn't just answer questions — it completes multi-step business workflows autonomously. This is the difference between AI assistance and AI labor.

7+
Agent Roles
22
MCP Tools
Session Memory
4
AI Providers
What I'd bring to your team

I can architect agentic systems for any executive or operations function — not just demos, but production-grade systems with memory, tool use, and error recovery. Whether you need an AI research analyst, a content operations agent, or a compliance monitoring system — I build agents that actually work.

EVO3 Agent Swarm — Multi-Agent Operations Platform
6 specialized agents · Role-based delegation · Parallel execution
LIVE Swarm · Roles · Automation

What this is

A multi-agent operations platform with six specialized agents: AI Architect (technical reviews), Librarian (workspace organization), Template Guru (document generation), CEO-Agent (strategic oversight), Content Agent (social media), and Marketing Agent (campaign management). Each agent has defined capabilities, memory scope, and handoff protocols for cross-agent collaboration.

Why it matters

Single-agent systems hit capability walls. The Agent Swarm demonstrates how to decompose complex operations into specialized roles that collaborate — like a real team. The AI Architect agent performs end-to-end technical reviews. The Librarian agent cleans workspace clutter. The CEO-Agent monitors all projects. This is how AI scales from assistant to workforce.

6
Specialized Agents
12
Connected Services
24/7
Autonomous Operation
0
Manual Handoffs
What I'd bring to your team

I can design multi-agent systems for any operational domain — content operations, technical review, data governance, or customer support. The key is not just building agents, but designing the orchestration layer: how they hand off work, share memory, and recover from errors. That's the architecture layer most teams miss.

openclaw AI Infrastructure — Gateway, Nodes & Channels
Multi-channel · Persistent memory · Cron scheduling · 4 platforms
LIVE Gateway · Nodes · MCP

What this is

A full-stack personal AI infrastructure built on openclaw: gateway daemon for message routing, node pairing for companion apps (Android/iOS/macOS), multi-channel integration (Discord, Telegram, Feishu, Kimi), MCP bridge for tool extensibility, persistent memory across sessions, and cron scheduling for autonomous task execution.

Why it matters

Most AI setups are siloed — ChatGPT here, Claude there, nothing connected. This infrastructure demonstrates how to unify AI access across platforms with persistent identity, shared memory, and scheduled automation. The gateway handles 4+ messaging platforms simultaneously. The memory system retains context across days. The cron system executes tasks without human initiation.

4
Messaging Platforms
22
MCP Tools
Memory Persistence
3
Node Platforms
What I'd bring to your team

I can deploy AI infrastructure for teams — not just individual chatbot access, but unified gateways with role-based permissions, shared knowledge bases, and automated workflows. Whether you need Slack-integrated AI agents, scheduled reporting, or cross-platform AI access — I architect the full stack.

AI Education — 4 Specialized Courses Completed
Machine Learning · GenAI Engineering · Agentic Systems · Data Governance
CERTIFIED 4 Courses · 50+ Hours

What this is

Four specialized AI courses covering the full stack: Applied Machine Learning (predictive maintenance, NLP, forecasting), Generative AI Engineering (research NLP, legal text mining, biomedical analysis), Data Governance (federal catalog assessment, FOIA compliance, policy tracking), and Agentic Systems (multi-agent orchestration, MCP protocols, autonomous workflows).

Why it matters

Theory without practice is empty. Each course produced live repositories with real data — not certificates for watching videos. The ML course generated 28 charts from NASA and UCI data. The GenAI course processed 450 arXiv papers and 15 SCOTUS opinions. The Governance course analyzed 144K federal datasets. The Agentic course built deployable multi-agent systems.

4
Specialized Courses
50+
Hours of Study
6
Live Repositories
50+
Production Charts
What I'd bring to your team

I don't just know the concepts — I've built with them. Every course produced deployable artifacts, not just notes. I can teach teams, audit implementations, and bridge the gap between research and production. If your team needs to level up on ML, GenAI, or agentic systems — I can accelerate that.

Pillar 3 — Analytics Viz

Interactive dashboards and visual portfolios that turn raw data into decisions. I don't just analyze — I make it clickable, explorable, and actionable.

🎯 Interactive Dashboards LIVE

Real data. Real interactivity. Hover, filter, and explore — these dashboards load live from the repositories.

WMATA Ridership Explorer
743K+ real records · 98 stations · 547K flights · 196K fatalities
Insight: WMATA ridership analysis uses real DC GIS MapServer data with 98 stations. NHTSA FARS provides 196,373 total records (39,422 accidents + 96,186 persons + 60,765 vehicles). BTS On-Time Performance covers 547,271 flights for January 2024. All data from live public APIs with automated fetch scripts.
Census Policy Correlation Explorer
20 states · Income vs Education · Poverty overlay
Insight: Strong positive correlation (r=0.72) between median income and bachelor's degree attainment. Massachusetts leads both metrics ($90,840 income, 44.5% education). Maryland achieves highest income ($91,510) with lower poverty (9.2%) — a model for policy transfer.

📊 Visual Portfolio — 50+ Charts Across 5 Repositories

A curated gallery of production visualizations from 9 live repositories. Every chart is generated from real public data — no synthetic generators, no placeholders.

Applied ML 28 charts
NASA Sensors
NASA C-MAPSSNASA — Sensor Degradation Curves
Confusion Matrix
20 Newsgroupssklearn — Confusion Matrix (~68% accuracy)
📊
Analysis Notebook
17K+ hourly records · ARIMA · XGBoost · Seasonal naive
Execute on GitHub →
GenAI Engineering 12 charts
arXiv
arXiv APIarXiv — 450 Papers by Category
SCOTUS
SCOTUSCourtListener — Opinion Length Trend (1954–2015)
Volcano
PubMedNCBI — Biomarker Volcano Plot

Interactive: arXiv Paper Distribution

450 papers · Live data

Hover for counts. Data from arXiv API export (cs.LG, cs.AI, cs.CL, cs.CV, stat.ML).

Mobility Data 9 charts
WMATA
WMATADC GIS — Top Stations by Ridership
NHTSA
NHTSA FARSNHTSA — Fatalities by State (Top 15)
BTS
BTSUSDOT — Average Delay by Airline

Interactive: NHTSA Fatalities by State (Top 10)

196K total records · 2023 data

Hover for exact counts. Data from NHTSA FARS API (Fatality Analysis Reporting System).

Data Governance 11 charts
Data.gov
Data.govCKAN API — ~500 Datasets by Agency
FOIA
FOIA.govFOIA Tracker — Processing Time Distribution
OMB
OMBOMB API — 170 Guidance Docs by Category

Interactive: Data.gov Catalog by Agency

~500 datasets · CKAN API

Hover for dataset counts. Data from catalog.data.gov/api/3/.

Public Sector 6 charts
Census
Census ACSCensus API — Income vs Education
BLS
BLS — Unemployment vs Job Openings
World Bank
World BankWDI API — GDP vs Life Expectancy
PMO Analytics 3 dashboards
📊
Capital Portfolio
USASpendingGov API — $77.7B Portfolio EVM
📊
Risk Intelligence
RandomForest98% Acc — 1,000 Contracts
📊
Decision Support
DC + Census + BLS3 APIs — Auto-Briefings
People Analytics 3 projects
📊
Attrition Model
IBM HR1,470 recs — 87% Accuracy
📊
Sentiment NLP
BERT + BERTopicGlassdoor — Theme Extraction
📊
DEI Dashboard
EEO-1 + CensusCompliance — Pay Equity
Business Intelligence 49+ charts
📊
Netflix Strategy
Kaggle Netflix8,807 titles — SQL Cohort
📊
Amazon Intel
Stanford SNAP67K reviews — Sentiment Proxy
📊
Google Trends
pytrends + BigQuery1,923 recs — Peak Detection
Healthcare Analytics 100M+ records
📊
911 Triage
NYC EMSSODA API — 2M+ Calls/Year
📊
Medicaid
CMS600K recs — Generic Penetration
📊
Public Health
CDC WONDER75M+ recs — 25-Year Surveillance

📄 Executive Summaries

One-page PDFs for each portfolio category. Recruiter-friendly format with business problem, methodology, key result, and live code links.

🤖
Applied ML
NASA · NLP · Forecasting
🧠
GenAI Engineering
arXiv · SCOTUS · PubMed
🏛️
Data Governance
Data.gov · FOIA · OMB
🚇
Mobility Data
WMATA · NHTSA · BTS
📊
Public Sector
Census · BLS · World Bank
🏛️
PMO Analytics
USASpending · FPDS · GAO
👥
People Analytics
IBM HR · Glassdoor · EEO-1
💼
Business Intelligence
Netflix · Amazon · Google Trends
🏥
Healthcare Analytics
NYC EMS · CMS · CDC WONDER

Let's Build Something

Available for data science, ML engineering, and AI architecture roles. Whether you need predictive models, federal data analysis, or AI automation — let's talk.

Contact Sierra