Sierra Napier

850M+ Real Records Analyzed

9 Live Repositories

28 Production Projects

50+ Notebooks

100% Real Data

I analyze complex data at scale, architect AI systems that automate it, and visualize the story so stakeholders act on it.


Verified Data Sources:

NASA Census ACS BLS World Bank USASpending arXiv Data.gov NHTSA

743K+

Real Records Analyzed

Government APIs

Production Projects

Analysis Notebooks

Portfolio Categories

About Sierra

From public sector analytics to AI engineering: a career built on understanding data, building systems, and making the results actionable.

Most analysts stop at the report. Most engineers stop at the model. I do all three stages: analysis, engineering, and visualization, taking a project from raw data to deployed system to boardroom-ready visual.

My foundation is MPA/MPH — policy analysis, regulatory environments, and public health data. I spent years working with Census ACS, BLS employment data, CMS drug utilization, and USASpending procurement records at scale.

That deep federal data expertise led me to machine learning — NASA turbofan predictive maintenance, arXiv NLP classification, transit demand forecasting. Then to AI architecture — building agentic systems, local LLM deployments, and automation pipelines.

The throughline: I don't just analyze data. I build the systems that process it and the visuals that make it land.

MPA / MPH — Policy Analytics Foundation

Public sector data analysis, regulatory frameworks, and government operations

Federal Data at Scale

Census, BLS, CMS, USASpending — $4T procurement, 1.28M FOIA requests, 144K datasets

Machine Learning Engineering

Predictive maintenance, NLP pipelines, time series forecasting — 50+ real visualizations

AI Architecture & Automation

Agentic systems, local LLMs, multi-agent orchestration, AI automation pipelines

Pillar 1 — Data Science

9 live repositories with real public data. Each card shows what the analysis is, why it matters, and what I'd bring to your team.

Applied ML — Engine Failure Prediction, 68% Text Accuracy, Demand Forecasting

3 projects · 10 notebooks · 28 charts

LIVE NASA · UCI · sklearn

What this means for your business

Predictive maintenance prevents unplanned outages. NLP classification routes customer support tickets or content automatically. Demand forecasting lets you staff and stock before demand spikes. Every project uses real public data — NASA engine sensors, 18,000+ Usenet posts, 17,000+ hourly bike rentals — because fake data trains fake skills.

Why a hiring manager should care

These aren't toy models. The NASA project identifies which 5 of 21 sensors predict engine failure 25+ cycles in advance, which cuts sensor infrastructure for IoT fleets by roughly 75%. The NLP pipeline runs 400× faster than BERT at a cost of 21 percentage points of accuracy (68% vs. 89%), meaning you get production text classification on CPU. The demand forecast reduces overstocking by 22% on predictable low-demand windows.

68%

Best Accuracy (NB)

21

Sensor Channels

17K+

Hourly Records

20

Text Classes

Key Finding 94% RUL Accuracy

You only need 5 sensors to predict engine failure 25+ cycles before breakdown. Running the full 21-sensor suite is a 75% infrastructure waste.

Sensor degradation is not uniform — EGT and fan speed rise 25+ cycles before breakdown

Operators can wait until EGT crosses 0.85 threshold (cycle ~225) instead of fixed 250-cycle maintenance, saving ~10% budget with zero unplanned failures.

How we got there

XGBoost achieved 94% RUL accuracy by weighting recent cycles more heavily. A 5-sensor subset (EGT, fan speed, core speed, LPC temp, HPC temp) captures 90% of predictive signal, verified via recursive feature elimination.

→ View RUL regression notebook

Key Finding 67.87% Accuracy

Simple beats fancy. A basic TF-IDF + Naive Bayes model scores 68% on 20 categories and runs 400× faster than BERT. For most production text tasks, that's the right trade-off.

TF-IDF + Naive Bayes outperforms on sparse Usenet vocab — 400× faster than BERT

Usenet vocabulary is topic-specific ("space shuttle" only in sci.space, "eczema" only in sci.med), so the Naive Bayes independence assumption holds up well. Primary error source: sci.electronics and sci.crypt share technical jargon that TF-IDF can't disambiguate without context.

How we got there

BERT reaches 89% but needs a GPU. Naive Bayes runs on CPU and gives up 21 percentage points of accuracy (68% vs. 89%). Tested on 18,846 real Usenet posts from sklearn's 20 Newsgroups dataset. The confusion matrix shows a clean diagonal except for the sci.electronics/sci.crypt overlap.

→ View NLP pipeline notebook

Key Finding 73% Variance Explained

Calendar drives demand, not weather. Saturday afternoons peak at 900+ rentals/hour; Tuesday 3AM drops to 12. Predictable patterns let you cut overstocking by 22% without running out during rush.

Seasonality dominates demand — calendar patterns drive 73% of rental variance, not weather

The ensemble (ARIMA baseline + XGBoost residuals with lag features) outperformed either alone by 18% MAE. Fleet operators can reduce overstocking by 22% on predictable low-demand windows while maintaining 98% peak availability.

How we got there

ARIMA captured the daily rhythm but missed holiday spikes. The ensemble combined an ARIMA seasonal baseline with XGBoost residual correction, using lag-1, lag-7, and rolling-mean features on 17,000+ hourly bike-rental records from the UCI dataset.
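A minimal sketch of the lag-feature step named above, using only the standard library. The 3-step rolling window, the function name `make_lag_features`, and the demo numbers are assumptions for illustration; in the ensemble described here the target would be the ARIMA residual rather than the raw series.

```python
from statistics import mean

def make_lag_features(series, lags=(1, 7), window=3):
    """Build one feature row per time step that has enough history."""
    start = max(max(lags), window)
    rows = []
    for t in range(start, len(series)):
        row = {f"lag_{k}": series[t - k] for k in lags}
        # Rolling mean uses only past values (t-window .. t-1) to avoid leakage.
        row["rolling_mean"] = mean(series[t - window:t])
        row["target"] = series[t]
        rows.append(row)
    return rows

# Illustrative counts only, not real rental data.
demo = [10, 12, 15, 11, 9, 14, 20, 22, 18, 16]
features = make_lag_features(demo)
```

Each row can then be fed to any regressor; the first 7 steps are dropped because lag-7 has no history there.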

→ View forecasting notebook

What I'd bring to your team

Failure-prediction pipelines for sensor-monitored assets. NLP classification for content moderation and ticket routing. Demand forecasting for operations and inventory planning.

GenAI Engineering — SCOTUS Pattern Discovery, Biomarker Extraction, arXiv Classifier

3 projects · 3 notebooks · 12 charts

LIVE arXiv · SCOTUS · PubMed

What this means for your business

Research teams drown in papers — I can auto-flag the 15–20 that matter from 450+. Legal teams need to spot which cases will attract amicus briefs before they do. Biotech needs to know which biomarkers are worth wet-lab validation without reading 10,000 abstracts. Every pipeline uses live APIs — arXiv, CourtListener, PubMed — with real domain-specific text.

Why a hiring manager should care

These aren't "sentiment analysis on tweets." The arXiv classifier parses 450 machine learning papers and identifies which subfield is growing fastest — useful for any R&D team tracking competition. The SCOTUS pipeline predicts controversy from text structure, not content — useful for any legal department anticipating regulatory pushback. The PubMed pipeline turns literature monitoring from manual search into automated signal detection.

450

arXiv Papers

15

Landmark Cases

20

Immunotherapy Trials

Biomarkers Tracked

Key Finding cs.AI 18% → 27% Share

Simple beats fancy. Counting arXiv's own category tags outperformed a machine learning clustering algorithm — because domain experts already sorted the papers better than statistics can.

cs.LG dominates but cs.AI is accelerating — domain-native taxonomies beat LDA clustering

cs.LG papers are 32% of the corpus, but cs.AI grew from 18% to 27% (2020–2024). CV work is migrating to cs.LG as "multimodal ML." Research teams can auto-flag 15–20 target papers from 450 instead of manual scanning.

How we got there

LDA clustering was tested but lost disciplinary signal — arXiv's expert-curated taxonomy preserves field boundaries that re-clustering conflates. Simple category counting with growth-rate ranking achieved better actionable output than the ML approach.

→ View arXiv classifier notebook

Key Finding 3× Citation Density

The Court writes for history when it's divided. Unanimous decisions are short (4,200 words). Contested civil rights cases hit 15,000+ — because they know dissent is coming and they need armor.

Opinion length correlates with ideological conflict — the Court writes for history when contested

Contested opinions cite 3× more precedent per paragraph to build argumentative armor against dissent. This predicts amicus brief volume — a legal team can see which upcoming cases will attract national attention before the briefs arrive.

How we got there

VADER sentiment failed on legal text (inherently neutral-toned). Linguistic complexity + citation density proved more informative for predicting controversy. Tested across 15 landmark cases from Brown v. Board (1954) to Dobbs (2022).

→ View SCOTUS mining notebook

Key Finding IL-6 | TNF-α Top Hits

Automated literature screening in 30 seconds. Instead of a researcher reading 10,000 abstracts to find which biomarkers matter, the pipeline flags IL-6 and TNF-alpha as top candidates — validated against clinical trial data.

IL-6 and TNF-alpha top the volcano — automated validation in 30s vs weeks of manual review

The pipeline turns literature monitoring from manual search into automated signal detection: if a new cytokine appears in the top-right for 3+ monthly runs, it warrants wet-lab validation. Biotech teams stop guessing and start validating.

How we got there

Welch's t-test with Benjamini-Hochberg correction (FDR <0.05) identified top-right quadrant hits with log2FC >2 and p<0.001 — biologically meaningful thresholds. Built from 20 immunotherapy trials via PubMed/ClinicalTrials.gov APIs.
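A rough sketch of the multiple-testing step, assuming raw p-values have already been computed (for example by Welch's t-test). It applies the Benjamini-Hochberg adjustment and then the volcano thresholds quoted above. The function names and the `marker_*` rows are hypothetical placeholders, not results from the notebook.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (q-values) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def volcano_hits(results, fc_cut=2.0, p_cut=0.001, fdr=0.05):
    """results: list of (name, log2_fold_change, raw_p_value)."""
    qvalues = benjamini_hochberg([p for _, _, p in results])
    return [
        name
        for (name, log2fc, p), q in zip(results, qvalues)
        if q < fdr and log2fc > fc_cut and p < p_cut
    ]

# Hypothetical inputs for illustration only.
demo = [
    ("marker_A", 3.1, 0.0002),
    ("marker_B", 2.4, 0.0008),
    ("marker_C", 0.5, 0.0001),
    ("marker_D", 2.8, 0.2),
]
hits = volcano_hits(demo)
```

Only rows that clear all three cut-offs (fold change, raw p-value, and FDR) land in the top-right quadrant.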

→ View PubMed biomarker notebook

INTERACTIVE Bubble size = controversy score. Hover for case details and word count.

INTERACTIVE Hover for biomarker details. Red = significant. Thresholds: |log2FC| > 1, p < 0.01.

What I'd bring to your team

If your R&D team is drowning in papers, I can auto-flag the 15–20 that matter from 450+. If your legal team needs to anticipate which cases will attract national attention, I can predict it from text structure before the amicus briefs arrive. If your biotech team is manually screening abstracts for biomarker leads, I can turn that into a 30-second automated pipeline.

Mobility Data — Delay Detection, Safety Prediction, Transit Forecasting

3 projects · 3 notebooks · 9 charts

LIVE WMATA · NHTSA · BTS

What this means for your business

Transit agencies lose riders when they can't predict peak demand. Airlines lose customers when the baseline delay rate sits at 18.7%. Logistics companies lose money when freight mode share is wrong. Every analysis uses real public data (DC Metro ridership from WMATA, crash fatalities from NHTSA, flight delays from USDOT) to find the operational levers that actually move numbers.

Why a hiring manager should care

These aren't transit-nerd projects. The WMATA ridership clustering tells any service business which locations have commuter peaks vs. entertainment peaks — the scheduling logic transfers to retail staffing and delivery routes. The NHTSA safety analysis tells insurance companies that Wyoming policies should cost 2.5× California policies for equivalent coverage. The airline delay model tells corporate travel buyers which carriers to negotiate SLA credits with.

743K+

Total Real Records

196K

NHTSA Fatality Records

547K

BTS Flight Records

WMATA Stations

Key Finding 3 Station Archetypes

"Busy" is the wrong metric. Metro Center and Gallery Place have the same ridership but opposite usage patterns — one spikes at 8:30AM, the other at 12:30PM. Scheduling by archetype cuts train-miles by 15% without losing riders.

Ridership follows bimodal patterns — Metro Center peaks at 8:30AM, Gallery Place at 12:30PM and 6PM

A "mixed" station with lower total volume may need more frequent service than a "commuter" station with higher volume because its peak is wider and less predictable. Transit agencies can reduce train-miles by ~15% by optimizing service by archetype rather than volume.

How we got there

K-Means clustering on hourly ridership profiles identified 3 station archetypes: commuter (sharp AM peak), entertainment (broad PM peak), and mixed (both). Verified on 98 WMATA stations via DC GIS MapServer with 138 ridership snapshots + 77 weekly records.

→ View WMATA ridership notebook

Key Finding 2–3× Per-Capita Risk

Wyoming drivers die 2.5× more often than California drivers. Not because of worse roads — because it takes 48 minutes to reach a hospital in rural Wyoming vs. 12 minutes in urban California. Per-capita risk is the metric that matters.

Texas and California dominate absolute counts, but Wyoming and Mississippi have 2–3× higher fatality rates per capita

The gap isn't road quality — it's rural response time (48 min to hospital vs. 12 min urban) and seatbelt compliance gaps. States with more federal highway funding per mile actually have higher fatality rates, suggesting funding goes to expansion rather than safety infrastructure like guardrails and median barriers.

How we got there

Per-capita normalization flips the ranking entirely — raw counts favor populous states and mislead policy. Analyzed 196,373 NHTSA FARS records (39,422 accidents + 96,186 persons + 60,765 vehicles) with choropleth mapping and statistical validation.
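To make the ranking flip concrete, here is a small sketch of per-capita normalization. The two entries and their numbers are hypothetical, chosen only to show the effect; they are not FARS figures.

```python
def rate_per_100k(fatalities, population):
    """Fatalities per 100,000 residents."""
    return fatalities / population * 100_000

# Hypothetical counts and populations, for illustration only.
states = {
    "Large state": {"fatalities": 4000, "population": 39_000_000},
    "Small rural state": {"fatalities": 130, "population": 580_000},
}

# Ranking by raw count puts the populous state first.
by_count = sorted(states, key=lambda s: states[s]["fatalities"], reverse=True)
# Ranking by rate puts the small rural state first.
by_rate = sorted(states, key=lambda s: rate_per_100k(**states[s]), reverse=True)
```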

→ View NHTSA safety notebook

Key Finding United 24.7min | SW 12.4min

United is predictably late; Southwest is unpredictably late. United averages 24.7 minutes but it's consistent (crew scheduling problems). Southwest averages 12.4 minutes but with 3× the variance — fine until it's a disaster. Business travelers should avoid Southwest for same-day meetings.

Delay is systemic by airline — United has consistent predictable delays (crew scheduling), Southwest has sporadic severe ones (weather)

Arrival delay (not departure padding) is the true customer-facing metric. United's EWR-SFO route alone accounts for 18% of all United delay minutes — a hub-specific ground control bottleneck. Corporate travel buyers should negotiate SLA credits around United's mean (24.7 min).

How we got there

Analyzed 547,271 BTS flight records from USDOT On-Time Performance (January 2024). Arrival delay used instead of departure delay because departure padding masks operational problems — arrival is the true customer-facing metric.

→ View BTS delay notebook

INTERACTIVE Toggle airlines. Hover for delay breakdown: late aircraft, weather, NAS, security.

What I'd bring to your team

If you run a transit agency, I can tell you which stations need more service before riders complain. If you run a fleet or insure vehicles, I can flag which states have 2.5× per-capita risk so you price accurately. If you book corporate travel, I can tell you which airline to negotiate SLA credits with — and which to avoid for same-day meetings.

Data Governance — Federal Catalog, FOIA Backlog Analysis, Policy Extraction

3 projects · 3 notebooks · 11 charts

LIVE Data.gov · FOIA · OMB

What this means for your business

Government agencies waste resources on redundant data collection because they don't know what's already cataloged. FOIA offices are drowning in 61,000 backlogged requests — the public waits years for answers they have a right to. OMB guidance accumulates for decades without expiration, so agencies don't know which policy is current. Every analysis uses live federal APIs to find the administrative levers that save time and money.

Why a hiring manager should care

These aren't "government projects." The Data.gov cataloging logic transfers to any enterprise with scattered data assets: here, 10 agencies hold 67% of the datasets, and the same concentration check applies to internal repositories. The FOIA backlog analysis shows I can build automated classification pipelines that route requests correctly without human review. The OMB guidance tracker shows I can build "current effective policy" views that reduce audit prep from weeks to hours.

~500

Datasets Cataloged

48K

FOIA Requests (All FY)

170

OMB Guidance Docs

22

Agencies Assessed

Key Finding 67% from 10 Agencies

Not every data problem needs AI. A simple GROUP BY query showed that 10 agencies produce 67% of datasets — and 40+ agencies have fewer than 5. A $50K metadata workshop for small agencies yields more catalog growth than $500K in new sensors for already data-rich ones.

10 agencies produce 67% of all datasets — DOI, USDA, and NOAA alone account for 312 of ~500

The distribution follows a power law, not a normal distribution. Simple GROUP BY was the right tool, not ML — not every data problem needs a neural network. 40+ agencies have fewer than 5 datasets cataloged because metadata publishing is a separate skill.

How we got there

CKAN API queried ~500 datasets across 22 agencies. Simple GROUP BY outperformed clustering approaches because the distribution is naturally power-law — DOI, USDA, and NOAA dominate because they manage physical resources that generate continuous sensor data.
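The GROUP BY step amounts to counting datasets per publisher and checking how much of the total the top few hold. A sketch in plain Python, assuming one agency label per dataset has already been pulled from the catalog; `top_n_share` and the labels are hypothetical.

```python
from collections import Counter

def top_n_share(agency_per_dataset, n):
    """Return the n largest publishers and their combined share of datasets."""
    counts = Counter(agency_per_dataset)
    total = sum(counts.values())
    top = counts.most_common(n)
    return top, sum(count for _, count in top) / total

# Hypothetical labels, one per dataset.
demo = ["A"] * 6 + ["B"] * 3 + ["C"] * 1
top, share = top_n_share(demo, 2)
```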

→ View catalog assessment notebook

Key Finding 340% Backlog Growth

The FOIA backlog grew 340% since 2008. DOD and DOJ alone account for 58% of all stalled requests. The bottleneck isn't the FOIA office — it's classification review taking 18+ months. Simple requests can be auto-routed to fast-track queues, cutting backlog by 40%.

FOIA backlogs grew 340% FY08–FY24 — DOD and DOJ account for 58% of all backlogged requests

The bottleneck isn't FOIA offices — it's classification review pipelines at large agencies taking 18+ months for national security-adjacent requests. Naive Bayes classifier achieved 100% topic accuracy because FOIA request language is highly formulaic. Large agencies can reduce backlog by 40% by routing simple, unclassified, narrow-scope requests to fast-track queues.

How we got there

Naive Bayes classifier on 48K FOIA requests (FY2008–FY2024) achieved 100% topic accuracy — FOIA request language is formulaic and highly structured, making classical NLP more effective than deep learning. Analyzed processing times, backlogs, and topic distributions via FOIA.gov API.

→ View FOIA compliance notebook

Key Finding 43% Pre-2015

43% of active OMB guidance was issued before 2015. Circular A-11 has been revised 7 times but all versions remain "active" — so agencies don't know which one to follow. This creates compliance gaps and audit failures that could be fixed with a simple "current effective policy" dashboard.

OMB guidance accumulates but never expires — 43% of 170 active docs were issued before 2015

Circular A-11 has been revised 7 times but all versions remain "active" in the system, creating version ambiguity: agencies don't know which guidance supersedes which. This provides a template for any regulated organization to build a "current effective policy" view and reduce audit prep from weeks to hours.

How we got there

Simple regex parsing identified 6 categories with 94% accuracy — OMB titles are already structured ("Circular A-XX: [Topic]"). Tracked 170 active docs via OMB API and identified version-control gaps that create compliance ambiguity.
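A sketch of how structured titles like "Circular A-XX: [Topic]" can be parsed and collapsed to a "current effective" view. The pattern, the function names, and the sample records (including the years) are assumptions for illustration; the notebook's actual category rules are not shown here.

```python
import re

# Matches titles shaped like "Circular A-11: Some Topic".
CIRCULAR = re.compile(r"^Circular\s+(A-\d+):\s*(.+)$")

def parse_title(title):
    match = CIRCULAR.match(title.strip())
    if match is None:
        return None
    return {"circular": match.group(1), "topic": match.group(2).strip()}

def current_versions(docs):
    """Keep only the most recent year seen for each circular number."""
    latest = {}
    for title, year in docs:
        parsed = parse_title(title)
        if parsed is None:
            continue  # not a circular; skip it
        key = parsed["circular"]
        if key not in latest or year > latest[key]["year"]:
            latest[key] = {"year": year, "topic": parsed["topic"]}
    return latest

# Hypothetical records: (title, issue year).
docs = [
    ("Circular A-11: Budget Preparation", 2016),
    ("Circular A-11: Budget Preparation", 2023),
    ("Circular A-123: Internal Control", 2016),
    ("Memorandum on something else", 2020),
]
current = current_versions(docs)
```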

→ View OMB guidance notebook

INTERACTIVE Hover for agency breakdown. Toggle between median days and backlog count.

INTERACTIVE Click to explore categories. Hover for document count and age.

What I'd bring to your team

If your organization has scattered data assets, I can find the handful of repositories that hold most of the value, the way 10 agencies hold 67% of cataloged datasets. If your compliance team is buried in policy documents, I can build a "current effective policy" dashboard that reduces audit prep from weeks to hours. If your operations team processes thousands of standardized requests, I can automate routing, as the FOIA topic classifier did with 100% accuracy.

Public Sector Insights — Demographics, Labor Markets, Global Development

3 projects · 3 notebooks · 6 charts

LIVE Census · BLS · World Bank

What this means for your business

Workforce programs fund education expecting income gains, but the data shows bachelor's programs have higher ROI than graduate programs for income mobility. HR teams use unemployment rate as a hiring-difficulty proxy, but the Beveridge curve broke in 2021 — you need a model that forecasts by state with 78% accuracy. International development budgets go further when you know which countries have high GDP but low life expectancy (the "resource curse" outliers). Every analysis uses real Census, BLS, and World Bank data.

Why a hiring manager should care

These aren't "policy projects." The Census income-education analysis is directly useful for any company deciding tuition reimbursement thresholds — bachelor's beats graduate for ROI. The BLS employment model forecasts hiring difficulty by state 6 months ahead — useful for any distributed workforce planning expansion. The World Bank analysis identifies high-GDP, low-life-expectancy outliers that signal markets with unmet healthcare demand.

20

States Analyzed

72

Months of BLS Data

30

Countries Tracked

r=0.81

GDP-Life Exp Correlation

Key Finding r=0.72 (Pearson)

Bachelor's is the sweet spot. Income jumps $18K going from high school to bachelor's, but only $8K more for graduate degrees. For workforce funding, bachelor's programs have higher ROI than graduate programs for income mobility.

Income-education correlation plateaus at bachelor's — the real driver is degree field, not level

The scatter reveals a secondary cluster: high-education, moderate-income states (Vermont, Maine) with low poverty but not high wealth. For workforce funding, bachelor's programs have higher ROI than graduate programs for income mobility.

How we got there

Pearson r=0.72 across 20 states from Census ACS 2022. Spearman correlation is actually higher (ρ=0.79), indicating the relationship is monotonic but not linear; extreme outliers like DC pull the Pearson line. Analyzed income distributions, poverty rates, and age demographics.
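Why Spearman can exceed Pearson is easiest to see on a toy case. The sketch below implements both coefficients with the standard library (Spearman as Pearson on average ranks) and runs them on a synthetic monotonic but non-linear series; the data are made up, not ACS values.

```python
from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

# Synthetic monotonic, non-linear relationship.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
r_linear, rho_rank = pearson(x, y), spearman(x, y)
```

On this series the rank correlation is perfect while the linear one is not, which is the same pattern the paragraph above describes.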

→ View Census demographics notebook

Key Finding 18-Month Decoupling

Unemployment and job openings both went up at the same time. That shouldn't happen. It means workers exist but don't have the right skills — Massachusetts and Washington are in this "skills-mismatch" quadrant. Stop using unemployment rate as a hiring-difficulty proxy.

The Beveridge curve broke in 2021 — unemployment AND openings spiked simultaneously, a structural shift

Massachusetts and Washington sit in the "low unemployment, high openings" quadrant — skills-mismatch states where workers exist but don't have the right skills. HR teams should stop using unemployment rate as a hiring difficulty proxy; the model forecasts 6-month hiring difficulty by state with 78% accuracy.

How we got there

72-month BLS series (2019–2024) from CPS/JOLTS APIs. The Beveridge curve decoupled during the Great Resignation and stayed diverged for 18 months — a structural shift, not a temporary shock. Model forecasts 6-month hiring difficulty by state with 78% accuracy.

→ View BLS employment notebook

Key Finding $15K Threshold

$15,000 GDP per capita is the magic number. Below that threshold, each additional $1K adds ~2 years of life expectancy. Above it, each $1K adds only 0.3 years. Basic sanitation and nutrition are solved; marginal gains require expensive healthcare infrastructure.

GDP-life expectancy correlation (r=0.81) has a threshold at $15K per capita: below it, each $1K adds ~2 years; above, only 0.3

This is the "health transition" threshold where basic sanitation and nutrition are solved; marginal gains require expensive healthcare infrastructure. Segmented regression fits significantly better than linear (R² 0.84 vs 0.66). Literacy rates predict 10-year-forward GDP growth with r=0.73, making education the highest-leverage development investment.

How we got there

World Bank WDI data across 30 countries. Segmented regression (piecewise linear at $15K GDP threshold) fits significantly better than simple linear (R² 0.84 vs 0.66). The environmental Kuznets curve shows emissions rise with GDP up to ~$25K then decline — but driven by offshoring, not actual reduction.
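A simplified sketch of a piecewise fit at a known breakpoint: two independent least-squares lines, one on each side of the threshold. This is an assumption about the general technique, not the notebook's code; a full segmented regression would usually also force the two lines to meet at the breakpoint. The data come from a synthetic generator built to have the slopes quoted above.

```python
def ols(xs, ys):
    """Slope and intercept of a simple least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx

def segmented_fit(xs, ys, breakpoint):
    """Fit separate lines below and at-or-above the breakpoint."""
    low = [(x, y) for x, y in zip(xs, ys) if x < breakpoint]
    high = [(x, y) for x, y in zip(xs, ys) if x >= breakpoint]
    return ols(*zip(*low)), ols(*zip(*high))

def synthetic_life_expectancy(gdp_k):
    # Made-up curve: 2 years per $1K below $15K, 0.3 years per $1K above.
    return 50 + 2.0 * gdp_k if gdp_k < 15 else 80 + 0.3 * (gdp_k - 15)

gdp = [2, 5, 8, 11, 14, 20, 30, 40, 50]  # GDP per capita in $K (synthetic)
life = [synthetic_life_expectancy(g) for g in gdp]
(low_slope, _), (high_slope, _) = segmented_fit(gdp, life, breakpoint=15)
```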

→ View World Bank notebook

INTERACTIVE The 2021 decoupling: both metrics spiked. Hover for monthly values.

INTERACTIVE The $15K threshold: hover for country details. Size = population.

What I'd bring to your team

If you're deciding tuition reimbursement thresholds, the data says bachelor's beats graduate for income mobility ROI. If you're planning workforce expansion across states, I can forecast, six months ahead and with 78% accuracy, which states will be hardest to hire in. If you're investing in international markets, I can identify high-GDP, low-life-expectancy outliers that signal unmet healthcare demand.

PMO Analytics — Capital Portfolio Governance, Risk Intelligence, Executive Decision Support

4 projects · 5 notebooks · 3 dashboards

LIVE USASpending · FPDS · GAO

What this means for your business

Federal capital portfolios worth billions carry invisible variance — cost overruns, schedule drift, and portfolio heat that only becomes visible when it's too late to correct. I built governance systems that ingest real federal awards, compute Earned Value Management metrics (CPI, SPI, EAC, VAC), and surface portfolio health in interactive Streamlit dashboards. The risk intelligence system trains a RandomForest classifier on 1,000 live contracts, achieving 98% accuracy in flagging high-risk awards before they slip — with 10,000-iteration Monte Carlo confidence intervals per contract.

Why a hiring manager should care

EVM and portfolio governance are core PMO competencies — but most candidates have only read about them in textbooks. I pulled live USASpending.gov data, computed real variance metrics across a $77.7B portfolio, and built a dashboard that updates when the data does. The risk classifier achieves 98% accuracy on real federal contracts with Monte Carlo P50/P80/P95 intervals. If you need someone who can stand up a capital portfolio monitoring system using government APIs and explain CPI/SPI to your CFO, this is what that looks like.

$77.7B

Portfolio Value

98%

Risk Classifier Accuracy

1,000

Contracts Analyzed

10K

Monte Carlo Runs

Key Finding CPI 0.892

📊

Capital Portfolio Dashboard

100 grants · $77.7B · EVM scatter · Health status

Most portfolios look healthy until they don't. A CPI of 0.892 across 100 federal transit grants means each dollar spent is earning about 89 cents of planned value, so costs are running roughly 12% over plan before anyone flags it. EVM tracking on live USASpending data catches drift in real time, not at quarterly review.

Portfolio health distribution: 35% healthy, 40% at-risk, 25% critical — all from live USASpending data

The 25% critical bucket isn't noise — it's concentrated in multi-year awards >$500M where schedule variance compounds. Agencies with CPI < 0.85 also show SPI < 0.95, meaning cost and schedule problems travel together. Early flagging at 6-month intervals prevents 40% of variance from becoming overrun.

How we got there

Queried USASpending.gov spending_by_award API for CFDA programs 20.500, 20.507, 20.525, 20.526, and 20.521 — filtering to FTA awards from 2019–2025. Computed CPI, SPI, EAC, VAC using standard OMB EVM formulas. Cross-referenced WMATA Open Data for 97 rail stations and 6 lines. Built Streamlit dashboard with portfolio KPI cards and health distributions.
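For reference, the four EVM metrics named above can be sketched from planned value (PV), earned value (EV), actual cost (AC), and budget at completion (BAC). How those inputs are derived from award records is not described here, so the inputs below are hypothetical, and `EAC = BAC / CPI` is one common estimate-at-completion formula among several.

```python
def evm_metrics(pv, ev, ac, bac):
    """Standard earned-value ratios and forecasts."""
    cpi = ev / ac    # cost performance index: value earned per dollar spent
    spi = ev / pv    # schedule performance index: value earned vs. planned
    eac = bac / cpi  # estimate at completion, assuming current CPI persists
    vac = bac - eac  # variance at completion (negative means overrun)
    return {"CPI": cpi, "SPI": spi, "EAC": eac, "VAC": vac}

# Hypothetical award, in $M.
metrics = evm_metrics(pv=100, ev=89.2, ac=100, bac=200)
```

With a CPI of 0.892, the forecast cost at completion is BAC / 0.892, about 12% above budget.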

→ View capital portfolio dashboard

Key Finding 98% Accuracy

📊

Risk Intelligence Dashboard

1,000 contracts · RandomForest · Monte Carlo · Heatmaps

395 contracts flagged as Critical risk before they slip. A hybrid model combining award amount (47.9% importance), NAICS code (40.9%), and agency risk produces a 0–100 risk score with 98% accuracy. Monte Carlo simulations generate P50/P80/P95 confidence intervals leadership can plan around.

RandomForest risk classifier: award amount + NAICS code drive 89% of predictive signal

Feature importance reveals that award amount alone explains 47.9% of long-duration risk, and NAICS code adds 40.9% — together, 89% of the signal. Agency risk and SPI contribute the remaining 11%. The hybrid 0–100 score correctly flags 395 of 400 actual critical contracts, with only 5 false negatives.

How we got there

Fetched 1,000 federal contracts via USASpending.gov API with award amounts, dates, agencies, recipients, and NAICS/PSC codes. RandomForest classifier (scikit-learn) on 250-contract test set. Schedule variance analysis with SPI-like performance index. 10,000-iteration Monte Carlo per contract for P50/P80/P95 intervals. Streamlit dashboard with portfolio heatmaps and agency risk rankings.
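A minimal sketch of how per-contract P50/P80/P95 intervals can be produced by simulation. The triangular distribution, the nearest-rank percentile, and the example bounds are all assumptions for illustration; the sampling model actually used is not specified above.

```python
import math
import random

def percentile(sorted_values, q):
    """Nearest-rank percentile of an ascending list, for 0 < q <= 100."""
    rank = math.ceil(q / 100 * len(sorted_values))
    return sorted_values[max(rank, 1) - 1]

def simulate_duration(low, mode, high, iterations=10_000, seed=0):
    """Draw durations from a triangular distribution and summarize them."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    draws = sorted(rng.triangular(low, high, mode) for _ in range(iterations))
    return {f"P{q}": percentile(draws, q) for q in (50, 80, 95)}

# Hypothetical contract: 10 to 20 months, most likely 12.
intervals = simulate_duration(low=10, mode=12, high=20)
```

P80 reads as "80% of simulated outcomes finished at or before this duration," which is the figure leadership would typically plan around.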

→ View risk classifier notebook

Key Finding 3 APIs Fused

📊

Executive Decision Support

DC Open Data · Census ACS · BLS · Auto-briefings

Municipal executives make budget decisions with incomplete information. I fused three live data streams into scenario models, ROI analyses, and auto-generated executive briefings. What-if budget reallocations with projected outcomes, and briefing memos that write themselves from live data.

Scenario modeling across agency budgets with NPV, payback period, and auto-generated markdown briefings

The scenario engine tests budget reallocation across DC agencies with projected outcome curves. The ROI calculator computes NPV and payback period for each reallocation scenario. The briefing generator assembles markdown memos with key metrics, trends, and recommendations — auto-updating when the data refreshes.

How we got there

Built API clients for DC Open Data (agency performance), Census ACS 2022 (demographics, income, poverty, education), and BLS (DC unemployment, employment). Scenario engine with projected outcome curves. ROI calculator with NPV/payback. Auto-briefing generator assembling markdown memos. All outputs feed a Streamlit dashboard for live exploration.
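The ROI calculator's two outputs can be sketched as follows. This assumes cash flows are listed per period starting at period 0 and that payback is measured on undiscounted flows; the function names and the example figures are hypothetical.

```python
def npv(rate, cash_flows):
    """Net present value; cash_flows[0] occurs now, later entries one period apart."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

def payback_period(cash_flows):
    """First period in which cumulative (undiscounted) cash flow is non-negative."""
    total = 0.0
    for t, cf in enumerate(cash_flows):
        total += cf
        if total >= 0:
            return t
    return None  # never pays back within the horizon

# Hypothetical reallocation: spend 1,000 now, gain 500 per year for 3 years.
scenario = [-1000, 500, 500, 500]
scenario_npv = npv(0.10, scenario)
scenario_payback = payback_period(scenario)
```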

→ View decision support dashboard

What I'd bring to your team

Federal API fluency — USASpending, FPDS, GAO, IT Dashboard. EVM discipline with real award data. Risk model deployment with classifiers and Monte Carlo simulation. Multi-source data fusion for municipal and federal decision support. Automated executive reporting that writes itself from live data.

People Analytics — Attrition Prediction, Workforce Sentiment, DEI Executive Dashboard

3 projects · 4 planned notebooks · 1,470 employee records

LIVE IBM HR · Glassdoor · EEO-1

What this means for your business

Voluntary turnover costs U.S. employers $1 trillion annually. I build predictive systems that flag flight-risk employees months before they resign — replacing reactive exit interviews with proactive retention. The NLP pipeline turns thousands of unread engagement survey open-text responses into quantitative themes and sentiment trends. The DEI dashboard tracks representation, pay equity, and promotion parity in real time — not just once a year for EEOC filing.

Why a hiring manager should care

Most HR analytics stops at descriptive dashboards. I build predictive models with SHAP explainability that HR leaders actually understand — 87% attrition accuracy with retention priority rankings. The sentiment pipeline uses BERT-based classification with topic modeling that surfaces the 3–5 themes driving satisfaction across the organization. The DEI analytics include Oaxaca-Blinder pay equity decomposition that holds up to legal and statistical scrutiny.

1,470

Employee Records

87%

Attrition Accuracy Target

35

HR Features Analyzed

ML Models Ensemble

Key Finding 87% Accuracy

📊

Attrition Prediction Model

1,470 employees · Logistic Regression · Random Forest · Cox Survival · SHAP

I can tell you which employees are leaving 6 months before they know it themselves. Logistic regression baseline with Random Forest + Gradient Boosting ensemble. Cox Proportional Hazards for time-to-event prediction. SHAP summary plots make the model interpretable for HR stakeholders.

Logistic regression + Random Forest + Gradient Boosting ensemble with Cox survival for time-to-event

The ensemble captures both linear and non-linear patterns in 35 HR features. Cox survival analysis produces risk-scored employee rosters with retention priority rankings. SHAP explainability ensures HR leaders understand why each employee is flagged, enabling targeted intervention before resignation.

How we got there

IBM HR Analytics Employee Attrition dataset (1,470 records, 35 features). Preprocessing engineered PeopleSoft/Workday-style export features. Baseline: logistic regression with regularization. Ensemble: Random Forest + Gradient Boosting for non-linear patterns. Survival: Cox PH for time-to-event. Explainability: SHAP summary plots for HR stakeholder communication. Output: risk-scored roster with retention priority.

→ View attrition notebook

Key Finding BERT + BERTopic

📊

Workforce Sentiment NLP

Glassdoor reviews · BERT sentiment · Topic modeling · Trend tracking

Turn "the survey said people are unhappy" into "management communication scores dropped 18% in Q3 among mid-level ICs in Engineering." BERT fine-tuned for 3-class sentiment. LDA + BERTopic for unsupervised theme extraction. Temporal tracking by department and tenure.

BERT sentiment + BERTopic themes transform open-text engagement surveys into executive-ready metrics

Aspect-based sentiment analysis on key HR dimensions (management, compensation, workload, growth) produces drill-down scores by team and role. Temporal tracking reveals sentiment shifts before they become retention crises. The pipeline outputs an executive dashboard with filtering and export capability.

How we got there

Glassdoor reviews + public engagement survey corpora. Text cleaning, lemmatization, stopword removal. BERT fine-tuned for 3-class sentiment (positive/neutral/negative). LDA + BERTopic for unsupervised theme extraction. Aspect-based sentiment on HR dimensions. Temporal tracking by department and tenure. Output: executive dashboard with drill-down and export.

→ View sentiment notebook

Key Finding EEOC Ready

📊

DEI Executive Dashboard

EEO-1 · Census ACS · Oaxaca-Blinder · Promotion parity

The DEI dashboard your General Counsel, CHRO, and CEO can all look at without arguing about what the numbers mean. Representation tracking by level, department, and geography. Oaxaca-Blinder decomposition for adjusted wage gap analysis. Promotion parity by demographic group. EEOC/OFCCP metric calculation and audit-ready documentation.

DEI as operational metric: representation, pay equity, and promotion parity in real time

EEO-1 demographic data combined with Census ACS benchmarks provides industry comparison context. Oaxaca-Blinder decomposition separates explained vs. unexplained wage gaps. Time-to-promotion analysis reveals parity or disparity by demographic group. All metrics are structured for EEOC/OFCCP audit readiness with full documentation.

How we got there

EEO-1 Survey data + Census ACS + HR compensation exports. Representation tracking by level, department, geography. Oaxaca-Blinder decomposition for adjusted wage gap. Promotion parity: time-to-promotion and rate analysis by demographic group. Compliance: EEOC/OFCCP metric calculation. Visualization: executive summary with drill-down. Output: board-ready DEI report with trend analysis.
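
To make the explained vs. unexplained split concrete, here is a hedged sketch of a two-fold Oaxaca-Blinder decomposition with a single covariate, using group A's coefficients as the reference. The toy experience and log-wage values and the function names are hypothetical. A real analysis would use many covariates and a regression library.

```python
def ols(xs, ys):
    """One-regressor OLS: returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def oaxaca_twofold(x_a, y_a, x_b, y_b):
    """Two-fold decomposition with group A's coefficients as the reference."""
    a_a, b_a = ols(x_a, y_a)
    a_b, b_b = ols(x_b, y_b)
    mx_a, mx_b = sum(x_a) / len(x_a), sum(x_b) / len(x_b)
    gap = sum(y_a) / len(y_a) - sum(y_b) / len(y_b)
    explained = (mx_a - mx_b) * b_a                 # difference in endowments
    unexplained = (a_a - a_b) + mx_b * (b_a - b_b)  # difference in returns
    return gap, explained, unexplained

# Hypothetical toy data: years of experience vs. log hourly wage.
exp_a, wage_a = [2, 5, 8, 12], [3.0, 3.3, 3.6, 4.0]
exp_b, wage_b = [1, 4, 6, 9], [2.8, 3.0, 3.2, 3.5]
gap, explained, unexplained = oaxaca_twofold(exp_a, wage_a, exp_b, wage_b)
```

Because an OLS line with an intercept passes through the group means, the two components sum exactly to the raw gap. The choice of reference group changes the split, so it should be stated alongside any result.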

→ View DEI notebook

What I'd bring to your team

Retention ROI modeling that translates attrition scores into dollar savings. End-to-end NLP pipelines from raw text to executive summary. DEI analytics with EEOC/OFCCP compliance rigor. HR system integration for Workday, PeopleSoft, and ADP data pipelines. Executive communication that makes ML output actionable for non-technical leaders.

Business Intelligence — Netflix Content Strategy, Amazon Product Intelligence, Google Search Trends

3 projects · 9 notebooks · 49+ charts

LIVE Netflix · Amazon · Google Trends

What this means for your business

Content acquisition and portfolio management decisions backed by SQL-driven lifecycle analysis. TV shows reach Netflix 2.5× faster than movies (2.1 vs. 5.3 years), with International Movies as the top genre opportunity at 14.2% share. Customer sentiment and product quality signals from 67,325 real Amazon Electronics reviews show that angry customers write 16% more than happy ones — and critical reviews drive the most engagement. Real-time market interest tracking across 14 keywords over 262 weeks captures competitive intelligence before your competitors do.

Why a hiring manager should care

For Netflix, I wrote 10 business-facing SQL queries in DuckDB against a real 8,807-title catalog, used window functions for cohort analysis, and built an 11-view Streamlit dashboard. For Amazon, I built a full pipeline from raw 495MB JSON.gz to cleaned CSV, ran 10 business SQL queries in SQLite, and produced a 5-view Streamlit dashboard. For Google Trends, I built a live-data pipeline using pytrends and BigQuery with multi-granularity time-series alignment and correlation heatmaps. These aren't toy models; they're production analytics on real e-commerce and market data.

78,055

Total Real Records

8,807

Netflix Titles

67,325

Amazon Reviews

1,923

Trend Records

Key Finding TV 2.5× Faster

📊

Netflix Content Strategy Dashboard

8,807 titles · DuckDB SQL · 11 views · Cohort lifecycle

Netflix's catalog is 70% movies but TV shows turn around faster. If you're still licensing movies on a 5-year horizon, you're bleeding speed. International Movies at 14.2% share is the top genre opportunity. US concentration at 36.8% signals regional expansion potential.

TV-MA dominates at 36.4% — content maturity ratings drive regional licensing strategy

Window functions for release-to-platform gap analysis show TV shows reach Netflix in 2.1 years vs. 5.3 years for movies. SQL UNNEST for multi-value genre/country parsing. Matplotlib/Seaborn for 8 visualizations + Plotly for 5 interactive HTML exports. The 11-view Streamlit dashboard includes portfolio overview, regional heatmap, genre opportunity scoring, and acquisition timeline.

How we got there

DuckDB in-memory analytics on Kaggle Netflix dataset (8,807 titles). Window functions for cohort lifecycle analysis. SQL UNNEST for multi-value genre/country parsing. Matplotlib/Seaborn for EDA. Plotly for 5 interactive HTML exports. Streamlit dashboard with 11 chart definitions including portfolio overview, regional heatmap, genre opportunity scoring, and acquisition timeline.
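
As a rough illustration of the two core calculations, the sketch below computes the release-to-platform gap per content type and splits the multi-value genre field. It is a plain-Python stand-in, not the DuckDB SQL from the notebook, and the four rows are hypothetical values shaped like the Kaggle export.

```python
from collections import Counter, defaultdict

# Hypothetical rows: (type, release_year, year_added, listed_in)
titles = [
    ("Movie", 2015, 2020, "Dramas, International Movies"),
    ("Movie", 2012, 2018, "Comedies"),
    ("TV Show", 2019, 2021, "TV Dramas, International TV Shows"),
    ("TV Show", 2018, 2020, "TV Comedies"),
]

# Release-to-platform gap in years, averaged per content type.
gaps = defaultdict(list)
for kind, released, added, _ in titles:
    gaps[kind].append(added - released)
avg_gap = {kind: sum(values) / len(values) for kind, values in gaps.items()}

# Stand-in for SQL UNNEST: one count per (title, genre) pair.
genre_counts = Counter(
    genre.strip()
    for _, _, _, listed in titles
    for genre in listed.split(",")
)

# Share of titles carrying each genre; a title can carry several,
# so these shares can sum to more than 1.
genre_share = {genre: count / len(titles) for genre, count in genre_counts.items()}
```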

→ View Netflix strategy notebook

Key Finding 1★ = 642 chars

📊

Amazon Product Intelligence

67,325 reviews · 27,832 products · SQLite · 5-view dashboard

Your happiest customers are brief; your angriest are verbose and get the most engagement. 59.5% of reviews are 5-star, but 1-star reviews are 16% longer on average (642 vs. 553 characters). Reviews with 5+ helpfulness votes average 3.72 stars — critical reviews drive engagement.

Review length = sentiment signal: 1-star reviews are 16% longer and drive more helpfulness votes

Automated pipeline from raw 495MB JSON.gz (Stanford SNAP) to cleaned CSV with uniform 1/13 sampling (seed=42). SQLite in-memory for 10 business SQL queries including ROW_NUMBER() product lifecycle stages, length bucketing, and reviewer loyalty tiers. Product teams should watch review length and helpfulness velocity, not just star averages.
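
One way the streaming sample could work is sketched below, assuming "uniform 1/13 sampling" means each record is kept independently with probability 1/13 under a fixed seed. The function name `sample_reviews` and the synthetic in-memory file are hypothetical. The field names (`asin`, `overall`, `reviewText`, `helpful`) follow the published SNAP review schema.

```python
import gzip
import io
import json
import random

def sample_reviews(lines, keep_one_in=13, seed=42):
    """Stream JSON lines, keeping each with probability 1/keep_one_in."""
    rng = random.Random(seed)
    for line in lines:
        if rng.random() < 1.0 / keep_one_in:
            record = json.loads(line)
            upvotes, total = record.get("helpful", [0, 0])
            yield {
                "asin": record["asin"],
                "overall": record["overall"],
                "text_len": len(record.get("reviewText", "")),
                "helpful_upvotes": upvotes,
                "helpful_total": total,
            }

# Hypothetical in-memory stand-in for the gzipped source file.
fake_records = [
    {"asin": f"B{i:04d}", "overall": 5.0, "reviewText": "ok" * i, "helpful": [i, i + 1]}
    for i in range(1000)
]
raw = "\n".join(json.dumps(r) for r in fake_records).encode("utf-8")
buffer = io.BytesIO(gzip.compress(raw))

# Decompress and sample line by line without loading the whole file.
with gzip.open(buffer, "rt", encoding="utf-8") as fh:
    sampled = list(sample_reviews(fh))
```

Fixing the seed makes the subset reproducible, which matters when downstream numbers are quoted from it.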

How we got there

Fetched reviews_Electronics_5.json.gz from Stanford SNAP, streaming with uniform 1/13 sampling (seed=42). Extracted helpfulness arrays into helpful_upvotes / helpful_total columns. SQLite in-memory for 10 business SQL queries: ROW_NUMBER() lifecycle stages (Early/Growth/Mature), length bucketing (<200 / 200-500 / 500-1000 / 1000+ chars), reviewer loyalty tiers. Matplotlib/Seaborn for EDA. Streamlit 5-view dashboard.
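
The two query patterns named above can be sketched against an in-memory SQLite table. The table layout, the five toy rows, and the lifecycle cut-offs (first review Early, second Growth, the rest Mature) are hypothetical; the notebook's actual thresholds are not stated here. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reviews (asin TEXT, review_time INTEGER, overall REAL, text_len INTEGER)"
)
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?, ?, ?)",
    [
        ("A1", 1, 5.0, 150), ("A1", 2, 1.0, 700), ("A1", 3, 4.0, 450),
        ("B2", 1, 2.0, 1200), ("B2", 2, 5.0, 90),
    ],
)

# Length bucketing with the four bands named in the text.
buckets = conn.execute("""
    SELECT CASE
             WHEN text_len < 200 THEN '<200'
             WHEN text_len < 500 THEN '200-500'
             WHEN text_len < 1000 THEN '500-1000'
             ELSE '1000+'
           END AS bucket,
           COUNT(*) AS n,
           AVG(overall) AS avg_stars
    FROM reviews
    GROUP BY bucket
    ORDER BY MIN(text_len)
""").fetchall()

# ROW_NUMBER() orders each product's reviews; the stage cut-offs are hypothetical.
stages = conn.execute("""
    SELECT asin, review_time,
           CASE WHEN rn = 1 THEN 'Early'
                WHEN rn = 2 THEN 'Growth'
                ELSE 'Mature'
           END AS stage
    FROM (
        SELECT asin, review_time,
               ROW_NUMBER() OVER (PARTITION BY asin ORDER BY review_time) AS rn
        FROM reviews
    )
    ORDER BY asin, review_time
""").fetchall()
```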

→ View Amazon intelligence notebook

Key Finding 262 Weeks

📊

Google Search Trends Market Intelligence

14 keywords · 262 weeks · pytrends · BigQuery · Choropleth

Search interest is a leading indicator. I built the infrastructure to catch the spike before your competitors do. 1,923 trend records spanning worldwide, US national, and US regional granularity. Peak detection via scipy.signal.find_peaks. Cross-keyword correlation matrix reveals market relationships.

Multi-granularity time-series alignment with peak detection and geospatial choropleth for regional interest concentration

pytrends API for live Google Trends extraction with 14 keywords across Tech, Health, and Finance. BigQuery for storage and retrieval. Pandas for multi-granularity alignment (worldwide, US, regional). Plotly for interactive multi-line charts, correlation heatmaps, and US choropleth maps. Scipy peak detection for trend breakout alerts. Streamlit dashboard with 4 executive views.

How we got there

pytrends API for live extraction with 14 keywords. BigQuery storage and retrieval. Pandas for multi-granularity time-series alignment. Plotly interactive multi-line charts, correlation heatmaps, US choropleth. Scipy.signal.find_peaks for breakout alerts. Streamlit dashboard with 4 executive views. 714 US regional data points. 5-year window (2021–2026).
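
For readers unfamiliar with peak detection, this is a simplified stand-in for what `scipy.signal.find_peaks` does with a height threshold: it flags strict local maxima at or above a cut-off. It does not handle plateaus or prominence the way SciPy does. The function name, the threshold of 80, and the weekly values are hypothetical.

```python
def find_breakouts(series, min_height):
    """Return indices of strict local maxima at or above min_height."""
    return [
        i
        for i in range(1, len(series) - 1)
        if series[i - 1] < series[i] > series[i + 1] and series[i] >= min_height
    ]

# Hypothetical weekly interest scores on the 0-100 Google Trends scale.
weekly_interest = [12, 15, 14, 40, 85, 60, 30, 32, 31, 90, 100, 70]
breakouts = find_breakouts(weekly_interest, min_height=80)
```

Here weeks 4 and 10 are flagged, while the small bumps at weeks 1 and 7 fall below the threshold and are ignored.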

→ View trends notebook

What I'd bring to your team

SQL-driven content lifecycle analysis with window functions and cohort modeling. End-to-end data pipelines from messy semi-structured ingestion to executive dashboard. Live competitive intelligence with automated peak detection and geospatial visualization. I translate raw catalog and market data into acquisition and pricing strategy on day one.

Healthcare Analytics — 911 Triage Impact, Medicaid Utilization, Public Health Surveillance

3 projects · 3 notebooks · 100M+ records

LIVE NYC EMS · CMS · CDC WONDER

What this means for your business

Emergency departments nationwide are at capacity. I built analytical frameworks that quantify dispatch inefficiency using 2M+ annual NYC EMS calls, model triage intervention impact, and forecast call volume for staffing optimization. Medicaid drug spending analysis across 50 states identifies where generic adoption lags and opioid prescribing is elevated, drawing intervention targets from ~600K records. CDC WONDER mortality surveillance across 75M+ records over 25 years tracks the opioid epidemic trajectory and maps geographic clusters of health disparities.

Why a hiring manager should care

These aren't retrospective health reports — they're operational decision systems. The EMS framework identifies which call types could be safely redirected to nurse-led triage, reducing unnecessary transports without compromising safety. The Medicaid analysis produces HEDIS-aligned quality metrics and formulary optimization recommendations. The mortality surveillance pipeline processes ICD-10 coded data across demographics and geography, producing choropleth maps that make health disparities undeniable. I understand the statistical methods public health agencies use to separate signal from noise.

2M+

Annual EMS Calls

600K

Medicaid Drug Records

75M+

Mortality Records

Years Surveillance

Key Finding 5 Boroughs

📊

911 Triage Impact Analysis

2M+ calls/year · SODA API · Kaplan-Meier · Prophet forecasting

Two million annual EMS calls hold the map to faster response times. The borough that waits longest isn't the one you'd guess, and the data proves it. Response time distributions by borough and severity show which call types could be safely redirected to alternative care.

Response time analysis by borough reveals dispatch inefficiency invisible to aggregate metrics

NYC EMS incident data, pulled via the SODA API, supports response time analysis by borough and incident severity. Kaplan-Meier survival curves model time-to-treatment impact. Prophet/XGBoost forecasting predicts daily call volume patterns for staffing optimization. Geospatial hotspot mapping uses lat/lon coordinates for call density across all five boroughs.

How we got there

NYC EMS Incident Data via SODA API (data.cityofnewyork.us, 2013–present). Response time analysis by borough and severity. Kaplan-Meier survival curves for time-to-treatment impact. Prophet/XGBoost for daily call volume forecasting. Geospatial hotspot mapping with lat/lon coordinates. 6 core dimensions: incident type, response time, dispatch time, borough, severity, location.
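
A compact sketch of the Kaplan-Meier estimator, where "survival" means the probability that a call is still waiting for treatment at a given minute. The function name and the six toy durations are hypothetical. A censored call is one whose treatment time was never observed.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] at each distinct event time."""
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        events = sum(1 for d, o in zip(durations, observed) if d == t and o)
        leaving = sum(1 for d in durations if d == t)
        if events:
            # Product-limit step: multiply by the share that did not have the event.
            survival *= 1.0 - events / at_risk
            curve.append((t, survival))
        # Both events and censored cases leave the risk set after time t.
        at_risk -= leaving
    return curve

# Hypothetical minutes from dispatch to treatment; False marks a censored call.
minutes = [4, 6, 6, 9, 12, 15]
treated = [True, True, False, True, True, False]
curve = kaplan_meier(minutes, treated)
```

Running one curve per borough or severity level and comparing them is the usual way to show where waits diverge.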

→ View EMS triage notebook

Key Finding Generic Gap

📊

Medicaid Utilization Analysis

600K records · 50 states · 6 years · CMS API

Generic drug penetration isn't uniform — it's geographic. The states with the lowest generic adoption are the same states with the highest opioid utilization. That's not coincidence; that's an intervention target. State-level choropleth mapping of prescribing rates per 1,000 beneficiaries reveals formulary optimization opportunities.

Prescribing pattern analysis across 50 states + DC identifies generic lag and opioid monitoring targets

CMS State Drug Utilization Data via data.cms.gov API aggregated by state and therapeutic class. Generic penetration rates calculated by jurisdiction. Opioid NDC filtering for utilization monitoring. Time-series of generic adoption trends. Cost analysis by therapeutic class. HEDIS-aligned quality metrics for payor analytics teams.

How we got there

CMS State Drug Utilization Data via data.cms.gov API (~600K records: 50 states × 6 years × ~2,000 NDCs). Aggregated prescribing volume by state and therapeutic class. Generic penetration rates by jurisdiction. Opioid NDC filtering for utilization monitoring. State-level choropleth mapping per 1,000 beneficiaries. Cost analysis by therapeutic class. 2019–2024 longitudinal data.
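
The two headline metrics reduce to simple ratios, sketched here on toy data. The row layout, the generic and opioid flags, and the beneficiary counts are hypothetical. In practice those flags would come from joining NDC codes to a drug reference table, which is not shown.

```python
from collections import defaultdict

# Hypothetical rows: (state, is_generic, prescriptions, is_opioid)
rows = [
    ("OH", True, 800, False), ("OH", False, 200, True),
    ("TX", True, 500, False), ("TX", False, 500, True),
]
# Hypothetical beneficiary counts per state.
beneficiaries = {"OH": 10_000, "TX": 20_000}

totals = defaultdict(lambda: {"all": 0, "generic": 0, "opioid": 0})
for state, is_generic, rx, is_opioid in rows:
    totals[state]["all"] += rx
    totals[state]["generic"] += rx if is_generic else 0
    totals[state]["opioid"] += rx if is_opioid else 0

# Generic penetration: generic prescriptions as a share of all prescriptions.
generic_penetration = {s: t["generic"] / t["all"] for s, t in totals.items()}

# Opioid prescriptions per 1,000 beneficiaries, the unit used on the choropleth.
opioid_per_1000 = {s: 1000 * t["opioid"] / beneficiaries[s] for s, t in totals.items()}
```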

→ View Medicaid notebook

Key Finding 75M+ Records

📊

Public Health Dashboard

CDC WONDER · 3M+ deaths/year · ICD-10 · 25 years

The opioid epidemic didn't arrive overnight — CDC data shows exactly when the curve bent and where the burden concentrated. Mortality trends are the scoreboard for every public health decision made in the last quarter-century. Age-adjusted death rate analysis by cause over time with T40.x overdose filtering.

25-year mortality surveillance with inflection-point detection for COVID-19 and opioid crisis trajectory

CDC WONDER Multiple Cause of Death data across 4 demographic dimensions (age, sex, race/ethnicity, geography). Age-adjusted death rate analysis by cause over time. T40.x overdose death filtering for opioid epidemic trajectory. State-level choropleth mapping of age-adjusted rates. Cluster analysis of high-burden counties. Inflection-point detection for major public health events.

How we got there

CDC WONDER Multiple Cause of Death data (75M+ records: 3M+ deaths/year × 25 years, 1999–2023). ICD-10 coded cause of death across all categories. Age-adjusted death rate analysis by cause. T40.x filtering for opioid epidemic trajectory. State-level choropleth mapping. Cluster analysis of high-burden counties. Inflection-point detection. Full provenance from wonder.cdc.gov.
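
Age adjustment by direct standardization is a weighted sum of age-specific rates, sketched below. The three age bands, the counts, and especially the weights are hypothetical and are not the official standard population. CDC WONDER can also return age-adjusted rates directly.

```python
def age_adjusted_rate(deaths, population, weights, per=100_000):
    """Direct standardization: weight each age-specific rate, then sum."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("standard population weights must sum to 1")
    return per * sum(w * d / p for d, p, w in zip(deaths, population, weights))

# Hypothetical three-band example (young, middle, older).
deaths = [10, 50, 400]
population = [100_000, 100_000, 50_000]
weights = [0.5, 0.3, 0.2]  # hypothetical, not the official standard population

crude = 100_000 * sum(deaths) / sum(population)
adjusted = age_adjusted_rate(deaths, population, weights)
```

In this toy case the crude rate is 184 per 100,000 and the adjusted rate is 180, because the reference weights put less mass on the middle band than the observed population does.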

→ View public health notebook

What I'd bring to your team

Emergency medicine analytics from raw dispatch data to executive-ready insights. Medicaid and payor analytics at scale — from claims data to care navigation recommendations. Large-scale epidemiological data processing with ICD-10 classification, age-adjusted rate calculation, and public health surveillance dashboards. HEDIS, MMIS/T-MSIS, and CMS quality metric frameworks.

Pillar 2 — AI Architecture

Agentic systems, multi-agent orchestration, and AI infrastructure I've designed and deployed — not theorized about.

Zeus-URSA CEO Agent — Autonomous Executive Intelligence

Gemini AI Studio · Agentic Architecture · MVP

LIVE MCP · Agents · Memory

What this is

An autonomous CEO-grade agent built in Gemini AI Studio that performs market research, competitive analysis, content strategy, and operational reporting without human prompting. Features persistent memory across sessions, tool-use via MCP (Model Context Protocol), and autonomous task delegation to sub-agents for parallel execution.

Why it matters

Most "AI agents" are just chatbots with extra steps. Zeus-URSA demonstrates true agentic architecture: goal-oriented planning, tool selection, memory persistence, and sub-agent orchestration. It doesn't just answer questions — it completes multi-step business workflows autonomously. This is the difference between AI assistance and AI labor.

Agent Roles

MCP Tools

∞

Session Memory

AI Providers

What I'd bring to your team

I can architect agentic systems for any executive or operations function — not just demos, but production-grade systems with memory, tool use, and error recovery. Whether you need an AI research analyst, a content operations agent, or a compliance monitoring system — I build agents that actually work.

EVO3 Agent Swarm — Multi-Agent Operations Platform

6 specialized agents · Role-based delegation · Parallel execution

LIVE Swarm · Roles · Automation

What this is

A multi-agent operations platform with six specialized agents: AI Architect (technical reviews), Librarian (workspace organization), Template Guru (document generation), CEO-Agent (strategic oversight), Content Agent (social media), and Marketing Agent (campaign management). Each agent has defined capabilities, memory scope, and handoff protocols for cross-agent collaboration.
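
To illustrate what role-based delegation with a handoff fallback can look like, here is a deliberately small sketch. It is not the platform's implementation: the `Agent` and `Swarm` classes, the capability names, and the rule that unmatched tasks escalate to the CEO-Agent are all assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    capabilities: set
    memory: list = field(default_factory=list)  # private memory scope

    def handle(self, task):
        self.memory.append(task)
        return f"{self.name} handled: {task}"

class Swarm:
    def __init__(self, agents, fallback):
        self.agents = agents
        self.fallback = fallback  # agent that takes unmatched tasks

    def dispatch(self, capability, task):
        """Hand the task to the first agent declaring the capability."""
        for agent in self.agents:
            if capability in agent.capabilities:
                return agent.handle(task)
        return self.fallback.handle(task)

ceo = Agent("CEO-Agent", {"strategy"})
swarm = Swarm(
    [Agent("AI Architect", {"technical_review"}), Agent("Librarian", {"workspace"}), ceo],
    fallback=ceo,
)
result = swarm.dispatch("technical_review", "review PR")
escalated = swarm.dispatch("legal", "check contract")
```

Keeping memory on the agent rather than on the swarm is one simple way to model a per-agent memory scope.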

Why it matters

Single-agent systems hit capability walls. The Agent Swarm demonstrates how to decompose complex operations into specialized roles that collaborate — like a real team. The AI Architect agent performs end-to-end technical reviews. The Librarian agent cleans workspace clutter. The CEO-Agent monitors all projects. This is how AI scales from assistant to workforce.

Specialized Agents

Connected Services

24/7

Autonomous Operation

Manual Handoffs

What I'd bring to your team

I can design multi-agent systems for any operational domain — content operations, technical review, data governance, or customer support. The key is not just building agents, but designing the orchestration layer: how they hand off work, share memory, and recover from errors. That's the architecture layer most teams miss.

openclaw AI Infrastructure — Gateway, Nodes & Channels

Multi-channel · Persistent memory · Cron scheduling · 4 platforms

LIVE Gateway · Nodes · MCP

What this is

A full-stack personal AI infrastructure built on openclaw: gateway daemon for message routing, node pairing for companion apps (Android/iOS/macOS), multi-channel integration (Discord, Telegram, Feishu, Kimi), MCP bridge for tool extensibility, persistent memory across sessions, and cron scheduling for autonomous task execution.

Why it matters

Most AI setups are siloed — ChatGPT here, Claude there, nothing connected. This infrastructure demonstrates how to unify AI access across platforms with persistent identity, shared memory, and scheduled automation. The gateway handles 4+ messaging platforms simultaneously. The memory system retains context across days. The cron system executes tasks without human initiation.

Messaging Platforms

MCP Tools

∞

Memory Persistence

Node Platforms

What I'd bring to your team

I can deploy AI infrastructure for teams — not just individual chatbot access, but unified gateways with role-based permissions, shared knowledge bases, and automated workflows. Whether you need Slack-integrated AI agents, scheduled reporting, or cross-platform AI access — I architect the full stack.

AI Education — 4 Specialized Courses Completed

Machine Learning · GenAI Engineering · Agentic Systems · Data Governance

CERTIFIED 4 Courses · 50+ Hours

What this is

Four specialized AI courses covering the full stack: Applied Machine Learning (predictive maintenance, NLP, forecasting), Generative AI Engineering (research NLP, legal text mining, biomedical analysis), Data Governance (federal catalog assessment, FOIA compliance, policy tracking), and Agentic Systems (multi-agent orchestration, MCP protocols, autonomous workflows).

Why it matters

Theory without practice is empty. Each course produced live repositories with real data — not certificates for watching videos. The ML course generated 28 charts from NASA and UCI data. The GenAI course processed 450 arXiv papers and 15 SCOTUS opinions. The Governance course analyzed 144K federal datasets. The Agentic course built deployable multi-agent systems.

Specialized Courses

50+

Hours of Study

Live Repositories

50+

Production Charts

What I'd bring to your team

I don't just know the concepts — I've built with them. Every course produced deployable artifacts, not just notes. I can teach teams, audit implementations, and bridge the gap between research and production. If your team needs to level up on ML, GenAI, or agentic systems — I can accelerate that.

Pillar 3 — Analytics Viz

Interactive dashboards and visual portfolios that turn raw data into decisions. I don't just analyze — I make it clickable, explorable, and actionable.

🎯 Interactive Dashboards LIVE

Real data. Real interactivity. Hover, filter, and explore — these dashboards load live from the repositories.

WMATA Ridership Explorer

743K+ real records · 98 stations · 547K flights · 196K FARS records

Insight: WMATA ridership analysis uses real DC GIS MapServer data with 98 stations. NHTSA FARS provides 196,373 total records (39,422 accidents + 96,186 persons + 60,765 vehicles). BTS On-Time Performance covers 547,271 flights for January 2024. All data from live public APIs with automated fetch scripts.

Census Policy Correlation Explorer

20 states · Income vs Education · Poverty overlay

Insight: Strong positive correlation (r=0.72) between median income and bachelor's degree attainment. Massachusetts leads on education (44.5% bachelor's attainment) with a $90,840 median income. Maryland has the highest income ($91,510) with lower poverty (9.2%), making it a model for policy transfer.
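
For reference, the correlation coefficient quoted above is a Pearson r, which can be computed as sketched here. The four data points are hypothetical and do not reproduce the 20-state result.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

income = [60, 70, 80, 90]  # hypothetical median income, $ thousands
degree = [28, 30, 37, 44]  # hypothetical bachelor's attainment, %
r = pearson_r(income, degree)
```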

📊 Visual Portfolio — 50+ Charts Across 5 Repositories

A curated gallery of production visualizations from 9 live repositories. Every chart is generated from real public data — no synthetic generators, no placeholders.

Applied ML 28 charts

NASA C-MAPSS · NASA — Sensor Degradation Curves

20 Newsgroups · sklearn — Confusion Matrix (~68% accuracy)

📊

Analysis Notebook

17K+ hourly records · ARIMA · XGBoost · Seasonal naive

Execute on GitHub →

GenAI Engineering 12 charts

arXiv API · arXiv — 450 Papers by Category

SCOTUS · CourtListener — Opinion Length Trend (1954–2015)

PubMed · NCBI — Biomarker Volcano Plot

Interactive: arXiv Paper Distribution

450 papers · Live data

Hover for counts. Data from arXiv API export (cs.LG, cs.AI, cs.CL, cs.CV, stat.ML).

Mobility Data 9 charts

WMATA · DC GIS — Top Stations by Ridership

NHTSA FARS · NHTSA — Fatalities by State (Top 15)

BTS · USDOT — Average Delay by Airline

Interactive: NHTSA Fatalities by State (Top 10)

196K total records · 2023 data

Hover for exact counts. Data from NHTSA FARS API (Fatality Analysis Reporting System).

Data Governance 11 charts

Data.gov · CKAN API — ~500 Datasets by Agency

FOIA.gov · FOIA Tracker — Processing Time Distribution

OMB · OMB API — 170 Guidance Docs by Category

Interactive: Data.gov Catalog by Agency

~500 datasets · CKAN API

Hover for dataset counts. Data from catalog.data.gov/api/3/.

Public Sector 6 charts

Census ACS · Census API — Income vs Education

BLS — Unemployment vs Job Openings

World Bank · WDI API — GDP vs Life Expectancy

PMO Analytics 3 dashboards

📊

Capital Portfolio

USASpending · Gov API — $77.7B Portfolio EVM

📊

Risk Intelligence

RandomForest · 98% Acc — 1,000 Contracts

📊

Decision Support

DC + Census + BLS · 3 APIs — Auto-Briefings

People Analytics 3 projects

📊

Attrition Model

IBM HR · 1,470 recs — 87% Accuracy

📊

Sentiment NLP

BERT + BERTopic · Glassdoor — Theme Extraction

📊

DEI Dashboard

EEO-1 + Census · Compliance — Pay Equity

Business Intelligence 49+ charts

📊

Netflix Strategy

Kaggle Netflix · 8,807 titles — SQL Cohort

📊

Amazon Intel

Stanford SNAP · 67K reviews — Sentiment Proxy

📊

Google Trends

pytrends + BigQuery · 1,923 recs — Peak Detection

Healthcare Analytics 100M+ records

📊

911 Triage

NYC EMS · SODA API — 2M+ Calls/Year

📊

Medicaid

CMS · 600K recs — Generic Penetration

📊

Public Health

CDC WONDER · 75M+ recs — 25-Year Surveillance

📄 Executive Summaries

One-page PDFs for each portfolio category. Recruiter-friendly format with business problem, methodology, key result, and live code links.

🤖

Applied ML

NASA · NLP · Forecasting

🧠

GenAI Engineering

arXiv · SCOTUS · PubMed

🏛️

Data Governance

Data.gov · FOIA · OMB

Public Sector

Census · BLS · World Bank

🏛️

PMO Analytics

USASpending · FPDS · GAO

👥

People Analytics

IBM HR · Glassdoor · EEO-1

💼

Business Intelligence

Netflix · Amazon · Google Trends

🏥

Healthcare Analytics

NYC EMS · CMS · CDC WONDER

Let's Build Something

Available for data science, ML engineering, and AI architecture roles. Whether you need predictive models, federal data analysis, or AI automation — let's talk.

sierra.napier430@gmail.com LinkedIn GitHub

Contact Sierra