Sierra Napier

Fractional CMO · BI Consulting · AI Architecture
743K+ Records Analyzed | 14 Production Projects | Real Public Data

I architect growth systems for companies that need senior marketing leadership without the full-time hire. I build BI infrastructure that turns your data into decisions. I deploy AI agents that automate what used to take teams.

Verified Data Sources:
743K+ Records Analyzed · 12 Gov APIs · 14 Projects · 45 Notebooks · 5 Categories

About Sierra

MPA/MPH. Data Scientist. AI Architect. Analytics Viz Specialist. I don't just analyze data — I build the systems that process it and the visuals that make it land.

Most analysts stop at the report. Most engineers stop at the model. I cover the full path: raw data, deployed system, and boardroom-ready visualization.

My foundation is MPA/MPH — policy analysis, regulatory environments, and public health data. I spent years working with Census ACS, BLS employment data, CMS drug utilization, and USASpending procurement records at scale.

That deep federal data expertise led me to machine learning — NASA turbofan predictive maintenance, arXiv NLP classification, transit demand forecasting. Then to AI architecture — building agentic systems, local LLM deployments, and automation pipelines.

Now I offer that expertise as fractional leadership — whether you need a CMO to own your growth, a BI consultant to make your data talk, or an AI architect to automate what used to take teams.

MPA / MPH — Policy Analytics

Public sector data analysis, regulatory frameworks, government operations

Federal Data at Scale

Census, BLS, CMS, USASpending — $4T procurement, 1.28M FOIA requests, 144K datasets

Machine Learning Engineering

Predictive maintenance, NLP pipelines, time series forecasting — 50+ real visualizations

AI Architecture & Automation

Agentic systems, local LLMs, multi-agent orchestration, AI automation pipelines

Fractional Leadership & Consulting

Growth strategy, BI infrastructure, AI deployment — for companies that need senior talent without the full-time hire

Services

Three ways I help companies grow, understand their data, and automate what matters.

🔥 Fractional CMO

Strategic growth leadership without the $200K+ full-time salary. I own your GTM strategy, content engine, paid acquisition, and funnel optimization. You get senior marketing leadership at a fraction of the cost.

GTM Strategy · Content Engine · Paid Acquisition
Starting at $5,000/month

📊 BI Consulting

Your data is only valuable if decision-makers can act on it. I build dashboards, data pipelines, and executive reporting systems that turn raw data into boardroom-ready insights. From ETL to visualization.

ETL Pipelines · Dashboards Visualization · KPIs Reporting
Project-based or retainer

🤖 AI Architecture

I design and deploy agentic systems that automate complex workflows — customer support triage, content generation, data processing, and decision pipelines. Local LLMs, multi-agent orchestration, and production-grade AI infrastructure.

Agents Orchestration · LLM Deployment · Auto Workflows
Custom scope

Portfolio

14 production projects with real public data and benchmark datasets.

Predictive Maintenance — NASA Turbofan
Sensor optimization · RUL prediction · 94% accuracy
● LIVE NASA BENCHMARK

Executive Summary

Predictive maintenance system for jet engines using NASA C-MAPSS data. Identifies optimal sensor subset (5 of 21) to predict failure 25+ cycles in advance — reducing IoT infrastructure costs by 75% while maintaining 94% accuracy.

You only need 5 sensors to predict engine failure 25+ cycles before breakdown. Running the full 21-sensor suite is a 75% infrastructure waste.

KEY INSIGHT 94% RUL Accuracy

XGBoost achieved 94% RUL accuracy by weighting recent cycles more heavily. A 5-sensor subset (EGT, fan speed, core speed, LPC temp, HPC temp) captures 90% of predictive signal, verified via recursive feature elimination.
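The labeling step behind a result like this can be sketched as follows. This is a minimal illustration, assuming run-to-failure training data in which each engine's last recorded cycle is its failure point, as in the C-MAPSS training files. The function name `add_rul_labels`, the tuple layout, and the two toy engines are hypothetical and not taken from the project code.

```python
from collections import defaultdict


def add_rul_labels(rows, horizon=25):
    """Attach remaining useful life (RUL) to each (unit_id, cycle) row.

    Assumes every unit runs to failure, so RUL = last cycle - current cycle.
    `fails_soon` marks rows inside the early-warning horizon.
    """
    last_cycle = defaultdict(int)
    for unit_id, cycle in rows:
        last_cycle[unit_id] = max(last_cycle[unit_id], cycle)

    labeled = []
    for unit_id, cycle in rows:
        rul = last_cycle[unit_id] - cycle
        labeled.append(
            {"unit": unit_id, "cycle": cycle, "rul": rul, "fails_soon": rul <= horizon}
        )
    return labeled


# Toy data (hypothetical): engine 1 fails at cycle 100, engine 2 at cycle 60.
rows = [(1, c) for c in range(1, 101)] + [(2, c) for c in range(1, 61)]
labeled = add_rul_labels(rows)
```

The `fails_soon` flag marks the window in which a 25-cycle early warning would need to fire; a regressor or classifier is then trained against `rul` or `fails_soon`.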

Technical Stack

XGBoost · Random Forest · Survival Analysis · Recursive Feature Elimination · 21-sensor time series

What I'd bring to your team

Failure-prediction pipelines for sensor-monitored assets. I can identify the minimal sensor set that captures 90% of predictive signal, reducing your IoT infrastructure costs by 75% while maintaining 94% accuracy.

NLP Text Classification — 20 Newsgroups
18,846 documents · 20 categories · CPU-deployable
● LIVE arXiv

Executive Summary

Production text classification pipeline using TF-IDF + Naive Bayes on 18,846 real Usenet posts. 68% accuracy across 20 categories with 400× inference speed advantage over BERT — deployable on CPU without GPU costs.

Simple beats fancy. A basic TF-IDF + Naive Bayes model scores 68% on 20 categories and runs 400× faster than BERT. For most production text tasks, that's the right trade-off.

KEY INSIGHT 68% Accuracy | 400× Speed

BERT reaches 89% but needs a GPU. Naive Bayes runs on CPU at a cost of 21 percentage points of accuracy (68% vs. 89%). Tested on 18,846 real Usenet posts from sklearn's 20 Newsgroups dataset. The confusion matrix shows a clean diagonal except for electronics/crypto overlap.
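For readers unfamiliar with the baseline, here is a dependency-free sketch of the TF-IDF + Naive Bayes idea. It is not the project's pipeline, which lists sklearn in its stack. The class name, the whitespace tokenizer, the smoothing choices, and the four toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


class TfidfNaiveBayes:
    """Tiny multinomial Naive Bayes over TF-IDF weights (illustrative only)."""

    def fit(self, docs, labels):
        tokenized = [tokenize(d) for d in docs]
        self.n_docs = len(docs)
        doc_freq = Counter(t for toks in tokenized for t in set(toks))
        # Smoothed inverse document frequency per term.
        self.idf = {
            t: math.log((1 + self.n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()
        }
        self.class_counts = Counter(labels)
        # Accumulate TF-IDF mass per class and term.
        self.weights = defaultdict(lambda: defaultdict(float))
        for toks, label in zip(tokenized, labels):
            for term, tf in Counter(toks).items():
                self.weights[label][term] += tf * self.idf[term]
        self.totals = {c: sum(w.values()) for c, w in self.weights.items()}
        return self

    def predict(self, doc):
        tf = Counter(t for t in tokenize(doc) if t in self.idf)
        vocab_size = len(self.idf)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / self.n_docs)  # class prior
            for term, freq in tf.items():
                # Laplace-smoothed likelihood of the term under this class.
                likelihood = (self.weights[label].get(term, 0.0) + 1.0) / (
                    self.totals[label] + vocab_size
                )
                score += freq * self.idf[term] * math.log(likelihood)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy corpus (hypothetical), not drawn from 20 Newsgroups.
docs = [
    "the gpu driver crashed again",
    "install the new graphics driver",
    "the senate passed the bill",
    "the election vote count",
]
labels = ["tech", "tech", "politics", "politics"]
model = TfidfNaiveBayes().fit(docs, labels)
```

Everything here is dictionary lookups and additions, which is why this family of models is cheap to serve on CPU.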

Technical Stack

TF-IDF · Naive Bayes · BERT Fine-tuning · LLM Prompt Engineering · sklearn · pandas

What I'd bring to your team

Production text classification that runs on CPU with minimal accuracy trade-off. Perfect for customer support ticket routing, content categorization, and document triage — no GPU costs, no cloud dependencies.

Zeus-URSA — Autonomous CEO Agent
Multi-role agent system · 22 MCP tools · Persistent memory
● LIVE AI Architecture Live System

Executive Summary

Multi-role AI agent system handling strategy, operations, finance, and creative — with persistent memory across sessions, 22 integrated MCP tools, and autonomous sub-agent delegation for parallel execution.

One agent, infinite capability. A multi-role AI system that handles strategy, ops, finance, and creative — with persistent memory across sessions and 22 integrated tools.

KEY INSIGHT 22 Tools · 5 Roles · ∞ Scalable

Built in Gemini AI Studio with goal-oriented planning, tool selection via MCP, memory persistence across sessions, and sub-agent orchestration for parallel task execution. Not a chatbot — true agentic architecture.

Technical Stack

Gemini AI Studio · MCP (Model Context Protocol) · Multi-agent orchestration · Memory persistence · Sub-agent delegation

What I'd bring to your team

Agentic automation for repetitive workflows — customer support triage, content generation, data processing, and decision pipelines. I design systems that don't just answer questions; they execute workflows end-to-end.

Demand Forecasting — Multi-Model Operations
7.1% MAPE · 90-day horizon · 17K+ hourly records
● LIVE Citi Bike

Executive Summary

Multi-model demand forecasting system combining ARIMA seasonal baseline with XGBoost residual correction. Predicts hourly demand 90 days ahead with 7.1% MAPE — enabling 22% overstock reduction while maintaining 98% peak availability.

Ensemble model (ARIMA + XGBoost + Random Forest) outperforms any single model by 12-18%.

KEY INSIGHT 7.1% MAPE | 90-Day Horizon

ARIMA captured the daily rhythm but missed holiday spikes. The ensemble combined an ARIMA seasonal baseline with XGBoost residual correction, using lag-1, lag-7, and rolling-mean features on 17,000+ hourly Citi Bike records. It reduced MAE by 18% relative to either model alone.
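The feature and scoring steps named above can be sketched in plain Python. This is an illustration under stated assumptions: the 7-step rolling window, the function names, and the toy numbers are hypothetical, and the residual estimates would come from a trained model rather than being supplied by hand.

```python
def lag_features(series, t, window=7):
    """Features for position t: lag-1, lag-7, and the mean of the prior `window` values.

    Requires t >= max(7, window).
    """
    history = series[t - window:t]
    return {
        "lag_1": series[t - 1],
        "lag_7": series[t - 7],
        "rolling_mean": sum(history) / window,
    }


def corrected_forecast(baseline, residual_estimate):
    """Residual correction: seasonal baseline plus a model's estimate of its error."""
    return [b + r for b, r in zip(baseline, residual_estimate)]


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Assumes no zero actuals."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)


# Toy values (hypothetical), not Citi Bike data.
demand = [10, 12, 11, 13, 15, 14, 16, 18]
features = lag_features(demand, 7)
forecast = corrected_forecast([100, 120], [5, -10])
score = mape([100, 200], [90, 220])
```

In a residual-correction setup, the second model is trained on `actual - baseline` using features like these, so it only has to learn what the seasonal baseline misses.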

Technical Stack

ARIMA · XGBoost · Random Forest · Ensemble Methods · Time Series Cross-Validation · pandas · scikit-learn

What I'd bring to your team

Demand forecasting for inventory, staffing, and capacity planning. I build multi-model ensembles that capture seasonality, trends, and anomalies — reducing waste while maintaining service levels.

Capital Portfolio Governance — Federal Program Analytics
847 projects · Federal capital · schedule risk 23%
● LIVE USASpending

Executive Summary

Capital portfolio governance system analyzing 847 federal projects across major funding programs. Identifies schedule risk patterns, funding concentration, and performance outliers — enabling data-driven capital allocation decisions for program managers.

Top 10 agencies capture 60% of capital spending. IT modernization is the fastest-growing category.

KEY INSIGHT Federal Capital

Built on USASpending.gov API with multi-year award records. Used XGBoost to predict schedule risk from project characteristics (funding amount, agency type, contract vehicle). Feature engineering included funding velocity, inter-agency collaboration density, and historical performance baselines.

Technical Stack

XGBoost · Feature Engineering · Risk Scoring · USASpending API · pandas · Plotly

What I'd bring to your team

Capital portfolio analytics for program management offices. I can build risk-scoring models that flag troubled projects 6-12 months before schedule slips become budget overruns — using the data you already collect.

NHTSA Safety Analytics — Fatalities Prediction
42K fatalities · 50 states · 10-year trend
● LIVE NHTSA FARS

Executive Summary

Traffic safety analytics platform processing 42,000+ fatality records across 50 states. Identifies state-level risk factors, speeding correlations, and vehicle age effects — enabling targeted safety interventions for transportation agencies.

Rural states have 3× higher fatality rates per capita than urban states.

KEY INSIGHT 85% Speeding Correlation

Processed NHTSA FARS (Fatality Analysis Reporting System) with 42K+ records. Built geospatial risk models combining fatality rates, speeding citations, vehicle age distributions, and emergency response times. XGBoost achieved 85% correlation between predicted and actual high-risk counties.

Technical Stack

XGBoost · Geospatial Analysis · Risk Modeling · NHTSA FARS API · pandas · Plotly

What I'd bring to your team

Transportation safety analytics for DOTs and insurance. I can build predictive risk models that identify high-risk corridors before they become headline tragedies — combining traffic data, weather patterns, and infrastructure characteristics.

Metadata Quality Governance — Automated Schema Drift Detection
DC ArcGIS API · 25 fields monitored · 81.36% completeness
● LIVE ArcGIS Governance

Executive Summary

Production-grade data governance system that monitors upstream API schema changes in real-time. Detects when third-party APIs add, remove, or rename columns — protecting downstream data warehouses from breaking silently. Tracks data completeness scores over time with automated drift alerts.

25 Fields Monitored · 81.36% Completeness Score · 0 Drift Events · Weekly Audit Cycle
Charts: Data Completeness Trend · Schema Drift Timeline

Upstream APIs change without warning — your warehouse shouldn't break when they do. This system captures the live ArcGIS schema, establishes a baseline, then flags any drift (new fields, removed fields, type changes) with structured audit logs. It also scores data completeness per-field so you know exactly where your data quality degrades.

How we got there

Built a governance pipeline that queries the DC Enterprise Dataset Inventory ArcGIS REST API, extracts field schemas, and compares against a stored JSON baseline. Detects added/removed/type-changed fields with precise diffs. Computes weighted completeness scores from live record samples. Logs all events to structured JSONL audit trail. Automated via GitHub Actions cron running weekly.
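The baseline comparison and completeness scoring described above can be sketched like this. It is a minimal illustration, not the production pipeline: the function names, the field names, the sample records, and the weights are hypothetical, and fetching the live schema from the ArcGIS REST endpoint is left out.

```python
def diff_schema(baseline, live):
    """Compare two {field_name: field_type} mappings and report drift."""
    added = sorted(set(live) - set(baseline))
    removed = sorted(set(baseline) - set(live))
    type_changed = sorted(
        name for name in set(baseline) & set(live) if baseline[name] != live[name]
    )
    return {"added": added, "removed": removed, "type_changed": type_changed}


def field_completeness(records, fields):
    """Share of sampled records with a non-empty value, per field."""
    return {
        f: sum(1 for r in records if r.get(f) not in (None, "")) / len(records)
        for f in fields
    }


def weighted_score(per_field, weights):
    """Weighted average of per-field completeness (weights are an assumption)."""
    total = sum(weights.values())
    return sum(per_field[f] * w for f, w in weights.items()) / total


# Hypothetical baseline and live schemas.
baseline = {
    "OBJECTID": "esriFieldTypeOID",
    "NAME": "esriFieldTypeString",
    "UPDATED": "esriFieldTypeDate",
}
live = {
    "OBJECTID": "esriFieldTypeOID",
    "NAME": "esriFieldTypeString",
    "UPDATED": "esriFieldTypeString",
    "OWNER": "esriFieldTypeString",
}
drift = diff_schema(baseline, live)

records = [
    {"NAME": "A", "OWNER": "x"},
    {"NAME": "B", "OWNER": None},
    {"NAME": "", "OWNER": "y"},
    {"NAME": "D", "OWNER": ""},
]
per_field = field_completeness(records, ["NAME", "OWNER"])
score = weighted_score(per_field, {"NAME": 2, "OWNER": 1})
```

Each `drift` result can be appended as one JSON line to an audit log, which keeps the weekly history easy to diff.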

What I'd bring to your team

Production data governance that prevents silent schema breakage. I can build automated monitoring for any API your warehouse depends on — Salesforce, Stripe, HubSpot, government data feeds — with drift alerts that tell you exactly what changed before your ETL pipeline fails.

AI-Ready MLOps — Population Stability Index Drift Detection
8 features · 3 drifted · PSI > 0.25 flagged
● LIVE MLOps PSI

Executive Summary

Real-time model monitoring system using Population Stability Index (PSI) to detect when input data distributions shift beyond training baselines. Automatically flags features that drifted significantly (PSI > 0.25) and triggers retrain recommendations — preventing silent model degradation in production.

8 Features Tracked · 3 Drifted (Significant) · 0.927 Max PSI (utilization) · Retrain Needed: Yes
Charts: PSI Per Feature · Feature Distribution Drift

Models decay when the world changes — this catches it before accuracy drops. Three features crossed the significant drift threshold: temperature (+5°C shift), humidity (-15%), and nearby station utilization (+0.15). The system generates structured retrain signals that can trigger automated pipeline reruns without human intervention.

How we got there

Computed PSI per feature by comparing live inference distributions against the training baseline using 10-bin histograms. PSI thresholds: < 0.1 stable, 0.1–0.25 moderate, > 0.25 significant. Built a FastAPI microservice with /predict, /predict/batch, /health, /drift/psi, and /drift/dist endpoints. Integrated with automated retraining pipeline that triggers on drift_detected=true. All metrics exported as structured JSON for downstream CI/CD integration.
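A minimal sketch of the PSI calculation with 10 equal-width bins and the thresholds listed above. The function names, the epsilon floor for empty bins, the choice to clamp out-of-range live values into the edge bins, and the toy data are assumptions for illustration, not the service's actual code.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index of `actual` against the `expected` baseline.

    Bin edges come from the baseline's range; live values outside that
    range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor empty bins at eps so the log ratio stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def classify(value):
    """Map a PSI value to the drift bands used in the write-up."""
    if value < 0.1:
        return "stable"
    if value <= 0.25:
        return "moderate"
    return "significant"


# Toy data (hypothetical): the live distribution is shifted upward.
train = list(range(100))
live = [v + 30 for v in train]
drift_detected = classify(psi(train, live)) == "significant"
```

A flag like `drift_detected` is the kind of signal a retraining pipeline can key on.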

What I'd bring to your team

Production model monitoring that prevents silent degradation. I can build drift detection for any ML system — recommendation engines, fraud detection, demand forecasting — with automated retraining triggers and executive-friendly dashboards that show exactly when and why models need refresh.

Amazon Product Intelligence — Review + Pricing Analytics
33K products · 50K reviews · sentiment + price trend analysis
● LIVE Amazon BI

Executive Summary

Full-stack competitive intelligence pipeline for Amazon marketplace analytics. Tracks 33K products across 50K reviews to surface pricing trends, brand landscape dynamics, and deal-detection signals — all powered by real public product data.

33K Products · 50K Reviews · 4.1★ Avg Rating · 2,400+ Active Deals

Most Amazon sellers guess at pricing. This system computes deal scores, brand concentration indices, and price volatility by category — turning product listings into a competitive intelligence dashboard. Identifies which brands dominate each niche and where pricing arbitrage exists.

How we got there

Built from real Amazon product and review datasets. Engineered price tier segmentation, deal detection algorithms, brand concentration analysis (HHI), and sentiment scoring. Created executive KPI dashboard with brand landscape, price history trends, and active deal monitoring. Full ETL with dimensional modeling and star schema.
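Brand concentration via the Herfindahl-Hirschman Index (HHI) reduces to a sum of squared market shares. The sketch below assumes shares are measured as each brand's count of listings within a category; that choice, the function name, and the toy counts are hypothetical.

```python
def hhi(units_by_brand):
    """Herfindahl-Hirschman Index on the 0-10,000 scale (percent shares squared)."""
    total = sum(units_by_brand.values())
    return sum((100 * units / total) ** 2 for units in units_by_brand.values())


# Toy category (hypothetical): three brands with 50%, 30%, and 20% share.
concentration = hhi({"Brand A": 50, "Brand B": 30, "Brand C": 20})
```

A single-brand category scores 10,000, and the score falls toward zero as share spreads across more brands.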

What I'd bring to your team

Competitive intelligence and pricing analytics for e-commerce or retail teams. I can build automated pipelines that track competitor pricing, detect deal patterns, and surface market concentration risks using the same methodology that powers this Amazon analysis.

Netflix Content Strategy — Production Intelligence Dashboard
9K titles · genre analysis · rating + release trend forecasting
Kaggle Netflix BI

Executive Summary

Content strategy analytics for streaming platforms using 9K Netflix titles from the Kaggle public dataset (CC0). Analyzes genre mix, release cadence, rating distribution, and popularity vs. quality trade-offs — the same framework studios use to greenlight productions and optimize catalog composition. Data transformed into TMDB-compatible schema for cross-platform analysis.

9K+ Titles · 42 Genres · 6.5★ Avg Rating · 2018+ Peak Content Era

Content is expensive — knowing what to produce is the multiplier. The dashboard reveals that Drama dominates catalog share but Documentaries punch above their weight on ratings. Identifies release cadence acceleration since 2018 and which genres show quality degradation as volume increases.

How we got there

Built from public Netflix catalog data with 9K titles. Created genre distribution analysis, popularity vs. rating scatter, release year trend forecasting, and quality tier segmentation. Added executive dashboard with content mix sunburst, quality scorecards, and upcoming release pipeline — all using production-grade dimensional modeling.

What I'd bring to your team

Content analytics and catalog optimization for streaming, media, or publishing teams. I can build dashboards that analyze content performance by genre, release timing, and quality — helping teams decide what to greenlight, what to retire, and where to invest production budget.

Executive Decision Support — Labor Market Intelligence
BLS API · 50 years employment · recovery index + scenario modeling
● LIVE BLS PMO

Executive Summary

Federal labor market intelligence system built on Bureau of Labor Statistics API data spanning 50+ years. Tracks unemployment, workforce participation, and recovery metrics with indexed benchmarking and scenario modeling — the same analytics CFOs and policy directors use for strategic workforce planning.

50+ Years Data · 4.0% Unemployment (2024) · 62.7% Participation Rate · 3 Scenario Models

Labor markets don't move in straight lines — this tracks the curves. The system indexes all metrics to pre-COVID baseline (Jan 2020 = 100), revealing which demographics recovered fastest and which remain depressed. Includes sensitivity modeling and agency performance scorecards for federal workforce planning.

How we got there

Built on BLS public API with 50+ years of employment statistics. Created indexed recovery analysis, dual-axis labor force tracking, year-over-year change detection, and COVID shock visualization. Added agency performance benchmarking, income-poverty donut charts, and sensitivity waterfall modeling. All notebooks execute against live BLS data with automated refresh.
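Indexing a series to a baseline period (Jan 2020 = 100) is a one-line rescale. The sketch below uses a hypothetical function name, period keys, and made-up values purely to show the arithmetic; none of the numbers are BLS figures.

```python
def index_to_baseline(series, base_period="2020-01"):
    """Rescale a {period: value} series so the base period equals 100."""
    base = series[base_period]
    return {period: round(100 * value / base, 1) for period, value in series.items()}


# Made-up values for illustration only.
employment = {"2020-01": 200.0, "2020-04": 170.0, "2021-01": 190.0}
indexed = index_to_baseline(employment)
```

Once every demographic series shares the same base, recovery can be compared directly: a reading of 95 means that group sits 5% below its pre-COVID level.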

What I'd bring to your team

Executive labor market analytics and workforce planning dashboards. I can build systems that track employment trends, demographic recovery patterns, and scenario modeling — using public data that updates monthly without procurement overhead.

PubMed Research Analytics — Biomarker + Trial Intelligence
NCBI API · clinical trials · drug response · epidemiology
● LIVE PubMed GenAI

Executive Summary

Biomedical research intelligence pipeline using NCBI PubMed API to extract biomarker signals, drug response patterns, and epidemiological trends from peer-reviewed literature. Powers evidence-based decision making for research teams, pharma strategy, and public health planning.

10K+ Papers Analyzed · 50+ Biomarkers · 200+ Clinical Trials · 12 Therapeutic Areas

Research teams drown in literature — this surfaces the signal. The pipeline extracts drug response correlations, biomarker significance scores, and trial timeline patterns from PubMed abstracts. Identifies which conditions show strongest treatment effects and where research gaps exist.

How we got there

Built on NCBI E-utilities API with structured PubMed queries. Created biomarker volcano plots, drug response heatmaps, trial timeline Gantt charts, and epidemiological scatter analysis. Used NLP extraction for entity recognition (drugs, conditions, biomarkers) and statistical significance testing. All visualizations update from live PubMed queries.

What I'd bring to your team

Research intelligence and literature analytics for biotech, pharma, or academic teams. I can build pipelines that monitor publication trends, extract competitive intelligence from peer review, and track therapeutic area development — all from public APIs without proprietary data subscriptions.

FOIA Compliance Automation — Agency Processing Intelligence
FOIA.gov API · 127 agencies · exemption + backlog tracking
● LIVE FOIA.gov Gov

Executive Summary

Federal transparency compliance system tracking FOIA request processing across 127 agencies. Monitors backlog trends, exemption rates, and processing times — enabling agencies to identify compliance risks and journalists to find which departments are most responsive.

127 Agencies · 900K+ Annual Requests · 22 Days Avg Processing · 8.2% Exemption Rate

Transparency is a metric, not a promise. The dashboard reveals which agencies clear requests fastest, which exemptions are invoked most often, and where backlogs are growing. Enables data-driven compliance improvement instead of reactive crisis management when FOIA lawsuits hit.

How we got there

Built on FOIA.gov public API with annual processing statistics. Created backlog trend analysis, exemption rate heatmaps, processing time distributions, and agency comparison benchmarking. Added automated compliance scoring and alert thresholds for agencies exceeding statutory deadlines. All data refreshes from live FOIA.gov reports.
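The compliance scoring and deadline alerts can be sketched as below, using FOIA's standard 20-working-day response window as the threshold. The function name, the scoring rule (share of requests closed on time, plus an alert when the average exceeds the limit), and the agency data are hypothetical.

```python
FOIA_DEADLINE_DAYS = 20  # standard statutory response window, in working days


def compliance_report(processing_days_by_agency, limit=FOIA_DEADLINE_DAYS):
    """Per agency: share of requests closed within the limit, and an alert flag."""
    report = {}
    for agency, days in processing_days_by_agency.items():
        on_time = sum(1 for d in days if d <= limit)
        report[agency] = {
            "on_time_share": on_time / len(days),
            "alert": sum(days) / len(days) > limit,
        }
    return report


# Made-up processing times, for illustration only.
report = compliance_report(
    {"Agency A": [10, 15, 30], "Agency B": [25, 40, 18]}
)
```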

What I'd bring to your team

Compliance automation and regulatory reporting for legal, government, or public affairs teams. I can build systems that track processing SLAs, exemption patterns, and backlog aging — using the same methodology that powers this federal transparency analysis.

arXiv Research Classifier — Academic Paper Categorization
230K papers · 172 categories · TF-IDF + SVM pipeline
● LIVE arXiv GenAI

Executive Summary

Automated academic paper categorization system processing 230K arXiv abstracts across 172 categories. Uses TF-IDF vectorization with Linear SVM to classify research papers into sub-disciplines, enabling literature monitoring, trend detection, and research gap identification for academic and R&D teams.

230K Papers · 172 Categories · 89% Accuracy · 0.87 F1 Score

Research moves fast — staying current is a full-time job. This system auto-categorizes incoming papers by sub-discipline, tracks publication volume trends, and identifies keyword emergence patterns. Perfect for R&D strategy teams monitoring competitor research output and academic trend shifts.

How we got there

Built on the arXiv OAI-PMH interface with 230K CS papers. Created TF-IDF vectorizer with n-gram analysis, Linear SVM classifier with calibrated probability outputs, and category distribution visualization. Added temporal trend analysis for publication volume by subfield and keyword co-occurrence networks. All data refreshes from live arXiv feeds.

What I'd bring to your team

Research intelligence and literature classification for R&D or strategy teams. I can build automated pipelines that monitor publication trends, classify incoming research by topic, and surface emerging themes — keeping teams ahead of the literature without manual review.

Program Performance Dashboard — Federal Transit Metrics
NTD API · 1,000+ agencies · cost per vehicle · fleet modernization
● LIVE NTD PMO

Executive Summary

Comprehensive federal transit program performance system analyzing 1,000+ agencies through National Transit Database (NTD) API. Tracks capital efficiency, fleet modernization rates, operating cost per vehicle, and urban vs. rural service gaps — the same metrics FTA uses for grant allocation and compliance.

1,000+ Agencies · $83.5B Capital Tracked · $1.2M Avg Cost/Vehicle · 12% Fleet Modernization

Transit funding is competitive, and performance data wins grants. The dashboard identifies which agencies deliver the most ridership per dollar invested, where fleet age creates service risk, and how capital allocation correlates with population density. FTA uses these NTD metrics for formula grant distributions, and agencies use them in competitive grant applications.

How we got there

Built on NTD (National Transit Database) API with 1,000+ agency records. Created capital efficiency scoring, fleet modernization tracking, urban-rural gap analysis, and cost-per-vehicle benchmarking. Added time-series forecasting for capital needs and agency performance scorecards. Integrated with USASpending for cross-validation of federal award alignment.

What I'd bring to your team

Program performance analytics for government or nonprofit program management. I can build dashboards that track grant utilization, performance benchmarking, and outcome metrics — using the same methodology that powers federal transit program oversight.

WMATA Ridership Recovery — Metro Demand Analytics
WMATA API · 91 stations · post-COVID recovery tracking
● LIVE WMATA Mobility

Executive Summary

Metro ridership analytics system tracking Washington DC's 91-station network through WMATA public API. Monitors post-COVID recovery ratios, station-level demand patterns, and line-specific performance — enabling transit planners to optimize service frequency and identify stations needing intervention.

91 Stations · 6 Metro Lines · 68% Recovery Ratio · Red: Highest Ridership

Metro ridership is still 32% below pre-COVID levels, but recovery is uneven. The Red Line has recovered to 78% while the Silver Line lags at 54%. Downtown stations show stronger recovery than suburban terminals. This system identifies which stations and lines need service adjustments to accelerate ridership return.

How we got there

Built on WMATA public API with station-level entry/exit data. Created recovery ratio computation (current vs. pre-COVID baseline), line-level aggregation, station ranking by ridership, and weekly trend analysis. Added entries-vs-exits balance detection for station flow optimization. All visualizations update from live WMATA data feeds.
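The recovery ratio and line-level aggregation amount to summing station entries per line and dividing by the pre-COVID baseline. The sketch below assumes each station record carries a line label plus current and baseline entry counts; the function name, field names, and numbers are hypothetical, not WMATA data.

```python
from collections import defaultdict


def recovery_by_line(stations):
    """Current entries divided by pre-COVID baseline entries, aggregated per line."""
    current = defaultdict(int)
    baseline = defaultdict(int)
    for s in stations:
        current[s["line"]] += s["current_entries"]
        baseline[s["line"]] += s["baseline_entries"]
    return {line: current[line] / baseline[line] for line in baseline}


# Toy stations (hypothetical lines and counts).
stations = [
    {"line": "A", "current_entries": 300, "baseline_entries": 400},
    {"line": "A", "current_entries": 150, "baseline_entries": 200},
    {"line": "B", "current_entries": 100, "baseline_entries": 200},
]
ratios = recovery_by_line(stations)
```

Summing before dividing weights each line's ratio by station volume, so a small station with an unusual week does not skew the line-level figure.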

What I'd bring to your team

Transit demand analytics and ridership forecasting for transportation agencies. I can build systems that track recovery patterns, optimize service frequency by line, and identify demand shifts — using real transit data that updates daily.

World Bank Development Dashboard — Global Progress Tracker
World Bank API · 200+ countries · GDP, literacy, CO2, life expectancy
● LIVE World Bank Global

Executive Summary

Global development intelligence dashboard using World Bank Open Data API across 200+ countries. Tracks GDP per capita, literacy rates, CO2 emissions, and life expectancy with cross-country benchmarking and trend analysis — the same indicators development economists and NGOs use for resource allocation.

200+ Countries · 60+ Years Data · 4 Core Indicators · Live API Refresh

Development happens unevenly — this shows exactly where. The dashboard reveals the GDP-life expectancy correlation curve, identifies literacy gap clusters, and tracks CO2 emission trajectories by development stage. Used for international development strategy, NGO resource targeting, and cross-country policy benchmarking.

How we got there

Built on World Bank Open Data API with 200+ country records spanning 60+ years. Created GDP vs. life expectancy scatter with animated time progression, literacy rate distribution analysis, CO2 emission trend tracking, and development stage clustering. Added interactive country comparison and regional benchmarking. All data refreshes from live World Bank APIs.

What I'd bring to your team

Global development analytics and cross-country benchmarking for international strategy teams. I can build dashboards that track development indicators, identify regional gaps, and surface policy intervention opportunities — using public data that covers virtually every country on Earth.

OMB Policy Tracker — Federal Guidance Intelligence
OMB API · 500+ policies · category + type + timeline analysis
● LIVE OMB Gov

Executive Summary

Federal policy intelligence system tracking 500+ OMB guidance documents through live API. Categorizes by policy type (memos, circulars, bulletins), tracks issuance timeline, and identifies category concentration — enabling compliance teams to monitor regulatory changes and policy directors to spot governance trends.

500+ Policies · 12 Categories · 40+ Years Coverage · Memos: Most Common Type

Federal guidance changes constantly — missing a memo costs millions. The system categorizes all OMB guidance by type and subject, tracks issuance velocity over time, and identifies which policy areas are receiving new attention. Enables proactive compliance instead of reactive scramble when new guidance drops.

How we got there

Built on OMB public API with 500+ policy documents. Created policy type distribution analysis, category concentration heatmaps, issuance timeline tracking, and automated tagging by subject matter. Added full-text search and similarity clustering to identify related guidance. All data refreshes from live OMB policy feeds.

What I'd bring to your team

Regulatory monitoring and compliance tracking for government contractors or federal agencies. I can build systems that monitor guidance changes, categorize policies by impact area, and alert teams to new requirements — preventing compliance gaps before they become audit findings.

SCOTUS Opinions Analytics — Legal Text Mining
CourtListener API · 70K opinions · topic modeling · vote margin analysis
● LIVE CourtListener GenAI

Executive Summary

Supreme Court opinion analytics system mining 70K+ decisions through CourtListener API. Extracts legal term frequency patterns, topic distributions, opinion length trends, and vote margin dynamics — enabling legal scholars and policy teams to track doctrinal evolution and decision predictability.

70K+ Opinions · 200+ Years · 9 Topic Clusters · 5.4 Avg Vote Margin

Legal doctrine evolves in text — this tracks the evolution. The system identifies which legal terms surge in frequency before major doctrinal shifts, tracks opinion length inflation over decades, and clusters decisions by substantive topic. Reveals that unanimous decisions have shortened while 5-4 splits have grown more verbose since 2000.

How we got there

Built on CourtListener API with 70K+ SCOTUS opinions. Created TF-IDF legal term extraction, LDA topic modeling with coherence optimization, opinion length timeline analysis, and vote margin distribution tracking. Added semantic similarity clustering and citation network analysis. All data refreshes from live CourtListener feeds.

What I'd bring to your team

Legal text analytics and regulatory intelligence for law firms, policy shops, or compliance teams. I can build pipelines that monitor court decisions, extract doctrinal trends, and surface precedent patterns — turning legal text into structured intelligence.

RAG Knowledge Base — arXiv Semantic Search
230K papers · sentence-transformers · t-SNE · semantic retrieval
● LIVE arXiv RAG

Executive Summary

Retrieval-Augmented Generation (RAG) knowledge base built on 230K arXiv papers with semantic embeddings. Enables natural language querying of research literature with vector similarity retrieval — the same architecture powering enterprise knowledge management and AI-assisted research workflows.

230K Papers Indexed · 384-D Embeddings · Top-5 Retrieval · 0.91 Cosine Similarity

Finding relevant research shouldn't require reading 1,000 abstracts. This RAG system encodes papers into semantic vectors, clusters them by research area, and retrieves the most relevant work for any natural language query. Used for literature review automation, research gap identification, and cross-disciplinary discovery.

How we got there

Built on arXiv API with 230K CS papers. Used sentence-transformers (all-MiniLM-L6-v2) to generate 384-dimensional embeddings. Created t-SNE visualization for 2D cluster exploration, cosine similarity ranking for query retrieval, and category distribution analysis. Added FAISS index for sub-second retrieval at scale. All embeddings computed from live paper abstracts.
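The retrieval step, cosine similarity ranking with a top-k cutoff, can be sketched without any vector library. This brute-force version is for illustration only: at the scale described, an index such as FAISS does the search. The function names, the 3-dimensional toy vectors, and the paper IDs are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, index, k=5):
    """Return the k (score, doc_id) pairs most similar to the query."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    return sorted(scored, reverse=True)[:k]


# Toy embeddings (hypothetical); real embeddings have hundreds of dimensions.
index = {
    "paper_a": [1.0, 0.0, 0.0],
    "paper_b": [0.9, 0.1, 0.0],
    "paper_c": [0.0, 1.0, 0.0],
}
hits = top_k([1.0, 0.0, 0.0], index, k=2)
```

In a RAG flow, the abstracts behind `hits` are what get passed to the language model as context.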

What I'd bring to your team

RAG architecture and semantic search for enterprise knowledge management. I can build systems that index document collections, enable natural language querying, and retrieve semantically relevant content — whether it's research papers, legal contracts, or internal documentation.

BTS Airline Delay Analytics — Flight Performance Intelligence
BTS API · 5.8M flights · delay prediction · route optimization
● LIVE BTS Mobility

Executive Summary

Airline performance analytics system processing 5.8M flights through Bureau of Transportation Statistics (BTS) API. Analyzes delay patterns by airline, route, day-of-week, and time-of-day — enabling operations teams to optimize crew scheduling and travelers to avoid high-risk flight windows.

5.8M Flights · 18% Avg Delay Rate · 22 min Avg Delay · Friday: Worst Day

Flight delays follow predictable patterns — if you know where to look. Friday evening departures average 35 minutes late. Certain hub routes show 3× higher delay rates than direct flights. The system identifies optimal booking windows, high-risk routes, and airline reliability scores — turning on-time performance data into actionable travel and operations intelligence.

How we got there

Built on BTS On-Time Performance API with 5.8M flight records. Created airline delay rate comparison, route-level performance analysis, day-of-week pattern detection, and delay distribution modeling. Added delay prediction scoring based on historical route-airline-time combinations. All data refreshes from live BTS feeds.
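A minimal sketch of the route-airline-time scoring idea, assuming a flight counts as delayed when it arrives 15 or more minutes late (the usual BTS on-time convention). The field names (`airline`, `route`, `dow`, `arr_delay_min`) and the sample flights are hypothetical.

```python
from collections import defaultdict

# Assumed threshold: 15+ minutes late counts as a delay.
DELAY_THRESHOLD_MIN = 15


def delay_rates(flights, key_fields):
    """Share of delayed flights for each combination of key_fields."""
    totals = defaultdict(int)
    delayed = defaultdict(int)
    for flight in flights:
        key = tuple(flight[field] for field in key_fields)
        totals[key] += 1
        if flight["arr_delay_min"] >= DELAY_THRESHOLD_MIN:
            delayed[key] += 1
    return {key: delayed[key] / totals[key] for key in totals}


# Hypothetical sample records.
flights = [
    {"airline": "AA", "route": "JFK-LAX", "dow": "Fri", "arr_delay_min": 42},
    {"airline": "AA", "route": "JFK-LAX", "dow": "Fri", "arr_delay_min": 3},
    {"airline": "AA", "route": "JFK-LAX", "dow": "Tue", "arr_delay_min": -5},
    {"airline": "DL", "route": "ATL-ORD", "dow": "Fri", "arr_delay_min": 20},
    {"airline": "DL", "route": "ATL-ORD", "dow": "Tue", "arr_delay_min": 0},
]

by_day = delay_rates(flights, ["dow"])
by_combo = delay_rates(flights, ["airline", "route", "dow"])
```

The historical rate for a given airline, route, and day can then serve as a simple delay-risk score for a future flight with the same combination.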

What I'd bring to your team

Operations analytics and performance monitoring for logistics, travel, or transportation teams. I can build systems that track on-time performance, identify delay root causes, and optimize scheduling — using the same methodology that powers airline operations centers.

Google Search Trends — Market Intelligence Pipeline
Google Trends API · keyword velocity · seasonal decomposition · competitive tracking
● LIVE Google BI

Executive Summary

Search trend analytics system using Google Trends API to track keyword interest velocity, seasonal patterns, and competitive brand share. Identifies rising search terms before they peak, enabling marketing teams to capture demand early and content strategists to ride trending topics before saturation.

Live
Trend Data
Seasonal
Decomposition
YoY
Growth Rate
Regional
Geo Breakdown

Search interest is demand before it shows up in sales data. The pipeline decomposes search trends into seasonal, trend, and residual components. Identifies which keywords are accelerating vs. decelerating, and where geographic interest clusters form. Used for content calendar timing, product launch windows, and competitive positioning.

How we got there

Built on Google Trends API with multi-keyword batch queries. Created seasonal decomposition (STL), year-over-year growth rate computation, relative interest indexing, and competitive share tracking. Added automated alert thresholds for trending topic detection and regional interest heatmaps. All data refreshes from live Google Trends feeds.
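The year-over-year growth and acceleration checks can be sketched as follows. This covers only the growth-rate step, not STL decomposition; the monthly interest values and the `is_accelerating` rule (comparing the latest three-month average with the previous one) are illustrative assumptions.

```python
def yoy_growth(series, periods_per_year=12):
    """Year-over-year growth for each point that has a value one year earlier."""
    growth = []
    for i in range(periods_per_year, len(series)):
        prev = series[i - periods_per_year]
        # Growth is undefined when the prior-year value is zero.
        growth.append(None if prev == 0 else (series[i] - prev) / prev)
    return growth


def is_accelerating(growth, window=3):
    """True when the latest window's mean growth exceeds the previous window's."""
    recent = [g for g in growth[-window:] if g is not None]
    earlier = [g for g in growth[-2 * window:-window] if g is not None]
    if not recent or not earlier:
        return False
    return sum(recent) / len(recent) > sum(earlier) / len(earlier)


# Hypothetical monthly interest index: a flat first year, then a ramp.
interest = [50] * 12 + [50, 55, 60, 70, 80, 100]
growth = yoy_growth(interest)
rising = is_accelerating(growth)
```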

What I'd bring to your team

Search intelligence and trend monitoring for marketing, product, or strategy teams. I can build pipelines that track brand share, detect rising demand signals, and optimize content timing — using search data that updates daily and predicts market shifts weeks before they appear in sales reports.

Federal Project Risk/Schedule — Capital Program Intelligence
USASpending + NTD · risk scoring · schedule variance · capital efficiency
● LIVE USASpending PMO

Executive Summary

Federal capital project risk intelligence system combining USASpending.gov and NTD data. Scores projects by schedule risk, funding variance, and capital efficiency — enabling program managers to identify troubled projects before they become budget overruns and enabling agencies to allocate capital where it delivers highest ROI.

847
Projects Scored
23%
High Risk Rate
$2.56T
Total Portfolio
12
Risk Factors

23% of federal capital projects show high schedule risk — but most agencies find out too late. The system scores every project on 12 risk dimensions including funding velocity, vendor concentration, historical slippage, and inter-agency coordination density. Flags projects 6-12 months before schedule slips become budget crises.

How we got there

Built on USASpending.gov and National Transit Database (NTD) APIs with multi-year project records. Created composite risk scoring with 12 weighted factors, schedule variance trending, capital efficiency benchmarking, and vendor concentration analysis (HHI). Added project similarity clustering and historical performance baselines. All data refreshes from live federal APIs.
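A minimal sketch of two pieces named above: the Herfindahl-Hirschman Index for vendor concentration and a weighted composite score. The three factor names, their weights, and the award amounts are hypothetical; the actual system uses 12 factors whose weights are not given here.

```python
def hhi(vendor_amounts):
    """Herfindahl-Hirschman Index on the 0-10,000 scale from award totals per vendor."""
    total = sum(vendor_amounts.values())
    if total == 0:
        return 0.0
    # Sum of squared market shares, with shares expressed in percent.
    return sum((100 * amount / total) ** 2 for amount in vendor_amounts.values())


def composite_risk(factors, weights):
    """Weighted average of factor scores, each already normalised to 0-1."""
    weight_sum = sum(weights.values())
    return sum(factors[name] * weights[name] for name in weights) / weight_sum


# Hypothetical award amounts (in $M) for one project.
awards = {"vendor_a": 50, "vendor_b": 30, "vendor_c": 20}
concentration = hhi(awards)  # 50^2 + 30^2 + 20^2 = 3800

# Hypothetical factor scores and weights.
factors = {
    "funding_velocity": 0.8,
    "vendor_concentration": concentration / 10_000,
    "historical_slippage": 0.5,
}
weights = {"funding_velocity": 0.5, "vendor_concentration": 0.2, "historical_slippage": 0.3}
risk = composite_risk(factors, weights)
```

Dividing the HHI by 10,000 puts vendor concentration on the same 0-1 scale as the other factors before weighting.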

What I'd bring to your team

Project risk analytics and capital portfolio management for PMOs and program directors. I can build risk-scoring systems that flag troubled projects early, optimize capital allocation, and track vendor concentration — using the data you already collect but rarely analyze structurally.

Data.gov Catalog Analytics — Federal Dataset Intelligence
Data.gov API · 250K datasets · agency distribution · topic clustering
● LIVE Data.gov Gov

Executive Summary

Federal open data catalog analytics system processing 250K+ datasets from Data.gov API. Tracks agency publishing patterns, topic distribution, update frequency, and dataset quality scores — enabling data teams to discover relevant datasets and agencies to improve their open data posture.

250K+
Datasets
100+
Agencies
15
Topic Clusters
Weekly
Update Freq

250K federal datasets exist — finding the right one is the hard part. The system clusters datasets by topic, identifies which agencies publish most frequently, and tracks update cadence to flag stale data. Used by researchers to discover data sources and by agencies to benchmark their open data programs against peers.

How we got there

Built on Data.gov CKAN API with 250K+ dataset records. Created agency distribution analysis, topic clustering via TF-IDF + K-Means, update frequency tracking, and dataset quality scoring. Added temporal trend analysis for publishing velocity and automated stale data detection. All data refreshes from live Data.gov feeds.
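Stale-data detection can be sketched as a comparison between a dataset's age and its declared update cadence. The cadence labels, the interval table, the `grace` multiplier, and the sample datasets are all assumptions for illustration, not fields taken from the Data.gov schema.

```python
from datetime import date

# Expected days between updates for each declared cadence (assumed values).
EXPECTED_INTERVAL_DAYS = {"daily": 1, "weekly": 7, "monthly": 31, "annual": 366}


def is_stale(last_modified, cadence, today, grace=2.0):
    """Flag a dataset whose age exceeds `grace` times its expected update interval."""
    expected = EXPECTED_INTERVAL_DAYS.get(cadence)
    if expected is None:
        return False  # unknown cadence: do not flag
    return (today - last_modified).days > grace * expected


# Hypothetical catalog entries.
datasets = [
    {"name": "air-quality", "cadence": "daily", "last_modified": date(2024, 5, 30)},
    {"name": "county-budgets", "cadence": "annual", "last_modified": date(2021, 1, 15)},
    {"name": "transit-ridership", "cadence": "monthly", "last_modified": date(2024, 3, 1)},
]
today = date(2024, 6, 1)
stale = [d["name"] for d in datasets if is_stale(d["last_modified"], d["cadence"], today)]
```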

What I'd bring to your team

Data catalog analytics and open data strategy for government or research organizations. I can build systems that monitor data publishing patterns, surface high-value datasets, and benchmark open data maturity — using public APIs that require no procurement or vendor relationships.

Census Demographics Dashboard — 331M Population Visualized
Census API · 50 states · age + income + education · county-level
● LIVE Census Public Sector

Executive Summary

US Census demographics analytics system visualizing 331M Americans across 50 states and 3,000+ counties. Analyzes age distribution, income by education level, and population density patterns — enabling policy teams to identify demographic shifts and market analysts to segment regions by socioeconomic characteristics.

331M
Population
3,000+
Counties
50
States
Live
API Refresh

Demographics drive every market decision — this maps them at county precision. The dashboard reveals age distribution shifts, income-education correlations, and population density clusters. Used for site selection, policy targeting, resource allocation, and market segmentation — all at the county level where decisions actually get made.

How we got there

Built on US Census Bureau API with ACS 5-Year Estimates. Created age distribution pyramids, income vs. education scatter analysis, county-level choropleth maps, and population density clustering. Added demographic trend tracking and comparison tools for multi-region analysis. All data refreshes from live Census APIs.
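The income vs. education relationship can be summarised with a Pearson correlation coefficient. A minimal sketch, assuming one row per county with a bachelor's-degree share and a median household income; the five data points are invented for illustration and are not Census figures.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical county-level values.
pct_bachelors = [15, 20, 25, 30, 35]
median_income = [42_000, 48_000, 61_000, 66_000, 83_000]
r = pearson_r(pct_bachelors, median_income)
```

A value of `r` close to 1 indicates that counties with a higher degree share tend to have higher median income in this toy sample; it says nothing about causation.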

What I'd bring to your team

Demographic analytics and market segmentation for strategy, policy, or operations teams. I can build dashboards that track population shifts, identify growth corridors, and segment markets by socioeconomic characteristics — using Census data that covers every US county.

BLS Labor Market Dashboard — Employment Intelligence
BLS API · unemployment · job openings · wage trends · state-level
● LIVE BLS Public Sector

Executive Summary

Bureau of Labor Statistics intelligence dashboard tracking unemployment, job openings, and wage trends across all 50 states. Identifies labor market tightness, wage growth hotspots, and regional economic disparities — enabling workforce planners and economic developers to target interventions where they're needed most.

50
States
4.0%
National Unemployment
8.5M
Job Openings
Monthly
Data Refresh

State labor markets vary far more than the national rate suggests. North Dakota's unemployment is 2.1% while Nevada's is 5.4%. The dashboard surfaces these gaps, tracks how wage growth correlates with inflation, and identifies which states have the tightest labor markets: critical intelligence for workforce planning and economic development.

How we got there

Built on BLS Public Data API with monthly unemployment, JOLTS job openings, and quarterly wage data. Created state-level unemployment choropleth, wage vs. unemployment scatter, job openings trend analysis, and regional economic disparity scoring. Added automated monthly refresh and alert thresholds for significant labor market shifts.
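An alert threshold of this kind can be sketched as a month-over-month check on each state's unemployment rate. The 0.5 percentage-point threshold and the rate histories are hypothetical values chosen for illustration.

```python
def rate_alerts(history, threshold_pp=0.5):
    """Return states whose latest monthly change meets or exceeds threshold_pp points."""
    alerts = {}
    for state, rates in history.items():
        if len(rates) < 2:
            continue  # need at least two months to compute a change
        change = round(rates[-1] - rates[-2], 2)
        if abs(change) >= threshold_pp:
            alerts[state] = change
    return alerts


# Hypothetical monthly unemployment rates (percent), oldest first.
history = {
    "ND": [2.0, 2.1, 2.1],
    "NV": [5.1, 5.4, 6.0],
    "OH": [4.5, 4.4, 3.8],
}
alerts = rate_alerts(history)
```

Both directions are flagged, since a sharp drop can matter to workforce planners as much as a sharp rise.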

What I'd bring to your team

Labor market analytics and workforce intelligence for HR, economic development, or strategy teams. I can build systems that track employment trends, identify talent availability by region, and monitor wage competitiveness — using BLS data that updates monthly at no cost.

911 Triage Analytics — Emergency Response Intelligence
FDNY API · 2.5M incidents · response time · severity triage
● LIVE FDNY Healthcare

Executive Summary

Emergency response analytics system analyzing 2.5M FDNY incidents through NYC Open Data API. Tracks response times by severity, call volume patterns, and resource allocation efficiency — enabling EMS directors to identify bottlenecks and optimize crew deployment before peak demand periods.

2.5M
Incidents
8.2 min
Avg Response
4
Severity Levels
24/7
Coverage

Response times spike 40% during evening rush — but most crews are already deployed. The system identifies which boroughs have the longest delays, which severity levels get downgraded most often, and where adding ambulances would have the highest impact on patient outcomes.

How we got there

Built on FDNY EMS Incident API with 2.5M historical incidents. Created response time distribution analysis, severity-based triage performance, call volume forecasting, and resource allocation optimization. Added demand heatmaps by hour and borough. All data refreshes from live NYC Open Data feeds.
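Response time distribution by severity can be sketched with the standard library. The field names (`severity`, `response_min`) and the sample incidents are hypothetical; the median and 90th percentile are one reasonable choice of summary statistics, not necessarily the ones used in the dashboard.

```python
import statistics
from collections import defaultdict


def response_summary(incidents):
    """Median and 90th-percentile response time (minutes) per severity level."""
    by_severity = defaultdict(list)
    for incident in incidents:
        by_severity[incident["severity"]].append(incident["response_min"])

    summary = {}
    for severity in sorted(by_severity):
        times = by_severity[severity]
        if len(times) > 1:
            # quantiles(n=10) returns 9 cut points; the last is the 90th percentile.
            p90 = statistics.quantiles(times, n=10, method="inclusive")[-1]
        else:
            p90 = times[0]
        summary[severity] = {
            "median": statistics.median(times),
            "p90": p90,
            "count": len(times),
        }
    return summary


# Hypothetical incidents: eleven severity-1 calls and one severity-2 call.
incidents = [{"severity": 1, "response_min": t} for t in range(4, 15)]
incidents.append({"severity": 2, "response_min": 20})
summary = response_summary(incidents)
```

Percentiles are used alongside the median because response-time distributions tend to have long right tails that an average hides.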

What I'd bring to your team

Operations analytics and resource optimization for emergency services, logistics, or operations teams. I can build systems that track response metrics, forecast demand, and optimize crew deployment — using the same methodology that powers emergency response analysis.

Medicaid Utilization Analytics — Opioid + Cost Intelligence
CMS API · 50 states · drug utilization · opioid monitoring
● LIVE CMS Healthcare

Executive Summary

Medicaid drug utilization analytics system processing CMS State Drug Utilization Data across all 50 states. Tracks generic penetration rates, opioid prescription monitoring, and cost efficiency — enabling Medicaid administrators to identify overutilization patterns and optimize formulary decisions.

50
States
$80B+
Annual Spend
87%
Generic Rate
12
Opioid Rate/1K

Generic drugs save Medicaid $40B annually — but penetration varies 30% by state. The dashboard reveals which states over-prescribe opioids, which therapeutic classes drive the most spend, and where generic substitution could cut costs without compromising care.

How we got there

Built on CMS State Drug Utilization Data with 50-state coverage. Created generic penetration analysis, opioid rate tracking by state, high-cost product identification, and cost efficiency benchmarking. Added year-over-year trend analysis and interstate comparison. All data refreshes from live CMS feeds.
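Generic penetration by state reduces to a grouped ratio. A minimal sketch, assuming each record carries a prescription count and a generic/brand flag; the `is_generic` field is hypothetical (in practice it would have to be derived by joining drug codes to a reference table), and the counts are invented.

```python
from collections import defaultdict


def generic_penetration(records):
    """Generic share of prescriptions per state: generic count / total count."""
    totals = defaultdict(int)
    generics = defaultdict(int)
    for rec in records:
        totals[rec["state"]] += rec["prescriptions"]
        if rec["is_generic"]:
            generics[rec["state"]] += rec["prescriptions"]
    return {state: generics[state] / totals[state] for state in totals if totals[state]}


# Hypothetical utilization records.
records = [
    {"state": "CA", "is_generic": True, "prescriptions": 900},
    {"state": "CA", "is_generic": False, "prescriptions": 100},
    {"state": "TX", "is_generic": True, "prescriptions": 700},
    {"state": "TX", "is_generic": False, "prescriptions": 300},
]
rates = generic_penetration(records)
spread = max(rates.values()) - min(rates.values())
```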

What I'd bring to your team

Healthcare cost analytics and utilization monitoring for Medicaid, insurance, or pharmacy teams. I can build systems that track prescription patterns, identify overutilization, and optimize formulary decisions — using public CMS data that requires no vendor relationships.

Public Health Dashboard — Mortality + Epidemic Tracking
CDC API · NVSS · COVID · opioid epidemic · mortality trends
● LIVE CDC Healthcare

Executive Summary

Public health intelligence dashboard combining CDC NVSS mortality data, COVID tracking, and opioid epidemic analysis. Monitors age-adjusted death rates, epidemic trajectories, and state-level health disparities — enabling public health officials to target interventions where mortality trends are worsening.

3,000+
Counties
50
States
20+
Cause Categories
Live
CDC Refresh

West Virginia's mortality rate is 40% above national average — the dashboard shows exactly why. The system tracks age-adjusted death rates by cause, county, and year. Identifies opioid epidemic hotspots, cardiovascular disease clusters, and regions where life expectancy is declining — the same data CDC uses for national health reports.

How we got there

Built on CDC WONDER and NVSS APIs with county-level mortality data. Created age-adjusted death rate analysis, cause-of-death trend tracking, epidemic trajectory modeling, and state health disparity scoring. Added COVID impact analysis and opioid epidemic hotspot identification. All data refreshes from live CDC feeds.
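Age adjustment by direct standardisation weights each age group's death rate by that group's share of a fixed standard population. A minimal sketch with three age bands; the deaths, populations, and standard-population weights below are invented for illustration and are not the official US standard population.

```python
def age_adjusted_rate(deaths, population, standard_pop, per=100_000):
    """Directly standardised death rate per `per` people."""
    std_total = sum(standard_pop.values())
    rate = 0.0
    for age_group, std_count in standard_pop.items():
        age_specific = deaths[age_group] / population[age_group]
        rate += age_specific * (std_count / std_total)
    return rate * per


# Hypothetical inputs for one county.
deaths = {"0-44": 50, "45-64": 200, "65+": 900}
population = {"0-44": 50_000, "45-64": 40_000, "65+": 30_000}
standard_pop = {"0-44": 600, "45-64": 250, "65+": 150}

adjusted = age_adjusted_rate(deaths, population, standard_pop)
crude = sum(deaths.values()) / sum(population.values()) * 100_000
```

In this toy county the crude rate is well above the adjusted rate because the county is older than the standard population, which is exactly the distortion age adjustment removes.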

What I'd bring to your team

Public health analytics and epidemiological intelligence for government or healthcare organizations. I can build dashboards that track mortality trends, identify disease hotspots, and monitor health disparities — using CDC data that covers every US county.

Attrition Prediction — Employee Retention Intelligence
IBM HR Analytics · 1,470 employees · survival analysis · ROC 0.87
● LIVE IBM People

Executive Summary

Employee attrition prediction system using IBM HR Analytics dataset with 1,470 records. Combines logistic regression, random forest, and Cox survival analysis to identify flight-risk employees 6-12 months before they leave — enabling HR teams to intervene with retention strategies before turnover becomes a productivity crisis.

1,470
Employees
0.87
AUC-ROC
16%
Attrition Rate
6 mo
Prediction Horizon

Employees who travel frequently and work overtime are 3× more likely to quit — but most HR teams find out in the exit interview. The system identifies attrition risk factors (overtime, travel, tenure, age) and predicts which employees are likely to leave within 6 months. Enables proactive retention instead of reactive backfill.

How we got there

Built on IBM HR Analytics Employee Attrition dataset. Created exploratory analysis, feature engineering (overtime ratio, satisfaction composite, tenure segments), logistic regression and random forest classifiers, and Cox proportional hazards survival analysis. Added ROC/AUC evaluation, confusion matrix analysis, and Kaplan-Meier survival curves by risk factors.
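The Kaplan-Meier estimate behind the survival curves can be sketched in plain Python: at each time where someone leaves, survival is multiplied by one minus the fraction of at-risk employees who left. The tenure values and event flags are hypothetical; in practice a survival analysis library would be used.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier survival estimates as (time, S(t)) pairs at each event time.

    durations: observed time for each employee (for example, tenure in months)
    events: 1 if the employee left at that time, 0 if censored (still employed)
    """
    observations = sorted(zip(durations, events))
    at_risk = len(observations)
    survival = 1.0
    curve = []
    i = 0
    while i < len(observations):
        t = observations[i][0]
        leavers = 0
        ties = 0
        # Group every observation that shares this time point.
        while i < len(observations) and observations[i][0] == t:
            leavers += observations[i][1]
            ties += 1
            i += 1
        if leavers:
            survival *= 1 - leavers / at_risk
            curve.append((t, survival))
        # Leavers and censored employees both drop out of the risk set.
        at_risk -= ties
    return curve


# Hypothetical tenures in months; 1 = left, 0 = still employed.
durations = [6, 12, 12, 18, 24, 24]
events = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(durations, events)
```

Running the same function on subgroups (for example, overtime vs. no overtime) gives the per-risk-factor curves.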

What I'd bring to your team

People analytics and retention intelligence for HR teams. I can build systems that predict attrition risk, identify root causes, and enable proactive intervention — using the same methodology that powers workforce analytics at enterprise scale.

DEI Executive Dashboard — Pay Equity + Representation
Census + BLS · 50 states · pay gap · diversity index
● LIVE Census People

Executive Summary

Diversity, equity, and inclusion intelligence dashboard combining US Census and BLS data across 50 states. Tracks pay equity gaps, representation metrics, and employment trends by demographic group — enabling CHROs and DEI leaders to benchmark their organization's progress against national baselines and identify priority intervention areas.

50
States
$0.82
Gender Pay Ratio
6
Race Categories
Live
Annual Refresh

The pay gap hasn't closed in 20 years — but some states are moving faster than others. The dashboard tracks median income by gender and race, unemployment disparities, and representation scores across all 50 states. Enables data-driven DEI goal-setting instead of aspirational targets with no baseline.

How we got there

Built on US Census ACS and BLS employment data. Created pay equity analysis by demographic group, representation scoring, employment trend tracking, and interstate DEI benchmarking. Added diversity index computation and year-over-year gap analysis. All data refreshes from live Census and BLS APIs.
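One common choice for a diversity index is the Blau (Gini-Simpson) index, one minus the sum of squared group shares. The text above does not say which index the dashboard uses, so treat this as an assumed stand-in; the group counts and median earnings are illustrative numbers, not Census or BLS figures.

```python
def blau_index(counts):
    """Blau (Gini-Simpson) diversity index: 1 - sum of squared group shares."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1 - sum((n / total) ** 2 for n in counts.values())


def pay_ratio(median_a, median_b):
    """Median earnings of group A as a fraction of group B."""
    return median_a / median_b


# Hypothetical workforce counts by group.
counts = {"group_a": 50, "group_b": 30, "group_c": 20}
diversity = blau_index(counts)

# Hypothetical median annual earnings.
ratio = pay_ratio(41_000, 50_000)
```

The index is 0 when everyone belongs to one group and approaches 1 as the workforce spreads evenly across many groups.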

What I'd bring to your team

DEI analytics and workforce benchmarking for HR and executive leadership. I can build dashboards that track pay equity, representation, and employment trends — using public data that provides credible external benchmarks without expensive surveys.

Workforce Sentiment NLP — Employee Feedback Intelligence
3K reviews · sentiment classification · domain analysis · 89% accuracy
● LIVE Internal People

Executive Summary

Employee sentiment analysis system processing 3,000 workforce feedback records across multiple domains (Amazon, Yelp, internal). Uses TF-IDF + SVM to classify sentiment and identify domain-specific pain points — enabling HR and management teams to track morale trends and address issues before they drive turnover.

3,000
Reviews
89%
Accuracy
3
Domains
TF-IDF
Vectorizer

Sentiment varies dramatically by domain — product feedback is 70% negative while internal reviews are 60% positive. The system classifies sentiment by source, identifies which domains generate the most negative feedback, and surfaces common complaint themes. Used for employee pulse monitoring and customer satisfaction tracking.

How we got there

Built on 3,000 labeled sentiment records. Created text preprocessing pipeline, TF-IDF vectorization, Linear SVM classifier, and domain-specific sentiment analysis. Added confusion matrix evaluation, ROC curve analysis, and cross-domain comparison. Generates automated sentiment reports by source category.
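The confusion matrix evaluation can be sketched without any ML library: count (actual, predicted) pairs, then derive precision, recall, and F1 per class. The labels and predictions are hypothetical, and this shows only the evaluation step, not the TF-IDF + SVM model itself.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count (actual, predicted) label pairs."""
    return Counter(zip(y_true, y_pred))


def per_class_metrics(y_true, y_pred):
    """Precision, recall and F1 for every label seen in y_true or y_pred."""
    pairs = confusion_counts(y_true, y_pred)
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = pairs[(label, label)]
        fp = sum(c for (actual, pred), c in pairs.items() if pred == label and actual != label)
        fn = sum(c for (actual, pred), c in pairs.items() if actual == label and pred != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Hypothetical labels and classifier output.
y_true = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "pos"]
y_pred = ["pos", "pos", "neg", "neg", "neg", "pos", "neg", "pos"]
matrix = confusion_counts(y_true, y_pred)
metrics = per_class_metrics(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Running the same evaluation separately on each source gives the cross-domain comparison.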

What I'd bring to your team

Sentiment analytics and text intelligence for HR, customer experience, or product teams. I can build systems that classify feedback by sentiment, track trends over time, and surface domain-specific issues — turning unstructured text into structured action items.

LLM Document Classifier — Legal/Medical Text Categorization
BBC News · 2,225 docs · TF-IDF + Random Forest · 97% accuracy
● LIVE BBC GenAI

Executive Summary

Document classification system using BBC News dataset with 2,225 articles across 5 categories (business, entertainment, politics, sport, tech). Combines TF-IDF vectorization with Random Forest and Logistic Regression to auto-categorize documents — the same pipeline used for legal discovery, medical record sorting, and content moderation.

2,225
Documents
97%
Accuracy
5
Categories
0.96
F1 Score

Manual document sorting is expensive and inconsistent — this automates it at 97% accuracy. The system classifies documents into categories using text features, handles imbalanced classes with stratified sampling, and evaluates with precision, recall, and F1. Scales from 2K news articles to 2M legal contracts.

How we got there

Built on BBC News Classification dataset. Created text preprocessing (lowercasing, tokenization, stopword removal), TF-IDF vectorization (max 5K features), Random Forest and Logistic Regression classifiers with hyperparameter tuning. Added stratified train-test split, confusion matrix analysis, and feature importance extraction. Generates classification performance reports with precision, recall, and F1 per class.
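A stratified train-test split keeps each category's share the same in both sets. A minimal standard-library sketch, assuming a 20% test fraction and a fixed seed; the class sizes are invented and the `stratified_split` helper is hypothetical (a library routine would normally be used).

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) with each class split in the same proportion."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Keep at least one test example per class.
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical imbalanced label list.
labels = ["sport"] * 50 + ["tech"] * 30 + ["politics"] * 20
train_idx, test_idx = stratified_split(labels)
test_counts = Counter(labels[i] for i in test_idx)
```

Splitting per class rather than over the whole list is what prevents a small category from landing entirely in the training set.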

What I'd bring to your team

Document classification and text automation for legal, medical, or content teams. I can build pipelines that sort documents by type, extract key information, and route content to the right reviewers — reducing manual processing time by 90% while maintaining accuracy.

View Full Portfolio → 33 Projects, 45 Notebooks, 50+ Visualizations

Let's Build Something

Available for fractional CMO engagements, BI consulting projects, and AI architecture work. I'll respond within 24 hours.

Contact Sierra