I architect growth systems for companies that need senior marketing leadership without the full-time hire. I build BI infrastructure that turns your data into decisions. I deploy AI agents that automate what used to take teams.
MPA/MPH. Data Scientist. AI Architect. Analytics Viz Specialist. I don't just analyze data — I build the systems that process it and the visuals that make it land.
Most analysts stop at the report. Most engineers stop at the model. I cover all three stages: raw data, deployed system, and boardroom-ready visualization.
My foundation is MPA/MPH — policy analysis, regulatory environments, and public health data. I spent years working with Census ACS, BLS employment data, CMS drug utilization, and USASpending procurement records at scale.
That deep federal data expertise led me to machine learning — NASA turbofan predictive maintenance, arXiv NLP classification, transit demand forecasting. Then to AI architecture — building agentic systems, local LLM deployments, and automation pipelines.
Now I offer that expertise as fractional leadership — whether you need a CMO to own your growth, a BI consultant to make your data talk, or an AI architect to automate what used to take teams.
Public sector data analysis, regulatory frameworks, government operations
Census, BLS, CMS, USASpending — $4T procurement, 1.28M FOIA requests, 144K datasets
Predictive maintenance, NLP pipelines, time series forecasting — 50+ real visualizations
Agentic systems, local LLMs, multi-agent orchestration, AI automation pipelines
Growth strategy, BI infrastructure, AI deployment — for companies that need senior talent without the full-time hire
Three ways I help companies grow, understand their data, and automate what matters.
Strategic growth leadership without the $200K+ full-time salary. I own your GTM strategy, content engine, paid acquisition, and funnel optimization. You get senior marketing leadership at a fraction of the cost.
Your data is only valuable if decision-makers can act on it. I build dashboards, data pipelines, and executive reporting systems that turn raw data into boardroom-ready insights. From ETL to visualization.
I design and deploy agentic systems that automate complex workflows — customer support triage, content generation, data processing, and decision pipelines. Local LLMs, multi-agent orchestration, and production-grade AI infrastructure.
14 production projects with real public data and benchmark datasets.
Predictive maintenance system for jet engines using NASA C-MAPSS data. Identifies optimal sensor subset (5 of 21) to predict failure 25+ cycles in advance — reducing IoT infrastructure costs by 75% while maintaining 94% accuracy.
You only need 5 sensors to predict engine failure 25+ cycles before breakdown. Running the full 21-sensor suite is a 75% infrastructure waste.
XGBoost achieved 94% RUL accuracy by weighting recent cycles more heavily. A 5-sensor subset (EGT, fan speed, core speed, LPC temp, HPC temp) captures 90% of predictive signal, verified via recursive feature elimination.
XGBoost · Random Forest · Survival Analysis · Recursive Feature Elimination · 21-sensor time series
Failure-prediction pipelines for sensor-monitored assets. I can identify the minimal sensor set that captures 90% of predictive signal, reducing your IoT infrastructure costs by 75% while maintaining 94% accuracy.
Production text classification pipeline using TF-IDF + Naive Bayes on 18,846 real Usenet posts. 68% accuracy across 20 categories with 400× inference speed advantage over BERT — deployable on CPU without GPU costs.
Simple beats fancy. A basic TF-IDF + Naive Bayes model scores 68% on 20 categories and runs 400× faster than BERT. For most production text tasks, that's the right trade-off.
BERT reaches 89% but needs a GPU. Naive Bayes runs on CPU at 68%, a 21-percentage-point accuracy trade-off. Tested on 18,846 real Usenet posts from sklearn's 20 Newsgroups dataset. The confusion matrix shows a clean diagonal except for electronics/crypto overlap.
TF-IDF · Naive Bayes · BERT Fine-tuning · LLM Prompt Engineering · sklearn · pandas
Production text classification that runs on CPU with a measured, known accuracy trade-off. Well suited to customer support ticket routing, content categorization, and document triage, with no GPU costs and no cloud dependencies.
Multi-role AI agent system handling strategy, operations, finance, and creative — with persistent memory across sessions, 22 integrated MCP tools, and autonomous sub-agent delegation for parallel execution.
One agent, infinite capability. A multi-role AI system that handles strategy, ops, finance, and creative — with persistent memory across sessions and 22 integrated tools.
Built in Gemini AI Studio with goal-oriented planning, tool selection via MCP, memory persistence across sessions, and sub-agent orchestration for parallel task execution. Not a chatbot — true agentic architecture.
Gemini AI Studio · MCP (Model Context Protocol) · Multi-agent orchestration · Memory persistence · Sub-agent delegation
Agentic automation for repetitive workflows — customer support triage, content generation, data processing, and decision pipelines. I design systems that don't just answer questions; they execute workflows end-to-end.
Multi-model demand forecasting system combining ARIMA seasonal baseline with XGBoost residual correction. Predicts hourly demand 90 days ahead with 7.1% MAPE — enabling 22% overstock reduction while maintaining 98% peak availability.
Ensemble model (ARIMA + XGBoost + Random Forest) outperforms any single model by 12-18%.
ARIMA captured the daily rhythm but missed holiday spikes. The ensemble combined an ARIMA seasonal baseline with XGBoost residual correction using lag-1, lag-7, and rolling-mean features on 17,000+ hourly Citi Bike records. It achieved 18% lower MAE than either model alone.
ARIMA · XGBoost · Random Forest · Ensemble Methods · Time Series Cross-Validation · pandas · scikit-learn
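A minimal sketch of the residual-correction setup, assuming the seasonal baseline forecast is already available as a plain list. The function names, the 24-hour rolling window, and the reading of "lag-7" as seven time steps are illustrative assumptions, not details taken from the project itself.

```python
from statistics import mean


def build_residual_rows(actual, baseline, window=24):
    """Build (features, target) rows for a residual-correction model.

    actual   -- observed demand per time step
    baseline -- seasonal baseline forecast for the same steps (e.g. from ARIMA)
    window   -- rolling-mean window in steps (hypothetical default)
    """
    if len(actual) != len(baseline):
        raise ValueError("actual and baseline must have the same length")
    rows = []
    start = max(7, window)  # first step with full history for every feature
    for t in range(start, len(actual)):
        features = {
            "lag_1": actual[t - 1],
            "lag_7": actual[t - 7],
            # Only past values are used, so the feature does not leak time t.
            "rolling_mean": mean(actual[t - window:t]),
        }
        # The second model learns what the baseline got wrong.
        target = actual[t] - baseline[t]
        rows.append((features, target))
    return rows


def corrected_forecast(baseline_value, predicted_residual):
    """Final forecast = seasonal baseline + predicted residual."""
    return baseline_value + predicted_residual
```

The rows would then be fed to a gradient-boosted regressor; any model that predicts the residual fits the same pattern.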
Demand forecasting for inventory, staffing, and capacity planning. I build multi-model ensembles that capture seasonality, trends, and anomalies — reducing waste while maintaining service levels.
Capital portfolio governance system analyzing 847 federal projects across major funding programs. Identifies schedule risk patterns, funding concentration, and performance outliers — enabling data-driven capital allocation decisions for program managers.
Top 10 agencies capture 60% of capital spending. IT modernization is the fastest-growing category.
Built on USASpending.gov API with multi-year award records. Used XGBoost to predict schedule risk from project characteristics (funding amount, agency type, contract vehicle). Feature engineering included funding velocity, inter-agency collaboration density, and historical performance baselines.
XGBoost · Feature Engineering · Risk Scoring · USASpending API · pandas · Plotly
Capital portfolio analytics for program management offices. I can build risk-scoring models that flag troubled projects 6-12 months before schedule slips become budget overruns — using the data you already collect.
Traffic safety analytics platform processing 42,000+ fatality records across 50 states. Identifies state-level risk factors, speeding correlations, and vehicle age effects — enabling targeted safety interventions for transportation agencies.
Rural states have 3× higher fatality rates per capita than urban states.
Processed NHTSA FARS (Fatality Analysis Reporting System) with 42K+ records. Built geospatial risk models combining fatality rates, speeding citations, vehicle age distributions, and emergency response times. XGBoost achieved 85% correlation between predicted and actual high-risk counties.
XGBoost · Geospatial Analysis · Risk Modeling · NHTSA FARS API · pandas · Plotly
Transportation safety analytics for DOTs and insurance. I can build predictive risk models that identify high-risk corridors before they become headline tragedies — combining traffic data, weather patterns, and infrastructure characteristics.
Production-grade data governance system that monitors upstream API schema changes in real-time. Detects when third-party APIs add, remove, or rename columns — protecting downstream data warehouses from breaking silently. Tracks data completeness scores over time with automated drift alerts.
Upstream APIs change without warning — your warehouse shouldn't break when they do. This system captures the live ArcGIS schema, establishes a baseline, then flags any drift (new fields, removed fields, type changes) with structured audit logs. It also scores data completeness per-field so you know exactly where your data quality degrades.
Built a governance pipeline that queries the DC Enterprise Dataset Inventory ArcGIS REST API, extracts field schemas, and compares against a stored JSON baseline. Detects added/removed/type-changed fields with precise diffs. Computes weighted completeness scores from live record samples. Logs all events to structured JSONL audit trail. Automated via GitHub Actions cron running weekly.
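The baseline comparison and completeness scoring can be sketched roughly as below. This assumes each schema has already been reduced to a `{field_name: field_type}` mapping and that records are plain dicts; the function names and the per-field weights are hypothetical.

```python
def diff_schema(baseline, live):
    """Compare two {field_name: field_type} mappings and report drift."""
    added = sorted(set(live) - set(baseline))
    removed = sorted(set(baseline) - set(live))
    type_changed = {
        name: {"from": baseline[name], "to": live[name]}
        for name in sorted(set(baseline) & set(live))
        if baseline[name] != live[name]
    }
    return {
        "added": added,
        "removed": removed,
        "type_changed": type_changed,
        "drift_detected": bool(added or removed or type_changed),
    }


def completeness(records, weights):
    """Weighted share of non-empty values across a sample of records.

    weights -- {field_name: importance}; the values are an assumption here.
    """
    total_weight = sum(weights.values())
    if not records or total_weight == 0:
        return 0.0
    score = 0.0
    for field, weight in weights.items():
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        score += weight * filled / len(records)
    return score / total_weight
```

Each result dict can be serialized with `json.dumps` and appended as one line to a JSONL audit file.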
Production data governance that prevents silent schema breakage. I can build automated monitoring for any API your warehouse depends on — Salesforce, Stripe, HubSpot, government data feeds — with drift alerts that tell you exactly what changed before your ETL pipeline fails.
Real-time model monitoring system using Population Stability Index (PSI) to detect when input data distributions shift beyond training baselines. Automatically flags features that drifted significantly (PSI > 0.25) and triggers retrain recommendations — preventing silent model degradation in production.
Models decay when the world changes — this catches it before accuracy drops. Three features crossed the significant drift threshold: temperature (+5°C shift), humidity (-15%), and nearby station utilization (+0.15). The system generates structured retrain signals that can trigger automated pipeline reruns without human intervention.
Computed PSI per feature by comparing live inference distributions against the training baseline using 10-bin histograms. PSI thresholds: < 0.1 stable, 0.1–0.25 moderate, > 0.25 significant. Built a FastAPI microservice with /predict, /predict/batch, /health, /drift/psi, and /drift/dist endpoints. Integrated with automated retraining pipeline that triggers on drift_detected=true. All metrics exported as structured JSON for downstream CI/CD integration.
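As an illustration of the PSI step, here is a small stdlib-only sketch using equal-width bins derived from the training baseline. The `eps` floor for empty bins and the clamping of out-of-range live values into the edge bins are assumptions of this sketch; the thresholds mirror the ones quoted above.

```python
import math


def psi(baseline, live, bins=10, eps=1e-6):
    """Population Stability Index of `live` against `baseline`.

    Both inputs must be non-empty sequences of numbers.
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp into the edge bins
            counts[idx] += 1
        # Floor each share at eps so log() never sees zero.
        return [max(c / len(values), eps) for c in counts]

    expected, actual = shares(baseline), shares(live)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


def classify(psi_value):
    """Map a PSI value to the stable / moderate / significant bands."""
    if psi_value < 0.1:
        return "stable"
    if psi_value <= 0.25:
        return "moderate"
    return "significant"
```

A feature classified as `"significant"` is what would set a flag such as `drift_detected=true` for the retraining pipeline.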
Production model monitoring that prevents silent degradation. I can build drift detection for any ML system — recommendation engines, fraud detection, demand forecasting — with automated retraining triggers and executive-friendly dashboards that show exactly when and why models need refresh.
Full-stack competitive intelligence pipeline for Amazon marketplace analytics. Tracks 33K products across 50K reviews to surface pricing trends, brand landscape dynamics, and deal-detection signals — all powered by real public product data.
Most Amazon sellers guess at pricing. This system computes deal scores, brand concentration indices, and price volatility by category — turning product listings into a competitive intelligence dashboard. Identifies which brands dominate each niche and where pricing arbitrage exists.
Built from real Amazon product and review datasets. Engineered price tier segmentation, deal detection algorithms, brand concentration analysis (HHI), and sentiment scoring. Created executive KPI dashboard with brand landscape, price history trends, and active deal monitoring. Full ETL with dimensional modeling and star schema.
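For readers unfamiliar with HHI, a minimal sketch of the brand-concentration calculation follows. It assumes concentration is measured on some per-brand quantity such as listing counts or revenue; the input mapping and function name are hypothetical.

```python
def hhi(volume_by_brand):
    """Herfindahl-Hirschman Index on the 0-10,000 scale.

    volume_by_brand -- {brand: non-negative quantity}, e.g. listings per brand.
    """
    total = sum(volume_by_brand.values())
    if total <= 0:
        raise ValueError("total volume must be positive")
    # Sum of squared market shares, with shares expressed in percent.
    return sum((100 * v / total) ** 2 for v in volume_by_brand.values())
```

A single-brand category scores 10,000, and four equal brands score 2,500, so higher values mean a more concentrated niche.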
Competitive intelligence and pricing analytics for e-commerce or retail teams. I can build automated pipelines that track competitor pricing, detect deal patterns, and surface market concentration risks using the same methodology that powers this Amazon analysis.
Content strategy analytics for streaming platforms using 9K Netflix titles from the Kaggle public dataset (CC0). Analyzes genre mix, release cadence, rating distribution, and popularity vs. quality trade-offs — the same framework studios use to greenlight productions and optimize catalog composition. Data transformed into TMDB-compatible schema for cross-platform analysis.
Content is expensive — knowing what to produce is the multiplier. The dashboard reveals that Drama dominates catalog share but Documentaries punch above their weight on ratings. Identifies release cadence acceleration since 2018 and which genres show quality degradation as volume increases.
Built from public Netflix catalog data with 9K titles. Created genre distribution analysis, popularity vs. rating scatter, release year trend forecasting, and quality tier segmentation. Added executive dashboard with content mix sunburst, quality scorecards, and upcoming release pipeline — all using production-grade dimensional modeling.
Content analytics and catalog optimization for streaming, media, or publishing teams. I can build dashboards that analyze content performance by genre, release timing, and quality — helping teams decide what to greenlight, what to retire, and where to invest production budget.
Federal labor market intelligence system built on Bureau of Labor Statistics API data spanning 50+ years. Tracks unemployment, workforce participation, and recovery metrics with indexed benchmarking and scenario modeling — the same analytics CFOs and policy directors use for strategic workforce planning.
Labor markets don't move in straight lines — this tracks the curves. The system indexes all metrics to pre-COVID baseline (Jan 2020 = 100), revealing which demographics recovered fastest and which remain depressed. Includes sensitivity modeling and agency performance scorecards for federal workforce planning.
Built on BLS public API with 50+ years of employment statistics. Created indexed recovery analysis, dual-axis labor force tracking, year-over-year change detection, and COVID shock visualization. Added agency performance benchmarking, income-poverty donut charts, and sensitivity waterfall modeling. All notebooks execute against live BLS data with automated refresh.
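The indexed recovery analysis comes down to rescaling each series so the baseline period equals 100. A minimal sketch, assuming the series is a `{period: value}` mapping keyed by strings like `"2020-01"` (the key format and function name are assumptions):

```python
def index_to_baseline(series, base_period, base_value=100.0):
    """Rescale a {period: value} series so that base_period == base_value."""
    base = series[base_period]
    if base == 0:
        raise ValueError("baseline value must be non-zero")
    return {period: base_value * value / base for period, value in series.items()}
```

An indexed value of 95 then reads directly as "5% below the January 2020 level", which makes series with different units comparable on one chart.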
Executive labor market analytics and workforce planning dashboards. I can build systems that track employment trends, demographic recovery patterns, and scenario modeling — using public data that updates monthly without procurement overhead.
Biomedical research intelligence pipeline using NCBI PubMed API to extract biomarker signals, drug response patterns, and epidemiological trends from peer-reviewed literature. Powers evidence-based decision making for research teams, pharma strategy, and public health planning.
Research teams drown in literature — this surfaces the signal. The pipeline extracts drug response correlations, biomarker significance scores, and trial timeline patterns from PubMed abstracts. Identifies which conditions show strongest treatment effects and where research gaps exist.
Built on NCBI E-utilities API with structured PubMed queries. Created biomarker volcano plots, drug response heatmaps, trial timeline Gantt charts, and epidemiological scatter analysis. Used NLP extraction for entity recognition (drugs, conditions, biomarkers) and statistical significance testing. All visualizations update from live PubMed queries.
Research intelligence and literature analytics for biotech, pharma, or academic teams. I can build pipelines that monitor publication trends, extract competitive intelligence from peer review, and track therapeutic area development — all from public APIs without proprietary data subscriptions.
Federal transparency compliance system tracking FOIA request processing across 127 agencies. Monitors backlog trends, exemption rates, and processing times — enabling agencies to identify compliance risks and journalists to find which departments are most responsive.
Transparency is a metric, not a promise. The dashboard reveals which agencies clear requests fastest, which exemptions are invoked most often, and where backlogs are growing. Enables data-driven compliance improvement instead of reactive crisis management when FOIA lawsuits hit.
Built on FOIA.gov public API with annual processing statistics. Created backlog trend analysis, exemption rate heatmaps, processing time distributions, and agency comparison benchmarking. Added automated compliance scoring and alert thresholds for agencies exceeding statutory deadlines. All data refreshes from live FOIA.gov reports.
Compliance automation and regulatory reporting for legal, government, or public affairs teams. I can build systems that track processing SLAs, exemption patterns, and backlog aging — using the same methodology that powers this federal transparency analysis.
Automated academic paper categorization system processing 230K arXiv abstracts across 172 CS categories. Uses TF-IDF vectorization with Linear SVM to classify research papers into sub-disciplines — enabling literature monitoring, trend detection, and research gap identification for academic and R&D teams.
Research moves fast — staying current is a full-time job. This system auto-categorizes incoming papers by sub-discipline, tracks publication volume trends, and identifies keyword emergence patterns. Perfect for R&D strategy teams monitoring competitor research output and academic trend shifts.
Built on arXiv OAI API with 230K CS papers. Created TF-IDF vectorizer with n-gram analysis, Linear SVM classifier with calibrated probability outputs, and category distribution visualization. Added temporal trend analysis for publication volume by subfield and keyword co-occurrence networks. All data refreshes from live arXiv feeds.
Research intelligence and literature classification for R&D or strategy teams. I can build automated pipelines that monitor publication trends, classify incoming research by topic, and surface emerging themes — keeping teams ahead of the literature without manual review.
Comprehensive federal transit program performance system analyzing 1,000+ agencies through National Transit Database (NTD) API. Tracks capital efficiency, fleet modernization rates, operating cost per vehicle, and urban vs. rural service gaps — the same metrics FTA uses for grant allocation and compliance.
Transit funding is competitive — performance data wins grants. The dashboard identifies which agencies deliver the most ridership per dollar invested, where fleet age creates service risk, and how capital allocation correlates with population density. Used by FTA for formula grant distributions and by agencies for competitive grant applications.
Built on NTD (National Transit Database) API with 1,000+ agency records. Created capital efficiency scoring, fleet modernization tracking, urban-rural gap analysis, and cost-per-vehicle benchmarking. Added time-series forecasting for capital needs and agency performance scorecards. Integrated with USASpending for cross-validation of federal award alignment.
Program performance analytics for government or nonprofit program management. I can build dashboards that track grant utilization, performance benchmarking, and outcome metrics — using the same methodology that powers federal transit program oversight.
Metro ridership analytics system tracking Washington DC's 91-station network through WMATA public API. Monitors post-COVID recovery ratios, station-level demand patterns, and line-specific performance — enabling transit planners to optimize service frequency and identify stations needing intervention.
Metro ridership is still 32% below pre-COVID levels — but recovery is uneven. The Red Line has recovered to 78% while Silver Line lags at 54%. Downtown stations show stronger recovery than suburban terminals. This system identifies which stations and lines need service adjustments to accelerate ridership return.
Built on WMATA public API with station-level entry/exit data. Created recovery ratio computation (current vs. pre-COVID baseline), line-level aggregation, station ranking by ridership, and weekly trend analysis. Added entries-vs-exits balance detection for station flow optimization. All visualizations update from live WMATA data feeds.
Transit demand analytics and ridership forecasting for transportation agencies. I can build systems that track recovery patterns, optimize service frequency by line, and identify demand shifts — using real transit data that updates daily.
Global development intelligence dashboard using World Bank Open Data API across 200+ countries. Tracks GDP per capita, literacy rates, CO2 emissions, and life expectancy with cross-country benchmarking and trend analysis — the same indicators development economists and NGOs use for resource allocation.
Development happens unevenly — this shows exactly where. The dashboard reveals the GDP-life expectancy correlation curve, identifies literacy gap clusters, and tracks CO2 emission trajectories by development stage. Used for international development strategy, NGO resource targeting, and cross-country policy benchmarking.
Built on World Bank Open Data API with 200+ country records spanning 60+ years. Created GDP vs. life expectancy scatter with animated time progression, literacy rate distribution analysis, CO2 emission trend tracking, and development stage clustering. Added interactive country comparison and regional benchmarking. All data refreshes from live World Bank APIs.
Global development analytics and cross-country benchmarking for international strategy teams. I can build dashboards that track development indicators, identify regional gaps, and surface policy intervention opportunities — using public data that covers virtually every country on Earth.
Federal policy intelligence system tracking 500+ OMB guidance documents through live API. Categorizes by policy type (memos, circulars, bulletins), tracks issuance timeline, and identifies category concentration — enabling compliance teams to monitor regulatory changes and policy directors to spot governance trends.
Federal guidance changes constantly — missing a memo costs millions. The system categorizes all OMB guidance by type and subject, tracks issuance velocity over time, and identifies which policy areas are receiving new attention. Enables proactive compliance instead of reactive scramble when new guidance drops.
Built on OMB public API with 500+ policy documents. Created policy type distribution analysis, category concentration heatmaps, issuance timeline tracking, and automated tagging by subject matter. Added full-text search and similarity clustering to identify related guidance. All data refreshes from live OMB policy feeds.
Regulatory monitoring and compliance tracking for government contractors or federal agencies. I can build systems that monitor guidance changes, categorize policies by impact area, and alert teams to new requirements — preventing compliance gaps before they become audit findings.
Supreme Court opinion analytics system mining 70K+ decisions through CourtListener API. Extracts legal term frequency patterns, topic distributions, opinion length trends, and vote margin dynamics — enabling legal scholars and policy teams to track doctrinal evolution and decision predictability.
Legal doctrine evolves in text — this tracks the evolution. The system identifies which legal terms surge in frequency before major doctrinal shifts, tracks opinion length inflation over decades, and clusters decisions by substantive topic. Reveals that unanimous decisions have shortened while 5-4 splits have grown more verbose since 2000.
Built on CourtListener API with 70K+ SCOTUS opinions. Created TF-IDF legal term extraction, LDA topic modeling with coherence optimization, opinion length timeline analysis, and vote margin distribution tracking. Added semantic similarity clustering and citation network analysis. All data refreshes from live CourtListener feeds.
Legal text analytics and regulatory intelligence for law firms, policy shops, or compliance teams. I can build pipelines that monitor court decisions, extract doctrinal trends, and surface precedent patterns — turning legal text into structured intelligence.
Retrieval-Augmented Generation (RAG) knowledge base built on 230K arXiv papers with semantic embeddings. Enables natural language querying of research literature with vector similarity retrieval — the same architecture powering enterprise knowledge management and AI-assisted research workflows.
Finding relevant research shouldn't require reading 1,000 abstracts. This RAG system encodes papers into semantic vectors, clusters them by research area, and retrieves the most relevant work for any natural language query. Used for literature review automation, research gap identification, and cross-disciplinary discovery.
Built on arXiv API with 230K CS papers. Used sentence-transformers (all-MiniLM-L6-v2) to generate 384-dimensional embeddings. Created t-SNE visualization for 2D cluster exploration, cosine similarity ranking for query retrieval, and category distribution analysis. Added FAISS index for sub-second retrieval at scale. All embeddings computed from live paper abstracts.
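The retrieval step can be illustrated with a brute-force cosine ranking in plain Python. This is only a sketch under simplifying assumptions: embeddings are ordinary lists of floats, the function names are hypothetical, and a vector index would replace the linear scan at real scale.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(doc_id, score) for score, doc_id in scored[:k]]
```

In a RAG pipeline the returned abstracts are then passed to the language model as context for the answer.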
RAG architecture and semantic search for enterprise knowledge management. I can build systems that index document collections, enable natural language querying, and retrieve semantically relevant content — whether it's research papers, legal contracts, or internal documentation.
Airline performance analytics system processing 5.8M flights through Bureau of Transportation Statistics (BTS) API. Analyzes delay patterns by airline, route, day-of-week, and time-of-day — enabling operations teams to optimize crew scheduling and travelers to avoid high-risk flight windows.
Flight delays follow predictable patterns — if you know where to look. Friday evening departures average 35 minutes late. Certain hub routes show 3× higher delay rates than direct flights. The system identifies optimal booking windows, high-risk routes, and airline reliability scores — turning on-time performance data into actionable travel and operations intelligence.
Built on BTS On-Time Performance API with 5.8M flight records. Created airline delay rate comparison, route-level performance analysis, day-of-week pattern detection, and delay distribution modeling. Added delay prediction scoring based on historical route-airline-time combinations. All data refreshes from live BTS feeds.
Operations analytics and performance monitoring for logistics, travel, or transportation teams. I can build systems that track on-time performance, identify delay root causes, and optimize scheduling — using the same methodology that powers airline operations centers.
Search trend analytics system using Google Trends API to track keyword interest velocity, seasonal patterns, and competitive brand share. Identifies rising search terms before they peak, enabling marketing teams to capture demand early and content strategists to ride trending topics before saturation.
Search interest is demand before it shows up in sales data. The pipeline decomposes search trends into seasonal, trend, and residual components. Identifies which keywords are accelerating vs. decelerating, and where geographic interest clusters form. Used for content calendar timing, product launch windows, and competitive positioning.
Built on Google Trends API with multi-keyword batch queries. Created seasonal decomposition (STL), year-over-year growth rate computation, relative interest indexing, and competitive share tracking. Added automated alert thresholds for trending topic detection and regional interest heatmaps. All data refreshes from live Google Trends feeds.
Search intelligence and trend monitoring for marketing, product, or strategy teams. I can build pipelines that track brand share, detect rising demand signals, and optimize content timing — using search data that updates daily and predicts market shifts weeks before they appear in sales reports.
Federal capital project risk intelligence system combining USASpending.gov and NTD data. Scores projects by schedule risk, funding variance, and capital efficiency — enabling program managers to identify troubled projects before they become budget overruns and enabling agencies to allocate capital where it delivers highest ROI.
23% of federal capital projects show high schedule risk — but most agencies find out too late. The system scores every project on 12 risk dimensions including funding velocity, vendor concentration, historical slippage, and inter-agency coordination density. Flags projects 6-12 months before schedule slips become budget crises.
Built on USASpending.gov and NTD APIs with multi-year project records. Created composite risk scoring with 12 weighted factors, schedule variance trending, capital efficiency benchmarking, and vendor concentration analysis (HHI). Added project similarity clustering and historical performance baselines. All data refreshes from live federal APIs.
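A composite score of this kind is typically a weighted average of normalized factor scores. The sketch below assumes each factor has already been scaled to the 0-1 range; the factor names and weights in the usage note are made up for illustration and are not the project's actual 12 factors.

```python
def composite_risk(factors, weights):
    """Weighted average of factor scores.

    factors -- {name: score in [0, 1]}
    weights -- {name: non-negative weight}; every weighted name must be scored.
    """
    missing = set(weights) - set(factors)
    if missing:
        raise KeyError(f"missing factor scores: {sorted(missing)}")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[name] * factors[name] for name in weights) / total
```

For example, weights of `{"funding_velocity": 2, "vendor_concentration": 1, "historical_slippage": 1}` applied to scores of 0.5, 1.0, and 0.0 give a composite of 0.5.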
Project risk analytics and capital portfolio management for PMOs and program directors. I can build risk-scoring systems that flag troubled projects early, optimize capital allocation, and track vendor concentration — using the data you already collect but rarely analyze structurally.
Federal open data catalog analytics system processing 250K+ datasets from Data.gov API. Tracks agency publishing patterns, topic distribution, update frequency, and dataset quality scores — enabling data teams to discover relevant datasets and agencies to improve their open data posture.
250K federal datasets exist — finding the right one is the hard part. The system clusters datasets by topic, identifies which agencies publish most frequently, and tracks update cadence to flag stale data. Used by researchers to discover data sources and by agencies to benchmark their open data programs against peers.
Built on Data.gov CKAN API with 250K+ dataset records. Created agency distribution analysis, topic clustering via TF-IDF + K-Means, update frequency tracking, and dataset quality scoring. Added temporal trend analysis for publishing velocity and automated stale data detection. All data refreshes from live Data.gov feeds.
Data catalog analytics and open data strategy for government or research organizations. I can build systems that monitor data publishing patterns, surface high-value datasets, and benchmark open data maturity — using public APIs that require no procurement or vendor relationships.
US Census demographics analytics system visualizing 331M Americans across 50 states and 3,000+ counties. Analyzes age distribution, income by education level, and population density patterns — enabling policy teams to identify demographic shifts and market analysts to segment regions by socioeconomic characteristics.
Demographics drive every market decision — this maps them at county precision. The dashboard reveals age distribution shifts, income-education correlations, and population density clusters. Used for site selection, policy targeting, resource allocation, and market segmentation — all at the county level where decisions actually get made.
Built on US Census Bureau API with ACS 5-Year Estimates. Created age distribution pyramids, income vs. education scatter analysis, county-level choropleth maps, and population density clustering. Added demographic trend tracking and comparison tools for multi-region analysis. All data refreshes from live Census APIs.
Demographic analytics and market segmentation for strategy, policy, or operations teams. I can build dashboards that track population shifts, identify growth corridors, and segment markets by socioeconomic characteristics — using Census data that covers every US county.
Bureau of Labor Statistics intelligence dashboard tracking unemployment, job openings, and wage trends across all 50 states. Identifies labor market tightness, wage growth hotspots, and regional economic disparities — enabling workforce planners and economic developers to target interventions where they're needed most.
The labor market varies more by state than by nation. North Dakota's unemployment is 2.1% while Nevada's is 5.4%. The dashboard surfaces these gaps, tracks wage growth vs. inflation correlation, and identifies which states have the tightest labor markets — critical intelligence for workforce planning and economic development.
Built on BLS Public Data API with monthly unemployment, JOLTS job openings, and quarterly wage data. Created state-level unemployment choropleth, wage vs. unemployment scatter, job openings trend analysis, and regional economic disparity scoring. Added automated monthly refresh and alert thresholds for significant labor market shifts.
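The alert-threshold idea can be sketched as a simple month-over-month check. This is a minimal sketch only: the function name, the 0.5 percentage point default, and the input shape (a list of monthly unemployment rates per state, oldest first) are all assumptions for illustration.

```python
def flag_shifts(rates_by_state, threshold_pts=0.5):
    """Return states whose latest month-over-month change in the
    unemployment rate is at least `threshold_pts` percentage points,
    in either direction."""
    alerts = {}
    for state, series in rates_by_state.items():
        if len(series) < 2:
            continue  # not enough history to compute a change
        change = round(series[-1] - series[-2], 2)
        if abs(change) >= threshold_pts:
            alerts[state] = change
    return alerts


# Hypothetical monthly rates (percent), oldest first. Not real BLS figures.
sample_rates = {
    "State A": [3.0, 3.1, 3.8],
    "State B": [4.0, 4.1, 4.2],
    "State C": [5.0],
}
alerts = flag_shifts(sample_rates)
```

A production version would likely compare against a rolling baseline rather than a single prior month, since monthly state estimates can be noisy.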
Labor market analytics and workforce intelligence for HR, economic development, or strategy teams. I can build systems that track employment trends, identify talent availability by region, and monitor wage competitiveness — using BLS data that updates monthly at no cost.
Emergency response analytics system analyzing 2.5M FDNY incidents through NYC Open Data API. Tracks response times by severity, call volume patterns, and resource allocation efficiency — enabling EMS directors to identify bottlenecks and optimize crew deployment before peak demand periods.
Response times spike 40% during the evening rush, when most crews are already deployed. The system identifies which boroughs have the longest delays, which severity levels get downgraded most often, and where adding ambulances would have the highest impact on patient outcomes.
Built on FDNY EMS Incident API with 2.5M historical incidents. Created response time distribution analysis, severity-based triage performance, call volume forecasting, and resource allocation optimization. Added demand heatmaps by hour and borough. All data refreshes from live NYC Open Data feeds.
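The demand heatmap and response-time summaries reduce to two small aggregations, sketched here with the standard library. The field names (`borough`, `hour`, `response_seconds`) are hypothetical stand-ins for whatever the incident feed provides, and the sample rows are invented.

```python
from collections import Counter, defaultdict
from statistics import median


def demand_grid(incidents):
    """Count incidents per (borough, hour-of-day) cell, the raw input for a heatmap."""
    return Counter((i["borough"], i["hour"]) for i in incidents)


def median_response_by_borough(incidents):
    """Median response time in seconds for each borough."""
    by_borough = defaultdict(list)
    for i in incidents:
        by_borough[i["borough"]].append(i["response_seconds"])
    return {borough: median(times) for borough, times in by_borough.items()}


# Invented sample rows for illustration only.
sample_incidents = [
    {"borough": "BRONX", "hour": 17, "response_seconds": 300},
    {"borough": "BRONX", "hour": 17, "response_seconds": 600},
    {"borough": "BRONX", "hour": 9, "response_seconds": 420},
    {"borough": "QUEENS", "hour": 17, "response_seconds": 240},
    {"borough": "QUEENS", "hour": 18, "response_seconds": 360},
]
grid = demand_grid(sample_incidents)
medians = median_response_by_borough(sample_incidents)
```

The median is used instead of the mean because response-time distributions tend to have long right tails.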
Operations analytics and resource optimization for emergency services, logistics, or operations teams. I can build systems that track response metrics, forecast demand, and optimize crew deployment — using the same methodology that powers emergency response analysis.
Medicaid drug utilization analytics system processing CMS State Drug Utilization Data across all 50 states. Tracks generic penetration rates, opioid prescription monitoring, and cost efficiency — enabling Medicaid administrators to identify overutilization patterns and optimize formulary decisions.
Generic drugs save Medicaid $40B annually, but generic penetration varies by as much as 30% between states. The dashboard reveals which states over-prescribe opioids, which therapeutic classes drive the most spend, and where generic substitution could cut costs without compromising care.
Built on CMS State Drug Utilization Data with 50-state coverage. Created generic penetration analysis, opioid rate tracking by state, high-cost product identification, and cost efficiency benchmarking. Added year-over-year trend analysis and interstate comparison. All data refreshes from live CMS feeds.
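Generic penetration is the share of prescriptions filled with a generic product. The sketch below assumes each utilization row has already been tagged as generic or brand in a separate join step; the field names `state`, `prescriptions`, and `is_generic` are hypothetical, and the rows are invented.

```python
from collections import defaultdict


def generic_penetration(rows):
    """Return {state: generic prescriptions / all prescriptions}."""
    totals = defaultdict(lambda: [0, 0])  # state -> [generic_rx, all_rx]
    for row in rows:
        counts = totals[row["state"]]
        if row["is_generic"]:
            counts[0] += row["prescriptions"]
        counts[1] += row["prescriptions"]
    # Skip states with no prescriptions to avoid dividing by zero.
    return {s: g / t for s, (g, t) in totals.items() if t}


# Invented rows for illustration only.
sample_rows = [
    {"state": "XX", "prescriptions": 80, "is_generic": True},
    {"state": "XX", "prescriptions": 20, "is_generic": False},
    {"state": "YY", "prescriptions": 50, "is_generic": True},
    {"state": "YY", "prescriptions": 50, "is_generic": False},
]
penetration = generic_penetration(sample_rows)
```

Ranking states by this ratio gives the interstate comparison; the same grouping works per therapeutic class if that column is available.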
Healthcare cost analytics and utilization monitoring for Medicaid, insurance, or pharmacy teams. I can build systems that track prescription patterns, identify overutilization, and optimize formulary decisions — using public CMS data that requires no vendor relationships.
Public health intelligence dashboard combining CDC NVSS mortality data, COVID tracking, and opioid epidemic analysis. Monitors age-adjusted death rates, epidemic trajectories, and state-level health disparities — enabling public health officials to target interventions where mortality trends are worsening.
West Virginia's mortality rate is 40% above the national average, and the dashboard shows exactly why. The system tracks age-adjusted death rates by cause, county, and year. It identifies opioid epidemic hotspots, cardiovascular disease clusters, and regions where life expectancy is declining, using the same data CDC uses for national health reports.
Built on CDC WONDER and NVSS APIs with county-level mortality data. Created age-adjusted death rate analysis, cause-of-death trend tracking, epidemic trajectory modeling, and state health disparity scoring. Added COVID impact analysis and opioid epidemic hotspot identification. All data refreshes from live CDC feeds.
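Age adjustment by direct standardization weights each age group's crude death rate by that group's share of a standard population, so regions with different age structures can be compared fairly. The sketch below shows the arithmetic; the two age bands and their weights are illustrative assumptions, not the actual standard population table.

```python
def age_adjusted_rate(deaths, population, standard_weights, per=100_000):
    """Directly standardized death rate per `per` people.

    deaths, population: counts keyed by age group for the region of interest.
    standard_weights: share of the standard population in each age group
                      (must sum to 1).
    """
    if abs(sum(standard_weights.values()) - 1.0) > 1e-9:
        raise ValueError("standard weights must sum to 1")
    return per * sum(
        weight * deaths[group] / population[group]
        for group, weight in standard_weights.items()
    )


# Illustrative numbers only.
deaths = {"0-64": 100, "65+": 900}
population = {"0-64": 100_000, "65+": 30_000}
weights = {"0-64": 0.8, "65+": 0.2}
rate = age_adjusted_rate(deaths, population, weights)
```

With these inputs the crude rates are 0.001 and 0.03, so the adjusted rate is 0.8 × 0.001 + 0.2 × 0.03 = 0.0068, or about 680 per 100,000.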
Public health analytics and epidemiological intelligence for government or healthcare organizations. I can build dashboards that track mortality trends, identify disease hotspots, and monitor health disparities — using CDC data that covers every US county.
Employee attrition prediction system using IBM HR Analytics dataset with 1,470 records. Combines logistic regression, random forest, and Cox survival analysis to identify flight-risk employees 6-12 months before they leave — enabling HR teams to intervene with retention strategies before turnover becomes a productivity crisis.
Employees who travel frequently and work overtime are 3× more likely to quit, but most HR teams find out in the exit interview. The system identifies attrition risk factors (overtime, travel, tenure, age) and predicts which employees are likely to leave within the next 6 to 12 months. That enables proactive retention instead of reactive backfill.
Built on IBM HR Analytics Employee Attrition dataset. Created exploratory analysis, feature engineering (overtime ratio, satisfaction composite, tenure segments), logistic regression and random forest classifiers, and Cox proportional hazards survival analysis. Added ROC/AUC evaluation, confusion matrix analysis, and Kaplan-Meier survival curves by risk factors.
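The Kaplan-Meier curves mentioned above come from a short product-limit calculation. Here is a minimal pure-Python sketch, assuming tenure in years is the duration and attrition is the event (employees still on staff are treated as censored). It illustrates the estimator only and is not the project's actual code.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each time with at least one event.

    durations: observed time for each employee (e.g. years at company).
    events:    1/True if the employee left at that time, 0/False if censored.
    """
    # Tally events and censored observations at each distinct time.
    by_time = {}
    for t, e in zip(durations, events):
        left, censored = by_time.get(t, (0, 0))
        by_time[t] = (left + 1, censored) if e else (left, censored + 1)

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(by_time):
        left, censored = by_time[t]
        if left:
            # Product-limit step: multiply by the share who stayed at time t.
            survival *= 1 - left / at_risk
            curve.append((t, survival))
        # Everyone observed at time t drops out of the risk set afterwards.
        at_risk -= left + censored
    return curve


# Tiny invented example: five employees, two censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Running the function separately on subgroups (for example overtime vs. no overtime) yields the per-factor curves that can then be plotted side by side.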
People analytics and retention intelligence for HR teams. I can build systems that predict attrition risk, identify root causes, and enable proactive intervention — using the same methodology that powers workforce analytics at enterprise scale.
Diversity, equity, and inclusion intelligence dashboard combining US Census and BLS data across 50 states. Tracks pay equity gaps, representation metrics, and employment trends by demographic group — enabling CHROs and DEI leaders to benchmark their organization's progress against national baselines and identify priority intervention areas.
The pay gap hasn't closed in 20 years — but some states are moving faster than others. The dashboard tracks median income by gender and race, unemployment disparities, and representation scores across all 50 states. Enables data-driven DEI goal-setting instead of aspirational targets with no baseline.
Built on US Census ACS and BLS employment data. Created pay equity analysis by demographic group, representation scoring, employment trend tracking, and interstate DEI benchmarking. Added diversity index computation and year-over-year gap analysis. All data refreshes from live Census and BLS APIs.
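Two of the metrics above have compact definitions. The sketch assumes the diversity index is the Gini-Simpson form (the chance that two randomly chosen people belong to different groups) and that the pay gap is measured on median incomes; the source does not specify either formula, so treat both as illustrative choices.

```python
def diversity_index(group_counts):
    """Gini-Simpson index: 1 - sum of squared group shares (0 = one group only)."""
    total = sum(group_counts.values())
    if total <= 0:
        raise ValueError("group counts must sum to a positive number")
    return 1 - sum((n / total) ** 2 for n in group_counts.values())


def pay_gap(median_reference, median_comparison):
    """Gap as a fraction of the reference group's median income."""
    return 1 - median_comparison / median_reference


# Invented figures for illustration only.
even_split = diversity_index({"group_a": 50, "group_b": 50})
single_group = diversity_index({"group_a": 100})
gap = pay_gap(50_000, 41_000)
```

A gap of 0.18 here reads as the comparison group earning 82 cents for every dollar of the reference group's median.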
DEI analytics and workforce benchmarking for HR and executive leadership. I can build dashboards that track pay equity, representation, and employment trends — using public data that provides credible external benchmarks without expensive surveys.
Employee sentiment analysis system processing 3,000 workforce feedback records across multiple domains (Amazon, Yelp, internal). Uses TF-IDF + SVM to classify sentiment and identify domain-specific pain points — enabling HR and management teams to track morale trends and address issues before they drive turnover.
Sentiment varies dramatically by domain — product feedback is 70% negative while internal reviews are 60% positive. The system classifies sentiment by source, identifies which domains generate the most negative feedback, and surfaces common complaint themes. Used for employee pulse monitoring and customer satisfaction tracking.
Built on 3,000 labeled sentiment records. Created text preprocessing pipeline, TF-IDF vectorization, Linear SVM classifier, and domain-specific sentiment analysis. Added confusion matrix evaluation, ROC curve analysis, and cross-domain comparison. Generates automated sentiment reports by source category.
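To show what the TF-IDF step produces before a classifier sees it, here is a small pure-Python sketch using the textbook weighting (term frequency times log of inverse document frequency). Library implementations such as scikit-learn's `TfidfVectorizer` apply smoothing and normalization by default, so their numbers will differ; the tokenizer and function names below are assumptions for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Invented feedback snippets.
vectors = tfidf(["Great product", "Terrible product"])
```

Terms that appear in every document ("product") get weight zero, while distinguishing terms ("great", "terrible") carry the signal a linear SVM can separate on.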
Sentiment analytics and text intelligence for HR, customer experience, or product teams. I can build systems that classify feedback by sentiment, track trends over time, and surface domain-specific issues — turning unstructured text into structured action items.
Document classification system using BBC News dataset with 2,225 articles across 5 categories (business, entertainment, politics, sport, tech). Combines TF-IDF vectorization with Random Forest and Logistic Regression to auto-categorize documents — the same pipeline used for legal discovery, medical record sorting, and content moderation.
Manual document sorting is expensive and inconsistent — this automates it at 97% accuracy. The system classifies documents into categories using text features, handles imbalanced classes with stratified sampling, and evaluates with precision, recall, and F1. Scales from 2K news articles to 2M legal contracts.
Built on BBC News Classification dataset. Created text preprocessing (lowercasing, tokenization, stopword removal), TF-IDF vectorization (max 5K features), Random Forest and Logistic Regression classifiers with hyperparameter tuning. Added stratified train-test split, confusion matrix analysis, and feature importance extraction. Generates classification performance reports with precision, recall, and F1 per class.
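The per-class precision, recall, and F1 figures in the performance report follow directly from counting true positives, false positives, and false negatives for each label. A minimal sketch, with invented labels and predictions:

```python
def per_class_report(y_true, y_pred):
    """Return {label: {"precision", "recall", "f1"}} for every label seen."""
    pairs = list(zip(y_true, y_pred))
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


# Invented example: one sport article is misfiled as tech.
y_true = ["sport", "sport", "tech", "tech", "business"]
y_pred = ["sport", "tech", "tech", "tech", "business"]
report = per_class_report(y_true, y_pred)
```

Reporting per class matters because a single accuracy number can hide a category that is consistently confused with another.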
Document classification and text automation for legal, medical, or content teams. I can build pipelines that sort documents by type, extract key information, and route content to the right reviewers — reducing manual processing time by 90% while maintaining accuracy.
Available for fractional CMO engagements, BI consulting projects, and AI architecture work. I'll respond within 24 hours.