Portfolio

My work centers on the belief that every data point represents a real-world behavior waiting to be understood. Here you will find the projects where I have applied technical modeling to answer critical business questions.

Using Data to Solve Real-World Problems

Real problems can be solved with real data. Here is how I have implemented that.

Campuslytics.com — College Finance Tracker + Analytics

Product • Analytics • SQL • Node.js • Python • Pandas • AWS RDS
Freelance Project

Problem: Many college students live paycheck to paycheck and struggle to understand where their money goes between pay periods. Traditional finance tools focus on individual transactions and running balances, which makes it difficult to identify spending patterns, compare behavior across pay periods, or understand how quickly money is being spent after each paycheck. Students need visibility into their habits within each spending period so they can make better decisions and avoid blowing through their income.

What I built: I designed and deployed a full-stack finance analytics application that models spending around a one-deposit to many-withdrawals relationship rather than a single mixed transaction table. This structure allows each pay period to be analyzed independently. Using PostgreSQL and Node.js for data storage and workflows, I integrated Python analytics with Pandas and Matplotlib to generate personalized summaries, category trends, outlier detection, and behavioral insights for each student. The system also includes AI-generated recommendations that translate analytical findings into clear, actionable guidance.
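The one-deposit-to-many-withdrawals idea can be sketched in a few lines of Pandas. This is a minimal illustration with invented table and column names, not the application's actual schema: each withdrawal references the pay-period deposit it draws from, so summaries can be computed per period.

```python
import pandas as pd

# Hypothetical tables illustrating the one-deposit-to-many-withdrawals model:
# each withdrawal row points at the pay-period deposit it draws from.
deposits = pd.DataFrame({
    "deposit_id": [1, 2],
    "amount": [900.0, 950.0],
})
withdrawals = pd.DataFrame({
    "deposit_id": [1, 1, 1, 2, 2],
    "category": ["rent", "food", "food", "rent", "transport"],
    "amount": [500.0, 60.0, 40.0, 500.0, 35.0],
})

# Per-period summary: total spent and remaining balance for each deposit.
spent = withdrawals.groupby("deposit_id")["amount"].sum().rename("spent")
summary = deposits.set_index("deposit_id").join(spent)
summary["remaining"] = summary["amount"] - summary["spent"]

# Category trends within each pay period.
by_category = withdrawals.groupby(["deposit_id", "category"])["amount"].sum()
```

Because every withdrawal carries its `deposit_id`, each pay period stays analytically independent, which is what makes period-over-period comparisons straightforward.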

Impact: The application turns raw financial data into period-based insights that help students understand how their spending changes over time and where adjustments can be made. By focusing on trends rather than individual transactions, users gain clarity on saving opportunities after each paycheck. The project is deployed and actively used by students, demonstrating real-world adoption and practical value.

**Note for usage:** Feel free to test things out and explore the analysis. You may use username "testUser" with password "admin". If you decide to edit or add data to test out how things change, that is totally okay and welcome! Just please do not change the data too much. Thanks!

SQL Queries/Data Manipulation & Prediction Modeling

Data cleaning, manipulation, and joins, all tied into prediction-modeling logic.

European Football Predictive Simulator

Pandas • Python • SQL
Freelance Project

Problem: Build a realistic match simulation system that predicts European football match outcomes based on team quality and roster construction rather than randomness. The goal was to model how differences in player ability, positional balance, and lineup choices influence match results in a way that reflects real football dynamics.

What I built: A predictive simulation application built in Python, using Pandas for data transformation and SQLite for relational data storage. The simulator operates on a normalized football database containing over 200,000 FIFA player records across multiple seasons. I designed and executed complex multi-table SQL queries to join teams, players, and player attributes, then converted player-level ratings into aggregated team strength scores. These scores feed probabilistic logic that simulates goals, match outcomes, and win probabilities, all of which update as users interact with the simulator and make decisions.
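A toy sketch of the core idea, with illustrative function names and parameters (not the project's real code): player ratings are aggregated into a team strength score, and the strength ratio scales the expected goals fed into repeated random match simulations.

```python
import random

def team_strength(ratings):
    """Aggregate player ratings into a team score (best eleven, averaged)."""
    top = sorted(ratings, reverse=True)[:11]
    return sum(top) / len(top)

def simulate_match(home, away, base_goals=1.4, trials=10_000, seed=42):
    """Estimate win/draw/loss probabilities over many simulated matches."""
    rng = random.Random(seed)
    home_s, away_s = team_strength(home), team_strength(away)
    wins = draws = 0
    for _ in range(trials):
        # Per-minute scoring chances scaled by the strength ratio; summing 90
        # Bernoulli draws approximates a Poisson goal count.
        hg = sum(rng.random() < base_goals * home_s / away_s / 90 for _ in range(90))
        ag = sum(rng.random() < base_goals * away_s / home_s / 90 for _ in range(90))
        wins += hg > ag
        draws += hg == ag
    return wins / trials, draws / trials, 1 - (wins + draws) / trials
```

Swapping players in or out changes `team_strength`, which shifts the simulated probabilities, which is the mechanism behind the roster-adjustment scenarios described below.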

Key functionality: The application allows users to adjust team rosters and lineups, rerun simulations, and compare outcomes across scenarios. By modifying player selections or tactical balance, users can observe how small changes in roster strength or positional depth affect predicted results. This enabled scenario testing and sensitivity analysis rather than a single static prediction.

Focus: Advanced data manipulation with Pandas, writing and optimizing multi-table SQL queries, enforcing relational integrity across datasets, and translating domain knowledge into predictive logic. Additional emphasis was placed on reproducibility, debugging simulation bias, and iteratively refining assumptions to produce stable and interpretable results.

Data Analysis

EDA that connects data patterns to real behavioral trends to solve problems.

Ella Rises — STEAM Engagement & Outcomes EDA

Python • Pandas • EDA • Nonprofit Analytics • Presentation Skills

Context: Ella Rises is a nonprofit focused on increasing participation, graduation, and hiring of Latin American women in STEAM (STEM + Arts) fields. The goal of this analysis was to identify which factors most strongly influence academic and career outcomes (the company's KPIs) and which do not.

Problem: The organization tracked demographics, donations, milestones, surveys, and event attendance, but lacked clarity on what actually drives STEAM degree completion and STEAM job placement.

What I did: I conducted an end-to-end exploratory data analysis in Python (Pandas, Seaborn, SciPy), cleaning and joining multi-table program data to evaluate relationships between demographics, engagement, milestones, and outcomes. I used statistical tests (t-tests, ANOVA, chi-square, correlation) to separate meaningful signals from noise.
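The signal-vs-noise testing approach can be illustrated on synthetic data (these numbers are invented for the sketch, not the nonprofit's real records): a chi-square test on an engagement-vs-outcome contingency table flags whether attendance and completion are associated.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic outcomes: degree completion (1/0) for STEAM-event attendees
# vs non-attendees, with an assumed gap in completion rates.
steam_attendees = rng.binomial(1, 0.65, size=200)
non_attendees = rng.binomial(1, 0.40, size=200)

# 2x2 table: rows = attended / did not attend, cols = completed / did not.
table = np.array([
    [steam_attendees.sum(), len(steam_attendees) - steam_attendees.sum()],
    [non_attendees.sum(), len(non_attendees) - non_attendees.sum()],
])
chi2, p_value, dof, expected = stats.chi2_contingency(table)
significant = p_value < 0.05
```

The same pattern (build the table, test, compare the p-value to a threshold) was applied across demographic and engagement variables to decide which relationships were meaningful.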

Key Insight: Demographic and background variables (age, city, donations, milestone type) showed no meaningful statistical relationship to degree completion or job outcomes. In contrast, engagement, especially attendance at STEAM-focused events, was strongly associated with both STEAM degree completion and STEAM job placement. General attendance mattered somewhat, but non-STEAM event attendance showed no comparable effect.

Impact: The findings suggest that targeted STEAM exposure drives outcomes more than demographics. This supports prioritizing structured STEAM programming, increasing repeat engagement, and designing pathways that keep participants consistently involved to improve long-term academic and career success. It also led me to assist a team in deploying a web app solution to improve event engagement.


Ames Housing Market — Residential Real Estate Price Drivers EDA

Python • Pandas • EDA

Context: Ames, Iowa, serves as a primary case study for understanding housing market dynamics. Acting as a data analyst for a real estate firm, I investigated eight key factors influencing home valuations to provide data-driven insights for property investment and pricing strategies.

Problem: The firm needed to understand which specific features (such as neighborhood regions, garage types, and material quality) statistically correlate with higher sale prices, and exactly how much value these individual components add to a listing.

What I did: I performed an end-to-end exploratory data analysis in Python, utilizing Pandas for data cleaning and engineering. I applied a variety of statistical techniques, including linear regression to model price trends, ANOVA and Tukey HSD tests to compare categorical groups like neighborhood regions, and chi-square tests to evaluate the distribution of building types across the city.
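The regression step can be sketched with synthetic data generated around an assumed price-per-square-foot slope (illustrative numbers, not the actual Ames estimates): fitting a simple linear regression recovers the dollars added per extra square foot of living area.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic homes: living area in sq ft, price built around an assumed
# slope of ~$107/sq ft plus noise, purely to illustrate the fit.
living_area = rng.uniform(800, 3000, size=300)
price = 50_000 + 107 * living_area + rng.normal(0, 10_000, size=300)

fit = stats.linregress(living_area, price)
dollars_per_sqft = fit.slope  # each extra square foot adds roughly this much
```

On real data the same `slope` reading is what turns a fitted line into a concrete pricing statement, and `fit.rvalue` indicates how much of the variation the single feature explains.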

Key Insight: The analysis revealed that Overall Quality and Living Area are the strongest predictors of price, with each additional square foot of living space adding approximately $107 to the home value. Significant regional disparities were identified; homes in the Northeast region sell for an average of $116,451 more than those in the Central region. While garage types and finished basements also showed statistical significance, exterior material quality proved to be a universal driver of value across all property types.

Impact: These findings provided a framework for the real estate team to prioritize renovations and identify undervalued properties. By quantifying the expected increase for specific upgrades, like finishing a basement or improving exterior materials, the firm can better predict ROI. This data served as the foundation for a polished data story used to guide future acquisition strategies and stakeholder presentations.


Postpartum Mental Health EDA

Python • Pandas • EDA • Presentation Skills

Problem: Maternal mental health outcomes are influenced by many overlapping factors, but it is often unclear which variables are meaningfully associated with anxiety, depression, and birth-related PTSD. This analysis aimed to evaluate whether demographic characteristics, birth context, and infant sleep patterns showed statistically meaningful relationships with maternal mental health indicators. The goal was to move beyond surface-level assumptions and identify which factors warrant deeper attention and which do not.

What I did: I conducted a full exploratory data analysis in Python using Pandas, Matplotlib, Seaborn, and statistical testing libraries. The dataset required extensive preprocessing, including removing invalid survey responses, aggregating multi-part survey instruments into total depression, anxiety, and PTSD scores, and converting encoded numeric values into interpretable categorical variables. I performed univariate analysis to understand distributions and skewness, followed by bivariate analysis using correlation, regression, ANOVA, and chi-square testing to evaluate relationships between mental health outcomes and contextual variables such as marital status and infant sleep behavior.
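The preprocessing steps above can be sketched in Pandas. Column names and value encodings here are invented for illustration, not the survey's real codebook: item-level columns are summed into a total score, and encoded numerics are mapped back to readable categories.

```python
import pandas as pd

# Hypothetical raw survey extract: three items of a depression instrument
# plus a numerically encoded demographic field.
raw = pd.DataFrame({
    "epds_1": [2, 0, 3],
    "epds_2": [1, 1, 2],
    "epds_3": [3, 0, 2],
    "marital_status": [1, 2, 1],
})

# Aggregate the multi-part instrument into one total depression score.
item_cols = [c for c in raw.columns if c.startswith("epds_")]
raw["depression_total"] = raw[item_cols].sum(axis=1)

# Convert the encoded numeric value into an interpretable category.
raw["marital_status"] = raw["marital_status"].map({1: "married", 2: "single"})
```

The anxiety and PTSD totals follow the same pattern, after which the constructed scores feed directly into the correlation, ANOVA, and chi-square tests.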

Focus: The analysis emphasized data cleaning, careful variable construction, and statistically grounded interpretation. Key findings showed strong positive relationships between birth-related PTSD and both maternal depression and anxiety, while demographic factors such as marital status showed no statistically significant effect. Results were communicated using clear visualizations and plain-language explanations to support interpretation by non-technical audiences.

Data Visualization

Exploratory dashboards built to help management understand how factors affect outcomes.

IS Recruiting Insights Dashboard and Storytelling Chart

Tableau

Context and Problem: The Information Systems department uses recruiting events, prerequisite classes, and advising touchpoints to encourage students to apply to the program. While survey data existed on which activities students attended and how influential they were, the recruiting committee lacked a clear, data-driven view of what was actually working and where efforts should be adjusted. The challenge was to translate raw survey and application data into insights that could directly inform future recruiting decisions.

What I built: I designed an interactive Tableau dashboard using real recruiting survey data from 2024 and 2025 that allows non-technical users to explore influence patterns across classes, recruiting activities, and demographic filters. The dashboard is fully filterable by year and gender and includes views showing average influence scores, counts of “very influential” ratings, and how students heard about the IS program. In addition, I created a one-slide storytelling chart focused on explaining the decline in applications in 2025, with particular attention to changes in female applicants.

Impact: The dashboard highlights which classes and recruiting activities consistently influence students’ decisions and which show weaker or declining impact. These insights provide the recruiting committee with clear evidence to support reallocating resources, refining outreach strategies, and strengthening high-impact touchpoints. The work was created to address an active departmental need and is suitable for sharing with faculty, staff, and advisory board members to guide real recruiting strategy.

Power BI — Superstore Profit Dashboard (Tableau Recreation)

Power BI

Objective: Recreate a previously built Tableau profit dashboard in Power BI to evaluate Superstore performance, while learning Power BI’s data modeling, visualization, and formatting mechanics. The goal was to preserve the original dashboard’s layout and design principles while adapting to a new BI tool.

What I built: An interactive Power BI dashboard featuring key performance indicators for total profit and total sales, a continuous line chart showing profit trends over time by order date, and multiple breakdowns of profitability by category and sub-category. Conditional formatting was used to clearly distinguish positive and negative profit, helping surface loss-driving products at a glance.

Design & interactivity: The dashboard mirrors the Tableau layout and includes a category slicer to support focused analysis. Visual hierarchy, consistent color usage, and minimal chart clutter were prioritized to ensure the dashboard is easy to interpret for non-technical stakeholders.