Into the Unknown
Stephen Barrie
Bridging the gap between data and information
A blog by Stephen Barrie
Categories
All (46)
ANOVA (1)
API (1)
Apache Spark (1)
Audio (1)
BigQuery (6)
Biomedical (1)
Blogging (1)
CLV (1)
Chi-Square (1)
Collaborative Filtering (1)
Convolutional Neural Networks (1)
Cross-Entropy (1)
Dash (1)
Data Carpentry (3)
Data Cleansing (1)
Data Lakes (2)
Data Science (1)
DataCamp (6)
DataTalksClub (9)
Database (1)
Dataproc (1)
Docker (1)
ETL (2)
Ensembling (1)
Forecasting (1)
GCP (3)
GCS (1)
Geospatial (2)
GoogleSheets (1)
Gradient Accumulation (1)
Hugging Face (1)
Image (2)
Imbalanced data (1)
JSON (1)
K-means (1)
Kaggle (2)
LDA (1)
Logistic Regression (1)
Looker (2)
MLOps (3)
Marketing (1)
MongoDB (1)
Multi-Label (1)
My Journey (1)
NumPy (1)
Numpy (1)
OSM (1)
PostgreSQL (1)
Power BI (1)
Prefect (3)
Projects (3)
Python (3)
Quarto (1)
Random Forests (1)
SMOTE (1)
SQL (1)
SciPy (1)
Segmentation (1)
Spark (1)
T-test (1)
TIL (8)
Tableau (1)
Terraform (2)
Text mining (1)
Tukey's Range (1)
Uber H3 (1)
W&B (1)
XGBoost (1)
dbt (2)
fastai (10)
librosa (1)
pandas (2)
parkrun (1)
seaborn (2)
timm (1)
vision (1)
Fraud Detection using Python
Imbalanced data
SMOTE
K-means
Text mining
LDA
In this…
Stephen Barrie
Jan 31, 2024
Customer Lifetime Value (CLV)
CLV
Marketing
BigQuery
Dash
My starting point for this project is to frame two common business problems faced by the Google Merchandise Store and ecommerce businesses in general.
Stephen Barrie
Jan 29, 2024
Hypotheses Testing
Data Science
T-test
ANOVA
Tukey's Range
Chi-Square
In…
Stephen Barrie
Dec 7, 2023
MongoDB - The Basics
MongoDB
Database
If you are conversant with Python and prefer the JSON format to the tabular data (rows and columns) approach used by SQL, then MongoDB mi…
Stephen Barrie
Nov 10, 2023
Urban Accessibility - How close is my nearest Żabka?
Geospatial
OSM
Uber H3
This project leans heavily on the methodologies used by Milan Janosov in Urban Accessibility — How to Reach Defibrillators on Time. His articles are a true…
Stephen Barrie
Nov 7, 2023
Google Sheets API Connector
API
GoogleSheets
JSON
In this tutorial I will show you how to access real-time flight status, airports, airlines and aircraft data (historical data is behind a paywall) using the aviationstack API. I will also…
Stephen Barrie
Sep 27, 2023
mlops-zoomcamp | Module 3: Orchestration
MLOps
DataTalksClub
…
Stephen Barrie
Jun 8, 2023
Weights and Biases
TIL
W&B
In this blog we will cover how to visualize metrics while training models, how…
Stephen Barrie
May 30, 2023
Power BI
TIL
Power BI
Outside of Excel spreadsheets, my first…
Stephen Barrie
May 29, 2023
mlops-zoomcamp | Module 2: Experiment tracking and model management
MLOps
DataTalksClub
Spreadsheets are a familiar tool and widely used across different industries. Many people are already…
Stephen Barrie
May 25, 2023
parkrun Kraków 20/05/2023 | #461
pandas
seaborn
Data Cleansing
parkrun
A couple of weeks ago I signed up for parkrun Kraków. I had a dry run on 6 May, just to see how it all worked. There was a bit of a…
Stephen Barrie
May 23, 2023
mlops-zoomcamp | Module 1: Introduction
MLOps
DataTalksClub
Jupyter Notebooks offer a convenient and interactive environment for experimentation, prototyping, and data exploration in machine learning projects. However, when…
Stephen Barrie
May 17, 2023
Biomedical Image Analysis in Python
Numpy
SciPy
Image
Biomedical
Segmentation
DataCamp
Since the first x-ray in 1895, medical imaging technology has advanced clinical care and opened up new fields of scientific investigation. The amount of imaging data is…
Stephen Barrie
May 12, 2023
Data Engineering Zoomcamp - Final Project
Terraform
Prefect
GCS
BigQuery
dbt
Looker
DataTalksClub
It’s time to…
Stephen Barrie
Apr 21, 2023
Data Engineering Zoomcamp - Week 5
Spark
Dataproc
BigQuery
DataTalksClub
Computerized batch processing is a method of running software programs called jobs in batches automatically. While users are required to submit the jobs, no other…
Stephen Barrie
Apr 3, 2023
Data Engineering Zoomcamp - Week 4
dbt
BigQuery
Looker
DataTalksClub
Goal: Transforming the data previously loaded into our data warehouse (in my case BigQuery) by building models using a dbt project, testing and deploying those models in a production environment…
Stephen Barrie
Mar 28, 2023
Data Engineering Zoomcamp - Week 3
GCP
BigQuery
Data Lakes
ETL
Prefect
DataTalksClub
A Data Warehouse is an OLAP solution used for reporting and data analysis and generally includes raw, meta and summary data.
Stephen Barrie
Mar 21, 2023
Data Engineering Zoomcamp - Week 2
GCP
BigQuery
Data Lakes
ETL
Prefect
DataTalksClub
Just like a physical transport logistics system, it is important to have a smooth data logistics system. This process is also known as Workflow Orchestration. Workflow orchestration allows us to turn any code into a workflow that we can…
Stephen Barrie
Mar 16, 2023
Data Engineering Zoomcamp - Week 1
Docker
GCP
Terraform
PostgreSQL
DataTalksClub
This course will cover a number of technologies, including Google Cloud Platform (GCP): Cloud-based auto-scaling platform by Google, Google Cloud Storage (GCS): Data Lake…
Stephen Barrie
Mar 10, 2023
Tableau
Tableau
DataCamp
Tableau is a widely used business intelligence (BI) and analytics software trusted by companies like Amazon, Experian, and Unilever to explore, visualize, and securely share…
Stephen Barrie
Feb 7, 2023
Credit Risk Modeling in Python
XGBoost
Logistic Regression
DataCamp
If you’ve ever applied for a credit card or loan, you know that financial firms process your information before making a decision. This is because giving you a loan can have…
Stephen Barrie
Jan 23, 2023
Financial Forecasting in Python
Forecasting
DataCamp
In Financial Forecasting in Python, we will step into the role of CFO and learn how to advise a board of directors on key metrics while building a financial forecast, the basics of income statements and…
Stephen Barrie
Jan 19, 2023
Convolutional Neural Networks (CNNs)
Convolutional Neural Networks
fastai
This is my follow-up to the second half of Lesson 8: Practical Deep Learning for Coders 2022, in which Jeremy demonstrates the inner workings of a Conv…
Stephen Barrie
Jan 18, 2023
Collaborative Filtering
Collaborative Filtering
fastai
This is my follow-up to the second part of Lesson 7: Practical Deep Learning for Coders 2022, in which Jeremy shows how to build a…
Stephen Barrie
Jan 13, 2023
Multi-label classification
TIL
Multi-Label
Cross-Entropy
fastai
This blog further develops the ideas included in the earlier Paddy Doctor: Paddy Disease Classification blog. We’re going to build a model that doesn’t just predict what disease the rice paddy has, but also predicts what kind of rice is shown. This…
Stephen Barrie
Jan 12, 2023
Scaling up
Gradient Accumulation
fastai
Ensembling
This is my follow-up to the first part of Lesson 7: Practical Deep Learning for Coders 2022, in which Jeremy introduces a technique known as Gradient Accumulation, which allows us to train larger models, despite apparent…
Stephen Barrie
Jan 11, 2023
Paddy Doctor: Paddy Disease Classification
Kaggle
fastai
vision
timm
This is my follow-up to the second part of Lesson 6: Practical Deep Learning for Coders 2022, in which Jeremy walks us through his approach to…
Stephen Barrie
Jan 10, 2023
Kaggle
TIL
Kaggle
In order to download datasets from Kaggle when working outwith the Kaggle environment, you will need to make use of the Kaggle API. You can get access by clicking on Account below your profile name, and then…
Stephen Barrie
Jan 10, 2023
Random Forests
Random Forests
fastai
This is my follow-up to the first part of Lesson 6: Practical Deep Learning for Coders 2022, in which Jeremy introduces Decision Trees and Random Forests.
Stephen Barrie
Jan 9, 2023
Seaborn Tutorial
seaborn
There is no universally best way to visualize data. Different questions are best answered by different plots. Seaborn makes it easy to switch between different visual representations by using a consistent…
Stephen Barrie
Jan 4, 2023
Numpy Tutorial
NumPy
NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as…
Stephen Barrie
Jan 3, 2023
Efficient Pandas
pandas
Having code that is clean, readable and has a logical flow is invaluable. I discovered Structured Query Language (SQL) before Python, and as the name suggests, this already…
Stephen Barrie
Dec 31, 2022
Image Processing with Python
TIL
Image
Data Carpentry
As computer systems have become faster and more powerful, and cameras and other imaging systems have become commonplace in many other areas of life, the need has grown for…
Stephen Barrie
Dec 20, 2022
Programming with Python
Python
Data Carpentry
This blog has been produced after working through the Programming with Python lessons provided by Data Carpentry.
Stephen Barrie
Dec 19, 2022
Introduction to Geospatial Raster and Vector data with Python
TIL
Geospatial
Data Carpentry
This blog has been produced after working through the Introduction to Geospatial Raster and Vector data with Python lesson provided by Data Carpentry.
Stephen Barrie
Dec 14, 2022
Audio Feature Extraction
TIL
Audio
librosa
In this blog…
Stephen Barrie
Dec 9, 2022
Name that Genre
Projects
Python
DataCamp
Using a…
Stephen Barrie
Dec 9, 2022
What is Spark, anyway?
TIL
DataCamp
Apache Spark
I’ve noticed that Apache Spark is cited as a requirement on many data science job specs. My natural curiosity led me to the Introduction to PySpark course available through DataC…
Stephen Barrie
Dec 7, 2022
Wisła Kraków vs. KS Cracovia
Projects
SQL
I arrived in Kraków a few months back and during my wanderings I noticed two football stadiums on either side of Błonia separated by just 700 metres. I didn’t know anything about either of these clubs but was told that the rivalry between them is intense. The so-called Holy War between Wisła…
Stephen Barrie
Dec 5, 2022
FIFA World Cup - Qatar 2022
Projects
Python
At the time of writing the 2022 World Cup is already underway, with 32 teams…
Stephen Barrie
Nov 25, 2022
Excelosaurus meets Python
fastai
This is my follow-up to Lesson 5: Practical Deep Learning for Coders 2022, in which Jeremy builds a linear regression model and neural net from scratch using Python.
Stephen Barrie
Nov 18, 2022
Excelosaurus
fastai
This is my follow-up to Lesson 3: Practical Deep Learning for Coders 2022, in which Jeremy built a linear regression…
Stephen Barrie
Nov 8, 2022
Huggy Bear
fastai
Hugging Face
This is my follow-up to Lesson 2: Practical Deep Learning for Coders 2022, in which Jeremy created a dog | cat classifier model and deployed it to Hugging Face. During this project I will try to replicate this with an image classification model, which…
Stephen Barrie
Oct 29, 2022
Cat or Dog?
fastai
This is my follow-up to Lesson 1: Practical Deep Learning for Coders 2022, taught by Jeremy Howard, co-founder, along with Dr. Rachel Thomas, of fast.ai. This is my first attempt at…
Stephen Barrie
Oct 22, 2022
Hello, World!
My Journey
Hello world! This is all very new to me.
Stephen Barrie
Oct 21, 2022
Introducing….
Blogging
Quarto
….wait for it!
Stephen Barrie
Oct 21, 2022