“Zoltan provided educational services to dbt learners. Throughout the project he was professional and easy to work with: he incorporated my feedback on his plans very well and delivered high-quality work. I look forward to working with Zoltan on our next project!”
Zoltan C. Toth
Scalable Data Architecture Expert with 20 years of experience, trusted by Databricks, dbt Labs, T-Mobile, and Fortune 500 companies worldwide. As a Principal Solutions Architect at Databricks, I designed and delivered some of the company's earliest and most strategic engagements. I help organizations build cloud-native data platforms, modernize their architectures, and enable their teams to operate at scale.
What I Can Help You With
Architecture, design, and training to help your team build and scale data platforms. Available as a freelance architect or with a team of contractors.
Databricks
Platform architecture, Lakehouse design, workspace strategy, and implementation. Delivering solutions-architecture projects since 2016.
Apache Spark
Architecture review, performance optimization, and pipeline design for large-scale data processing. Active in the Spark ecosystem since 2014.
Scalable Cloud Data Architectures
End-to-end architecture design. Data Lakes, Lakehouses, cloud migration strategy, and platform evaluation.
AI Systems & MLOps
Architecture and evaluation of AI systems, MLOps, and LLMOps platforms.
AI Integration & MCP
AI agent architecture, MCP development, and LLM-powered application design.
Training & Enablement
Teaching your team to architect, build, and scale data systems independently.
Training Catalog
Every training is fully hands-on, with a dedicated training environment provided. Each course can also be delivered on your own infrastructure, on-site or fully remote.
Spark Programming with Databricks
A comprehensive introduction to Apache Spark on Databricks, covering everything from DataFrames and SQL to Structured Streaming and performance optimization. Aligned with the Databricks Spark Programming certification.
Module 1: Introduction to Apache Spark on Databricks
- Databricks Overview
- Spark Runtime Architecture
- Exploring Apache Spark Architecture in Databricks
- Introduction to Spark DataFrames and SQL
- Reading and Writing Data with DataFrames
- Distributed System Programming Fundamentals
Module 2: Developing Applications with Apache Spark
- Introduction to the SQL & DataFrame API
- DataFrame API Fundamentals
- Grouping and Aggregating Data
- Relational Operations in Apache Spark
- Working with Complex Data Types
Module 3: Stream Processing and Analysis
- Introduction to Stream Processing
- Spark Structured Streaming
- Window Aggregation in Spark Structured Streaming
Module 4: Monitoring and Optimizing Spark on Databricks
- Delta Lake Introduction and Deep Dive
- Introduction to Unity Catalog
- Understanding and Optimizing Apache Spark Workloads
- Performance Tuning
Machine Learning Operations with MLflow
Learn to manage the complete ML lifecycle — from experiment tracking and model registry to deployment and production monitoring. Includes hands-on coverage of LLM evaluation with MLflow.
Module 1: Experimentation
- Experiment Tracking with MLflow
- Recording Parameters, Metrics, and Artifacts
- Advanced Tracking: Autologging, Nested Runs, and Hyperparameter Tuning
Module 2: Model Management
- MLflow Models and Model Flavors
- Custom Models with pyfunc
- MLflow Model Registry
- Model Versioning and Stage Transitions
- Webhooks, Automated Testing, and CI/CD Integration
Module 3: Deployment Paradigms
- Batch Inference with Spark
- Real-Time Serving with REST APIs
- Databricks Model Serving and Managed Endpoints
Module 4: Production
- Monitoring for Data, Feature, and Concept Drift
- Statistical Drift Detection Methods
- CI/CD for Machine Learning Pipelines
Module 5: LLM Operations
- Tracing AI Agents
- Evaluating LLMs with MLflow
- Using LLM-as-a-Judge methods for evaluating LLM outputs
dbt (Data Build Tool)
A comprehensive, hands-on deep dive into dbt covering the full development lifecycle — from models and testing to macros, documentation, and the latest dbt Fusion tooling.
Module 1: Introduction to dbt
- What is dbt and why it matters
- Setting up your dbt project
- dbt Core vs. dbt Cloud vs. dbt Fusion
- Project structure and data flow overview
Module 2: Models and Materializations
- Building models with CTEs and the ref() function
- Materialization strategies: views, tables, incremental, and ephemeral
- Model dependencies and the DAG
Module 3: Seeds, Sources & Snapshots
- Working with seeds and sources
- Source freshness checks
- Snapshots for slowly changing dimensions
Module 4: Testing and Data Quality
- Generic, singular, and unit tests
- Data contracts
- Custom generic tests with parameters
- Test severity configuration
- Advanced data quality with dbt-expectations
Module 5: Jinja, Macros & Packages
- Jinja templating fundamentals
- Writing custom macros
- Installing and using third-party packages
Module 6: Documentation, Hooks & Exposures
- Writing and exploring documentation
- The lineage graph (DAG)
- Hooks and grants
- Exposures and BI tool integration
Module 7: dbt Fusion & Tooling
- dbt Fusion overview and feature matrix
- VSCode extension and development workflow
- Column-level lineage
- Orchestration with Dagster
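For a sense of Module 2, a minimal dbt model sketch using a CTE, a materialization config, and the ref() function; the model and column names are invented:

```sql
-- models/customer_orders.sql (invented example model)
{{ config(materialized='table') }}

with orders as (
    select * from {{ ref('stg_orders') }}  -- upstream model, invented name
)

select
    customer_id,
    count(*) as order_count
from orders
group by customer_id
```

The `ref()` call is what lets dbt build the dependency DAG and run models in the right order.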
Cloud Computing and Data Engineering on AWS
A practical, hands-on course on cloud computing fundamentals and AWS services for data engineering. Covers networking, compute, storage, serverless architectures, and building data lake solutions with real-world exercises.
Module 1: Internet and Networking Fundamentals
- TCP/IP protocols and how the internet works
- IP addressing, DNS, and routing
- HTTP/HTTPS and web communication
- Encryption: symmetric, asymmetric, and public key infrastructure
- Digital signatures and certificates
Module 2: AWS Core Services
- AWS regions, availability zones, and global infrastructure
- EC2: virtual machines in the cloud
- Storage: S3, EBS, and ephemeral storage
- Route 53 and domain management
- High availability and disaster recovery patterns
Module 3: Serverless Computing
- AWS Lambda and serverless architecture
- Using AWS programmatically (Boto3)
- Web scraping and data extraction
- Building serverless data processing pipelines
- Managed AI services (Amazon Rekognition, Textract, Transcribe, and others)
Module 4: Data Lake and Analytics
- Data lake concepts and architecture
- Amazon Athena: querying S3 with SQL
- Data formats: CSV, JSON, Parquet
- Cost-effective analytics at scale
Building Agents and MCP with the OpenAI Agents SDK
Build AI agents from scratch using the OpenAI Agents SDK, integrate them with external tools via the Model Context Protocol (MCP), and orchestrate multi-agent systems for production use cases.
Part 1: AI Agents Fundamentals
- What are AI Agents and why they matter
- The OpenAI Agents SDK & Agents Builder
- Building your first agent
- Tool use and function calling
Part 2: Model Context Protocol (MCP)
- Understanding MCP architecture
- Building MCP servers and clients
- Integrating MCP with agents
- Real-world MCP patterns
Part 3: Multi-Agent Systems
- Agent orchestration patterns
- Handoffs between agents
- Guardrails and safety
- Production deployment considerations
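Under the hood, MCP (Part 2) exchanges JSON-RPC 2.0 messages between client and server. A minimal sketch of the `tools/call` request and response shape, built with only the standard library; the tool name, arguments, and reply text are invented:

```python
import json

# An MCP tool invocation is a JSON-RPC 2.0 request with method "tools/call".
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_weather",             # invented tool name
        "arguments": {"city": "Budapest"}  # invented arguments
    },
}

# Serialized form sent over the transport (stdio or HTTP)
wire_message = json.dumps(request)

# The server replies with a result carrying content blocks
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "12°C, cloudy"}]},
}
```

In the course, the SDK handles this framing for you; seeing the raw messages makes debugging agent–server integrations much easier.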
Trusted By
Selected consulting, architecture, and training clients.
Detailed profile and references available upon request.
What Clients Say
Select testimonials from recent engagements.
“I worked with Zoltan to deliver a tailored dbt course at my company — I found him easy to work with and collaborate with and very knowledgeable about industry standards. I would recommend Zoltan for any training engagements.”
“I really enjoyed collaborating with Zoltan to create a data stack from the ground up. He was committed to finding the best solution that aligned with the project's needs and the company's workflow. The execution not only met but exceeded our expectations, and the handover process was smooth and comprehensive.”
“Working with Zoltan on our AI project was a fantastic experience. He expertly handled the LLMOps architecture, guiding us from requirements analysis through PoC to full implementation. Zoltan stands out not only for his technical skills but for his focus on business impact, consistently suggesting ways to create more value. His unique blend of AI expertise and strategic thinking makes him an exceptional asset to any team.”
“Zoltan helped us professionalize our data engineering landscape. He did so not only by contributing excellent code, but by actively teaching our teams and enabling them to achieve the same quality themselves. In addition, he also had the bigger picture in mind, helping us also with topics not related to the initial engagement description.”
“Zoltan helped us successfully pilot and deploy dbt at Alfa VIG Hungary, which has been an integral part of our transform pipelines ever since. His expertise and experience in solving complex data engineering challenges were pivotal to the project, as was his dedication to fully understanding the client’s needs and fitting the solution to the agreed requirements. I highly and honestly recommend Zoltan for any data and AI engineering project, regardless of industry or company size.”
“I had the pleasure of collaborating with Zoltan on the creation of a cutting-edge Data Platform at Schneider Electric. His expertise on Databricks and on data and AI projects was invaluable in structuring the project during its early stages and providing meaningful insights on the technology. I highly recommend Zoltan for his exceptional skills and dedication.”
“Zoltan helped us build our cloud infrastructure and automation for ML models. After discussing our plan, he suggested improvements, explained tradeoffs, and recommended optimal solutions. He built the system skeleton, focusing on complex parts, and documented everything thoroughly. His work gave us a flexible, high-performing system. He is fast, precise, and a true expert in ML, cloud, and system design.”
“Zoltan was so helpful when we asked for his support on our Data & AI project at Schneider-Electric. His expertise in data engineering and data science in a cloud context helped us accelerate and set up sustainable solutions. His strong soft skills in communication and his ability to transfer knowledge allowed our team to fully take over the evolutions and operations of what he built. Would love to have him as a permanent member of our organization.”
For a complete list of reviews, visit my LinkedIn Services Page.