Syllabus
EE 547: Applied and Cloud Computing for Electrical Engineers
Fall 2025 (2 units)
This course introduces tools and concepts to build and deploy machine learning systems in modern computing environments. It is a project-driven course that develops from concept to production deployment. The course is intended for graduate electrical engineering students with prior programming and machine learning experience. Students will learn about technologies and practices essential for scaling machine learning from experimental notebooks to production systems. The course covers three main areas: (1) cloud technologies and distributed computing for ML workloads, (2) system architecture and infrastructure programming, and (3) deployment and orchestration in global computing infrastructure. Students gain hands-on experience with GPU clusters, containerization, and cloud environments while learning concepts that apply across modern ML platforms.
Lecture: Tuesday (section: 30897), 15:00 – 16:50
Discussion†: Friday (section: 30979), 14:00 – 14:50
Enrollment is in-person ONLY. Attendance is mandatory at all lectures. Taping or recording lectures or discussions is strictly forbidden without the instructor’s explicit written permission.
Course materials
Designing Machine Learning Systems, Huyen, C., O’Reilly Media, 2022. online, USC libraries.
Cloud Native Patterns: Designing change-tolerant software, Davis, C., Manning, 2019. online, USC libraries.
The Good Parts of AWS, Vassallo, D., Pschorr, J., 2020. (optional).
High Performance Python: Practical Performant Programming for Humans, 3rd edition, Gorelick, M., Ozsvald, I., O’Reilly Media, 2020. online, USC libraries.
Kubernetes in Action, 2nd edition, Lukša, M., Manning, 2024. online, USC libraries.
Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing, Akidau, T., Chernyak, S., Lax, R., O’Reilly Media, 2018. online, USC libraries.
“AI” policy
You may use AI-powered tools in this course to enhance your learning and productivity. Use AI as a collaborative tool for understanding concepts, generating ideas, and troubleshooting. Approach AI-generated content critically and use it responsibly. Engage with AI as you would with a knowledgeable peer or tutor, using iterative conversations to deepen your understanding. You must attribute all AI-generated content in your work, including the prompts you used. You are fully accountable for the accuracy and appropriateness of any AI-assisted work. AI should supplement, not substitute for, your own critical thinking and problem-solving. For assignments, you may use AI to clarify concepts or resolve issues, but submitted work must be your own. Submitting AI-generated work as your own without proper attribution or understanding is academic misconduct and will be treated as such.
You must develop complete mastery of all course material independent of AI assistance. Your knowledge and skills will be evaluated in contexts where AI tools are not accessible, mirroring real-world scenarios where you must rely solely on your own expertise. This ensures you can perform effectively in any situation, with or without AI support. Violations of this policy will result in severe academic penalties. The goal is to prepare you to use AI effectively in your future work while ensuring you develop a strong, self-reliant foundation in the course material.
Learning Objectives
Upon completion of this course, a student will be able to:
- Design and implement distributed systems for machine learning workloads, understanding memory hierarchies, network topologies, and scaling limits.
- Apply containerization and orchestration technologies to manage ML training and inference at scale.
- Optimize cloud resource utilization through spot instances, placement strategies, and data locality principles.
- Implement fault-tolerant ML systems using checkpointing, service discovery, and recovery mechanisms.
- Build production ML services with proper monitoring, versioning, and rollback capabilities.
- Navigate the transition from experimental notebooks to production-ready ML systems.
Course Outline
| Week | Topics |
|---|---|
| Week 1, 26 Aug | Cloud fundamentals. Virtualization concepts, service models, infrastructure basics. |
| Week 2, 02 Sep | Containerization. Docker, multi-container apps, orchestration basics. |
| Week 3, 09 Sep | Kubernetes. Deployments, services, auto-scaling, resource management. |
| Week 4, 16 Sep | Distributed systems fundamentals. Consensus, consistency models, fault tolerance patterns. |
| Week 5, 23 Sep | Databases and storage. SQL/NoSQL, ACID vs BASE, distributed databases. |
| Week 6, 30 Sep | Data systems. Object storage, streaming, pipeline architectures. |
| Week 7, 07 Oct | ML workflows. Development to production, experiment tracking, versioning. |
| Week 8, 14 Oct | Model serving and deployment. Batch vs real-time, APIs, scaling. |
| Week 9, 21 Oct | Performance and caching. Feature stores, CDNs, optimization strategies. |
| Week 10, 28 Oct | Monitoring and observability. Metrics, logging, drift detection. Draft proposal due (31 Oct). |
| Week 11, 04 Nov | Project proposal meetings. |
| Week 12, 11 Nov | Exam. Revised proposal due (09 Nov). |
| Week 13, 18 Nov | MLOps and CI/CD. Deployment pipelines, A/B testing. |
| Week 14, 25 Nov | Security and compliance. Access control, data privacy, model security. |
| Week 15, 02 Dec | Project meetings and wrap-up. Status report due (30 Nov). |
| Thursday, 11 Dec | Technical review and demos, 14:00 – 17:00 |
| Monday, 15 Dec | Project deliverables due, 12:00 |
Grading Procedure
Homework (45%)
Assignments include a mix of applied and programmatic problems. Your total homework score is the sum of your homework scores (each as a percentage) after dropping the single lowest score (of minimum 50%). You may discuss homework problems with classmates, but each student must submit their own original work. Cheating warrants an “F” on the assignment. Turning in substantively identical homework solutions counts as cheating.
Late homework is accepted for up to 48 hours after the due date with a 0.5% deduction per hour, no exceptions. Technical issues while submitting are not grounds for an extension. No submissions will be accepted more than 48 hours after the due date. Graders score what is submitted and will not follow up if the file is incorrect, incomplete, or corrupt. It is your responsibility to ensure you submit the correct files and that they are accessible.
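As a rough illustration of the late policy above, the following sketch computes a late-adjusted score. The function name is hypothetical, and two details are assumptions that the policy does not state: partial hours are rounded up, and the deduction is subtracted as percentage points from the raw score. Confirm the actual rounding and deduction rules with the instructor.

```python
import math

LATE_RATE_PER_HOUR = 0.5   # percent deducted per hour late (from the policy)
LATE_WINDOW_HOURS = 48     # later submissions are not accepted (from the policy)


def late_adjusted_score(raw_score: float, hours_late: float) -> float:
    """Return a homework score (0-100) after the late deduction.

    Assumptions (not stated in the syllabus): partial hours round up, and
    the deduction is subtracted as percentage points from the raw score.
    """
    if hours_late <= 0:
        return raw_score  # on time, no deduction
    if hours_late > LATE_WINDOW_HOURS:
        return 0.0  # outside the 48-hour window, not accepted
    deduction = LATE_RATE_PER_HOUR * math.ceil(hours_late)
    return max(0.0, raw_score - deduction)
```

Under these assumptions, a submission scored at 90 and turned in 10 hours late would receive 85, and the largest possible deduction for an accepted submission is 24 points.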
Exam (25%)
The exam tests your ability to apply major principles and demonstrate conceptual understanding, and it requires writing code. It occurs during week 12 (tentative). You are expected to bring a scientific (non-graphing) calculator. You may use a single 8.5”x11” reference sheet (front and back OK). You may not use any additional resources.
The exam includes multiple-choice and short-answer questions. It also includes free-response or open-ended questions to demonstrate conceptual understanding. You are expected to write reasonably correct code and to determine the expected behavior of unfamiliar code. Grading primarily follows correct reasoning but may include deductions for major syntax errors, algorithmic inefficiency, or poor implementation.
Final Project (30%)
This course culminates with a final project in lieu of a final exam. Teams of three students design and implement a complete application integrating multiple independent services that communicate asynchronously. Projects incorporate asynchronous processing, data persistence, and machine learning as part of the system architecture. The emphasis is on integration — connecting services that process data, handle ML inference, and coordinate through message queues or event streams rather than building individual components in isolation.
Teams are encouraged to tackle problems of personal interest to their background or research. The instructor will guide teams having difficulty identifying suitable applications. Teams may build applications similar to existing services provided their implementation demonstrates understanding of distributed architectures and the progression from initial design through deployed system. All projects require the instructor’s written approval.
Teams will propose their architecture, implement and deploy their application, and demonstrate working functionality. Evaluation focuses on how components work together, technical decision-making, and successful deployment rather than production-level optimization.
Course Grade
A if 90 - 100 points, B if 80 - 89 points, C if 70 - 79 points, D if 60 - 69 points, F if 0 - 59 points. (“+” and “–” modifiers apply within ≈ 1.5% of a grade boundary.)
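The weighting and letter cutoffs can be sketched as a short calculation. This is only an illustration: the function and dictionary names are hypothetical, each component score is assumed to be a percentage from 0 to 100, fractional totals are assumed to fall into the lower band (for example, 89.5 maps to B), and the “+”/“–” modifiers are not modeled.

```python
# Component weights from the Grading Procedure section.
WEIGHTS = {"homework": 0.45, "exam": 0.25, "project": 0.30}

# Lower bound of each letter band, highest first.
CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def course_points(scores: dict[str, float]) -> float:
    """Weighted total (0-100) from component percentages."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(points: float) -> str:
    """Map points to a base letter; plus/minus modifiers are ignored."""
    for cutoff, letter in CUTOFFS:
        if points >= cutoff:
            return letter
    return "F"


example = {"homework": 92, "exam": 80, "project": 88}
total = course_points(example)   # 41.4 + 20.0 + 26.4, about 87.8
grade = letter_grade(total)      # "B"
```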
Cheating
Cheating is not tolerated on homework or exams. Penalties range from an F on the exam, to an F in the course, to a recommendation for expulsion.
Final Project
Requirements
Teams of three students design and implement a complete cloud application that integrates multiple independent components. Your application must demonstrate understanding of system architecture, asynchronous processing, data persistence, and deployment practices covered throughout the course.
Projects must incorporate asynchronous processing, data persistence, and machine learning as part of the system architecture. The emphasis is on integration—connecting services that process data, handle ML inference, and coordinate through message queues or event streams rather than building individual components in isolation. Projects must be deployed to AWS and accessible for evaluation.
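To make the integration pattern concrete, here is a minimal sketch of two services coordinating through a queue. Everything in it is an illustrative assumption: the service names are hypothetical, `fake_inference` is a placeholder for a real model call, the `results` dictionary stands in for a database, and an in-process `asyncio.Queue` stands in for a managed message broker. A deployed project would run these as separate services on AWS.

```python
import asyncio


def fake_inference(text: str) -> str:
    """Placeholder for a real ML model call (hypothetical)."""
    return "long" if len(text) > 5 else "short"


async def ingest_service(queue: asyncio.Queue, items: list[str]) -> None:
    """Producer: publishes work items without waiting for results."""
    for item in items:
        await queue.put(item)
    await queue.put(None)  # sentinel: no more messages


async def inference_service(queue: asyncio.Queue, results: dict[str, str]) -> None:
    """Consumer: pulls messages, runs inference, persists the output."""
    while True:
        item = await queue.get()
        if item is None:
            break
        results[item] = fake_inference(item)  # dict stands in for a database


async def run_pipeline(items: list[str]) -> dict[str, str]:
    """Run both services concurrently and return the persisted results."""
    queue: asyncio.Queue = asyncio.Queue()
    results: dict[str, str] = {}
    await asyncio.gather(
        ingest_service(queue, items),
        inference_service(queue, results),
    )
    return results
```

Calling `asyncio.run(run_pipeline(["cat", "elephant"]))` returns a dictionary mapping each input to its label. The producer never calls the consumer directly; the two only share the queue, which is the decoupling the requirement is asking for.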
All projects must use Python as the primary language unless approved explicitly in writing by the instructor. Projects may use additional languages for specific components where justified (frontend frameworks, performance-critical code). All projects must implement and expose an API or service to consumers.
Scoring and Milestones
| Deliverable | Timing | Weight |
|---|---|---|
| Draft Proposal | Week 10 | 3% |
| Revised Proposal | Week 12 | 8% |
| Status Report | Week 14 | 6% |
| Technical Demo | Finals Week | 20% |
| Final Report | Finals Week | 25% |
| Video | Finals Week | 3% |
| Source Code | Finals Week | 35% |
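The weights in the table are percentages of the project grade, which is itself 30% of the course grade. The sketch below converts each deliverable into course points; the names are hypothetical and deliverable scores are assumed to be percentages from 0 to 100.

```python
PROJECT_SHARE = 30.0  # project's share of the course grade, in percent

# Deliverable weights (percent of project grade) from the table above.
DELIVERABLE_WEIGHTS = {
    "Draft Proposal": 3,
    "Revised Proposal": 8,
    "Status Report": 6,
    "Technical Demo": 20,
    "Final Report": 25,
    "Video": 3,
    "Source Code": 35,
}


def course_point_value(deliverable: str) -> float:
    """Course points (out of 100) that one deliverable is worth."""
    return PROJECT_SHARE * DELIVERABLE_WEIGHTS[deliverable] / 100


def project_score(scores: dict[str, float]) -> float:
    """Weighted project percentage from per-deliverable percentages."""
    return sum(w * scores[name] for name, w in DELIVERABLE_WEIGHTS.items()) / 100
```

Under this reading, the source code alone is worth 10.5 course points, while the draft proposal is worth 0.9.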
Project Deliverables
Proposals: The draft proposal establishes project direction and allows early feedback before significant implementation effort. It describes the problem, system architecture, technical approach, and data sources. The revised proposal incorporates instructor feedback from proposal meetings and reflects early implementation insights. It provides detailed architecture, technology stack, implementation plan, and timeline. Proposals are guideposts—reasonable deviations in method, approach, and scope are expected as understanding evolves.
Status Report: Documents implementation status, deployment progress, technical challenges, and remaining work. This checkpoint demonstrates substantial progress toward a working deployed system on AWS.
Technical Demo: A scheduled 12-15 minute session demonstrating the working system and discussing implementation with the instructor. Teams show deployed application functionality, explain architecture and integration, discuss technical decisions, and describe AWS deployment. A template reference deck is provided for completion before the demo. All team members must be present.
Final Report: A comprehensive technical document describing the complete system including project overview, architecture and implementation, user experience, technical challenges and solutions, and critical reflection. The report must document AWS deployment architecture, REST API design, database schema, authentication mechanisms, and all external dependencies. It must provide sufficient detail for someone familiar with cloud computing to understand the architecture, implementation decisions, and results. The report must explicitly address what was fundamentally misunderstood before starting the project, critical technical decisions, and how understanding of integration and deployment evolved.
Video: A 3-4 minute summary aimed at a broader technical audience. Demonstrate the application and explain major system components and their interaction. The video should be engaging and provide enough detail for a knowledgeable viewer to understand the product without reading the full report.
Source Code: Submitted through a private GitHub repository with read access granted to the instructor. Code must include comprehensive README files describing repository structure, setup and deployment instructions, environment configuration, and all dependencies. The repository should show regular commits from all team members demonstrating ongoing development and collaboration.