Syllabus

EE 547: Applied and Cloud Computing for Electrical Engineers

Fall 2025 (2 units)

This course introduces tools and concepts to build and deploy machine learning systems in modern computing environments. It is a project-driven course that develops from concept to production deployment. The course is intended for graduate electrical engineering students with prior programming and machine learning experience. Students will learn about technologies and practices essential for scaling machine learning from experimental notebooks to production systems. The course covers three main areas: (1) cloud technologies and distributed computing for ML workloads, (2) system architecture and infrastructure programming, and (3) deployment and orchestration in global computing infrastructure. Students gain hands-on experience with GPU clusters, containerization, and cloud environments while learning concepts that apply across modern ML platforms.

Class Information

Lecture: Tuesday (section: 30897), 15:00 – 16:50

Discussion†: Friday (section: 30979), 14:00 – 14:50

Enrollment is in-person ONLY. Attendance at all lectures is mandatory. Taping or recording lectures or discussions is strictly forbidden without the instructor’s explicit written permission.

Course materials

  1. Designing Machine Learning Systems, Huyen, C., O’Reilly Media, 2022. online, USC libraries.

  2. Cloud Native Patterns: Designing change-tolerant software, Davis, C., Manning, 2019. online, USC libraries.

  3. The Good Parts of AWS, Vassallo, D., Pschorr, J., 2020. (optional).

  4. High Performance Python: Practical Performant Programming for Humans, 3rd edition, Gorelick, M., Ozsvald, I., O’Reilly Media, 2020. online, USC libraries.

  5. Kubernetes in Action, 2nd edition, Lukša, M., Manning, 2024. online, USC libraries.

  6. Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing, Akidau, T., Chernyak, S., Lax, R., O’Reilly Media, 2018. online, USC libraries.

“AI” policy

You may use AI-powered tools in this course to enhance your learning and productivity. Use AI as a collaborative tool for understanding concepts, generating ideas, and troubleshooting. Approach AI-generated content critically and use it responsibly. Engage with AI as you would with a knowledgeable peer or tutor, using iterative conversations to deepen your understanding. You must attribute all AI-generated content in your work, including the prompts you used. You are fully accountable for the accuracy and appropriateness of any AI-assisted work. AI should supplement, not substitute for, your own critical thinking and problem-solving. For assignments, you may use AI to clarify concepts or resolve issues, but submitted work must be your own. Submitting AI-generated work as your own without proper attribution or understanding is academic misconduct and will be treated as such.

You must develop complete mastery of all course material independent of AI assistance. Your knowledge and skills will be evaluated in contexts where AI tools are not accessible, mirroring real-world scenarios where you must rely solely on your own expertise. This ensures you can perform effectively in any situation, with or without AI support. Violations of this policy will result in severe academic penalties. The goal is to prepare you to use AI effectively in your future work while ensuring you develop a strong, self-reliant foundation in the course material.

Learning Objectives

Upon completion of this course, a student will be able to:

  • Design and implement distributed systems for machine learning workloads, understanding memory hierarchies, network topologies, and scaling limits.
  • Apply containerization and orchestration technologies to manage ML training and inference at scale.
  • Optimize cloud resource utilization through spot instances, placement strategies, and data locality principles.
  • Implement fault-tolerant ML systems using checkpointing, service discovery, and recovery mechanisms.
  • Build production ML services with proper monitoring, versioning, and rollback capabilities.
  • Navigate the transition from experimental notebooks to production-ready ML systems.

Course Outline

Week      Date    Topics
Week 1    26 Aug  Cloud fundamentals. Virtualization concepts, service models, infrastructure basics.
Week 2    02 Sep  Containerization. Docker, multi-container apps, orchestration basics.
Week 3    09 Sep  Kubernetes. Deployments, services, auto-scaling, resource management.
Week 4    16 Sep  Distributed systems fundamentals. Consensus, consistency models, fault tolerance patterns.
Week 5    23 Sep  Databases and storage. SQL/NoSQL, ACID vs BASE, distributed databases.
Week 6    30 Sep  Data systems. Object storage, streaming, pipeline architectures.
Week 7    07 Oct  ML workflows. Development to production, experiment tracking, versioning.
Week 8    14 Oct  Model serving and deployment. Batch vs real-time, APIs, scaling.
Week 9    21 Oct  Performance and caching. Feature stores, CDNs, optimization strategies.
Week 10   28 Oct  Monitoring and observability. Metrics, logging, drift detection. Draft proposal due (31 Oct).
Week 11   04 Nov  Project proposal meetings.
Week 12   11 Nov  Exam. Revised proposal due (09 Nov).
Week 13   18 Nov  MLOps and CI/CD. Deployment pipelines, A/B testing.
Week 14   25 Nov  Security and compliance. Access control, data privacy, model security.
Week 15   02 Dec  Project meetings and wrap-up. Status report due (30 Nov).
Thursday  11 Dec  Technical review and demos, 14:00 - 17:00
Monday    15 Dec  Project deliverables due, 12:00

Grading Procedure

Homework (45%)

Assignments include a mix of applied and programming problems. Your total homework score is the sum of your homework scores (as percentages) after removing the one lowest score (of minimum 50%). You may discuss homework problems with classmates, but each student must submit their own original work. Cheating warrants an “F” on the assignment. Turning in substantively identical homework solutions counts as cheating.

Late homework is accepted with a 0.5% deduction per hour for up to 48 hours, with no exceptions. Technical issues while submitting are not grounds for an extension. No submissions will be accepted more than 48 hours after the due date. Graders score what is submitted and will not follow up if the file is incorrect, incomplete, or corrupt. It is your responsibility to ensure you submit the correct files and that they are accessible.
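As a worked illustration of the late policy above, the following Python sketch computes a score after the late deduction. It is only a sketch under assumptions the syllabus does not state: the function name `late_score` is hypothetical, the deduction is taken as 0.5 percentage points per hour applied pro rata to partial hours, and the result is floored at zero. The actual computation used by the graders may differ.

```python
def late_score(raw_percent: float, hours_late: float) -> float:
    """Return a homework score (in percent) after the late deduction.

    Assumptions (not stated in the syllabus): 0.5 percentage points are
    deducted per hour, partial hours are charged pro rata, and the score
    cannot go below zero.
    """
    if hours_late <= 0:
        return raw_percent  # on time: no deduction
    if hours_late > 48:
        return 0.0  # outside the 48-hour window: submission not accepted
    return max(0.0, raw_percent - 0.5 * hours_late)


if __name__ == "__main__":
    print(late_score(90.0, 10))  # 10 hours late
    print(late_score(90.0, 48))  # exactly at the 48-hour limit
    print(late_score(90.0, 49))  # past the limit
```

Under these assumptions, a submission 10 hours late loses 5 points, and one at the 48-hour limit loses 24 points.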

Exam (25%)

The exam tests your ability to apply major principles, demonstrate conceptual understanding, and write code. It takes place during week 12 (tentative). You are expected to bring a scientific (non-graphing) calculator. You may use a single 8.5”x11” reference sheet (front and back OK). You may not use any additional resources.

The exam includes multiple-choice and short-answer questions. It also includes free-response or open-ended questions to demonstrate conceptual understanding. You are expected to write reasonably correct code and to determine the expected behavior of unfamiliar code. Grading primarily follows correct reasoning but may include deductions for major syntax errors, algorithmic inefficiency, or poor implementation.

Final Project (30%)

This course culminates with a final project in lieu of a final exam. Teams of three students design and implement a complete application integrating multiple independent services that communicate asynchronously. Projects incorporate asynchronous processing, data persistence, and machine learning as part of the system architecture. The emphasis is on integration — connecting services that process data, handle ML inference, and coordinate through message queues or event streams rather than building individual components in isolation.

Teams are encouraged to tackle problems of personal interest to their background or research. The instructor will guide teams having difficulty identifying suitable applications. Teams may build applications similar to existing services provided their implementation demonstrates understanding of distributed architectures and the progression from initial design through deployed system. All projects require the instructor’s written approval.

Teams will propose their architecture, implement and deploy their application, and demonstrate working functionality. Evaluation focuses on how components work together, technical decision-making, and successful deployment rather than production-level optimization.

Course Grade

A if 90 - 100 points, B if 80 - 89 points, C if 70 - 79 points, D if 60 - 69 points, F if 0 - 59 points. (“+” and “–” at ≈ 1.5% of grade boundary).
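The weighting and letter cutoffs above can be sketched in Python as follows. This is an illustration only: the names `WEIGHTS`, `course_points`, and `letter_grade` are hypothetical, component scores are assumed to be percentages from 0 to 100, no rounding is applied at the boundaries, and the "+" and "–" modifiers are not modeled.

```python
# Component weights in percent, taken from the Grading Procedure section.
WEIGHTS = {"homework": 45, "exam": 25, "project": 30}


def course_points(scores: dict[str, float]) -> float:
    """Combine component percentages (0-100) into course points (0-100)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(points: float) -> str:
    """Map course points to a base letter grade (no +/- modifiers).

    Assumption: fractional points are compared against the lower cutoff
    without rounding, so 89.9 maps to B.
    """
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if points >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    points = course_points({"homework": 90, "exam": 80, "project": 85})
    print(points, letter_grade(points))
```

For example, scores of 90 on homework, 80 on the exam, and 85 on the project combine to 86 points, which falls in the B range.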

Cheating

Cheating is not tolerated on homework or exams. Penalties range from an F on the exam, to an F in the course, to recommended expulsion.


Final Project

Requirements

Teams of three students design and implement a complete cloud application that integrates multiple independent components. Your application must demonstrate understanding of system architecture, asynchronous processing, data persistence, and deployment practices covered throughout the course.

Projects must incorporate asynchronous processing, data persistence, and machine learning as part of the system architecture. The emphasis is on integration—connecting services that process data, handle ML inference, and coordinate through message queues or event streams rather than building individual components in isolation. Projects must be deployed to AWS and accessible for evaluation.
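To make the integration pattern concrete, here is a minimal Python sketch of two components coordinating through a queue. It is a toy, not a project template: an in-process `asyncio.Queue` stands in for a real message queue or event stream, a dictionary stands in for a database, and `fake_model`, `producer`, and `inference_worker` are hypothetical names invented for illustration. A deployed project would replace each stand-in with an actual service.

```python
import asyncio


def fake_model(text: str) -> int:
    """Placeholder for ML inference; here it simply counts words."""
    return len(text.split())


async def producer(queue: asyncio.Queue, items: list[str]) -> None:
    """Hypothetical ingest service: publish each item as a message."""
    for item in items:
        await queue.put(item)
    await queue.put(None)  # sentinel telling the consumer to stop


async def inference_worker(queue: asyncio.Queue, results: dict[str, int]) -> None:
    """Hypothetical inference service: consume messages and store results."""
    while True:
        item = await queue.get()
        if item is None:
            break
        results[item] = fake_model(item)  # dict stands in for a database


async def main() -> dict[str, int]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=8)  # bounded for backpressure
    results: dict[str, int] = {}
    # Both coroutines run concurrently and interact only through the queue.
    await asyncio.gather(
        producer(queue, ["hello world", "asynchronous message passing demo"]),
        inference_worker(queue, results),
    )
    return results


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The point of the sketch is that the producer never calls the worker directly; the two sides share only the queue, which is what allows them to be deployed and scaled as independent services.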

All projects must use Python as the primary language unless approved explicitly in writing by the instructor. Projects may use additional languages for specific components where justified (frontend frameworks, performance-critical code). All projects must implement and expose an API or service to consumers.

Scoring and Milestones

Deliverable       Timing       Weight
Draft Proposal    Week 10      3%
Revised Proposal  Week 12      8%
Status Report     Week 14      6%
Technical Demo    Finals Week  20%
Final Report      Finals Week  25%
Video             Finals Week  3%
Source Code       Finals Week  35%

Project Deliverables

Proposals: The draft proposal establishes project direction and allows early feedback before significant implementation effort. It describes the problem, system architecture, technical approach, and data sources. The revised proposal incorporates instructor feedback from proposal meetings and reflects early implementation insights. It provides detailed architecture, technology stack, implementation plan, and timeline. Proposals are guideposts—reasonable deviations in method, approach, and scope are expected as understanding evolves.

Status Report: Documents implementation status, deployment progress, technical challenges, and remaining work. This checkpoint demonstrates substantial progress toward a working deployed system on AWS.

Technical Demo: A scheduled 12-15 minute session demonstrating the working system and discussing implementation with the instructor. Teams show deployed application functionality, explain architecture and integration, discuss technical decisions, and describe AWS deployment. A template reference deck is provided for completion before the demo. All team members must be present.

Final Report: A comprehensive technical document describing the complete system including project overview, architecture and implementation, user experience, technical challenges and solutions, and critical reflection. The report must document AWS deployment architecture, REST API design, database schema, authentication mechanisms, and all external dependencies. It must provide sufficient detail for someone familiar with cloud computing to understand the architecture, implementation decisions, and results. The report must explicitly address what was fundamentally misunderstood before starting the project, critical technical decisions, and how understanding of integration and deployment evolved.

Video: A 3-4 minute summary aimed at a broader technical audience. Demonstrate the application and explain major system components and their interaction. The video should be engaging and provide enough detail for a knowledgeable viewer to understand the product without reading the full report.

Source Code: Submitted through a private GitHub repository with read access granted to the instructor. Code must include comprehensive README files describing repository structure, setup and deployment instructions, environment configuration, and all dependencies. The repository should show regular commits from all team members demonstrating ongoing development and collaboration.