Final Project Guidelines

EE 547: Applied and Cloud Computing for Electrical Engineers

See project deliverables for submission requirements and deadlines.

Overview

The final project requires teams of three students to design and implement a complete cloud application that integrates multiple independent components. Your application must demonstrate understanding of system architecture, asynchronous processing, data persistence, and deployment practices covered throughout the course.

Teams will propose, implement, and deploy a working system to AWS, presenting both the technical architecture and a live demonstration.

Technical Requirements

Your application must integrate the following components:

Multiple Independent Services: Design at least two asynchronous components that communicate through defined interfaces. These might be API services, batch processors, event handlers, or background workers. Components should operate independently and coordinate through APIs, message queues, or event streams.
REST API: Implement an HTTP API that exposes application state and services. Your API must handle authentication, use proper HTTP semantics (methods, status codes, error handling), and support concurrent requests.
Data Persistence: Use at least one database or storage system appropriate to your data model. This might be relational (PostgreSQL, MySQL), document (DynamoDB), key-value, or object storage (S3). Your choice should reflect an understanding of the trade-offs between storage patterns.
Asynchronous Processing: Include at least one component that processes tasks asynchronously. This might be batch jobs, webhook handlers, queue-based workers, or streaming pipelines. The goal is to demonstrate a decoupled architecture in which components do not block while waiting for operations to complete.
Machine Learning Component: Incorporate a machine learning model or algorithm as part of your application logic. This could be classification, regression, clustering, embedding generation, similarity search, recommendation, or another ML technique. The ML component should serve your application’s purpose—it’s one tool among several, not the entire focus.
User Interface: Provide a frontend for interacting with your system. This might be a web interface, a command-line tool with rich interaction, or an API client demonstrating your system’s capabilities. The interface must show user-specific content and handle authentication.
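
As a rough illustration of the HTTP semantics the REST API requirement refers to, the sketch below shows one request handler written as a plain function that returns a status code and a JSON-style body. It is framework-free on purpose; the route, the `TOKENS` and `NOTES` dictionaries, and the `get_note` function are all hypothetical stand-ins, not part of any course starter code.

```python
from http import HTTPStatus

# Hypothetical in-memory stand-ins for a real token store and database.
TOKENS = {"token-alice": "alice"}
NOTES = {"alice": {1: "draft proposal"}}


def get_note(headers: dict, note_id: str) -> tuple[int, dict]:
    """Return (status_code, json_body) for a request like GET /notes/<note_id>."""
    # Authentication: a missing or unknown token is 401, not 200 with an error body.
    auth = headers.get("Authorization", "")
    user = TOKENS.get(auth.removeprefix("Bearer "))
    if user is None:
        return HTTPStatus.UNAUTHORIZED, {"error": "invalid or missing token"}

    # Validation: a malformed identifier is a client error (400).
    try:
        key = int(note_id)
    except ValueError:
        return HTTPStatus.BAD_REQUEST, {"error": "note_id must be an integer"}

    # Users only see their own data; anything else is reported as 404.
    text = NOTES.get(user, {}).get(key)
    if text is None:
        return HTTPStatus.NOT_FOUND, {"error": "note not found"}

    return HTTPStatus.OK, {"id": key, "text": text}
```

In a real project the same decisions would live inside whatever web framework your team chooses; the point is only that each failure mode maps to a distinct, conventional status code.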

Deployment Requirements

AWS Infrastructure: Deploy your application to AWS. Use EC2 instances, container services (ECS), serverless functions (Lambda), managed databases (RDS, DynamoDB), object storage (S3), or other AWS services as appropriate. Provision resources through manual configuration, CLI scripts, or infrastructure-as-code tools.
Python as Primary Language: Implement core application logic in Python. You may use other languages for specific components where justified (frontend JavaScript frameworks, performance-critical code), but Python should dominate your backend implementation.
Version Control: Maintain your project in a private GitHub repository. Source code is submitted via GitHub through Gradescope. Your repository should show regular commits from all team members throughout the development period.

Project Scope

Your project should demonstrate integration of multiple course concepts without attempting production-scale systems. A well-designed application with three solid components working together is far better than an overly ambitious architecture with incomplete implementations.

What Your Project Is Not

Not a web development showcase: The frontend exists to demonstrate your system’s functionality, not to showcase UI/UX skills.
Not ML infrastructure: Do not build model registries, feature stores, training orchestration platforms, or MLOps tooling. These are infrastructure projects, not applications.
Not a homework extension: Simply scaling up the ArXiv assignments (bigger database, more embeddings, fancier queries) is insufficient. You’ve already implemented those components individually.
Not a simple CRUD application: Database-backed forms with create/read/update/delete operations do not demonstrate the architectural concepts this course emphasizes.

What Your Project Is

A complete system solving a problem: Your application should address a real need or provide useful functionality. Users should be able to accomplish something meaningful with your system.
An integration of independent components: The value comes from how components work together, not from any single component. Well-defined interfaces and asynchronous coordination demonstrate architectural understanding.
A demonstration of technical decisions: Your project should show you understand trade-offs. Why did you choose PostgreSQL over DynamoDB? Why process tasks asynchronously instead of synchronously? Why use this ML technique instead of alternatives?
A working deployment: Your system must actually run on AWS and be accessible for evaluation. Partially working code or “it works on my laptop” is insufficient.

Choosing a Project

Select a problem domain that interests your team and allows integration of multiple technologies. Strong projects often come from research areas, personal interests, or practical problems you’ve encountered.

Several patterns work well for this course:

Data Processing Applications: Systems that collect data from external sources (APIs, webhooks, public datasets), process or transform it (possibly using ML), and provide access through an API or interface. These typically involve periodic data collection, processing pipelines, storage, and serving results.
Event-Driven Systems: Applications that respond to external events by evaluating conditions and executing tasks asynchronously. For example, monitoring external services for specific conditions, then triggering analysis or notification workflows.
Real-Time Analysis Systems: Applications that process streaming data or frequent updates, maintain state, and provide current information. These integrate data ingestion, processing, persistence, and presentation layers.
Content Organization and Discovery: Systems that collect, index, or organize information (documents, papers, media, structured data) and provide search, recommendation, or discovery capabilities enhanced by ML techniques.

The key is finding a domain where you can naturally integrate asynchronous processing, data persistence, ML, and APIs without forcing components together artificially.

Thinking About Scope

Consider what you built in homework assignments. In HW1, you fetched ArXiv papers and built a multi-container processing pipeline. In HW2, you created an HTTP API server and trained text embeddings. In HW3, you designed database schemas and implemented complex queries.

Your project should feel like integrating all three assignments, but in a new domain and with additional complexity. If your entire project could have been a homework problem, the scope is too small. If you need more than three core components operating independently, you’re probably overcommitted.

Technical Integration

The strength of your project comes from how components work together.

Well-Defined Interfaces: Components should communicate through clean interfaces—REST APIs, message queues, shared storage with clear contracts. Avoid tight coupling where one component knows implementation details of another.
Appropriate Asynchrony: Use asynchronous processing where operations are slow, unpredictable, or can fail. Fetching external data, running ML inference on large inputs, processing batches—these belong in background workers, not in API request handlers.
Data Flow: Think carefully about how data moves through your system. Where is it collected? Where is it processed? Where is it stored? How do users access it? Clear data flow indicates clear architecture.
Error Handling: Components should handle failures gracefully. What happens when an external API is unavailable? What happens when ML inference fails? What happens when the database connection drops? You don’t need production-grade reliability, but you should think about failure modes.
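
The points above about asynchrony and error handling can be sketched with a minimal in-process job queue. This is only an illustration under simplifying assumptions: a standard-library `queue.Queue` and a thread stand in for a real message queue and a separate worker service, the `jobs` dictionary stands in for a database, and `slow_operation`, `submit`, and `get_status` are hypothetical names.

```python
import queue
import threading
import uuid

jobs: dict = {}                      # job_id -> {"status": ..., "result": ...}
tasks: queue.Queue = queue.Queue()   # stand-in for a real message queue
lock = threading.Lock()


def slow_operation(payload: str) -> str:
    """Hypothetical stand-in for slow work (external fetch, ML inference)."""
    if not payload:
        raise ValueError("empty payload")
    return payload.upper()


def submit(payload: str) -> tuple[int, dict]:
    """API side: enqueue the work and return 202 Accepted immediately."""
    job_id = uuid.uuid4().hex
    with lock:
        jobs[job_id] = {"status": "queued", "result": None}
    tasks.put((job_id, payload))
    return 202, {"job_id": job_id}


def get_status(job_id: str) -> tuple[int, dict]:
    """API side: report job state without waiting for the worker."""
    with lock:
        job = jobs.get(job_id)
        if job is None:
            return 404, {"error": "unknown job"}
        return 200, dict(job)


def worker() -> None:
    """Worker side: process jobs one at a time and record failures as data."""
    while True:
        job_id, payload = tasks.get()
        try:
            update = {"status": "done", "result": slow_operation(payload)}
        except Exception as exc:  # a failed job must not kill the worker
            update = {"status": "failed", "result": str(exc)}
        with lock:
            jobs[job_id] = update
        tasks.task_done()


threading.Thread(target=worker, daemon=True).start()
```

The request handler never calls `slow_operation` directly, so a slow or failing task affects only that job's recorded status. In a deployed system the queue and job table would be external services so that the API and the worker can run and fail independently.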

Machine Learning Integration

Your ML component should be part of the application, not a research exercise.

You might use ML to:
- Classify or categorize inputs (documents, images, time series)
- Generate embeddings for similarity search or recommendation
- Predict outcomes or detect anomalies
- Extract information or features from unstructured data
- Rank or score items for presentation

The ML model doesn’t need to be novel or state-of-the-art. Using logistic regression, k-means clustering, or a simple neural network is fine if it serves your application’s purpose. Training a small model on domain-specific data is better than deploying a pre-trained model you don’t understand.

What matters is integration: how does the ML component fit into your architecture? Is inference synchronous or asynchronous? Where do you store models? How do you handle inference failures? These questions demonstrate understanding of ML in production systems.
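
One possible way to handle inference failures is sketched below. It assumes a deliberately simple model, a nearest-centroid classifier whose centroids might have been produced offline (for example by k-means), and wraps inference so that bad input yields a structured result instead of an unhandled exception. The `CENTROIDS` values and the `predict` and `classify` names are hypothetical.

```python
import math

# Hypothetical centroids, e.g. computed offline and loaded at startup.
CENTROIDS = {"short": [1.0, 0.0], "long": [0.0, 1.0]}


def predict(features: list) -> str:
    """Return the label of the nearest centroid (Euclidean distance)."""
    if len(features) != 2 or not all(math.isfinite(x) for x in features):
        raise ValueError("expected two finite features")
    return min(CENTROIDS, key=lambda label: math.dist(features, CENTROIDS[label]))


def classify(features) -> dict:
    """Wrap inference so a model failure becomes data the caller can act on."""
    try:
        return {"status": "ok", "label": predict(features)}
    except (ValueError, TypeError) as exc:
        return {"status": "failed", "label": None, "error": str(exc)}
```

A caller such as an API handler or background worker can then decide what a `"failed"` result means for the application: retry later, fall back to a default, or report the error to the user.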

Evaluation Criteria

Projects will be evaluated on technical integration, implementation quality, appropriate scope, and successful deployment.

Technical Integration: Does your system integrate multiple independent components through well-defined interfaces? Does the architecture demonstrate understanding of distributed systems concepts? Are asynchronous components truly decoupled?
Implementation Quality: Is your code well-structured and maintainable? Do components handle errors appropriately? Does the system work reliably under normal conditions? Have you made reasonable technical choices given project constraints?
Scope and Ambition: Does your project demonstrate appropriate complexity for a team of three over five weeks? Have you tackled interesting technical challenges? Does the system go beyond trivial integration of existing tools?
Deployment and Operations: Is your system successfully deployed to AWS? Can it be accessed and evaluated? Is the deployment architecture documented?

Evaluation emphasizes conceptual understanding and architectural decisions over production-ready optimization. Proper HTTP status code usage matters more than sub-100ms response times. Correct database schema design matters more than query optimization. Successful deployment matters more than auto-scaling configuration.

Academic Integrity

All code must be written by your team. You may use standard libraries, frameworks, and cloud SDKs. You may reference documentation and troubleshoot with AI assistance. You may not copy substantial code from other projects or submit AI-generated work that you do not understand.