Melike Demirci - T2305 - UThere

September

October

I contributed to the Project Specification Document by writing the introduction, description, and non-functional requirements parts.
I researched factors that can indicate a person's attention level and proposed several analyses: eye blink ratio, gaze, drowsiness, and talk detection.
I started working on the Analysis Report with the team.

November

I contributed to the Analysis Report by writing the introduction, description, and factors in engineering design parts.
Kimya and I drew the object and class diagram.
I drew the sequence diagrams and wrote their explanations.
I conducted research on how to detect facial landmarks and how to find out if a person is drowsy.

December

I first tried to implement facial landmark detection with the dlib library but then switched to the Mediapipe library. Dlib predicts only 68 landmark points, while Mediapipe predicts 468 and also provides an estimate of the coordinates in 3D space. I therefore suggested that the team work with Mediapipe.
I worked on blink detection, but the initial accuracy was low and needs to be improved.
I researched libraries and approaches that could be used for gaze tracking. I found WebGazer.js to be one of the best tools and suggested that the team use it in our project.
I implemented the drowsiness detection algorithm and tested it on a real-time video.
I conducted research on emotion recognition practices. I have two suggestions: training a classification model with transfer learning, or directly using an existing model (Emonet, the official implementation of the paper "Estimation of continuous valence and arousal levels from faces in naturalistic conditions").
Kimya and I discussed ways of getting image data from JavaScript and integrating it into the analysis in Python.
I prepared the demo presentation with Kimya.
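
For reference, a minimal sketch of one common approach to the blink detection mentioned in this month's entries: the eye aspect ratio (EAR) computed from six eye landmarks, with a blink counted when the EAR stays below a threshold for a few consecutive frames. This log does not record which formula was actually used, so the landmark ordering, the 0.2 threshold, the 2-frame minimum, and the function names below are all illustrative assumptions.

```python
from math import dist


def eye_aspect_ratio(eye):
    """Return the eye aspect ratio for six (x, y) eye landmarks.

    Assumed ordering (hypothetical, not taken from the project):
    p1 and p4 are the eye corners, p2 and p3 lie on the upper lid,
    p6 and p5 lie on the lower lid directly below p2 and p3.
    """
    p1, p2, p3, p4, p5, p6 = eye
    vertical = dist(p2, p6) + dist(p3, p5)   # two lid-to-lid distances
    horizontal = dist(p1, p4)                # corner-to-corner distance
    return vertical / (2.0 * horizontal)


def count_blinks(ear_values, threshold=0.2, min_frames=2):
    """Count completed blinks in a per-frame EAR sequence.

    A blink is a run of at least `min_frames` consecutive frames with
    EAR below `threshold`, followed by a frame at or above it.
    Both parameter defaults are assumed values for illustration.
    """
    blinks = 0
    closed_run = 0
    for ear in ear_values:
        if ear < threshold:
            closed_run += 1
        else:
            if closed_run >= min_frames:
                blinks += 1
            closed_run = 0
    return blinks
```

A long run of low EAR values that never reopens is not counted as a blink here; a run like that is closer to what a drowsiness check would look for.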

January

I implemented a head pose estimation algorithm that solves the Perspective-n-Point (PnP) problem.
I wrote the function for iris pose estimation using facial points extracted with Mediapipe.
I implemented the Feature Extractor module.
I integrated the first version of the emotion recognition model (Emonet) into the Feature Extractor module.

February

I started building a FastAPI endpoint to receive frames, extract features, and make predictions.
I implemented the Dataset Preparation module.
I conducted research on the most efficient way to send video frames from the frontend to the backend.

March

I implemented video stream creation and recording functions in the React module.
I implemented the first version of video sending from React and receiving in FastAPI.
We encountered a bug in this version: on some computers, the video chunks were sent to the backend successfully, but on others, the received video frames were distorted. Debugging this issue took a lot of time because many factors in the environment could have been responsible, but I eventually fixed it.
I conducted research on RNN models for our attention model.

April

All group members, including me, prepared the dataset by recording videos of ourselves and labeling them in chunks.
I collected these videos from group members and created the final dataset by extracting features.
I trained and tested the RNN model with this dataset. I conducted hyperparameter tuning to increase the model's performance. The test accuracy of the model was 74.64%.

May

We realized that 10 seconds was a bit long for video chunks. To show the attention score more frequently, we decided to train a model on 5-second videos. I therefore recreated the dataset by relabeling the existing videos and extracting features from them.
I conducted the training, testing, and hyperparameter tuning processes again. This time, the model reached 88.55% test accuracy.
I implemented the Poll functionality in both the backend and frontend. Bilgehan added CSS on top of it.
I debugged several errors.
I wrote the development/implementation details in the Final Report.
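
The move from 10-second to 5-second chunks noted earlier this month amounts to cutting each video's per-frame feature sequence into shorter fixed-length windows. A minimal sketch of that step follows, assuming features are stored as one entry per frame and the frame rate is known; the function name, the frame rate, and the choice to drop an incomplete trailing window are assumptions, not details from the project.

```python
def split_into_windows(frame_features, fps, window_seconds):
    """Split a per-frame feature sequence into non-overlapping windows.

    frame_features: list with one feature entry per video frame.
    fps: frames per second of the recording (assumed to be known).
    window_seconds: length of each window, e.g. 5 or 10.

    A trailing window shorter than the full length is dropped so that
    every sample fed to the model has the same number of frames.
    """
    window_len = int(fps * window_seconds)
    if window_len <= 0:
        raise ValueError("fps * window_seconds must be at least one frame")
    last_start = len(frame_features) - window_len
    return [
        frame_features[start:start + window_len]
        for start in range(0, last_start + 1, window_len)
    ]
```

With these assumptions, halving the window length roughly doubles the number of samples per video, and each resulting window still needs its own label.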