My data science projects

  • Analyzing Singapore's HDB flat resale prices

    Mar 08, 2023 About 24 mins
    # analysis # visualization

    Exploring and modeling Singapore's HDB flat resale prices using different prediction models, and engineering new features such as centrality and proximity to MRT stations. Out of the box, RandomForest performs better than single complex models with minimal hyperparameter tuning ...Read more

  • Predicting loan defaults

    Feb 23, 2023 About 30 mins
    # sklearn # classification


    Fitting a statistical model to historical credit data and estimating the value of current loans. Along with model building, I will demonstrate the use of sklearn's Pipeline as a more convenient approach to feature engineering, cross-validation and hyperparameter tuning ...Read more

  • Guessing user-drawn digits

    Feb 15, 2023 About 4 mins
    # keras # streamlit


    Using a CNN to build a digit guessing game. The model is trained using Keras and the GUI is created using Streamlit ...Read more

  • Analyzing Vietnam's high school graduation exam results

    Jan 31, 2023 About 19 mins
    # analysis # visualization


    Analyzing Vietnam's high school graduation exam scores. Identifying trends, predicting missing scores and determining whether it's fair to give bonus scores based on geographical region ...Read more

  • Simulating a queue system

    Sep 29, 2022 About 8 mins
    # simulation # visualization


    Using SimPy to simulate a queue at airport check-in counters. This is an example of discrete-event simulation (DES). In contrast to Monte Carlo simulation, DES is useful when you need to keep track of a system’s state and analyze resource usage over time ...Read more

  • Finding the most similar questions on r/AskReddit

    Sep 25, 2022 About 7 mins
    # tf-idf # sklearn


    r/AskReddit is a popular subreddit where users can submit open-ended questions. In this analysis, I analyze the top 1,000 questions and find those that are most similar. The resulting technique can be applied to recommending similar questions to increase engagement, or identifying reposted questions ...Read more

  • Analyzing my poker games

    Aug 20, 2022 About 7 mins
    # pandas # visualization


    Using pandas to parse raw game logs and analyze game fairness, player trends and behaviors ...Read more

  • Cohort and Retention analysis

    Aug 05, 2022 About 3 mins
    # retention # visualization # analysis


    Using pandas to create a retention chart segmented by cohorts. Identifying trends across rows, columns and diagonal features ...Read more

  • Creating a pokemon guesser

    Jun 01, 2022 About 5 mins
    # image-processing # opencv # numpy


    Using OpenCV to extract image contours and process images, akin to guess-the-Pokémon games ...Read more

  • Building a DCA simulation app with Streamlit

    Apr 01, 2022 About 2 mins
    # visualization # streamlit


    Creating a Dollar Cost Averaging (DCA) calculator using Streamlit ...Read more

  • Creating a choropleth map with geopandas and matplotlib

    Sep 12, 2020 About 2 mins
    # visualization


    Creating a choropleth map to present geographical data ...Read more

  • Estimating pi using Monte Carlo simulation

    Aug 21, 2020 About 4 mins
    # animation # visualization


    Estimating pi using Monte Carlo simulation and presenting the results in an animated chart ...Read more

  • Web scraping with requests and BeautifulSoup

    Aug 20, 2020 About 3 mins
    # web-scraping

    Basic web scraping with Requests and BeautifulSoup ...Read more

  • All 13
  • analysis 3
  • animation 1
  • classification 1
  • image-processing 1
  • keras 1
  • numpy 1
  • opencv 1
  • pandas 1
  • retention 1
  • simulation 1
  • sklearn 2
  • streamlit 2
  • tf-idf 1
  • visualization 8
  • web-scraping 1
© 2020-2023 Vo Duy Do