A way to begin your first data analysis competition

Before starting…

Now it is my turn to launch my first blog and share the thoughts stored in my time capsules. Each period, I think, is destined to become a dramatic and meaningful memory for me, and for you too.

Today’s topic: how to get ready for a Kaggle competition as a beginner?

Content

  1. Pre-requirements
    • Linux
    • Windows
  2. Main procedures
    • Question: ask, ask and ask
    • Train-Validate-Test: beyond a cycle
    • Display: what about telling a story?
  3. Find your push!

Pre-requirements

The requirements can be divided into two parts: platform requirements and prior knowledge (terminology) requirements. On the platform side, Linux systems usually make getting started more convenient than Windows, but they also bring more potential conflicts and bugs. The knowledge you need depends heavily on the field and the problem you dive into. For simplicity, this section mainly focuses on the first part: platform requirements.

Linux

Let’s start with Linux. As we know, there are many Linux distributions, such as Red Hat, Ubuntu, CentOS and so on. Ubuntu is chosen here because it depends little on specific hardware support and offers a Windows-like desktop.

Since the main purpose of preparation is to cut unnecessary time costs, checking the version of your system is a good start, because the pre-installed software often differs from one Linux system to another. Python is an example: different Ubuntu releases ship with different pre-installed versions. The key points to check are listed below:

  • Pre-installed hardware management tools
  • Pre-installed compiler
  • The environment path

Tutorials to guide these checks on Ubuntu can be found in my other blog posts labeled “computer-toolbox”. Here, the focus is a brief overview of the whole procedure.
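The three checks listed above can be sketched in a few lines of Python. This is only a minimal sketch: the function name `describe_environment` is made up for illustration, and the tool names `gcc`, `g++`, `python3` and `nvidia-smi` are assumptions about what you might want to look for, not a list taken from this post.

```python
import os
import platform
import shutil
import sys


def describe_environment(tools=("gcc", "g++", "python3", "nvidia-smi")):
    """Collect basic facts about the system, installed tools and PATH.

    The default tool names are illustrative assumptions; replace them
    with whatever compiler or hardware tool you actually depend on.
    """
    # Split PATH on the platform separator and drop empty entries.
    path_entries = [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]
    return {
        "system": platform.platform(),
        "python_version": sys.version.split()[0],
        # shutil.which returns the full path of a tool, or None if it is not on PATH.
        "tools": {name: shutil.which(name) for name in tools},
        "path_entries": path_entries,
    }


if __name__ == "__main__":
    report = describe_environment()
    print("System:", report["system"])
    print("Python:", report["python_version"])
    for name, location in report["tools"].items():
        print(f"{name}: {location or 'not found on PATH'}")
    print("PATH has", len(report["path_entries"]), "entries")
```

A tool reported as "not found on PATH" is either not installed or installed in a directory that the environment path does not include, which is usually the first thing worth fixing.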

Windows

Compared with Linux, using Windows usually saves the time of installing hardware and its management tools, and installing a compiler, an IDE and other developer tools may be much easier. However, a correct environment path matters more on Windows because of the annoying bugs it can cause. Next, find out the weaknesses that come with the programming language you choose and fix them. For Python, you could refer to the discussions on Reddit here. Finally, a file manager is a good helper for finding what you need among messy cache files. Everything is recommended and can be downloaded for free here.
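One way to check whether the environment path is in good shape is to look for entries that are duplicated or that point to folders which no longer exist. The snippet below is a rough sketch under that assumption; `audit_path` is a hypothetical helper name, and it only reads the `PATH` variable without changing anything.

```python
import os


def audit_path(path_value=None):
    """Report duplicated and non-existent directories in a PATH string.

    If path_value is None, the current PATH environment variable is used.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")

    seen = set()
    duplicates = []
    missing = []
    for entry in path_value.split(os.pathsep):
        if not entry:
            continue  # skip empty entries left by stray separators
        # normcase makes the comparison case-insensitive on Windows.
        key = os.path.normcase(os.path.normpath(entry))
        if key in seen:
            duplicates.append(entry)
            continue
        seen.add(key)
        if not os.path.isdir(entry):
            missing.append(entry)
    return {"duplicates": duplicates, "missing": missing}


if __name__ == "__main__":
    result = audit_path()
    print("Duplicated entries:", result["duplicates"])
    print("Entries that are not existing folders:", result["missing"])
```

Neither kind of entry is necessarily a bug, but both are good places to start looking when a freshly installed tool is not picked up.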

Main procedures

Question: ask, ask and ask

Believe in your initial impulse and vision; they make it much more comfortable to keep contributing. The first step to beating a competition is asking questions. No matter how much time you have or how experienced you are, asking helps you keep a clear task and vision at all times. Stop looking for experience and start looking for questions, and getting ready for the competition becomes much easier. The next question is: where should you ask? Platforms like Stack Overflow, Reddit and Quora are good choices for veterans, while the kernels in each Kaggle competition and the issues in each GitHub repository are more approachable for novices. In a word, find a platform that suits you and keep asking during the competition; then you are on the way to becoming the champion ~

Train-Validate-Test: beyond a cycle

Besides the habit formed above, let’s look in more detail at a common trick. Typically, we split the data into three parts (train, validation and test) to guard against over-fitting. Remember the rule never to touch the test data when training? That is it.

The point is that different models call for different split schemes. For example, convolutional neural networks generally show more sensitivity to the amount of training data than generative adversarial networks, so the first kind of model needs more data for training. In a word, there is no eternal truth; the right answer changes from case to case, which makes it worth exploring.
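The three-way split described in this section can be sketched with nothing but the standard library. The function name `train_val_test_split` and the default fractions are assumptions chosen for illustration, not a recommended scheme; the fractions are parameters precisely because different models may want different splits.

```python
import random


def train_val_test_split(items, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle items and split them into train, validation and test lists.

    The fractions are illustrative defaults; a model that needs more
    training data can be given smaller validation and test fractions.
    """
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")

    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)

    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)

    # The test set is carved out first and must not be used during training.
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


if __name__ == "__main__":
    train, val, test = train_val_test_split(range(100), val_fraction=0.2, test_fraction=0.1)
    print(len(train), len(val), len(test))
```

Fixing the seed means the same rows land in the test set every time, so the score you report at the end is not quietly influenced by reshuffling.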

Display: what about telling a story?

After completing the above steps, the most difficult programming parts should be done. Just take a breath and congratulate yourself! Good job! But wait, how do you share your great findings with your team members? On this question, I have heard many arguments for the importance of colorful graphs and charts, and it really confuses me: why do we rely on them so much? Does only a presentation full of beautiful, telling graphs count as a success? To me, they seem too dogmatic, so why do so many people like them?

Presentations nowadays put too much focus on tools rather than on the speaker. Presenting your work is somewhat like telling a story. Imagine how you persuaded your mother or father to buy an attractive toy or piece of clothing when you were a child, then write it down in your own words to create a prototype of the story. Next, it’s time to dress up your work. Thinking from the audience’s point of view is suggested here: ask yourself some simple questions, such as: What majors or professions do they come from? How many of them have the prior knowledge and how many do not? Then imagine you are one of them: what you would want to hear from this story is exactly what your polishing should focus on. Now the only thing left to do is talk. Sharing your story with your team members and asking for advice will bring you fresh ideas and never let you down. Enjoy talking! Enjoy watching new stories gradually grow!

Find your push!

Many competitions provide a timeline and key points on their websites. During the long, long process, finding your push, especially your internal push, will give you endless courage. Like an honest, determined, energetic friend, it may whisper warmly: forget the fantastic reward, savor the process, witness the growth of one idea after another. How amazing it is! For me, “break out of your comfort zone like a brave person” is the push that always keeps me from stopping for a computer game. And you? Have you found your push yet?

Awesome! The main content of this blog ends here. After reading the whole thing, why not give it a try? I mean, everyone has a first time leaving their comfort zone; just don’t let the coward in your heart stop your journey of infinite possibilities :)

Ziqiang Huang wechat
Interested? Just subscribe to my blog by scanning my public WeChat account :)