Pulling an all-nighter for ten thousand words! This should be the most comprehensive Codex hands-on tutorial on the entire internet.

Aling

2026-05-21

Hello everyone, I’m Sunday.

Recently, I’ve been asked the same question so many times in my inbox that it’s become overwhelming.

How exactly do you use Codex?

There are three entry points: CLI, App, and VS Code extension — all seemingly usable.

Then out pop plugins, skills, MCP, automations, AGENTS.md, sub-agents, permission modes…

You just want it to help you modify some code.
But first, it dumps a whole set of new concepts on you.

Seriously, man, I just wanted to fix a bug — how did I suddenly end up learning a new profession…

So yesterday, I spent an entire afternoon re-organizing how Codex works, and recorded a fairly comprehensive video along the way.

After recording, I realized there are actually many parts that are easy to miss if you only watch the video.

Especially things like prompt templates, permission boundaries, version management, context management, and when to let Codex be read-only versus when to let it actively modify code.

If you don’t write these down, they’re really easy to forget after watching.

So I went ahead and reorganized the script from the video into this article.

I won’t claim it’s highly authoritative.

But if you’re a programmer, or you’re using Codex, Claude Code, Cursor, etc., for projects, I think this article can help you avoid quite a few pitfalls.

This article will mainly cover:

The three major entry points of Codex
Things to note when using Codex
Permission modes
Codex best practices
Relationship between prompts and token consumption
Multimodal capabilities
Version management
Context management
Long-term memory
Skills
Plugins
MCP
Automation
A complete case study
My commonly used Codex command templates
Easter egg

Alright, let’s get started～～

#The Three Major Entry Points of Codex

Let’s start with Codex itself.

Currently, Codex has roughly three main entry points:

Codex CLI
Codex App
Codex IDE extension

Let’s go through them one by one.

#The first is Codex CLI

That is, the command-line version.

If you’re a programmer, you can install it directly via npm:

bash
npm install -g @openai/codex

After installation, navigate to the project root directory and enter:

bash
codex

to launch it.

To update, you can also use:

bash
codex --upgrade

The advantage of CLI is that it’s lightweight, direct, and great for terminal enthusiasts.

It works based on whichever project directory you open it in.

When we let Codex first encounter a project, it’s best to have it read the project first.

My usual prompt is:

text
Please first read the current project structure, without modifying any code.

Tell me:
1. What does this project do
2. What tech stack is used
3. Where is the entry file
4. What are the startup, build, and test commands
5. If I want to add a feature, where should I start

This habit is very important.

Many people fail at Vibe Coding not because the model is bad, but because they rush to let it modify code.

No wonder things go wrong.

#The second entry point is the Codex App

I think this is more suitable for beginners, because the App feels more like a complete AI programming workstation than the CLI.

You can download it directly from the official website; there are installers for macOS and Windows.

You can start new conversations, search history, manage plugins, install skills, configure automations.

Officially, Codex App is now positioned as a multi-agent workstation.

That means it’s not just letting you chat with one AI.

You can run multiple tasks simultaneously, with different agents working in different projects, branches, or worktrees.

If you’re using Codex for the first time, I recommend starting with the App.

CLI is nice, but less beginner-friendly.

Things like directories, permissions, commands, context, sandboxing, git status — mixed together, they can easily confuse newcomers.

At least the App gives you a clearer workstation view.

#The third entry point is the IDE extension

For example, using Codex directly in VS Code or Cursor.

This is suitable when you’re already coding and want Codex to assist you with a specific file, bug, or component in the current project.

Let’s take VS Code as an example.

Simply search for Codex in the extension marketplace and install it.

Then open the corresponding project, and you’ll find the Codex icon in the top right corner.

Clicking it takes you straight into the Codex workspace.

Also note:

If you have both Codex App and the Codex IDE extension installed, their data is shared between the two.

#Things to Note When Using Codex

So, how should you start using Codex for the first time?

I suggest you don’t jump straight into practicing on a company’s core project.

Start with a demo project.

Or a small personal project that you’re not afraid to mess up.

Then make sure the project is under git management.

If not, initialize it first with:

bash
git init
git add .
git commit -m "initial commit"

The purpose is simple: if Codex modifies your code incorrectly, you can revert.

Also, when doing Vibe Coding, it’s best to make a git commit after each code modification by Codex, just in case.

So, version management is essential.
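Here is a minimal sketch of that "commit after every change" habit as two shell helpers. The function names `codex_checkpoint` and `codex_rollback` are my own hypothetical names, not part of Codex; only the git commands inside them are standard. It assumes you run them from the project root and that the initial commit above already exists.

```shell
# Hypothetical helper names; only the git commands themselves are standard.

# Commit everything in the working tree as a checkpoint.
# Does nothing (and succeeds) when there is nothing to commit.
codex_checkpoint() {
  msg=${1:-"checkpoint after Codex change"}
  git add -A || return 1
  if git diff --cached --quiet; then
    echo "nothing to checkpoint"
    return 0
  fi
  git commit -q -m "$msg"
}

# Throw away every change since the last checkpoint commit.
# Destructive: tracked edits are reset and untracked files are deleted.
codex_rollback() {
  git reset -q --hard HEAD && git clean -q -fd
}
```

Typical use: run `codex_checkpoint "login redirect fixed"` once you are happy with a change, and `codex_rollback` when Codex has gone off the rails. The rollback also deletes untracked files, so do not run it while you have new files of your own that are not committed yet.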

Additionally, don’t let Codex write code the first time you launch it.

Just have it read the project.

Here’s a prompt I often use:

text
Please do not modify any code for now.

You only need to read the project structure and tell me:
1. What does this project do
2. What tech stack is used
3. Where is the entry file
4. What are the startup and build commands
5. If I want to add a page, where should I start

After this step, check whether its understanding is accurate.

If it’s wrong, correct it.

Don’t rush to let it act.

You can continue asking:

text
In your earlier understanding of the project, which parts were uncertain?

List them and explain which files you still need to look at.

This method has proven very effective — sharing it with you.

#Permission Modes

Next, let’s talk about permission modes.

Many overlook this, but I think it’s crucial, so I’ll discuss it separately.

We know Codex can read and modify your local code and files.

This involves local file security.

For example, if the AI accidentally deletes something important on your computer, that’s trouble.

To avoid this, Codex’s permission modes are generally divided into three types.

#The first is Default Permission

Default permission is the safest, or most conservative.

It can read files, suggest modifications, and propose shell commands to execute.

However, before actually modifying files or executing commands, it needs your explicit approval.

Advantage: safe.

Disadvantage: you can’t walk away.

After all, you must confirm every step.

#The second is Auto-Review Mode

Slightly bolder than default permission.

In default mode, the AI must ask you before modifying files or running commands.

Auto-review means Codex first judges whether an operation is safe.

If it’s just modifying ordinary code in the current project, or running common commands like installing dependencies, running tests, or builds, it may auto-approve without requiring you to click “confirm” each time.

Benefit: higher efficiency.

For example, if you ask Codex to fix a bug, it might read code, modify it, run tests, and keep adjusting based on errors.

If every step requires confirmation, it interrupts the flow.

But auto-review isn’t completely unrestricted.

If Codex attempts high-risk actions — deleting files, accessing directories outside the project, reading sensitive files, downloading from the internet, or running dangerous commands — it will still pause for your confirmation.

So what scenarios suit auto-review?

When you’re familiar with the project and trust the current task.

You want Codex to carry out several consecutive steps without asking each time.

But you don’t want full open permissions.

Auto-review is a compromise:

More convenient than default permission, but safer than full access.

#The third is Full Access Permission

Use this mode cautiously.

Think of it as minimizing Codex’s restrictions.

In this mode, Codex can more freely read files, modify files, execute commands, and even access directories and networks outside the project.

Advantage: highest execution efficiency.

For example, large tasks like migrating a project, bulk refactoring, installing dependencies, running scripts, or modifying code across multiple directories — it can proceed continuously without waiting for your confirmation, acting more like a true development assistant.

But risk is also highest.

If it misunderstands your intent or executes an unintended command, it could genuinely affect your local files.

So I don’t recommend keeping full access on all the time.

Especially for unfamiliar projects, freshly cloned open-source projects, company core projects, or projects containing .env, keys, tokens, config files — don’t enable it casually.

A more stable approach:

Use default permission normally.
Use auto-review when efficiency matters.
Temporarily switch to full access only when you’re certain the project is safe, the task is clear, and you know exactly what Codex will do next.

Afterward, switch back to default or auto-review.

These three modes can be summarized simply:

Default permission: safest, but requires frequent confirmation.
Auto-review: more efficient, suited for continuous tasks in daily development.
Full access: most powerful, but highest risk, only for temporary use.
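Before switching to full access, I like to do a quick scan for the kinds of files mentioned above (.env, keys, tokens). The sketch below is only an illustration: the function names are hypothetical and the file-name patterns are my own assumptions, far from a complete list of what counts as sensitive.

```shell
# Print files that look like secrets under a project directory
# (default: current directory). The name patterns are illustrative
# assumptions, not a complete list.
list_sensitive_files() {
  find "${1:-.}" \
    \( -name .git -o -name node_modules \) -prune -o \
    -type f \( -name '.env' -o -name '.env.*' -o -name '*.pem' \
               -o -name '*.key' -o -name 'id_rsa' \) -print
}

# Succeed only when nothing sensitive was found.
safe_for_full_access() {
  found=$(list_sensitive_files "$1")
  if [ -n "$found" ]; then
    echo "Sensitive files found; stay on default or auto-review:" >&2
    echo "$found" >&2
    return 1
  fi
  echo "No obvious secrets found; you still need to trust the project and the task."
}
```

A clean result does not make full access safe by itself. It only removes one of the obvious reasons to say no.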

#Codex Best Practices

Now let’s look at best practices for Vibe Coding with Codex.

This breaks down into 4 steps:

Step 1: Specify scope.
Step 2: Specify state.
Step 3: Specify acceptance criteria.
Step 4: Interrupt anytime.

Let’s go through them one by one.

#Step 1 – Specify Scope

Don’t let Codex wander aimlessly through the entire project.

Especially large ones.

If you only want to change the login page, tell it the relevant login page files.

If you only want to inspect a hook, specify the hook file and its call sites.

In the App or IDE, you can use @ to specify files.

In CLI, you can also clearly write paths.

Example:

text
Please only read files related to src/pages/login and src/hooks/useAuth.

Help me determine if there’s a timing issue in the login redirect logic.

This is far more reliable than saying “help me see what’s wrong with login.”

#Step 2 – Specify State

Make it clear whether it’s read-only or allowed to modify.

Example:

text
Do not modify code, only output analysis.

Or:

text
Can modify code, but give me a plan before making changes.

Or:

text
Follow the plan to modify directly, then run relevant tests after completion.

See? These three sentences correspond to completely different working states.

Many problems arise with Codex because the boundaries aren’t clearly stated.

#Step 3 – Specify Acceptance Criteria

Example:

text
Upon completion, ensure:
1. npm run lint passes
2. Relevant tests pass
3. No new dependencies introduced
4. No unrelated files modified
5. Summarize modifications, verification results, and remaining risks

That’s called acceptance criteria.

Without them, it will just finish according to its own understanding.

And AI’s “I think it’s done” often doesn’t match engineering’s “really done.”
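Criteria 3 and 4 in the list above can be checked mechanically instead of taking Codex's word for it. This is a rough sketch with hypothetical function names. It assumes you run it from the repository root, that the work started from a clean commit, and it treats any change to package.json as a possible dependency change, which is stricter than necessary.

```shell
# Print changed or new files (relative to HEAD) that fall outside the
# allowed path prefix. Usage: files_outside_scope src/pages/login/
files_outside_scope() {
  {
    git diff --name-only HEAD
    git ls-files --others --exclude-standard
  } | awk -v p="$1" 'index($0, p) != 1'
}

# Succeed when package.json is unchanged since HEAD.
deps_unchanged() {
  git diff --quiet HEAD -- package.json
}
```

If `files_outside_scope src/pages/login/` prints nothing and `deps_unchanged` succeeds, those two criteria hold. Lint and tests still have to be run separately.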

#Step 4 – Interrupt Anytime

If you notice Codex heading in a strange direction, don’t wait for it to finish.

Interrupt immediately.

During AI execution, click the pause button and tell it:

text
Pause the previous direction.

Your current understanding is wrong.
The correct goal is XXX.
Reassess based on the current diff, and do not expand the modification scope further.

It’s like mentoring an intern.

Correct the course early when the direction is off.

#Relationship Between Prompts and Token Consumption

Next, let’s examine the relationship between prompts and token consumption.

Many think: should I keep instructions to Codex as short as possible?

Does shorter speech mean lower token usage?

Not necessarily.

Short instructions can sometimes consume more tokens.

Because if you’re too brief, Codex has to search, guess, and fill in context on its own.

You might think you’re saving effort.

In reality, it takes a long detour, and token usage goes up.

Example:

Say:

text
Optimize this page.

Nice and short.

But Codex will struggle.

What exactly to optimize? UI, performance, or code maintainability?

It can only guess.

And guessing can burn through a huge number of tokens.

A better approach is to specify concrete optimization items.

Example:

text
Please read files related to src/pages/dashboard.

I currently see three issues with this page:
1. First-screen information is scattered
2. Mobile layout is prone to squeezing
3. No obvious feedback on API failure

Do not modify code yet.
First output your modification plan, telling me which files will be changed and why.

This instruction is much better.

Remember: In Vibe Coding, the worst thing is letting AI improvise freely.

#Codex’s Multimodal Capabilities

Codex isn’t limited to text.

You can also feed it screenshots, design mockups, error screenshots, UI problem screenshots.

For instance, if a button is squeezed on the page, or mobile layout is misaligned.

You can directly paste or drag the screenshot in, then say:

text
This is a screenshot of the current page.

Combine the screenshot and code to identify where the layout issue might be.
Do not modify code, only output cause and solution.

This works much better than pure text description.

After all, a picture is worth a thousand words.

#Version Management

Next, version management and rollback.

Before Codex modifies code, check git status.

Git is a version control tool; you can download it from the official site.

Making version management part of your habit before each Vibe Coding session is strongly recommended.

You can do this via Git commands or using visual tools.

You can also have Codex check first:

text
Please check the current git status.

Tell me which uncommitted changes exist.
Do not modify any files.

If your working directory already has half-written changes of your own, be careful.

Codex might continue modifying those files.

Not that it can’t — just make sure you know what it changed.

After completing a small task, have it summarize the diff once.

Example:

text
Summarize this modification:
1. Which files were changed
2. What was changed in each file
3. Why it was changed
4. Any remaining risks
5. How I should verify

Then review the diff yourself.

Don’t blindly commit.
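If you would rather check this yourself than ask Codex, the same habit fits in two small shell functions. The names `require_clean_tree` and `review_changes` are hypothetical; the git commands are standard. Treat it as a sketch of the routine, not a required setup.

```shell
# Succeed only when the working tree has no uncommitted or untracked changes.
# Otherwise list them, so you know what is yours before Codex edits anything.
require_clean_tree() {
  changes=$(git status --porcelain) || return 2
  if [ -n "$changes" ]; then
    echo "Uncommitted changes present:" >&2
    echo "$changes" >&2
    return 1
  fi
  echo "Working tree clean."
}

# After a task: show which files changed and by how much, before committing.
review_changes() {
  git status --short
  git diff --stat HEAD
}
```

Run `require_clean_tree` before a session and `review_changes` after each small task, then open the full `git diff` for anything that looks larger than expected.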

#Context Management

Next, context management.

This is especially critical in AI Agents.

Each Codex conversation carries context.

Whatever sits in the same conversation window is the context: everything you said before, what it did, which files it read, and what it modified all gradually pile up.

Initially, this is great.

It remembers what you just did.

But over time, context gets heavier.

Responses slow down, and it may mix up old and new tasks.

That’s what people usually mean when they say: the AI seems dumber.

At this point, consciously manage context.

My advice: use one conversation per type of task.

E.g., one thread only fixes login bugs.

Another thread only handles component refactoring.

Don’t jump between writing pages, checking CI, summarizing meetings in the same thread.

That clutters the context.

When to continue an old conversation?

When the task is strongly tied to previous context.

E.g., you fixed half and tests haven’t passed yet — keep going.

When to start a new conversation?

When tackling a new problem, or when old context is too messy.

E.g., you just finished login, now moving to payments — start a new thread (new dialog).

Also, at task end, have it produce a closing summary.

Example:

text
Summarize this task.

Include:
1. Goal
2. Final changes
3. What was verified
4. Remaining risks
5. Where to start continuing later

Such summaries are extremely useful later.

When you search history or reopen the thread, you won’t need to sift through fragments — you’ll see the engineering record directly.

That’s the value of Codex App’s search feature.

#Long-Term Memory

Speaking of long-term memory, we come to AGENTS.md.

If you’ve used Claude Code, you may know CLAUDE.md.

In Codex, the more common equivalent is AGENTS.md.

Think of it as a project manual written for the AI Agent.

Codex reads these files and treats their rules as long-term context.

It can have multiple layers.

#Global level

Put your personal preferences here.

E.g., answer in Chinese, give a plan before modifying code, explain reasons for test failures.

#Project root directory

Put project conventions here.

E.g., tech stack, startup commands, test commands, code style, directory conventions.

#Subdirectories

Put special rules for a single module here.

E.g., this directory contains legacy code — don’t refactor public interfaces.

Or all requests in this module must go through unified wrappers.

This is very useful.

You don’t have to repeat yourself each time.

For example, place an AGENTS.md in the project root like this:

text
# Project Instructions

- Default to answering in Chinese.
- Before modifying code, explain the plan.
- Do not introduce new dependencies unless reason is given.
- Do not refactor unrelated files.
- Frontend components must consider loading, empty, error states.
- For user input, consider validation and error messages.
- After modification, prioritize running npm run lint and related tests.
- If tests cannot run, explain why and remaining risks.

From then on, Codex will follow these rules in the project.

That’s long-term memory.

Solidify your work standards.

Encode team code conventions, branching rules, testing rules, directory agreements.

Much more reliable than repeatedly reminding in group chats.
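To see which layers would apply to a given directory, you can list the AGENTS.md files from the project root down to that directory. This is only a sketch: the function name is hypothetical, the default global path `~/.codex/AGENTS.md` is an assumption about where personal instructions live, and it says nothing about how Codex actually merges or prioritizes the files.

```shell
# List AGENTS.md files that sit between a project root and a target
# directory, outermost first. The global file path is an assumption;
# pass your own as the third argument if yours lives elsewhere.
agents_files_for() {
  dir=$(cd "$1" && pwd) || return 1
  root=$(cd "$2" && pwd) || return 1
  global=${3:-"$HOME/.codex/AGENTS.md"}
  chain=""
  while :; do
    if [ -f "$dir/AGENTS.md" ]; then
      # Prepend, so outer directories end up before inner ones.
      chain="$dir/AGENTS.md
$chain"
    fi
    if [ "$dir" = "$root" ] || [ "$dir" = "/" ]; then
      break
    fi
    dir=$(dirname "$dir")
  done
  if [ -f "$global" ]; then
    echo "$global"
  fi
  printf '%s' "$chain"
}
```

For example, `agents_files_for src/legacy .` run from the project root prints the global file (if present), then the root AGENTS.md, then the one in `src/legacy`. It is a handy way to notice a forgotten subdirectory rule.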

#Skills

Then there are skills.

Their popularity needs no elaboration.

Nearly all major model products now offer skills functionality.

And I believe this is a very worthwhile area to explore in Codex.

Many confuse skills with plugins.

They’re not the same.

Skills are more like a way of doing things.

They tell Codex what process to follow for certain tasks, what risks to check, and what standards to use for output.

For example, you could have a frontend review skill.

It specifies that during frontend code reviews, focus on component boundaries, state management, interface error handling, mobile adaptation, performance issues, XSS risks.

Later, when you ask Codex to review, it won’t just glance — it’ll apply this standard.

You could also have a writing skill.

Specify your article structure, tone, banned words, case organization.

Next time you ask Codex to draft, it won’t need to adapt to your style each time.

Turning these into skills makes them reusable workflows.

Codex has a built-in Skill Creator.

Use it to create your own skills directly.

Example — ask Codex to create a skill:

text
Please create a frontend code review skill.

It should focus on checking during review:
1. Whether existing component contracts are broken
2. Whether unnecessary dependencies are introduced
3. Whether reactive state is managed chaotically
4. Whether loading, empty, error states are missing
5. Whether there are mobile adaptation issues
6. Whether there are XSS risks
7. Whether necessary tests are added

You can also have it install others’ skills.

Find a skill on GitHub, then tell Codex:

text
Please install this skill and check its purpose and security risks.
Link: XXX

It will handle installation for you.

#Plugins

Next, plugins.

Biggest difference between plugin and skill:

Skill is method; plugin is toolbox.

Plugins let Codex connect to more external tools and information sources.

E.g., GitHub, browser, documentation, email, calendar, desktop apps, various MCP services.

Once connected, Codex is no longer just a coding assistant.

It enters real workflows.

E.g., have it check GitHub PR review comments:

text
Please view the review comments of the current PR.

Summarize issues that must be fixed.
Do not modify code yet.

Or have it read a document, then check if related modules in the project are affected:

text
Please read this requirements doc.

Identify change points relevant to the current project.
Only output potentially affected files and modules.
Do not modify code.

It’s like an AI colleague that can move between tools.

But I must emphasize again:

The more capable it gets, the more its boundaries matter.

More plugins ≠ better.

Higher permissions ≠ better.

Especially regarding company repos, emails, chat logs, docs — pay attention to data boundaries.

#MCP

Then there’s MCP.

This term is trending lately.

MCP can be simply understood as a protocol allowing AI to connect to external tools and data sources.

E.g., databases, browsers, knowledge bases, design tools, internal systems.

Previously, AI had to adapt individually to each tool.

Now, MCP provides a unified interface.

Once Codex connects to MCP, it gains access to more external capabilities.

E.g.:

Query database schema
Read internal documents
Operate browser
Search knowledge base

Of course, supported MCPs depend on your configuration and permissions.

I don’t recommend diving into a bunch of MCPs as a beginner.

First master basic project read/write, permissions, context, AGENTS.md, git workflow.

Those are fundamentals.

MCP is an extension.

Without solid fundamentals, more extensions just mean more chaos.

#Automation

Next, automation.

Many underestimate this feature.

Automation means giving Codex a task plus a time rule, so it runs periodically on its own.

Kinda like a scheduled task.

But it’s not an ordinary one.

Because the executor is an Agent.

E.g., have it summarize yesterday’s commits every morning:

text
Every day at 9 AM, check commit records of the current project from the past 24 hours.

Output:
1. Main changes
2. Modules involved
3. Potential risks
4. Areas I should focus on today

Do not modify code.

Or check PR status every 30 minutes:

text
Check current PR every 30 minutes.

If there are new review comments, summarize them and judge which need fixing.

Do not auto-modify code.
Only remind me when new issues appear.

Or summarize today’s diff every evening:

text
Every day at 6 PM, summarize today’s git diff for the current project.

Tell me:
1. Which files were changed
2. Rough changes in each file
3. Any obvious risks
4. Where to start tomorrow

Tasks like these suit Codex well.

They’re repetitive, clear, with fixed judgment criteria.

I suggest starting automation with read-only tasks.

First let it report, not auto-modify.

Once you confirm its judgments are stable, gradually expand permissions.
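To get a feel for what a read-only automation like the morning commit summary has to look at, here is the raw material gathered by hand. This is not how Codex automations are implemented; it is a sketch with a hypothetical function name, using plain git commands, and "modules" here simply means top-level directories.

```shell
# Read-only digest of recent commits: one line per commit, plus the list
# of touched top-level directories. "24 hours ago" mirrors the prompt above.
commit_digest() {
  since=${1:-"24 hours ago"}
  echo "Commits since $since:"
  git log --since="$since" --pretty=format:'%h %an: %s'
  echo
  echo "Modules touched:"
  git log --since="$since" --name-only --pretty=format: |
    awk -F/ 'NF { print $1 }' | sort -u
}
```

Nothing in it writes to the repository, which is exactly the property I want from a first automation. The judgment part (risks, what to focus on today) is what the agent adds on top of this output.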

Two categories:

#One: Automation bound to current conversation

E.g., waiting for deployment results.

Have Codex return to this thread in 10 minutes to check.

Good for short-cycle, context-heavy tasks.

#Two: Independent automation

E.g., check project commits every morning.

Generate weekly reports every Friday.

Summarize error logs every night.

These run independently, not relying on current conversation.

Rule of thumb:

If task continues current conversation → use current thread automation.
If task is fixed periodic inspection → use independent automation.

That makes it easier to grasp.

#A Complete Case Study

Now, let’s tie everything together with a concrete example.

Suppose I want to build a Pomodoro app.

A full engineering practice can be broken into 6 steps.

Some may say: so troublesome?

Why not just say:

text
Help me write a Pomodoro app.

Wouldn’t that work?

Yes.

But the result will likely be mediocre.

A more reliable approach is to turn Codex programming into a complete engineering practice.

#Step 1 – Break down requirements

text
I want to make a Pomodoro app.

First help me break down requirements, no code.

Output:
1. Core features
2. Optional features
3. Minimum viable scope for v1
4. Possible edge cases
5. Recommended implementation order

You can use GPT or Codex here.

If you’re unsure, I recommend GPT first.

GPT is better for discussion and breakdown.

#Step 2 – Have Codex read the project

text
Please read the current project structure.

Determine where Pomodoro features should go.

Do not modify code.
Only output file plan: which files to add, which to modify, and why.

This step is critical.

The same Pomodoro feature can look very different across projects.

Some use React, some Vue, some Next.js, some have their own component libraries or state management.

Codex must read the project first to match its style.

#Step 3 – Small-step v1 implementation

text
Implement v1 per the earlier plan.

Requirements:
1. 25-minute work timer + 5-minute break timer
2. Start, pause, reset support
3. Show work/break status
4. Simple UI, no complex animations
5. No new dependencies

After completion, run project to check for errors.

Note: I didn’t ask it to do a pile of things at once.

No history stats, notifications, sound effects, complex themes.

Just core features.

Because Codex struggles with large tasks if overloaded.
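The v1 scope is small enough that its core logic fits in a few lines. Below is a framework-free sketch of the work/break cycle, written in shell only so it can be run anywhere; the function and variable names are hypothetical, and in a real project Codex would implement the same rules in whatever framework the project uses.

```shell
WORK_SECONDS=$((25 * 60))
BREAK_SECONDS=$((5 * 60))

# Advance the timer by one second.
# Input: phase ("work" or "break") and remaining seconds.
# Output: the new "phase remaining" pair on stdout.
pomodoro_tick() {
  phase=$1
  remaining=$(( $2 - 1 ))
  if [ "$remaining" -le 0 ]; then
    # The current phase is over: switch to the other one with a full timer.
    if [ "$phase" = "work" ]; then
      phase=break
      remaining=$BREAK_SECONDS
    else
      phase=work
      remaining=$WORK_SECONDS
    fi
  fi
  echo "$phase $remaining"
}

# Reset always returns to a full work session.
pomodoro_reset() {
  echo "work $WORK_SECONDS"
}
```

Pause needs no function of its own in this model: the caller just stops calling `pomodoro_tick`. Having the rules this explicit also gives you something concrete to compare against in the review step.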

#Step 4 – Self-review

text
Please review these changes.

Focus on:
1. Whether timer is properly cleaned up
2. Whether setState might run after component unmount
3. Whether start/pause states are confused
4. Whether reset logic is correct
5. Whether mobile layout overflows
6. Any unnecessarily complex code

Only output review results, no modifications.

This step is very valuable.

Having Codex write code is one thing.

Having it review from a reviewer’s perspective is another.

#Step 5 – Fix accordingly

text
Fix issues confirmed in the earlier review.

Modify only directly related code.
No extra refactoring.
After completion, summarize diff again.

#Step 6 – Capture experience

text
Summarize the process of implementing Pomodoro.

Include:
1. Final file structure
2. Core implementation ideas
3. Problems encountered
4. Where to start adding history & notifications later
5. Which rules to put into AGENTS.md

See? This full cycle yields much more stable Codex output quality.

#My Commonly Used Codex Command Templates

Here’s a set of templates I frequently use.

#First – Project Reading Template

text
Please first read the current project, without modifying any code.

Help me summarize:
1. What the project does
2. Tech stack
3. Directory organization
4. Startup, build, test commands
5. Where to start adding a page
6. Obvious maintenance risks in the project

#Second – Bug-Fixing Template

text
I encountered a bug: XXX.

First read related code, no modifications.

Output:
1. Possible causes
2. Files involved
3. Info needed for further confirmation
4. Fix plan

#Third – Execution Modification Template

text
Please modify code per the confirmed plan.

Requirements:
1. Modify only within XXX scope
2. No new dependencies
3. No refactoring unrelated code
4. Run relevant tests after completion
5. Finally summarize changes and risks

#Fourth – Review Template

text
Please review the current git diff.

Focus on:
1. Whether existing functionality is broken
2. Missing edge cases
3. Missing error handling
4. Type issues
5. Security risks
6. Need for additional tests

Only output issues, no modifications.

#Fifth – Task Wrap-Up Template

text
Summarize this task.

Include:
1. Goal
2. Final changes
3. What was verified
4. Remaining risks
5. Where to start continuing later

You don’t need to memorize these.

Just remember the underlying logic:

Read first.
Plan next.
Execute.
Verify.
Summarize.

That’s the simplest and most stable Codex workflow.

#Easter Egg

Open the settings page in the Codex App.

It contains lots of Codex configurations.

E.g., personalization, MCP, browser usage, etc.

You can explore these later.

But there’s a fun little feature previously seen in Claude Code: pet.

Earlier Claude Code pets felt half-baked.

Codex’s, though not heavily promoted, feels quite complete.

Under “Appearance,” scroll to the bottom for the Pet module.

Lots of pets available, plus custom pets.

Click “Wake Pet” and your pet appears on the desktop.

Feels like the old PC Manager mini basketball.

If you think Codex’s default pets are ugly, there are now many Codex Pet Markets.

E.g., https://codex-pets.net/

Look much better than official ones, right?

Installation is simple.

E.g., install my favorite Brother Chicken:

Run the install command listed for it on the site: npx codex-pets add ikun

Of course, this is just an easter egg.

Fun aside, what truly determines Codex’s usefulness is the earlier stuff.

New conversations, search, plugins, skills, automation.

These are the real workflow core of Codex App.

#Summary

Today’s content is quite long — I checked, it’s about 16,000 characters. And the video is long too, 40 minutes after editing.

Final summary:

AI is not a god. It’s a very powerful tool.

But the stronger the tool, the more you need to know how to use it.

Beginners’ most common mistake: give a vague task and expect it done in one go.

That’s unreliable.

A reliable approach is:

Start with read-only.
Then plan.
Then small-step modifications.
Then test and verify.
Then summarize and capture.
Put common rules into AGENTS.md.
Turn repeated standards into skills.
Connect external tools via plugins and MCP.
Delegate repetitive inspections to automation.
Split complex tasks among multiple agents in parallel.

You remain responsible for direction, judgment, and acceptance.

I believe that’s where Codex delivers the most value.

This article is reprinted from: Spent a bloody 10k words! Probably the most comprehensive Codex practical tutorial online
